Deep Q-Learning on Hölder Spaces

We study the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. In value-based reinforcement learning, each Q-learning or DQN update is built from a Bellman optimality target; our analysis isolates this target in a diffusion setting and studies its regularity and approximation complexity. Under uniform ellipticity and Hölder-regular coefficients, we show that a Bellman update maps bo...
