Truncated Variance-Reduced Value Iteration

Yujia Jin, Ishani Karmarkar, Aaron Sidford, and Jiayi Wang

Neural Information Processing Systems (NeurIPS) | Oral presentation at COLT Workshop on Reinforcement Learning, 2024

We study the problem of solving a discounted Markov decision process (MDP) with a generative model, a fundamental and extensively studied reinforcement learning setting. We present a new randomized variant of value iteration that improves the runtime for computing a coarse approximation of the optimal solution. Importantly, our method is model-free and takes a step towards closing the sample-efficiency gap between model-based and model-free methods in this setting. (arXiv)
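For readers new to the setting, the following is a minimal sketch of plain sample-based value iteration with a generative model. It is not the truncated variance-reduced algorithm of the paper; it only illustrates what "model-free value iteration with sampling access" can look like. The toy two-state MDP, the function names, and all parameter values are assumptions made for illustration.

```python
import random


def make_generative_model(P, rng):
    """Wrap a transition table as a sampling oracle (hypothetical helper).

    The solver below never reads P directly; it only calls the oracle.
    """
    def sample_next_state(s, a):
        return rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    return sample_next_state


def sampled_value_iteration(n_states, n_actions, reward, sample_next_state,
                            gamma, n_iters, n_samples):
    """Value iteration where each expectation is replaced by a sample mean."""
    V = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(n_iters):
        new_V = []
        for s in range(n_states):
            q = []
            for a in range(n_actions):
                # Monte Carlo estimate of E[V(s')] from fresh oracle samples.
                est = sum(V[sample_next_state(s, a)]
                          for _ in range(n_samples)) / n_samples
                q.append(reward[s][a] + gamma * est)
            best = max(range(n_actions), key=q.__getitem__)
            policy[s] = best
            new_V.append(q[best])
        V = new_V
    return V, policy


# Toy MDP (assumed for illustration): state 1 pays reward 1 under action 0.
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0: action 0 mostly stays, action 1 mostly moves
    [[0.1, 0.9], [0.9, 0.1]],  # from state 1: action 0 mostly stays, action 1 mostly leaves
]
reward = [[0.0, 0.0], [1.0, 0.0]]
gamma = 0.9

oracle = make_generative_model(P, random.Random(0))
V, policy = sampled_value_iteration(2, 2, reward, oracle, gamma,
                                    n_iters=100, n_samples=200)
print(policy, [round(v, 2) for v in V])
```

In this sketch every iteration draws fresh samples for every state-action pair, which is the simple but sample-hungry baseline that randomized variants of value iteration aim to improve on.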
