Truncated Variance-Reduced Value Iteration
Neural Information Processing Systems (NeurIPS) | Oral presentation at COLT Workshop on Reinforcement Learning, 2024
We study the problem of solving a discounted Markov decision process (MDP) with a generative model, a fundamental and extensively studied reinforcement learning setting. We present a new randomized variant of value iteration that improves the runtime for computing a coarse optimal solution. Importantly, our method is model-free and takes a step towards narrowing the sample-efficiency gap between model-based and model-free methods in this setting. (arXiv)