Seminar: Fan Lu

“Convex Analytic Theory for Convex Q-Learning”
Wednesday, Feb. 22 at 3:00pm
NEB 409

Presented by the Computational NeuroEngineering Laboratory


Finding optimal control policies is one of the most important tasks in reinforcement learning. Most of the effective existing algorithms for this task are based on dynamic programming (DP) and aim to find approximate fixed-point solutions to the Bellman recurrence. While hugely successful, DP-based methods are either restricted to the tabular setting or, when extended to general function approximation, are somewhat incompatible with classical machine-learning tools rooted in convex optimization.

We propose a new set of reinforcement learning algorithms, called convex Q-learning, based on the linear programming (LP) approach to DP. The dual of convex Q-learning has a structure similar to Manne’s LP, revealing the need for regularization to avoid over-fitting. A sufficient condition is obtained for a bounded solution to the Q-learning LP. Simulation studies reveal numerical challenges when addressing sampled-data systems based on a continuous-time model. These challenges are addressed using state-dependent sampling. It is shown that convex Q-learning is successful in cases where standard Q-learning diverges, such as the linear quadratic regulator (LQR) problem.


Fan Lu is a Ph.D. student in the Department of Electrical & Computer Engineering at the University of Florida. He obtained his Bachelor of Science in Electrical Engineering and Automation from Nanjing University of Science and Technology in 2017, and his Master of Science in Electrical and Computer Engineering from the University of Florida in 2019. His research interests include convex optimization, reinforcement learning, and deep learning.