ICUAS 2021 Paper Abstract


Paper WeA2.6

Hamadouche, Mohand (Université de Bretagne Occidentale), Dezan, Catherine (Université de Bretagne Occidentale), Espes, David (Université de Bretagne Occidentale), Branco, Kalinka Regina Lucas Jaquie Castelo (University of São Paulo)

Comparison of Value Iteration, Policy Iteration and Q-Learning for Solving Decision-Making Problems

Scheduled for presentation during the Regular Session "Learning Methods I" (WeA2), Wednesday, June 16, 2021, 12:10−12:30, Kozani

2021 International Conference on Unmanned Aircraft Systems (ICUAS), June 15-18, 2021, Athens, Greece

This information is tentative and subject to change. Compiled on April 26, 2024

Keywords Autonomy, Path Planning, Navigation

Abstract

The 21st century has seen significant progress, especially in robotics. Advances in electronics and computing capacity now make it possible to build more precise, faster, and more autonomous robots that can automatically perform delicate or dangerous tasks. Such robots must move, perceive their environment, and make decisions that account for the goal(s) of a mission under uncertainty. One of the most common probabilistic models for describing missions and for planning under uncertainty is the Markov Decision Process (MDP). There are three fundamental classes of methods for solving MDPs: dynamic programming, Monte Carlo methods, and temporal-difference learning, each with its own strengths and weaknesses. In this paper, we compare three methods for solving MDPs: Value Iteration and Policy Iteration (dynamic programming methods) and Q-Learning (a temporal-difference method). We give new criteria for adapting the decision-making method to the application problem, together with explanations of the parameters. Policy Iteration is the most effective method for complex (and irregular) scenarios, and the modified Q-Learning for simple (and regular) scenarios. The regularity of the decision-making problem must therefore be taken into account when choosing the most appropriate resolution method in terms of execution time. Numerical simulations confirm these conclusions on a simple, regular grid case, on an irregular grid example, and finally on the mission planning of an Unmanned Aerial Vehicle (UAV), which represents a very irregular case. We demonstrate that the dynamic programming (DP) methods are more efficient than the temporal-difference (TD) method when facing an irregular set of actions.
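To illustrate the dynamic-programming family the abstract refers to, the sketch below runs Value Iteration on a hypothetical 1-D grid MDP (not one of the paper's benchmark scenarios): the Bellman optimality backup is applied until the value function stops changing, then a greedy policy is extracted. The grid size, rewards, and discount factor are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical 1-D grid of 5 cells; reaching or staying in the rightmost
# cell yields reward 1, everything else reward 0. Actions: 0 = left, 1 = right.
n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    """Deterministic transition: move left or right, clipped to the grid."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

# Value Iteration: repeat the Bellman optimality backup
#   V(s) <- max_a [ r(s, a) + gamma * V(s') ]
# until the largest per-state change falls below a tolerance.
V = np.zeros(n_states)
while True:
    V_new = np.array([max(step(s, a)[1] + gamma * V[step(s, a)[0]]
                          for a in range(n_actions))
                      for s in range(n_states)])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

# Greedy policy extraction: the action maximizing the one-step lookahead.
policy = [max(range(n_actions),
              key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
```

On this regular grid every state's optimal action is "right"; Q-Learning would estimate the same policy from sampled transitions instead of sweeping the full model, which is the trade-off the paper's comparison examines.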

 

 
