ICUAS 2021 Paper Abstract


Paper WeB2.3

Peake, Ashley (Wake Forest University), McCalmon, Joe (Wake Forest University), Zhang, Yixin (Wake Forest University), Myers, Daniel (Wake Forest University), Alqahtani, Sarra (Wake Forest University), Pauca, Paul (Wake Forest University)

Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments

Scheduled for presentation during the Regular Session "Learning Methods II" (WeB2), Wednesday, June 16, 2021, 14:40–15:00, Kozani

2021 International Conference on Unmanned Aircraft Systems (ICUAS), June 15-18, 2021, Athens, Greece


Keywords: Navigation, Path Planning, Environmental Issues

Abstract

Autonomous exploration is an essential task for unmanned aerial vehicles (UAVs) operating in unknown environments. Often, UAVs on these missions must first build a map of the environment via pure exploration and subsequently use (i.e., exploit) the generated map for downstream navigation tasks. Performing these navigation tasks in two separate steps is not always possible and can even be disadvantageous for UAVs deployed in outdoor, dynamically changing environments. Current exploration approaches typically rely on a priori human-generated maps or on heuristics such as frontier-based exploration. Other approaches use learning but focus only on learning policies for specific tasks, and either use sample-inefficient random exploration or make impractical assumptions about full map availability. In this paper, we develop an adaptive exploration approach that trades off between exploration and exploitation in a single step using Deep Reinforcement Learning (DRL). We specifically focus on UAVs searching for areas of interest (AoIs) in an unknown environment. The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps. Double Deep Q-Network (DDQN) and Advantage Actor-Critic (A2C) agents are extended with a stack of LSTM layers and trained to generate optimal policies for the exploration and exploitation tasks, respectively. An information gain function is then repeatedly computed to determine the optimal trade-off between the two. We test our approach on three different tasks against four baselines. The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoIs in less time than the baselines.
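The abstract does not include implementation details, so the following is a minimal sketch of the two mechanisms it names: a DRL value network extended with stacked LSTM layers so the agent can integrate partial observations of the environment over time, and an entropy-based information gain signal used to decide when to keep exploring versus when to exploit. All layer sizes, the map_entropy helper, and the switching threshold below are illustrative assumptions, not the authors' published design.

import numpy as np
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """DDQN-style Q-network with a stack of LSTM layers.

    Hypothetical layout: the paper states only that DDQN/A2C are
    extended with stacked LSTMs, not the exact architecture.
    """

    def __init__(self, obs_dim: int, n_actions: int,
                 hidden: int = 128, lstm_layers: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Stacked LSTM layers let the agent accumulate evidence from
        # partial map observations across time steps.
        self.lstm = nn.LSTM(hidden, hidden, num_layers=lstm_layers,
                            batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim)
        z = self.encoder(obs_seq)
        z, hidden_state = self.lstm(z, hidden_state)
        return self.q_head(z), hidden_state

def map_entropy(p_occupied: np.ndarray) -> float:
    """Shannon entropy of an occupancy-grid belief (one assumed way
    to quantify how much of the map is still unknown)."""
    p = np.clip(p_occupied, 1e-6, 1.0 - 1e-6)
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)).sum())

def choose_policy(entropy_before: float, entropy_after: float,
                  threshold: float = 0.05) -> str:
    """Switch from the exploration policy to the exploitation policy
    once an exploration step no longer reduces map uncertainty enough.
    The threshold value is an illustrative assumption."""
    information_gain = entropy_before - entropy_after
    return "explore" if information_gain > threshold else "exploit"

One plausible usage consistent with the abstract is to run this check after each step: while choose_policy returns "explore", act with the DDQN exploration policy; once it returns "exploit", hand control to the A2C policy for the current segmented sub-map.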

