Local motion simulation using deep reinforcement learning


Bibliographic Details
Published in: Transactions in GIS
Main Authors: Xu, Dong; Huang, Xiao; Li, Zhenlong; Li, Xiang
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2020
Online Access: http://dx.doi.org/10.1111/tgis.12620
https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Ftgis.12620
https://onlinelibrary.wiley.com/doi/pdf/10.1111/tgis.12620
https://onlinelibrary.wiley.com/doi/full-xml/10.1111/tgis.12620
Description
Summary: Traditional local motion simulation focuses largely on avoiding collisions in the next frame. However, because it lacks foresight, the resulting global trajectories of agents often appear unreasonable. As a method that optimizes overall reward, deep reinforcement learning (DRL) can correct the problems inherent in traditional local motion simulation. In this article, we propose a local motion simulation method integrating optimal reciprocal collision avoidance (ORCA) and DRL, referred to as ORCA‑DRL. The main idea of ORCA‑DRL is to perform local collision avoidance detection via ORCA while simultaneously smoothing the trajectory via DRL. We use a deep neural network (DNN) as the state‑to‑action mapping function, where the state is detected by virtual visual sensors and the action space comprises two continuous dimensions: speed and direction. To improve data utilization and speed up training, we update the DNN parameters with proximal policy optimization (PPO) under the actor–critic (AC) framework. Three scenes (circle, hallway, and crossing) are designed to evaluate the performance of ORCA‑DRL. The results reveal that, compared with ORCA alone, ORCA‑DRL can: (a) reduce the total number of frames, allowing agents to reach their destinations sooner; and (b) effectively avoid local optima, as evidenced by smoother global trajectories.
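The abstract describes a DNN that maps virtual-sensor readings to a continuous (speed, direction) action. The sketch below illustrates one plausible shape of such a mapping; the sensor count, layer sizes, speed cap, and squashing functions are illustrative assumptions, not values taken from the paper, and the randomly initialized weights stand in for trained PPO parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SENSORS = 16   # assumed number of virtual visual-sensor rays
HIDDEN = 32      # assumed hidden-layer width
MAX_SPEED = 1.5  # assumed speed cap (m/s); hypothetical value

# Randomly initialized weights stand in for parameters a PPO/actor-critic
# training loop would learn.
W1 = rng.normal(scale=0.1, size=(N_SENSORS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 2))
b2 = np.zeros(2)

def policy(sensor_readings):
    """Map sensor distances to a continuous (speed, direction) action.

    Speed is squashed into (0, MAX_SPEED); direction into (-pi, pi),
    matching the two continuous action dimensions named in the abstract.
    """
    h = np.tanh(sensor_readings @ W1 + b1)          # hidden layer
    raw = h @ W2 + b2                               # unbounded outputs
    speed = MAX_SPEED / (1.0 + np.exp(-raw[0]))     # sigmoid squash
    direction = np.pi * np.tanh(raw[1])             # tanh squash
    return speed, direction

# Fake sensor distances for a single agent; in the paper's setting these
# would come from the virtual visual sensors.
state = rng.uniform(0.0, 10.0, size=N_SENSORS)
speed, direction = policy(state)
print(speed, direction)
```

In the full method, ORCA would then take this preferred (speed, direction) velocity and adjust it for guaranteed local collision avoidance, which is the division of labor the abstract describes.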