Local motion simulation using deep reinforcement learning
Published in: | Transactions in GIS |
---|---|
Main Authors: | , , , |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Wiley 2020 |
Subjects: | |
Online Access: | http://dx.doi.org/10.1111/tgis.12620 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Ftgis.12620 https://onlinelibrary.wiley.com/doi/pdf/10.1111/tgis.12620 https://onlinelibrary.wiley.com/doi/full-xml/10.1111/tgis.12620 |
Summary: | Traditional local motion simulation focuses largely on avoiding collisions in the next frame. Because it does not look ahead, the global trajectories of agents often appear unreasonable. Deep reinforcement learning (DRL), which optimizes the overall reward, can correct these shortcomings of traditional local motion simulation. In this article, we propose a local motion simulation method that integrates optimal reciprocal collision avoidance (ORCA) and DRL, referred to as ORCA‐DRL. The main idea of ORCA‐DRL is to perform local collision avoidance detection via ORCA while smoothing the trajectory via DRL. We use a deep neural network (DNN) as the state‐to‐action mapping function, where the state information is detected by virtual visual sensors and the action space comprises two continuous dimensions: speed and direction. To improve data utilization and speed up training, we use proximal policy optimization (PPO), based on the actor–critic (AC) framework, to update the DNN parameters. Three scenes (circle, hallway, and crossing) are designed to evaluate the performance of ORCA‐DRL. The results reveal that, compared with ORCA, ORCA‐DRL can: (a) reduce the total number of frames, so agents take less time to reach their destinations; and (b) effectively avoid local optima, as evidenced by smoother global trajectories. |
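The summary names proximal policy optimization as the update rule for the DNN but does not say which variant the authors use. As a rough illustration only, the sketch below computes the widely used clipped surrogate objective for a batch of samples. The function name `ppo_clipped_objective`, its argument names, and the clipping range `clip_eps=0.2` are assumptions for this example and are not taken from the article.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch.

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates, e.g. from the critic.
    clip_eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping limits how far one update can move the policy away from the one that collected the data, which is what lets the same batch be reused for several gradient steps. In a setting like the one described, each sample would correspond to one continuous (speed, direction) action chosen by an agent.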