kengz/SLM-Lab: Resume mode, Plotly and PyTorch update, OnPolicyCrossEntropy memory


Bibliographic Details
Main Authors: Wah Loon Keng, Laura Graesser, Pierre TASSEL, allan-avatar1, Snyk bot, Sean Gillen, Rahim16, Milan Cvitkovic, Michael Schock, Angel Ayala
Format: Other/Unknown Material
Language: unknown
Published: Zenodo 2020
Online Access: https://doi.org/10.5281/zenodo.3751787
Description
Summary:

Resume mode
#455 adds the <code>train@</code> resume mode and refactors the <code>enjoy</code> mode. See the PR for detailed info.

<code>train@</code> usage example
Specify the train mode as <code>train@{predir}</code>, where <code>{predir}</code> is the data directory of the last training run, or simply use <code>latest</code> to resume from the latest run, e.g.:
<code class="lang-bash">python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train
# terminate run before its completion
# optionally edit the spec file in a past-future-consistent manner

# run resume with either of the commands:
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train@latest
# or to use a specific run folder
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train@data/reinforce_cartpole_2020_04_13_232521
</code>

<code>enjoy</code> mode refactor
The <code>train@</code> resume mode API allows the <code>enjoy</code> mode to be refactored, and the two now share similar syntax. Continuing with the example above, to enjoy a trained model, we now use:
<code class="lang-bash">python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole enjoy@data/reinforce_cartpole_2020_04_13_232521/reinforce_cartpole_t0_s0_spec.json
</code>

Plotly and PyTorch update
#453 updates Plotly to 4.5.4 and PyTorch to 1.3.1.
#454 explicitly shuts down the Plotly orca server after plotting to prevent zombie processes.

PPO batch size optimization
#453 adds chunking to allow PPO to run on larger batch sizes by breaking up the forward loop.

New OnPolicyCrossEntropy memory
#446 adds a new <code>OnPolicyCrossEntropy</code> memory class. See the PR for details. Credits to @ingambe.
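The chunking idea mentioned under the PPO batch size optimization can be illustrated with a minimal, library-free sketch. This is not the SLM-Lab implementation from #453: the names `chunked_forward` and `toy_forward` are hypothetical, and plain Python lists stand in for tensors. It only shows the general pattern of splitting a large batch into smaller slices, running the forward function on each slice, and concatenating the results.

```python
from typing import Callable, List, Sequence


def chunked_forward(
    forward: Callable[[Sequence[float]], List[float]],
    batch: Sequence[float],
    chunk_size: int,
) -> List[float]:
    """Apply `forward` to `batch` in slices of at most `chunk_size` items.

    Hypothetical helper for illustration only; it assumes `forward`
    treats every item independently, so chunking does not change results.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs: List[float] = []
    for start in range(0, len(batch), chunk_size):
        # Only one slice is passed to `forward` at a time, which bounds
        # the size of the intermediate work done per call.
        chunk = batch[start:start + chunk_size]
        outputs.extend(forward(chunk))
    return outputs


def toy_forward(xs: Sequence[float]) -> List[float]:
    """Stand-in for a network forward pass (assumption: element-wise)."""
    return [2.0 * x + 1.0 for x in xs]


batch = [float(i) for i in range(10)]
full = toy_forward(batch)                                    # one large call
chunked = chunked_forward(toy_forward, batch, chunk_size=4)  # slices of 4, 4, 2
```

The equivalence between `full` and `chunked` holds only because the toy forward function is element-wise; a forward pass that depends on whole-batch statistics would not be safe to chunk this way.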