    Hands-On Reinforcement Learning

    Thomas DESGREYS

    REINFORCE algorithm

    Training

    see reinforce_cartpole.py
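
For reference, here is a minimal sketch of a REINFORCE training loop on CartPole. The network layout, learning rate and episode budget are illustrative assumptions, not a quote of the script; the actual implementation is in reinforce_cartpole.py.

```python
# Minimal REINFORCE sketch for CartPole (illustrative; hyperparameters assumed).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Simple policy network: state -> action probabilities (layout is an assumption).
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize

    # Policy-gradient loss: -sum(log pi(a|s) * return).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

torch.save(policy.state_dict(), "reinforce_cartpole_best.pth")
```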

The model is trained and saved as "reinforce_cartpole_best.pth", and the evolution of the loss and the score (i.e. the total reward) over the episodes is shown below.

[Figure: CartPole training loss per episode]
[Figure: CartPole score per episode]

These plots highlight the instability of this training algorithm. Still, with a bit of luck, we end up with a model that reaches the maximum number of steps allowed by this gym environment (500 steps).

    Evaluation

    see evaluate_reinforce_cartpole.py

During evaluation, we get a 100% success rate over 100 trials.
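
A sketch of how such an evaluation can be run, counting a trial as a success when it reaches the 500-step cap; that success criterion and the greedy action selection are my assumptions, not a quote of the script.

```python
# Evaluation sketch (illustrative; see evaluate_reinforce_cartpole.py).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(  # must match the training architecture
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole_best.pth"))
policy.eval()

successes = 0
for trial in range(100):
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = int(probs.argmax())  # greedy action at evaluation time
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    if total_reward >= 500:  # assumed success criterion: max episode length
        successes += 1

print(f"Success rate: {successes}%")
```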

Familiarization with a complete RL pipeline: application to training a robotic arm

    Stable-Baselines3

    see a2c_sb3_cartpole.py
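
For context, training A2C on CartPole with Stable-Baselines3 takes only a few lines; the timestep budget below is an assumption.

```python
# A2C on CartPole with Stable-Baselines3 (illustrative; see a2c_sb3_cartpole.py).
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # timestep budget is assumed
model.save("a2c_cartpole")

# Quick sanity check of the trained agent.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```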

    Hugging Face Hub

    Link to the trained model (cartpole)
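
A sketch of how such a model can be shared and reloaded with the huggingface_sb3 helpers; the repo id and filename below are placeholders, and uploading requires being logged in (e.g. via `huggingface-cli login`).

```python
# Sharing / retrieving the SB3 checkpoint via the Hugging Face Hub
# (illustrative; repo id and filename are placeholders).
from huggingface_sb3 import load_from_hub, push_to_hub
from stable_baselines3 import A2C

# Upload the saved checkpoint to a model repo.
push_to_hub(
    repo_id="your-username/a2c-cartpole",  # placeholder repo id
    filename="a2c_cartpole.zip",
    commit_message="Add A2C CartPole model",
)

# Later, download the checkpoint and load it back.
checkpoint = load_from_hub(
    repo_id="your-username/a2c-cartpole",
    filename="a2c_cartpole.zip",
)
model = A2C.load(checkpoint)
```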

    Weights & Biases

    Link to the wandb run (cartpole)
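
A sketch of how an SB3 run can be tracked with Weights & Biases through its SB3 integration; the project name, config values and save paths are placeholders.

```python
# Tracking an SB3 training run with Weights & Biases (illustrative).
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

config = {"env_id": "CartPole-v1", "total_timesteps": 100_000}  # assumed values
run = wandb.init(project="a2c-cartpole", config=config, sync_tensorboard=True)

env = gym.make(config["env_id"])
model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
)
run.finish()
```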

    Full workflow with panda-gym

    see a2c_sb3_panda_reach.py
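
A minimal sketch of the training setup, assuming the PandaReachJointsDense-v3 environment and an illustrative timestep budget.

```python
# A2C on the panda-gym reach task (illustrative; see a2c_sb3_panda_reach.py).
import gymnasium as gym
import panda_gym  # noqa: F401  -- importing registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")  # env id is an assumption

# panda-gym observations are dictionaries (observation / achieved_goal /
# desired_goal), so SB3 needs the MultiInputPolicy here.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)  # timestep budget is assumed
model.save("a2c_panda_reach")
```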

As I couldn't get it to work on my PC (difficulties installing panda-gym), I used Google Colab.

see my notebook here (online) or directly open a2c_sb3_panda_reach.ipynb

    Link to the trained model (panda reach)

    Link to the wandb run (panda reach)