[Code Collection] A Roundup of Deep Reinforcement Learning Implementations in PyTorch
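This post shares high-quality implementations of deep reinforcement learning algorithms written with PyTorch. These IPython notebooks are intended mainly to help practice and understand the papers, so in some cases I will opt for readability over efficiency. I will upload the implementation of each paper first, followed by markup explaining each portion of the code.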
Related Papers
Human-Level Control Through Deep Reinforcement Learning
[Publication]
https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning/
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/01.DQN.ipynb
Multi-Step Learning (from Reinforcement Learning: An Introduction, Chapter 7)
[Publication]
http://incompleteideas.net/book/the-book-2nd.html
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/02.NStep_DQN.ipynb
Deep Reinforcement Learning with Double Q-learning
[Publication]
https://arxiv.org/abs/1509.06461
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/03.Double_DQN.ipynb
Dueling Network Architectures for Deep Reinforcement Learning
[Publication]
https://arxiv.org/abs/1511.06581
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/04.Dueling_DQN.ipynb
Noisy Networks for Exploration
[Publication]
https://arxiv.org/abs/1706.10295
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/05.DQN-NoisyNets.ipynb
Prioritized Experience Replay
[Publication]
https://arxiv.org/abs/1511.05952?context=cs
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/06.DQN_PriorityReplay.ipynb
A Distributional Perspective on Reinforcement Learning
[Publication]
https://arxiv.org/abs/1707.06887
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/07.Categorical-DQN.ipynb
Rainbow: Combining Improvements in Deep Reinforcement Learning
[Publication]
https://arxiv.org/abs/1710.02298
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/08.Rainbow.ipynb
Distributional Reinforcement Learning with Quantile Regression
[Publication]
https://arxiv.org/abs/1710.10044
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/09.QuantileRegression-DQN.ipynb
Rainbow with Quantile Regression
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/10.Quantile-Rainbow.ipynb
Deep Recurrent Q-Learning for Partially Observable MDPs
[Publication]
https://arxiv.org/abs/1507.06527
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/11.DRQN.ipynb
Advantage Actor Critic (A2C)
[Publication1]
https://arxiv.org/abs/1602.01783
[Publication2]
https://blog.openai.com/baselines-acktr-a2c/
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/12.A2C.ipynb
High-Dimensional Continuous Control Using Generalized Advantage Estimation
[Publication]
https://arxiv.org/abs/1506.02438
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/13.GAE.ipynb
Proximal Policy Optimization Algorithms
[Publication]
https://arxiv.org/abs/1707.06347
[code]
https://github.com/qfettes/DeepRL-Tutorials/blob/master/14.PPO.ipynb
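The Multi-Step Learning and Rainbow entries above both build on the n-step return. As a rough illustration only, here is a minimal sketch in plain Python; it is not taken from the linked notebooks, and the function name `n_step_target` and its parameters are hypothetical. It folds the rewards backward from a bootstrap value, which in a DQN-style agent would typically be the maximum Q-value estimated at the state reached after n steps.

```python
def n_step_target(rewards, bootstrap_value, gamma, terminal=False):
    """Compute r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    rewards:         the n rewards collected after the starting state (hypothetical input)
    bootstrap_value: value estimate for the state reached after n steps
    gamma:           discount factor
    terminal:        if True, the episode ended within these steps, so no bootstrapping
    """
    # Start from the bootstrap term (or 0 if the episode terminated).
    target = 0.0 if terminal else bootstrap_value
    # Fold rewards in from last to first: target = r + gamma * target.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


if __name__ == "__main__":
    # Three rewards of 1.0, a bootstrap estimate of 10.0, and gamma = 0.5.
    print(n_step_target([1.0, 1.0, 1.0], 10.0, 0.5))
```

With a single reward this reduces to the familiar one-step target `r + gamma * bootstrap_value`, so the same helper covers both the standard and the multi-step case.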