Reinforcement Learning 2 Course Introduction : edwith

Reinforcement Learning 2

KOOC (KAIST Open Online Course), taught by Professor Hayong Shin, Department of Industrial and Systems Engineering, KAIST

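Instructor: Hayong Shin
2001-present: Professor, Department of Industrial and Systems Engineering, KAIST
1991-2001: Researcher at LG Electronics, Cubictek Co., Ltd., and Chrysler (USA)
Vice President (Journal), Korean Institute of Industrial Engineers; recipient of the Jeongheon Academic Grand Prize (2021)
Senior Vice President, Society for Computational Design and Engineering (한국CDE학회); recipient of the Gaheon Academic Award (2002, 2005, 2009)
Editorial board member, Computer-Aided Design journal (since 2005)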

Lectures

8. Deep Q Network
    1. Neural net
    2. NN for RL
    3. DQN
    4. DQN improvements
    5. Quiz 8
9. Policy based RL : Stochastic Policy Gradient
    1. Policy based RL
    2. Policy gradient theorem
    3. Policy gradient algorithms
    4. Quiz 9
10. Policy based RL : TRPO, PPO
    1. Revisiting policy gradient
    2. Trust region policy optimization (TRPO) algorithm
    3. Proximal Policy Optimization (PPO) algorithm
    4. Quiz 10
11. Policy based RL : DPG, DDPG, CEM
    1. Theoretical foundation of DPG
    2. DPG & DDPG algorithms
    3. Derivative free method and CEM
    4. Quiz 11
12. Exploration vs Exploitation
    1. Multi-Armed Bandit problem
    2. Basic MAB algorithm
    3. Advanced MAB algorithms
    4. Quiz 12
13. Average reward MDP and finite horizon MDP
    1. Average reward RL
    2. Finite horizon MDP
    3. Finite horizon MDP examples
    4. Quiz 13
14. AlphaGo & Reward shaping
    1. Components of AlphaGo
    2. Training AlphaGo and MCTS
    3. AlphaGo Zero and next
    4. Reward shaping
    5. Quiz 14