Course Overview

  • Type: MOOC course
  • Period: open enrollment (available at any time)
  • Study time: 14 hours
  • Enrollment approval: automatic
  • Certificate: issued online
http://www.edwith.org/reinforcement-learning2
Likes: 83, Learners: 294

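About the Instructor

  • Professor Hayong Shin (신하용), Department of Industrial and Systems Engineering, KAIST

    Instructor: Hayong Shin
    2001-present: Professor, Department of Industrial and Systems Engineering, KAIST
    1991-2001: Researcher at LG Electronics, Cubictek Co., Ltd., and Chrysler (USA)
    Vice President (Journals), Korean Institute of Industrial Engineers (대한산업공학회); recipient of the Jeongheon Academic Grand Prize (정헌학술대상, 2021)
    Senior Vice President, Society for Computational Design and Engineering (한국CDE학회); recipient of the Gaheon Academic Award (가헌학술상; 2002, 2005, 2009)
    Editorial board member, Computer-Aided Design journal (since 2005)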

Syllabus

Lectures
  1. 8. Deep Q Network
    1. Neural net
    1. NN for RL
    1. DQN
    1. DQN improvements
    1. Quiz 8
  2. 9. Policy based RL : Stochastic Policy Gradient
    1. Policy based RL
    1. Policy gradient theorem
    1. Policy gradient algorithms
    1. Quiz 9
  3. 10. Policy based RL : TRPO, PPO
    1. Revisiting policy gradient
    1. Trust region policy optimization (TRPO) algorithm
    1. Proximal Policy Optimization (PPO) algorithm
    1. Quiz 10
  4. 11. Policy based RL : DPG, DDPG, CEM
    1. Theoretical foundation of DPG
    1. DPG & DDPG algorithms
    1. Derivative free method and CEM
    1. Quiz 11
  5. 12. Exploration vs Exploitation
    1. Multi-Armed Bandit problem
    1. Basic MAB algorithm
    1. Advanced MAB algorithms
    1. Quiz 12
  6. 13. Average reward MDP and finite horizon MDP
    1. Average reward RL
    1. Finite horizon MDP
    1. Finite horizon MDP examples
    1. Quiz 13
  7. 14. AlphaGo & Reward shaping
    1. Components of AlphaGo
    1. Training AlphaGo and MCTS
    1. AlphaGo Zero and next
    1. Reward shaping
    1. Quiz 14