[Free] Modern Reinforcement-Learning Using Deep Learning

Model types, Algorithms and approaches, Function approximation, Deep reinforcement-learning, Deep Multi-agent Reinforcem – Free Course

Added on March 15, 2023 Development 2 min read

What you’ll learn

Be able to start Deep reinforcement-learning research
Be able to start in a Deep reinforcement-learning engineering role
Understand modern, state-of-the-art Deep reinforcement-learning methods
Understand core Deep reinforcement-learning concepts

Requirements

Interest in Deep reinforcement-learning

Description

Hello, I am Nitsan Soffair, a Deep RL researcher at BGU.

In my Deep reinforcement-learning course, you will learn current state-of-the-art Deep reinforcement-learning methods.

You will do the following

Get state-of-the-art knowledge regarding
1. Model types
2. Algorithms and approaches
3. Function approximation
4. Deep reinforcement-learning
5. Deep Multi-agent Reinforcement-learning
Validate your knowledge by answering short and very short quizzes for each lecture.
Be able to complete the course in ~2 hours.

Syllabus

Model types
1. Markov decision process (MDP)
  A discrete-time stochastic control process.
2. Partially observable Markov decision process (POMDP)
  A generalization of the MDP in which the agent cannot directly observe the underlying state and receives observations instead.
3. Decentralized Partially observable Markov decision process (Dec-POMDP)
  A generalization of the POMDP to multiple decentralized agents.
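To make the three model types above concrete, here is a minimal sketch in Python. The two-state MDP, its state and action names, and the `step` and `observe` helpers are all hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import random

# Hypothetical two-state MDP used only for illustration.
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}

def step(state, action, rng):
    """Sample (next_state, reward) from the MDP's transition distribution."""
    outcomes = P[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights)[0]
    return next_state, reward

def observe(state):
    """POMDP view: the agent receives an observation instead of the state.

    In this toy example both states emit the same observation, so a single
    observation does not reveal which state the agent is in.
    """
    return "beep"

rng = random.Random(0)
state = "A"
for _ in range(3):
    state, reward = step(state, "go", rng)
    print(observe(state), reward)
```

In a Dec-POMDP, each agent would have its own observation function of this kind and would choose its action from its own observations only.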
Algorithms and approaches
1. Bellman equations
  A necessary condition for optimality associated with dynamic programming.
2. Model-free
  A model-free algorithm is an algorithm that does not use the transition probabilities or the reward function (the model) of the MDP.
3. Off-policy
  An off-policy algorithm is an algorithm that learns about one policy (the target policy) while acting in the environment with a different policy (the behavior policy).
4. Exploration-exploitation
  A trade-off in Reinforcement-learning between exploring new actions and exploiting what the agent already knows.
5. Value-iteration
  An iterative algorithm that repeatedly applies the Bellman optimality backup.
6. SARSA
  An on-policy algorithm for learning a Markov decision process policy.
7. Q-learning
  A model-free reinforcement learning algorithm to learn the value of an action in a particular state.
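The ideas in this part of the syllabus can be sketched on a toy problem. The code below is an illustrative sketch, not the course's implementation: the two-state MDP, the hyperparameters, and the function names are assumptions. `value_iteration` uses the model `P` directly and applies the Bellman optimality backup, while `q_learning` is model-free and off-policy: it only samples transitions, acts with an epsilon-greedy behavior policy, and bootstraps from the greedy target.

```python
import random

# Hypothetical two-state MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9

def value_iteration(P, gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def sample(P, s, a, rng):
    """Draw one transition; the learner sees only the sampled outcome."""
    outcomes = P[s][a]
    _, s2, r = rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
    return s2, r

def q_learning(P, episodes=2000, steps=20, alpha=0.1, gamma=GAMMA, eps=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(episodes):
        s = rng.choice(list(P))
        for _ in range(steps):
            if rng.random() < eps:            # explore
                a = rng.choice(list(P[s]))
            else:                             # exploit
                a = max(Q[s], key=Q[s].get)
            s2, r = sample(P, s, a, rng)
            # Off-policy target: bootstrap from the greedy action in s2.
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q

V = value_iteration(P)
Q = q_learning(P)
print(V)
print({s: max(Q[s], key=Q[s].get) for s in Q})
```

SARSA would differ from `q_learning` only in the target: it would bootstrap from the action actually taken in `s2` rather than from the greedy one.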
Function approximation
1. Function approximators
  Function approximation asks us to select a function from a well-defined class that closely matches (“approximates”) a target function in a task-specific way.
2. Policy-gradient
  Value-based, Policy-based, Actor-critic, policy-gradient, and softmax policy
3. REINFORCE
  A policy-gradient algorithm.
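As a rough illustration of a softmax policy trained with REINFORCE, the sketch below uses a multi-armed bandit, so every episode is a single step and the return equals the sampled reward. The bandit setting, the reward noise, the learning rate, and the function names are assumptions made for this example; no baseline is used.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)                      # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(mean_rewards, episodes=3000, lr=0.1, seed=0):
    """REINFORCE on a bandit: theta += lr * G * grad log pi(a)."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)   # one preference per action
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        G = mean_rewards[a] + rng.gauss(0.0, 0.1)   # noisy one-step return
        for k in range(len(theta)):
            # For a softmax policy: d log pi(a) / d theta_k = 1[k == a] - pi(k)
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * G * grad_log
    return softmax(theta)

print(reinforce_bandit([0.0, 1.0]))
```

An actor-critic method would replace the raw return `G` in the update with an estimate that involves a learned value function.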
Deep reinforcement-learning
1. Deep Q-Network (DQN)
  A deep reinforcement-learning algorithm using experience replay and fixed Q-targets.
2. Deep Recurrent Q-Learning (DRQN)
  A deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM.
3. Optimistic Exploration with Pessimistic Initialization (OPIQ)
  A deep reinforcement-learning algorithm for MDPs based on DQN.
4. Value Decomposition Networks (VDN)
  A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
5. QMIX
  A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
6. QTRAN
  A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
7. Weighted QMIX
  A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
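The value-decomposition idea behind VDN can be sketched without any neural network. In VDN the joint action-value is modeled as a sum of per-agent values, so each agent picking its own best action also maximizes the joint value. The per-agent tables and function names below are hypothetical; in the actual algorithms each per-agent value comes from a network conditioned on that agent's local observations.

```python
from itertools import product

# Hypothetical per-agent action values for one fixed joint observation.
agent_q = [
    {"left": 0.2, "right": 1.0},   # agent 0
    {"left": 0.7, "right": 0.1},   # agent 1
]

def vdn_joint_q(agent_q, joint_action):
    """VDN-style mixing: the joint value is the sum of per-agent values."""
    return sum(q[a] for q, a in zip(agent_q, joint_action))

def decentralized_greedy(agent_q):
    """Each agent maximizes its own value independently."""
    return tuple(max(q, key=q.get) for q in agent_q)

def centralized_greedy(agent_q):
    """Brute-force search over all joint actions, for comparison."""
    joint_actions = product(*(q.keys() for q in agent_q))
    return max(joint_actions, key=lambda ja: vdn_joint_q(agent_q, ja))

print(decentralized_greedy(agent_q), centralized_greedy(agent_q))
```

QMIX relaxes the plain sum to a mixing function that is monotonic in each per-agent value, which preserves the same agreement between decentralized and centralized greedy action selection.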