Summarize the brief history of reinforcement learning. [3 marks]
Illustrate the general idea of reinforcement learning using the Tic-Tac-Toe example. [4 marks]
Describe the elements of reinforcement learning in detail. [7 marks]
Define and formulate the probability density function (PDF). [3 marks]
Explain the concept of random variables in probability with a suitable example. [4 marks]
Discuss the concepts of joint and conditional probability with their equations. [7 marks]
Define the Probability Mass Function (PMF). Explain with a suitable example. [7 marks]
Illustrate a Markov Reward Process (MRP) with a suitable example. [3 marks]
What are the roles of optimal value functions in a Markov decision process (MDP)? [4 marks]
Formulate policy evaluation to compute the state-value function in dynamic programming. [7 marks]
What is the Markov property? List the criteria to identify the Markov property. [3 marks]
Solve the Bellman equation for v∗ for the simple Gridworld problem. [4 marks]
Explain policy iteration in detail. [7 marks]
Give an overview of Monte Carlo (MC) methods for model-free reinforcement learning. [3 marks]
Illustrate on-policy and off-policy learning with an example. [4 marks]
Write an algorithm for the first-visit MC method for estimating v_π. [7 marks]
Explain how to apply importance sampling in off-policy techniques. [3 marks]
Write down the value iteration algorithm. [4 marks]
Write an algorithm for the every-visit MC method for estimating v_π. [7 marks]
Define Temporal-Difference (TD) learning. Give an overview of TD(0). [3 marks]
What are the advantages of TD prediction methods? [4 marks]
Explain Q-learning as an off-policy TD control algorithm. [7 marks]
What is meant by the term “eligibility traces”? What are the two ways to view eligibility traces? [3 marks]
Differentiate between State-action-reward-state-action (SARSA) and Q-learning. [4 marks]
Describe the n-step TD prediction technique in detail. [7 marks]