Information
Field | Value |
---|---|
Code | CENG045 |
Name | Reinforcement Learning |
Academic Year | 2024-2025 |
Term | Fall |
Weekly Hours (Theory-Application) | 3-0 (17 weeks) |
ECTS | 6 |
National Credit | 3 |
Teaching Language | Turkish |
Level | Master's (graduate) course |
Type | Normal |
Mode of Study | Face-to-face |
Catalog Information Coordinator | Mehmet SARIGÜL |
Course Instructor | Mehmet SARIGÜL (Group A, instructor in charge) |
Course Goal / Objective
The goal of this course is to teach students the fundamentals of reinforcement learning, a subfield of machine learning concerned with how agents learn to make decisions in an environment in order to achieve a specific goal.
Course Content
This course covers the following topics. Introduction to reinforcement learning: basic concepts, comparison with supervised and unsupervised learning, and types of reinforcement learning problems. Markov Decision Processes (MDPs): the MDP formalism, reward function, state transitions, policy, value function, and Bellman equations. Dynamic Programming (DP): policy evaluation, policy iteration, and value iteration; Monte Carlo methods. Temporal Difference (TD) learning: on-policy and off-policy learning, Q-learning, SARSA, and eligibility traces. Function approximation: linear and non-linear function approximation, and deep reinforcement learning. Exploration and exploitation: exploration strategies such as epsilon-greedy, softmax, and upper confidence bound (UCB).
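As an illustration of how the Bellman equations and value iteration from the dynamic-programming topics fit together, the sketch below runs value iteration on a tiny deterministic MDP. The three-state chain, its rewards, the discount factor, and all function names are hypothetical choices made for this example; they are not taken from the course materials.

```python
# Minimal value-iteration sketch on a HYPOTHETICAL 3-state deterministic chain MDP.
GAMMA = 0.9        # discount factor (assumed for illustration)
THETA = 1e-10      # stop when the largest value change falls below this
STATES = [0, 1, 2]
TERMINAL = {2}     # state 2 is absorbing; its value stays 0
ACTIONS = ["left", "right"]


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    if action == "right":
        next_state = min(state + 1, 2)
    else:
        next_state = max(state - 1, 0)
    # Reward 1 only for the transition from state 1 into the terminal state.
    reward = 1.0 if (state == 1 and next_state == 2) else 0.0
    return next_state, reward


def action_value(V, state, action):
    """One-step lookahead: r + gamma * V(s')."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def value_iteration():
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            # Bellman optimality backup: V(s) <- max_a [r + gamma * V(s')]
            best = max(action_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(ACTIONS, key=lambda a: action_value(V, s, a))
              for s in STATES if s not in TERMINAL}
    return V, policy


V, policy = value_iteration()
print(V)
print(policy)
```

Under these assumptions the values converge to V(0) = 0.9 and V(1) = 1.0 (the terminal state keeps 0), and the greedy policy chooses "right" in both non-terminal states. The update updates V in place (a Gauss-Seidel style sweep); a version that computes all new values from the previous sweep converges to the same fixed point.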
Course Prerequisites
Knowledge of basic programming, linear algebra, and probability theory.
Resources
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.
Course Notes
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.
Course Learning Outcomes
Order | Course Learning Outcomes |
---|---|
LO01 | Understanding of the fundamentals of reinforcement learning |
LO02 | Ability to model problems as Markov Decision Processes (MDPs) |
LO03 | Ability to implement reinforcement learning algorithms |
LO04 | Ability to evaluate and compare reinforcement learning algorithms |
Relation to Program Learning Outcomes
Order | Type | Program Learning Outcomes | Contribution Level |
---|---|---|---|
PLO01 | Knowledge - Theoretical, Factual | Building on the competencies gained at the undergraduate level, has an advanced level of knowledge and understanding that provides the basis for original studies in the field of Computer Engineering. | 3 |
PLO02 | Knowledge - Theoretical, Factual | Accesses scientific knowledge in the field of engineering in breadth and depth, and evaluates, interprets, and applies it. | 3 |
PLO03 | Competences - Learning Competence | Is aware of new and emerging practices in the profession, and examines and learns them when necessary. | 3 |
PLO04 | Competences - Learning Competence | Formulates engineering problems, develops methods to solve them, and applies innovative methods in their solutions. | 2 |
PLO05 | Competences - Learning Competence | Designs and conducts analytical, modeling-based, and experimental research, and analyzes and interprets complex situations encountered in the process. | 3 |
PLO06 | Competences - Learning Competence | Develops new and/or original ideas and methods, and develops innovative solutions in system, component, or process design. | 2 |
PLO07 | Skills - Cognitive, Applied | Has learning skills. | 2 |
PLO08 | Skills - Cognitive, Applied | Is aware of new and emerging applications of Computer Engineering, and examines and learns them when necessary. | 3 |
PLO09 | Skills - Cognitive, Applied | Communicates the processes and results of their studies in written or oral form, in national and international settings, both within and outside the field of Computer Engineering. | |
PLO10 | Skills - Cognitive, Applied | Has comprehensive knowledge of current techniques and methods in Computer Engineering and their limitations. | 1 |
PLO11 | Skills - Cognitive, Applied | Uses information and communication technologies at an advanced level, interactively with the computer software required by Computer Engineering. | 2 |
PLO12 | Knowledge - Theoretical, Factual | Observes social, scientific, and ethical values in all professional activities. | 2 |
Week Plan
Week | Topic | Preparation | Methods |
---|---|---|---|
1 | Introduction to reinforcement learning | Reading the lecture notes | Teaching Method: Lecture |
2 | Markov Decision Processes (MDPs), reward function, state transitions | Reading the lecture notes | Teaching Method: Lecture |
3 | Policy, value function, and Bellman equations | Reading the lecture notes | Teaching Method: Lecture |
4 | Dynamic Programming (DP), policy evaluation, policy iteration | Reading the lecture notes | Teaching Method: Lecture |
5 | Value iteration and Monte Carlo methods | Reading the lecture notes | Teaching Method: Lecture |
6 | Temporal Difference (TD) learning, on-policy and off-policy learning | Reading the lecture notes | Teaching Method: Lecture |
7 | Q-learning, SARSA, and eligibility traces | Reading the lecture notes | Teaching Method: Lecture |
8 | Midterm Exam | | Assessment Method: Written Exam |
9 | Function approximation: linear and non-linear function approximation | Reading the lecture notes | Teaching Method: Lecture |
10 | Exploration and exploitation: exploration strategies such as epsilon-greedy, softmax, and UCB | Reading the lecture notes | Teaching Method: Lecture |
11 | Policy gradients and direct policy search methods | Reading the lecture notes | Teaching Method: Lecture |
12 | REINFORCE algorithm, actor-critic methods, and A3C | Reading the lecture notes | Teaching Method: Lecture |
13 | Multi-agent reinforcement learning, non-zero-sum games | Reading the lecture notes | Teaching Method: Lecture |
14 | Nash equilibria and coordination in multi-agent systems | Reading the lecture notes | Teaching Method: Lecture |
15 | Review | Reading the lecture notes | Teaching Method: Discussion |
16 | Final Exams | | Assessment Method: Written Exam |
17 | Final Exams | | Assessment Method: Written Exam |
Student Workload - ECTS
Activity | Count | Time (Hours) | Workload (Hours) |
---|---|---|---|
Course-Related Work | | | |
Class Time (exam weeks excluded) | 14 | 3 | 42 |
Out-of-Class Study (Preliminary Work, Practice) | 14 | 5 | 70 |
Assessment-Related Work | | | |
Homework, Projects, Other | 0 | 0 | 0 |
Midterm Exams (Written, Oral, etc.) | 1 | 14 | 14 |
Final Exam | 1 | 28 | 28 |
Total Workload (Hours) | | | 154 |
Total Workload / 25 h | | | 6.16 |
ECTS | | | 6 |
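The ECTS value follows from the workload table by simple arithmetic: each row's workload is count × hours, the rows sum to the total workload, and the total is divided by 25 hours per credit. The Python sketch below reproduces that calculation. The `Activity` class is a hypothetical helper, and rounding to the nearest whole credit is an assumption inferred from 6.16 becoming 6; neither is the university's documented procedure.

```python
from dataclasses import dataclass

# Hours of student workload per ECTS credit, as used in the table's "/ 25 h" row.
HOURS_PER_ECTS = 25


@dataclass
class Activity:
    """Hypothetical helper: one row of the workload table."""
    name: str
    count: int
    hours_each: int

    @property
    def workload(self) -> int:
        return self.count * self.hours_each


activities = [
    Activity("Class time (exam weeks excluded)", 14, 3),
    Activity("Out-of-class study", 14, 5),
    Activity("Homework, projects, other", 0, 0),
    Activity("Midterm exam", 1, 14),
    Activity("Final exam", 1, 28),
]

total = sum(a.workload for a in activities)
ects_exact = total / HOURS_PER_ECTS
# ASSUMPTION: the final credit value is the exact ratio rounded to the nearest integer.
ects = round(ects_exact)

print(f"Total workload: {total} h")
print(f"{total} / {HOURS_PER_ECTS} = {ects_exact:.2f} -> {ects} ECTS")
```

With the figures from the table this prints a total of 154 hours and 154 / 25 = 6.16, which rounds to 6 ECTS.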