SIT215 – Artificial and Computational Intelligence

Project: Investigating Reinforcement Learning

Overview

Within SIT215 you have been learning about a range of problems that can be solved using techniques from

artificial and computational intelligence. This study has included coverage of both models and algorithms

suitable for AI and CI solutions. A particular limitation of all of the solutions that we have considered is that

they are designed by hand, or rely on the problem being formulated as an optimisation task.

In this project you are going to explore an advanced technique for solving many interesting and challenging

real-world problems: one in which an agent learns a solution to a problem through interaction with the

environment, and through perception of a reinforcement, or feedback, signal. This field is called, naturally,

reinforcement learning (RL). RL can also be seen as an online method for solving Markov Decision Problems,

as opposed to the offline methods of policy iteration, value iteration or dynamic programming presented in

lectures (in weeks 9 and 10).
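
For reference, the online/offline distinction can be seen by writing the two kinds of update side by side. The notation here (states s, actions a, sampled reward r, transition model P, reward function R, discount factor γ, learning rate α) follows common textbook usage and is an assumption, not notation fixed by this unit.

```latex
% Offline (model-based) value iteration: needs the full transition model P and reward function R
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V_k(s') \bigr]

% Online (model-free) Q-learning: needs only one sampled transition (s, a, r, s')
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \bigr]
```

The first update sums over every possible successor state, so the model must be known in advance. The second uses only what the agent actually observed after taking action a in state s.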

This project will require you to undertake self-directed study and learning of RL solution methods, building

upon topics and content covered in the first 10 weeks of this course. While this might seem daunting (not

being told how to solve the problem), you’ve been practicing this approach throughout the unit in the group-based PBL tasks, and so this is your chance to demonstrate individually what you’ve learned about problem

solving methodology.

Learning Objectives

This project addresses ULO2 and ULO3 for this unit:

• Design and implement software artefacts to demonstrate effectiveness and efficiency of solutions for

intelligent systems development

• Apply theoretical concepts and models to explain and communicate the design of intelligent systems

Specifically, these are addressed through achievement of the following task-specific learning objectives:

• Demonstrate ability to work with and extend software systems and frameworks for RL

• Describe and model RL problems using specific concepts and models

• Implement, evaluate and analyse the performance of different solutions on a range of RL problems

• Effectively communicate the process and outcomes of your research and development project

Preparatory Learning Activities

In order to complete this assessment task you will need to have first developed an understanding of a range

of topics covered in this unit in weeks 1 to 10. Given the assessment deadline, this may require you to

complete independent study of these topics prior to their presentation in lectures. The topics that you will

need to be familiar with are:

• Bayesian AI (working with probabilistic representations of uncertainty)

• State Space Search (understanding state space representations of systems)

• Normative Decision Theory (definitions of rational action, utility, intertemporal utility,

payoff/reward)

• Markov Decision Problems (representing sequential decision problems for agents acting in complex

domains, reward processes and finite horizon decision problems, optimal policies)

• Dynamic Programming (optimal solutions to sequential decision problems under specified

constraints)

Ultimately you will be able to complete this assessment task without a sound theoretical grounding in each

of these areas. However, having some knowledge of these areas and understanding of how they inter-relate

will make it far easier to understand learning materials on reinforcement learning, and far easier to explain

and describe your investigations and outcomes in this project. Our advice is that you use this project as a

basis for further study of these underlying areas, to assist in integrating the knowledge covered in this unit

into a meaningful ‘whole’, which supports completing this assessment task.

Task Requirements

This project will require you to use the OpenAI Gym environment for experimenting with reinforcement

learning tasks. You should start by reviewing the website for the Gym: https://gym.openai.com/. There are

links to documentation and software downloads.

To complete this project, you need to complete the following requirements and sub-tasks.

1. Read the relevant documentation for installing OpenAI Gym, starting with https://gym.openai.com/docs/.

2. Read and complete the following tutorial: https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/, ensuring that you can reproduce all steps discussed.

3. Write a brief report (2-3 pages at most) on the Taxi problem, including a mathematical description of

the reinforcement learning problem and the Q-learning algorithm for its solution. To do this, you may

want to refer to a good textbook on reinforcement learning. A good starting point is the “bible” of RL:

“Reinforcement Learning: An Introduction”, by Sutton & Barto. You can find this book online as a

free PDF download. There’s even a 2nd edition draft completed just this year. In your report you

should contrast the quality of solution of a random policy versus the “optimal” policy obtained by Q-learning.

4. Complete the following tutorial to explore the Cart-Pole environment in the Gym:

http://kvfrans.com/simple-algoritms-for-solving-cartpole/. In this case, implement a random policy

and Q-learning. It’s not essential that you attempt the policy gradient method, but you might like to

try it.

5. Extend your report to cover briefly the Cart-Pole problem, highlighting any differences with the Taxi

problem. Compare performance of Q-learning on both of these problems, presenting evidence (such as graphs) to support your evaluation.

If you’ve gotten to this point and created a good report that details what you’ve learned, you’ve met

the minimum requirements for this assessment task. Assuming a reasonable quality of report and

evidence, you can expect to earn a credit grade. Continue on to achieve a higher grade.

6. [Distinction] Select another environment from the OpenAI Gym, and implement Q-learning for this

environment. Extend your report to describe this new environment, including a mathematical model. Evaluate performance of Q-learning on this model, and identify any significant outcomes or

limitations of this approach on this new problem, compared to previous problems. Attempt to explain any differences or limitations.

7. [High Distinction] Implement Temporal Difference learning on the new environment you completed for step 6, as well as on either the Taxi problem or the Cart-Pole problem. Contrast the performance of

TD learning and Q-learning in your report, providing evidence such as graphs and performance data.
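
As a rough illustration of the kind of comparison steps 3 and 5 ask for, the sketch below trains tabular Q-learning and compares the resulting greedy policy against a random policy. To stay self-contained it does not use Gym: the six-cell corridor task, its rewards, the hyperparameter values and every function name are hypothetical stand-ins chosen for this example, not the Taxi or Cart-Pole environments and not a reference solution.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
MAX_STEPS = 50        # cap on episode length


def step(state, action):
    """Toy transition function: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 10.0, True   # reached the goal
    return next_state, -1.0, False      # per-step penalty


def run_episode(policy, rng):
    """Roll out one episode under policy(state, rng); return the total reward."""
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        state, reward, done = step(state, policy(state, rng))
        total += reward
        if done:
            break
    return total


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value once the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def random_policy(state, rng):
    """Baseline: ignore the state and pick an action uniformly at random."""
    return rng.choice(ACTIONS)


q_table = q_learning()


def greedy_policy(state, rng):
    """Act greedily with respect to the learned Q-table."""
    return max(ACTIONS, key=lambda a: q_table[state][a])


rng = random.Random(1)
random_avg = sum(run_episode(random_policy, rng) for _ in range(200)) / 200
greedy_avg = sum(run_episode(greedy_policy, rng) for _ in range(200)) / 200
print(f"random policy: {random_avg:.2f}, greedy Q policy: {greedy_avg:.2f}")
```

The same loop structure carries over to a discrete environment such as Taxi once `step` is replaced by calls to the environment. Cart-Pole observations are continuous, so a table-based approach like this one would first need them mapped to a finite set of states (for example, by binning each observation component).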

Submission Components & Due Dates

This is an individual assessment task and as such, each student will complete their own project and

submission components. To be eligible for assessment in this task you must submit the following artefacts to

the relevant submission folder on the Unit Site no later than the given deadline:

1) The report detailing your models, experiments and outcomes of your reinforcement learning problems

and solutions. Your report should provide adequate information to evidence your learning against the

objectives stated above, and in line with the assessment rubric provided.

2) All code developed or used in this project. Your code must include appropriate documentation

(internal comments are sufficient) that explains what the code does. You should also provide

instructions on how to execute the code (for example, in a README file).

You may assume that the assessment team has access to OpenAI Gym and can execute your code. If you

rely on any third party libraries or applications that are required to run your solution, you need to provide

those, or make them accessible to the assessor (e.g., by providing a link to a download site, and instructions

on how to install and use the library in your solution).

Assignment Marking

This assignment will be marked on the following scale:

Level                                Grade
Does Not Meet Minimum Standards      N
Meets Minimum Standards              P or C
Exceeds Minimum Standards            D
Greatly Exceeds Minimum Standards    HD

A numeric mark will be assigned based on the assessor’s determination as to where within the relevant grade

category the standard of work sits. A rubric will be provided on the Unit Site, under the

Resources>Assessment folder, to indicate the criteria upon which your submission components will be

assessed and the standards that will be applied for these criteria. Please contact the teaching team if you have

any concerns or questions regarding how you will be assessed.

Penalties

In accordance with Faculty assessment policies, late submissions to the submission folder will incur a penalty

of 5% of the total available marks per day, up to five days total, after which the score for this part of the task

is 0. Such penalties will be deducted from the awarded numeric mark to determine the final grade for this task.

Submissions will not be accepted or marked more than five days after the final submission deadline, except in

cases where an extension has been approved prior to the deadline.

Getting Help and Support

Students are encouraged to support each other to discuss the tasks, as well as to assist in overcoming problems

in understanding the concepts, models and algorithms relevant to RL problems and solutions. Getting feedback

from peers will certainly improve your understanding in this project, and help others to build their

understanding. Note, however, that this is an individual assessment task, and all development work and report

writing must principally be the work of the student being assessed. Where you are asked to replicate the work

of others (e.g., completing the tutorials and reproducing the code and results of others), ensure that you

accurately and appropriately reference the source work. Academic penalties for collusion and plagiarism are

severe and students are urged to seek guidance and advice from the teaching team if they have any concerns

about how to complete this task appropriately.

Programming & Software Help

The School of IT runs a Learning Support Hub (LSH), which is accessible Monday to Friday, both on campus

and in the cloud. The LSH staff can assist with issues surrounding programming, and installation of software

(such as setting up Python), but they are NOT going to help you to complete the assignment. They cannot tell

you how to do this project, nor tell you if you’re doing it correctly. They are not teaching staff. You should

contact the LSH in the first instance to deal with any programming or software related issues.

Mathematics & Algorithm Help

The teaching team are here to support you in this task, in particular to assist you to develop an understanding

of reinforcement learning models and solution algorithms. We are also here to help you learn the underlying

knowledge, as described in the Preparatory Learning Activities section. If you are having trouble

understanding this material, or this task, please make contact with us. The best way to do this is by asking

questions in the Project Discussion forum on the Unit Site. Answers to your questions will also help other

students, who are undoubtedly having the same kinds of problems as you.

Beyond this you may seek assistance from the teaching team during practical classes, or Bb Collaborate

sessions. Additional Bb Collaborate sessions dedicated to this project will be run in weeks 5 and 8. Details of

these will be provided on the Unit Site.

Report Writing Help

While the teaching team are also happy to provide advice and guidance on writing your report, the University

also provides support services for students. In particular, the Writing Mentors team offer great assistance for

students completing written assessment tasks – especially report writing. Visit

http://www.deakin.edu.au/students/studying/study-support/writing-mentors for more information.

Feedback

Students will receive verbal, written or recorded audio feedback on their project submission as part of their

assessment. Due to the timing of assessment and scheduling of exams by DSA, it cannot be guaranteed that

this feedback will be provided before the unit exam. Where a student requires specific feedback prior to the

exam, they should contact the Unit Chair, allowing sufficient time prior to the exam for this feedback to be

provided.

Students are actively encouraged to seek formative feedback from peers and teaching staff, on their work

completed before the submission deadline, to ensure they are on track with this task. Feedback may be

obtained during weekly scheduled practical classes upon request. Talk to us and we’ll support you!