Consider the 3 × 3 world shown below. The transition model is stochastic: 80% of the time the agent moves in the direction it selects; the remaining 20% of the time it moves at right angles to the intended direction.

r -1 +10
-1 -1 -1
-1 -1 -1

Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99.
Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.

a) r = 100
b) r = −3
c) r = 0
d) r = +3
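
A minimal sketch of how the value iteration could be set up is given here. The question leaves several details open, so the following are assumptions rather than part of the problem statement: the +10 cell (top right) is treated as a terminal state, the 20% slip is split evenly (10% to each side), a move into a wall leaves the agent where it is, and the reward is collected for the state the agent occupies. The names make_rewards, step, q_value, value_iteration and format_policy are illustrative only.

```python
GAMMA = 0.99  # discount factor given in the question

# Row/column offsets for each action; row 0 is the top row of the grid.
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
# Assumption: the agent slips to either perpendicular direction with 10% each.
SLIPS = {"U": ("L", "R"), "D": ("L", "R"), "L": ("U", "D"), "R": ("U", "D")}
# Assumption: the +10 cell in the top-right corner is terminal.
TERMINAL = (0, 2)


def make_rewards(r):
    """Reward grid from the question, with r in the top-left cell."""
    return [[r, -1, 10], [-1, -1, -1], [-1, -1, -1]]


def step(state, action):
    """Deterministic move; assumption: hitting a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < 3 and 0 <= new_col < 3:
        return (new_row, new_col)
    return state


def q_value(state, action, U):
    """Expected utility of the next state when `action` is selected in `state`."""
    outcomes = [(0.8, action)] + [(0.1, a) for a in SLIPS[action]]
    return sum(p * U[step(state, a)] for p, a in outcomes)


def value_iteration(r, eps=1e-6, max_iters=100000):
    """Repeat the Bellman update until the largest change falls below eps."""
    R = make_rewards(r)
    states = [(i, j) for i in range(3) for j in range(3)]
    U = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_U = {}
        for s in states:
            reward = R[s[0]][s[1]]
            if s == TERMINAL:
                new_U[s] = reward
            else:
                new_U[s] = reward + GAMMA * max(q_value(s, a, U) for a in ACTIONS)
        delta = max(abs(new_U[s] - U[s]) for s in states)
        U = new_U
        if delta < eps:
            break
    # Greedy policy with respect to the converged utilities.
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, U))
        for s in states
        if s != TERMINAL
    }
    return U, policy


def format_policy(policy):
    """Render the policy as three text rows; the terminal cell is shown as +10."""
    rows = []
    for i in range(3):
        rows.append(" ".join(policy.get((i, j), "+10") for j in range(3)))
    return "\n".join(rows)


if __name__ == "__main__":
    for r in (100, -3, 0, 3):
        utilities, policy = value_iteration(r)
        print(f"r = {r}")
        print(format_policy(policy))
```

Running the script prints one policy grid per value of r, which can then be compared against the intuitive explanation asked for in the question. Under these assumptions, ties between equally good actions (for example, two ways of staying in the r cell) are broken by floating-point rounding and dictionary order, so either tied action may be printed.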
