subject

Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP. Instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that the agent never remains in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning. Assume the discount factor γ is 0.5 and the step size for Q-learning α is 0.5.

Our current Q function, Q(s, a), is:

                    A        B        C
Clockwise           1.501   -0.451    2.73
Counterclockwise    3.153   -6.055    2.133

The agent encounters the following samples, in order:

s    a                   s'    r
A    Counterclockwise    C     8.0
?    Counterclockwise    A     0.0
Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
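
The Q-learning update for a sample (s, a, s', r) is Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max over a' of Q(s', a')). A minimal Python sketch of that update applied to the first sample follows. It assumes the Q table is read with columns A, B, C and one row per action; the names `q_update`, `q` and `ACTIONS` are illustrative and not part of the original problem. The starting state of the second sample is not legible in the question as posted, so the sketch stops after the first sample.

```python
GAMMA = 0.5   # discount factor from the problem statement
ALPHA = 0.5   # Q-learning step size from the problem statement
ACTIONS = ("Clockwise", "Counterclockwise")


def q_update(q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    # Best value attainable from the next state under the current estimates.
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    # Blend the old estimate with the sampled target r + gamma * best_next.
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


# Current Q function (assumed layout: columns A, B, C; rows per action).
q = {
    ("A", "Clockwise"): 1.501,
    ("B", "Clockwise"): -0.451,
    ("C", "Clockwise"): 2.73,
    ("A", "Counterclockwise"): 3.153,
    ("B", "Counterclockwise"): -6.055,
    ("C", "Counterclockwise"): 2.133,
}

# First sample: s = A, a = Counterclockwise, s' = C, r = 8.0
first = q_update(q, "A", "Counterclockwise", "C", 8.0)
print(round(first, 3))  # 6.259
```

By hand: 0.5 × 3.153 + 0.5 × (8.0 + 0.5 × 2.73) = 1.5765 + 4.6825 = 6.259. The second sample is processed the same way against the updated table. Its next state is A, so its target is 0.0 + 0.5 × max(1.501, 6.259) = 3.1295 whichever state it starts from. All entries not touched by a sample keep their current values.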

answer
Answers: 3
