Computers and Technology, 14.06.2021 15:10 kingteron6166

Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that the agent never remains in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning. Assume the discount factor γ is 0.5 and the step size for Q-learning α is 0.5.

Our current Q function, Q(s, a), from the left figure:

                    A        B        C
Clockwise           1.501   -0.451    2.73
Counterclockwise    3.153   -6.055    2.133

The agent encounters the samples from the right figure:

s              a                   s'    r
A              Counterclockwise    C     8.0
[illegible]    Counterclockwise    A     0.0
Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
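
For reference, the standard Q-learning update for a sample (s, a, s', r) is Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max over a' of Q(s', a')). The sketch below applies it to the numbers in the question. It assumes the flattened figure text reads as a Q table with rows Clockwise and Counterclockwise and columns A, B, C, and that the first sample is (A, Counterclockwise, C, 8.0). The start state of the second sample does not survive in the text, so the code tries both states that are possible given that the agent never stays in place (B and C) rather than picking one. The names `q_update`, `q0`, `q1` and `candidates` are illustrative only.

```python
ALPHA = 0.5   # step size given in the question
GAMMA = 0.5   # discount factor given in the question
ACTIONS = ("Clockwise", "Counterclockwise")


def q_update(q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Return a copy of q after one Q-learning update for the sample (s, a, s_next, r)."""
    new_q = dict(q)
    # Bootstrapped target: reward plus discounted best Q-value in the next state.
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    new_q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return new_q


# Current Q function, as read from the question (assumed layout of the left figure).
q0 = {
    ("A", "Clockwise"): 1.501, ("A", "Counterclockwise"): 3.153,
    ("B", "Clockwise"): -0.451, ("B", "Counterclockwise"): -6.055,
    ("C", "Clockwise"): 2.73, ("C", "Counterclockwise"): 2.133,
}

# First sample: (A, Counterclockwise, C, 8.0).
q1 = q_update(q0, "A", "Counterclockwise", "C", 8.0)

# Second sample: (?, Counterclockwise, A, 0.0). The start state is missing in the
# source, so compute the result for each possible start state (assumption: B or C).
candidates = {
    s: q_update(q1, s, "Counterclockwise", "A", 0.0) for s in ("B", "C")
}

if __name__ == "__main__":
    for s, table in candidates.items():
        print(f"If the second sample starts in {s}:")
        for key in sorted(table):
            print(f"  Q{key} = {table[key]:.5f}")
```

Under these assumptions the first sample moves Q(A, Counterclockwise) from 3.153 to 6.259. The second sample then changes only one entry: Q(C, Counterclockwise) becomes 2.63125 if it started in C, or Q(B, Counterclockwise) becomes -1.46275 if it started in B. All other entries keep their original values, since Q-learning updates only the (state, action) pair that was actually sampled.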
