Mathematics, 21.02.2020 17:58 deonceee4671
In a coin game, you repeatedly toss a biased coin (probability 0.4 for heads, 0.6 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each V_k step (starting from V_0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
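One possible way to check parts (c) and (d) by program is sketched below. It is not an official solution, and it rests on a formulation the question does not spell out: the states are the running totals 0 to 7 plus a single absorbing "bust" state for any total of 8 or more, Stop pays the current total once and ends the game, and Toss pays nothing. The names successors, value_iteration, greedy_policy and BUST are made up for this sketch.

```python
# Assumed formulation (not given in the question): states are totals 0..7
# plus one absorbing "bust" state for totals >= 8, whose value is 0.
P_HEAD, P_TAIL = 0.4, 0.6      # coin bias from the problem statement
HEAD_PTS, TAIL_PTS = 3, 1      # points per outcome
MAX_TOTAL = 7                  # largest total that still pays out
BUST = "bust"                  # hypothetical label for totals of 8 or more


def successors(total):
    """Return [(probability, next_state)] for the Toss action at `total`."""
    out = []
    for p, pts in ((P_HEAD, HEAD_PTS), (P_TAIL, TAIL_PTS)):
        nxt = total + pts
        out.append((p, nxt if nxt <= MAX_TOTAL else BUST))
    return out


def value_iteration(tol=1e-12, max_iters=100):
    """Synchronous value iteration with gamma = 1, starting from V_0 = 0.

    Returns the final value table and the list of all V_k tables.
    """
    states = list(range(MAX_TOTAL + 1))
    V = {s: 0.0 for s in states}
    V[BUST] = 0.0
    history = [dict(V)]
    for _ in range(max_iters):
        new_V = {BUST: 0.0}  # forced Stop with utility 0
        for s in states:
            stop_value = float(s)  # Stop: collect the total, game ends
            toss_value = sum(p * V[nxt] for p, nxt in successors(s))
            new_V[s] = max(stop_value, toss_value)
        history.append(new_V)
        converged = max(abs(new_V[s] - V[s]) for s in states) < tol
        V = new_V
        if converged:
            break
    return V, history


def greedy_policy(V):
    """Pick Toss wherever its expected value beats stopping right away."""
    policy = {}
    for s in range(MAX_TOTAL + 1):
        toss_value = sum(p * V[nxt] for p, nxt in successors(s))
        policy[s] = "Toss" if toss_value > s else "Stop"
    return policy


if __name__ == "__main__":
    V, history = value_iteration()
    for k, Vk in enumerate(history):
        print(k, [round(Vk[s], 5) for s in range(MAX_TOTAL + 1)])
    print(greedy_policy(V))
```

Under these assumptions the sketch gives Toss for totals 0 to 4 and Stop for totals 5 to 7, with V*(0) = 5.73408. A different but equally reasonable state encoding (for example, keeping 8, 9 and 10 as separate states) should lead to the same policy.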