Computers and Technology, 27.11.2019 02:31 xojade

Consider an agent starting in room a, in which it can take two possible actions: leave the room (action "l") or stay (action "s"). If it leaves a, the agent moves to room b, which is a terminal state (no more actions can be taken). The outcomes of the actions are uncertain: when executing action l (or action s), there is only some probability that the agent will actually leave a (or stay in a). We assume that the reward for entering state b is r(b) = +1 and the reward for being in state a is r(a) = -0.1.

(a) Draw the (very simple) diagram corresponding to this MDP. Answer by inspection of the diagram: what is the optimal policy?

(b) Assume that the agent knows neither the world (transition probabilities) nor the utilities of the states. Assume that the agent, for some reason, happens to follow the optimal policy. The rewards received at states a and b are the same as described above. In the process of executing this policy, the agent executes four trials and, in each trial, it stops after reaching state b. The following state sequences are recorded during the trials: aaab, aab, ab, ab. What is the estimate of T (the transition probabilities)? What is the estimate of U(a), assuming a discount factor of γ = 0.5?
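One way to work part (b) numerically is sketched below. It is not an official solution, and it rests on several assumptions that the question does not state: that the optimal policy from part (a) is to always take action "l", that the utility of a trial is the discounted sum of r(s) over the states visited, and that U(b) = r(b). The function names (`estimate_transitions`, `direct_utility`, `adp_utility_a`) are hypothetical. The sketch counts observed transitions to estimate T, then estimates U(a) two ways: by averaging the observed reward-to-go over every visit to a (direct utility estimation), and by solving the Bellman equation with the estimated T (an ADP-style estimate).

```python
from collections import Counter
from fractions import Fraction

# Data from the question; exact fractions avoid rounding noise.
TRIALS = ["aaab", "aab", "ab", "ab"]
REWARD = {"a": Fraction(-1, 10), "b": Fraction(1)}
GAMMA = Fraction(1, 2)


def estimate_transitions(trials):
    """Estimate T(s, s') under the followed policy from transition counts."""
    counts = Counter()
    for trial in trials:
        for s, s_next in zip(trial, trial[1:]):
            counts[(s, s_next)] += 1
    totals = Counter()
    for (s, _), n in counts.items():
        totals[s] += n
    return {pair: Fraction(n, totals[pair[0]]) for pair, n in counts.items()}


def reward_to_go(trial, start):
    """Discounted sum of rewards from position `start` to the end of the trial."""
    return sum(GAMMA ** k * REWARD[s] for k, s in enumerate(trial[start:]))


def direct_utility(trials, state):
    """Average reward-to-go over every visit to `state` (every-visit assumption)."""
    samples = [
        reward_to_go(trial, i)
        for trial in trials
        for i, s in enumerate(trial)
        if s == state
    ]
    return sum(samples) / len(samples)


def adp_utility_a(T):
    """Solve U(a) = r(a) + gamma * (T[a,a] * U(a) + T[a,b] * U(b)), assuming U(b) = r(b)."""
    numerator = REWARD["a"] + GAMMA * T[("a", "b")] * REWARD["b"]
    return numerator / (1 - GAMMA * T[("a", "a")])


T = estimate_transitions(TRIALS)
print("T(a -> a) =", T[("a", "a")])
print("T(a -> b) =", T[("a", "b")])
print("U(a), direct estimate =", direct_utility(TRIALS, "a"))
print("U(a), ADP estimate    =", adp_utility_a(T))
```

Under these assumptions the seven recorded transitions out of a give T(a, l, a) = 3/7 and T(a, l, b) = 4/7. The every-visit direct estimate is U(a) = 1/4, and the ADP-style estimate is U(a) = 13/55 (about 0.236). A different reward convention or a first-visit average would give different numbers.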

Answers: 2

Another question on Computers and Technology

Computers and Technology, 22.06.2019 22:40
Write a program that defines symbolic names for several string literals (characters between quotes). * Use each symbolic name in a variable definition. * Use symbolic names to compose assembly instructions that perform vara = (vara - varb) + (varc - vard); ensure that each variable is of an unsigned integer data type. * Further enhance your symbolic logic block to perform the expression obtained by the addition substitution rule: vara = (vara + varb) - (varc + vard). Required: debug the disassembly code and note down the address and memory information.
Answers: 3
Computers and Technology, 23.06.2019 17:00
1. Which of the following is not an example of an objective question? a. Multiple choice. b. Essay. c. True/false. d. Matching. 2. Why is it important to recognize the key word in the essay question? a. It will provide the answer to the essay. b. It will show you a friend's answer. c. It will provide you time to look for the answer. d. It will guide you on which kind of answer is required.
Answers: 1
Computers and Technology, 23.06.2019 21:30
Write a fragment of code that reads in strings from standard input until end-of-file and prints the largest value to standard output. You may assume there is at least one value. (Cascading/streaming logic, basic string processing.)
Answers: 3
Computers and Technology, 25.06.2019 04:00
What was the name of the first computer (machine) language?
Answers: 2