How are computers motivated in Q Learning?

I get that it uses a number going up and down depending on the behavior, but why does the computer/AI try to make the number go up?

5 Answers

Anonymous 0 Comments

In Q-learning, the Q value is the total value of a state-action pair. For example, let’s say there’s a fork in the sidewalk. If I go left, I will immediately be rewarded with $5. If I go right, I will immediately lose $10; however, if I continue down the right path another 100 ft, I will be rewarded with an additional $100. This means the Q value for going left is $5, and the Q value for going right is $90. Even though I lose money at the start, the Q value is still higher for going right because the total reward is $100 minus $10. We calculate the Q value as the immediate reward plus all future rewards.
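To make the arithmetic concrete, here is a rough Python sketch of the sidewalk example. It is not a full Q-learning algorithm: nothing is learned, the rewards are just written down by hand and added up. The names `rewards`, `q_value`, and `discount` are made up for illustration, and the `discount` parameter is an assumption that is not part of the example above (leaving it at 1.0 means future rewards count in full).

```python
# Hypothetical rewards from the sidewalk example: each action maps to the
# sequence of rewards collected along that path, in order.
rewards = {
    "left": [5],          # +$5 immediately
    "right": [-10, 100],  # -$10 immediately, then +$100 after another 100 ft
}


def q_value(reward_sequence, discount=1.0):
    """Immediate reward plus all future rewards.

    discount=1.0 (an assumption for this sketch) means no discounting,
    so the result is a plain sum of the rewards.
    """
    return sum(r * discount**t for t, r in enumerate(reward_sequence))


# One Q value per action at the fork.
q = {action: q_value(seq) for action, seq in rewards.items()}

# The program simply picks the action with the largest Q value.
best_action = max(q, key=q.get)

print(q)
print(best_action)
```

In this sketch, "trying to make the number go up" is nothing more than the `max` call: the program is written to pick whichever action has the largest Q value, which here is going right ($90 versus $5).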
