In Q-learning, the Q value of a state-action pair is the total reward you expect to collect by taking that action in that state. It counts not just the immediate reward but everything that follows. For example, let’s say there’s a fork in the sidewalk. If I go left, I am immediately rewarded with $5. If I go right, I immediately lose $10; however, if I continue down the right path another 100 ft, I am rewarded with an additional $100. This means the Q value for going left is $5, and the Q value for going right is $90. Even though I lose money at the start, the Q value for going right is higher because its total reward is $100 minus $10. In short, the Q value is the immediate reward plus all future rewards. (In practice, future rewards are usually discounted so that rewards further away count for less.)
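To make the arithmetic concrete, here is a minimal sketch that encodes the sidewalk fork as a tiny deterministic environment and computes each Q value as "immediate reward plus the best future reward". The names `transitions`, `q_value`, and the states `"fork"`, `"right_path"`, and `"end"` are hypothetical, chosen just for this illustration. The `gamma` parameter is an assumption added here: it is the discount factor, and the example above implicitly uses `gamma = 1` (no discounting). Note that this computes Q values directly from a known transition table. Actual Q-learning estimates the same values from experience through repeated updates, without being handed the table.

```python
# Hypothetical encoding of the sidewalk-fork example as a tiny deterministic environment.
# transitions[state][action] = (immediate_reward, next_state); "end" has no actions (terminal).
transitions = {
    "fork": {"left": (5, "end"), "right": (-10, "right_path")},
    "right_path": {"continue": (100, "end")},
    "end": {},
}


def q_value(state, action, gamma=1.0):
    """Immediate reward plus (discounted) best achievable future reward.

    gamma is an assumed discount factor; gamma=1.0 reproduces the
    undiscounted arithmetic in the example ($100 - $10 = $90).
    """
    reward, next_state = transitions[state][action]
    next_actions = transitions[next_state]
    if not next_actions:  # terminal state: nothing more to collect
        return float(reward)
    # Assume we act optimally afterward, so take the best next action.
    best_future = max(q_value(next_state, a, gamma) for a in next_actions)
    return reward + gamma * best_future


for action in transitions["fork"]:
    print(action, q_value("fork", action))
# left 5.0
# right 90.0
```

With `gamma=0.9`, going right is worth -10 + 0.9 × 100 = 80, which is still better than going left, but by a smaller margin because the $100 arrives later.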