- All Implemented Interfaces:
public class RLearningPolicy
- extends QUpdatePolicy
Implementation of the R-Learning update policy. This class returns a new
Q-Table value using the R-Learning update described by Sutton and Barto in
their text, Reinforcement Learning: An Introduction.
R-Learning is appropriate for continuing, undiscounted tasks where the
objective is to maximize the average reward per time step. This contrasts
with Q-learning, where the objective is to learn a sequence of actions that
leads to a reward.
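The update rule itself is not reproduced on this page. The following is a minimal sketch of the tabular R-Learning update as it is commonly stated, not this class's actual source. The class name RLearningSketch, the use of int indices for states and actions, and the plain double[][] table are all assumptions made for illustration; the real class works with QTable and State objects whose APIs are not shown here.

```java
// Hypothetical stand-alone sketch; not part of the documented library.
class RLearningSketch {
    private final double[][] q;   // q[state][action], initialised to 0
    private final double alpha;   // learning rate for Q values
    private final double beta;    // step size for the average-reward estimate
    private double rho = 0.0;     // running estimate of the average reward

    RLearningSketch(int numStates, int numActions, double alpha, double beta) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.beta = beta;
    }

    // Largest Q value over all actions in the given state.
    private double maxQ(int state) {
        double best = q[state][0];
        for (double v : q[state]) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Apply one R-Learning step for the transition (s, a) -> sp with reward r.
    void update(int s, int a, double r, int sp) {
        double maxNext = maxQ(sp);  // evaluated once, before q is modified
        // Q(s,a) <- Q(s,a) + alpha * [r - rho + max Q(sp,.) - Q(s,a)]
        q[s][a] += alpha * (r - rho + maxNext - q[s][a]);
        // rho is adjusted only when the action taken is greedy in s.
        if (q[s][a] == maxQ(s)) {
            rho += beta * (r - rho + maxNext - maxQ(s));
        }
    }

    double getQ(int s, int a) { return q[s][a]; }

    double getRho() { return rho; }

    public static void main(String[] args) {
        RLearningSketch sketch = new RLearningSketch(2, 2, 0.5, 0.1);
        sketch.update(0, 0, 1.0, 1);
        System.out.println("Q(0,0) = " + sketch.getQ(0, 0));
        System.out.println("rho    = " + sketch.getRho());
    }
}
```

The difference from Q-learning is the subtraction of rho in place of a discount factor: rewards are measured relative to the estimated average reward, which is what makes the rule suitable for continuing, undiscounted tasks.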
- Apr 27, 2011
- Grant Braught, Dickinson College
- See Also:
- Serialized Form
Construct a new RLearningPolicy.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public RLearningPolicy(QTable q,
- Construct a new RLearningPolicy.
q - the QTable that will be updated.
alpha - the learning rate to be used for adjusting Q values.
beta - the step-size parameter for learning the average reward per time step.
public void updateQValue(State s,
- Update the Q-Table using R-Learning. Each time this method is
invoked, it saves the provided state to be used as sp on the next call.
Note that the Q-Table is not updated on the first call. On each
subsequent call, the Q-Table is updated using the saved previous state
and the parameters.
- Specified by:
updateQValue in class QUpdatePolicy
s - the current state.
a - the action that led to the current state.
r - the reward received for performing the action.
actions - the list of possible actions from the current state.
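The calling pattern described for updateQValue (no update on the first call, then one update per call using the saved previous state) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: DeferredUpdateSketch is a hypothetical name, states and actions are plain int indices, the table is a double[][], and every action in a row is treated as available, so the actions parameter is omitted.

```java
// Hypothetical sketch of the "save the state, update on the next call" pattern.
class DeferredUpdateSketch {
    private final double[][] q;      // q[state][action], initialised to 0
    private final double alpha;      // learning rate for Q values
    private final double beta;       // step size for the average-reward estimate
    private double rho = 0.0;        // running estimate of the average reward
    private int previousState = -1;  // -1 means no state has been saved yet

    DeferredUpdateSketch(int numStates, int numActions, double alpha, double beta) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.beta = beta;
    }

    // Largest Q value over all actions in the given state.
    private double maxQ(int state) {
        double best = q[state][0];
        for (double v : q[state]) {
            best = Math.max(best, v);
        }
        return best;
    }

    // s: current state, a: action that led to s, r: reward for that action.
    void updateQValue(int s, int a, double r) {
        if (previousState >= 0) {
            int prev = previousState;
            // Target built from the reward and the best value in the current state.
            double target = r - rho + maxQ(s);
            q[prev][a] += alpha * (target - q[prev][a]);
            // Adjust the average-reward estimate only for greedy actions.
            if (q[prev][a] == maxQ(prev)) {
                rho += beta * (target - maxQ(prev));
            }
        }
        // Remember the current state for the next call.
        previousState = s;
    }

    double getQ(int s, int a) { return q[s][a]; }

    double getRho() { return rho; }

    public static void main(String[] args) {
        DeferredUpdateSketch policy = new DeferredUpdateSketch(2, 2, 0.5, 0.1);
        policy.updateQValue(0, 0, 0.0);  // first call: only saves state 0
        policy.updateQValue(1, 1, 2.0);  // second call: updates Q(0, 1)
        System.out.println("Q(0,1) = " + policy.getQ(0, 1));
        System.out.println("rho    = " + policy.getRho());
    }
}
```

In this sketch the action and reward passed on the first call are ignored, because there is no earlier state to attribute them to.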