dLife Home Page

dlife.rl
Class TDLearner

java.lang.Object
  extended by dlife.sys.SerializationBase
      extended by dlife.rl.TDLearner
All Implemented Interfaces:
Serializable

public class TDLearner
extends SerializationBase

Base class for temporal-difference (TD) reinforcement learning agents (e.g. Q-learning or SARSA). This class delegates action selection and Q-value updates to policy objects, allowing a variety of TD-based reinforcement learning methods to be implemented.

Version:
Apr 26, 2011
Author:
Grant Braught, Dickinson College
See Also:
Serialized Form

Constructor Summary
TDLearner(ArrayList<Action> actions, QTable qTable, ActionSelectionPolicy selector, QUpdatePolicy updater)
          Construct a new TD learning agent.
 
Method Summary
 Action getNextAction(State s, double reward)
          Select the next action to be performed and update the Q-Table based on the reward for the previous action.
 QTable getQTable()
          Get the QTable that is being used by this TDLearner.
 
Methods inherited from class dlife.sys.SerializationBase
read, write
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TDLearner

public TDLearner(ArrayList<Action> actions,
                 QTable qTable,
                 ActionSelectionPolicy selector,
                 QUpdatePolicy updater)
Construct a new TD learning agent. The parameters dictate which actions are available to the agent, how it learns, and how it selects actions. Note: References to each of the parameter objects are stored in fields, so later changes to those objects (e.g. modifying the list of available actions) take effect the next time the getNextAction method is invoked.
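
The note about stored references can be illustrated with a minimal sketch. ReferenceHoldingLearner and availableActionCount are hypothetical stand-ins invented for this example, and plain strings stand in for Action objects; this is not the dLife source, only a demonstration of what keeping a reference (rather than a defensive copy) implies for the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT dlife.rl.TDLearner: it keeps the caller's
// list object itself instead of making a defensive copy.
class ReferenceHoldingLearner {
    private final List<String> actions;

    ReferenceHoldingLearner(List<String> actions) {
        this.actions = actions; // reference stored, no copy made
    }

    // Hypothetical helper: number of actions the learner currently sees.
    int availableActionCount() {
        return actions.size();
    }

    public static void main(String[] args) {
        List<String> actions = new ArrayList<>(List.of("forward", "turn"));
        ReferenceHoldingLearner learner = new ReferenceHoldingLearner(actions);

        // The caller changes the list after construction...
        actions.add("stop");

        // ...and the learner sees the change, because both share one list.
        System.out.println(learner.availableActionCount()); // prints 3
    }
}
```

If the set of actions must stay fixed, the caller would need to pass in a list that is not modified afterwards.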

Parameters:
actions - the actions available to the agent.
qTable - the Q-table to be used.
selector - the policy object that selects actions for the agent.
updater - the policy object that updates the agent's Q-Table.

Method Detail

getNextAction

public Action getNextAction(State s,
                            double reward)
Select the next action to be performed and update the Q-Table based on the reward for the previous action. This method does the following:
  1. Updates the Q-Table for the previous state and the last action using the provided reward value.
  2. Chooses the next action to be taken using the updated Q-Table.
  3. Updates the N value for the (State,Action) pair given by the current state and the chosen action.
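
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the dLife implementation: TDLearnerSketch is a hypothetical class, strings stand in for State and Action, the Q-learning rule stands in for whatever QUpdatePolicy is supplied, epsilon-greedy selection stands in for the ActionSelectionPolicy, and the N value is assumed to be a visit count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, NOT the dlife.rl.TDLearner source.
class TDLearnerSketch {
    private final List<String> actions;
    private final Map<String, Double> q = new HashMap<>();  // Q-values keyed by "state|action"
    private final Map<String, Integer> n = new HashMap<>(); // assumed meaning of N: visit counts
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor
    private final double epsilon; // exploration probability
    private final Random rng;
    private String prevState;     // null until the first call
    private String prevAction;

    TDLearnerSketch(List<String> actions, double alpha, double gamma,
                    double epsilon, long seed) {
        this.actions = actions;
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    double getQ(String s, String a) { return q.getOrDefault(s + "|" + a, 0.0); }

    int getN(String s, String a) { return n.getOrDefault(s + "|" + a, 0); }

    // Highest-valued action in state s; ties go to the earliest action.
    private String greedy(String s) {
        String best = actions.get(0);
        for (String a : actions) {
            if (getQ(s, a) > getQ(s, best)) best = a;
        }
        return best;
    }

    String getNextAction(String s, double reward) {
        // Step 1: update Q for the previous (state, action) pair.
        // Assumed update policy: Q-learning.
        if (prevState != null) {
            double old = getQ(prevState, prevAction);
            double target = reward + gamma * getQ(s, greedy(s));
            q.put(prevState + "|" + prevAction, old + alpha * (target - old));
        }
        // Step 2: choose the next action from the updated table.
        // Assumed selection policy: epsilon-greedy.
        String chosen = rng.nextDouble() < epsilon
                ? actions.get(rng.nextInt(actions.size()))
                : greedy(s);
        // Step 3: update the N value for (current state, chosen action).
        n.merge(s + "|" + chosen, 1, Integer::sum);
        prevState = s;
        prevAction = chosen;
        return chosen;
    }

    public static void main(String[] args) {
        TDLearnerSketch agent =
                new TDLearnerSketch(List.of("left", "right"), 0.5, 0.9, 0.0, 42L);
        String first = agent.getNextAction("s0", 0.0); // first call: nothing to update yet
        agent.getNextAction("s1", 1.0);                // reward 1.0 credits (s0, first)
        System.out.println(first + " " + agent.getQ("s0", first)); // prints "left 0.5"
    }
}
```

In the sketch the first call skips the update because there is no previous state or action; how TDLearner itself treats the reward passed on the first call is not stated here.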

Parameters:
s - the current state.
reward - the reward received for the action that led to this state.
Returns:
the next Action to be taken.

getQTable

public QTable getQTable()
Get the QTable that is being used by this TDLearner. Retrieving the QTable can be useful for transplanting a trained QTable from one TDLearner to another TDLearner that may use a different action selection policy.
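
A minimal sketch of the transplant idea, assuming simplified stand-in types: SketchQTable, SketchLearner, and TransplantDemo are hypothetical, the selection policy is reduced to a label, and none of this reflects the actual constructors or methods of the dLife QTable or policy classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Q-table: values keyed by "state|action".
class SketchQTable {
    private final Map<String, Double> values = new HashMap<>();

    void set(String state, String action, double value) {
        values.put(state + "|" + action, value);
    }

    double get(String state, String action) {
        return values.getOrDefault(state + "|" + action, 0.0);
    }
}

// Hypothetical stand-in for a learner; the selection policy is just a label here.
class SketchLearner {
    private final SketchQTable qTable;
    private final String selectorName;

    SketchLearner(SketchQTable qTable, String selectorName) {
        this.qTable = qTable;
        this.selectorName = selectorName;
    }

    SketchQTable getQTable() { return qTable; }

    String getSelectorName() { return selectorName; }
}

class TransplantDemo {
    // Build a new learner that reuses the trained table under a different selector.
    static SketchLearner transplant(SketchLearner trained, String newSelector) {
        return new SketchLearner(trained.getQTable(), newSelector);
    }

    public static void main(String[] args) {
        SketchLearner explorer = new SketchLearner(new SketchQTable(), "epsilon-greedy");
        explorer.getQTable().set("s0", "right", 0.8); // pretend this value was learned

        SketchLearner exploiter = transplant(explorer, "greedy");
        System.out.println(exploiter.getQTable().get("s0", "right"));       // prints 0.8
        System.out.println(exploiter.getQTable() == explorer.getQTable()); // prints true
    }
}
```

Because the sketch passes the same table object to both learners, any further learning by one would be visible to the other.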

Returns:
the QTable used by this TDLearner.
