dLife Home Page

Package dlife.rl

Classes and interfaces that support temporal difference (TD) based reinforcement learning.


Interface Summary
LearningRateFunction: Interface for the function that determines the learning rate.
QInitializationPolicy: Interface for classes that are used to initialize the Q-value for the (State,Action) pairs in the Q-Table.
 

Class Summary
Action: Base class for defining the Actions that can be taken by a TD-based reinforcement learning agent.
ActionSelectionPolicy: Abstract base class for objects that set the policy for action selection in a TD-based reinforcement learning agent.
ConstantLearningRate: Implementation of a constant learning rate.
ConstantQInitializationPolicy: A QInitializationPolicy that sets the initial Q-Value for each (State,Action) pair to a fixed value specified by a constructor parameter.
DecayingLearningRate: A learning rate function that decreases as a function of the number of times a (State,Action) pair is tried.
EpsilonGreedyActionSelectionPolicy: An epsilon-greedy action selection policy.
QLearningPolicy: Implementation of the Q-Learning update policy.
QTable: Provides a Q-Table for TD-based reinforcement learning agents.
QUpdatePolicy: Abstract base class for objects that update the Q values in the Q-Table used in a TD-based reinforcement learning agent.
RLearningPolicy: Implementation of the R-Learning update policy.
SARSALearningPolicy: Implementation of the SARSA-Learning update policy.
State: Base class for representing the state of a TD-based reinforcement learning agent.
StringAction: An Action that can be described by a String.
StringState: A State that can be described by a String.
TDLearner: Base class for temporal difference (TD) based reinforcement learning agents.
ThresholdActionSelectionPolicy: An action selection policy that tries every action at least a specified number of times before reverting to a greedy algorithm.
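The constant and decaying learning rates listed above can be illustrated with a small sketch. It is not the dlife.rl API: the `Rate` interface, the `estimate` helper, and the specific 1/n decay schedule are assumptions made for this example (the summary only says the decaying rate falls as a (State,Action) pair is tried more often).

```java
/** Hypothetical sketch of constant and decaying learning rates; not the dlife.rl classes. */
public class LearningRateSketch {
    /** Assumed shape: the rate depends only on how often the (state, action) pair was tried. */
    interface Rate {
        /** @param timesTried number of updates of this pair so far, counting the current one (>= 1) */
        double alpha(int timesTried);
    }

    // Constant rate: every new reward gets the same fixed weight, so estimates keep adapting.
    static Rate constant(double alpha) {
        return n -> alpha;
    }

    // Assumed 1/n decay: the n-th reward gets weight 1/n, which makes the
    // estimate equal to the plain average of all rewards seen so far.
    static Rate decaying() {
        return n -> 1.0 / Math.max(1, n);
    }

    /** Applies value += alpha(n) * (reward - value) for each reward in turn, starting from 0. */
    static double estimate(Rate rate, double... rewards) {
        double value = 0.0;
        int n = 0;
        for (double r : rewards) {
            n++;
            value += rate.alpha(n) * (r - value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(estimate(decaying(), 2, 4, 6));    // mean of the rewards: 4.0
        System.out.println(estimate(constant(0.5), 2, 4, 6)); // weighted toward recent rewards: 4.25
    }
}
```

The decaying schedule settles on a fixed average, which suits a stationary task; the constant rate keeps weighting recent rewards more heavily, which suits a task whose rewards drift over time.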
 

Package dlife.rl Description

Classes and interfaces that support temporal difference (TD) based reinforcement learning. TD-based reinforcement learning works by observing an agent's current situation (its State), selecting an action from a set of possible actions (the Action), carrying out that action, and observing a reward value. The reward value indicates how good or bad the action was. If the action was good, the probability of executing that action again when the same situation arises is increased; if it was bad, that probability is decreased.
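A minimal sketch of the observe, select, act, and reward cycle for a single situation with two actions follows. Everything in it is hypothetical: the class, the reward function, and the step size of 0.5. The starting estimate of 2.0 is an assumed optimistic constant, similar in spirit to a constant Q initialization, chosen so that a purely greedy rule still tries both actions at least once.

```java
import java.util.Arrays;

/** Hypothetical single-situation sketch of the TD reward loop; not part of dlife.rl. */
public class RewardLoopSketch {
    static final double ALPHA = 0.5;         // assumed step size for the estimate update
    static final double INITIAL_VALUE = 2.0; // assumed optimistic start so both actions get tried

    // Hypothetical environment: action 1 is "good" (reward 1), action 0 is "bad" (reward 0).
    static double reward(int action) {
        return action == 1 ? 1.0 : 0.0;
    }

    // Greedy choice: the action with the highest estimate; ties go to the lowest index.
    static int select(double[] values) {
        int best = 0;
        for (int a = 1; a < values.length; a++) {
            if (values[a] > values[best]) best = a;
        }
        return best;
    }

    /** Runs the loop for the given number of steps and returns how often each action was chosen. */
    static int[] run(int steps, double[] values) {
        int[] counts = new int[values.length];
        for (int t = 0; t < steps; t++) {
            int action = select(values);                    // select an action
            double r = reward(action);                      // carry it out and observe the reward
            values[action] += ALPHA * (r - values[action]); // move its estimate toward the reward
            counts[action]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] values = {INITIAL_VALUE, INITIAL_VALUE};
        int[] counts = run(20, values);
        System.out.println("counts = " + Arrays.toString(counts)); // prints counts = [1, 19]
    }
}
```

Over 20 steps the bad action is chosen once and the good action 19 times: the bad action's estimate drops to 1.0 after its single zero reward, while each reward of 1 only pulls the good action's estimate down toward 1, so it stays ahead.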

The examples.dlife.rl package contains a working example that illustrates the use of the dlife.rl package.

TD Learning:

A Q-Table is used to track the reward that is expected for carrying out each possible action in each possible state. The Q-Table is examined to select the action to be taken at each step. Different techniques exist for selecting that action, and each one strikes a different balance between exploring new actions and exploiting actions that have already been observed to be good. Examples provided by this package include epsilon-greedy selection (EpsilonGreedyActionSelectionPolicy) and threshold selection, which tries every action a specified number of times before becoming greedy (ThresholdActionSelectionPolicy).
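Epsilon-greedy and threshold selection (both listed in the class summary) can be sketched over one row of a Q-table, that is, the Q-values of every action in the current state. This is a simplified rendering under stated assumptions, not the package's EpsilonGreedyActionSelectionPolicy or ThresholdActionSelectionPolicy: the method names, lowest-index tie-breaking, and trying under-tried actions in index order are all choices made for the example.

```java
import java.util.Random;

/** Hypothetical action selection sketches over one Q-table row; not the dlife.rl classes. */
public class SelectionSketch {
    // Exploitation: index of the largest Q-value, ties broken toward the lowest index.
    static int greedy(double[] qRow) {
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {
            if (qRow[a] > qRow[best]) best = a;
        }
        return best;
    }

    // Epsilon-greedy: with probability epsilon pick a uniformly random action (explore),
    // otherwise pick the greedy action (exploit).
    static int epsilonGreedy(double[] qRow, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qRow.length);
        }
        return greedy(qRow);
    }

    // Threshold rule: any action tried fewer than minTries times is chosen first
    // (assumed: lowest index first); once all reach the threshold, selection is greedy.
    static int threshold(double[] qRow, int[] tries, int minTries) {
        for (int a = 0; a < qRow.length; a++) {
            if (tries[a] < minTries) return a;
        }
        return greedy(qRow);
    }

    public static void main(String[] args) {
        double[] qRow = {0.2, 0.9, 0.5};
        System.out.println(epsilonGreedy(qRow, 0.0, new Random(42))); // epsilon 0 is pure greedy: 1
        System.out.println(threshold(qRow, new int[]{2, 2, 0}, 2));   // action 2 is under-tried: 2
        System.out.println(threshold(qRow, new int[]{2, 2, 2}, 2));   // all tried enough, greedy: 1
    }
}
```

A larger epsilon or threshold spends more steps on exploration; epsilon-greedy keeps exploring forever, whereas the threshold rule explores only until every action has been tried enough times.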

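A number of different techniques also exist for updating the expected reward values in the Q-Table. Examples provided by this package include Q-Learning (QLearningPolicy), SARSA (SARSALearningPolicy) and R-Learning (RLearningPolicy).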
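The textbook forms of the Q-Learning, SARSA, and R-Learning updates (the three update policies in the class summary) can be written as below. Treat this as a sketch: the array-based Q-table, the method signatures, and the parameter names `alpha` (learning rate), `gamma` (discount factor), and `beta` (step size for the average-reward estimate `rho`) are assumptions, and the package's own implementations may differ in detail.

```java
/** Hypothetical textbook update rules on a Q-table q[state][action]; not the dlife.rl classes. */
public class UpdateRulesSketch {
    static double max(double[] row) {
        double m = row[0];
        for (double v : row) m = Math.max(m, v);
        return m;
    }

    // Q-Learning (off-policy): bootstrap from the best action available in the next state.
    static void qLearning(double[][] q, int s, int a, double r, int sNext,
                          double alpha, double gamma) {
        double target = r + gamma * max(q[sNext]);
        q[s][a] += alpha * (target - q[s][a]);
    }

    // SARSA (on-policy): bootstrap from the action aNext actually chosen in the next state.
    static void sarsa(double[][] q, int s, int a, double r, int sNext, int aNext,
                      double alpha, double gamma) {
        double target = r + gamma * q[sNext][aNext];
        q[s][a] += alpha * (target - q[s][a]);
    }

    // R-Learning (average reward, undiscounted): rewards are measured relative to rho,
    // the running estimate of average reward per step. Returns the updated rho.
    static double rLearning(double[][] q, int s, int a, double r, int sNext, double rho,
                            double alpha, double beta) {
        q[s][a] += alpha * (r - rho + max(q[sNext]) - q[s][a]);
        if (q[s][a] == max(q[s])) { // adjust rho only when a is a greedy action in s
            rho += beta * (r - rho + max(q[sNext]) - max(q[s]));
        }
        return rho;
    }

    public static void main(String[] args) {
        double[][] q = {{0.0, 0.0}, {1.0, 3.0}};
        qLearning(q, 0, 0, 1.0, 1, 0.5, 0.9);
        System.out.println(q[0][0]); // about 1.85, i.e. 0.5 * (1 + 0.9 * 3)
    }
}
```

From the same starting table, SARSA with `aNext = 0` would instead bootstrap from Q(1,0) = 1 and move Q(0,0) to about 0.95. That is the practical difference between the on-policy and off-policy rules.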

More information about TD-based reinforcement learning can be found in most introductory AI texts and also online.

Package Structure:

The main class in this package, TDLearner, controls the learning process. It relies on a number of supporting objects, listed in the summaries above (for example a QTable, an ActionSelectionPolicy and a QUpdatePolicy), which also determine the specific type of TD-based learning that will occur.
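A hypothetical sketch of that structure follows: a learner that owns a Q-table and delegates action selection and Q-value updates to pluggable objects, so that swapping one object changes the kind of learning. The `Selector` and `Updater` interfaces, the four-cell corridor task, and all parameter values (epsilon 0.2, alpha 0.5, gamma 0.9, 500 episodes) are invented for illustration and do not mirror TDLearner's real constructor or methods.

```java
import java.util.Random;

/** Hypothetical learner assembled from pluggable parts; not the dlife.rl classes. */
public class ComposedLearnerSketch {
    /** Chooses an action from one row of the Q-table. */
    interface Selector { int select(double[] qRow, Random rng); }

    /** Adjusts the Q-table after one observed transition (s, a, r, sNext). */
    interface Updater { void update(double[][] q, int s, int a, double r, int sNext); }

    static final int STATES = 4, GOAL = 3, LEFT = 0, RIGHT = 1;

    // Toy corridor of cells 0..3: LEFT at cell 0 stays put; reaching cell 3 ends the episode.
    static int step(int s, int a) {
        return a == RIGHT ? Math.min(s + 1, GOAL) : Math.max(s - 1, 0);
    }

    static int greedy(double[] row) {
        int best = 0; // ties go to the lowest index
        for (int a = 1; a < row.length; a++) {
            if (row[a] > row[best]) best = a;
        }
        return best;
    }

    final double[][] q = new double[STATES][2]; // Q-table, every entry starts at 0
    private final Selector selector;
    private final Updater updater;
    private final Random rng;

    ComposedLearnerSketch(Selector selector, Updater updater, long seed) {
        this.selector = selector;
        this.updater = updater;
        this.rng = new Random(seed);
    }

    /** One episode of select, act, observe reward, update, until the goal or maxSteps. */
    void runEpisode(int maxSteps) {
        int s = 0;
        for (int t = 0; t < maxSteps && s != GOAL; t++) {
            int a = selector.select(q[s], rng);
            int sNext = step(s, a);
            double r = (sNext == GOAL) ? 1.0 : 0.0;
            updater.update(q, s, a, r, sNext);
            s = sNext;
        }
    }

    /** Builds a learner with epsilon-greedy selection and a Q-Learning update, then trains it. */
    static ComposedLearnerSketch trainDemo(long seed) {
        Selector epsilonGreedy = (row, random) ->
                random.nextDouble() < 0.2 ? random.nextInt(row.length) : greedy(row);
        Updater qLearning = (table, s, a, r, sNext) -> {
            // The goal row is never updated, so it stays 0 and the target there is just r.
            double best = Math.max(table[sNext][LEFT], table[sNext][RIGHT]);
            table[s][a] += 0.5 * (r + 0.9 * best - table[s][a]);
        };
        ComposedLearnerSketch learner = new ComposedLearnerSketch(epsilonGreedy, qLearning, seed);
        for (int episode = 0; episode < 500; episode++) {
            learner.runEpisode(200);
        }
        return learner;
    }

    public static void main(String[] args) {
        ComposedLearnerSketch learner = trainDemo(1L);
        for (int s = 0; s < GOAL; s++) {
            String action = greedy(learner.q[s]) == RIGHT ? "RIGHT" : "LEFT";
            System.out.println("cell " + s + ": greedy action " + action);
        }
    }
}
```

With the Q-Learning updater plugged in, the greedy action in every non-goal cell should end up as RIGHT. Trying a different update rule would only require passing a different `Updater`, which is the kind of flexibility the package structure aims for.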


