dLife Home Page
| Interface Summary | |
|---|---|
| LearningRateFunction | Interface for the function that determines the learning rate. |
| QInitializationPolicy | Interface for classes that are used to initialize the Q-value for each (State,Action) pair in the Q-Table. |
| Class Summary | |
|---|---|
| Action | Base class for defining the Actions that can be taken by a TD-based reinforcement learning agent. |
| ActionSelectionPolicy | Abstract base class for objects that set the policy for action selection in a TD-based reinforcement learning agent. |
| ConstantLearningRate | Implementation of a constant learning rate. |
| ConstantQInitializationPolicy | A QInitializationPolicy that sets the initial Q-Value for each (State,Action) pair to a fixed value specified by a parameter to the constructor. |
| DecayingLearningRate | A learning rate function that decreases as a function of the number of times a (State,Action) pair is tried. |
| EpsilonGreedyActionSelectionPolicy | An epsilon-greedy action selection policy. |
| QLearningPolicy | Implementation of the Q-Learning update policy. |
| QTable | This class provides a Q-Table for TD-based reinforcement learning agents. |
| QUpdatePolicy | Abstract base class for objects that update the Q values in the Q-Table used in a TD-based reinforcement learning agent. |
| RLearningPolicy | Implementation of the R-Learning update policy. |
| SARSALearningPolicy | Implementation of the SARSA-Learning update policy. |
| State | Base class for representing the state of a TD-based reinforcement learning agent. |
| StringAction | An Action that can be described by a String. |
| StringState | A State that can be described by a String. |
| TDLearner | Base class for Temporal Difference based reinforcement learning agents (e.g. |
| ThresholdActionSelectionPolicy | An action selection policy that tries every action at least a specified number of times before reverting to a greedy algorithm. |
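The difference between a constant and a decaying learning rate can be illustrated with a minimal sketch. The names LearningRate, FixedRate, InverseCountRate and runningEstimate are hypothetical and are not the dlife.rl API. The 1/n schedule is only one common form of decay, assumed here because the summary does not say which function DecayingLearningRate uses. The update also leaves out the bootstrapped future-value term so that the effect of alpha on its own is easy to see.

```java
/** Hypothetical stand-in for a learning rate function: alpha for the n-th try of a pair. */
interface LearningRate {
    double alpha(int visits);
}

/** Constant schedule: the same alpha on every update. */
final class FixedRate implements LearningRate {
    private final double alpha;
    FixedRate(double alpha) { this.alpha = alpha; }
    @Override public double alpha(int visits) { return alpha; }
}

/** Decaying schedule (assumed form): alpha = 1 / n, where n counts tries of the pair. */
final class InverseCountRate implements LearningRate {
    @Override public double alpha(int visits) { return 1.0 / Math.max(1, visits); }
}

public class LearningRateDemo {
    /** Applies q += alpha * (r - q) once per observed reward, starting from q = 0. */
    static double runningEstimate(LearningRate rate, double[] rewards) {
        double q = 0.0;
        for (int n = 1; n <= rewards.length; n++) {
            q += rate.alpha(n) * (rewards[n - 1] - q);
        }
        return q;
    }

    public static void main(String[] args) {
        double[] rewards = {1, 0, 2, 1};
        System.out.println("decaying: " + runningEstimate(new InverseCountRate(), rewards));
        System.out.println("constant: " + runningEstimate(new FixedRate(0.1), rewards));
    }
}
```

With alpha = 1/n the estimate is simply the average of the rewards seen so far, so later observations move it less and less; a constant alpha keeps weighting recent rewards more heavily, which suits situations where the rewards change over time.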
Classes and interfaces that support temporal difference (TD) based reinforcement learning. TD-based reinforcement learning is based on the idea of observing an agent's current situation (its State), selecting an action from a set of possible actions (the Action), carrying out that action and observing a reward value. The reward value indicates how good or bad the action was. If the action was good, the probability of selecting that action again when the same situation arises is increased; if it was bad, that probability is decreased.
The examples.dlife.rl
package contains a working example that illustrates the use of the dlife.rl
package.
A Q-Table is used to track the reward that is expected for carrying out each possible action in each possible state. The Q-Table is examined to select the action to be taken at each step. Different techniques exist for selecting that action, and each strikes a different balance between exploring new actions and exploiting actions that have already been observed to be good. Examples include EpsilonGreedyActionSelectionPolicy and ThresholdActionSelectionPolicy.
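The two selection strategies named above can be sketched over a single row of the Q-Table, that is, the Q-values of every action in the current state. This is a hypothetical stand-alone illustration, not the package's classes: the method names, the tie-breaking rule (lowest index wins) and the order in which under-tried actions are visited are all assumptions.

```java
import java.util.Random;

public class ActionSelectionDemo {

    /** Greedy choice: index of the largest Q-value; ties go to the lowest index. */
    static int greedy(double[] q) {
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a] > q[best]) best = a;
        }
        return best;
    }

    /** Epsilon-greedy: with probability epsilon pick uniformly at random, otherwise greedy. */
    static int epsilonGreedy(double[] q, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q.length);
        return greedy(q);
    }

    /** Threshold: return the first action tried fewer than minTries times, else act greedily. */
    static int threshold(double[] q, int[] tries, int minTries) {
        for (int a = 0; a < q.length; a++) {
            if (tries[a] < minTries) return a;
        }
        return greedy(q);
    }

    public static void main(String[] args) {
        double[] q = {0.2, 0.9, 0.5};
        int[] tries = {3, 3, 1};
        System.out.println(threshold(q, tries, 2));   // action 2 has only been tried once
        tries[2] = 2;
        System.out.println(threshold(q, tries, 2));   // every action tried enough: greedy
        System.out.println(epsilonGreedy(q, 0.1, new Random(42)));
    }
}
```

Epsilon-greedy keeps exploring forever at a fixed rate, whereas the threshold strategy front-loads all of its exploration and then becomes purely greedy.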
A number of different techniques also exist for updating the expected reward values in the Q-Table. Examples include QLearningPolicy, SARSALearningPolicy and RLearningPolicy.
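These update policies differ mainly in which future value they bootstrap from. The sketch below writes one common textbook form of each rule as a plain static method. The method names and parameters (alpha for the learning rate, gamma for the discount factor, beta and rho for R-learning's average-reward estimate) are assumptions and are not taken from the dlife.rl classes, whose exact formulations may differ.

```java
public class UpdateRules {

    /** Largest value in a Q-row. */
    static double max(double[] row) {
        double m = row[0];
        for (double v : row) m = Math.max(m, v);
        return m;
    }

    /** Q-learning (off-policy): bootstraps from the best action in the next state. */
    static double qLearning(double q, double reward, double[] nextRow, double alpha, double gamma) {
        return q + alpha * (reward + gamma * max(nextRow) - q);
    }

    /** SARSA (on-policy): bootstraps from the Q-value of the action actually chosen next. */
    static double sarsa(double q, double reward, double qNextChosen, double alpha, double gamma) {
        return q + alpha * (reward + gamma * qNextChosen - q);
    }

    /**
     * R-learning (undiscounted, average reward): updates row[a] in place and returns the
     * new estimate rho of the average reward per step. rho is only adjusted when the
     * updated Q(s,a) is the largest value in its row, i.e. when a is a greedy action.
     */
    static double rLearning(double[] row, int a, double reward, double[] nextRow,
                            double rho, double alpha, double beta) {
        row[a] += alpha * (reward - rho + max(nextRow) - row[a]);
        if (row[a] == max(row)) {
            rho += beta * (reward - rho + max(nextRow) - max(row));
        }
        return rho;
    }

    public static void main(String[] args) {
        double[] next = {0.2, 0.8};
        System.out.println("Q-learning: " + qLearning(0.5, 1.0, next, 0.1, 0.9));
        System.out.println("SARSA (next action 0): " + sarsa(0.5, 1.0, next[0], 0.1, 0.9));
        double[] row = {0.0, 0.0};
        double rho = rLearning(row, 0, 1.0, new double[] {0.5, 0.0}, 0.0, 0.5, 0.1);
        System.out.println("R-learning: Q(s,0) = " + row[0] + ", rho = " + rho);
    }
}
```

Given the same transition, Q-learning uses the best next value (0.8) while SARSA uses whichever action the selection policy actually picked, so SARSA's estimates reflect the cost of its own exploration.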
More information about TD-based reinforcement learning can be found in most introductory AI texts and also online.
The main class in this package, TDLearner,
controls the learning process. It relies on a number of supporting
objects, which also determine the specific type of TD-based learning
that will occur:
QTable: A Q-Table object is used to hold the
expected reward values for each (State,Action) pair that the agent
encounters.
Action: Subclasses of this base class are
used to define the possible actions that an agent can take. There will
be one Action object for each possible action. See StringAction.
State: Subclasses of this base class are
used to represent the possible states of the agent. Each time an action
is to be selected, the current state of the agent must be presented
as a State object. See StringState.
ActionSelectionPolicy: Concrete
implementations of this class are used to specify different policies
for how the information in the Q-Table is used to select actions for
the agent. See ThresholdActionSelectionPolicy.
QUpdatePolicy: Concrete implementations of
this class are used to specify how the values in the Q-Table should be
updated based on the agent's experience. See QLearningPolicy.
QInitializationPolicy: Implementations of
this interface are used to specify how the values in the Q-Table should
be initialized. See ConstantQInitializationPolicy.
LearningRateFunction: Implementations of this
interface specify the learning rate, usually called alpha, that is used
when updating the Q-Table values. See DecayingLearningRate.
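How these roles fit together in one learning loop can be sketched without the real classes. Everything below is hypothetical and is not the dlife.rl API: a HashMap stands in for the QTable, String keys for StringState and StringAction, epsilon-greedy selection for the ActionSelectionPolicy, the Q-learning rule for the QUpdatePolicy, a constant initial Q-value for the QInitializationPolicy and a constant alpha for the LearningRateFunction. The two-state environment is invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch of one TD learning loop (not the dlife.rl API). Roles:
 * QTable -> HashMap, State/Action -> Strings, selection -> epsilon-greedy,
 * update -> Q-learning, initialization -> constant 0, learning rate -> constant alpha.
 */
public class TinyTDAgent {
    static final String[] ACTIONS = {"stay", "move"};
    static final double INITIAL_Q = 0.0, ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.1;

    final Map<String, double[]> qTable = new HashMap<>();
    final Random rng = new Random(42);

    /** Q-row for a state, created with the constant initial value on first encounter. */
    double[] row(String state) {
        return qTable.computeIfAbsent(state, k -> {
            double[] r = new double[ACTIONS.length];
            Arrays.fill(r, INITIAL_Q);
            return r;
        });
    }

    /** Greedy choice: index of the largest Q-value, ties going to the lowest index. */
    int greedy(String state) {
        double[] q = row(state);
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a] > q[best]) best = a;
        }
        return best;
    }

    /** Epsilon-greedy selection: explore at random with probability EPSILON. */
    int selectAction(String state) {
        return rng.nextDouble() < EPSILON ? rng.nextInt(ACTIONS.length) : greedy(state);
    }

    /** Q-learning update of Q(s,a) toward r + GAMMA * max_a' Q(s',a'). */
    void update(String s, int a, double reward, String next) {
        double bestNext = row(next)[greedy(next)];
        double[] q = row(s);
        q[a] += ALPHA * (reward + GAMMA * bestNext - q[a]);
    }

    /** Invented environment: "move" toggles A <-> B, "stay" keeps the state. */
    static String nextState(String s, String action) {
        if (action.equals("move")) return s.equals("A") ? "B" : "A";
        return s;
    }

    /** Only staying in B is rewarded. */
    static double reward(String s, String action) {
        return s.equals("B") && action.equals("stay") ? 1.0 : 0.0;
    }

    void run(int steps) {
        String s = "A";
        for (int t = 0; t < steps; t++) {
            int a = selectAction(s);                 // observe the state, pick an action
            double r = reward(s, ACTIONS[a]);        // carry it out, observe the reward
            String next = nextState(s, ACTIONS[a]);
            update(s, a, r, next);                   // adjust the expected reward
            s = next;
        }
    }

    public static void main(String[] args) {
        TinyTDAgent agent = new TinyTDAgent();
        agent.run(5000);
        for (String s : new String[] {"A", "B"}) {
            System.out.println(s + " -> " + ACTIONS[agent.greedy(s)]);
        }
    }
}
```

Staying in B is the only rewarded action, so after training the greedy policy should move from A to B and then stay there; the agent discovers this purely from the rewards it observes. Swapping in a different selection or update rule changes only selectAction or update, which mirrors how the separate policy objects described above let the type of learning be chosen independently of the loop.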