MachineIntelligenceCore:ReinforcementLearning
|
Application of Partially Observable Deep Q-learning with Experience Replay to the maze of digits problem. It is assumed that the agent observes only a part of the environment (POMDP). More...
#include <MazeOfDigitsDLRERPOMPD.hpp>
Public Member Functions | |
MazeOfDigitsDLRERPOMPD (std::string node_name_="application") | |
virtual | ~MazeOfDigitsDLRERPOMPD () |
Protected Member Functions | |
virtual void | initialize (int argc, char *argv[]) |
virtual void | initializePropertyDependentVariables () |
virtual bool | performSingleStep () |
virtual void | startNewEpisode () |
virtual void | finishCurrentEpisode () |
Private Member Functions | |
float | computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_) |
mic::types::MatrixXfPtr | getPredictedRewardsForGivenState (mic::types::Position2D player_position_) |
mic::types::NESWAction | selectBestActionForGivenState (mic::types::Position2D player_position_) |
std::string | streamNetworkResponseTable () |
Private Attributes | |
WindowCollectorChart< float > * | w_chart |
Window for displaying statistics. More... | |
mic::utils::DataCollectorPtr < std::string, float > | collector_ptr |
Data collector. More... | |
WindowMazeOfDigits * | wmd_environment |
Window displaying the whole environment. More... | |
WindowMazeOfDigits * | wmd_observation |
Window displaying the observation. More... | |
mic::environments::MazeOfDigits | env |
The maze of digits environment. More... | |
std::shared_ptr< std::vector < mic::types::Position2D > > | saccadic_path |
Saccadic path - a sequence of consecutive agent positions. More... | |
size_t | batch_size |
Size of the batch in experience replay - set to the size of maze (width*height). More... | |
mic::configuration::Property < float > | step_reward |
mic::configuration::Property < float > | discount_rate |
mic::configuration::Property < float > | learning_rate |
mic::configuration::Property < double > | epsilon |
mic::configuration::Property< int > | step_limit |
mic::configuration::Property < std::string > | statistics_filename |
Property: name of the file to which the statistics will be exported. More... | |
mic::configuration::Property < std::string > | mlnn_filename |
Property: name of the file to which the neural network will be serialized (or deserialized from). More... | |
mic::configuration::Property < bool > | mlnn_save |
Property: flag denoting whether the neural network should be saved to a file (after the end of every episode). More... | |
mic::configuration::Property < bool > | mlnn_load |
Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task). More... | |
BackpropagationNeuralNetwork < float > | neural_net |
Multi-layer neural network used for approximation of the Q-state rewards. More... | |
long long | sum_of_iterations |
double | sum_of_opt_to_episodic_lenghts |
SpatialExperienceMemory | experiences |
Application of Partially Observable Deep Q-learning with Experience Replay to the maze of digits problem. It is assumed that the agent observes only a part of the environment (POMDP).
Definition at line 51 of file MazeOfDigitsDLRERPOMPD.hpp.
mic::application::MazeOfDigitsDLRERPOMPD::MazeOfDigitsDLRERPOMPD | ( | std::string | node_name_ = "application" | ) |
Default Constructor. Sets the application/node name, default values of variables etc.
node_name_ | Name of the application/node (in configuration file). |
Definition at line 40 of file MazeOfDigitsDLRERPOMPD.cpp.
References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, step_limit, and step_reward.
|
virtual |
Destructor.
Definition at line 68 of file MazeOfDigitsDLRERPOMPD.cpp.
References w_chart, wmd_environment, and wmd_observation.
|
private |
Calculates the best value for the current state and predictions.
player_position_ | State (player position). |
predictions_ | Vector of predictions to be analyzed. |
Definition at line 267 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
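The selection over allowed actions can be sketched as follows. This is an illustration only: the fixed four-element arrays and the `computeBestValue` name are stand-ins, while the real method receives the predictions produced by neural_net and checks allowance via mic::environments::Environment::isActionAllowed().

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Illustrative sketch (not the actual implementation): return the highest
// predicted value among the four NESW actions that are allowed in the
// current state. In the class this corresponds to
// computeBestValueForGivenStateAndPredictions().
float computeBestValue(const std::array<float, 4>& predictions_,
                       const std::array<bool, 4>& action_allowed_) {
    float best = -std::numeric_limits<float>::infinity();
    for (std::size_t a = 0; a < predictions_.size(); ++a)
        if (action_allowed_[a] && predictions_[a] > best)
            best = predictions_[a];
    return best;
}
```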
|
protectedvirtual |
Method called when a given episode ends (goal: export the collected statistics to a file etc.) - abstract, to be overridden.
Definition at line 158 of file MazeOfDigitsDLRERPOMPD.cpp.
References collector_ptr, env, mlnn_filename, mlnn_save, neural_net, mic::environments::MazeOfDigits::optimalPathLength(), statistics_filename, sum_of_iterations, and sum_of_opt_to_episodic_lenghts.
|
private |
Returns the predicted rewards for given state.
player_position_ | State (player position). |
Definition at line 291 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::MazeOfDigits::moveAgentToPosition(), and neural_net.
Referenced by selectBestActionForGivenState().
|
protectedvirtual |
Method initializes GLUT and OpenGL windows.
argc | Number of application parameters. |
argv | Array of application parameters. |
Definition at line 75 of file MazeOfDigitsDLRERPOMPD.cpp.
References collector_ptr, sum_of_iterations, sum_of_opt_to_episodic_lenghts, and w_chart.
|
protectedvirtual |
Initializes all variables that are property-dependent.
Definition at line 96 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, env, experiences, mic::environments::Environment::getEnvironment(), mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentWidth(), mic::environments::MazeOfDigits::getObservation(), mic::environments::Environment::getObservationHeight(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::Environment::getObservationWidth(), mic::environments::MazeOfDigits::initializeEnvironment(), mlnn_filename, mlnn_load, neural_net, saccadic_path, wmd_environment, and wmd_observation.
|
protectedvirtual |
Performs single step of computations.
Definition at line 364 of file MazeOfDigitsDLRERPOMPD.cpp.
References mic::types::SpatialExperienceMemory::add(), batch_size, computeBestValueForGivenStateAndPredictions(), discount_rate, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::environmentToString(), epsilon, experiences, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservation(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::MazeOfDigits::getStateReward(), mic::environments::MazeOfDigits::isStateTerminal(), learning_rate, mic::environments::Environment::moveAgent(), mic::environments::MazeOfDigits::moveAgentToPosition(), neural_net, mic::environments::MazeOfDigits::observationToString(), saccadic_path, selectBestActionForGivenState(), step_limit, step_reward, and streamNetworkResponseTable().
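The temporal-difference target computed for each replayed experience can be sketched as below. This is a minimal illustration assuming the standard Deep Q-learning update, not a copy of the actual implementation; `qTarget` is a hypothetical name, while the discount factor and terminal-state check mirror the class members discount_rate and mic::environments::MazeOfDigits::isStateTerminal().

```cpp
// Minimal sketch (assumption, based on standard Q-learning with experience
// replay) of the target value used when training on a replayed
// (state, action, reward, next state) experience.
float qTarget(float reward_, float discount_rate_, float best_next_value_,
              bool next_state_terminal_) {
    // A terminal next state contributes no discounted future reward.
    if (next_state_terminal_)
        return reward_;
    return reward_ + discount_rate_ * best_next_value_;
}
```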
|
private |
Finds the best action for the current state.
player_position_ | State (player position). |
Definition at line 331 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, getPredictedRewardsForGivenState(), and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
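The action choice itself can be sketched as an argmax restricted to allowed actions. The names and the fixed four-action (N, E, S, W) layout below are assumptions for illustration; the real method returns a mic::types::NESWAction and works on the predictions returned by getPredictedRewardsForGivenState().

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Illustrative sketch: return the index of the allowed action with the
// highest predicted value, or -1 if no action is allowed.
int selectBestAction(const std::array<float, 4>& predictions_,
                     const std::array<bool, 4>& action_allowed_) {
    int best_action = -1;
    float best_value = -std::numeric_limits<float>::infinity();
    for (std::size_t a = 0; a < predictions_.size(); ++a) {
        if (action_allowed_[a] && predictions_[a] > best_value) {
            best_value = predictions_[a];
            best_action = static_cast<int>(a);
        }
    }
    return best_action;
}
```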
|
protectedvirtual |
Method called at the beginning of a new episode (goal: to reset the statistics etc.) - abstract, to be overridden.
Definition at line 141 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservation(), mic::environments::MazeOfDigits::initializeEnvironment(), and saccadic_path.
|
private |
Streams the current network response - values of actions associated with consecutive agent positions.
Definition at line 182 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::Environment::getObservationHeight(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::Environment::getObservationWidth(), mic::environments::Environment::isActionAllowed(), mic::environments::MazeOfDigits::isStateAllowed(), mic::environments::MazeOfDigits::isStateTerminal(), mic::environments::MazeOfDigits::moveAgentToPosition(), and neural_net.
Referenced by performSingleStep().
|
private |
Size of the batch in experience replay - set to the size of maze (width*height).
Definition at line 115 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
|
private |
Data collector.
Definition at line 100 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Property: future discount (should be in range 0.0-1.0).
Definition at line 125 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
The maze of digits environment.
Definition at line 109 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by computeBestValueForGivenStateAndPredictions(), finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), selectBestActionForGivenState(), startNewEpisode(), and streamNetworkResponseTable().
|
private |
Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, thus changing dynamically with the episode number.
Definition at line 136 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
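The schedule described for this property can be sketched as below; `effectiveEpsilon` is a hypothetical helper name used for illustration only, derived from the property description rather than the actual code.

```cpp
// Sketch of the described epsilon schedule: a negative configured value
// means epsilon decays as 1/episode, so the agent explores less as
// training progresses. Illustration only, not the actual implementation.
double effectiveEpsilon(double epsilon_property_, long episode_) {
    if (epsilon_property_ < 0.0)
        return 1.0 / static_cast<double>(episode_);
    return epsilon_property_;
}
```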
|
private |
Table of past experiences.
Definition at line 199 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
|
private |
Property: neural network learning rate (should be in range 0.0-1.0).
Definition at line 130 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Property: name of the file to which the neural network will be serialized (or deserialized from).
Definition at line 147 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().
|
private |
Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task).
Definition at line 153 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().
|
private |
Property: flag denoting whether the neural network should be saved to a file (after the end of every episode).
Definition at line 150 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().
|
private |
Multi-layer neural network used for approximation of the Q-state rewards.
Definition at line 156 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
|
private |
Saccadic path - a sequence of consecutive agent positions.
Definition at line 112 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and startNewEpisode().
|
private |
Property: name of the file to which the statistics will be exported.
Definition at line 144 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().
|
private |
Limit of steps per episode. Setting step_limit <= 0 means that the limit is not enforced.
Definition at line 141 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Property: the "expected intermediate reward", i.e. the reward received for performing each step (typically negative, but can be positive as well).
Definition at line 120 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Sum of all iterations made so far - used in statistics.
Definition at line 189 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Sum of optimal to episodic path lengths.
Definition at line 194 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Window for displaying statistics.
Definition at line 97 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initialize(), and ~MazeOfDigitsDLRERPOMPD().
|
private |
Window displaying the whole environment.
Definition at line 103 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().
|
private |
Window displaying the observation.
Definition at line 105 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().