MachineIntelligenceCore:ReinforcementLearning
mic::application::MazeOfDigitsDLRERPOMPD Class Reference

Application of partially observable Deep Q-learning with Experience Replay to the maze of digits problem. The agent is assumed to observe only part of the environment (POMDP).

#include <MazeOfDigitsDLRERPOMPD.hpp>


Public Member Functions

 MazeOfDigitsDLRERPOMPD (std::string node_name_="application")
 
virtual ~MazeOfDigitsDLRERPOMPD ()
 

Protected Member Functions

virtual void initialize (int argc, char *argv[])
 
virtual void initializePropertyDependentVariables ()
 
virtual bool performSingleStep ()
 
virtual void startNewEpisode ()
 
virtual void finishCurrentEpisode ()
 

Private Member Functions

float computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_)
 
mic::types::MatrixXfPtr getPredictedRewardsForGivenState (mic::types::Position2D player_position_)
 
mic::types::NESWAction selectBestActionForGivenState (mic::types::Position2D player_position_)
 
std::string streamNetworkResponseTable ()
 

Private Attributes

WindowCollectorChart< float > * w_chart
 Window for displaying statistics.
 
mic::utils::DataCollectorPtr< std::string, float > collector_ptr
 Data collector.
 
WindowMazeOfDigits * wmd_environment
 Window displaying the whole environment.
 
WindowMazeOfDigits * wmd_observation
 Window displaying the observation.
 
mic::environments::MazeOfDigits env
 The maze of digits environment.
 
std::shared_ptr< std::vector< mic::types::Position2D > > saccadic_path
 Saccadic path - a sequence of consecutive agent positions.
 
size_t batch_size
 Size of the batch in experience replay; set to the size of the maze (width*height).
 
mic::configuration::Property< float > step_reward
 
mic::configuration::Property< float > discount_rate
 
mic::configuration::Property< float > learning_rate
 
mic::configuration::Property< double > epsilon
 
mic::configuration::Property< int > step_limit
 
mic::configuration::Property< std::string > statistics_filename
 Property: name of the file to which the statistics will be exported.
 
mic::configuration::Property< std::string > mlnn_filename
 Property: name of the file to which the neural network will be serialized (or deserialized from).
 
mic::configuration::Property< bool > mlnn_save
 Property: flag denoting whether the nn should be saved to a file (after every episode end).
 
mic::configuration::Property< bool > mlnn_load
 Property: flag denoting whether the nn should be loaded from a file (at the initialization of the task).
 
BackpropagationNeuralNetwork< float > neural_net
 Multi-layer neural network used to approximate the Q-state rewards.
 
long long sum_of_iterations
 
double sum_of_opt_to_episodic_lenghts
 
SpatialExperienceMemory experiences
 

Detailed Description

Application of partially observable Deep Q-learning with Experience Replay to the maze of digits problem. The agent is assumed to observe only part of the environment (POMDP).

Author
tkornuta

Definition at line 51 of file MazeOfDigitsDLRERPOMPD.hpp.
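The partial observability described above can be illustrated with a minimal sketch, assuming the observation is a square window of cells centred on the agent, with cells outside the maze filled by a padding value. The `Grid` type, `extractObservation()` and the padding convention are hypothetical stand-ins chosen for illustration; they are not the API of mic::environments::MazeOfDigits, whose getObservation() and encodeObservation() are not documented on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid type: row-major cells, each holding a digit.
// NOTE: illustrative stand-in, not mic::environments::MazeOfDigits.
struct Grid {
    int width;
    int height;
    std::vector<int> cells;  // size == width * height

    int at(int x, int y) const { return cells[y * width + x]; }
};

// Extracts a (2*radius+1) x (2*radius+1) window centred on (ax, ay), row by row.
// Cells outside the maze are filled with `pad`, so the observation has the
// same size wherever the agent stands (assumed padding scheme).
std::vector<int> extractObservation(const Grid& g, int ax, int ay, int radius, int pad) {
    assert(radius >= 0);
    assert(static_cast<int>(g.cells.size()) == g.width * g.height);
    const int side = 2 * radius + 1;
    std::vector<int> obs;
    obs.reserve(static_cast<std::size_t>(side * side));
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int x = ax + dx;
            const int y = ay + dy;
            const bool inside = x >= 0 && x < g.width && y >= 0 && y < g.height;
            obs.push_back(inside ? g.at(x, y) : pad);
        }
    }
    return obs;
}
```

With a 3x3 maze and radius 1, an agent in the centre sees the whole maze, while an agent in a corner sees four real cells and five padded ones; the network input size therefore depends only on the window size, not on the maze size.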

Constructor & Destructor Documentation

mic::application::MazeOfDigitsDLRERPOMPD::MazeOfDigitsDLRERPOMPD ( std::string  node_name_ = "application")

Default constructor. Sets the application/node name, the default values of variables, etc.

Parameters
node_name_ Name of the application/node (in the configuration file).

Definition at line 40 of file MazeOfDigitsDLRERPOMPD.cpp.

References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, step_limit, and step_reward.

mic::application::MazeOfDigitsDLRERPOMPD::~MazeOfDigitsDLRERPOMPD ( )
virtual

Destructor.

Definition at line 68 of file MazeOfDigitsDLRERPOMPD.cpp.

References w_chart, wmd_environment, and wmd_observation.

Member Function Documentation

float mic::application::MazeOfDigitsDLRERPOMPD::computeBestValueForGivenStateAndPredictions ( mic::types::Position2D player_position_, float * predictions_ )
private

Calculates the best value for the current state and predictions.

Parameters
player_position_ State (player position).
predictions_ Vector of predictions to be analyzed.
Returns
Value of the best possible action for the given state.

Definition at line 267 of file MazeOfDigitsDLRERPOMPD.cpp.

References env, and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().
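A minimal sketch of the idea behind computeBestValueForGivenStateAndPredictions(): take the maximum prediction over only those actions the environment allows. The four-element array, the N/E/S/W index order and the `isAllowed` callback are assumptions made for illustration; the documented method takes a raw `float*` and references mic::environments::Environment::isActionAllowed(), whose exact signature is not shown here.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical action indexing: 0 = North, 1 = East, 2 = South, 3 = West.
constexpr int kNumActions = 4;

// Returns the maximum predicted value among the actions allowed in the
// current state. `isAllowed` stands in for an environment query such as
// isActionAllowed() (assumed interface). Returns -infinity if no action
// is allowed.
float bestAllowedValue(const std::array<float, kNumActions>& predictions,
                       const std::function<bool(int)>& isAllowed) {
    float best = -std::numeric_limits<float>::infinity();
    for (int a = 0; a < kNumActions; ++a) {
        if (isAllowed(a) && predictions[a] > best) {
            best = predictions[a];
        }
    }
    return best;
}
```

Masking disallowed moves before taking the maximum keeps values of moves into walls or off the maze from leaking into the bootstrapped estimate.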

void mic::application::MazeOfDigitsDLRERPOMPD::finishCurrentEpisode ( )
protected virtual

Method called when the current episode ends (goal: to export the collected statistics to a file, etc.); abstract, to be overridden.

Definition at line 158 of file MazeOfDigitsDLRERPOMPD.cpp.

References collector_ptr, env, mlnn_filename, mlnn_save, neural_net, mic::environments::MazeOfDigits::optimalPathLength(), statistics_filename, sum_of_iterations, and sum_of_opt_to_episodic_lenghts.

mic::types::MatrixXfPtr mic::application::MazeOfDigitsDLRERPOMPD::getPredictedRewardsForGivenState ( mic::types::Position2D  player_position_)
private

Returns the predicted rewards for the given state.

Parameters
player_position_ State (player position).
Returns
Pointer to the predicted rewards (network output matrix).

Definition at line 291 of file MazeOfDigitsDLRERPOMPD.cpp.

References batch_size, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::MazeOfDigits::moveAgentToPosition(), and neural_net.

Referenced by selectBestActionForGivenState().

void mic::application::MazeOfDigitsDLRERPOMPD::initialize ( int argc, char * argv[] )
protected virtual

Initializes the GLUT and OpenGL windows.

Parameters
argc Number of application parameters.
argv Array of application parameters.

Definition at line 75 of file MazeOfDigitsDLRERPOMPD.cpp.

References collector_ptr, sum_of_iterations, sum_of_opt_to_episodic_lenghts, and w_chart.

mic::types::NESWAction mic::application::MazeOfDigitsDLRERPOMPD::selectBestActionForGivenState ( mic::types::Position2D  player_position_)
private

Finds the best action for the current state.

Parameters
player_position_ State (player position).
Returns
The best action found.

Definition at line 331 of file MazeOfDigitsDLRERPOMPD.cpp.

References env, getPredictedRewardsForGivenState(), and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().
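Greedy selection of the best action can be sketched the same way: an argmax over allowed actions. The `Action` enum, its N/E/S/W ordering, the tie-breaking rule and the `isAllowed` callback are hypothetical; mic::types::NESWAction and the real tie-breaking behaviour are not documented on this page.

```cpp
#include <array>
#include <cassert>
#include <functional>

// Hypothetical NESW encoding used only for this sketch.
enum class Action { North = 0, East = 1, South = 2, West = 3, None = -1 };

// Greedy action selection: returns the allowed action with the highest
// predicted value, or Action::None if every move is blocked.
// Ties are broken in favour of the first action in N, E, S, W order.
Action bestAllowedAction(const std::array<float, 4>& predictions,
                         const std::function<bool(Action)>& isAllowed) {
    Action best = Action::None;
    float bestValue = 0.0f;  // only read once `best` has been set
    for (int i = 0; i < 4; ++i) {
        const Action a = static_cast<Action>(i);
        if (isAllowed(a) && (best == Action::None || predictions[i] > bestValue)) {
            bestValue = predictions[i];
            best = a;
        }
    }
    return best;
}
```

Returning an explicit `None` rather than a default direction makes the "no legal move" case visible to the caller instead of silently choosing a blocked action.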

void mic::application::MazeOfDigitsDLRERPOMPD::startNewEpisode ( )
protected virtual

Method called at the beginning of a new episode (goal: to reset the statistics, etc.); abstract, to be overridden.

Definition at line 141 of file MazeOfDigitsDLRERPOMPD.cpp.

References env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservation(), mic::environments::MazeOfDigits::initializeEnvironment(), and saccadic_path.

Member Data Documentation

size_t mic::application::MazeOfDigitsDLRERPOMPD::batch_size
private

Size of the batch in experience replay; set to the size of the maze (width*height).

Definition at line 115 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

mic::utils::DataCollectorPtr<std::string, float> mic::application::MazeOfDigitsDLRERPOMPD::collector_ptr
private

Data collector.

Definition at line 100 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), and initialize().

mic::configuration::Property<float> mic::application::MazeOfDigitsDLRERPOMPD::discount_rate
private

Property: future discount (should be in the range 0.0 to 1.0).

Definition at line 125 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
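For orientation, the standard one-step Q-learning target combines an immediate reward with the discounted best value of the next state. The sketch below assumes that discount_rate plays the role of gamma and that step_reward (or the reward returned by the environment) plays the role of r; how performSingleStep() actually builds its training targets is not documented on this page, and `qLearningTarget()` is a hypothetical helper.

```cpp
#include <cassert>

// Standard one-step Q-learning target (assumed, not taken from the
// MazeOfDigitsDLRERPOMPD sources):
//   target = r                              if the next state is terminal
//   target = r + gamma * max_a' Q(s', a')   otherwise
float qLearningTarget(float reward, float discountRate, float bestNextValue, bool terminal) {
    assert(discountRate >= 0.0f && discountRate <= 1.0f);
    return terminal ? reward : reward + discountRate * bestNextValue;
}
```

A discount close to 1.0 makes distant rewards (such as reaching the goal) propagate back further through the maze, at the cost of slower convergence.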

mic::configuration::Property<double> mic::application::MazeOfDigitsDLRERPOMPD::epsilon
private

Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, hence it changes dynamically depending on the episode number.

Definition at line 136 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
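The epsilon rule documented above can be sketched as epsilon-greedy exploration with an optional 1/episode schedule. This is a minimal sketch assuming episodes are numbered from 1; the helper names `effectiveEpsilon()` and `shouldExplore()` are hypothetical and do not appear in the class.

```cpp
#include <cassert>
#include <random>

// Exploration probability for a given (assumed 1-based) episode number,
// following the documented rule: a negative epsilon means "use 1/episode".
double effectiveEpsilon(double epsilon, long episode) {
    assert(episode >= 1);
    return epsilon < 0.0 ? 1.0 / static_cast<double>(episode) : epsilon;
}

// Epsilon-greedy decision: true means "pick a random action", false means
// "pick the greedy (best predicted) action". A number drawn uniformly
// from [0, 1) that falls below eps triggers exploration.
bool shouldExplore(double eps, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < eps;
}
```

With epsilon set to a negative value, the agent explores almost always in the first episode and increasingly exploits its network as the episode count grows.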

SpatialExperienceMemory mic::application::MazeOfDigitsDLRERPOMPD::experiences
private

Table of past experiences.

Definition at line 199 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().
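The interface of SpatialExperienceMemory is not documented on this page, so the sketch below uses a hypothetical `Experience` record and `ReplayMemory` class to show the general experience-replay technique: a fixed-capacity buffer that overwrites its oldest entries and is sampled uniformly at random to form training batches. Field names and the sampling-with-replacement choice are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical experience record (not the real SpatialExperienceMemory types).
struct Experience {
    int stateX, stateY;  // position before the move
    int action;          // index of the action taken
    int nextX, nextY;    // position after the move
};

// Fixed-capacity ring buffer with uniform random sampling.
class ReplayMemory {
public:
    explicit ReplayMemory(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    void add(const Experience& e) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(e);
        } else {
            buffer_[next_] = e;  // overwrite the oldest entry
        }
        next_ = (next_ + 1) % capacity_;
    }

    std::size_t size() const { return buffer_.size(); }

    // Draws `n` experiences uniformly at random (with replacement).
    std::vector<Experience> sample(std::size_t n, std::mt19937& rng) const {
        assert(!buffer_.empty());
        std::uniform_int_distribution<std::size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        batch.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            batch.push_back(buffer_[pick(rng)]);
        }
        return batch;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
    std::vector<Experience> buffer_;
};
```

Sampling past transitions at random, instead of training only on the latest step, breaks the correlation between consecutive moves of the agent, which is the main motivation for experience replay.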

mic::configuration::Property<float> mic::application::MazeOfDigitsDLRERPOMPD::learning_rate
private

Property: neural network learning rate (should be in the range 0.0 to 1.0).

Definition at line 130 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().

mic::configuration::Property<std::string> mic::application::MazeOfDigitsDLRERPOMPD::mlnn_filename
private

Property: name of the file to which the neural network will be serialized (or deserialized from).

Definition at line 147 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().

mic::configuration::Property<bool> mic::application::MazeOfDigitsDLRERPOMPD::mlnn_load
private

Property: flag denoting whether the nn should be loaded from a file (at the initialization of the task).

Definition at line 153 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().

mic::configuration::Property<bool> mic::application::MazeOfDigitsDLRERPOMPD::mlnn_save
private

Property: flag denoting whether the nn should be saved to a file (after every episode end).

Definition at line 150 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().

BackpropagationNeuralNetwork<float> mic::application::MazeOfDigitsDLRERPOMPD::neural_net
private

Multi-layer neural network used to approximate the Q-state rewards.

Definition at line 156 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

std::shared_ptr<std::vector <mic::types::Position2D> > mic::application::MazeOfDigitsDLRERPOMPD::saccadic_path
private

Saccadic path - a sequence of consecutive agent positions.

Definition at line 112 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initializePropertyDependentVariables(), performSingleStep(), and startNewEpisode().

mic::configuration::Property<std::string> mic::application::MazeOfDigitsDLRERPOMPD::statistics_filename
private

Property: name of the file to which the statistics will be exported.

Definition at line 144 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().

mic::configuration::Property<int> mic::application::MazeOfDigitsDLRERPOMPD::step_limit
private

Limit of steps per episode. Setting step_limit <= 0 means that no limit is enforced.

Definition at line 141 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
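The step-limit semantics can be captured in a one-line check. This is a sketch only: `stepLimitReached()` is a hypothetical helper, and whether performSingleStep() compares with `>=` or `>` is an assumption.

```cpp
#include <cassert>

// True when the episode should stop because the step limit was reached.
// Per the documentation, step_limit <= 0 disables the limit entirely.
bool stepLimitReached(int stepLimit, int stepsTaken) {
    assert(stepsTaken >= 0);
    return stepLimit > 0 && stepsTaken >= stepLimit;
}
```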

mic::configuration::Property<float> mic::application::MazeOfDigitsDLRERPOMPD::step_reward
private

Property: the "expected intermediate reward", i.e. the reward received for performing each step (typically negative, but it can be positive as well).

Definition at line 120 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().

long long mic::application::MazeOfDigitsDLRERPOMPD::sum_of_iterations
private

Sum of all iterations made so far; used in statistics.

Definition at line 189 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), and initialize().

double mic::application::MazeOfDigitsDLRERPOMPD::sum_of_opt_to_episodic_lenghts
private

Sum of optimal to episodic path lengths.

Definition at line 194 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by finishCurrentEpisode(), and initialize().

WindowCollectorChart<float>* mic::application::MazeOfDigitsDLRERPOMPD::w_chart
private

Window for displaying statistics.

Definition at line 97 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initialize(), and ~MazeOfDigitsDLRERPOMPD().

WindowMazeOfDigits* mic::application::MazeOfDigitsDLRERPOMPD::wmd_environment
private

Window displaying the whole environment.

Definition at line 103 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().

WindowMazeOfDigits* mic::application::MazeOfDigitsDLRERPOMPD::wmd_observation
private

Window displaying the observation.

Definition at line 105 of file MazeOfDigitsDLRERPOMPD.hpp.

Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().


The documentation for this class was generated from the following files: