MachineIntelligenceCore:ReinforcementLearning
mic::application::MNISTDigitDLRERPOMDP Class Reference

Application of Partially Observable Deep Q-learning with Experience Replay to the MNIST digits problem. It is assumed that the agent observes only part of the environment - a patch of the whole image (POMDP). More...

#include <MNISTDigitDLRERPOMDP.hpp>

Inheritance diagram for mic::application::MNISTDigitDLRERPOMDP:
Collaboration diagram for mic::application::MNISTDigitDLRERPOMDP:

Public Member Functions

 MNISTDigitDLRERPOMDP (std::string node_name_="application")
 
virtual ~MNISTDigitDLRERPOMDP ()
 

Protected Member Functions

virtual void initialize (int argc, char *argv[])
 
virtual void initializePropertyDependentVariables ()
 
virtual bool performSingleStep ()
 
virtual void startNewEpisode ()
 
virtual void finishCurrentEpisode ()
 

Private Member Functions

float computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_)
 
mic::types::MatrixXfPtr getPredictedRewardsForGivenState (mic::types::Position2D player_position_)
 
mic::types::NESWAction selectBestActionForGivenState (mic::types::Position2D player_position_)
 
std::string streamNetworkResponseTable ()
 

Private Attributes

WindowCollectorChart< float > * w_chart
 Window for displaying statistics. More...
 
mic::utils::DataCollectorPtr< std::string, float > collector_ptr
 Data collector. More...
 
WindowMNISTDigit * wmd_environment
 Window displaying the whole environment. More...
 
WindowMNISTDigit * wmd_observation
 Window displaying the observation. More...
 
mic::environments::MNISTDigit env
 The MNIST digit environment. More...
 
std::shared_ptr< std::vector< mic::types::Position2D > > saccadic_path
 Saccadic path - a sequence of consecutive agent positions. More...
 
size_t batch_size
 Size of the batch in experience replay - set to the size of the environment (width*height). More...
 
mic::configuration::Property< float > step_reward
 
mic::configuration::Property< float > discount_rate
 
mic::configuration::Property< float > learning_rate
 
mic::configuration::Property< double > epsilon
 
mic::configuration::Property< int > step_limit
 
mic::configuration::Property< std::string > statistics_filename
 Property: name of the file to which the statistics will be exported. More...
 
mic::configuration::Property< std::string > mlnn_filename
 Property: name of the file to which the neural network will be serialized (or deserialized from). More...
 
mic::configuration::Property< bool > mlnn_save
 Property: flag denoting whether the nn should be saved to a file (after every episode end). More...
 
mic::configuration::Property< bool > mlnn_load
 Property: flag denoting whether the nn should be loaded from a file (at the initialization of the task). More...
 
BackpropagationNeuralNetwork< float > neural_net
 Multi-layer neural network used for approximation of the Q-state rewards. More...
 
long long sum_of_iterations
 
SpatialExperienceMemory experiences
 

Detailed Description

Application of Partially Observable Deep Q-learning with Experience Replay to the MNIST digits problem. It is assumed that the agent observes only part of the environment - a patch of the whole image (POMDP).

Author
tkornuta

Definition at line 51 of file MNISTDigitDLRERPOMDP.hpp.

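The class is driven through the protected lifecycle methods listed above. The outline below is a minimal sketch of the assumed call order; EpisodicApp and runEpisodes are hypothetical stand-ins for the MIC application framework, and the return-value semantics of performSingleStep() (false once the run is over) is an assumption, not taken from the source.

    #include <cstddef>

    // EpisodicApp is a stand-in interface mirroring the protected methods listed above;
    // it is not the real MIC base class.
    struct EpisodicApp {
        virtual ~EpisodicApp() = default;
        virtual void initialize(int argc, char *argv[]) = 0;
        virtual void initializePropertyDependentVariables() = 0;
        virtual void startNewEpisode() = 0;
        virtual bool performSingleStep() = 0;   // assumed: returns false once the episode/run is over
        virtual void finishCurrentEpisode() = 0;
    };

    void runEpisodes(EpisodicApp &app, int argc, char *argv[], std::size_t episodes) {
        app.initialize(argc, argv);                   // GLUT/OpenGL windows, data collector
        app.initializePropertyDependentVariables();   // network, experience memory, batch size
        for (std::size_t e = 0; e < episodes; ++e) {
            app.startNewEpisode();                    // reset the environment and the saccadic path
            while (app.performSingleStep()) {}        // observe a patch, act, learn from replay
            app.finishCurrentEpisode();               // export statistics, optionally save the network
        }
    }
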
Constructor & Destructor Documentation

mic::application::MNISTDigitDLRERPOMDP::MNISTDigitDLRERPOMDP ( std::string  node_name_ = "application")

Default Constructor. Sets the application/node name, default values of variables etc.

Parameters
node_name_ : Name of the application/node (in the configuration file).

Definition at line 41 of file MNISTDigitDLRERPOMDP.cpp.

References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, step_limit, and step_reward.

mic::application::MNISTDigitDLRERPOMDP::~MNISTDigitDLRERPOMDP ( )
virtual

Destructor.

Definition at line 69 of file MNISTDigitDLRERPOMDP.cpp.

References w_chart, wmd_environment, and wmd_observation.

Member Function Documentation

float mic::application::MNISTDigitDLRERPOMDP::computeBestValueForGivenStateAndPredictions ( mic::types::Position2D player_position_, float * predictions_ )
private

Calculates the best value for the current state and predictions.

Parameters
player_position_ : State (player position).
predictions_ : Vector of predictions to be analyzed.
Returns
Value of the best possible action for the given state.

Definition at line 260 of file MNISTDigitDLRERPOMDP.cpp.

References env, and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().
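
Based on the description and the reference to mic::environments::Environment::isActionAllowed(), the computation amounts to taking the maximum prediction over the actions allowed from the given position. A minimal sketch under that assumption, with four NESW actions indexed 0..3 and a stand-in predicate instead of the real environment call:

    #include <algorithm>
    #include <functional>
    #include <limits>

    // Stand-in for the real check env.isActionAllowed(position, action).
    using AllowedFn = std::function<bool(int /*action*/)>;

    // predictions_ points to four Q-value predictions, one per NESW action.
    float bestValueForState(const float *predictions_, const AllowedFn &isAllowed) {
        float best = -std::numeric_limits<float>::infinity();
        for (int action = 0; action < 4; ++action) {
            if (isAllowed(action))
                best = std::max(best, predictions_[action]);
        }
        return best;   // value of the best allowed action for the given state
    }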

void mic::application::MNISTDigitDLRERPOMDP::finishCurrentEpisode ( )
protected virtual

Method called when the current episode ends (goal: export the collected statistics to a file, etc.); overrides the corresponding method of the base class.

Definition at line 154 of file MNISTDigitDLRERPOMDP.cpp.

References collector_ptr, env, mlnn_filename, mlnn_save, neural_net, mic::environments::MNISTDigit::optimalPathLength(), statistics_filename, and sum_of_iterations.

mic::types::MatrixXfPtr mic::application::MNISTDigitDLRERPOMDP::getPredictedRewardsForGivenState ( mic::types::Position2D  player_position_)
private

Returns the predicted rewards for given state.

Parameters
player_position_ : State (player position).
Returns
Pointer to the predicted rewards (network output matrix).

Definition at line 284 of file MNISTDigitDLRERPOMDP.cpp.

References batch_size, mic::environments::MNISTDigit::encodeObservation(), env, mic::environments::MNISTDigit::getAgentPosition(), mic::environments::Environment::getObservationSize(), mic::environments::MNISTDigit::moveAgentToPosition(), and neural_net.

Referenced by selectBestActionForGivenState().
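
The referenced environment calls (getAgentPosition(), moveAgentToPosition(), encodeObservation()) suggest the following flow: temporarily move the agent to the queried position, encode the visible patch as the network input, run a forward pass, and restore the agent. A rough sketch with stand-in types (Env, Net and Matrix are placeholders, not the real mic:: or BackpropagationNeuralNetwork interfaces):

    #include <memory>
    #include <utility>
    #include <vector>

    using Matrix = std::vector<float>;                  // placeholder for mic::types::MatrixXf
    using MatrixPtr = std::shared_ptr<Matrix>;

    struct Env {                                         // placeholder environment interface
        virtual ~Env() = default;
        virtual std::pair<int,int> getAgentPosition() const = 0;
        virtual void moveAgentToPosition(std::pair<int,int> pos) = 0;
        virtual MatrixPtr encodeObservation() const = 0; // patch around the agent
    };

    struct Net {                                         // placeholder Q-network
        virtual ~Net() = default;
        virtual MatrixPtr forward(const MatrixPtr &input) = 0;
    };

    MatrixPtr predictedRewardsForState(Env &env, Net &net, std::pair<int,int> queried_position) {
        auto saved_position = env.getAgentPosition();    // remember the real agent position
        env.moveAgentToPosition(queried_position);       // pretend the agent is at the queried state
        MatrixPtr observation = env.encodeObservation(); // encode the visible patch (POMDP observation)
        MatrixPtr q_values = net.forward(observation);   // one predicted reward per action
        env.moveAgentToPosition(saved_position);         // restore the environment
        return q_values;
    }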

void mic::application::MNISTDigitDLRERPOMDP::initialize ( int argc, char * argv[] )
protected virtual

Method initializes GLUT and OpenGL windows.

Parameters
argc : Number of application parameters.
argv : Array of application parameters.

Definition at line 76 of file MNISTDigitDLRERPOMDP.cpp.

References collector_ptr, sum_of_iterations, and w_chart.

mic::types::NESWAction mic::application::MNISTDigitDLRERPOMDP::selectBestActionForGivenState ( mic::types::Position2D  player_position_)
private

Finds the best action for the current state.

Parameters
player_position_ : State (player position).
Returns
The best action found.

Definition at line 324 of file MNISTDigitDLRERPOMDP.cpp.

References env, getPredictedRewardsForGivenState(), and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().

void mic::application::MNISTDigitDLRERPOMDP::startNewEpisode ( )
protected virtual

Method called at the beginning of a new episode (goal: reset the statistics, etc.); overrides the corresponding method of the base class.

Definition at line 137 of file MNISTDigitDLRERPOMDP.cpp.

References env, mic::environments::MNISTDigit::getAgentPosition(), mic::environments::MNISTDigit::getObservation(), mic::environments::MNISTDigit::initializeEnvironment(), and saccadic_path.

Member Data Documentation

size_t mic::application::MNISTDigitDLRERPOMDP::batch_size
private

Size of the batch in experience replay - set to the size of the environment (width*height).

Definition at line 115 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

mic::utils::DataCollectorPtr<std::string, float> mic::application::MNISTDigitDLRERPOMDP::collector_ptr
private

Data collector.

Definition at line 100 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), and initialize().

mic::configuration::Property<float> mic::application::MNISTDigitDLRERPOMDP::discount_rate
private

Property: future discount (should be in range 0.0-1.0).

Definition at line 125 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by MNISTDigitDLRERPOMDP(), and performSingleStep().
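
In a deep Q-learning update the discount rate weighs the estimated value of the next state against the immediate step reward. A minimal sketch of such a target computation, shown as the standard DQN formula rather than the exact update performed by performSingleStep():

    // Q-learning target for a single transition (s, a, r, s').
    // step_reward and discount_rate correspond to the properties of the same names.
    float qTarget(float step_reward, float discount_rate,
                  float best_next_value,   // max over allowed actions in the next state
                  bool terminal) {
        // No future term when the episode ends in the next state.
        return terminal ? step_reward
                        : step_reward + discount_rate * best_next_value;
    }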

mic::configuration::Property<double> mic::application::MNISTDigitDLRERPOMDP::epsilon
private

Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, hence it changes dynamically with the episode number.

Definition at line 136 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by MNISTDigitDLRERPOMDP(), and performSingleStep().
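
The comment above describes epsilon-greedy selection with an optional 1/episode schedule. A short sketch of that rule (the random-number source is a simplified stand-in):

    #include <algorithm>
    #include <random>

    // Returns true when a random (exploratory) action should be taken.
    bool chooseRandomAction(double epsilon, long episode, std::mt19937 &rng) {
        // Negative epsilon switches to a dynamic schedule: epsilon = 1/episode.
        double eps = (epsilon < 0.0) ? 1.0 / static_cast<double>(std::max(episode, 1L)) : epsilon;
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        return uniform(rng) < eps;   // explore with probability eps, otherwise exploit
    }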

SpatialExperienceMemory mic::application::MNISTDigitDLRERPOMDP::experiences
private

Table of past experiences.

Definition at line 194 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().
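
SpatialExperienceMemory holds past transitions for experience replay. The sketch below illustrates the general idea of sampling a random batch from such a memory; the SpatialExperience record and the sampling function are hypothetical stand-ins, not the real class interface.

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    struct SpatialExperience {        // hypothetical record of one transition
        std::pair<int,int> position;  // agent position (state)
        int action;                   // NESW action index
        std::pair<int,int> next_position;
        float reward;
        bool terminal;
    };

    // Uniformly sample batch_size experiences (with replacement) from the memory.
    std::vector<SpatialExperience> sampleBatch(const std::vector<SpatialExperience> &memory,
                                               std::size_t batch_size, std::mt19937 &rng) {
        std::vector<SpatialExperience> batch;
        if (memory.empty()) return batch;
        std::uniform_int_distribution<std::size_t> pick(0, memory.size() - 1);
        for (std::size_t i = 0; i < batch_size; ++i)
            batch.push_back(memory[pick(rng)]);
        return batch;
    }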

mic::configuration::Property<float> mic::application::MNISTDigitDLRERPOMDP::learning_rate
private

Property: neural network learning rate (should be in range 0.0-1.0).

Definition at line 130 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by MNISTDigitDLRERPOMDP(), and performSingleStep().

mic::configuration::Property<std::string> mic::application::MNISTDigitDLRERPOMDP::mlnn_filename
private

Property: name of the file to which the neural network will be serialized (or deserialized from).

Definition at line 147 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), initializePropertyDependentVariables(), and MNISTDigitDLRERPOMDP().

mic::configuration::Property<bool> mic::application::MNISTDigitDLRERPOMDP::mlnn_load
private

Property: flag denoting whether the nn should be loaded from a file (at the initialization of the task).

Definition at line 153 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initializePropertyDependentVariables(), and MNISTDigitDLRERPOMDP().

mic::configuration::Property<bool> mic::application::MNISTDigitDLRERPOMDP::mlnn_save
private

Property: flag denoting whether the nn should be saved to a file (after every episode end).

Definition at line 150 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), and MNISTDigitDLRERPOMDP().

BackpropagationNeuralNetwork<float> mic::application::MNISTDigitDLRERPOMDP::neural_net
private

Multi-layer neural network used for approximation of the Q-state rewards.

Definition at line 156 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

std::shared_ptr<std::vector <mic::types::Position2D> > mic::application::MNISTDigitDLRERPOMDP::saccadic_path
private

Saccadic path - a sequence of consecutive agent positions.

Definition at line 112 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initializePropertyDependentVariables(), performSingleStep(), and startNewEpisode().

mic::configuration::Property<std::string> mic::application::MNISTDigitDLRERPOMDP::statistics_filename
private

Property: name of the file to which the statistics will be exported.

Definition at line 144 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), and MNISTDigitDLRERPOMDP().

mic::configuration::Property<int> mic::application::MNISTDigitDLRERPOMDP::step_limit
private

Limit of steps per episode. Setting step_limit <= 0 means that the limit is not applied.

Definition at line 141 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by MNISTDigitDLRERPOMDP(), and performSingleStep().
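
A one-line check illustrates the documented convention that values <= 0 disable the limit (a sketch, not the actual implementation):

    // Episode termination check based on step_limit (<= 0 means "no limit").
    bool stepLimitReached(int step_limit, long steps_in_episode) {
        return (step_limit > 0) && (steps_in_episode >= step_limit);
    }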

mic::configuration::Property<float> mic::application::MNISTDigitDLRERPOMDP::step_reward
private

Property: the "expected intermediate reward", i.e. the reward received for performing each step (typically negative, but it can be positive as well).

Definition at line 120 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by MNISTDigitDLRERPOMDP(), and performSingleStep().

long long mic::application::MNISTDigitDLRERPOMDP::sum_of_iterations
private

Sum of all iterations made so far - used in statistics.

Definition at line 189 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by finishCurrentEpisode(), and initialize().

WindowCollectorChart<float>* mic::application::MNISTDigitDLRERPOMDP::w_chart
private

Window for displaying statistics.

Definition at line 97 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initialize(), and ~MNISTDigitDLRERPOMDP().

WindowMNISTDigit* mic::application::MNISTDigitDLRERPOMDP::wmd_environment
private

Window displaying the whole environment.

Definition at line 103 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initializePropertyDependentVariables(), and ~MNISTDigitDLRERPOMDP().

WindowMNISTDigit* mic::application::MNISTDigitDLRERPOMDP::wmd_observation
private

Window displaying the observation.

Definition at line 105 of file MNISTDigitDLRERPOMDP.hpp.

Referenced by initializePropertyDependentVariables(), and ~MNISTDigitDLRERPOMDP().


The documentation for this class was generated from the following files:
MNISTDigitDLRERPOMDP.hpp
MNISTDigitDLRERPOMDP.cpp