MachineIntelligenceCore:ReinforcementLearning
mic::application::GridworldDRLExperienceReplay Class Reference

Class responsible for solving the gridworld problem with Q-learning, a neural network used for approximation of the rewards, and experience replay used for (batch) training of the neural network. More...

#include <GridworldDRLExperienceReplay.hpp>

Inheritance diagram for mic::application::GridworldDRLExperienceReplay:
Collaboration diagram for mic::application::GridworldDRLExperienceReplay:

Public Member Functions

 GridworldDRLExperienceReplay (std::string node_name_="application")
 
virtual ~GridworldDRLExperienceReplay ()
 

Protected Member Functions

virtual void initialize (int argc, char *argv[])
 
virtual void initializePropertyDependentVariables ()
 
virtual bool performSingleStep ()
 
virtual void startNewEpisode ()
 
virtual void finishCurrentEpisode ()
 

Private Member Functions

float computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_)
 
mic::types::MatrixXfPtr getPredictedRewardsForGivenState (mic::types::Position2D player_position_)
 
mic::types::NESWAction selectBestActionForGivenState (mic::types::Position2D player_position_)
 
std::string streamNetworkResponseTable ()
 

Private Attributes

WindowCollectorChart< float > * w_chart
 Window for displaying statistics. More...
 
mic::utils::DataCollectorPtr< std::string, float > collector_ptr
 Data collector. More...
 
mic::environments::Gridworld grid_env
 The gridworld environment. More...
 
size_t batch_size
 Size of the batch in experience replay, set to the size of the maze (width*height). More...
 
mic::configuration::Property< float > step_reward
 
mic::configuration::Property< float > discount_rate
 
mic::configuration::Property< float > learning_rate
 
mic::configuration::Property< double > epsilon
 
mic::configuration::Property< std::string > statistics_filename
 Property: name of the file to which the statistics will be exported. More...
 
mic::configuration::Property< std::string > mlnn_filename
 Property: name of the file to which the neural network will be serialized (or deserialized from). More...
 
mic::configuration::Property< bool > mlnn_save
 Property: flag denoting whether the neural network should be saved to a file (after the end of every episode). More...
 
mic::configuration::Property< bool > mlnn_load
 Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task). More...
 
BackpropagationNeuralNetwork< float > neural_net
 Multi-layer neural network used for approximation of the Q-state rewards. More...
 
long long sum_of_iterations
 
long long sum_of_rewards
 
long long number_of_successes
 
SpatialExperienceMemory experiences
 

Detailed Description

Class responsible for solving the gridworld problem with Q-learning, a neural network used for approximation of the rewards, and experience replay used for (batch) training of the neural network.

Author
tkornuta

Definition at line 49 of file GridworldDRLExperienceReplay.hpp.
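
The class combines three standard ingredients: Q-learning targets, a neural network used as the Q-value approximator, and a replay memory from which batches are sampled for training. The fragment below is a minimal, self-contained sketch of how such batch targets can be formed; the Experience struct, the predict() placeholder and the makeTargets() helper are illustrative assumptions and do not reproduce the actual SpatialExperienceMemory, BackpropagationNeuralNetwork or performSingleStep() code.

#include <algorithm>
#include <array>
#include <cstddef>
#include <random>
#include <vector>

struct Experience {
    int state;        // encoded gridworld state (agent position index)
    int action;       // 0..3 = N, E, S, W
    float reward;     // immediate reward received for the move
    int next_state;   // encoded state reached after the move
    bool terminal;    // true if the episode ended with this move
};

// Placeholder for the network forward pass: per-action Q-values for a state.
std::array<float, 4> predict(int /*state*/) { return {0.0f, 0.0f, 0.0f, 0.0f}; }

// Forms Q-learning regression targets y = r + discount * max_a' Q(s', a') from a
// batch sampled uniformly out of the replay memory (assumed to be non-empty).
std::vector<float> makeTargets(const std::vector<Experience>& memory,
                               std::size_t batch_size, float discount_rate,
                               std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, memory.size() - 1);
    std::vector<float> targets;
    targets.reserve(batch_size);
    for (std::size_t i = 0; i < batch_size; ++i) {
        const Experience& e = memory[pick(rng)];
        float target = e.reward;
        if (!e.terminal) {
            const auto q_next = predict(e.next_state);   // bootstrap from the next state
            target += discount_rate * *std::max_element(q_next.begin(), q_next.end());
        }
        targets.push_back(target);
    }
    return targets;
}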

Constructor & Destructor Documentation

mic::application::GridworldDRLExperienceReplay::GridworldDRLExperienceReplay ( std::string  node_name_ = "application")

Default constructor. Sets the application/node name and default values of variables, initializes the classifier, etc.

Parameters
node_name_  Name of the application/node (in the configuration file).

Definition at line 39 of file GridworldDRLExperienceReplay.cpp.

References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, and step_reward.

mic::application::GridworldDRLExperienceReplay::~GridworldDRLExperienceReplay ( )
virtual

Destructor.

Definition at line 64 of file GridworldDRLExperienceReplay.cpp.

References w_chart.

Member Function Documentation

float mic::application::GridworldDRLExperienceReplay::computeBestValueForGivenStateAndPredictions ( mic::types::Position2D player_position_, float * predictions_ )
private

Calculates the best value for the given state and predictions.

Parameters
player_position_  State (player position).
predictions_  Vector of predictions to be analyzed.
Returns
Value of the best possible action for given state.

Definition at line 239 of file GridworldDRLExperienceReplay.cpp.

References grid_env, and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().
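
In other words, the method scans the per-action predictions and keeps the highest value among the actions that are admissible from the given position. A rough, stand-alone illustration of that idea is given below; the grid dimensions and the on-grid admissibility check are assumptions for this sketch and do not correspond to the actual mic::environments::Environment::isActionAllowed() implementation.

#include <limits>

constexpr int kWidth = 4, kHeight = 4;    // assumed grid dimensions, for illustration only

// Hypothetical admissibility check: the move must keep the agent on the grid.
bool isActionAllowed(int x, int y, int action) {
    switch (action) {
        case 0: return y > 0;              // N
        case 1: return x < kWidth - 1;     // E
        case 2: return y < kHeight - 1;    // S
        case 3: return x > 0;              // W
    }
    return false;
}

// Returns the value of the best admissible action for the given position,
// given four per-action predictions (ordered N, E, S, W).
float bestValueForState(int x, int y, const float* predictions) {
    float best = -std::numeric_limits<float>::infinity();
    for (int action = 0; action < 4; ++action)
        if (isActionAllowed(x, y, action) && predictions[action] > best)
            best = predictions[action];
    return best;
}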

void mic::application::GridworldDRLExperienceReplay::finishCurrentEpisode ( )
protectedvirtual

Method called when the current episode ends (goal: export collected statistics to a file, etc.) - abstract, to be overridden.

Definition at line 136 of file GridworldDRLExperienceReplay.cpp.

References collector_ptr, mic::environments::Gridworld::getAgentPosition(), mic::environments::Gridworld::getStateReward(), grid_env, mlnn_filename, mlnn_save, neural_net, number_of_successes, statistics_filename, sum_of_iterations, and sum_of_rewards.

mic::types::MatrixXfPtr mic::application::GridworldDRLExperienceReplay::getPredictedRewardsForGivenState ( mic::types::Position2D  player_position_)
private

Returns the predicted rewards for given state.

Parameters
player_position_  State (player position).
Returns
Pointer to the predicted rewards (network output matrix).

Definition at line 263 of file GridworldDRLExperienceReplay.cpp.

References batch_size, mic::environments::Gridworld::encodeEnvironment(), mic::environments::Gridworld::getAgentPosition(), mic::environments::Environment::getEnvironmentSize(), grid_env, mic::environments::Gridworld::moveAgentToPosition(), and neural_net.

Referenced by selectBestActionForGivenState().
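
Conceptually the method places the agent at the queried position, encodes the resulting environment as the network input and performs a single forward pass that yields one predicted reward per NESW action. The sketch below illustrates that flow with a simple one-hot position encoding and a placeholder forward() call; the real encodeEnvironment() encoding and the BackpropagationNeuralNetwork interface may differ.

#include <array>
#include <cstddef>
#include <vector>

constexpr int kWidth = 4, kHeight = 4;    // assumed grid dimensions, for illustration only

// Placeholder for the network forward pass; stands in for the real neural_net call.
std::array<float, 4> forward(const std::vector<float>& /*input*/) {
    return {0.0f, 0.0f, 0.0f, 0.0f};
}

// Encodes the agent position (here as a one-hot vector of size width*height)
// and returns the four predicted per-action rewards for that state.
std::array<float, 4> predictedRewardsForState(int x, int y) {
    std::vector<float> input(kWidth * kHeight, 0.0f);
    input[static_cast<std::size_t>(y) * kWidth + x] = 1.0f;
    return forward(input);   // Q(s, N), Q(s, E), Q(s, S), Q(s, W)
}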

void mic::application::GridworldDRLExperienceReplay::initialize ( int argc, char * argv[] )
protectedvirtual

Method initializes GLUT and OpenGL windows.

Parameters
argc  Number of application parameters.
argv  Array of application parameters.

Definition at line 69 of file GridworldDRLExperienceReplay.cpp.

References collector_ptr, number_of_successes, sum_of_iterations, sum_of_rewards, and w_chart.

void mic::application::GridworldDRLExperienceReplay::initializePropertyDependentVariables ( )
protectedvirtual

Method initializes variables that depend on the loaded properties, e.g. the batch size, the experience memory and the neural network (optionally loaded from a file).

mic::types::NESWAction mic::application::GridworldDRLExperienceReplay::selectBestActionForGivenState ( mic::types::Position2D player_position_)
private

Finds the best action for the current state.

Parameters
player_position_  State (player position).
Returns
The best action found.

Definition at line 303 of file GridworldDRLExperienceReplay.cpp.

References getPredictedRewardsForGivenState(), grid_env, and mic::environments::Environment::isActionAllowed().

Referenced by performSingleStep().
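
This is the greedy counterpart of computeBestValueForGivenStateAndPredictions(): instead of the best value it returns the best admissible action itself. A hypothetical stand-alone version (again with an assumed isActionAllowed() predicate rather than the real Environment API) might look like this:

#include <limits>

// Hypothetical: true if taking the given NESW action from (x, y) keeps the agent on the grid.
bool isActionAllowed(int x, int y, int action);

// Returns the index (0..3 = N, E, S, W) of the best admissible action,
// or -1 if no action is admissible from the given position.
int bestActionForState(int x, int y, const float* predictions) {
    int best_action = -1;
    float best_value = -std::numeric_limits<float>::infinity();
    for (int action = 0; action < 4; ++action) {
        if (isActionAllowed(x, y, action) && predictions[action] > best_value) {
            best_value = predictions[action];
            best_action = action;
        }
    }
    return best_action;
}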

void mic::application::GridworldDRLExperienceReplay::startNewEpisode ( )
protectedvirtual

Method called at the beginning of a new episode (goal: reset the statistics, etc.) - abstract, to be overridden.

Definition at line 125 of file GridworldDRLExperienceReplay.cpp.

References mic::environments::Gridworld::environmentToString(), grid_env, mic::environments::Gridworld::initializeEnvironment(), and streamNetworkResponseTable().

Member Data Documentation

size_t mic::application::GridworldDRLExperienceReplay::batch_size
private

Size of the batch in experience replay, set to the size of the maze (width*height).

Definition at line 104 of file GridworldDRLExperienceReplay.hpp.

Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

mic::utils::DataCollectorPtr<std::string, float> mic::application::GridworldDRLExperienceReplay::collector_ptr
private

Data collector.

Definition at line 98 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and initialize().

mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::discount_rate
private

Property: future discount factor (should be in the range 0.0-1.0).

Definition at line 114 of file GridworldDRLExperienceReplay.hpp.

Referenced by GridworldDRLExperienceReplay(), and performSingleStep().

mic::configuration::Property<double> mic::application::GridworldDRLExperienceReplay::epsilon
private

Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, hence changing dynamically depending on the episode number.

Definition at line 125 of file GridworldDRLExperienceReplay.hpp.

Referenced by GridworldDRLExperienceReplay(), and performSingleStep().
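
A small sketch of the resulting action-selection rule (an assumed helper, not the actual performSingleStep() code): the effective epsilon either stays fixed or, when the property is negative, decays as 1/episode so that exploration shrinks as training progresses.

#include <random>

// Returns true if a random (exploratory) action should be taken in this step.
// The episode counter is assumed to be >= 1.
bool shouldExplore(double epsilon, long episode, std::mt19937& rng) {
    // Negative epsilon means "decay dynamically": use 1/episode instead.
    const double eps = (epsilon < 0.0) ? 1.0 / static_cast<double>(episode) : epsilon;
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    return coin(rng) < eps;
}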

SpatialExperienceMemory mic::application::GridworldDRLExperienceReplay::experiences
private

Table of past experiences.

Definition at line 188 of file GridworldDRLExperienceReplay.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::learning_rate
private

Property: neural network learning rate (should be in the range 0.0-1.0).

Definition at line 119 of file GridworldDRLExperienceReplay.hpp.

Referenced by GridworldDRLExperienceReplay(), and performSingleStep().

mic::configuration::Property<std::string> mic::application::GridworldDRLExperienceReplay::mlnn_filename
private

Property: name of the file to which the neural network will be serialized (or deserialized from).

Definition at line 131 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), GridworldDRLExperienceReplay(), and initializePropertyDependentVariables().

mic::configuration::Property<bool> mic::application::GridworldDRLExperienceReplay::mlnn_load
private

Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task).

Definition at line 137 of file GridworldDRLExperienceReplay.hpp.

Referenced by GridworldDRLExperienceReplay(), and initializePropertyDependentVariables().

mic::configuration::Property<bool> mic::application::GridworldDRLExperienceReplay::mlnn_save
private

Property: flag denoting whether the neural network should be saved to a file (after the end of every episode).

Definition at line 134 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and GridworldDRLExperienceReplay().

BackpropagationNeuralNetwork<float> mic::application::GridworldDRLExperienceReplay::neural_net
private

Multi-layer neural network used for approximation of the Q-state rewards.

Definition at line 140 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().

long long mic::application::GridworldDRLExperienceReplay::number_of_successes
private

Number of successes, i.e. how many times the goal has been reached so far - used in statistics.

Definition at line 183 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and initialize().

mic::configuration::Property<std::string> mic::application::GridworldDRLExperienceReplay::statistics_filename
private

Property: name of the file to which the statistics will be exported.

Definition at line 128 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and GridworldDRLExperienceReplay().

mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::step_reward
private

Property: the "expected intermediate reward", i.e. reward received by performing each step (typically negative, but can be positive as all).

Definition at line 109 of file GridworldDRLExperienceReplay.hpp.

Referenced by GridworldDRLExperienceReplay(), and performSingleStep().

long long mic::application::GridworldDRLExperienceReplay::sum_of_iterations
private

Sum of all iterations made so far - used in statistics.

Definition at line 173 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and initialize().

long long mic::application::GridworldDRLExperienceReplay::sum_of_rewards
private

Sum of all rewards collected so far - used in statistics.

Definition at line 178 of file GridworldDRLExperienceReplay.hpp.

Referenced by finishCurrentEpisode(), and initialize().

WindowCollectorChart<float>* mic::application::GridworldDRLExperienceReplay::w_chart
private

Window for displaying statistics.

Definition at line 95 of file GridworldDRLExperienceReplay.hpp.

Referenced by initialize(), and ~GridworldDRLExperienceReplay().


The documentation for this class was generated from the following files:

GridworldDRLExperienceReplay.hpp
GridworldDRLExperienceReplay.cpp