MachineIntelligenceCore:ReinforcementLearning
|
Application of Partially Observable Deep Q-learning with Experience Replay to the maze of digits problem. It is assumed that the agent observes only a part of the environment (POMDP). More...
#include <MazeOfDigitsDLRERPOMPD.hpp>
Public Member Functions | |
MazeOfDigitsDLRERPOMPD (std::string node_name_="application") | |
virtual | ~MazeOfDigitsDLRERPOMPD () |
Protected Member Functions | |
virtual void | initialize (int argc, char *argv[]) |
virtual void | initializePropertyDependentVariables () |
virtual bool | performSingleStep () |
virtual void | startNewEpisode () |
virtual void | finishCurrentEpisode () |
Private Member Functions | |
float | computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_) |
mic::types::MatrixXfPtr | getPredictedRewardsForGivenState (mic::types::Position2D player_position_) |
mic::types::NESWAction | selectBestActionForGivenState (mic::types::Position2D player_position_) |
std::string | streamNetworkResponseTable () |
Private Attributes | |
WindowCollectorChart< float > * | w_chart |
Window for displaying statistics. More... | |
mic::utils::DataCollectorPtr < std::string, float > | collector_ptr |
Data collector. More... | |
WindowMazeOfDigits * | wmd_environment |
Window displaying the whole environment. More... | |
WindowMazeOfDigits * | wmd_observation |
Window displaying the observation. More... | |
mic::environments::MazeOfDigits | env |
The maze of digits environment. More... | |
std::shared_ptr< std::vector < mic::types::Position2D > > | saccadic_path |
Saccadic path - a sequence of consecutive agent positions. More... | |
size_t | batch_size |
Size of the batch in experience replay - set to the size of maze (width*height). More... | |
mic::configuration::Property < float > | step_reward |
mic::configuration::Property < float > | discount_rate |
mic::configuration::Property < float > | learning_rate |
mic::configuration::Property < double > | epsilon |
mic::configuration::Property< int > | step_limit |
mic::configuration::Property < std::string > | statistics_filename |
Property: name of the file to which the statistics will be exported. More... | |
mic::configuration::Property < std::string > | mlnn_filename |
Property: name of the file to which the neural network will be serialized (or deserialized from). More... | |
mic::configuration::Property < bool > | mlnn_save |
Property: flag denoting whether the neural network should be saved to a file (after the end of every episode). More... | |
mic::configuration::Property < bool > | mlnn_load |
Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task). More... | |
BackpropagationNeuralNetwork < float > | neural_net |
Multi-layer neural network used for approximation of the Q-state rewards. More... | |
long long | sum_of_iterations |
double | sum_of_opt_to_episodic_lenghts |
SpatialExperienceMemory | experiences |
Application of Partially Observable Deep Q-learning with Experience Replay to the maze of digits problem. It is assumed that the agent observes only a part of the environment (POMDP).
Definition at line 51 of file MazeOfDigitsDLRERPOMPD.hpp.
mic::application::MazeOfDigitsDLRERPOMPD::MazeOfDigitsDLRERPOMPD | ( | std::string | node_name_ = "application" | ) |
Default Constructor. Sets the application/node name, default values of variables etc.
node_name_ | Name of the application/node (in configuration file). |
Definition at line 40 of file MazeOfDigitsDLRERPOMPD.cpp.
References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, step_limit, and step_reward.
|
virtual |
Destructor.
Definition at line 68 of file MazeOfDigitsDLRERPOMPD.cpp.
References w_chart, wmd_environment, and wmd_observation.
|
private |
Calculates the best value for the current state and predictions.
player_position_ | State (player position). |
predictions_ | Vector of predictions to be analyzed. |
Definition at line 267 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
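The selection over allowed actions can be sketched as follows. This is an illustration only: the fixed four-element arrays and the `computeBestValue` name are stand-ins, while the real method receives the predictions produced by neural_net and checks allowance via mic::environments::Environment::isActionAllowed().

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Illustrative sketch (not the actual implementation): return the highest
// predicted value among the four NESW actions that are allowed in the
// current state. In the class this corresponds to
// computeBestValueForGivenStateAndPredictions().
float computeBestValue(const std::array<float, 4>& predictions_,
                       const std::array<bool, 4>& action_allowed_) {
    float best = -std::numeric_limits<float>::infinity();
    for (std::size_t a = 0; a < predictions_.size(); ++a)
        if (action_allowed_[a] && predictions_[a] > best)
            best = predictions_[a];
    return best;
}
```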
|
protectedvirtual |
Method called when a given episode ends (goal: export the collected statistics to a file etc.) - abstract, to be overridden.
Definition at line 158 of file MazeOfDigitsDLRERPOMPD.cpp.
References collector_ptr, env, mlnn_filename, mlnn_save, neural_net, mic::environments::MazeOfDigits::optimalPathLength(), statistics_filename, sum_of_iterations, and sum_of_opt_to_episodic_lenghts.
|
private |
Returns the predicted rewards for given state.
player_position_ | State (player position). |
Definition at line 291 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::MazeOfDigits::moveAgentToPosition(), and neural_net.
Referenced by selectBestActionForGivenState().
|
protectedvirtual |
Method initializes GLUT and OpenGL windows.
argc | Number of application parameters. |
argv | Array of application parameters. |
Definition at line 75 of file MazeOfDigitsDLRERPOMPD.cpp.
References collector_ptr, sum_of_iterations, sum_of_opt_to_episodic_lenghts, and w_chart.
|
protectedvirtual |
Initializes all variables that are property-dependent.
Definition at line 96 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, env, experiences, mic::environments::Environment::getEnvironment(), mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentWidth(), mic::environments::MazeOfDigits::getObservation(), mic::environments::Environment::getObservationHeight(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::Environment::getObservationWidth(), mic::environments::MazeOfDigits::initializeEnvironment(), mlnn_filename, mlnn_load, neural_net, saccadic_path, wmd_environment, and wmd_observation.
|
protectedvirtual |
Performs single step of computations.
Definition at line 364 of file MazeOfDigitsDLRERPOMPD.cpp.
References mic::types::SpatialExperienceMemory::add(), batch_size, computeBestValueForGivenStateAndPredictions(), discount_rate, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::environmentToString(), epsilon, experiences, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservation(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::MazeOfDigits::getStateReward(), mic::environments::MazeOfDigits::isStateTerminal(), learning_rate, mic::environments::Environment::moveAgent(), mic::environments::MazeOfDigits::moveAgentToPosition(), neural_net, mic::environments::MazeOfDigits::observationToString(), saccadic_path, selectBestActionForGivenState(), step_limit, step_reward, and streamNetworkResponseTable().
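The temporal-difference target computed for each replayed experience can be sketched as below. This is a minimal illustration assuming the standard Deep Q-learning update, not a copy of the actual implementation; `qTarget` is a hypothetical name, while the discount factor and terminal-state check mirror the class members discount_rate and mic::environments::MazeOfDigits::isStateTerminal().

```cpp
// Minimal sketch (assumption, based on standard Q-learning with experience
// replay) of the target value used when training on a replayed
// (state, action, reward, next state) experience.
float qTarget(float reward_, float discount_rate_, float best_next_value_,
              bool next_state_terminal_) {
    // A terminal next state contributes no discounted future reward.
    if (next_state_terminal_)
        return reward_;
    return reward_ + discount_rate_ * best_next_value_;
}
```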
|
private |
Finds the best action for the current state.
player_position_ | State (player position). |
Definition at line 331 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, getPredictedRewardsForGivenState(), and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
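The action choice itself can be sketched as an argmax restricted to allowed actions. The names and the fixed four-action (N, E, S, W) layout below are assumptions for illustration; the real method returns a mic::types::NESWAction and works on the predictions returned by getPredictedRewardsForGivenState().

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Illustrative sketch: return the index of the allowed action with the
// highest predicted value, or -1 if no action is allowed.
int selectBestAction(const std::array<float, 4>& predictions_,
                     const std::array<bool, 4>& action_allowed_) {
    int best_action = -1;
    float best_value = -std::numeric_limits<float>::infinity();
    for (std::size_t a = 0; a < predictions_.size(); ++a) {
        if (action_allowed_[a] && predictions_[a] > best_value) {
            best_value = predictions_[a];
            best_action = static_cast<int>(a);
        }
    }
    return best_action;
}
```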
|
protectedvirtual |
Method called at the beginning of a new episode (goal: to reset the statistics etc.) - abstract, to be overridden.
Definition at line 141 of file MazeOfDigitsDLRERPOMPD.cpp.
References env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::MazeOfDigits::getObservation(), mic::environments::MazeOfDigits::initializeEnvironment(), and saccadic_path.
|
private |
Streams the current network response - values of actions associated with consecutive agent positions.
Definition at line 182 of file MazeOfDigitsDLRERPOMPD.cpp.
References batch_size, mic::environments::MazeOfDigits::encodeObservation(), env, mic::environments::MazeOfDigits::getAgentPosition(), mic::environments::Environment::getObservationHeight(), mic::environments::MazeOfDigits::getObservationSize(), mic::environments::Environment::getObservationWidth(), mic::environments::Environment::isActionAllowed(), mic::environments::MazeOfDigits::isStateAllowed(), mic::environments::MazeOfDigits::isStateTerminal(), mic::environments::MazeOfDigits::moveAgentToPosition(), and neural_net.
Referenced by performSingleStep().
|
private |
Size of the batch in experience replay - set to the size of maze (width*height).
Definition at line 115 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
|
private |
Data collector.
Definition at line 100 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Property: future discount (should be in range 0.0-1.0).
Definition at line 125 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
The maze of digits environment.
Definition at line 109 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by computeBestValueForGivenStateAndPredictions(), finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), selectBestActionForGivenState(), startNewEpisode(), and streamNetworkResponseTable().
|
private |
Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, thus changing dynamically with the episode number.
Definition at line 136 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
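The schedule described for this property can be sketched as below; `effectiveEpsilon` is a hypothetical helper name used for illustration only, derived from the property description rather than the actual code.

```cpp
// Sketch of the described epsilon schedule: a negative configured value
// means epsilon decays as 1/episode, so the agent explores less as
// training progresses. Illustration only, not the actual implementation.
double effectiveEpsilon(double epsilon_property_, long episode_) {
    if (epsilon_property_ < 0.0)
        return 1.0 / static_cast<double>(episode_);
    return epsilon_property_;
}
```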
|
private |
Table of past experiences.
Definition at line 199 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
|
private |
Property: neural network learning rate (should be in range 0.0-1.0).
Definition at line 130 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Property: name of the file to which the neural network will be serialized (or deserialized from).
Definition at line 147 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().
|
private |
Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task).
Definition at line 153 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and MazeOfDigitsDLRERPOMPD().
|
private |
Property: flag denoting whether the neural network should be saved to a file (after the end of every episode).
Definition at line 150 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().
|
private |
Multi-layer neural network used for approximation of the Q-state rewards.
Definition at line 156 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
|
private |
Saccadic path - a sequence of consecutive agent positions.
Definition at line 112 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and startNewEpisode().
|
private |
Property: name of the file to which the statistics will be exported.
Definition at line 144 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and MazeOfDigitsDLRERPOMPD().
|
private |
Limit of steps per episode. Setting step_limit <= 0 means that the limit is not enforced.
Definition at line 141 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Property: the "expected intermediate reward", i.e. the reward received for performing each step (typically negative, but can be positive as well).
Definition at line 120 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by MazeOfDigitsDLRERPOMPD(), and performSingleStep().
|
private |
Sum of all iterations made so far - used in statistics.
Definition at line 189 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Sum of optimal to episodic path lengths.
Definition at line 194 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by finishCurrentEpisode(), and initialize().
|
private |
Window for displaying statistics.
Definition at line 97 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initialize(), and ~MazeOfDigitsDLRERPOMPD().
|
private |
Window displaying the whole environment.
Definition at line 103 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().
|
private |
Window displaying the observation.
Definition at line 105 of file MazeOfDigitsDLRERPOMPD.hpp.
Referenced by initializePropertyDependentVariables(), and ~MazeOfDigitsDLRERPOMPD().