MachineIntelligenceCore:ReinforcementLearning
Class responsible for solving the gridworld problem with Q-learning, a neural network used for approximation of the rewards, and experience replay used for (batch) training of the neural network.
#include <GridworldDRLExperienceReplay.hpp>
Public Member Functions

    GridworldDRLExperienceReplay (std::string node_name_="application")
    virtual ~GridworldDRLExperienceReplay ()

Protected Member Functions

    virtual void initialize (int argc, char *argv[])
    virtual void initializePropertyDependentVariables ()
    virtual bool performSingleStep ()
    virtual void startNewEpisode ()
    virtual void finishCurrentEpisode ()

Private Member Functions

    float computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_)
    mic::types::MatrixXfPtr getPredictedRewardsForGivenState (mic::types::Position2D player_position_)
    mic::types::NESWAction selectBestActionForGivenState (mic::types::Position2D player_position_)
    std::string streamNetworkResponseTable ()
Private Attributes

    WindowCollectorChart<float> *w_chart
        Window for displaying statistics.
    mic::utils::DataCollectorPtr<std::string, float> collector_ptr
        Data collector.
    mic::environments::Gridworld grid_env
        The gridworld environment.
    size_t batch_size
        Size of the batch in experience replay - set to the size of the maze (width*height).
    mic::configuration::Property<float> step_reward
    mic::configuration::Property<float> discount_rate
    mic::configuration::Property<float> learning_rate
    mic::configuration::Property<double> epsilon
    mic::configuration::Property<std::string> statistics_filename
        Property: name of the file to which the statistics will be exported.
    mic::configuration::Property<std::string> mlnn_filename
        Property: name of the file to which the neural network will be serialized (or deserialized from).
    mic::configuration::Property<bool> mlnn_save
        Property: flag denoting whether the neural network should be saved to a file (after the end of every episode).
    mic::configuration::Property<bool> mlnn_load
        Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task).
    BackpropagationNeuralNetwork<float> neural_net
        Multi-layer neural network used for approximation of the Q-state rewards.
    long long sum_of_iterations
    long long sum_of_rewards
    long long number_of_successes
    SpatialExperienceMemory experiences
Class responsible for solving the gridworld problem with Q-learning, a neural network used for approximation of the rewards, and experience replay used for (batch) training of the neural network.
Definition at line 49 of file GridworldDRLExperienceReplay.hpp.
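The training scheme combines these three elements as follows: transitions observed while the agent moves are stored in the experience memory, and the network is periodically regressed towards bootstrapped Q-learning targets computed over a sampled batch. Below is a minimal, self-contained sketch of such a replay update; Experience, QNet and replayUpdate are hypothetical stand-ins, not this class's actual API.

    #include <algorithm>
    #include <cstddef>
    #include <limits>
    #include <random>
    #include <vector>

    // Hypothetical stand-in for one stored transition <s, a, s', r>.
    struct Experience {
        std::vector<float> state;      // encoded gridworld with the agent at s
        int action;                    // index of the NESW action taken
        std::vector<float> next_state; // encoded gridworld after the move
        float reward;                  // immediate reward received
        bool terminal;                 // whether s' is a terminal state
    };

    // Hypothetical stand-in for the approximating neural network.
    struct QNet {
        std::vector<float> predict(const std::vector<float>& state); // Q(s, .)
        void train(const std::vector<std::vector<float>>& inputs,
                   const std::vector<std::vector<float>>& targets, float lr);
    };

    // One replay update: sample a batch of past transitions and regress the
    // network towards the bootstrapped targets r + discount * max_a' Q(s', a').
    void replayUpdate(QNet& net, const std::vector<Experience>& memory,
                      size_t batch_size, float discount, float lr, std::mt19937& rng) {
        std::uniform_int_distribution<size_t> pick(0, memory.size() - 1);
        std::vector<std::vector<float>> inputs, targets;
        for (size_t i = 0; i < batch_size; ++i) {
            const Experience& e = memory[pick(rng)];
            std::vector<float> target = net.predict(e.state); // keep other actions as-is
            if (e.terminal) {
                target[e.action] = e.reward; // no future rewards from a terminal state
            } else {
                float best_next = -std::numeric_limits<float>::infinity();
                for (float q : net.predict(e.next_state))
                    best_next = std::max(best_next, q);
                target[e.action] = e.reward + discount * best_next;
            }
            inputs.push_back(e.state);
            targets.push_back(target);
        }
        net.train(inputs, targets, lr);
    }

In this class the same idea is realized with SpatialExperienceMemory, BackpropagationNeuralNetwork<float> and the Gridworld state encoder (see performSingleStep()).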
mic::application::GridworldDRLExperienceReplay::GridworldDRLExperienceReplay (std::string node_name_ = "application")
Default constructor. Sets the application/node name and default values of variables, initializes the classifier, etc.
Parameters:
    node_name_  Name of the application/node (in configuration file).
Definition at line 39 of file GridworldDRLExperienceReplay.cpp.
References discount_rate, epsilon, learning_rate, mlnn_filename, mlnn_load, mlnn_save, statistics_filename, and step_reward.
mic::application::GridworldDRLExperienceReplay::~GridworldDRLExperienceReplay () [virtual]
float mic::application::GridworldDRLExperienceReplay::computeBestValueForGivenStateAndPredictions (mic::types::Position2D player_position_, float *predictions_) [private]
Calculates the best value for the given state and predictions.
Parameters:
    player_position_  State (player position).
    predictions_  Vector of predictions to be analyzed.
Definition at line 239 of file GridworldDRLExperienceReplay.cpp.
References grid_env, and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
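In other words, the method returns the maximum prediction over the actions that are actually allowed from the given position. A minimal stand-alone sketch (Position2D, NESWAction and isActionAllowed below are simplified stand-ins for the mic:: types):

    #include <algorithm>
    #include <limits>

    struct Position2D { int x, y; };                    // simplified stand-in
    enum class NESWAction { N, E, S, W };               // simplified stand-in
    bool isActionAllowed(Position2D pos, NESWAction a); // e.g. rejects moves into walls

    // Maximum predicted value over the actions allowed from the given position.
    float computeBestValue(Position2D player_position_, const float* predictions_) {
        float best = -std::numeric_limits<float>::infinity();
        for (int a = 0; a < 4; ++a) { // the four NESW actions
            if (!isActionAllowed(player_position_, static_cast<NESWAction>(a)))
                continue; // disallowed actions do not contribute to the maximum
            best = std::max(best, predictions_[a]);
        }
        return best;
    }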
virtual void mic::application::GridworldDRLExperienceReplay::finishCurrentEpisode () [protected, virtual]
Method called when the given episode ends (goal: export the collected statistics to a file, etc.). Overrides the abstract base-class method.
Definition at line 136 of file GridworldDRLExperienceReplay.cpp.
References collector_ptr, mic::environments::Gridworld::getAgentPosition(), mic::environments::Gridworld::getStateReward(), grid_env, mlnn_filename, mlnn_save, neural_net, number_of_successes, statistics_filename, sum_of_iterations, and sum_of_rewards.
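Judging from the referenced members, the method accumulates the episode statistics, exports them through the data collector and conditionally serializes the network. A hedged sketch of that flow (the Collector and Net interfaces below are hypothetical simplifications, not the actual MIC API):

    #include <string>

    struct Collector {
        void add(const std::string& label, float value); // hypothetical
        void exportToCsv(const std::string& filename);   // hypothetical
    };
    struct Net { void save(const std::string& filename); }; // hypothetical

    void finishEpisodeSketch(Collector& collector, Net& net,
                             long long& sum_of_iterations, long long& sum_of_rewards,
                             long long& number_of_successes,
                             long long episode_iterations, long long episode_reward,
                             bool reached_goal, bool mlnn_save,
                             const std::string& mlnn_filename,
                             const std::string& statistics_filename) {
        // Accumulate the running statistics.
        sum_of_iterations += episode_iterations;
        sum_of_rewards += episode_reward;
        if (reached_goal)
            ++number_of_successes;
        // Export the collected statistics to the file.
        collector.add("number_of_successes", (float)number_of_successes);
        collector.exportToCsv(statistics_filename);
        // Optionally serialize the network after the episode ends.
        if (mlnn_save)
            net.save(mlnn_filename);
    }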
mic::types::MatrixXfPtr mic::application::GridworldDRLExperienceReplay::getPredictedRewardsForGivenState (mic::types::Position2D player_position_) [private]
Returns the predicted rewards for the given state.
Parameters:
    player_position_  State (player position).
Definition at line 263 of file GridworldDRLExperienceReplay.cpp.
References batch_size, mic::environments::Gridworld::encodeEnvironment(), mic::environments::Gridworld::getAgentPosition(), mic::environments::Environment::getEnvironmentSize(), grid_env, mic::environments::Gridworld::moveAgentToPosition(), and neural_net.
Referenced by selectBestActionForGivenState().
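The referenced members suggest a "query by teleport" pattern: temporarily place the agent at the requested position, encode the environment as the network input, run a forward pass, and restore the agent. A sketch with hypothetical stand-ins for the Gridworld and network interfaces:

    #include <vector>

    struct Position2D { int x, y; };
    struct Grid {                                   // hypothetical Gridworld stand-in
        Position2D getAgentPosition();
        void moveAgentToPosition(Position2D p);
        std::vector<float> encode();                // encoded environment state
    };
    struct QNet { std::vector<float> predict(const std::vector<float>& input); };

    // "What would the network predict if the agent stood at pos?"
    std::vector<float> predictedRewardsFor(Grid& env, QNet& net, Position2D pos) {
        Position2D saved = env.getAgentPosition();  // remember the real position
        env.moveAgentToPosition(pos);               // place the agent hypothetically
        std::vector<float> q = net.predict(env.encode());
        env.moveAgentToPosition(saved);             // restore the real position
        return q;                                   // predicted values of the NESW actions
    }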
virtual void mic::application::GridworldDRLExperienceReplay::initialize (int argc, char *argv[]) [protected, virtual]
Method initializes GLUT and OpenGL windows.
Parameters:
    argc  Number of application parameters.
    argv  Array of application parameters.
Definition at line 69 of file GridworldDRLExperienceReplay.cpp.
References collector_ptr, number_of_successes, sum_of_iterations, sum_of_rewards, and w_chart.
virtual void mic::application::GridworldDRLExperienceReplay::initializePropertyDependentVariables () [protected, virtual]
Initializes all variables that are property-dependent.
Definition at line 91 of file GridworldDRLExperienceReplay.cpp.
References batch_size, experiences, mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentSize(), mic::environments::Environment::getEnvironmentWidth(), grid_env, mic::environments::Gridworld::initializeEnvironment(), mlnn_filename, mlnn_load, and neural_net.
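The referenced members indicate that this is where the environment is built, the replay batch size is derived from its dimensions, and the network is optionally deserialized. A hedged sketch (all interfaces below are hypothetical simplifications):

    #include <cstddef>
    #include <string>

    struct Grid {                            // hypothetical Gridworld stand-in
        void initializeEnvironment();
        size_t getEnvironmentWidth();
        size_t getEnvironmentHeight();
    };
    struct Net { void load(const std::string& filename); }; // hypothetical

    void initializeSketch(Grid& env, Net& net, size_t& batch_size,
                          bool mlnn_load, const std::string& mlnn_filename) {
        env.initializeEnvironment(); // build the gridworld from the properties
        // The replay batch covers every cell of the maze once (width*height).
        batch_size = env.getEnvironmentWidth() * env.getEnvironmentHeight();
        // Optionally deserialize a previously trained network.
        if (mlnn_load)
            net.load(mlnn_filename);
    }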
virtual bool mic::application::GridworldDRLExperienceReplay::performSingleStep () [protected, virtual]
Performs a single step of computations.
Definition at line 336 of file GridworldDRLExperienceReplay.cpp.
References mic::types::SpatialExperienceMemory::add(), batch_size, computeBestValueForGivenStateAndPredictions(), discount_rate, mic::environments::Gridworld::encodeEnvironment(), mic::environments::Gridworld::environmentToString(), epsilon, experiences, mic::environments::Gridworld::getAgentPosition(), mic::environments::Environment::getEnvironmentSize(), mic::environments::Gridworld::getStateReward(), grid_env, mic::environments::Gridworld::isStateTerminal(), learning_rate, mic::environments::Environment::moveAgent(), mic::environments::Gridworld::moveAgentToPosition(), neural_net, selectBestActionForGivenState(), step_reward, and streamNetworkResponseTable().
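Taken together, the referenced members outline the step: sense the current state, choose an action epsilon-greedily, execute it, record the spatial experience, and train the network on a replayed batch. A hedged sketch of that control flow (all types and calls below are hypothetical simplifications of the ones this class actually uses):

    #include <random>

    struct Position2D { int x, y; };
    struct Env {                                   // hypothetical Gridworld stand-in
        Position2D getAgentPosition();
        bool moveAgent(int action);                // false if the move is not allowed
        bool isStateTerminal(Position2D p);
    };
    struct Memory { void add(Position2D s, int a, Position2D s2); };
    int selectBestAction(Env& env, Position2D s);  // greedy choice (see below)
    void replayUpdate();                           // batch training (see the class description)

    bool singleStepSketch(Env& env, Memory& experiences, double eps, std::mt19937& rng) {
        Position2D state = env.getAgentPosition();
        // Epsilon-greedy selection: a random action with probability eps.
        std::uniform_real_distribution<double> u(0.0, 1.0);
        int action = (u(rng) < eps) ? std::uniform_int_distribution<int>(0, 3)(rng)
                                    : selectBestAction(env, state);
        env.moveAgent(action);
        Position2D next_state = env.getAgentPosition();
        // Store the spatial experience <s, a, s'> and train on a replayed batch.
        experiences.add(state, action, next_state);
        replayUpdate();
        // Continue the episode unless a terminal state has been reached.
        return !env.isStateTerminal(next_state);
    }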
mic::types::NESWAction mic::application::GridworldDRLExperienceReplay::selectBestActionForGivenState (mic::types::Position2D player_position_) [private]
Finds the best action for the given state.
Parameters:
    player_position_  State (player position).
Definition at line 303 of file GridworldDRLExperienceReplay.cpp.
References getPredictedRewardsForGivenState(), grid_env, and mic::environments::Environment::isActionAllowed().
Referenced by performSingleStep().
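This is the greedy counterpart of the epsilon-greedy policy: query the network for the given state and take the argmax over the allowed actions. A stand-alone sketch (the two helper declarations are stand-ins for getPredictedRewardsForGivenState() and isActionAllowed()):

    #include <limits>
    #include <vector>

    struct Position2D { int x, y; };
    std::vector<float> predictedRewardsFor(Position2D pos); // network query stand-in
    bool isActionAllowed(Position2D pos, int action);       // environment check stand-in

    // Greedy policy: the allowed action with the highest predicted value.
    int selectBestAction(Position2D player_position_) {
        std::vector<float> q = predictedRewardsFor(player_position_);
        int best_action = -1;
        float best_value = -std::numeric_limits<float>::infinity();
        for (int a = 0; a < 4; ++a) { // the four NESW actions
            if (!isActionAllowed(player_position_, a))
                continue;
            if (q[a] > best_value) {
                best_value = q[a];
                best_action = a;
            }
        }
        return best_action; // -1 only if no action is allowed at all
    }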
virtual void mic::application::GridworldDRLExperienceReplay::startNewEpisode () [protected, virtual]
Method called at the beginning of a new episode (goal: reset the statistics, etc.). Overrides the abstract base-class method.
Definition at line 125 of file GridworldDRLExperienceReplay.cpp.
References mic::environments::Gridworld::environmentToString(), grid_env, mic::environments::Gridworld::initializeEnvironment(), and streamNetworkResponseTable().
std::string mic::application::GridworldDRLExperienceReplay::streamNetworkResponseTable () [private]
Streams the current network response - the values of actions associated with consecutive agent poses.
Definition at line 163 of file GridworldDRLExperienceReplay.cpp.
References batch_size, mic::environments::Gridworld::encodeEnvironment(), mic::environments::Gridworld::getAgentPosition(), mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentSize(), mic::environments::Environment::getEnvironmentWidth(), grid_env, mic::environments::Environment::isActionAllowed(), mic::environments::Gridworld::isStateAllowed(), mic::environments::Gridworld::isStateTerminal(), mic::environments::Gridworld::moveAgentToPosition(), and neural_net.
Referenced by performSingleStep(), and startNewEpisode().
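The referenced members suggest iterating over every cell of the grid, querying the network for each allowed state and formatting the four action values into a table. A hedged sketch (the helper declarations are stand-ins for the actual Gridworld/network calls):

    #include <cstddef>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Position2D { int x, y; };
    std::vector<float> predictedRewardsFor(Position2D pos); // network query stand-in
    bool isStateAllowed(Position2D pos);                    // e.g. false for walls

    // Dump the predicted NESW action values for every cell of the grid.
    std::string streamResponseTableSketch(size_t width, size_t height) {
        std::ostringstream os;
        for (size_t y = 0; y < height; ++y) {
            for (size_t x = 0; x < width; ++x) {
                Position2D pos{(int)x, (int)y};
                if (!isStateAllowed(pos)) { os << "(wall) "; continue; }
                std::vector<float> q = predictedRewardsFor(pos);
                os << "(" << q[0] << "," << q[1] << "," << q[2] << "," << q[3] << ") ";
            }
            os << "\n";
        }
        return os.str();
    }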
size_t mic::application::GridworldDRLExperienceReplay::batch_size [private]
Size of the batch in experience replay - set to the size of the maze (width*height).
Definition at line 104 of file GridworldDRLExperienceReplay.hpp.
Referenced by getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
mic::utils::DataCollectorPtr<std::string, float> mic::application::GridworldDRLExperienceReplay::collector_ptr [private]
Data collector.
Definition at line 98 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and initialize().
mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::discount_rate [private]
Property: future reward discount factor (should be in the range 0.0-1.0).
Definition at line 114 of file GridworldDRLExperienceReplay.hpp.
Referenced by GridworldDRLExperienceReplay(), and performSingleStep().
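For reference, discount_rate plays the role of the discount factor gamma (and learning_rate that of the step size alpha) in the standard Q-learning update that this application approximates through the network targets:

    Q(s,a) <- Q(s,a) + alpha * ( r + gamma * max_a' Q(s',a') - Q(s,a) )

where r is the immediate reward (the step_reward, or the state reward in a terminal state).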
mic::configuration::Property<double> mic::application::GridworldDRLExperienceReplay::epsilon [private]
Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, hence changing dynamically with the episode number.
Definition at line 125 of file GridworldDRLExperienceReplay.hpp.
Referenced by GridworldDRLExperienceReplay(), and performSingleStep().
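A minimal sketch of that schedule (the names below are illustrative, not the class's actual members):

    #include <random>

    // Fixed epsilon, or a 1/episode decay when the configured value is negative.
    bool shouldExplore(double epsilon, unsigned long episode_number, std::mt19937& rng) {
        double eps = (epsilon < 0.0) ? 1.0 / (double)(episode_number + 1) : epsilon;
        std::uniform_real_distribution<double> u(0.0, 1.0);
        return u(rng) < eps; // true -> select a random action instead of the greedy one
    }

The +1 merely guards against division by zero in the very first episode; the original may count episodes from 1.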
SpatialExperienceMemory mic::application::GridworldDRLExperienceReplay::experiences [private]
Table of past experiences.
Definition at line 188 of file GridworldDRLExperienceReplay.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
mic::environments::Gridworld mic::application::GridworldDRLExperienceReplay::grid_env [private]
The gridworld environment.
Definition at line 101 of file GridworldDRLExperienceReplay.hpp.
Referenced by computeBestValueForGivenStateAndPredictions(), finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), selectBestActionForGivenState(), startNewEpisode(), and streamNetworkResponseTable().
mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::learning_rate [private]
Property: neural network learning rate (should be in the range 0.0-1.0).
Definition at line 119 of file GridworldDRLExperienceReplay.hpp.
Referenced by GridworldDRLExperienceReplay(), and performSingleStep().
mic::configuration::Property<std::string> mic::application::GridworldDRLExperienceReplay::mlnn_filename [private]
Property: name of the file to which the neural network will be serialized (or deserialized from).
Definition at line 131 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), GridworldDRLExperienceReplay(), and initializePropertyDependentVariables().
mic::configuration::Property<bool> mic::application::GridworldDRLExperienceReplay::mlnn_load [private]
Property: flag denoting whether the neural network should be loaded from a file (at the initialization of the task).
Definition at line 137 of file GridworldDRLExperienceReplay.hpp.
Referenced by GridworldDRLExperienceReplay(), and initializePropertyDependentVariables().
mic::configuration::Property<bool> mic::application::GridworldDRLExperienceReplay::mlnn_save [private]
Property: flag denoting whether the neural network should be saved to a file (after the end of every episode).
Definition at line 134 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and GridworldDRLExperienceReplay().
BackpropagationNeuralNetwork<float> mic::application::GridworldDRLExperienceReplay::neural_net [private]
Multi-layer neural network used for approximation of the Q-state rewards.
Definition at line 140 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), getPredictedRewardsForGivenState(), initializePropertyDependentVariables(), performSingleStep(), and streamNetworkResponseTable().
long long mic::application::GridworldDRLExperienceReplay::number_of_successes [private]
Number of successes, i.e. how many times the goal has been reached so far - used in statistics.
Definition at line 183 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and initialize().
mic::configuration::Property<std::string> mic::application::GridworldDRLExperienceReplay::statistics_filename [private]
Property: name of the file to which the statistics will be exported.
Definition at line 128 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and GridworldDRLExperienceReplay().
mic::configuration::Property<float> mic::application::GridworldDRLExperienceReplay::step_reward [private]
Property: the "expected intermediate reward", i.e. reward received by performing each step (typically negative, but can be positive as all).
Definition at line 109 of file GridworldDRLExperienceReplay.hpp.
Referenced by GridworldDRLExperienceReplay(), and performSingleStep().
long long mic::application::GridworldDRLExperienceReplay::sum_of_iterations [private]
Sum of all iterations made so far - used in statistics.
Definition at line 173 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and initialize().
long long mic::application::GridworldDRLExperienceReplay::sum_of_rewards [private]
Sum of all rewards collected so far - used in statistics.
Definition at line 178 of file GridworldDRLExperienceReplay.hpp.
Referenced by finishCurrentEpisode(), and initialize().
WindowCollectorChart<float> * mic::application::GridworldDRLExperienceReplay::w_chart [private]
Window for displaying statistics.
Definition at line 95 of file GridworldDRLExperienceReplay.hpp.
Referenced by initialize(), and ~GridworldDRLExperienceReplay().