MachineIntelligenceCore:ReinforcementLearning
Class responsible for solving the gridworld problem with Q-learning.
#include <GridworldQLearning.hpp>
Public Member Functions

GridworldQLearning (std::string node_name_ = "application")
virtual ~GridworldQLearning ()

Protected Member Functions

virtual void initialize (int argc, char *argv[])
virtual void initializePropertyDependentVariables ()
virtual bool performSingleStep ()
virtual void startNewEpisode ()
virtual void finishCurrentEpisode ()

Private Member Functions

std::string streamQStateTable ()
float computeBestValue (mic::types::Position2D pos_)
mic::types::NESWAction selectBestAction (mic::types::Position2D pos_)

Private Attributes
WindowCollectorChart< float > * w_chart
    Window for displaying the chart of collected statistics.
mic::utils::DataCollectorPtr< std::string, float > collector_ptr
    Data collector.
mic::environments::Gridworld grid_env
    The gridworld object.
mic::types::TensorXf qstate_table
    Tensor storing the values of all state-action pairs (gridworld width * height * 4 actions). Column-major(!).
mic::configuration::Property< float > step_reward
mic::configuration::Property< float > discount_rate
mic::configuration::Property< float > learning_rate
mic::configuration::Property< float > move_noise
mic::configuration::Property< double > epsilon
mic::configuration::Property< std::string > statistics_filename
    Property: name of the file to which the statistics will be exported.
long long sum_of_iterations
long long sum_of_rewards
Class responsible for solving the gridworld problem with Q-learning.
Definition at line 44 of file GridworldQLearning.hpp.
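For orientation, the value update performed by tabular Q-learning (the algorithm this application implements; the exact variant used here should be verified against performSingleStep() in GridworldQLearning.cpp) can be written as:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Here \alpha corresponds to the learning_rate property, \gamma to the discount_rate property, r_{t+1} to the received reward (cf. step_reward and getStateReward()), and the max term to the value returned by computeBestValue().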
mic::application::GridworldQLearning::GridworldQLearning ( std::string node_name_ = "application" )

Default constructor. Sets the application/node name, the default values of variables, registers the configuration properties, etc.

Parameters:
    node_name_  Name of the application/node (in the configuration file).
Definition at line 41 of file GridworldQLearning.cpp.
References discount_rate, epsilon, learning_rate, move_noise, statistics_filename, and step_reward.
mic::application::GridworldQLearning::~GridworldQLearning ( )  [virtual]
float mic::application::GridworldQLearning::computeBestValue ( mic::types::Position2D pos_ )  [private]
Calculates the best value for a given state by finding the action with the maximal expected value.

Parameters:
    pos_  Starting state (position).
Definition at line 181 of file GridworldQLearning.cpp.
References grid_env, mic::environments::Environment::isActionAllowed(), mic::environments::Gridworld::isStateAllowed(), and qstate_table.
Referenced by performSingleStep().
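A minimal sketch of the idea behind computeBestValue(), assuming the Q-values of one state's four NESW actions are available in a small vector and that an isActionAllowed()-style predicate is supplied; the names bestValue and allowed are hypothetical and not part of the library:

    // Sketch: the best value of a state is the maximum Q-value over allowed actions.
    #include <algorithm>
    #include <functional>
    #include <limits>
    #include <vector>

    float bestValue(const std::vector<float>& q,                // Q-values of one state, 4 actions
                    const std::function<bool(int)>& allowed) {  // stands in for isActionAllowed()
        float best = -std::numeric_limits<float>::infinity();
        for (int a = 0; a < 4; ++a)
            if (allowed(a))
                best = std::max(best, q[a]);
        return best;                                            // -inf if no action is allowed
    }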
void mic::application::GridworldQLearning::finishCurrentEpisode ( )  [protected], [virtual]

Method called when the given episode ends (goal: export the collected statistics to a file etc.). Overrides the abstract base-class method.
Definition at line 112 of file GridworldQLearning.cpp.
References collector_ptr, mic::environments::Gridworld::getAgentPosition(), mic::environments::Gridworld::getStateReward(), grid_env, statistics_filename, sum_of_iterations, and sum_of_rewards.
void mic::application::GridworldQLearning::initialize ( int argc, char *argv[] )  [protected], [virtual]

Method initializes GLUT and OpenGL windows.

Parameters:
    argc  Number of application parameters.
    argv  Array of application parameters.
Definition at line 67 of file GridworldQLearning.cpp.
References collector_ptr, sum_of_iterations, sum_of_rewards, and w_chart.
void mic::application::GridworldQLearning::initializePropertyDependentVariables ( )  [protected], [virtual]
Initializes all variables that are property-dependent.
Definition at line 87 of file GridworldQLearning.cpp.
References mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentWidth(), grid_env, mic::environments::Gridworld::initializeEnvironment(), qstate_table, and streamQStateTable().
bool mic::application::GridworldQLearning::performSingleStep ( )  [protected], [virtual]

Performs a single step of computations.
Definition at line 236 of file GridworldQLearning.cpp.
References computeBestValue(), discount_rate, mic::environments::Gridworld::environmentToString(), epsilon, mic::environments::Gridworld::getAgentPosition(), mic::environments::Gridworld::getStateReward(), grid_env, mic::environments::Gridworld::isStateTerminal(), learning_rate, mic::environments::Environment::moveAgent(), qstate_table, selectBestAction(), step_reward, and streamQStateTable().
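A hedged sketch of what a single tabular Q-learning step typically looks like on a width x height x 4 table stored column-major; all names below (Step, qLearningStep, idx) are hypothetical, and the actual control flow of performSingleStep() (epsilon-greedy selection, moveAgent(), terminal-state handling) is defined in GridworldQLearning.cpp:

    // Sketch: temporal-difference update for one executed transition (x,y,a) -> (nx,ny).
    #include <algorithm>
    #include <vector>

    struct Step { int x, y, a, nx, ny; float reward; };

    void qLearningStep(std::vector<float>& q, int w, int h, const Step& s,
                       float learning_rate, float discount_rate) {
        auto idx = [&](int x, int y, int a) { return x + w * y + w * h * a; };  // column-major
        // Best value achievable from the successor state (cf. computeBestValue()).
        float best_next = q[idx(s.nx, s.ny, 0)];
        for (int a = 1; a < 4; ++a)
            best_next = std::max(best_next, q[idx(s.nx, s.ny, a)]);
        // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        float& qsa = q[idx(s.x, s.y, s.a)];
        qsa += learning_rate * (s.reward + discount_rate * best_next - qsa);
    }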
mic::types::NESWAction mic::application::GridworldQLearning::selectBestAction ( mic::types::Position2D pos_ )  [private]

Finds the best action for a given state (position).
Definition at line 206 of file GridworldQLearning.cpp.
References grid_env, mic::environments::Environment::isActionAllowed(), and qstate_table.
Referenced by performSingleStep().
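Analogously to the sketch under computeBestValue(), selecting the best action amounts to an argmax over the allowed actions; the ordering 0..3 = N, E, S, W and the names below are assumptions for illustration only:

    // Sketch: argmax over allowed actions; returns -1 if no action is allowed.
    #include <functional>
    #include <limits>
    #include <vector>

    int bestAction(const std::vector<float>& q,
                   const std::function<bool(int)>& allowed) {
        int best_a = -1;
        float best_q = -std::numeric_limits<float>::infinity();
        for (int a = 0; a < 4; ++a) {
            if (allowed(a) && q[a] > best_q) {
                best_q = q[a];
                best_a = a;
            }
        }
        return best_a;
    }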
void mic::application::GridworldQLearning::startNewEpisode ( )  [protected], [virtual]

Method called at the beginning of a new episode (goal: reset the statistics etc.). Overrides the abstract base-class method.
Definition at line 100 of file GridworldQLearning.cpp.
References mic::environments::Gridworld::environmentToString(), grid_env, mic::environments::Gridworld::initializeEnvironment(), and streamQStateTable().
std::string mic::application::GridworldQLearning::streamQStateTable ( )  [private]

Streams the current state of the state-action value table.
Definition at line 132 of file GridworldQLearning.cpp.
References mic::environments::Environment::getEnvironmentHeight(), mic::environments::Environment::getEnvironmentWidth(), grid_env, mic::environments::Environment::isActionAllowed(), mic::environments::Gridworld::isStateAllowed(), mic::environments::Gridworld::isStateTerminal(), and qstate_table.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and startNewEpisode().
mic::utils::DataCollectorPtr< std::string, float > mic::application::GridworldQLearning::collector_ptr  [private]
Data collector.
Definition at line 93 of file GridworldQLearning.hpp.
Referenced by finishCurrentEpisode(), and initialize().
mic::configuration::Property< float > mic::application::GridworldQLearning::discount_rate  [private]

Property: future discount factor (should be in the range 0.0-1.0).
Definition at line 109 of file GridworldQLearning.hpp.
Referenced by GridworldQLearning(), and performSingleStep().
mic::configuration::Property< double > mic::application::GridworldQLearning::epsilon  [private]

Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected). If epsilon < 0, it will be set to 1/episode, hence changing dynamically depending on the episode number.
Definition at line 125 of file GridworldQLearning.hpp.
Referenced by GridworldQLearning(), and performSingleStep().
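A small sketch of the epsilon-greedy selection described above, including the dynamic case epsilon < 0 (interpreted as 1/episode); the helper name chooseAction() and its parameters are hypothetical:

    // Sketch: with probability eps pick a random action, otherwise the greedy one.
    #include <algorithm>
    #include <random>

    int chooseAction(double epsilon, long episode, int greedy_action, std::mt19937& rng) {
        // epsilon < 0 means: decay exploration as 1/episode (episode assumed >= 1).
        const double eps = (epsilon < 0.0) ? 1.0 / std::max<long>(episode, 1) : epsilon;
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng) < eps) {
            std::uniform_int_distribution<int> random_action(0, 3);
            return random_action(rng);   // explore
        }
        return greedy_action;            // exploit (cf. selectBestAction())
    }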
mic::environments::Gridworld mic::application::GridworldQLearning::grid_env  [private]
The gridworld object.
Definition at line 96 of file GridworldQLearning.hpp.
Referenced by computeBestValue(), finishCurrentEpisode(), initializePropertyDependentVariables(), performSingleStep(), selectBestAction(), startNewEpisode(), and streamQStateTable().
mic::configuration::Property< float > mic::application::GridworldQLearning::learning_rate  [private]

Property: learning rate (should be in the range 0.0-1.0).
Definition at line 114 of file GridworldQLearning.hpp.
Referenced by GridworldQLearning(), and performSingleStep().
mic::configuration::Property< float > mic::application::GridworldQLearning::move_noise  [private]

Property: move noise, determining how often an action results in an unintended direction.
Definition at line 119 of file GridworldQLearning.hpp.
Referenced by GridworldQLearning().
mic::types::TensorXf mic::application::GridworldQLearning::qstate_table  [private]

Tensor storing the values of all state-action pairs (gridworld width * height * 4 actions). Column-major(!).
Definition at line 99 of file GridworldQLearning.hpp.
Referenced by computeBestValue(), initializePropertyDependentVariables(), performSingleStep(), selectBestAction(), and streamQStateTable().
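Assuming the usual column-major convention for a width x height x 4 tensor (the x coordinate varies fastest), the linear offset of an entry would look like the sketch below; verify against how qstate_table is actually indexed in GridworldQLearning.cpp:

    // Sketch of column-major addressing for a (w x h x 4) Q-value tensor.
    inline long qIndex(long x, long y, long action, long w, long h) {
        return x + w * y + (w * h) * action;
    }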
mic::configuration::Property< std::string > mic::application::GridworldQLearning::statistics_filename  [private]
Property: name of the file to which the statistics will be exported.
Definition at line 128 of file GridworldQLearning.hpp.
Referenced by finishCurrentEpisode(), and GridworldQLearning().
mic::configuration::Property< float > mic::application::GridworldQLearning::step_reward  [private]

Property: the "expected intermediate reward", i.e. the reward received for performing each step (typically negative, but it can be positive as well).
Definition at line 104 of file GridworldQLearning.hpp.
Referenced by GridworldQLearning(), and performSingleStep().
long long mic::application::GridworldQLearning::sum_of_iterations  [private]

Sum of all iterations made so far; used in statistics.
Definition at line 155 of file GridworldQLearning.hpp.
Referenced by finishCurrentEpisode(), and initialize().
long long mic::application::GridworldQLearning::sum_of_rewards  [private]

Sum of all rewards collected so far; used in statistics.
Definition at line 160 of file GridworldQLearning.hpp.
Referenced by finishCurrentEpisode(), and initialize().
WindowCollectorChart< float > * mic::application::GridworldQLearning::w_chart  [private]

Window for displaying the chart of collected statistics.
Definition at line 90 of file GridworldQLearning.hpp.
Referenced by initialize(), and ~GridworldQLearning().