MachineIntelligenceCore:ReinforcementLearning
Class solving the n-armed bandits problem using unlimited-history action selection (all action-value pairs are stored).
#include <nArmedBanditsUnlimitedHistory.hpp>

Public Member Functions
  nArmedBanditsUnlimitedHistory (std::string node_name_="application")
  virtual ~nArmedBanditsUnlimitedHistory ()

Protected Member Functions
  virtual void initializePropertyDependentVariables ()
  virtual void initialize (int argc, char *argv[])
  virtual bool performSingleStep ()

Private Member Functions
  short calculateReward (float prob_)
  size_t selectBestArm ()

Private Attributes
  WindowCollectorChart< float > * w_reward
      Window for displaying average reward.
  mic::utils::DataCollectorPtr< std::string, float > reward_collector_ptr
      Reward collector.
  mic::types::VectorXf arms
      n bandit arms.
  std::vector< std::pair< size_t, size_t > > action_values
      Action values: pairs of <arm_number, reward>.
  mic::configuration::Property< size_t > number_of_bandits
      Property: number of bandits.
  mic::configuration::Property< double > epsilon
      Property: epsilon used in action selection; a random action is selected when a uniform random draw falls below epsilon (i.e. with probability epsilon).
  mic::configuration::Property< std::string > statistics_filename
      Property: name of the file to which the statistics will be exported.
  size_t best_arm
      The best arm (hidden state).
  float best_arm_prob
      The best arm probability/"reward" (hidden state).

Detailed Description
Class solving the n-armed bandits problem using unlimited-history action selection (all action-value pairs are stored).
Definition at line 41 of file nArmedBanditsUnlimitedHistory.hpp.

Constructor & Destructor Documentation

mic::application::nArmedBanditsUnlimitedHistory::nArmedBanditsUnlimitedHistory (std::string node_name_ = "application")
Default constructor. Sets the application/node name and the default values of variables, initializes the classifier, etc.
Parameters:
  node_name_  Name of the application/node (in the configuration file).
Definition at line 38 of file nArmedBanditsUnlimitedHistory.cpp.
References epsilon, number_of_bandits, and statistics_filename.

mic::application::nArmedBanditsUnlimitedHistory::~nArmedBanditsUnlimitedHistory () [virtual]
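
Member Function Documentation

short mic::application::nArmedBanditsUnlimitedHistory::calculateReward (float prob_) [private]
Calculates the reward.
Parameters:
  prob_  Probability associated with the pulled arm.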
Definition at line 96 of file nArmedBanditsUnlimitedHistory.cpp.
References number_of_bandits.
Referenced by performSingleStep().
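The reference only states that calculateReward() turns a probability into a short reward and reads number_of_bandits. As a minimal sketch, assuming the reward counts how many of n_trials independent trials succeed with probability prob_ (with n_trials standing in for number_of_bandits), it could look like this; calculateRewardSketch and its extra parameters are hypothetical and not the library's API:

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical sketch of calculateReward(): the reward is modelled as the
// number of successes in n_trials independent trials, each succeeding with
// probability prob_. Using number_of_bandits as n_trials is an assumption.
short calculateRewardSketch(float prob_, std::size_t n_trials, std::mt19937& rng) {
    assert(prob_ >= 0.0f && prob_ <= 1.0f);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    short reward = 0;
    for (std::size_t i = 0; i < n_trials; ++i) {
        // A trial succeeds when a uniform draw from [0, 1) falls below prob_.
        if (uniform(rng) < static_cast<double>(prob_))
            ++reward;
    }
    return reward;
}
```

Under this assumption, an arm with probability 0.7 evaluated over ten trials yields an expected reward of 7, so higher-probability arms pay more on average.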

void mic::application::nArmedBanditsUnlimitedHistory::initialize (int argc, char *argv[]) [protected, virtual]
Initializes the GLUT and OpenGL windows.
Parameters:
  argc  Number of application parameters.
  argv  Array of application parameters.
Definition at line 58 of file nArmedBanditsUnlimitedHistory.cpp.
References reward_collector_ptr, and w_reward.

void mic::application::nArmedBanditsUnlimitedHistory::initializePropertyDependentVariables () [protected, virtual]
Initializes all variables that are property-dependent.
Definition at line 74 of file nArmedBanditsUnlimitedHistory.cpp.
References action_values, arms, best_arm, best_arm_prob, and number_of_bandits.
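initializePropertyDependentVariables() references arms, best_arm, best_arm_prob, action_values and number_of_bandits. One plausible reading, sketched below under stated assumptions, is that it assigns each arm a random success probability, records the most rewarding arm as hidden state and clears the history. BanditStateSketch and initializeSketch are hypothetical names, and std::vector<float> replaces mic::types::VectorXf so the example compiles without the MIC libraries:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class state; std::vector<float> replaces
// mic::types::VectorXf so the sketch compiles without the MIC libraries.
struct BanditStateSketch {
    std::vector<float> arms;                                         // success probability of each arm
    std::vector<std::pair<std::size_t, std::size_t>> action_values;  // <arm_number, reward> history
    std::size_t best_arm = 0;                                        // hidden state: index of the best arm
    float best_arm_prob = 0.0f;                                      // hidden state: its probability
};

// Assumed behaviour: draw a random probability for every arm, record the
// best one as hidden state and reset the action-value history.
void initializeSketch(BanditStateSketch& s, std::size_t number_of_bandits, std::mt19937& rng) {
    assert(number_of_bandits > 0);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    s.arms.resize(number_of_bandits);
    for (float& p : s.arms)
        p = uniform(rng);
    // The arm with the highest probability is the one the agent should discover.
    auto best = std::max_element(s.arms.begin(), s.arms.end());
    s.best_arm = static_cast<std::size_t>(std::distance(s.arms.begin(), best));
    s.best_arm_prob = *best;
    s.action_values.clear();
}
```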

bool mic::application::nArmedBanditsUnlimitedHistory::performSingleStep () [protected, virtual]
Performs a single step of computations.
Definition at line 136 of file nArmedBanditsUnlimitedHistory.cpp.
References action_values, arms, best_arm, best_arm_prob, calculateReward(), epsilon, number_of_bandits, reward_collector_ptr, selectBestArm(), and statistics_filename.
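performSingleStep() references epsilon, selectBestArm() and calculateReward(), which suggests an epsilon-greedy step. The sketch below shows only the arm choice, assuming a uniform draw in [0, 1) is compared against epsilon as the property description says; chooseArmSketch is a hypothetical helper, and the rest of the step (computing the reward of the chosen arm and appending the <arm_number, reward> pair to action_values) is assumed rather than confirmed:

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Epsilon-greedy choice for one step (a sketch; the real performSingleStep()
// may differ): with probability epsilon a uniformly random arm is explored,
// otherwise the greedy arm (e.g. the result of selectBestArm()) is exploited.
std::size_t chooseArmSketch(double epsilon, std::size_t greedy_arm,
                            std::size_t n_arms, std::mt19937& rng) {
    assert(n_arms > 0 && greedy_arm < n_arms);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    if (uniform(rng) < epsilon) {
        // Exploration: any arm, including the greedy one, may be picked.
        std::uniform_int_distribution<std::size_t> pick(0, n_arms - 1);
        return pick(rng);
    }
    // Exploitation: keep the arm with the best historical average.
    return greedy_arm;
}
```

Setting epsilon to 0 makes the agent purely greedy, while epsilon equal to 1 makes it pick arms uniformly at random.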

size_t mic::application::nArmedBanditsUnlimitedHistory::selectBestArm () [private]
Greedy method that selects the best arm based on the historical action-value pairs.
Definition at line 106 of file nArmedBanditsUnlimitedHistory.cpp.
References action_values, and number_of_bandits.
Referenced by performSingleStep().
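Because every <arm_number, reward> pair is kept, a greedy choice can recompute each arm's mean reward from the full history. The following is a minimal sketch of that idea, not the confirmed body of selectBestArm(); the function name, the zero mean assumed for arms never pulled and the lowest-index tie-break are all assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical greedy rule: average every recorded reward per arm over the
// whole (unlimited) history and return the arm with the highest mean.
// Arms never pulled are given a mean of 0 (an assumption); ties go to the
// lowest index.
std::size_t selectBestArmSketch(
        const std::vector<std::pair<std::size_t, std::size_t>>& action_values,
        std::size_t number_of_bandits) {
    assert(number_of_bandits > 0);
    std::vector<double> reward_sum(number_of_bandits, 0.0);
    std::vector<std::size_t> pulls(number_of_bandits, 0);
    for (const auto& av : action_values) {  // av = <arm_number, reward>
        assert(av.first < number_of_bandits);
        reward_sum[av.first] += static_cast<double>(av.second);
        ++pulls[av.first];
    }
    std::size_t best = 0;
    double best_mean = -1.0;  // below any possible mean, since rewards are non-negative
    for (std::size_t arm = 0; arm < number_of_bandits; ++arm) {
        const double mean = pulls[arm] > 0
            ? reward_sum[arm] / static_cast<double>(pulls[arm])
            : 0.0;
        if (mean > best_mean) {
            best_mean = mean;
            best = arm;
        }
    }
    return best;
}
```

This recomputation costs time proportional to the number of stored pairs on every call, and memory grows with each step; that is the price of the unlimited history compared with keeping an incremental per-arm running average.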
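
Member Data Documentation

std::vector< std::pair< size_t, size_t > > mic::application::nArmedBanditsUnlimitedHistory::action_values [private]
Action values: pairs of <arm_number, reward>.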
Definition at line 84 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and selectBestArm().

mic::types::VectorXf mic::application::nArmedBanditsUnlimitedHistory::arms [private]
n bandit arms.
Definition at line 81 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().

size_t mic::application::nArmedBanditsUnlimitedHistory::best_arm [private]
The best arm (hidden state).
Definition at line 98 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().

float mic::application::nArmedBanditsUnlimitedHistory::best_arm_prob [private]
The best arm probability/"reward" (hidden state).
Definition at line 103 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
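
mic::configuration::Property< double > mic::application::nArmedBanditsUnlimitedHistory::epsilon [private]
Property: epsilon used in action selection; a random action is selected when a uniform random draw falls below epsilon (i.e. with probability epsilon).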
Definition at line 90 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by nArmedBanditsUnlimitedHistory(), and performSingleStep().
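
mic::configuration::Property< size_t > mic::application::nArmedBanditsUnlimitedHistory::number_of_bandits [private]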
Property: number of bandits.
Definition at line 87 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by calculateReward(), initializePropertyDependentVariables(), nArmedBanditsUnlimitedHistory(), performSingleStep(), and selectBestArm().
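
mic::utils::DataCollectorPtr< std::string, float > mic::application::nArmedBanditsUnlimitedHistory::reward_collector_ptr [private]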
Reward collector.
Definition at line 78 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initialize(), and performSingleStep().
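
mic::configuration::Property< std::string > mic::application::nArmedBanditsUnlimitedHistory::statistics_filename [private]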
Property: name of the file to which the statistics will be exported.
Definition at line 93 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by nArmedBanditsUnlimitedHistory(), and performSingleStep().

WindowCollectorChart< float > * mic::application::nArmedBanditsUnlimitedHistory::w_reward [private]
Window for displaying average reward.
Definition at line 75 of file nArmedBanditsUnlimitedHistory.hpp.
Referenced by initialize(), and ~nArmedBanditsUnlimitedHistory().