MachineIntelligenceCore:ReinforcementLearning
Class implementing an n-armed bandits problem solver using Softmax Action Selection.
#include <nArmedBanditsSofmax.hpp>
Public Member Functions
  nArmedBanditsSofmax (std::string node_name_="application")
  virtual ~nArmedBanditsSofmax ()
Protected Member Functions
  virtual void initializePropertyDependentVariables ()
  virtual void initialize (int argc, char *argv[])
  virtual bool performSingleStep ()
Private Member Functions
  short calculateReward (float prob_)
  void updateSoftmaxValues ()
Private Attributes
  WindowCollectorChart< float > * w_reward
      Window for displaying average reward.
  mic::utils::DataCollectorPtr< std::string, float > reward_collector_ptr
      Reward collector.
  mic::types::VectorXf arms
      The n bandit arms.
  mic::types::VectorXf action_values
      Action values.
  mic::types::VectorXi action_counts
      Counters storing how many times each action has been taken.
  mic::types::VectorXf action_values_softmax
      Action values (softmax).
  mic::configuration::Property< size_t > number_of_bandits
      Property: number of bandits.
  mic::configuration::Property< double > tau
      Property: the softmax "temperature" parameter.
  mic::configuration::Property< std::string > statistics_filename
      Property: name of the file to which the statistics will be exported.
  size_t best_arm
      The best arm (hidden state).
  float best_arm_prob
      The best arm probability/"reward" (hidden state).
The class solves the n-armed bandits problem using Softmax Action Selection.
Definition at line 41 of file nArmedBanditsSofmax.hpp.
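Softmax action selection is conventionally formulated with the Gibbs (Boltzmann) distribution. Assuming the class follows that standard form, with Q(a) the estimated value of arm a (held in action_values), tau the temperature property and n = number_of_bandits, the probability of selecting arm a would be:

```latex
P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b=1}^{n} e^{Q(b)/\tau}}, \qquad a = 1, \dots, n
```

As tau grows, every term approaches e^0 = 1 and P(a) tends to 1/n; as tau shrinks towards 0, the probability mass concentrates on the arm with the largest Q(a).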
mic::application::nArmedBanditsSofmax::nArmedBanditsSofmax (std::string node_name_ = "application")
Default constructor. Sets the application/node name and the default values of variables, initializes the classifier, etc.
Parameters:
  node_name_  Name of the application/node (in the configuration file).
Definition at line 41 of file nArmedBanditsSofmax.cpp.
References number_of_bandits, statistics_filename, and tau.
mic::application::nArmedBanditsSofmax::~nArmedBanditsSofmax () [virtual]
Destructor.
References w_reward.
short mic::application::nArmedBanditsSofmax::calculateReward (float prob_) [private]
Calculates the reward.
Parameters:
  prob_  Probability.
Definition at line 109 of file nArmedBanditsSofmax.cpp.
References number_of_bandits.
Referenced by performSingleStep().
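The documentation only states that calculateReward() turns a probability into a short reward and that it uses number_of_bandits. A common scheme in n-armed bandit tutorials counts successes over repeated Bernoulli trials; the sketch below assumes that scheme (with number_of_bandits trials) and uses a hypothetical free function calculateRewardSketch rather than the class member.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical stand-in for calculateReward(prob_).
// Assumption: the reward is the number of successes in number_of_bandits
// independent trials, each succeeding with probability prob.
short calculateRewardSketch(float prob, std::size_t number_of_bandits, std::mt19937& rng) {
    assert(prob >= 0.0f && prob <= 1.0f);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // draws from [0, 1)
    short reward = 0;
    for (std::size_t i = 0; i < number_of_bandits; ++i) {
        if (uniform(rng) < prob) {
            ++reward;  // this trial succeeded
        }
    }
    return reward;
}
```

Under this assumption an arm with probability 0 always yields 0, and an arm with probability 1 always yields number_of_bandits.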
void mic::application::nArmedBanditsSofmax::initialize (int argc, char *argv[]) [protected, virtual]
Initializes the GLUT and OpenGL windows.
Parameters:
  argc  Number of application parameters.
  argv  Array of application parameters.
Definition at line 61 of file nArmedBanditsSofmax.cpp.
References reward_collector_ptr, and w_reward.
void mic::application::nArmedBanditsSofmax::initializePropertyDependentVariables () [protected, virtual]
Initializes all variables that are property-dependent.
Definition at line 77 of file nArmedBanditsSofmax.cpp.
References action_counts, action_values, action_values_softmax, arms, best_arm, best_arm_prob, and number_of_bandits.
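The referenced members suggest what this initialization has to produce. The sketch below is one plausible version under stated assumptions: each hidden arm probability is drawn uniformly from [0, 1), the best arm is the one with the largest probability, estimates and counters start at zero, and the softmax probabilities start uniform (which is what softmax yields for all-zero action values). BanditStateSketch and initializeSketch are hypothetical names, and std::vector replaces the mic::types vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical stand-in for the property-dependent class state.
struct BanditStateSketch {
    std::vector<float> arms;                   // hidden success probability of each arm
    std::vector<float> action_values;          // estimated value of each arm
    std::vector<int> action_counts;            // how many times each arm was pulled
    std::vector<float> action_values_softmax;  // selection probability of each arm
    std::size_t best_arm = 0;                  // index of the best arm (hidden state)
    float best_arm_prob = 0.0f;                // probability of the best arm (hidden state)
};

BanditStateSketch initializeSketch(std::size_t number_of_bandits, std::mt19937& rng) {
    assert(number_of_bandits > 0);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    BanditStateSketch s;
    s.arms.resize(number_of_bandits);
    for (float& p : s.arms) p = uniform(rng);  // assumption: random hidden probabilities

    // Remember which arm is best, for later statistics.
    auto best = std::max_element(s.arms.begin(), s.arms.end());
    s.best_arm = static_cast<std::size_t>(std::distance(s.arms.begin(), best));
    s.best_arm_prob = *best;

    s.action_values.assign(number_of_bandits, 0.0f);
    s.action_counts.assign(number_of_bandits, 0);
    // Softmax of all-zero action values is the uniform distribution.
    s.action_values_softmax.assign(number_of_bandits, 1.0f / static_cast<float>(number_of_bandits));
    return s;
}
```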
bool mic::application::nArmedBanditsSofmax::performSingleStep () [protected, virtual]
Performs a single step of computations.
Definition at line 136 of file nArmedBanditsSofmax.cpp.
References action_counts, action_values, action_values_softmax, arms, best_arm, best_arm_prob, calculateReward(), number_of_bandits, reward_collector_ptr, statistics_filename, and updateSoftmaxValues().
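A single step plausibly combines the referenced pieces: choose an arm from the softmax probabilities, draw a reward, and update that arm's counter and value estimate. The sketch below assumes the arm is sampled with std::discrete_distribution, the reward is the number of successes in number_of_bandits trials, and action_values is an incremental sample average. performSingleStepSketch is a hypothetical name, and the statistics export and reward collector are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of one bandit step; std::vector replaces the mic:: vector types.
// Returns the index of the arm that was pulled.
std::size_t performSingleStepSketch(const std::vector<float>& arms,                   // hidden arm probabilities
                                    const std::vector<float>& action_values_softmax,  // selection probabilities
                                    std::vector<float>& action_values,                // value estimates (updated)
                                    std::vector<int>& action_counts,                  // pull counters (updated)
                                    std::mt19937& rng) {
    assert(!arms.empty() && arms.size() == action_values_softmax.size());
    assert(arms.size() == action_values.size() && arms.size() == action_counts.size());

    // 1. Sample an arm with probability proportional to its softmax weight.
    std::discrete_distribution<int> pick(action_values_softmax.begin(), action_values_softmax.end());
    const std::size_t a = static_cast<std::size_t>(pick(rng));

    // 2. Reward (assumption): successes in arms.size() trials, each with probability arms[a].
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    int reward = 0;
    for (std::size_t i = 0; i < arms.size(); ++i) {
        if (uniform(rng) < arms[a]) ++reward;
    }

    // 3. Count the pull and move the estimate towards the reward (incremental sample average).
    action_counts[a] += 1;
    action_values[a] += (static_cast<float>(reward) - action_values[a]) / static_cast<float>(action_counts[a]);

    // The class would then refresh the softmax probabilities via updateSoftmaxValues().
    return a;
}
```

Here the softmax probabilities are passed in read-only; keeping the step and the softmax refresh separate mirrors the split between performSingleStep() and updateSoftmaxValues() in the class.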
void mic::application::nArmedBanditsSofmax::updateSoftmaxValues () [private]
Updates the softmax action-value table.
Definition at line 119 of file nArmedBanditsSofmax.cpp.
References action_values, action_values_softmax, number_of_bandits, and tau.
Referenced by performSingleStep().
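Since updateSoftmaxValues() uses action_values, number_of_bandits and tau, it presumably recomputes the Boltzmann probabilities exp(Q(a)/tau) / sum_b exp(Q(b)/tau). The sketch below is a hypothetical free function (softmaxSketch) using std::vector instead of mic::types::VectorXf; subtracting the maximum value before exponentiating is a standard overflow guard that does not change the result, and it may or may not match the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical version of updateSoftmaxValues(): returns the softmax
// selection probability of every action for temperature tau.
std::vector<float> softmaxSketch(const std::vector<float>& action_values, double tau) {
    assert(tau > 0.0 && !action_values.empty());
    // Shift by the maximum so the largest exponent is exp(0) = 1 (avoids overflow).
    const float max_q = *std::max_element(action_values.begin(), action_values.end());
    std::vector<float> probs(action_values.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < action_values.size(); ++i) {
        const double e = std::exp((action_values[i] - max_q) / tau);
        probs[i] = static_cast<float>(e);
        sum += e;
    }
    // Normalize so the probabilities sum to 1.
    for (float& p : probs) p = static_cast<float>(p / sum);
    return probs;
}
```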
mic::types::VectorXi mic::application::nArmedBanditsSofmax::action_counts [private]
Counters storing how many times each action has been taken.
Definition at line 87 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
mic::types::VectorXf mic::application::nArmedBanditsSofmax::action_values [private]
Action values.
Definition at line 84 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and updateSoftmaxValues().
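If action_values holds sample-average reward estimates (an assumption; the header only says "Action values"), the estimate for arm a can be updated incrementally after its k-th pull, where k is the value of action_counts for that arm after the pull and r_k is the reward received:

```latex
Q_k(a) = Q_{k-1}(a) + \frac{1}{k}\bigl(r_k - Q_{k-1}(a)\bigr), \qquad Q_0(a) = 0
```

This equals the mean of the first k rewards without storing them, so only action_values and action_counts need to be kept.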
mic::types::VectorXf mic::application::nArmedBanditsSofmax::action_values_softmax [private]
Action values - softmax.
Definition at line 90 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), performSingleStep(), and updateSoftmaxValues().
mic::types::VectorXf mic::application::nArmedBanditsSofmax::arms [private]
The n bandit arms.
Definition at line 81 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
size_t mic::application::nArmedBanditsSofmax::best_arm [private]
The best arm (hidden state).
Definition at line 106 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
float mic::application::nArmedBanditsSofmax::best_arm_prob [private]
The best arm probability/"reward" (hidden state).
Definition at line 111 of file nArmedBanditsSofmax.hpp.
Referenced by initializePropertyDependentVariables(), and performSingleStep().
mic::configuration::Property< size_t > mic::application::nArmedBanditsSofmax::number_of_bandits [private]
Property: number of bandits.
Definition at line 94 of file nArmedBanditsSofmax.hpp.
Referenced by calculateReward(), initializePropertyDependentVariables(), nArmedBanditsSofmax(), performSingleStep(), and updateSoftmaxValues().
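mic::utils::DataCollectorPtr< std::string, float > mic::application::nArmedBanditsSofmax::reward_collector_ptr [private]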
Reward collector.
Definition at line 78 of file nArmedBanditsSofmax.hpp.
Referenced by initialize(), and performSingleStep().
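mic::configuration::Property< std::string > mic::application::nArmedBanditsSofmax::statistics_filename [private]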
Property: name of the file to which the statistics will be exported.
Definition at line 101 of file nArmedBanditsSofmax.hpp.
Referenced by nArmedBanditsSofmax(), and performSingleStep().
mic::configuration::Property< double > mic::application::nArmedBanditsSofmax::tau [private]
Property: the softmax "temperature" parameter, which scales the probability distribution over all actions. A high temperature makes the action probabilities nearly equal, whereas a low temperature exaggerates the differences in probability between actions.
Definition at line 98 of file nArmedBanditsSofmax.hpp.
Referenced by nArmedBanditsSofmax(), and updateSoftmaxValues().
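The effect of the temperature can be checked numerically. The hypothetical helper greedyProbability below (not part of the class) computes, under the standard Boltzmann softmax assumption, the probability given to the highest-valued action for a given tau.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration of the tau property: returns the softmax
// probability assigned to the action with the largest value in q.
double greedyProbability(const std::vector<double>& q, double tau) {
    assert(tau > 0.0 && !q.empty());
    const double max_q = *std::max_element(q.begin(), q.end());
    double sum = 0.0;
    for (double v : q) sum += std::exp((v - max_q) / tau);
    // After shifting by max_q, the best action contributes exp(0) = 1 to the numerator.
    return 1.0 / sum;
}
```

For action values {0.0, 0.5, 1.0}, tau = 100 gives the best arm only about a one-in-three chance (almost uniform), while tau = 0.01 makes the choice essentially greedy.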
WindowCollectorChart< float > * mic::application::nArmedBanditsSofmax::w_reward [private]
Window for displaying average reward.
Definition at line 75 of file nArmedBanditsSofmax.hpp.
Referenced by initialize(), and ~nArmedBanditsSofmax().