MachineIntelligenceCore:ReinforcementLearning
mic::application::nArmedBanditsSofmax Class Reference

Class implementing an application that solves the n-armed bandits problem using softmax action selection.

#include <nArmedBanditsSofmax.hpp>


Public Member Functions

 nArmedBanditsSofmax (std::string node_name_="application")
 
virtual ~nArmedBanditsSofmax ()
 

Protected Member Functions

virtual void initializePropertyDependentVariables ()
 
virtual void initialize (int argc, char *argv[])
 
virtual bool performSingleStep ()
 

Private Member Functions

short calculateReward (float prob_)
 
void updateSoftmaxValues ()
 

Private Attributes

WindowCollectorChart< float > * w_reward
 Window for displaying average reward.
 
mic::utils::DataCollectorPtr< std::string, float > reward_collector_ptr
 Reward collector.
 
mic::types::VectorXf arms
 The n bandit arms.
 
mic::types::VectorXf action_values
 Action values.
 
mic::types::VectorXi action_counts
 Counters storing how many times we've taken a particular action.
 
mic::types::VectorXf action_values_softmax
 Softmax-transformed action values.
 
mic::configuration::Property< size_t > number_of_bandits
 Property: number of bandits.
 
mic::configuration::Property< double > tau
 Property: softmax temperature.
 
mic::configuration::Property< std::string > statistics_filename
 Property: name of the file to which the statistics will be exported.
 
size_t best_arm
 The best arm (hidden state).
 
float best_arm_prob
 The best arm probability/"reward" (hidden state).
 

Detailed Description

Class implementing an application that solves the n-armed bandits problem using softmax action selection.
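In softmax action selection, the arm to pull is drawn at random with probability given by the softmax of the action values, rather than always taking the current best estimate. The fragment below is a minimal sketch of only that sampling step, written with the C++ standard library instead of the mic types; the function name selectArmSoftmax is hypothetical and this is not the class's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws an arm index with probability proportional to
// the given softmax weights (the weights need not be normalized).
std::size_t selectArmSoftmax(const std::vector<float>& softmax_probs,
                             std::mt19937& rng) {
    assert(!softmax_probs.empty());
    // discrete_distribution picks index i with probability w_i / sum(w).
    std::discrete_distribution<std::size_t> dist(softmax_probs.begin(),
                                                 softmax_probs.end());
    return dist(rng);
}
```

Arms with a higher softmax weight are pulled more often, but every arm with non-zero weight keeps some chance of being explored.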

Author
tkornuta

Definition at line 41 of file nArmedBanditsSofmax.hpp.

Constructor & Destructor Documentation

mic::application::nArmedBanditsSofmax::nArmedBanditsSofmax ( std::string  node_name_ = "application")

Default constructor. Sets the application/node name and the default values of variables, initializes the classifier, etc.

Parameters
node_name_: Name of the application/node (in the configuration file).

Definition at line 41 of file nArmedBanditsSofmax.cpp.

References number_of_bandits, statistics_filename, and tau.

mic::application::nArmedBanditsSofmax::~nArmedBanditsSofmax ( )
virtual

Destructor.

Definition at line 56 of file nArmedBanditsSofmax.cpp.

References w_reward.

Member Function Documentation

short mic::application::nArmedBanditsSofmax::calculateReward ( float  prob_)
private

Calculates the reward.

Parameters
prob_: Probability.

Definition at line 109 of file nArmedBanditsSofmax.cpp.

References number_of_bandits.

Referenced by performSingleStep().
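Because calculateReward() takes a probability and references number_of_bandits, one plausible reading (an assumption, not confirmed by this page) is the common textbook reward: the pulled arm is tried number_of_bandits times, and each trial pays 1 with probability prob_. The sketch below, with the hypothetical name calculateRewardSketch, shows that scheme using the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical sketch of calculateReward(): performs `trials` Bernoulli
// trials (assumed to equal number_of_bandits) and counts the successes.
short calculateRewardSketch(float prob, std::size_t trials, std::mt19937& rng) {
    assert(prob >= 0.0f && prob <= 1.0f);
    // Each draw returns true with probability `prob`.
    std::bernoulli_distribution success(prob);
    short reward = 0;
    for (std::size_t i = 0; i < trials; ++i) {
        if (success(rng)) {
            ++reward;
        }
    }
    return reward;
}
```

Under this assumption the reward lies between 0 and the number of trials, and its expected value is prob_ times the number of trials.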

void mic::application::nArmedBanditsSofmax::initialize ( int argc, char * argv[] )
protected, virtual

Initializes GLUT and the OpenGL windows.

Parameters
argc: Number of application parameters.
argv: Array of application parameters.

Definition at line 61 of file nArmedBanditsSofmax.cpp.

References reward_collector_ptr, and w_reward.

void mic::application::nArmedBanditsSofmax::initializePropertyDependentVariables ( )
protected, virtual

Initializes all variables that are property-dependent.

Definition at line 77 of file nArmedBanditsSofmax.cpp.

References action_counts, action_values, action_values_softmax, arms, best_arm, best_arm_prob, and number_of_bandits.
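The following is a minimal sketch of what initializing the property-dependent variables could look like. It assumes (not stated on this page) that arms holds a success probability per arm drawn uniformly from [0, 1), that action values and counts start at zero, and that the softmax table starts uniform; best_arm and best_arm_prob then record the hidden optimum. BanditState and initializeSketch are hypothetical names, and std::vector stands in for the mic vector types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical container mirroring the property-dependent members.
struct BanditState {
    std::vector<float> arms;                   // assumed: success probability of each arm
    std::vector<float> action_values;          // estimated value of each arm
    std::vector<int> action_counts;            // how often each arm was pulled
    std::vector<float> action_values_softmax;  // softmax of action_values
    std::size_t best_arm = 0;                  // index of the most probable arm
    float best_arm_prob = 0.0f;                // that arm's probability
};

BanditState initializeSketch(std::size_t number_of_bandits, std::mt19937& rng) {
    assert(number_of_bandits > 0);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    BanditState s;
    s.arms.resize(number_of_bandits);
    for (float& a : s.arms) a = uniform(rng);
    s.action_values.assign(number_of_bandits, 0.0f);
    s.action_counts.assign(number_of_bandits, 0);
    // With all action values equal, the softmax distribution is uniform.
    s.action_values_softmax.assign(number_of_bandits, 1.0f / number_of_bandits);
    // Record the hidden best arm and its probability.
    auto it = std::max_element(s.arms.begin(), s.arms.end());
    s.best_arm = static_cast<std::size_t>(std::distance(s.arms.begin(), it));
    s.best_arm_prob = *it;
    return s;
}
```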

bool mic::application::nArmedBanditsSofmax::performSingleStep ( )
protected, virtual
void mic::application::nArmedBanditsSofmax::updateSoftmaxValues ( )
private

Updates the softmax action-value table.

Definition at line 119 of file nArmedBanditsSofmax.cpp.

References action_values, action_values_softmax, number_of_bandits, and tau.

Referenced by performSingleStep().
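Updating the softmax table presumably means turning each action value Q(a) into exp(Q(a)/tau) divided by the sum of that quantity over all arms, which matches the references to action_values, action_values_softmax and tau. The sketch below assumes that standard form; softmaxSketch is a hypothetical name, and the subtraction of the maximum is a numerical-stability choice that leaves the result unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of updateSoftmaxValues(): converts action values Q(a)
// into probabilities exp(Q(a)/tau) / sum_b exp(Q(b)/tau).
std::vector<float> softmaxSketch(const std::vector<float>& action_values, double tau) {
    assert(tau > 0.0 && !action_values.empty());
    // Subtracting the maximum avoids overflow in exp(); the ratios are unchanged.
    const float q_max = *std::max_element(action_values.begin(), action_values.end());
    std::vector<float> probs(action_values.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < action_values.size(); ++i) {
        const double e = std::exp((action_values[i] - q_max) / tau);
        probs[i] = static_cast<float>(e);
        sum += e;
    }
    // Normalize so the probabilities sum to 1.
    for (float& p : probs) p = static_cast<float>(p / sum);
    return probs;
}
```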

Member Data Documentation

mic::types::VectorXi mic::application::nArmedBanditsSofmax::action_counts
private

Counters storing how many times we've taken a particular action.

Definition at line 87 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

mic::types::VectorXf mic::application::nArmedBanditsSofmax::action_values
private

Action values.

Definition at line 84 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), performSingleStep(), and updateSoftmaxValues().
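A common way to keep action_values as the sample-average reward of each arm, together with action_counts, is the incremental mean Q ← Q + (r − Q)/n, which avoids storing past rewards. This page does not confirm that performSingleStep() uses this rule; the sketch below, with the hypothetical name updateActionValue, only illustrates the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: incrementally updates the sample-average estimate of
// the pulled arm after observing `reward`.
void updateActionValue(std::vector<float>& action_values,
                       std::vector<int>& action_counts,
                       std::size_t arm, float reward) {
    assert(arm < action_values.size() && action_values.size() == action_counts.size());
    action_counts[arm] += 1;
    // Q_new = Q_old + (reward - Q_old) / n
    action_values[arm] += (reward - action_values[arm]) / static_cast<float>(action_counts[arm]);
}
```

For example, rewards 4 and 8 on the same arm leave its estimate at their mean, 6, with a count of 2.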

mic::types::VectorXf mic::application::nArmedBanditsSofmax::action_values_softmax
private

Softmax-transformed action values.

Definition at line 90 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), performSingleStep(), and updateSoftmaxValues().

mic::types::VectorXf mic::application::nArmedBanditsSofmax::arms
private

The n bandit arms.

Definition at line 81 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

size_t mic::application::nArmedBanditsSofmax::best_arm
private

The best arm (hidden state).

Definition at line 106 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

float mic::application::nArmedBanditsSofmax::best_arm_prob
private

The best arm probability/"reward" (hidden state).

Definition at line 111 of file nArmedBanditsSofmax.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

mic::configuration::Property<size_t> mic::application::nArmedBanditsSofmax::number_of_bandits
private

Property: number of bandits.

mic::utils::DataCollectorPtr<std::string, float> mic::application::nArmedBanditsSofmax::reward_collector_ptr
private

Reward collector.

Definition at line 78 of file nArmedBanditsSofmax.hpp.

Referenced by initialize(), and performSingleStep().

mic::configuration::Property<std::string> mic::application::nArmedBanditsSofmax::statistics_filename
private

Property: name of the file to which the statistics will be exported.

Definition at line 101 of file nArmedBanditsSofmax.hpp.

Referenced by nArmedBanditsSofmax(), and performSingleStep().

mic::configuration::Property<double> mic::application::nArmedBanditsSofmax::tau
private

Property: the softmax "temperature" parameter, which scales the probability distribution over all actions. A high temperature makes the action probabilities nearly equal, whereas a low temperature exaggerates the differences in probability between actions.

Definition at line 98 of file nArmedBanditsSofmax.hpp.

Referenced by nArmedBanditsSofmax(), and updateSoftmaxValues().
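Written out, the Gibbs (Boltzmann) softmax commonly used for action selection, assumed here to be the form updateSoftmaxValues() computes, assigns arm a the probability below, where Q(a) is its action value, n the number of bandits and tau the temperature. The two limits show the high- and low-temperature behaviour described for tau.

```latex
P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b=1}^{n} e^{Q(b)/\tau}},
\qquad
\lim_{\tau \to \infty} P(a) = \frac{1}{n},
\qquad
\lim_{\tau \to 0^{+}} P(a) =
\begin{cases}
1 & \text{if } a = \arg\max_{b} Q(b) \text{ (unique maximum)},\\
0 & \text{otherwise.}
\end{cases}
```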

WindowCollectorChart<float>* mic::application::nArmedBanditsSofmax::w_reward
private

Window for displaying average reward.

Definition at line 75 of file nArmedBanditsSofmax.hpp.

Referenced by initialize(), and ~nArmedBanditsSofmax().


The documentation for this class was generated from the following files: