MachineIntelligenceCore:ReinforcementLearning
mic::application::TestApp Class Reference

Class implementing an application that solves the n-armed bandits problem using a simple Q-learning rule.

#include <nArmedBanditsSimpleQlearning.hpp>


Public Member Functions

 TestApp (std::string node_name_="application")
 
virtual ~TestApp ()
 

Protected Member Functions

virtual void initializePropertyDependentVariables ()
 
virtual void initialize (int argc, char *argv[])
 
virtual bool performSingleStep ()
 

Private Member Functions

short calculateReward (float prob_)
 
size_t selectBestArm ()
 

Private Attributes

WindowCollectorChart< float > * w_reward
 Window for displaying average reward.
 
mic::utils::DataCollectorPtr< std::string, float > reward_collector_ptr
 Reward collector.
 
mic::types::VectorXf arms
 n Bandit arms.
 
mic::types::VectorXf action_values
 Action values.
 
mic::types::VectorXi action_counts
 Counters storing how many times we've taken a particular action.
 
mic::configuration::Property< size_t > number_of_bandits
 Property: number of bandits.
 
mic::configuration::Property< double > epsilon
 Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected).
 
mic::configuration::Property< std::string > statistics_filename
 Property: name of the file to which the statistics will be exported.
 
size_t best_arm
 
float best_arm_prob
 

Detailed Description

Class implementing an application that solves the n-armed bandits problem using a simple Q-learning rule.

Author
tkornuta

Definition at line 41 of file nArmedBanditsSimpleQlearning.hpp.

Constructor & Destructor Documentation

mic::application::TestApp::TestApp ( std::string  node_name_ = "application")

Default constructor. Sets the application/node name and the default values of variables, initializes the classifier, etc.

Parameters
node_name_    Name of the application/node (in the configuration file).

Definition at line 39 of file nArmedBanditsSimpleQlearning.cpp.

References epsilon, number_of_bandits, and statistics_filename.

mic::application::TestApp::~TestApp ( )
virtual

Destructor.

Definition at line 54 of file nArmedBanditsSimpleQlearning.cpp.

References w_reward.

Member Function Documentation

short mic::application::TestApp::calculateReward ( float  prob_)
private

Calculates the reward.

Parameters
prob_    Probability.

Definition at line 100 of file nArmedBanditsSimpleQlearning.cpp.

References number_of_bandits.

Referenced by performSingleStep().
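
This page does not show the body of calculateReward(). One plausible reading, and it is only an assumption, is that the reward counts the successes in a fixed number of Bernoulli trials, each succeeding with probability prob_, which would fit the short return type and the reference to number_of_bandits. The free function below is a hypothetical sketch of that reading; the name calculate_reward, the trials parameter, and the use of the <random> header are not taken from the library.

```cpp
#include <cstddef>
#include <random>

// Hypothetical sketch, NOT the library's code: counts how many of `trials`
// independent Bernoulli draws succeed when each succeeds with probability `prob`.
// Assumption: `trials` plays the role that number_of_bandits plays in TestApp.
// Precondition: 0 <= prob <= 1.
template <typename Rng>
short calculate_reward(float prob, std::size_t trials, Rng& rng) {
    std::bernoulli_distribution success(prob);
    short reward = 0;
    for (std::size_t i = 0; i < trials; ++i) {
        if (success(rng)) {
            ++reward;  // one unit of reward per successful draw
        }
    }
    return reward;
}
```

With a generator such as std::mt19937, the result always lies between 0 and trials, and its expected value is prob * trials.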

void mic::application::TestApp::initialize ( int  argc,
char *  argv[] 
)
protected virtual

Method initializes GLUT and OpenGL windows.

Parameters
argc    Number of application parameters.
argv    Array of application parameters.

Definition at line 59 of file nArmedBanditsSimpleQlearning.cpp.

References reward_collector_ptr, and w_reward.

void mic::application::TestApp::initializePropertyDependentVariables ( )
protected virtual

Initializes all variables that are property-dependent.

Definition at line 75 of file nArmedBanditsSimpleQlearning.cpp.

References action_counts, action_values, arms, best_arm, best_arm_prob, and number_of_bandits.

bool mic::application::TestApp::performSingleStep ( )
protected virtual

size_t mic::application::TestApp::selectBestArm ( )
private

Greedy method that selects best arm based on historical action-value pairs.

Definition at line 110 of file nArmedBanditsSimpleQlearning.cpp.

References action_values, and number_of_bandits.

Referenced by performSingleStep().
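
A greedy choice over the estimated action values amounts to an argmax. The sketch below illustrates that idea with a plain std::vector<float> in place of mic::types::VectorXf; the function name select_best_arm and the tie-breaking rule (lowest index wins) are assumptions made for illustration, not details confirmed by this page.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a greedy arm selector: returns the index of the
// largest estimated action value. Ties resolve to the lowest index.
// Precondition: action_values is non-empty.
std::size_t select_best_arm(const std::vector<float>& action_values) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < action_values.size(); ++i) {
        if (action_values[i] > action_values[best]) {
            best = i;  // strictly better estimate found
        }
    }
    return best;
}
```

For example, the estimates {0.1, 0.9, 0.3} yield arm 1.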

Member Data Documentation

mic::types::VectorXi mic::application::TestApp::action_counts
private

Counters storing how many times we've taken a particular action.

Definition at line 87 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().
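
Per-action counters are typically what makes an incremental sample-average update possible: after the k-th pull of an arm, its estimate moves toward the observed reward by a step of 1/k. The sketch below shows that standard rule as an assumption about how action_counts and action_values might interact; the function name update_action_value and the std::vector containers are hypothetical and not taken from the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of an incremental sample-average update:
//   N[a] <- N[a] + 1
//   Q[a] <- Q[a] + (r - Q[a]) / N[a]
// After the update, Q[a] equals the mean of all rewards observed for arm a.
// Precondition: arm < action_values.size() == action_counts.size().
void update_action_value(std::vector<float>& action_values,
                         std::vector<int>& action_counts,
                         std::size_t arm, float reward) {
    action_counts[arm] += 1;
    action_values[arm] +=
        (reward - action_values[arm]) / static_cast<float>(action_counts[arm]);
}
```

Pulling the same arm twice with rewards 4 and 8 leaves its estimate at 6, the mean of the two, without storing the reward history.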

mic::types::VectorXf mic::application::TestApp::action_values
private

Action values.

mic::types::VectorXf mic::application::TestApp::arms
private

n Bandit arms.

Definition at line 81 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

size_t mic::application::TestApp::best_arm
private

The best arm (hidden state).

Definition at line 101 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

float mic::application::TestApp::best_arm_prob
private

The best arm probability/"reward" (hidden state).

Definition at line 106 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initializePropertyDependentVariables(), and performSingleStep().

mic::configuration::Property<double> mic::application::TestApp::epsilon
private

Property: variable denoting epsilon in action selection (the probability "below" which a random action will be selected).

Definition at line 93 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by performSingleStep(), and TestApp().
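
The description of epsilon matches the usual epsilon-greedy scheme: draw a uniform number in [0, 1); if it falls below epsilon, pick a random arm, otherwise pick the arm with the highest estimated value. The sketch below illustrates that scheme under stated assumptions; the function name select_arm_epsilon_greedy, the std::vector container, and the <random> generator are hypothetical and do not reflect the library's actual code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical epsilon-greedy selector (a sketch, not the library's method).
// With probability epsilon it explores a uniformly random arm; otherwise it
// exploits the arm with the largest estimated value (lowest index on ties).
// Precondition: action_values is non-empty and 0 <= epsilon <= 1.
template <typename Rng>
std::size_t select_arm_epsilon_greedy(const std::vector<float>& action_values,
                                      double epsilon, Rng& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        // Exploration: every arm is equally likely.
        std::uniform_int_distribution<std::size_t> pick(0, action_values.size() - 1);
        return pick(rng);
    }
    // Exploitation: argmax over the current estimates.
    std::size_t best = 0;
    for (std::size_t i = 1; i < action_values.size(); ++i) {
        if (action_values[i] > action_values[best]) {
            best = i;
        }
    }
    return best;
}
```

Setting epsilon to 0 makes the selection purely greedy, while larger values trade exploitation for exploration.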

mic::configuration::Property<size_t> mic::application::TestApp::number_of_bandits
private

Property: number of bandits.

Definition at line 90 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by calculateReward(), initializePropertyDependentVariables(), performSingleStep(), selectBestArm(), and TestApp().

mic::utils::DataCollectorPtr<std::string, float> mic::application::TestApp::reward_collector_ptr
private

Reward collector.

Definition at line 78 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initialize(), and performSingleStep().

mic::configuration::Property<std::string> mic::application::TestApp::statistics_filename
private

Property: name of the file to which the statistics will be exported.

Definition at line 96 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by performSingleStep(), and TestApp().

WindowCollectorChart<float>* mic::application::TestApp::w_reward
private

Window for displaying average reward.

Definition at line 75 of file nArmedBanditsSimpleQlearning.hpp.

Referenced by initialize(), and ~TestApp().


The documentation for this class was generated from the following files: