Ensemble Methods

AutoMLPipeline (AMLP) supports three meta-ensembles: StackEnsemble, VoteEnsemble, and BestLearner. They are called meta-ensembles because they can contain other learners, including other ensembles and meta-ensembles, so hierarchies of arbitrary depth can be built as requirements demand. The most effective way to show their flexibility is through some real examples.

StackEnsemble

StackEnsemble uses the idea of stacking to train learners in two stages. The first stage trains the bottom-level learners on the mapping between input and output, using 70% of the data by default. Once the bottom-level learners finish training, the algorithm proceeds to stage two, which treats the trained learners as transformers. Their outputs are used to train the meta-learner (RandomForest, PrunedTree, or Adaboost) on the remaining 30% of the data.
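The two-stage procedure above can be sketched in plain Julia. This is a hypothetical, illustrative interface (learners are passed as fit functions that return predict closures), not AMLP's internal implementation:

```julia
# Minimal sketch of two-stage stacking (illustrative only, not AMLP internals).
# Each element of `fitters` is a function (X, Y) -> predict, where `predict`
# maps new rows to predictions; `metafit` trains the meta-learner.
function stack_train(fitters, metafit, X, Y; portion = 0.70)
    n = length(Y)
    split = round(Int, portion * n)
    # Stage 1: train bottom-level learners on the first `portion` of the data.
    models = [fit(X[1:split, :], Y[1:split]) for fit in fitters]
    # Stage 2: the trained models act as transformers; their predictions on
    # the held-out data become the features for the meta-learner.
    metaX = hcat([predict(X[split+1:end, :]) for predict in models]...)
    metamodel = metafit(metaX, Y[split+1:end])
    (models, metamodel)
end
```

In AMLP itself the split, transformation, and meta-learner training are handled internally; the sketch only shows why 70% of the data trains the bottom level while 30% trains the stacker.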

The StackEnsemble accepts the following arguments wrapped in a Dictionary type argument:

  • :name -> alias name of the ensemble
  • :learners -> a vector of learners
  • :stacker -> the meta-learner (RandomForest, Adaboost, or PrunedTree)
  • :stacker_training_portion -> fraction of the data used to train the meta-learner
  • :keep_original_features -> boolean (whether to include the original features alongside the features transformed by the bottom-level learners)

While the init function of StackEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • StackEnsemble(Dict(:learners=>...,:stacker=>...))
  • StackEnsemble([learner1,learner2,...],Dict(:stacker=>...))
  • StackEnsemble([learner1,learner2,...],stacker=...)
  • StackEnsemble([learner1,learner2,...])

To illustrate, let's create some bottom-level learners from Scikit-learn and Julia:

using AutoMLPipeline
using DataFrames

gauss = SKLearner("GaussianProcessClassifier")
svc = SKLearner("LinearSVC")
ridge = SKLearner("RidgeClassifier")
jrf = RandomForest() # julia's rf
rfstacker = RandomForest()
stackens = StackEnsemble([gauss,svc,ridge,jrf],stacker=rfstacker)

Let's load a dataset and create a pipeline with stackens as the learner at the end of the pipeline.

using CSV
using Random
Random.seed!(123);

profbdata = CSV.File(joinpath(dirname(pathof(AutoMLPipeline)),"../data/profb.csv")) |> DataFrame
X = profbdata[:,2:end]
Y = profbdata[:,1] |> Vector;

ohe = OneHotEncoder();                    # one-hot encode categorical features
catf = CatFeatureSelector();              # select categorical columns
numf = NumFeatureSelector();              # select numerical columns
rb = SKPreprocessor("RobustScaler");
pt = SKPreprocessor("PowerTransformer");
pca = SKPreprocessor("PCA");
fa = SKPreprocessor("FactorAnalysis");
ica = SKPreprocessor("FastICA");
pplstacks = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                       (catf |> ohe) + (numf |> rb |> fa) |> stackens
julia> crossvalidate(pplstacks,X,Y)
fold: 1, 68.65671641791045
fold: 2, 68.65671641791045
fold: 3, 77.94117647058823
fold: 4, 67.16417910447761
fold: 5, 86.56716417910447
fold: 6, 65.67164179104478
fold: 7, 65.67164179104478
fold: 8, 57.35294117647059
fold: 9, 77.61194029850746
fold: 10, 62.68656716417911
errors: 0
(mean = 69.79806848112379, std = 8.54804430144081, folds = 10, errors = 0)

It is worth noting that the stack ensemble handles a mixture of libraries, combining Julia's RandomForest with Scikit-learn learners.

VoteEnsemble

VoteEnsemble uses a similar idea to StackEnsemble, but instead of stacking it uses voting to obtain the final prediction. The first stage trains the bottom-level learners to learn the mapping between input and output. Once trained on a classification problem, they are treated as transformers, and the final output of the ensemble is the class with the greatest count. This is equivalent to majority voting, where each learner casts one vote based on its predicted class.
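The voting rule itself can be illustrated with a small, self-contained sketch (this shows the majority-vote idea only, not AMLP's implementation):

```julia
# Majority voting across learners (illustrative only, not AMLP internals).
# `predictions` holds one prediction vector per trained learner.
function majority_vote(predictions)
    n = length(first(predictions))
    map(1:n) do i
        votes = [p[i] for p in predictions]      # one vote per learner
        # keep the class predicted by the most learners
        votes[argmax([count(==(v), votes) for v in votes])]
    end
end

majority_vote([["a","b","a"], ["a","b","b"], ["b","b","a"]])  # ["a","b","a"]
```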

The VoteEnsemble accepts the following arguments wrapped in a Dictionary argument:

  • :name -> alias name of ensemble
  • :learners -> a vector of learners

While the init function of VoteEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • VoteEnsemble(Dict(:learners=>...,:name=>...))
  • VoteEnsemble([learner1,learner2,...],Dict(:name=>...))
  • VoteEnsemble([learner1,learner2,...],name=...)
  • VoteEnsemble([learner1,learner2,...])

Let's use the same pipeline but substitute the stack ensemble with the vote ensemble:

Random.seed!(123);

votingens = VoteEnsemble([gauss,svc,ridge,jrf]);
pplvote = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> votingens;
julia> crossvalidate(pplvote,X,Y)
fold: 1, 68.65671641791045
fold: 2, 64.17910447761194
fold: 3, 70.58823529411765
fold: 4, 67.16417910447761
fold: 5, 85.07462686567165
fold: 6, 71.64179104477611
fold: 7, 68.65671641791045
fold: 8, 52.94117647058824
fold: 9, 80.59701492537313
fold: 10, 64.17910447761194
errors: 0
(mean = 69.36786654960493, std = 8.875725218323854, folds = 10, errors = 0)

BestLearner

The BestLearner ensemble does not perform any two-stage mapping. Instead, it cross-validates the performance of each learner and uses the best one as the final model. This ensemble can be used to automatically pick the optimal learner from a group of learners based on a given selection criterion.
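The selection rule amounts to scoring every candidate and keeping the winner, which can be sketched in a few lines. The `pick_best` helper and the accuracy numbers below are hypothetical, for illustration only:

```julia
# Sketch of the BestLearner selection rule: score every candidate learner
# (e.g., by its mean cross-validation accuracy) and keep the best one.
# `score` is a user-supplied function; this is not AMLP's internal code.
pick_best(learners, score) = learners[argmax(map(score, learners))]

# Hypothetical mean CV accuracies per learner name:
cvmean = Dict("svc" => 69.0, "ridge" => 71.5, "rf" => 73.7)
best = pick_best(["svc", "ridge", "rf"], l -> cvmean[l])   # "rf"
```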

The BestLearner accepts the following arguments wrapped in a Dictionary argument:

  • :selection_function -> Function
  • :score_type -> Real
  • :partition_generator -> Function
  • :learners -> Vector of learners
  • :name -> alias name of learner
  • :learner_options_grid -> for hyper-parameter search

Aside from a Dictionary argument, BestLearner supports the following convenience signatures:

  • BestLearner(Dict(:learners=>...,:name=>...))
  • BestLearner([learner1,learner2,...],Dict(:name=>...))
  • BestLearner([learner1,learner2,...],name=...)
  • BestLearner([learner1,learner2,...])

Let's use the same pipeline as above but substitute the vote ensemble with the BestLearner ensemble:

Random.seed!(123);

bestens = BestLearner([gauss,svc,ridge,jrf]);
pplbest = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> bestens;
julia> crossvalidate(pplbest,X,Y)
fold: 1, 71.64179104477611
fold: 2, 70.1492537313433
fold: 3, 76.47058823529412
fold: 4, 79.1044776119403
fold: 5, 85.07462686567165
fold: 6, 65.67164179104478
fold: 7, 68.65671641791045
fold: 8, 64.70588235294117
fold: 9, 86.56716417910447
fold: 10, 68.65671641791045
errors: 0
(mean = 73.66988586479367, std = 7.780914318512783, folds = 10, errors = 0)