Ensemble Methods

AutoMLPipeline (AMLP) supports three types of meta-ensembles: StackEnsemble, VoteEnsemble, and BestLearner. They are considered meta-ensembles because they can contain other learners, including other ensembles and even other meta-ensembles, supporting arbitrarily deep hierarchies depending on the requirements. The most effective way to show their flexibility is through some real examples.
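
Because meta-ensembles are themselves learners, they can be nested. Here is a minimal sketch (the particular composition is illustrative, using learners exported by the package):

using AutoMLPipeline

# a VoteEnsemble nested inside a StackEnsemble
inner  = VoteEnsemble([PrunedTree(), Adaboost()])
nested = StackEnsemble([inner, RandomForest()])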

StackEnsemble

The stack ensemble uses the idea of stacking to train learners in two stages. The first stage trains the bottom-level learners on the mapping between input and output, using 70% of the data by default. Once the bottom-level learners finish training, the algorithm proceeds to stage two, which treats the trained learners as transformers. The output of these transformers is used to train the meta-learner (RandomForest, PrunedTree, or Adaboost) on the remaining 30% of the data.

The StackEnsemble accepts the following arguments wrapped in a Dictionary argument (a constructor sketch follows the list):

  • :name -> alias name of ensemble
  • :learners -> a vector of learners
  • :stacker -> the meta-learner (RandomForest, Adaboost, or PrunedTree)
  • :stacker_training_portion -> percentage of data for the meta-learner
  • :keep_original_features -> boolean (whether the original data is included together with the transformed data by the bottom-level learners)
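
For example, a fully specified constructor call might look like the following sketch (the 0.30 value for :stacker_training_portion mirrors the 30% default described above; the learner mix and other values are illustrative):

fullstack = StackEnsemble(Dict(
    :name => "mystack",
    :learners => [RandomForest(), PrunedTree(), Adaboost()],
    :stacker => RandomForest(),
    :stacker_training_portion => 0.30,
    :keep_original_features => false
))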

While the constructor of StackEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • StackEnsemble(Dict(:learners=>...,:stacker=>...))
  • StackEnsemble([learner1,learner2,...],Dict(:stacker=>...))
  • StackEnsemble([learner1,learner2,...])
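
As a quick sketch of these signatures, the following three calls build comparable ensembles (the learner choices are illustrative; the last form falls back to a default stacker):

s1 = StackEnsemble(Dict(:learners => [RandomForest(), PrunedTree()], :stacker => RandomForest()))
s2 = StackEnsemble([RandomForest(), PrunedTree()], Dict(:stacker => RandomForest()))
s3 = StackEnsemble([RandomForest(), PrunedTree()])   # uses the default stacker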

To illustrate, let's create some bottom-level learners from Scikit-learn and Julia:

using AutoMLPipeline

# bottom-level learners: three Scikit-learn classifiers plus Julia's random forest
gauss = SKLearner("GaussianProcessClassifier")
svc   = SKLearner("LinearSVC")
ridge = SKLearner("RidgeClassifier")
jrf   = RandomForest()      # Julia's random forest
rfstacker = RandomForest()  # meta-learner for the stacking stage
stackens  = StackEnsemble([gauss,svc,ridge,jrf],Dict(:stacker=>rfstacker))

Let's load a dataset and create a pipeline with stackens as the learner at the end of the pipeline.

using CSV
using DataFrames
using Random
Random.seed!(123);

profbdata = CSV.read(joinpath(dirname(pathof(AutoMLPipeline)),"../data/profb.csv"), DataFrame)
X = profbdata[:,2:end]
Y = profbdata[:,1] |> Vector;

# feature selectors and Scikit-learn preprocessors
ohe  = OneHotEncoder()
catf = CatFeatureSelector()
numf = NumFeatureSelector()
rb   = SKPreprocessor("RobustScaler")
pt   = SKPreprocessor("PowerTransformer")
pca  = SKPreprocessor("PCA")
fa   = SKPreprocessor("FactorAnalysis")
ica  = SKPreprocessor("FastICA")
pplstacks = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                       (catf |> ohe) + (numf |> rb |> fa) |> stackens
julia> crossvalidate(pplstacks,X,Y)
fold: 1, 0.6716417910447762
fold: 2, 0.6911764705882353
fold: 3, 0.6363636363636364
fold: 4, 0.6617647058823529
fold: 5, 0.8208955223880597
fold: 6, 0.7313432835820896
fold: 7, 0.6764705882352942
fold: 8, 0.6666666666666666
fold: 9, 0.7794117647058824
fold: 10, 0.746268656716418
errors: 0
(mean = 0.7082003086173411, std = 0.05909597337447694, folds = 10, errors = 0)

It is worth noting that the stack ensemble handles a mixture of libraries, combining Julia's RandomForest with Scikit-learn learners.

VoteEnsemble

The vote ensemble uses an idea similar to the stack ensemble's, but instead of stacking it uses voting to obtain the final prediction. The first stage trains the bottom-level learners to learn the mapping between input and output. Once trained on a classification problem, they are treated as transformers, and the final output of the ensemble is the class with the greatest count. This is equivalent to majority voting, where each learner casts one vote based on its predicted output class.
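
To make the voting step concrete, here is a minimal plain-Julia sketch of majority voting over hypothetical class predictions (illustrative only, not the library's internals):

votes  = ["home", "away", "home"]                                # one predicted class per learner
counts = Dict(c => count(==(c), votes) for c in unique(votes))   # tally votes per class
winner = first(sort(collect(counts); by = last, rev = true))[1]  # class with the most votes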

The VoteEnsemble accepts the following arguments wrapped in a Dictionary argument:

  • :name -> alias name of ensemble
  • :learners -> a vector of learners

While the constructor of VoteEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • VoteEnsemble(Dict(:learners=>...,:name=>...))
  • VoteEnsemble([learner1,learner2,...],Dict(:name=>...))
  • VoteEnsemble([learner1,learner2,...])

Let's use the same pipeline but replace the stack ensemble with the vote ensemble:

Random.seed!(123);

votingens = VoteEnsemble([gauss,svc,ridge,jrf]);
pplvote = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> votingens;
julia> crossvalidate(pplvote,X,Y)
fold: 1, 0.7014925373134329
fold: 2, 0.6176470588235294
fold: 3, 0.6515151515151515
fold: 4, 0.7352941176470589
fold: 5, 0.7014925373134329
fold: 6, 0.6716417910447762
fold: 7, 0.6764705882352942
fold: 8, 0.7121212121212122
fold: 9, 0.7205882352941176
fold: 10, 0.6865671641791045
errors: 0
(mean = 0.687483039348711, std = 0.03484128460517446, folds = 10, errors = 0)

BestLearner

The BestLearner ensemble does not perform any two-stage mapping. Instead, it cross-validates the performance of each learner and uses the best-performing one as the final model. This ensemble can be used to automatically pick the optimal learner from a group of learners based on a given selection criterion.
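
Conceptually, the selection works like the following sketch (plain Julia, not the library's internals; it assumes the learners, X, and Y from the examples above and that each learner can consume the features directly):

learners = [gauss, svc, ridge, jrf]
scores   = map(lr -> crossvalidate(lr, X, Y).mean, learners)   # mean cross-validated score per learner
best     = learners[argmax(scores)]                            # keep the top scorer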

The BestLearner accepts the following arguments wrapped in a Dictionary argument:

  • :selection_function -> Function
  • :score_type -> Real
  • :partition_generator -> Function
  • :learners -> Vector of learners
  • :name -> alias name of learner
  • :learner_options_grid -> for hyper-parameter search

Aside from the Dictionary argument, BestLearner also supports the following convenience signatures:

  • BestLearner(Dict(:learners=>...,:name=>...))
  • BestLearner([learner1,learner2,...],Dict(:name=>...))
  • BestLearner([learner1,learner2,...])

Let's use the same pipeline as above but replace the vote ensemble with the BestLearner ensemble:

Random.seed!(123);

bestens = BestLearner([gauss,svc,ridge,jrf]);
pplbest = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> bestens;
julia> crossvalidate(pplbest,X,Y)
fold: 1, 0.7313432835820896
fold: 2, 0.7058823529411765
fold: 3, 0.6818181818181818
fold: 4, 0.7352941176470589
fold: 5, 0.7313432835820896
fold: 6, 0.7611940298507462
fold: 7, 0.8235294117647058
fold: 8, 0.696969696969697
fold: 9, 0.7205882352941176
fold: 10, 0.7611940298507462
errors: 0
(mean = 0.734915662330061, std = 0.04023074520758086, folds = 10, errors = 0)