Ensemble Methods

AutoMLPipeline (AMLP) supports three meta-ensembles: StackEnsemble, VoteEnsemble, and BestLearner. They are called meta-ensembles because they can contain other learners, including other ensembles and meta-ensembles, so hierarchies of arbitrary depth can be built as requirements demand. The most effective way to show their flexibility is through some real examples.

StackEnsemble

StackEnsemble uses the idea of stacking to train learners in two stages. The first stage trains the bottom-level learners on the mapping between input and output, using 70% of the data by default. Once the bottom-level learners finish training, the algorithm proceeds to stage two, which treats the trained learners as transformers. Their outputs are used to train the meta-learner (RandomForest, PrunedTree, or Adaboost) on the remaining 30% of the data.
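The two-stage procedure above can be sketched in plain Julia. This is a hypothetical, illustrative interface (learners are passed as fit functions that return predict closures), not AMLP's internal implementation:

```julia
# Minimal sketch of two-stage stacking (illustrative only, not AMLP internals).
# Each element of `fitters` is a function (X, Y) -> predict, where `predict`
# maps new rows to predictions; `metafit` trains the meta-learner.
function stack_train(fitters, metafit, X, Y; portion = 0.70)
    n = length(Y)
    split = round(Int, portion * n)
    # Stage 1: train bottom-level learners on the first `portion` of the data.
    models = [fit(X[1:split, :], Y[1:split]) for fit in fitters]
    # Stage 2: the trained models act as transformers; their predictions on
    # the held-out data become the features for the meta-learner.
    metaX = hcat([predict(X[split+1:end, :]) for predict in models]...)
    metamodel = metafit(metaX, Y[split+1:end])
    (models, metamodel)
end
```

In AMLP itself the split, transformation, and meta-learner training are handled internally; the sketch only shows why 70% of the data trains the bottom level while 30% trains the stacker.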

The StackEnsemble accepts the following arguments wrapped in a Dictionary type argument:

  • :name -> alias name of the ensemble
  • :learners -> a vector of learners
  • :stacker -> the meta-learner (RandomForest, Adaboost, or PrunedTree)
  • :stacker_training_portion -> fraction of the data used to train the meta-learner
  • :keep_original_features -> boolean (whether to include the original features alongside the features transformed by the bottom-level learners)

While the init function of StackEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • StackEnsemble(Dict(:learners=>...,:stacker=>...))
  • StackEnsemble([learner1,learner2,...],Dict(:stacker=>...))
  • StackEnsemble([learner1,learner2,...],stacker=...)
  • StackEnsemble([learner1,learner2,...])

To illustrate, let's create some bottom-level learners from Scikit-learn and Julia:

using AutoMLPipeline
using DataFrames

gauss = SKLearner("GaussianProcessClassifier")
svc = SKLearner("LinearSVC")
ridge = SKLearner("RidgeClassifier")
jrf = RandomForest() # julia's rf
rfstacker = RandomForest()
stackens = StackEnsemble([gauss,svc,ridge,jrf],stacker=rfstacker)

Let's load a dataset and create a pipeline with stackens as the learner at the end of the pipeline.

using CSV
using Random
Random.seed!(123);

profbdata = CSV.File(joinpath(dirname(pathof(AutoMLPipeline)),"../data/profb.csv")) |> DataFrame
X = profbdata[:,2:end]
Y = profbdata[:,1] |> Vector;

ohe = OneHotEncoder();                    # one-hot encode categorical features
catf = CatFeatureSelector();              # select categorical columns
numf = NumFeatureSelector();              # select numerical columns
rb = SKPreprocessor("RobustScaler");
pt = SKPreprocessor("PowerTransformer");
pca = SKPreprocessor("PCA");
fa = SKPreprocessor("FactorAnalysis");
ica = SKPreprocessor("FastICA");
pplstacks = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                       (catf |> ohe) + (numf |> rb |> fa) |> stackens
julia> crossvalidate(pplstacks,X,Y)
fold: 1, 68.65671641791045
fold: 2, 68.65671641791045
fold: 3, 77.94117647058823
fold: 4, 67.16417910447761
fold: 5, 86.56716417910447
fold: 6, 65.67164179104478
fold: 7, 65.67164179104478
fold: 8, 57.35294117647059
fold: 9, 77.61194029850746
fold: 10, 62.68656716417911
errors: 0
(mean = 69.79806848112379, std = 8.54804430144081, folds = 10, errors = 0)

It is worth noting that the stack ensemble handles a mixture of libraries, combining Julia's RandomForest with Scikit-learn learners.

VoteEnsemble

VoteEnsemble uses a similar idea to StackEnsemble, but instead of stacking it uses voting to obtain the final prediction. The first stage trains the bottom-level learners to learn the mapping between input and output. Once trained on a classification problem, they are treated as transformers, and the final output of the ensemble is the class with the greatest count. This is equivalent to majority voting, where each learner casts one vote based on its predicted class.
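The voting rule itself can be illustrated with a small, self-contained sketch (this shows the majority-vote idea only, not AMLP's implementation):

```julia
# Majority voting across learners (illustrative only, not AMLP internals).
# `predictions` holds one prediction vector per trained learner.
function majority_vote(predictions)
    n = length(first(predictions))
    map(1:n) do i
        votes = [p[i] for p in predictions]      # one vote per learner
        # keep the class predicted by the most learners
        votes[argmax([count(==(v), votes) for v in votes])]
    end
end

majority_vote([["a","b","a"], ["a","b","b"], ["b","b","a"]])  # ["a","b","a"]
```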

The VoteEnsemble accepts the following arguments wrapped in a Dictionary argument:

  • :name -> alias name of ensemble
  • :learners -> a vector of learners

While the init function of VoteEnsemble expects a Dictionary argument, it also supports the following convenience signatures:

  • VoteEnsemble(Dict(:learners=>...,:name=>...))
  • VoteEnsemble([learner1,learner2,...],Dict(:name=>...))
  • VoteEnsemble([learner1,learner2,...],name=...)
  • VoteEnsemble([learner1,learner2,...])

Let's use the same pipeline but substitute the stack ensemble with the vote ensemble:

Random.seed!(123);

votingens = VoteEnsemble([gauss,svc,ridge,jrf]);
pplvote = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> votingens;
julia> crossvalidate(pplvote,X,Y)
fold: 1, 68.65671641791045
fold: 2, 64.17910447761194
fold: 3, 70.58823529411765
fold: 4, 67.16417910447761
fold: 5, 85.07462686567165
fold: 6, 71.64179104477611
fold: 7, 68.65671641791045
fold: 8, 52.94117647058824
fold: 9, 80.59701492537313
fold: 10, 64.17910447761194
errors: 0
(mean = 69.36786654960493, std = 8.875725218323854, folds = 10, errors = 0)

BestLearner

The BestLearner ensemble does not perform any two-stage mapping. Instead, it cross-validates the performance of each learner and uses the best one as the final model. This ensemble can be used to automatically pick the optimal learner from a group of learners based on a given selection criterion.
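The selection rule amounts to scoring every candidate and keeping the winner, which can be sketched in a few lines. The `pick_best` helper and the accuracy numbers below are hypothetical, for illustration only:

```julia
# Sketch of the BestLearner selection rule: score every candidate learner
# (e.g., by its mean cross-validation accuracy) and keep the best one.
# `score` is a user-supplied function; this is not AMLP's internal code.
pick_best(learners, score) = learners[argmax(map(score, learners))]

# Hypothetical mean CV accuracies per learner name:
cvmean = Dict("svc" => 69.0, "ridge" => 71.5, "rf" => 73.7)
best = pick_best(["svc", "ridge", "rf"], l -> cvmean[l])   # "rf"
```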

The BestLearner accepts the following arguments wrapped in a Dictionary argument:

  • :selection_function -> Function
  • :score_type -> Real
  • :partition_generator -> Function
  • :learners -> Vector of learners
  • :name -> alias name of learner
  • :learner_options_grid -> for hyper-parameter search

Aside from a Dictionary argument, BestLearner supports the following convenience signatures:

  • BestLearner(Dict(:learners=>...,:name=>...))
  • BestLearner([learner1,learner2,...],Dict(:name=>...))
  • BestLearner([learner1,learner2,...],name=...)
  • BestLearner([learner1,learner2,...])

Let's use the same pipeline as above but substitute the vote ensemble with the BestLearner ensemble:

Random.seed!(123);

bestens = BestLearner([gauss,svc,ridge,jrf]);
pplbest = @pipeline  (numf |> rb |> pca) + (numf |> rb |> ica) +
                     (catf |> ohe) + (numf |> rb |> fa) |> bestens;
julia> crossvalidate(pplbest,X,Y)
fold: 1, 71.64179104477611
fold: 2, 70.1492537313433
fold: 3, 76.47058823529412
fold: 4, 79.1044776119403
fold: 5, 85.07462686567165
fold: 6, 65.67164179104478
fold: 7, 68.65671641791045
fold: 8, 64.70588235294117
fold: 9, 86.56716417910447
fold: 10, 68.65671641791045
errors: 0
(mean = 73.66988586479367, std = 7.780914318512783, folds = 10, errors = 0)