Index
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit!
AMLPipelineBase.AbsTypes.fit_transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.AbsTypes.transform!
AMLPipelineBase.BaselineModels.Baseline
AMLPipelineBase.BaselineModels.Baseline
AMLPipelineBase.BaselineModels.Identity
AMLPipelineBase.BaselineModels.Identity
AMLPipelineBase.BaseFilters.Imputer
AMLPipelineBase.BaseFilters.OneHotEncoder
AMLPipelineBase.BaseFilters.Wrapper
AMLPipelineBase.BaseFilters.createtransformer
AMLPipelineBase.Pipelines.ComboPipeline
AMLPipelineBase.Pipelines.Pipeline
AMLPipelineBase.Pipelines.Pipeline
AMLPipelineBase.Pipelines.Pipeline
AMLPipelineBase.NARemovers.NARemover
AMLPipelineBase.NARemovers.NARemover
AMLPipelineBase.CrossValidators.crossvalidate
AMLPipelineBase.CrossValidators.crossvalidate
AMLPipelineBase.DecisionTreeLearners.Adaboost
AMLPipelineBase.DecisionTreeLearners.PrunedTree
AMLPipelineBase.DecisionTreeLearners.RandomForest
AMLPipelineBase.EnsembleMethods.BestLearner
AMLPipelineBase.EnsembleMethods.StackEnsemble
AMLPipelineBase.EnsembleMethods.VoteEnsemble
AMLPipelineBase.FeatureSelectors.CatFeatureSelector
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.NumFeatureSelector
AutoMLPipeline.SKLearners.SKLearner
AutoMLPipeline.SKLearners.sklearners
AutoMLPipeline.SKPreprocessors.SKPreprocessor
AMLPipelineBase.Utils.aggregatorclskipmissing
AMLPipelineBase.Utils.createmachine
AMLPipelineBase.Utils.find_catnum_columns
AMLPipelineBase.Utils.holdout
AMLPipelineBase.Utils.infer_eltype
AMLPipelineBase.Utils.kfold
AMLPipelineBase.Utils.nested_dict_merge
AMLPipelineBase.Utils.nested_dict_set!
AMLPipelineBase.Utils.nested_dict_to_tuples
AMLPipelineBase.Utils.score
Descriptions
AMLPipelineBase.AbsTypes.fit!
— Method fit!(mc::Machine, input::DataFrame, output::Vector)
Generic trait to be overloaded by different subtypes of Machine. Multiple dispatch for fit!.
AMLPipelineBase.AbsTypes.fit_transform!
— Function fit_transform!(mc::Machine, input::DataFrame, output::Vector)
Dynamic dispatch that calls fit! and then transform! in sequence.
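A minimal sketch of the calling convention, using OneHotEncoder as an example Machine and toy data (illustrative only):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(gender = ["m", "f", "f", "m"], age = [24, 31, 45, 52])  # toy data
    Y = ["yes", "no", "no", "yes"]

    ohe = OneHotEncoder()
    fit!(ohe, X, Y)                    # train the machine
    Xt = transform!(ohe, X)            # apply it
    Xt2 = fit_transform!(ohe, X, Y)    # equivalent single call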
AMLPipelineBase.AbsTypes.transform!
— Method transform!(mc::Machine, input::DataFrame)
Generic trait to be overloaded by different subtypes of Machine. Multiple dispatch for transform!.
AMLPipelineBase.BaselineModels.Baseline
— Type Baseline(
default_args = Dict(
:name => "baseline",
:output => :class,
:strat => mode
)
)
Baseline model that returns the mode during classification.
AMLPipelineBase.BaselineModels.Baseline
— Method Baseline(name::String,opt...)
Helper function for Baseline.
AMLPipelineBase.BaselineModels.Identity
— Type Identity(args=Dict())
Returns the input as output.
AMLPipelineBase.BaselineModels.Identity
— Method Identity(name::String,opt...)
Helper function for Identity.
AMLPipelineBase.AbsTypes.fit!
— Method fit!(bsl::Baseline,x::DataFrame,y::Vector)
Get the mode of the training data.
AMLPipelineBase.AbsTypes.fit!
— Method fit!(idy::Identity,x::DataFrame,y::Vector)
Does nothing.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(bsl::Baseline,x::DataFrame)
Return the mode in classification.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(idy::Identity,x::DataFrame)
Return the input as output.
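A small illustrative sketch of the two baseline models (toy data assumed):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(f1 = [1.0, 2.0, 3.0, 4.0])   # toy data
    Y = ["a", "b", "b", "b"]

    bsl = Baseline()
    fit!(bsl, X, Y)
    transform!(bsl, X)            # predicts the mode of Y ("b") for every row

    idy = Identity()
    fit_transform!(idy, X, Y)     # returns X unchanged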
AMLPipelineBase.BaseFilters.Imputer
— Type Imputer(
Dict(
# Imputation strategy.
# Statistic that takes a vector such as mean or median.
:strategy => mean
)
)
Imputes NaN values from Float64 features.
Implements fit! and transform!.
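A sketch of imputing NaN entries with the documented default strategy (toy data; labels are not needed here, so an empty vector is passed):

    using AMLPipelineBase
    using DataFrames: DataFrame
    using Statistics: mean

    X = DataFrame(a = [1.0, NaN, 3.0], b = [10.0, 20.0, NaN])   # toy data with NaNs
    imp = Imputer(Dict(:strategy => mean))
    Xclean = fit_transform!(imp, X, [])    # NaNs replaced using each column's mean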
AMLPipelineBase.BaseFilters.OneHotEncoder
— Type OneHotEncoder(Dict(
# Nominal columns
:nominal_columns => Int[],
# Nominal column values map. Key is column index, value is list of
# possible values for that column.
:nominal_column_values_map => Dict{Int,Any}()
))
Transforms instances with nominal features into one-hot form and coerces the instance matrix to element type Float64.
Implements fit! and transform!.
AMLPipelineBase.BaseFilters.Wrapper
— Type Wrapper(
default_args = Dict(
:name => "ohe-wrapper",
# Transformer to call.
:transformer => OneHotEncoder(),
# Transformer args.
:transformer_args => Dict()
)
)
Wraps around a transformer.
Implements fit! and transform!.
AMLPipelineBase.BaseFilters.createtransformer
— Function createtransformer(prototype::Transformer, args=Dict())
Create a transformer from a prototype.
prototype: prototype transformer to base the new transformer on
options: additional options to override the prototype's options
Returns: new transformer.
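A sketch of deriving a new transformer from a prototype while overriding one option (the :strategy override is illustrative; if createtransformer is not exported in your version, qualify it as AMLPipelineBase.BaseFilters.createtransformer):

    using AMLPipelineBase
    using Statistics: mean, median

    proto = Imputer(Dict(:strategy => mean))                            # prototype transformer
    med_imputer = createtransformer(proto, Dict(:strategy => median))   # same transformer, overridden option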
AMLPipelineBase.Pipelines.ComboPipeline
— Type ComboPipeline(machs::Vector{T}) where {T<:Machine}
Feature union pipeline that iteratively calls fit_transform! of each element and concatenates their outputs into one dataframe.
Implements fit! and transform!.
AMLPipelineBase.Pipelines.Pipeline
— Type Pipeline(machs::Vector{<:Machine},args::Dict=Dict())
Linear pipeline that iteratively calls fit_transform! and passes each result to the succeeding element in the pipeline.
Implements fit! and transform!.
AMLPipelineBase.Pipelines.Pipeline
— Method Pipeline(machs::Vararg{Machine})
Helper function for Pipeline structure.
AMLPipelineBase.Pipelines.Pipeline
— Method Pipeline(machs::Vector{<:Machine},args::Dict=Dict())
Helper function for Pipeline structure.
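A sketch combining both pipeline types on toy data: a ComboPipeline builds a feature union of one-hot encoded categorical columns and raw numeric columns, and a linear Pipeline feeds that union to a learner (the RandomForest at the end is illustrative):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(gender = ["m", "f", "f", "m"], age = [24.0, 31.0, 45.0, 52.0])  # toy data
    Y = ["yes", "no", "no", "yes"]

    catf = CatFeatureSelector(); numf = NumFeatureSelector(); ohe = OneHotEncoder()
    features = ComboPipeline([Pipeline([catf, ohe]), numf])   # feature union
    plearner = Pipeline([features, RandomForest()])           # features -> learner
    fit!(plearner, X, Y)
    pred = transform!(plearner, X)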
AMLPipelineBase.NARemovers.NARemover
— Type NARemover(
Dict(
:name => "nadetect",
:acceptance => 0.10 # tolerable NAs percentage
)
)
Removes columns whose proportion of NAs is greater than the acceptance rate. This assumes the input contains only feature columns; the output column should not be part of the input, to avoid it being excluded if it fails the acceptance criteria.
Implements fit! and transform!.
AMLPipelineBase.NARemovers.NARemover
— Method NARemover(acceptance::Float64)
Helper function for NARemover.
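A sketch of dropping a column that exceeds the NA tolerance (toy data; the mostly-missing column b is expected to be removed):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(a = [1.0, 2.0, 3.0, 4.0],
                  b = [missing, missing, missing, 4.0])   # 75% NAs
    nar = NARemover(0.10)                 # tolerate at most 10% NAs per column
    Xkept = fit_transform!(nar, X, [])    # only column a should remain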
AMLPipelineBase.AbsTypes.fit!
— Function fit!(nad::NARemover,features::DataFrame,labels::Vector=[])
Checks the input and exits if the dataframe is empty.
Arguments:
nad::NARemover: custom type
features::DataFrame: input
labels::Vector=[]
AMLPipelineBase.AbsTypes.transform!
— Method transform!(nad::NARemover,nfeatures::DataFrame)
Removes columns whose proportion of NAs exceeds the acceptance rate.
Arguments:
nad::NARemover: custom type
nfeatures::DataFrame: input
AMLPipelineBase.CrossValidators.crossvalidate
— Method crossvalidate(pl::Machine,X::DataFrame,Y::Vector,pfunc::Function,kfolds=10)
Runs K-fold cross-validation where:
pfunc is a performance metric
X and Y are the input and target
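A sketch using the function-metric form on toy data; the assumption here is that pfunc receives the ground-truth and predicted vectors (accuracy is symmetric in its arguments, so the order does not matter for this metric):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(f1 = rand(30), f2 = rand(30))   # toy data
    Y = rand(["a", "b"], 30)

    acc(actual, predicted) = score(:accuracy, actual, predicted)  # assumed argument order
    res = crossvalidate(RandomForest(), X, Y, acc, 3)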
AMLPipelineBase.DecisionTreeLearners.Adaboost
— Type Adaboost(
Dict(
:output => :class,
:num_iterations => 7
)
)
Adaboosted decision tree stumps. See DecisionTree.jl's documentation.
Hyperparameters:
:num_iterations => 7 (number of AdaBoost iterations)
Implements fit! and transform!.
AMLPipelineBase.DecisionTreeLearners.PrunedTree
— Type PrunedTree(
Dict(
:purity_threshold => 1.0,
:max_depth => -1,
:min_samples_leaf => 1,
:min_samples_split => 2,
:min_purity_increase => 0.0
)
)
Decision tree classifier. See DecisionTree.jl's documentation.
Hyperparameters:
:purity_threshold => 1.0 (merge leaves having >= thresh combined purity)
:max_depth => -1 (maximum depth of the decision tree)
:min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
:min_samples_split => 2 (the minimum number of samples needed for a split)
:min_purity_increase => 0.0 (minimum purity needed for a split)
Implements fit! and transform!.
AMLPipelineBase.DecisionTreeLearners.RandomForest
— Type RandomForest(
Dict(
:output => :class,
:num_subfeatures => 0,
:num_trees => 10,
:partial_sampling => 0.7,
:max_depth => -1
)
)
Random forest classification. See DecisionTree.jl's documentation.
Hyperparameters:
:num_subfeatures => 0 (number of features to consider at random per split)
:num_trees => 10 (number of trees to train)
:partial_sampling => 0.7 (fraction of samples to train each tree on)
:max_depth => -1 (maximum depth of the decision trees)
:min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
:min_samples_split => 2 (the minimum number of samples needed for a split)
:min_purity_increase => 0.0 (minimum purity needed for a split)
Implements fit! and transform!.
AMLPipelineBase.AbsTypes.fit!
— Method fit!(adaboost::Adaboost, features::DataFrame, labels::Vector)
Optimize the hyperparameters of Adaboost
instance.
AMLPipelineBase.AbsTypes.fit!
— Method fit!(tree::PrunedTree, features::DataFrame, labels::Vector)
Optimize the hyperparameters of PrunedTree
instance.
AMLPipelineBase.AbsTypes.fit!
— Method fit!(forest::RandomForest, features::DataFrame, labels::Vector)
Optimize the parameters of the RandomForest
instance.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(adaboost::Adaboost, features::DataFrame)
Predict using the optimized hyperparameters of the trained Adaboost
instance.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(ptree::PrunedTree, features::DataFrame)
Predict using the optimized hyperparameters of the trained PrunedTree
instance.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(forest::RandomForest, features::DataFrame)
Predict using the optimized hyperparameters of the trained RandomForest
instance.
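A sketch of training and predicting with the DecisionTree.jl-backed learners on toy data (all three follow the same fit!/transform! pattern; hyperparameter values are illustrative):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(x1 = rand(20), x2 = rand(20))   # toy data
    Y = rand(["pos", "neg"], 20)

    tree = PrunedTree(Dict(:max_depth => 3))
    fit!(tree, X, Y)
    preds = transform!(tree, X)

    rf = RandomForest(Dict(:num_trees => 20))
    preds_rf = fit_transform!(rf, X, Y)       # fit! then transform! in one call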
AMLPipelineBase.EnsembleMethods.BestLearner
— Type BestLearner(
Dict(
# Output to train against
# (:class).
:output => :class,
# Function to return partitions of instance indices.
:partition_generator => (instances, labels) -> kfold(size(instances, 1), 5),
# Function that selects the best learner by index.
# Arg learner_partition_scores is a (learner, partition) score matrix.
:selection_function => (learner_partition_scores) -> findmax(mean(learner_partition_scores, dims=2))[2],
# Score type returned by score() using respective output.
:score_type => Real,
# Candidate learners.
:learners => [PrunedTree(), Adaboost(), RandomForest()],
# Options grid for learners, to search through by BestLearner.
# Format is [learner_1_options, learner_2_options, ...]
# where learner_options is same as a learner's options but
# with a list of values instead of scalar.
:learner_options_grid => nothing
)
)
Selects best learner from the set by performing a grid search on learners if grid option is indicated.
AMLPipelineBase.EnsembleMethods.StackEnsemble
— Type StackEnsemble(
Dict(
# Output to train against
# (:class).
:output => :class,
# Set of learners that produce feature space for stacker.
:learners => [PrunedTree(), Adaboost(), RandomForest()],
# Machine learner that trains on set of learners' outputs.
:stacker => RandomForest(),
# Proportion of training set left to train stacker itself.
:stacker_training_proportion => 0.3,
# Provide original features on top of learner outputs to stacker.
:keep_original_features => false
)
)
An ensemble where a 'stack' of learners is used for training and prediction.
AMLPipelineBase.EnsembleMethods.VoteEnsemble
— Type VoteEnsemble(
Dict(
# Output to train against
# (:class).
:output => :class,
# Learners in voting committee.
:learners => [PrunedTree(), Adaboost(), RandomForest()]
)
)
Set of machine learners employing majority vote to decide prediction.
Implements fit! and transform!.
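A sketch running the three ensembles with their documented default committees (PrunedTree, Adaboost, RandomForest) on toy data:

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(x1 = rand(30), x2 = rand(30))   # toy data
    Y = rand(["yes", "no"], 30)

    vote_pred  = fit_transform!(VoteEnsemble(), X, Y)
    stack_pred = fit_transform!(StackEnsemble(), X, Y)
    best_pred  = fit_transform!(BestLearner(), X, Y)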
AMLPipelineBase.AbsTypes.fit!
— Method fit!(bls::BestLearner, instances::DataFrame, labels::Vector)
Training phase:
- obtain learners as is if grid option is not present
- generate learners if grid option is present
- for each prototype learner, generate learners with the specific options found in the grid
- generate partitions
- train each learner on each partition and obtain validation output
AMLPipelineBase.AbsTypes.fit!
— Method fit!(se::StackEnsemble, instances::DataFrame, labels::Vector)
Training phase of the stack of learners.
- perform holdout to obtain indices for partitioning learner and stacker training sets
- partition training set for learners and stacker
- train all learners
- train stacker on learners' outputs
- build final model from the trained learners
AMLPipelineBase.AbsTypes.fit!
— Method fit!(ve::VoteEnsemble, instances::DataFrame, labels::Vector)
Training phase of the ensemble.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(bls::BestLearner, instances::DataFrame)
Choose the best learner based on cross-validation results and use it for prediction.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(se::StackEnsemble, instances::DataFrame)
Build stacker instances and predict.
AMLPipelineBase.AbsTypes.transform!
— Method transform!(ve::VoteEnsemble, instances::DataFrame)
Prediction phase of the ensemble.
AMLPipelineBase.FeatureSelectors.CatFeatureSelector
— Type CatFeatureSelector(Dict(:name => "catf"))
Automatically extract categorical columns based on inferred element types.
Implements fit! and transform!.
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
— Type CatNumDiscriminator(
Dict(
:name => "catnumdisc",
:maxcategories => 24
)
)
Transforms numeric columns to string (treating them as categories) if the count of their unique elements is <= maxcategories.
Implements fit! and transform!.
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
— Method CatNumDiscriminator(maxcat::Int)
Helper function for CatNumDiscriminator.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Type FeatureSelector(
Dict(
:name => "featureselector",
:columns => [col1, col2, ...]
)
)
Returns a dataframe of the selected columns.
Implements fit! and transform!.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Method FeatureSelector(cols::Vararg{Int})
Helper function for FeatureSelector.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Method FeatureSelector(cols::Vector{Int})
Helper function for FeatureSelector.
AMLPipelineBase.FeatureSelectors.NumFeatureSelector
— Type NumFeatureSelector(Dict(:name=>"numfeatsel"))
Automatically extracts numeric features based on their inferred element types.
Implements fit! and transform!.
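A sketch of the selectors working together on toy data: CatNumDiscriminator first recodes the low-cardinality numeric column as categorical, then the selectors split the features; FeatureSelector picks columns by index (the indices are illustrative):

    using AMLPipelineBase
    using DataFrames: DataFrame

    X = DataFrame(code   = [1, 2, 1, 2, 1],                 # numeric but categorical in nature
                  amount = [10.5, 3.2, 7.7, 1.1, 9.9],
                  city   = ["NY", "SF", "NY", "LA", "SF"])  # toy data

    disc = CatNumDiscriminator(4)   # numeric columns with <= 4 unique values become strings
    cats = fit_transform!(Pipeline([disc, CatFeatureSelector()]), X, [])  # code and city
    nums = fit_transform!(Pipeline([disc, NumFeatureSelector()]), X, [])  # amount only
    first_two = fit_transform!(FeatureSelector([1, 2]), X, [])            # columns 1 and 2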
AutoMLPipeline.SKLearners.SKLearner
— Type SKLearner(learner::String, args::Dict=Dict())
A Scikit-learn wrapper to load the different machine learning models. Invoking sklearners() will list the available learners. Please consult the Scikit-learn documentation for arguments to pass.
Implements fit! and transform!.
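A sketch wrapping a Scikit-learn classifier (requires AutoMLPipeline's Python/Scikit-learn dependency to be available; the learner name is from the sklearners() list, and the toy data is illustrative):

    using AutoMLPipeline
    using DataFrames: DataFrame

    X = DataFrame(x1 = rand(30), x2 = rand(30))   # toy data
    Y = rand(["a", "b"], 30)

    rf = SKLearner("RandomForestClassifier")
    fit!(rf, X, Y)
    preds = transform!(rf, X)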
AutoMLPipeline.SKLearners.sklearners
— Method sklearners()
List the available Scikit-learn machine learners.
AutoMLPipeline.SKPreprocessors.SKPreprocessor
— Type SKPreprocessor(preprocessor::String,args::Dict=Dict())
A wrapper for Scikit-learn preprocessor functions. Invoking skpreprocessors() will list the acceptable and supported functions. Please check the Scikit-learn documentation for arguments to pass.
Implements fit! and transform!.
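A sketch placing a Scikit-learn preprocessor inside a pipeline (PCA appears in the skpreprocessors() list; Scikit-learn must be available, and the toy data is illustrative):

    using AutoMLPipeline
    using DataFrames: DataFrame

    X = DataFrame(x1 = rand(30), x2 = rand(30), x3 = rand(30))   # toy data
    Y = rand(["a", "b"], 30)

    pca = SKPreprocessor("PCA")
    pl  = Pipeline([NumFeatureSelector(), pca, SKLearner("RandomForestClassifier")])
    pred = fit_transform!(pl, X, Y)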
AMLPipelineBase.CrossValidators.crossvalidate
— Method crossvalidate(pl::Machine,X::DataFrame,Y::Vector,sfunc::String="balanced_accuracy_score",nfolds=10)
Runs K-fold cross-validation using balanced accuracy as the default. It supports the following metrics for classification:
- accuracy_score
- balanced_accuracy_score
- cohen_kappa_score
- jaccard_score
- matthews_corrcoef
- hamming_loss
- zero_one_loss
- f1_score
- precision_score
- recall_score
and the following metrics for regression:
- mean_squared_error
- mean_squared_log_error
- median_absolute_error
- r2_score
- max_error
- explained_variance_score
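A sketch using one of the named classification metrics (3 folds keep the toy run short; the pipeline composition is illustrative):

    using AutoMLPipeline
    using DataFrames: DataFrame

    X = DataFrame(x1 = rand(30), x2 = rand(30))   # toy data
    Y = rand(["a", "b"], 30)

    pl = Pipeline([NumFeatureSelector(), RandomForest()])
    res = crossvalidate(pl, X, Y, "accuracy_score", 3)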
AMLPipelineBase.Utils.aggregatorclskipmissing
— Method aggregatorclskipmissing(fn::Function)
Creates an aggregator closure that skips missing values.
AMLPipelineBase.Utils.createmachine
— Function createmachine(prototype::Machine, options=nothing)
Create a machine from a prototype.
prototype: prototype machine to base the new machine on
args: additional options to override the prototype's options
Returns: new machine
AMLPipelineBase.Utils.find_catnum_columns
— Function find_catnum_columns(instances::DataFrame,maxuniqcat::Int=0)
Finds all categorical and numerical columns. Categorical columns are those whose element type is not Real and whose elements do not all correspond to Real values. Columns whose number of unique values is less than maxuniqcat are also considered categorical.
AMLPipelineBase.Utils.holdout
— Method holdout(n, right_prop)
Holdout method that partitions a collection into two partitions.
n: size of collection to partition
right_prop: percentage of the collection placed in the right partition
Returns: two partitions of indices, left and right
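A sketch splitting 100 indices roughly 70/30 (the variable names are illustrative; the return is assumed to destructure into the left and right index sets as documented):

    using AMLPipelineBase

    train_idx, test_idx = holdout(100, 0.30)   # left gets about 70 indices, right about 30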
AMLPipelineBase.Utils.infer_eltype
— Method infer_eltype(vector::Vector)
Returns the element type of the vector unless it is Any. If Any, returns the most specific type that can be inferred from the vector elements.
vector: vector to infer the element type of
Returns: inferred element type
AMLPipelineBase.Utils.kfold
— Method kfold(num_instances, num_partitions)
Returns k-fold partitions.
num_instances: total number of instances
num_partitions: number of partitions required
Returns: training set partitions
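A sketch producing 5 training partitions over 30 instances:

    using AMLPipelineBase

    folds = kfold(30, 5)    # expected: a collection of 5 training-index partitions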
AMLPipelineBase.Utils.nested_dict_merge
— Method nested_dict_merge(first::Dict, second::Dict)
Merges the second nested dictionary into the first.
If a value in the second dictionary and the corresponding value in the first are both dictionaries, the two inner dictionaries are merged. Otherwise the second's value overrides the first's.
first: first nested dictionary
second: second nested dictionary
Returns: merged nested dictionary
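A sketch of the merge behaviour on nested options (the keys are illustrative):

    using AMLPipelineBase

    first_d  = Dict(:name => "model", :impl => Dict(:depth => 5, :trees => 10))
    second_d = Dict(:impl => Dict(:trees => 100))
    merged   = nested_dict_merge(first_d, second_d)
    # expected: merged[:impl] == Dict(:depth => 5, :trees => 100), :name kept from first_d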
AMLPipelineBase.Utils.nested_dict_set!
— Method nested_dict_set!(dict::Dict, keys::Array{T, 1}, value) where {T}
Set a value in a nested dictionary.
dict: nested dictionary to assign the value in
keys: keys to access the nested dictionaries in sequence
value: value to assign
AMLPipelineBase.Utils.nested_dict_to_tuples
— Method nested_dict_to_tuples(dict::Dict)
Converts a nested dictionary to a list of tuples.
dict: dictionary that can have other dictionaries as values
Returns: list where elements are ([outer-key, inner-key, ...], value)
AMLPipelineBase.Utils.score
— Method score(metric::Symbol, actual::Vector, predicted::Vector)
Score learner predictions against ground truth values.
Available metrics: :accuracy
metric: metric to assess with
actual: ground truth values
predicted: predicted values
Returns: score of the learner
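A sketch scoring predictions with the :accuracy metric (toy vectors; the exact scale of the returned value depends on the implementation):

    using AMLPipelineBase

    actual    = ["a", "b", "a", "a"]
    predicted = ["a", "b", "b", "a"]
    acc = score(:accuracy, actual, predicted)   # 3 of 4 predictions match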