Index

Descriptions

AMLPipelineBase.AbsTypes.fit! - Method

fit!(mc::Machine, input::DataFrame, output::Vector)

Generic function to be overloaded for each subtype of Machine. Multiple dispatch selects the matching fit! method.

AMLPipelineBase.BaseFilters.Imputer - Type
Imputer(
   Dict(
      # Imputation strategy.
      # Statistic that takes a vector such as mean or median.
      :strategy => mean
   )
)

Imputes NaN values in Float64 features using the statistic given by :strategy.

Implements fit! and transform.
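
The idea of imputing with a statistic can be sketched without the library. The helper name impute_nan below is hypothetical and is not part of AMLPipelineBase; it only assumes that the strategy is a function taking a vector, as the :strategy comment above describes.

```julia
using Statistics

# Hypothetical helper (not the library's implementation): replace NaN entries
# with a statistic computed over the non-NaN entries of the same column.
function impute_nan(col::Vector{Float64}; strategy=mean)
    observed = filter(!isnan, col)
    # Nothing to compute the statistic from, so return the column unchanged.
    isempty(observed) && return copy(col)
    fill_value = strategy(observed)
    return [isnan(x) ? fill_value : x for x in col]
end

imputed_mean = impute_nan([1.0, NaN, 3.0])
imputed_median = impute_nan([1.0, NaN, 3.0, 10.0]; strategy=median)
```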

AMLPipelineBase.BaseFilters.OneHotEncoder - Type
OneHotEncoder(Dict(
   # Nominal columns
   :nominal_columns => Int[],

   # Nominal column values map. Key is column index, value is list of
   # possible values for that column.
   :nominal_column_values_map => Dict{Int,Any}()
))

Transforms instances with nominal features into one-hot form and coerces the instance matrix to be of element type Float64.

Implements fit! and transform.
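
As a rough illustration of one-hot form, the sketch below encodes a single nominal column against a known list of levels, similar in spirit to one entry of :nominal_column_values_map. The function one_hot is a hypothetical stand-in, not the encoder's actual code.

```julia
# Hypothetical helper: one row per value, one Float64 indicator column per level.
function one_hot(values::Vector, levels::Vector)
    encoded = zeros(Float64, length(values), length(levels))
    for (row, value) in enumerate(values)
        col = findfirst(==(value), levels)
        col === nothing && error("unknown nominal value: $value")
        encoded[row, col] = 1.0
    end
    return encoded
end

encoded = one_hot(["a", "b", "a"], ["a", "b"])
```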

AMLPipelineBase.BaseFilters.Wrapper - Type
Wrapper(
   default_args = Dict(
      :name => "ohe-wrapper",
      # Transformer to call.
      :transformer => OneHotEncoder(),
      # Transformer args.
      :transformer_args => Dict()
   )
)

Wraps around a transformer.

Implements fit! and transform.

AMLPipelineBase.BaseFilters.createtransformer - Function
createtransformer(prototype::Transformer, args=Dict())

Creates a transformer.

  • prototype: prototype transformer to base new transformer on
  • args: additional options to override prototype's options

Returns: new transformer.

AMLPipelineBase.Pipelines.ComboPipeline - Type
ComboPipeline(machs::Vector{T}) where {T<:Machine}

Feature union pipeline which iteratively calls fit_transform on each element and concatenates their outputs into one DataFrame.

Implements fit! and transform!.
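
A feature union can be pictured as applying every stage to the same input and joining the results column-wise. The sketch below uses plain functions on matrices as stand-ins for fitted machines; feature_union, center, and square are hypothetical names, not library API.

```julia
# Stand-ins for machines: each maps a feature matrix to a new feature matrix.
center(X) = X .- sum(X, dims=1) ./ size(X, 1)
square(X) = X .^ 2

# Apply every stage to the same input and concatenate the outputs by column.
feature_union(stages, X) = reduce(hcat, [stage(X) for stage in stages])

X = [1.0 2.0; 3.0 4.0]
combined = feature_union([center, square], X)
```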

AMLPipelineBase.Pipelines.Pipeline - Type
Pipeline(machs::Vector{<:Machine},args::Dict=Dict())

Linear pipeline which iteratively calls and passes the result of fit_transform to the succeeding elements in the pipeline.

Implements fit! and transform!.
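
The linear chaining described above amounts to a left fold over the stages. This is a minimal sketch with plain functions standing in for machines; linear_pipeline, double, and increment are hypothetical names.

```julia
# Feed the output of each stage into the next one, starting from the input.
linear_pipeline(stages, X) = foldl((data, stage) -> stage(data), stages; init=X)

double(X) = 2 .* X
increment(X) = X .+ 1

# Computes increment(double(x)) for each element.
result = linear_pipeline([double, increment], [1.0, 2.0])
```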

AMLPipelineBase.NARemovers.NARemover - Type
NARemover(
  Dict(
    :name => "nadetect",
    :acceptance => 0.10 # tolerable NAs percentage
  )
)

Removes columns whose proportion of NAs exceeds the acceptance rate. This assumes that it processes columns of features. The output column should not be part of the input, to avoid it being excluded if it fails the acceptance criteria.

Implements fit! and transform!.
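
The acceptance rule can be sketched on a plain matrix. This assumes NAs are represented by missing and that a column is kept when its NA fraction does not exceed the acceptance rate; columns_to_keep is a hypothetical helper, not the library's code.

```julia
# Return the indices of columns whose fraction of missing values is tolerable.
function columns_to_keep(X::AbstractMatrix, acceptance::Real)
    na_fraction = [count(ismissing, col) / length(col) for col in eachcol(X)]
    return findall(<=(acceptance), na_fraction)
end

X = [1.0 missing 3.0;
     2.0 missing 4.0;
     3.0 5.0     missing;
     4.0 6.0     7.0]

# NA fractions per column are 0.0, 0.5 and 0.25.
kept = columns_to_keep(X, 0.3)
filtered = X[:, kept]
```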

AMLPipelineBase.AbsTypes.fit! - Function
fit!(nad::NARemover,features::DataFrame,labels::Vector=[])

Checks the input and exits if the DataFrame is empty.

Arguments

  • nad::NARemover: custom type
  • features::DataFrame: input
  • labels::Vector=[]:

AMLPipelineBase.AbsTypes.transform! - Method
transform!(nad::NARemover,nfeatures::DataFrame)

Removes columns whose proportion of NAs exceeds the acceptance rate.

Arguments

  • nad::NARemover: custom type
  • nfeatures::DataFrame: input

AMLPipelineBase.DecisionTreeLearners.PrunedTree - Type
PrunedTree(
  Dict(
    :purity_threshold => 1.0,
    :max_depth => -1,
    :min_samples_leaf => 1,
    :min_samples_split => 2,
    :min_purity_increase => 0.0
  )
)

Decision tree classifier. See DecisionTree.jl's documentation.

Hyperparameters:

  • :purity_threshold => 1.0 (merge leaves having >=thresh combined purity)
  • :max_depth => -1 (maximum depth of the decision tree)
  • :min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
  • :min_samples_split => 2 (the minimum number of samples needed for a split)
  • :min_purity_increase => 0.0 (minimum purity needed for a split)

Implements fit!, transform!

AMLPipelineBase.DecisionTreeLearners.RandomForest - Type
RandomForest(
  Dict(
    :output => :class,
    :num_subfeatures => 0,
    :num_trees => 10,
    :partial_sampling => 0.7,
    :max_depth => -1
  )
)

Random forest classification. See DecisionTree.jl's documentation.

Hyperparameters:

  • :num_subfeatures => 0 (number of features to consider at random per split)
  • :num_trees => 10 (number of trees to train)
  • :partial_sampling => 0.7 (fraction of samples to train each tree on)
  • :max_depth => -1 (maximum depth of the decision trees)
  • :min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
  • :min_samples_split => 2 (the minimum number of samples needed for a split)
  • :min_purity_increase => 0.0 (minimum purity needed for a split)

Implements fit!, transform!

AMLPipelineBase.EnsembleMethods.BestLearner - Type
BestLearner(
   Dict(
      # Output to train against
      # (:class).
      :output => :class,
      # Function to return partitions of instance indices.
      :partition_generator => (instances, labels) -> kfold(size(instances, 1), 5),
      # Function that selects the best learner by index.
      # Arg learner_partition_scores is a (learner, partition) score matrix.
      :selection_function => (learner_partition_scores) -> findmax(mean(learner_partition_scores, dims=2))[2],      
      # Score type returned by score() using respective output.
      :score_type => Real,
      # Candidate learners.
      :learners => [PrunedTree(), Adaboost(), RandomForest()],
      # Options grid for learners, to search through by BestLearner.
      # Format is [learner_1_options, learner_2_options, ...]
      # where learner_options is same as a learner's options but
      # with a list of values instead of scalar.
      :learner_options_grid => nothing
   )
)

Selects best learner from the set by performing a grid search on learners if grid option is indicated.
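
To see what the default :selection_function does, the sketch below applies the same expression to a made-up (learner, partition) score matrix. The scores are illustrative only. Note that in current Julia versions findmax on a matrix returns a CartesianIndex, so the learner's row is its first component.

```julia
using Statistics

# Rows are learners, columns are partitions (illustrative scores, not real results).
scores = [0.70 0.72 0.71;
          0.80 0.85 0.82;
          0.60 0.90 0.65]

# Same expression as the default :selection_function shown above.
selection_function = (learner_partition_scores) ->
    findmax(mean(learner_partition_scores, dims=2))[2]

best = selection_function(scores)
best_learner_index = best[1]  # row of the learner with the highest mean score
```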

AMLPipelineBase.EnsembleMethods.StackEnsemble - Type
StackEnsemble(
   Dict(    
      # Output to train against
      # (:class).
      :output => :class,
      # Set of learners that produce feature space for stacker.
      :learners => [PrunedTree(), Adaboost(), RandomForest()],
      # Machine learner that trains on set of learners' outputs.
      :stacker => RandomForest(),
      # Proportion of training set left to train stacker itself.
      :stacker_training_proportion => 0.3,
      # Provide original features on top of learner outputs to stacker.
      :keep_original_features => false
   )
)

An ensemble where a 'stack' of learners is used for training and prediction.

AMLPipelineBase.EnsembleMethods.VoteEnsemble - Type
VoteEnsemble(
   Dict( 
      # Output to train against
      # (:class).
      :output => :class,
      # Learners in voting committee.
      :learners => [PrunedTree(), Adaboost(), RandomForest()]
   )
)

Set of machine learners employing majority vote to decide prediction.

Implements: fit!, transform!
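
Majority voting over a committee can be sketched as a per-instance tally. The function majority_vote is hypothetical, and its tie-breaking rule (the label seen first wins) is an assumption of this sketch, not documented library behavior.

```julia
# predictions[k][i] is learner k's predicted label for instance i.
function majority_vote(predictions::Vector{<:Vector})
    n = length(first(predictions))
    return map(1:n) do i
        votes = [p[i] for p in predictions]
        candidates = unique(votes)
        tallies = [count(==(c), votes) for c in candidates]
        # argmax returns the first maximum, so ties go to the label seen first.
        candidates[argmax(tallies)]
    end
end

committee = [["a", "b", "a"],
             ["a", "b", "b"],
             ["b", "b", "a"]]
voted = majority_vote(committee)
```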

AMLPipelineBase.AbsTypes.fit! - Method
fit!(bls::BestLearner, instances::DataFrame, labels::Vector)

Training phase:

  • obtain learners as is if grid option is not present
  • generate learners if grid option is present
  • for each prototype learner, generate learners with specific options found in grid
  • generate partitions
  • train each learner on each partition and obtain validation output
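
One plausible reading of "generate learners with specific options found in grid" is a Cartesian product over the listed values of each option. The sketch below expands such a grid into one options dictionary per candidate; expand_grid is a hypothetical helper and the product semantics are an assumption, not confirmed by the docstring.

```julia
# Expand a grid of option => list-of-values into every combination of scalar options.
function expand_grid(grid::Dict)
    option_names = collect(keys(grid))
    combos = Iterators.product((grid[name] for name in option_names)...)
    return [Dict(zip(option_names, combo)) for combo in vec(collect(combos))]
end

grid = Dict(:max_depth => [2, 4], :min_samples_leaf => [1, 5])
candidates = expand_grid(grid)
```

Each resulting dictionary could then be used to override a prototype learner's options before training it on each partition.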

AMLPipelineBase.AbsTypes.fit! - Method
fit!(se::StackEnsemble, instances::DataFrame, labels::Vector)

Training phase of the stack of learners.

  • perform holdout to obtain indices for partitioning the learner and stacker training sets
  • partition training set for learners and stacker
  • train all learners
  • train stacker on learners' outputs
  • build final model from the trained learners

AutoMLPipeline.SKLearners.SKLearner - Type
SKLearner(learner::String, args::Dict=Dict())

A scikit-learn wrapper for loading its machine learning models. Invoking sklearners() lists the available learners. Please consult the scikit-learn documentation for the arguments to pass.

Implements fit! and transform!.

AutoMLPipeline.SKPreprocessors.SKPreprocessor - Type
SKPreprocessor(preprocessor::String,args::Dict=Dict())

A wrapper for scikit-learn preprocessor functions. Invoking skpreprocessors() lists the supported functions. Please check the scikit-learn documentation for the arguments to pass.

Implements fit! and transform!.

AMLPipelineBase.CrossValidators.crossvalidate - Method
crossvalidate(pl::Machine,X::DataFrame,Y::Vector,sfunc::String="balanced_accuracy_score",nfolds=10)

Runs K-fold cross-validation using balanced accuracy as the default. It supports the following metrics for classification:

  • "accuracy_score"
  • "balanced_accuracy_score"
  • "cohen_kappa_score"
  • "jaccard_score"
  • "matthews_corrcoef"
  • "hamming_loss"
  • "zero_one_loss"
  • "f1_score"
  • "precision_score"
  • "recall_score"

and the following metrics for regression:

  • "mean_squared_error"
  • "mean_squared_log_error"
  • "median_absolute_error"
  • "r2_score"
  • "max_error"
  • "explained_variance_score"

AMLPipelineBase.Utils.createmachine - Function
createmachine(prototype::Machine, options=nothing)

Creates a machine.

  • prototype: prototype machine to base new machine on
  • options: additional options to override prototype's options

Returns: new machine

AMLPipelineBase.Utils.find_catnum_columns - Function
find_catnum_columns(instances::DataFrame,maxuniqcat::Int=0)

Finds all categorical and numerical columns. Categorical columns are those whose element type is not Real and whose elements are not all Real values. Columns with fewer than maxuniqcat unique values are also considered categorical.

AMLPipelineBase.Utils.holdout - Method
holdout(n, right_prop)

Holdout method that partitions a collection into two partitions.

  • n: Size of collection to partition
  • right_prop: Proportion of collection placed in right partition

Returns: two partitions of indices, left and right
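
A holdout split of this shape can be sketched as follows. The name holdout_sketch, the shuffling step, the rounding rule, and the rng keyword are all assumptions of this example, not the library's documented behavior.

```julia
using Random

# Split the indices 1:n into a left and a right partition,
# with roughly right_prop of them on the right.
function holdout_sketch(n::Int, right_prop::Real; rng=Random.default_rng())
    shuffled = randperm(rng, n)
    right_size = round(Int, n * right_prop)
    left = shuffled[1:(n - right_size)]
    right = shuffled[(n - right_size + 1):end]
    return left, right
end

left, right = holdout_sketch(10, 0.3; rng=MersenneTwister(1))
```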

AMLPipelineBase.Utils.infer_eltype - Method
infer_eltype(vector::Vector)

Returns element type of vector unless it is Any. If Any, returns the most specific type that can be inferred from the vector elements.

  • vector: vector to infer element type on

Returns: inferred element type
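
The described behavior can be approximated with Base's typejoin, which returns the closest common supertype. This is a sketch under that assumption; infer_eltype_sketch is a hypothetical name and the real function may infer types differently.

```julia
# Keep a concrete element type as is; otherwise join the types of the elements.
function infer_eltype_sketch(vector::Vector)
    eltype(vector) !== Any && return eltype(vector)
    isempty(vector) && return Any
    return mapreduce(typeof, typejoin, vector)
end

all_ints = infer_eltype_sketch(Any[1, 2, 3])       # every element is an Int
mixed_numbers = infer_eltype_sketch(Any[1, 2.5])   # common supertype of Int and Float64
already_typed = infer_eltype_sketch([1.0, 2.0])    # element type is not Any
```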

AMLPipelineBase.Utils.kfold - Method
kfold(num_instances, num_partitions)

Returns k-fold partitions.

  • num_instances: total number of instances
  • num_partitions: number of partitions required

Returns: training set partition.
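
For intuition, the sketch below builds the training indices of each fold by leaving out one interleaved block of instances at a time. The function kfold_train_indices and the unshuffled, interleaved assignment are simplifying assumptions; the library's partitioning may differ.

```julia
# For each of the num_partitions folds, return the indices used for training,
# that is, every index not in that fold's held-out block.
function kfold_train_indices(num_instances::Int, num_partitions::Int)
    folds = [collect(k:num_partitions:num_instances) for k in 1:num_partitions]
    return [setdiff(1:num_instances, fold) for fold in folds]
end

training_sets = kfold_train_indices(6, 3)
```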

AMLPipelineBase.Utils.nested_dict_merge - Method
nested_dict_merge(first::Dict, second::Dict)

Second nested dictionary is merged into first.

If the values stored under a key in both the first and second dictionaries are themselves dictionaries, the two inner dictionaries are merged. Otherwise the second's value overrides the first's.

  • first: first nested dictionary
  • second: second nested dictionary

Returns: merged nested dictionary
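
The merge rule can be sketched recursively. The helper merge_nested is hypothetical and returns a new dictionary; whether the library mutates its first argument is not stated here.

```julia
# Recursively merge overrides into base: inner dictionaries are merged,
# any other value from overrides replaces the one in base.
function merge_nested(base::Dict, overrides::Dict)
    merged = Dict{Any,Any}(base)
    for (key, value) in overrides
        if haskey(merged, key) && merged[key] isa Dict && value isa Dict
            merged[key] = merge_nested(merged[key], value)
        else
            merged[key] = value
        end
    end
    return merged
end

defaults = Dict(:a => 1, :b => Dict(:c => 2, :d => 3))
custom = Dict(:b => Dict(:c => 20), :e => 5)
merged = merge_nested(defaults, custom)
```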

AMLPipelineBase.Utils.nested_dict_set! - Method
nested_dict_set!(dict::Dict, keys::Array{T, 1}, value) where {T}

Set value in a nested dictionary.

  • dict: nested dictionary to assign value
  • keys: keys to access nested dictionaries in sequence
  • value: value to assign
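
Setting a value along a key path can be sketched as a walk through the inner dictionaries. The helper set_nested! is hypothetical, and creating missing intermediate dictionaries is an assumption of this sketch.

```julia
# Walk the key path, creating inner dictionaries as needed, then assign the value.
function set_nested!(dict::Dict, path::Vector, value)
    isempty(path) && throw(ArgumentError("path must contain at least one key"))
    current = dict
    for key in path[1:end-1]
        current = get!(current, key, Dict{Any,Any}())
    end
    current[path[end]] = value
    return dict
end

options = Dict{Any,Any}(:a => Dict{Any,Any}(:b => 1))
set_nested!(options, [:a, :b], 2)   # overwrite an existing nested value
set_nested!(options, [:x, :y], 3)   # create the intermediate dictionary :x
```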

AMLPipelineBase.Utils.nested_dict_to_tuples - Method
nested_dict_to_tuples(dict::Dict)

Converts a nested dictionary to a list of tuples.

  • dict: dictionary that can have other dictionaries as values

Returns: list where elements are ([outer-key, inner-key, ...], value)
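
The flattening can be sketched as a recursive walk that carries the key path along. The helper flatten_nested is hypothetical; the order of the returned tuples follows dictionary iteration order and should not be relied on.

```julia
# Collect ([outer-key, inner-key, ...], value) pairs for every non-dictionary value.
function flatten_nested(dict::Dict, prefix::Vector=Any[])
    tuples = Tuple{Vector{Any},Any}[]
    for (key, value) in dict
        path = Any[prefix..., key]
        if value isa Dict
            append!(tuples, flatten_nested(value, path))
        else
            push!(tuples, (path, value))
        end
    end
    return tuples
end

nested = Dict(:a => 1, :b => Dict(:c => 2))
flat = flatten_nested(nested)
```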

AMLPipelineBase.Utils.score - Method
score(metric::Symbol, actual::Vector, predicted::Vector)

Score learner predictions against ground truth values.

Available metrics:

  • :accuracy

Arguments:

  • metric: metric to assess with
  • actual: ground truth values
  • predicted: predicted values

Returns: score of learner
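
Accuracy is the share of predictions that match the ground truth. The sketch below returns it as a fraction between 0 and 1; accuracy_fraction is a hypothetical helper, and the scale used by the library's score function is not stated here.

```julia
# Fraction of positions where the prediction equals the ground truth.
function accuracy_fraction(actual::Vector, predicted::Vector)
    length(actual) == length(predicted) ||
        throw(DimensionMismatch("actual and predicted must have the same length"))
    return count(actual .== predicted) / length(actual)
end

acc = accuracy_fraction(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```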
