Preprocessors

AMLP is designed to allow easy extensibility of its processing elements. The choice of Scikitlearn preprocessors in this initial release is mainly for demonstration purposes, to give a coherent narrative of how the various parts of AMLP fit together to solve a particular problem. AMLP has been tested to run with a mixture of transformers and filters from Julia, Scikitlearn, and R's caret in the same pipeline without issues, as long as the interface is properly implemented for each wrapped function. As there are many preprocessing techniques available, users are encouraged to write their own wrappers of their favorite implementations to make them interoperable with the existing AMLP implementations; a sketch of such a wrapper is given below.
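
As a rough illustration, here is a minimal sketch of a hand-rolled transformer. It assumes the convention AMLP's bundled components appear to follow: subtype Transformer, keep learned state in a model dictionary, and overload fit! and transform!. ColumnScaler is a hypothetical example, and the exact import path of the generics may differ across AMLP versions.

using AutoMLPipeline
using DataFrames
import AutoMLPipeline: fit!, transform!

# hypothetical custom transformer: a per-column min-max scaler
mutable struct ColumnScaler <: Transformer
    name::String
    model::Dict{Symbol,Any}
    ColumnScaler(args::Dict=Dict()) = new("colscaler", Dict{Symbol,Any}(args))
end

function fit!(cs::ColumnScaler, x::DataFrame, y::Vector=[])
    # learn per-column minima and maxima from the training data
    cs.model[:min] = [minimum(c) for c in eachcol(x)]
    cs.model[:max] = [maximum(c) for c in eachcol(x)]
end

function transform!(cs::ColumnScaler, x::DataFrame)
    lo, hi = cs.model[:min], cs.model[:max]
    # rescale each column to [0, 1] using the ranges learned during fit!
    DataFrame([(x[:, i] .- lo[i]) ./ (hi[i] - lo[i]) for i in 1:ncol(x)], :auto)
end

Once defined, such a wrapper interoperates with the rest of AMLP, e.g. fit_transform!(ColumnScaler(), X).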

SKPreprocessor Structure

    SKPreprocessor(args=Dict(
       :name => "skprep",
       :preprocessor => "PCA",
       :impl_args => Dict()
    ))

Helper Function:
   SKPreprocessor(preprocessor::String,args::Dict=Dict())

SKPreprocessor maintains a dictionary of preprocessors and dynamically loads them based on the :preprocessor name passed during initialization. The :impl_args entry is a dictionary of parameters passed as arguments to the underlying Scikitlearn preprocessor.

Note

Please consult the Scikitlearn documentation for the arguments appropriate to the chosen preprocessor.
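
For instance, the helper call and the full dictionary form below should construct equivalent objects. This is a sketch based on the structure shown above, assuming the signature takes the dictionary as a positional argument with the defaults shown:

# full form: spell out the whole args dictionary
pca1 = SKPreprocessor(Dict(
    :name => "skprep",
    :preprocessor => "PCA",
    :impl_args => Dict(:n_components => 2)
))

# helper form: preprocessor name plus its impl_args
pca2 = SKPreprocessor("PCA", Dict(:n_components => 2))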

Let's try PCA with a 2-component decomposition and the random state initialized to 0.

using AutoMLPipeline

iris = getiris()
X=iris[:,1:4]

pca = SKPreprocessor("PCA",Dict(:n_components=>2,:random_state=>0))
respca = fit_transform!(pca,X)
julia> first(respca,5)
5×2 DataFrame
 Row │ x1        x2
     │ Float64   Float64
─────┼─────────────────────
   1 │ -2.68413   0.319397
   2 │ -2.71414  -0.177001
   3 │ -2.88899  -0.144949
   4 │ -2.74534  -0.318299
   5 │ -2.72872   0.326755
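
As an aside, fit_transform! is a convenience that runs the two stages in sequence; assuming fit! and transform! are individually exported, as AMLP's other examples suggest, the call above can be split:

fit!(pca, X)                 # learn the 2-component projection from X
respca = transform!(pca, X)  # apply the learned projection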

Let's try FastICA with a 4-component decomposition and whitening:

ica = SKPreprocessor("FastICA",Dict(:n_components=>4,:whiten=>true))
resica = fit_transform!(ica,X)
julia> first(resica,5)
5×4 DataFrame
 Row │ x1          x2         x3        x4
     │ Float64     Float64    Float64   Float64
─────┼────────────────────────────────────────────
   1 │ -0.0297546  -0.261483  -1.39244   0.375044
   2 │  0.0816279  -0.385988  -1.32946  -0.972173
   3 │  0.159258    0.357134  -1.34926  -0.349151
   4 │ -0.311684    0.880533  -1.20489  -0.38025
   5 │ -0.108223    0.200341  -1.37284   0.736304
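
Wrapped preprocessors become more useful once composed with other pipeline elements. Here is a minimal sketch assuming the @pipeline macro and the |> sequencing operator shown in the AMLP README:

# chain the PCA preprocessor into a standard scaler
sc = SKPreprocessor("StandardScaler")
pipe = @pipeline pca |> sc
respipe = fit_transform!(pipe, X)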

To get a listing of available preprocessors, use the skpreprocessors() function:

julia> skpreprocessors()
syntax: SKPreprocessor(name::String, args::Dict=Dict())
where *name* can be one of:

Binarizer chi2 dict_learning dict_learning_online DictionaryLearning f_classif f_regression FactorAnalysis FastICA fastica FunctionTransformer GenericUnivariateSelect IncrementalPCA KBinsDiscretizer KernelCenterer KernelPCA LabelBinarizer LabelEncoder LatentDirichletAllocation MaxAbsScaler MiniBatchDictionaryLearning MiniBatchSparsePCA MinMaxScaler MissingIndicator MultiLabelBinarizer mutual_info_classif mutual_info_regression NMF non_negative_factorization Normalizer OneHotEncoder OrdinalEncoder PCA PolynomialFeatures PowerTransformer QuantileTransformer RFE RFECV RobustScaler SelectFdr SelectFpr SelectFromModel SelectFwe SelectKBest SelectPercentile SimpleImputer sparse_encode SparseCoder SparsePCA StandardScaler TruncatedSVD VarianceThreshold 

and *args* is the dictionary of the chosen preprocessor's initial parameters.
Note: Please consult Scikitlearn's online documentation for more details about each preprocessor's arguments.
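
Any name from the list above can be dropped into the helper constructor. For example, here is a sketch using Normalizer with an l2 norm (:norm is a standard Scikitlearn Normalizer argument):

norm = SKPreprocessor("Normalizer", Dict(:norm => "l2"))
resnorm = fit_transform!(norm, X)   # rows rescaled to unit l2 norm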