Preprocessors
AMLP is designed so that its processing elements are easy to extend. The Scikitlearn preprocessors included in this initial release serve mainly as a demonstration of how the various parts of AMLP fit together to solve a particular problem. AMLP has been tested with a mixture of transformers and filters from Julia, Scikitlearn, and R's caret in the same pipeline without issues, as long as the interfaces are properly implemented for each wrapped function. Because many preprocessing techniques are available, users are encouraged to write wrappers for their favorite implementations so that these can interoperate with the existing AMLP implementations.
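To give a rough idea of what a wrapper has to provide, here is a minimal pure-Julia sketch. It assumes a fit-then-transform interface in the spirit of the `fit_transform!` call used later in this section. The names `DemoTransformer`, `CenterScaler`, `demo_fit!`, `demo_transform`, and `demo_fit_transform!` are hypothetical stand-ins and are not part of AMLP, which defines its own abstract types and generic functions.

```julia
using Statistics

# Hypothetical stand-in for an abstract transformer type (not AMLP's own type).
abstract type DemoTransformer end

# A toy wrapper that standardizes each column; it stores what it learns in fit.
mutable struct CenterScaler <: DemoTransformer
    means::Vector{Float64}
    stds::Vector{Float64}
    CenterScaler() = new(Float64[], Float64[])
end

# Learn the per-column mean and standard deviation from the data.
function demo_fit!(t::CenterScaler, X::Matrix{Float64})
    t.means = vec(mean(X, dims=1))
    t.stds = vec(std(X, dims=1))
    return t
end

# Apply the learned statistics; returns a new matrix and leaves X untouched.
function demo_transform(t::CenterScaler, X::Matrix{Float64})
    return (X .- t.means') ./ t.stds'
end

# Convenience: fit on X, then transform the same X.
demo_fit_transform!(t::DemoTransformer, X) = demo_transform(demo_fit!(t, X), X)

X = [1.0 2.0; 3.0 6.0]
Z = demo_fit_transform!(CenterScaler(), X)
```

After this call each column of `Z` has mean 0 and standard deviation 1. A real wrapper would implement whatever interface functions AMLP expects rather than these demo names.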
SKPreprocessor Structure
SKPreprocessor(args=Dict(
:name => "skprep",
:preprocessor => "PCA",
:impl_args => Dict()
)
)
Helper Function:
SKPreprocessor(preprocessor::String,args::Dict=Dict())
SKPreprocessor maintains a dictionary of preprocessors and dynamically loads them based on the :preprocessor name passed during initialization. The :impl_args entry is a dictionary of parameters passed as arguments to the Scikitlearn preprocessor.
Please consult the Scikitlearn documentation for the arguments accepted by the chosen preprocessor.
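The name-based lookup described above can be illustrated with a small pure-Julia sketch. This is not AMLP's implementation: `DEMO_REGISTRY`, `demo_preprocessor`, `make_scale`, and `make_offset` are hypothetical names, and the registered entries are toy functions rather than Scikitlearn classes. It only shows the general pattern of resolving a name in a dictionary and forwarding a Symbol-keyed dictionary as keyword arguments.

```julia
# Toy constructors: each takes keyword arguments and returns a transform function.
make_scale(; factor=1.0) = x -> x .* factor
make_offset(; delta=0.0) = x -> x .+ delta

# Hypothetical registry mapping a preprocessor name to its constructor.
DEMO_REGISTRY = Dict{String,Function}(
    "Scale" => make_scale,
    "Offset" => make_offset,
)

# Look up the constructor by name and pass impl_args on as keyword arguments.
function demo_preprocessor(name::String, impl_args::Dict=Dict())
    haskey(DEMO_REGISTRY, name) ||
        throw(ArgumentError("unknown preprocessor: $name"))
    return DEMO_REGISTRY[name](; impl_args...)
end

scale2 = demo_preprocessor("Scale", Dict(:factor => 2.0))
scaled = scale2([1.0, 2.0, 3.0])   # [2.0, 4.0, 6.0]
```

Because the arguments are forwarded unchanged, the set of valid keys is determined by the selected entry, which is why the underlying library's documentation is the place to check what a given preprocessor accepts.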
Let's try a PCA decomposition with 2 components and the random state initialized to 0:
using AutoMLPipeline
iris = getiris()
X=iris[:,1:4]
pca = SKPreprocessor("PCA",Dict(:n_components=>2,:random_state=>0))
respca = fit_transform!(pca,X)
julia> first(respca,5)
5×4 DataFrame
Row │ x1 x2 x3 x4
│ Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────
1 │ -2.68413 0.319397 -0.0279148 -0.00226244
2 │ -2.71414 -0.177001 -0.210464 -0.0990266
3 │ -2.88899 -0.144949 0.0179003 -0.0199684
4 │ -2.74534 -0.318299 0.0315594 0.0755758
5 │ -2.72872 0.326755 0.0900792 0.0612586
Let's try an ICA decomposition with 3 components and whitening:
ica = SKPreprocessor("FastICA",Dict(:n_components=>3,:whiten=>true))
resica = fit_transform!(ica,X)
julia> first(resica,5)
5×4 DataFrame
Row │ x1 x2 x3 x4
│ Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────
1 │ -0.0304769 0.021382 -0.11374 0.00153624
2 │ 0.0796354 0.0308775 -0.108533 -0.00685741
3 │ 0.0284692 -0.0295572 -0.110055 -0.0131338
4 │ 0.0305044 -0.0720254 -0.0983933 0.0256809
5 │ -0.0602286 -0.0161133 -0.112131 0.00800787
To get a listing of available preprocessors, use the skpreprocessors() function:
julia> skpreprocessors()
syntax: SKPreprocessor(name::String, args::Dict=Dict())
where *name* can be one of:
Binarizer chi2 dict_learning dict_learning_online DictionaryLearning f_classif f_regression FactorAnalysis FastICA fastica FunctionTransformer GenericUnivariateSelect IncrementalPCA KBinsDiscretizer KernelCenterer KernelPCA LabelBinarizer LabelEncoder LatentDirichletAllocation MaxAbsScaler MiniBatchDictionaryLearning MiniBatchSparsePCA MinMaxScaler MissingIndicator MultiLabelBinarizer mutual_info_classif mutual_info_regression NMF non_negative_factorization Normalizer OneHotEncoder OrdinalEncoder PCA PolynomialFeatures PowerTransformer QuantileTransformer RFE RFECV RobustScaler SelectFdr SelectFpr SelectFromModel SelectFwe SelectKBest SelectPercentile SimpleImputer sparse_encode SparseCoder SparsePCA StandardScaler TruncatedSVD VarianceThreshold
and *args* are the corresponding preprocessor's initial parameters.
Note: Please consult Scikitlearn's online help for more details about the preprocessor's arguments.