Types and Functions
Index
AMLPipelineBase.BaseFilters.Imputer
AMLPipelineBase.BaseFilters.OneHotEncoder
AMLPipelineBase.BaseFilters.Wrapper
AMLPipelineBase.BaseFilters.createtransformer
AMLPipelineBase.BaselineModels.Baseline
AMLPipelineBase.BaselineModels.Baseline
AMLPipelineBase.BaselineModels.Identity
AMLPipelineBase.BaselineModels.Identity
AMLPipelineBase.CrossValidators.crossvalidate
AMLPipelineBase.DecisionTreeLearners.Adaboost
AMLPipelineBase.DecisionTreeLearners.PrunedTree
AMLPipelineBase.DecisionTreeLearners.RandomForest
AMLPipelineBase.EnsembleMethods.BestLearner
AMLPipelineBase.EnsembleMethods.StackEnsemble
AMLPipelineBase.EnsembleMethods.VoteEnsemble
AMLPipelineBase.FeatureSelectors.CatFeatureSelector
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.FeatureSelector
AMLPipelineBase.FeatureSelectors.NumFeatureSelector
TSML.MLBaseWrapper.StandardScaler
TSML.MLBaseWrapper.Standardize
TSML.Monotonicers.Monotonicer
TSML.Normalizers.Normalizer
TSML.Outliernicers.Outliernicer
AMLPipelineBase.Pipelines.ComboPipeline
AMLPipelineBase.Pipelines.Pipeline
AMLPipelineBase.Pipelines.Pipeline
AMLPipelineBase.Pipelines.Pipeline
TSML.Plotters.Plotter
TSML.Statifiers.Statifier
TSML.TSClassifiers.TSClassifier
TSML.ValDateFilters.CSVDateValReader
TSML.ValDateFilters.CSVDateValWriter
TSML.ValDateFilters.DateValLinearImputer
TSML.ValDateFilters.DateValMultiNNer
TSML.ValDateFilters.DateValNNer
TSML.ValDateFilters.DateValgator
TSML.ValDateFilters.DateValizer
TSML.ValDateFilters.Dateifier
TSML.ValDateFilters.Matrifier
Descriptions
AMLPipelineBase.BaseFilters.Imputer
— Type
Imputer(
Dict(
# Imputation strategy.
# Statistic that takes a vector such as mean or median.
:strategy => mean
)
)
Imputes NaN values from Float64 features.
Implements fit! and transform.
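The imputation idea can be sketched in plain Julia, independent of the package. The helper name impute_nan is hypothetical and not part of AMLPipelineBase; it only illustrates replacing NaN entries with a statistic of the observed values, as the :strategy option describes.

```julia
using Statistics

# Replace NaN entries with a statistic (default: mean) of the non-NaN entries.
# `impute_nan` is a hypothetical helper, not part of AMLPipelineBase.
function impute_nan(col::Vector{Float64}; strategy=mean)
    observed = filter(!isnan, col)
    isempty(observed) && return copy(col)   # nothing to base the statistic on
    fill_value = strategy(observed)
    return [isnan(v) ? fill_value : v for v in col]
end

col = [1.0, NaN, 3.0, NaN, 5.0]
imputed = impute_nan(col)                      # mean of observed values
imputed_median = impute_nan(col; strategy=median)
```

Any function that maps a vector to a scalar can be passed as the strategy in this sketch.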
AMLPipelineBase.BaseFilters.OneHotEncoder
— Type
OneHotEncoder(Dict(
# Nominal columns
:nominal_columns => Int[],
# Nominal column values map. Key is column index, value is list of
# possible values for that column.
:nominal_column_values_map => Dict{Int,Any}()
))
Transforms instances with nominal features into one-hot form and coerces the instance matrix to be of element type Float64.
Implements fit! and transform.
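As a rough illustration of one-hot form, the following sketch encodes a single nominal column against a fixed list of possible values, which plays the role of one entry in :nominal_column_values_map. The helper onehot is hypothetical and is not the package's implementation.

```julia
# One-hot encode a single nominal column against a fixed list of possible values.
# `onehot` is a hypothetical helper used only for illustration.
function onehot(column::Vector, possible::Vector)
    encoded = zeros(Float64, length(column), length(possible))
    for (row, item) in enumerate(column)
        idx = findfirst(==(item), possible)
        idx === nothing && error("unexpected nominal value: $item")
        encoded[row, idx] = 1.0
    end
    return encoded
end

colors = ["red", "green", "red", "blue"]
encoded = onehot(colors, ["blue", "green", "red"])
```

Each row of the result has exactly one entry set to 1.0, in the column of the matching value.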
AMLPipelineBase.BaseFilters.Wrapper
— Type
Wrapper(
default_args = Dict(
:name => "ohe-wrapper",
# Transformer to call.
:transformer => OneHotEncoder(),
# Transformer args.
:transformer_args => Dict()
)
)
Wraps around a transformer.
Implements fit! and transform.
AMLPipelineBase.BaseFilters.createtransformer
— Function
createtransformer(prototype::Transformer, args=Dict())
Create a transformer.
- prototype: prototype transformer to base the new transformer on
- options: additional options to override the prototype's options
Returns: new transformer.
AMLPipelineBase.BaselineModels.Baseline
— Type
Baseline(
default_args = Dict(
:name => "baseline",
:output => :class,
:strat => mode
)
)
Baseline model that returns the mode during classification.
AMLPipelineBase.BaselineModels.Baseline
— Method
Baseline(name::String,opt...)
Helper function.
AMLPipelineBase.BaselineModels.Identity
— Type
Identity(args=Dict())
Returns the input as output.
AMLPipelineBase.BaselineModels.Identity
— Method
Identity(name::String,opt...)
Helper function.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(idy::Identity,x::DataFrame,y::Vector)
Does nothing.
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(bsl::Baseline,x::DataFrame,y::Vector)
Get the mode of the training data.
AMLPipelineBase.AbsTypes.transform!
— Function
transform!(idy::Identity,x::DataFrame)
Return the input as output.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(bsl::Baseline,x::DataFrame)
Return the mode in classification.
AMLPipelineBase.CrossValidators.crossvalidate
— Method
crossvalidate(pl::Machine,X::DataFrame,Y::Vector,pfunc::Function,kfolds=10)
Run K-fold cross-validation where:
- pfunc is a performance metric
- X and Y are input and target
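The K-fold procedure can be sketched without the package as follows. Everything here is illustrative: kfold_indices, kfold_score, and the toy constant_learner are hypothetical names, and the train_predict callback stands in for fitting a pipeline on the training rows and predicting on the held-out rows.

```julia
using Statistics

# Split indices 1:n into k contiguous folds (hypothetical helper; a real
# implementation would usually shuffle the indices first).
function kfold_indices(n::Int, k::Int)
    bounds = round.(Int, range(0, n; length=k + 1))
    return [(bounds[i] + 1):bounds[i + 1] for i in 1:k]
end

# Mean of a performance metric over k folds. `train_predict(Xtr, Ytr, Xte)`
# stands in for fitting a pipeline and predicting on held-out rows.
function kfold_score(train_predict, X::Matrix, Y::Vector, pfunc::Function, kfolds::Int=10)
    scores = Float64[]
    for test_idx in kfold_indices(size(X, 1), kfolds)
        train_idx = setdiff(1:size(X, 1), test_idx)
        pred = train_predict(X[train_idx, :], Y[train_idx], X[test_idx, :])
        push!(scores, pfunc(pred, Y[test_idx]))
    end
    return mean(scores)
end

accuracy(pred, truth) = mean(pred .== truth)
# Toy learner: always predict the first training label.
constant_learner(Xtr, Ytr, Xte) = fill(first(Ytr), size(Xte, 1))

X = reshape(collect(1.0:20.0), 10, 2)
Y = fill("a", 10)
score = kfold_score(constant_learner, X, Y, accuracy, 5)
```

The metric is called as pfunc(predictions, targets) in this sketch; the argument order expected by crossvalidate itself is not stated in the docstring above.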
AMLPipelineBase.DecisionTreeLearners.Adaboost
— Type
Adaboost(
Dict(
:output => :class,
:num_iterations => 7
)
)
Adaboosted decision tree stumps. See DecisionTree.jl's documentation.
Hyperparameters:
- :num_iterations => 7 (number of iterations of AdaBoost)
Implements fit!, transform!
AMLPipelineBase.DecisionTreeLearners.PrunedTree
— Type
PrunedTree(
Dict(
:purity_threshold => 1.0,
:max_depth => -1,
:min_samples_leaf => 1,
:min_samples_split => 2,
:min_purity_increase => 0.0
)
)
Decision tree classifier. See DecisionTree.jl's documentation.
Hyperparameters:
- :purity_threshold => 1.0 (merge leaves having >=thresh combined purity)
- :max_depth => -1 (maximum depth of the decision tree)
- :min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
- :min_samples_split => 2 (the minimum number of samples needed for a split)
- :min_purity_increase => 0.0 (minimum purity needed for a split)
Implements fit!, transform!
AMLPipelineBase.DecisionTreeLearners.RandomForest
— Type
RandomForest(
Dict(
:output => :class,
:num_subfeatures => 0,
:num_trees => 10,
:partial_sampling => 0.7,
:max_depth => -1
)
)
Random forest classification. See DecisionTree.jl's documentation.
Hyperparameters:
- :num_subfeatures => 0 (number of features to consider at random per split)
- :num_trees => 10 (number of trees to train)
- :partial_sampling => 0.7 (fraction of samples to train each tree on)
- :max_depth => -1 (maximum depth of the decision trees)
- :min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
- :min_samples_split => 2 (the minimum number of samples needed for a split)
- :min_purity_increase => 0.0 (minimum purity needed for a split)
Implements fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(adaboost::Adaboost, features::DataFrame, labels::Vector)
Optimize the hyperparameters of the Adaboost instance.
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(tree::PrunedTree, features::DataFrame, labels::Vector)
Optimize the hyperparameters of the PrunedTree instance.
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(forest::RandomForest, features::DataFrame, labels::Vector)
Optimize the parameters of the RandomForest instance.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(adaboost::Adaboost, features::DataFrame)
Predict using the optimized hyperparameters of the trained Adaboost instance.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(ptree::PrunedTree, features::DataFrame)
Predict using the optimized hyperparameters of the trained PrunedTree instance.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(forest::RandomForest, features::DataFrame)
Predict using the optimized hyperparameters of the trained RandomForest instance.
AMLPipelineBase.EnsembleMethods.BestLearner
— Type
BestLearner(
Dict(
# Output to train against
# (:class).
:output => :class,
# Function to return partitions of instance indices.
:partition_generator => (instances, labels) -> kfold(size(instances, 1), 5),
# Function that selects the best learner by index.
# Arg learner_partition_scores is a (learner, partition) score matrix.
:selection_function => (learner_partition_scores) -> findmax(mean(learner_partition_scores, dims=2))[2],
# Score type returned by score() using respective output.
:score_type => Real,
# Candidate learners.
:learners => [PrunedTree(), Adaboost(), RandomForest()],
# Options grid for learners, to search through by BestLearner.
# Format is [learner_1_options, learner_2_options, ...]
# where learner_options is same as a learner's options but
# with a list of values instead of scalar.
:learner_options_grid => nothing
)
)
Selects the best learner from the set, performing a grid search over the learners' options if the grid option (:learner_options_grid) is given.
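To see what the default :selection_function computes, here is a small standalone sketch that applies the same expression to a made-up score matrix. The scores are hypothetical; only the selection expression is taken from the options listed above.

```julia
using Statistics

# Rows are learners, columns are partitions (hypothetical scores).
learner_partition_scores = [0.70 0.72 0.68;
                            0.81 0.79 0.80;
                            0.75 0.90 0.60]

# Same expression as the default :selection_function in the options.
selection = (scores) -> findmax(mean(scores, dims=2))[2]
best = selection(learner_partition_scores)

# findmax on a matrix yields a CartesianIndex; its first component is the row,
# i.e. the index of the learner with the highest mean score.
best_learner_index = best[1]
```

Here the second learner wins on mean score even though the third learner has the single highest partition score.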
AMLPipelineBase.EnsembleMethods.StackEnsemble
— Type
StackEnsemble(
Dict(
# Output to train against
# (:class).
:output => :class,
# Set of learners that produce feature space for stacker.
:learners => [PrunedTree(), Adaboost(), RandomForest()],
# Machine learner that trains on set of learners' outputs.
:stacker => RandomForest(),
# Proportion of training set left to train stacker itself.
:stacker_training_proportion => 0.3,
# Provide original features on top of learner outputs to stacker.
:keep_original_features => false
)
)
An ensemble where a 'stack' of learners is used for training and prediction.
AMLPipelineBase.EnsembleMethods.VoteEnsemble
— Type
VoteEnsemble(
Dict(
# Output to train against
# (:class).
:output => :class,
# Learners in voting committee.
:learners => [PrunedTree(), Adaboost(), RandomForest()]
)
)
Set of machine learners employing majority vote to decide prediction.
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(bls::BestLearner, instances::DataFrame, labels::Vector)
Training phase:
- obtain learners as is if grid option is not present
- generate learners if grid option is present
- for each prototype learner, generate learners with specific options found in grid
- generate partitions
- train each learner on each partition and obtain validation output
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(se::StackEnsemble, instances::DataFrame, labels::Vector)
Training phase of the stack of learners.
- perform holdout to obtain indices for partitioning the learner and stacker training sets
- partition training set for learners and stacker
- train all learners
- train stacker on learners' outputs
- build final model from the trained learners
AMLPipelineBase.AbsTypes.fit!
— Method
fit!(ve::VoteEnsemble, instances::DataFrame, labels::Vector)
Training phase of the ensemble.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(bls::BestLearner, instances::DataFrame)
Choose the best learner based on cross-validation results and use it for prediction.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(se::StackEnsemble, instances::DataFrame)
Build stacker instances and predict.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(ve::VoteEnsemble, instances::DataFrame)
Prediction phase of the ensemble.
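The majority-vote rule used by a voting committee can be sketched as below. This is a standalone illustration with a hypothetical helper majority_vote and made-up predictions; how VoteEnsemble breaks ties is not stated above, so the tie rule here is an assumption of the sketch.

```julia
# Majority vote across learners: `predictions[i]` holds learner i's labels for
# every instance. On a tie, the tied label voted by the earliest learner wins
# (an assumption of this sketch).
function majority_vote(predictions::Vector{<:Vector})
    n = length(first(predictions))
    return map(1:n) do row
        votes = [p[row] for p in predictions]
        counts = Dict{eltype(votes),Int}()
        for v in votes
            counts[v] = get(counts, v, 0) + 1
        end
        best = maximum(values(counts))
        first(v for v in votes if counts[v] == best)
    end
end

tree   = ["a", "b", "a"]
boost  = ["a", "b", "b"]
forest = ["b", "b", "a"]
voted = majority_vote([tree, boost, forest])
```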
AMLPipelineBase.FeatureSelectors.CatFeatureSelector
— Type
CatFeatureSelector(Dict(:name => "catf"))
Automatically extract categorical columns based on inferred element types.
Implements fit! and transform!.
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
— Type
CatNumDiscriminator(
Dict(
:name => "catnumdisc",
:maxcategories => 24
)
)
Transform numeric columns to string (as categories) if the count of their unique elements <= maxcategories.
Implements fit! and transform!.
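The :maxcategories rule can be illustrated on plain vectors. The helper discriminate below is hypothetical and works on a single column rather than a dataframe; it only shows the threshold test described above.

```julia
# Convert a numeric column to strings when it has at most `maxcategories`
# distinct values; otherwise leave it numeric. Hypothetical helper.
function discriminate(col::Vector{<:Real}; maxcategories::Int=24)
    return length(unique(col)) <= maxcategories ? string.(col) : col
end

codes = [1, 2, 1, 3, 2]              # 3 distinct values: treated as categories
readings = collect(0.5:0.5:20.0)     # 40 distinct values: stays numeric

codes_out = discriminate(codes)
readings_out = discriminate(readings)
```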
AMLPipelineBase.FeatureSelectors.CatNumDiscriminator
— Method
CatNumDiscriminator(maxcat::Int)
Helper function for CatNumDiscriminator.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Type
FeatureSelector(
Dict(
:name => "featureselector",
:columns => [col1, col2, ...]
)
)
Returns a dataframe of the selected columns.
Implements fit! and transform!.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Method
FeatureSelector(cols::Vararg{Int})
Helper function for FeatureSelector.
AMLPipelineBase.FeatureSelectors.FeatureSelector
— Method
FeatureSelector(cols::Vector{Int})
Helper function for FeatureSelector.
AMLPipelineBase.FeatureSelectors.NumFeatureSelector
— Type
NumFeatureSelector(Dict(:name=>"numfeatsel"))
Automatically extracts numeric features based on their inferred element types.
Implements fit! and transform!.
TSML.MLBaseWrapper.StandardScaler
— Type
StandardScaler(
Dict(
:impl_args => Dict(
:center => true,
:scale => true
)
)
)
Standardizes each feature using (X - mean) / stddev. Will produce NaN if standard deviation is zero.
TSML.MLBaseWrapper.Standardize
— Type
Standardize(d::Int, m::Vector{Float64}, s::Vector{Float64})
Standardization type.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(st::StandardScaler, features::T, labels::Vector=[]) where {T<:Union{Vector,Matrix,DataFrame}}
Compute the parameters to center and scale.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(st::StandardScaler, features::T)
Apply the computed parameters for centering and scaling to new data.
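The two steps (compute parameters, then apply them) can be sketched with the Statistics standard library. The function names fit_standardizer and apply_standardizer are hypothetical; the example also reproduces the NaN behavior noted for a zero standard deviation.

```julia
using Statistics

# Column-wise parameters learned in the fit step.
fit_standardizer(X::Matrix{Float64}) = (mean(X, dims=1), std(X, dims=1))

# Apply (X - mean) / stddev with the stored parameters.
apply_standardizer(X::Matrix{Float64}, mu, sigma) = (X .- mu) ./ sigma

X = [1.0 5.0;
     2.0 5.0;
     3.0 5.0]
mu, sigma = fit_standardizer(X)
Z = apply_standardizer(X, mu, sigma)
# The second column is constant, so its standard deviation is zero and the
# division 0/0 produces NaN.
```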
TSML.Monotonicers.Monotonicer
— Type
Monotonicer()
Monotonic filter to detect and normalize two types of dataset:
- daily monotonic
- entirely non-decreasing/non-increasing data
Example:
fname = joinpath(dirname(pathof(TSML)),"../data/testdata.csv")
csvfilter = CSVDateValReader(Dict(:filename=>fname,:dateformat=>"dd/mm/yyyy HH:MM"))
valgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1)))
valnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1)))
stfier = Statifier(Dict(:processmissing=>true))
mono = Monotonicer(Dict())
mypipeline = @pipeline csvfilter |> valgator |> mono |> stfier
result = fit_transform!(mypipeline)
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(st::Monotonicer,features::T, labels::Vector=[])
Checks whether features are two-column data of Dates and Values.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(st::Monotonicer, features::T) where {T<:Union{Vector,Matrix,DataFrame}}
Normalize monotonic or daily monotonic data by taking the diffs and counting the flips.
TSML.Normalizers.Normalizer
— Type
Normalizer(Dict(
:method => :zscore
))
Transforms continuous features into normalized form such as zscore, unitrange, square-root, log, pca, ppca with parameter:
- :method => :zscore or :unitrange or :sqrt or :log or :pca or :ppca or :fa
- :zscore => standard z-score with centering and scaling
- :unitrange => unit range normalization with centering and scaling
- :sqrt => square-root transform
- :pca => principal component analysis transform
- :ppca => probabilistic pca
- :fa => factor analysis
- :log => log transform
Example:
function generatedf()
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval1 = rand(length(gdate))
gval2 = rand(length(gdate))
gval3 = rand(length(gdate))
X = DataFrame(Date=gdate,Value1=gval1,Value2=gval2,Value3=gval3)
X
end
X = generatedf()
norm = Normalizer(Dict(:method => :zscore))
fit!(norm,X)
res=transform!(norm,X)
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(norm::Normalizer, features::T, labels::Vector=[])
Validate that argument features other than dates are continuous.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(norm::Normalizer, features::T) where {T<:Union{Vector,Matrix,DataFrame}}
Apply the configured normalization method to the features.
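For intuition about the simpler :method options, the element-wise transforms can be written directly in Julia. This is only a sketch: the unitrange function below assumes min-max rescaling to [0, 1], which may differ in detail from what the package does, and the decomposition-based options (pca, ppca, fa) are omitted.

```julia
# Assumed meaning of unit range normalization: rescale to the interval [0, 1].
unitrange(v::Vector{Float64}) = (v .- minimum(v)) ./ (maximum(v) - minimum(v))

v = [1.0, 4.0, 9.0]
u = unitrange(v)     # min-max rescaling
r = sqrt.(v)         # square-root transform
g = log.(v)          # natural-log transform
```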
TSML.Outliernicers.Outliernicer
— Type
Outliernicer(Dict(
:dateinterval => Dates.Hour(1),
:nnsize => 1,
:missdirection => :symmetric,
:scale => 1.25
))
Detects outliers below or above (median - scale*iqr, median + scale*iqr) and calls DateValNNer to replace them with nearest neighbors.
Example:
fname = joinpath(dirname(pathof(TSML)),"../data/testdata.csv")
csvfilter = CSVDateValReader(Dict(:filename=>fname,:dateformat=>"dd/mm/yyyy HH:MM"))
valgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1)))
valnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1)))
stfier = Statifier(Dict(:processmissing=>true))
mono = Monotonicer(Dict())
outliernicer = Outliernicer(Dict(:dateinterval=>Dates.Hour(1)))
mpipeline = @pipeline csvfilter |> valgator |> mono |> valnner |> outliernicer |> stfier
results = fit_transform!(mpipeline)
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(st::Outliernicer, features::T, labels::Vector=[])
Check that features are two-column data.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(st::Outliernicer, features::T) where {T<:Union{Vector,Matrix,DataFrame}}
Locates outliers based on the IQR factor and calls DateValNNer to replace them with nearest neighbors.
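The IQR-based detection rule can be sketched on a plain vector. The helper outlier_mask is hypothetical and covers only the detection step; the replacement by nearest neighbors is left out. Quartiles use the default definition of quantile in the Statistics standard library, which is an assumption about how the package computes them.

```julia
using Statistics

# Flag values outside (median - scale*iqr, median + scale*iqr).
function outlier_mask(vals::Vector{<:Real}; scale::Real=1.25)
    q1, med, q3 = quantile(vals, [0.25, 0.5, 0.75])
    iqr = q3 - q1
    return (vals .< med - scale * iqr) .| (vals .> med + scale * iqr)
end

readings = [10.0, 11.0, 9.0, 10.0, 11.0, 10.0, 50.0]
mask = outlier_mask(readings)
outlier_positions = findall(mask)
```

With these readings the median is 10 and the IQR is 1, so the accepted band is (8.75, 11.25) and only the last value falls outside it.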
AMLPipelineBase.Pipelines.ComboPipeline
— Type
ComboPipeline(machs::Vector{T}) where {T<:Machine}
Feature union pipeline which iteratively calls fit_transform of each element and concatenates their outputs into one dataframe.
Implements fit! and transform!.
AMLPipelineBase.Pipelines.Pipeline
— Type
Pipeline(machs::Vector{<:Machine},args::Dict=Dict())
Linear pipeline which iteratively calls and passes the result of fit_transform to the succeeding elements in the pipeline.
Implements fit! and transform!.
AMLPipelineBase.Pipelines.Pipeline
— Method
Pipeline(machs::Vararg{Machine})
Helper function for Pipeline structure.
AMLPipelineBase.Pipelines.Pipeline
— Method
Pipeline(machs::Vector{<:Machine},args::Dict=Dict())
Helper function for Pipeline structure.
TSML.Plotters.Plotter
— Type
Plotter(Dict(:interactive => false, :pdfoutput => true))
Plots a TS by default but performs interactive plotting if specified during instance creation.
- :interactive => boolean to indicate whether to use interactive plotting, with false as default
- :pdfoutput => boolean to indicate whether output will be saved as pdf, with false as default
Example:
# test data shipped with TSML, as in the other examples on this page
fname = joinpath(dirname(pathof(TSML)),"../data/testdata.csv")
csvfilter = CSVDateValReader(Dict(:filename=>fname,:dateformat=>"dd/mm/yyyy HH:MM"))
pltr = Plotter(Dict(:interactive => false))
mpipeline = @pipeline csvfilter |> pltr
myplot = fit_transform!(mpipeline)
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(pltr::Plotter, features::T, labels::Vector=[])
Check validity of features: 2-column Date,Val data.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(pltr::Plotter, features::T)
Convert missing into NaN to allow plotting of discontinuities.
TSML.Statifiers.Statifier
— Type
Statifier(Dict(
:processmissing => true
))
Outputs summary statistics such as mean, median, quartile, entropy, kurtosis, skewness, etc. with parameter:
- :processmissing => boolean to indicate whether to include missing data stats.
Example:
dt=[missing;rand(1:10,3);missing;missing;missing;rand(1:5,3)]
dat = DataFrame(Date= DateTime(2017,12,31,1):Dates.Hour(1):DateTime(2017,12,31,10) |> collect,
Value = dt)
statfier = Statifier(Dict(:processmissing=>false))
fit!(statfier,dat)
results=transform!(statfier,dat)
Implements: fit!, transform!
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(st::Statifier, features::T=[], labels::Vector=[])
Validate argument to make sure it's a 2-column format.
AMLPipelineBase.AbsTypes.transform!
— Function
transform!(st::Statifier, features::T=[])
Compute statistics.
TSML.TSClassifiers.TSClassifier
— Type
TSClassifier(
Dict(
# training directory
:trdirectory => "",
:tstdirectory => "",
:modeldirectory => "",
:feature_range => 7:20,
:juliarfmodelname => "juliarfmodel.serialized",
# Output to train against
# (:class).
:output => :class,
# Options specific to this implementation.
:impl_args => Dict(
# Merge leaves having >= purity_threshold combined purity.
:purity_threshold => 1.0,
# Maximum depth of the decision tree (default: no maximum).
:max_depth => -1,
# Minimum number of samples each leaf needs to have.
:min_samples_leaf => 1,
# Minimum number of samples needed for a split.
:min_samples_split => 2,
# Minimum purity needed for a split.
:min_purity_increase => 0.0
)
)
)
Given a collection of time series of known types, gets the statistical features of each series and uses them as inputs to an RF classifier whose output is the TS type, then trains and tests. Another option is to use these statistical features for clustering and check cluster quality. If accuracy is poor, add more statistical features and repeat the same training and testing process. Each time series is assumed to be named after its type, which is used as the target output. For example, temperature time series are named temperature?.csv, where ? is an integer. The classifier loops over each file in a directory, computes its statistics, records them in a dictionary/dataframe, and trains/tests. It defaults to RandomForest for classification of data types.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(tsc::TSClassifier, features::T=[], labels::Vector=[])
Get the stats of each file, collect as dataframe, and train.
AMLPipelineBase.AbsTypes.transform!
— Function
transform!(tsc::TSClassifier, features::T=[])
Apply the learned parameters to the new data.
TSML.ValDateFilters.CSVDateValReader
— Type
CSVDateValReader(
Dict(
:filename => "",
:dateformat => ""
)
)
Reads a csv file and parses dates using the given format.
- :filename => complete path including filename of csv file
- :dateformat => date format to parse
Example:
inputfile =joinpath(dirname(pathof(TSML)),"../data/testdata.csv")
csvreader = CSVDateValReader(Dict(:filename=>inputfile,:dateformat=>"d/m/y H:M"))
fit!(csvreader)
df = transform!(csvreader)
# using pipeline workflow
filter1 = DateValgator()
filter2 = DateValNNer(Dict(:nnsize=>1))
mypipeline = @pipeline csvreader |> filter1 |> filter2
fit!(mypipeline)
res=transform!(mypipeline)
Implements: fit!, transform!
TSML.ValDateFilters.CSVDateValWriter
— Type
CSVDateValWriter(
Dict(
:filename => "",
:dateformat => ""
)
)
Writes the time series dataframe into a file with the given date format.
Example:
inputfile =joinpath(dirname(pathof(TSML)),"../data/testdata.csv")
outputfile = joinpath("/tmp/test.csv")
csvreader = CSVDateValReader(Dict(:filename=>inputfile,:dateformat=>"d/m/y H:M"))
csvwtr = CSVDateValWriter(Dict(:filename=>outputfile,:dateformat=>"d/m/y H:M"))
filter1 = DateValgator()
filter2 = DateValNNer(Dict(:nnsize=>1))
mypipeline = @pipeline csvreader |> filter1 |> filter2 |> csvwtr
res=fit_transform!(mypipeline)
# read back what was written to validate
csvreader = CSVDateValReader(Dict(:filename=>outputfile,:dateformat=>"y-m-d HH:MM:SS"))
fit!(csvreader)
transform!(csvreader)
Implements: fit!, transform!
TSML.ValDateFilters.DateValLinearImputer
— Type
DateValLinearImputer(
Dict(
:dateinterval => Dates.Hour(1),
)
)
Fills missings by linear interpolation.
- :dateinterval => time period to use for grouping
Example:
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
gmissing = 50000
gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
X = DataFrame(Date=gdate,Value=gval)
X.Value[gndxmissing] .= missing
dnnr = DateValLinearImputer()
fit!(dnnr,X)
transform!(dnnr,X)
Implements: fit!, transform!
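Linear interpolation over gaps can be sketched on a plain vector as follows. The helper interpolate_missing is hypothetical; it ignores the date grouping step and leaves leading or trailing gaps untouched, which may differ from what DateValLinearImputer does.

```julia
# Fill interior runs of `missing` by linear interpolation between the nearest
# observed neighbors; leading/trailing gaps are left untouched in this sketch.
function interpolate_missing(vals::Vector{Union{Missing,Float64}})
    filled = copy(vals)
    known = findall(!ismissing, vals)
    for (lo, hi) in zip(known[1:end-1], known[2:end])
        for i in (lo + 1):(hi - 1)
            weight = (i - lo) / (hi - lo)
            filled[i] = vals[lo] + weight * (vals[hi] - vals[lo])
        end
    end
    return filled
end

series = Union{Missing,Float64}[1.0, missing, missing, 4.0, missing, 6.0]
filled = interpolate_missing(series)
```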
TSML.ValDateFilters.DateValMultiNNer
— Type
DateValMultiNNer(
Dict(
:type => :knn, # or :linear
:missdirection => :symmetric, # or :reverse or :forward
:dateinterval => Dates.Hour(1),
:nnsize => 1,
:strict => false,
:aggregator => :median
)
)
Fills missings with their nearest-neighbors. It assumes that the first column is a Date class and the other columns are Union{Missing,Real}. It uses DateValNNer and DateValizer+Impute to process each numeric column concatenated with the Date column.
- :type => type of imputation, which can be a linear interpolation or nearest neighbor
- :missdirection => direction to fill missing data (:symmetric, :reverse, :forward)
- :dateinterval => time period to use for grouping
- :nnsize => neighborhood size
- :strict => boolean value to indicate whether to be strict about replacement or not
- :aggregator => function to aggregate based on date interval
Example:
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval1 = Array{Union{Missing,Float64}}(rand(length(gdate)))
gval2 = Array{Union{Missing,Float64}}(rand(length(gdate)))
gval3 = Array{Union{Missing,Float64}}(rand(length(gdate)))
gmissing = 50000
gndxmissing1 = Random.shuffle(1:length(gdate))[1:gmissing]
gndxmissing2 = Random.shuffle(1:length(gdate))[1:gmissing]
gndxmissing3 = Random.shuffle(1:length(gdate))[1:gmissing]
X = DataFrame(Date=gdate,Temperature=gval1,Humidity=gval2,Ozone=gval3)
X.Temperature[gndxmissing1] .= missing
X.Humidity[gndxmissing2] .= missing
X.Ozone[gndxmissing3] .= missing
dnnr = DateValMultiNNer(Dict(
:type=>:linear,
:dateinterval=>Dates.Hour(1),
:nnsize=>10,
:missdirection => :symmetric,
:strict=>false,
:aggregator => :mean))
fit!(dnnr,X)
transform!(dnnr,X)
Implements: fit!, transform!
TSML.ValDateFilters.DateValNNer
— Type
DateValNNer(
Dict(
:missdirection => :symmetric, # or :reverse or :forward
:dateinterval => Dates.Hour(1),
:nnsize => 1,
:strict => false,
:aggregator => :median
)
)
Fills missings with their nearest-neighbors.
- :missdirection => direction to fill missing data (:symmetric, :reverse, :forward)
- :dateinterval => time period to use for grouping
- :nnsize => neighborhood size
- :strict => boolean value to indicate whether to be strict about replacement or not
- :aggregator => function to aggregate based on date interval
Example:
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
gmissing = 50000
gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
X = DataFrame(Date=gdate,Value=gval)
X.Value[gndxmissing] .= missing
dnnr = DateValNNer(Dict(
:dateinterval=>Dates.Hour(1),
:nnsize=>10,
:missdirection => :symmetric,
:strict=>true,
:aggregator => :mean))
fit!(dnnr,X)
transform!(dnnr,X)
Implements: fit!, transform!
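The idea of repeatedly filling gaps from nearby observed values can be sketched as below. This is not TSML's implementation: the helpers nn_pass and nn_fill are hypothetical, the window is symmetric only, the median is used as the neighbor aggregate, and the date aggregation and :strict handling are left out.

```julia
using Statistics

# One pass: replace each missing value with the median of the observed values
# within `nnsize` positions on both sides (a stand-in for :symmetric).
function nn_pass(vals::Vector{Union{Missing,Float64}}, nnsize::Int)
    out = copy(vals)
    for i in findall(ismissing, vals)
        window = vals[max(1, i - nnsize):min(length(vals), i + nnsize)]
        neighbors = collect(skipmissing(window))
        isempty(neighbors) || (out[i] = median(neighbors))
    end
    return out
end

# Repeat passes until no missing values remain (or nothing is observed at all).
function nn_fill(vals::Vector{Union{Missing,Float64}}; nnsize::Int=1)
    all(ismissing, vals) && return vals
    while any(ismissing, vals)
        vals = nn_pass(vals, nnsize)
    end
    return vals
end

series = Union{Missing,Float64}[1.0, missing, missing, missing, 5.0]
filled = nn_fill(series)
```

In this example the first pass fills positions 2 and 4 from their observed neighbors, and the second pass fills position 3 from the values produced by the first.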
TSML.ValDateFilters.DateValgator
— Type
DateValgator(
Dict(
:dateinterval => Dates.Hour(1),
:aggregator => :median
)
)
Aggregates values based on date period specified.
Example:
# generate random values with missing data
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
gmissing = 50000
gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
X = DataFrame(Date=gdate,Value=gval)
X.Value[gndxmissing] .= missing
dtvlmean = DateValgator(Dict(
:dateinterval=>Dates.Hour(1),
:aggregator => :mean))
fit!(dtvlmean,X)
res = transform!(dtvlmean,X)
Implements: fit!, transform!
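Aggregation by date period can be sketched with the Dates standard library on plain vectors instead of a dataframe. The helper aggregate_hourly is hypothetical, and the grouping is fixed to one hour for brevity.

```julia
using Dates, Statistics

# Group (timestamp, value) pairs by the hour they fall in and aggregate each
# group, skipping missing values. Hypothetical helper on plain vectors.
function aggregate_hourly(dates::Vector{DateTime}, vals::Vector; aggregator=median)
    groups = Dict{DateTime,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        bucket = get!(groups, floor(d, Dates.Hour(1)), Float64[])
        ismissing(v) || push!(bucket, v)
    end
    hours = sort(collect(keys(groups)))
    aggregated = [isempty(groups[h]) ? missing : aggregator(groups[h]) for h in hours]
    return hours, aggregated
end

dates = collect(DateTime(2014, 1, 1):Dates.Minute(15):DateTime(2014, 1, 1, 1, 45))
vals = [1.0, missing, 3.0, 5.0, missing, missing, missing, missing]
hours, aggregated = aggregate_hourly(dates, vals)
```

An hour with no observed values stays missing in this sketch, which is the kind of gap a downstream imputer would then fill.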
TSML.ValDateFilters.DateValizer
— Type
DateValizer(
Dict(
:medians => DataFrame(),
:dateinterval => Dates.Hour(1)
)
)
Normalizes and cleans time series by replacing missings with global medians computed based on time period groupings.
Example:
# generate random values with missing data
Random.seed!(123)
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
gmissing = 50000
gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
X = DataFrame(Date=gdate,Value=gval)
X.Value[gndxmissing] .= missing
dvzr = DateValizer(Dict(:dateinterval=>Dates.Hour(1)))
fit!(dvzr,X)
transform!(dvzr,X)
Implements: fit!, transform!
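A minimal sketch of median-based replacement follows, assuming the time period grouping is the hour of the day. That grouping key is an assumption made for illustration, and fill_with_hourly_medians is a hypothetical helper on plain vectors.

```julia
using Dates, Statistics

# Compute one median per hour of day from the observed values, then use it to
# replace missing values recorded at that hour. Hypothetical helper.
function fill_with_hourly_medians(dates::Vector{DateTime}, vals::Vector)
    medians = Dict{Int,Float64}()
    for h in unique(hour.(dates))
        observed = [v for (d, v) in zip(dates, vals) if hour(d) == h && !ismissing(v)]
        isempty(observed) || (medians[h] = median(observed))
    end
    return [ismissing(v) ? get(medians, hour(d), missing) : v for (d, v) in zip(dates, vals)]
end

dates = [DateTime(2014, 1, 1, 0), DateTime(2014, 1, 1, 1),
         DateTime(2014, 1, 2, 0), DateTime(2014, 1, 2, 1),
         DateTime(2014, 1, 3, 0), DateTime(2014, 1, 3, 1)]
vals = [2.0, 10.0, 4.0, missing, missing, 20.0]
filled = fill_with_hourly_medians(dates, vals)
```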
TSML.ValDateFilters.Dateifier
— Type
Dateifier(
Dict(
:ahead => 1,
:size => 7,
:stride => 1
)
)
Converts a 1-D date series into a sliding window matrix for ML training.
Example:
dtr = Dateifier(Dict())
lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
x=DataFrame(Date=dat,Value=vals)
fit!(dtr,x)
res = transform!(dtr,x)
Implements: fit!, transform!
TSML.ValDateFilters.Matrifier
— Type
Matrifier(
Dict(
:ahead => 1,
:size => 7,
:stride => 1,
)
)
Converts a 1-D timeseries into a sliding window matrix for ML training:
- :ahead => steps ahead to predict
- :size => size of sliding window
- :stride => amount of overlap in sliding window
Example:
mtr = Matrifier(Dict(:ahead=>24,:size=>24,:stride=>5))
lower = DateTime(2017,1,1)
upper = DateTime(2017,1,5)
dat=lower:Dates.Hour(1):upper |> collect
vals = 1:length(dat)
x = DataFrame(Date=dat,Value=vals)
fit!(mtr,x)
res = transform!(mtr,x)
Implements: fit!, transform!
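One way to picture a sliding window matrix is sketched below. The exact layout produced by Matrifier is not specified above, so this sketch makes its own assumptions: each row holds winsize consecutive inputs followed by the value ahead steps after the window, and stride is taken as the step between window starts. The helper sliding_windows and its keyword winsize are hypothetical.

```julia
# Build rows of `winsize` consecutive values plus one target value located
# `ahead` steps after the window; windows start every `stride` positions.
function sliding_windows(vals::Vector{<:Real}; ahead::Int=1, winsize::Int=7, stride::Int=1)
    last_start = length(vals) - winsize - ahead + 1
    starts = 1:stride:last_start
    isempty(starts) && return Matrix{eltype(vals)}(undef, 0, winsize + 1)
    rows = [vcat(vals[s:s + winsize - 1], vals[s + winsize - 1 + ahead]) for s in starts]
    return permutedims(reduce(hcat, rows))
end

windows = sliding_windows(collect(1:10); ahead=1, winsize=3, stride=2)
```

With ten values, a window of three, and a stride of two, the sketch yields four rows; the last column of each row is the prediction target.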
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dtr::Dateifier,xx::T,y::Vector=[])
Computes range of dates to be used during transform.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dvzr::DateValizer,xx::T,y::Vector=[])
Validates input and computes global medians grouped by time period.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(csvrdr::CSVDateValReader,x::T=[],y::Vector=[])
Makes sure filename and dateformat are not empty strings.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(mtr::Matrifier,xx::T,y::Vector=Vector())
Checks and validates that inputs are in the correct structure.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dnnr::DateValLinearImputer,xx::T,y::Vector=[])
Validates and checks arguments for errors.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dvmr::DateValgator,xx::T,y::Vector=[])
Checks and validates arguments.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dnnr::DateValNNer,xx::T,y::Vector=[])
Validates and checks arguments for errors.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(dnnr::DateValMultiNNer,xx::T,y::Vector=[])
Validates and checks arguments for errors.
AMLPipelineBase.AbsTypes.fit!
— Function
fit!(csvwtr::CSVDateValWriter,x::T=[],y::Vector=[])
Makes sure filename and dateformat are not empty strings.
AMLPipelineBase.AbsTypes.transform!
— Function
transform!(csvrdr::CSVDateValReader,x::T=[])
Uses CSV package to read the csv file and converts it to dataframe.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(csvwtr::CSVDateValWriter,x::T)
Uses CSV package to write the dataframe into a csv file.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dnnr::DateValLinearImputer,xx::T)
Replaces missings by linear interpolation.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dnnr::DateValMultiNNer,xx::T)
Replaces missings by nearest neighbor or linear interpolation by looping over the dataset for each column until all missing values are gone.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dnnr::DateValNNer,xx::T)
Replaces missings by nearest neighbor looping over the dataset until all missing values are gone.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dvmr::DateValgator,xx::T)
Aggregates values grouped by date-time period using an aggregate function such as mean, median, maximum, or minimum. Default is median.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dvzr::DateValizer,xx::T) where {T<:DataFrame}
Replaces missing with the corresponding global medians with respect to time period.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(dtr::Dateifier,xx::T)
Transforms to day of the month, day of the week, etc.
AMLPipelineBase.AbsTypes.transform!
— Method
transform!(mtr::Matrifier,xx::T)
Applies the parameters of sliding windows to create the corresponding matrix.