Date Preprocessing
Extracting date features from a Date,Value table follows a workflow similar to the value preprocessing in the previous section. The main difference is that we are only interested in the date corresponding to the last column of the values generated by the Matrifier. This last column contains the most recent values before the prediction happens, and because of that recency the dates corresponding to these values carry more information than the other dates.
Let us start by creating a Date,Value dataframe similar to the one in the previous section.
using TSML
lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
x = DataFrame(Date=dat,Value=vals)
julia> first(x,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64
─────┼───────────────────────────────
   1 │ 2017-01-01T00:00:00  0.968518
   2 │ 2017-01-02T00:00:00  0.368095
   3 │ 2017-01-03T00:00:00  0.383846
   4 │ 2017-01-04T00:00:00  0.343016
   5 │ 2017-01-05T00:00:00  0.642754
Dateifier
Let us create an instance of Dateifier, passing the row size, the stride, and the number of steps ahead to predict:
mtr = Dateifier(Dict(:ahead=>24,:size=>24,:stride=>5))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×8 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────────
   1 │  2018      1      7      0      1      7      7      1
   2 │  2018      1      2      0      1      2      2      1
   3 │  2017     12     28      0     52      4     89      4
   4 │  2017     12     23      0     51      6     84      4
   5 │  2017     12     18      0     51      1     79      4
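The dates in the rows above step back from the end of the series: the first row is 24 days (ahead) before the last date, 2018-01-31, and each following row is 5 days (stride) earlier. A minimal sketch of that selection, inferred only from this output, might look as follows. The helper windowenddates and its keyword winsize are hypothetical and not part of TSML, and the handling of the oldest windows is an assumption.

```julia
using Dates

# Hypothetical helper (not part of TSML): list the last date of each sliding
# window, newest first, assuming windows are laid out backward from the end
# of the series and incomplete windows at the start are dropped.
function windowenddates(dates::AbstractVector{<:TimeType};
                        ahead::Int, winsize::Int, stride::Int)
    n = length(dates)
    ends = eltype(dates)[]
    e = n - ahead                 # index of the last value in the newest window
    while e - winsize + 1 >= 1    # keep only windows holding `winsize` values
        push!(ends, dates[e])
        e -= stride               # move to the next, older window
    end
    return ends
end

dat = collect(DateTime(2017, 1, 1):Dates.Day(1):DateTime(2018, 1, 31))
ends = windowenddates(dat; ahead=24, winsize=24, stride=5)
first(ends, 5)
```

Under these assumptions the first five dates are 2018-01-07, 2018-01-02, 2017-12-28, 2017-12-23, and 2017-12-18, which agree with the year, month, and day columns shown above.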
The transform! output automatically extracts several date features: year, month, day, hour, week, day of the week (dow), day of the quarter (doq), and quarter of the year (qoy).
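To make the column names concrete, the following minimal sketch recomputes the same eight features for a single timestamp with functions from Julia's Dates standard library. The helper name datefeatures is hypothetical, and this only illustrates how such features can be derived; it is not a claim about TSML's internal implementation.

```julia
using Dates

# Hypothetical helper (not part of TSML): derive the eight date features
# for one timestamp using only the Dates standard library.
function datefeatures(dt::DateTime)
    return (
        year  = Dates.year(dt),
        month = Dates.month(dt),
        day   = Dates.day(dt),
        hour  = Dates.hour(dt),
        week  = Dates.week(dt),           # ISO week number
        dow   = Dates.dayofweek(dt),      # 1 = Monday, ..., 7 = Sunday
        doq   = Dates.dayofquarter(dt),   # day count within the quarter
        qoy   = Dates.quarterofyear(dt),  # 1 to 4
    )
end

feats = datefeatures(DateTime(2017, 12, 28))
```

For 2017-12-28 this gives week 52, dow 4, doq 89, and qoy 4, the same values as the third row of the Dateifier output.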
ML Features: Matrifier and Dateifier
You can then combine the outputs of both the Matrifier and the Dateifier as input features to a machine learning model. Below is an example workflow in which the code extracts the Date and Value features and combines them into a feature matrix that serves as input to a machine learning model.
commonargs = Dict(:ahead=>3,:size=>5,:stride=>2)
dtr = Dateifier(commonargs)
mtr = Matrifier(commonargs)
lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
X = DataFrame(Date=dat,Value=vals)
valuematrix = fit_transform!(mtr,X)
datematrix = fit_transform!(dtr,X)
mlfeatures = hcat(datematrix,valuematrix)
julia> first(mlfeatures,5)
5×14 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy    x1        x2    ⋯
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Float64   Float ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  2018      1     28      0      4      7     28      1  0.471969  0.650 ⋯
   2 │  2018      1     26      0      4      5     26      1  0.113725  0.415
   3 │  2018      1     24      0      4      3     24      1  0.18972   0.748
   4 │  2018      1     22      0      4      1     22      1  0.650217  0.289
   5 │  2018      1     20      0      3      6     20      1  0.667861  0.617 ⋯
                                                               5 columns omitted
Another way is to use the symbolic pipeline to express the transformation and concatenation in a single expression.
ppl = dtr + mtr
features = fit_transform!(ppl,X)
julia> first(features,5)
5×14 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy    x1        x2    ⋯
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Float64   Float ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  2018      1     28      0      4      7     28      1  0.471969  0.650 ⋯
   2 │  2018      1     26      0      4      5     26      1  0.113725  0.415
   3 │  2018      1     24      0      4      3     24      1  0.18972   0.748
   4 │  2018      1     22      0      4      1     22      1  0.650217  0.289
   5 │  2018      1     20      0      3      6     20      1  0.667861  0.617 ⋯
                                                               5 columns omitted