Date Preprocessing

Extracting the Date features in a Date,Value table follows a workflow similar to the value preprocessing of the previous section. The main difference is that we are only interested in the date corresponding to the last column of the values generated by the Matrifier. This last column contains the values observed just before the prediction happens, so its dates are the most recent ones in each row and carry more information than the earlier dates.

Let us start by creating a Date,Value dataframe similar to the previous section.

using TSML
using Dates, DataFrames   # DateTime, Dates.Day and DataFrame are used below

lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
x = DataFrame(Date=dat,Value=vals)

julia> first(x,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64
─────┼───────────────────────────────
   1 │ 2017-01-01T00:00:00  0.207314
   2 │ 2017-01-02T00:00:00  0.773526
   3 │ 2017-01-03T00:00:00  0.694852
   4 │ 2017-01-04T00:00:00  0.215086
   5 │ 2017-01-05T00:00:00  0.378855

Dateifier

Let us create an instance of Dateifier, passing the row size, the stride, and the number of steps ahead to predict:

mtr = Dateifier(Dict(:ahead=>24,:size=>24,:stride=>5))
res = fit_transform!(mtr,x)

julia> first(res,5)
5×8 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────────
   1 │  2018      1      7      0      1      7      7      1
   2 │  2018      1      2      0      1      2      2      1
   3 │  2017     12     28      0     52      4     89      4
   4 │  2017     12     23      0     51      6     84      4
   5 │  2017     12     18      0     51      1     79      4

The output of transform! automatically extracts several date features: year, month, day, hour, week, day of the week (dow), day of quarter (doq), and quarter of year (qoy).
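The feature columns can be reproduced with Julia's Dates standard library. The following is a minimal sketch, not TSML's implementation: the helper name datefeatures is hypothetical, and the mapping of each column name to a Dates function is an assumption that agrees with the rows shown above.

```julia
using Dates

# Hypothetical helper (not part of TSML): derive the eight feature columns
# from a single timestamp using only the Dates standard library.
datefeatures(dt::DateTime) = (
    year  = Dates.year(dt),
    month = Dates.month(dt),
    day   = Dates.day(dt),
    hour  = Dates.hour(dt),
    week  = Dates.week(dt),          # ISO week number
    dow   = Dates.dayofweek(dt),     # 1 = Monday, ..., 7 = Sunday
    doq   = Dates.dayofquarter(dt),
    qoy   = Dates.quarterofyear(dt),
)

f = datefeatures(DateTime(2018, 1, 7))   # date of the first row above
```

Here f.week is 1 and f.dow is 7, which matches the first row of the Dateifier output, since 2018-01-07 is the Sunday of the first ISO week of 2018.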

ML Features: Matrifier and Dateifier

You can then combine the outputs of the Matrifier and the Dateifier as input features to a machine learning model. The example below extracts the Date and Value features with the same arguments and concatenates them column-wise into a single feature matrix.

commonargs = Dict(:ahead=>3,:size=>5,:stride=>2)
dtr = Dateifier(commonargs)
mtr = Matrifier(commonargs)

lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
X = DataFrame(Date=dat,Value=vals)

valuematrix = fit_transform!(mtr,X)
datematrix = fit_transform!(dtr,X)
mlfeatures = hcat(datematrix,valuematrix)

julia> first(mlfeatures,5)
5×14 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy    x1        x2    ⋯
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Float64   Float ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  2018      1     28      0      4      7     28      1  0.763344  0.625 ⋯
   2 │  2018      1     26      0      4      5     26      1  0.90385   0.918
   3 │  2018      1     24      0      4      3     24      1  0.939854  0.175
   4 │  2018      1     22      0      4      1     22      1  0.640097  0.565
   5 │  2018      1     20      0      3      6     20      1  0.911608  0.582 ⋯
                                                               5 columns omitted

Another way is to use the symbolic pipeline, which describes the transformation and the concatenation in a single expression.

ppl = dtr + mtr
features = fit_transform!(ppl,X)

julia> first(features,5)
5×14 DataFrame
 Row │ year   month  day    hour   week   dow    doq    qoy    x1        x2    ⋯
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Float64   Float ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  2018      1     28      0      4      7     28      1  0.763344  0.625 ⋯
   2 │  2018      1     26      0      4      5     26      1  0.90385   0.918
   3 │  2018      1     24      0      4      3     24      1  0.939854  0.175
   4 │  2018      1     22      0      4      1     22      1  0.640097  0.565
   5 │  2018      1     20      0      3      6     20      1  0.911608  0.582 ⋯
                                                               5 columns omitted
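
The dates in the first rows above (2018-01-28, 2018-01-26, 2018-01-24, ...) follow from the arguments ahead=3 and stride=2 applied to a series ending on 2018-01-31. The sketch below reproduces them with plain Dates arithmetic. It is an illustration only: the helper windowenddates and its keyword winsize are hypothetical names, and both the backward window layout and the lower bound on the window index are assumptions inferred from the output, not taken from TSML's source.

```julia
using Dates

# Hypothetical helper (not part of TSML): dates of the last value column of
# each window, newest first. Assumes windows are laid out backwards from the
# end of the series and that the final `ahead` points are reserved as targets.
function windowenddates(dates::AbstractVector{<:Dates.TimeType};
                        ahead::Int, winsize::Int, stride::Int)
    newest = length(dates) - ahead   # index of the newest window end
    oldest = winsize                 # assumed: a full window needs `winsize` points
    return [dates[i] for i in newest:-stride:oldest]
end

dates = collect(DateTime(2017, 1, 1):Day(1):DateTime(2018, 1, 31))
ends = windowenddates(dates; ahead=3, winsize=5, stride=2)
first(ends, 5)   # 2018-01-28, 2018-01-26, 2018-01-24, 2018-01-22, 2018-01-20
```

Calling the same helper with ahead=24, winsize=24, stride=5 starts at 2018-01-07 and steps back five days at a time, in line with the Dateifier example earlier in this section.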