Date Preprocessing

Extracting the date features from a Date,Value table follows a workflow similar to the value preprocessing in the previous section. The main difference is that we are only interested in the date corresponding to the last column of the values generated by the Matrifier. This last column contains the most recent values before the prediction happens, and because of this recency, the dates corresponding to these values carry more information than the other dates.
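To make "the date corresponding to the last column" concrete, here is a minimal sketch using only the Dates standard library. The helper `window_end_dates` is hypothetical and not part of TSML. Its indexing rule is an assumption inferred from the example output later in this section, not taken from TSML's source: the newest window ends `ahead` steps before the final observation, and each earlier window ends `stride` steps before the next one.

```julia
using Dates

# Hypothetical helper (not part of TSML): dates of the last input column of
# each sliding window, newest window first.
# Assumed rule: the newest window ends `ahead` steps before the final
# observation; earlier windows end `stride` steps apart, for as long as
# `size` inputs still fit in front of the window end.
function window_end_dates(dates::AbstractVector{<:TimeType};
                          ahead::Int, size::Int, stride::Int)
    last_idx = length(dates) - ahead
    return dates[last_idx:-stride:size]
end

dates = collect(DateTime(2017,1,1):Day(1):DateTime(2018,1,31))
ends = window_end_dates(dates; ahead=24, size=24, stride=5)
println(first(ends, 3))
```

With the daily series used in this section, the first three dates are 2018-01-07, 2018-01-02, and 2017-12-28, which agree with the first rows of the Dateifier output shown below. How many windows TSML actually keeps at the old end of the series is not verified here.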

Let us start by creating a Date,Value dataframe similar to the one in the previous section.

using Dates
using TSML, TSML.Utils, TSML.TSMLTypes
using TSML.TSMLTransformers
using DataFrames

lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
x = DataFrame(Date=dat,Value=vals)
first(x,5)

5 rows × 2 columns

Row | Date                | Value
    | Dates…              | Float64
1   | 2017-01-01T00:00:00 | 0.184567
2   | 2017-01-02T00:00:00 | 0.797453
3   | 2017-01-03T00:00:00 | 0.385266
4   | 2017-01-04T00:00:00 | 0.992998
5   | 2017-01-05T00:00:00 | 0.543396

Dateifier

Let us create an instance of Dateifier, passing the row size, the stride, and the number of steps ahead to predict:

# dtr holds the Dateifier (named to distinguish it from the Matrifier, mtr)
dtr = Dateifier(Dict(:ahead=>24,:size=>24,:stride=>5))
fit!(dtr,x)
res = transform!(dtr,x)
first(res,5)

5 rows × 8 columns

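Row | year  | month | day   | hour  | week  | dow   | doq   | qoy
    | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64
1   | 2018  | 1     | 7     | 0     | 1     | 7     | 7     | 1
2   | 2018  | 1     | 2     | 0     | 1     | 2     | 2     | 1
3   | 2017  | 12    | 28    | 0     | 52    | 4     | 89    | 4
4   | 2017  | 12    | 23    | 0     | 51    | 6     | 84    | 4
5   | 2017  | 12    | 18    | 0     | 51    | 1     | 79    | 4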

The output of transform! automatically includes several date features: year, month, day, hour, week, day of the week (dow), day of quarter (doq), and quarter of year (qoy).
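These eight features can be reproduced for a single timestamp with Julia's Dates standard library. The sketch below is illustrative only. The helper name `datefeatures` is hypothetical, and the code does not claim to mirror how the Dateifier computes its columns internally.

```julia
using Dates

# Hypothetical helper (not part of TSML): the eight date features listed
# above, computed with functions exported by the Dates standard library.
datefeatures(dt::DateTime) = (
    year  = year(dt),
    month = month(dt),
    day   = day(dt),
    hour  = hour(dt),
    week  = week(dt),           # ISO week of the year
    dow   = dayofweek(dt),      # 1 = Monday, ..., 7 = Sunday
    doq   = dayofquarter(dt),   # day within the current quarter
    qoy   = quarterofyear(dt),  # quarter of the year, 1 to 4
)

println(datefeatures(DateTime(2018,1,7)))
println(datefeatures(DateTime(2017,12,28)))
```

For 2018-01-07 and 2017-12-28, the values agree with rows 1 and 3 of the table above.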

ML Features: Matrifier and Dateifier

You can then combine the outputs of the Matrifier and the Dateifier as input features to a machine learning model. Below is an example of this workflow: the code extracts the date and value features and concatenates them column-wise into a single feature matrix for a machine learning model.

commonargs = Dict(:ahead=>3,:size=>5,:stride=>2)
dtr = Dateifier(commonargs)
mtr = Matrifier(commonargs)

lower = DateTime(2017,1,1)
upper = DateTime(2018,1,31)
dat=lower:Dates.Day(1):upper |> collect
vals = rand(length(dat))
X = DataFrame(Date=dat,Value=vals)

fit!(mtr,X)
valuematrix = transform!(mtr,X)
fit!(dtr,X)
datematrix = transform!(dtr,X)
mlfeatures = hcat(datematrix,valuematrix)
first(mlfeatures,5)

5 rows × 14 columns

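Row | year  | month | day   | hour  | week  | dow   | doq   | qoy   | x1       | x2       | x3       | x4       | x5       | output
    | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64  | Float64  | Float64  | Float64  | Float64  | Float64
1   | 2018  | 1     | 28    | 0     | 4     | 7     | 28    | 1     | 0.525422 | 0.474721 | 0.086443 | 0.875284 | 0.951678 | 0.199716
2   | 2018  | 1     | 26    | 0     | 4     | 5     | 26    | 1     | 0.61196  | 0.717333 | 0.525422 | 0.474721 | 0.086443 | 0.619944
3   | 2018  | 1     | 24    | 0     | 4     | 3     | 24    | 1     | 0.412623 | 0.315715 | 0.61196  | 0.717333 | 0.525422 | 0.875284
4   | 2018  | 1     | 22    | 0     | 4     | 1     | 22    | 1     | 0.166963 | 0.48056  | 0.412623 | 0.315715 | 0.61196  | 0.474721
5   | 2018  | 1     | 20    | 0     | 3     | 6     | 20    | 1     | 0.116124 | 0.149383 | 0.166963 | 0.48056  | 0.412623 | 0.717333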
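Before passing such a table to a learner, the feature columns are typically separated from the target. The sketch below assumes that the `output` column is the prediction target. That assumption is based on its name and is not confirmed here. For brevity it rebuilds only two rows and a subset of columns from the table above instead of calling TSML.

```julia
using DataFrames

# Two rows copied from the table above, with a subset of the columns.
mlfeatures = DataFrame(
    year   = [2018, 2018],
    month  = [1, 1],
    day    = [28, 26],
    x1     = [0.525422, 0.61196],
    output = [0.199716, 0.619944],
)

# Assumption: `output` is the target; every other column is an input feature.
features = Matrix(select(mlfeatures, Not(:output)))
target = mlfeatures.output

println(size(features))
println(target)
```

The same two lines apply unchanged to the full 14-column `mlfeatures` dataframe, giving a 13-column feature matrix and a target vector.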