Value Preprocessing
To use a 1-D time series (TS) as input to an ML model, it must first be converted into matrix form. Each row of the matrix is a slice of the TS that captures an hourly, daily, or weekly pattern, depending on the chunk size, the stride, and the number of steps ahead to predict. The workflow below shows how to matrify a 1-D TS.
For illustration, the code below generates a Date,Value dataframe whose values are simply the integers from 1 to the length of the date sequence. Because each value equals its position in the series, this sequence makes it easy to see how the row size, the steps ahead, and the stride determine the matrified output.
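using TSML
using Dates, DataFrames   # DateTime/Hour and DataFrame (DataFrames must be installed)

lower = DateTime(2017, 1, 1)
upper = DateTime(2017, 1, 5)
dat = collect(lower:Hour(1):upper)   # hourly timestamps, both endpoints included
vals = 1:length(dat)                 # values 1, 2, ..., length(dat)
x = DataFrame(Date=dat, Value=vals)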
julia> last(x,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Int64
─────┼────────────────────────────
   1 │ 2017-01-04T20:00:00     93
   2 │ 2017-01-04T21:00:00     94
   3 │ 2017-01-04T22:00:00     95
   4 │ 2017-01-04T23:00:00     96
   5 │ 2017-01-05T00:00:00     97
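The series ends at 97 because the hourly range includes both endpoints: 4 days * 24 hours + 1 = 97 timestamps. This count can be checked with the Dates standard library alone, without TSML or DataFrames:

```julia
using Dates

# Hourly timestamps from 2017-01-01T00:00 to 2017-01-05T00:00, endpoints included.
dat = collect(DateTime(2017, 1, 1):Hour(1):DateTime(2017, 1, 5))

n = length(dat)      # 4 days * 24 hours + 1 = 97
println(n)           # prints 97
println(last(dat))   # prints 2017-01-05T00:00:00
```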
Matrifier
Let us create a Matrifier instance, passing the row size, the stride, and the number of steps ahead to predict:
mtr = Matrifier(Dict(:ahead=>6,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
 Row │ x1     x2     x3     x4     x5     x6     output
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────
   1 │    86     87     88     89     90     91      97
   2 │    83     84     85     86     87     88      94
   3 │    80     81     82     83     84     85      91
   4 │    77     78     79     80     81     82      88
   5 │    74     75     76     77     78     79      85
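In this example the values are hourly. We told the Matrifier to build a matrix in which each row covers 6 hours (:size), the prediction target lies 6 hours ahead (:ahead), and consecutive rows are 3 hours apart (:stride). The result has 7 columns: x1 to x6 hold the 6 input values, and the last column, output, holds the value selected by the :ahead argument. In the first row, for instance, the window ends at 91 and the output is the value 6 hours later, 97, which is the last value of the series; each following row shifts the window 3 hours earlier.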
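To make the slicing rule concrete, here is a minimal sketch in plain Julia that reproduces the rows printed above. It is a hypothetical re-implementation inferred from the example output, not TSML's source: matrify_sketch is an illustrative name, it returns a plain Matrix rather than a DataFrame, and it assumes the last value of the series is the first target, with each subsequent row moving back by the stride.

```julia
# Hypothetical sketch of the slicing shown above; NOT TSML's implementation.
# Each row holds `sz` consecutive inputs followed by the value `ahead` steps
# after the window. Rows start at the end of the series and move back by `stride`.
function matrify_sketch(vals::AbstractVector, sz::Int, stride::Int, ahead::Int)
    n = length(vals)
    rows = Vector{Vector{eltype(vals)}}()
    target = n                               # index of the current output value
    while target - ahead - sz + 1 >= 1       # stop once the window would start before index 1
        window_end = target - ahead          # window ends `ahead` steps before the target
        window = vals[(window_end - sz + 1):window_end]
        push!(rows, vcat(window, vals[target]))
        target -= stride
    end
    isempty(rows) && return Matrix{eltype(vals)}(undef, 0, sz + 1)
    return permutedims(reduce(hcat, rows))   # one row per window, sz + 1 columns
end

m = matrify_sketch(1:97, 6, 3, 6)
first_rows = m[1:5, :]   # same values as the first five rows of the table above
```

Listing rows from the newest window backwards mirrors the order in the printed output; whether TSML keeps exactly the same number of rows is not shown above, so only the leading rows should be compared.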
Now let us make a matrix with a row size of 6 hours, a prediction target 2 hours ahead, and a stride of 3 hours:
mtr = Matrifier(Dict(:ahead=>2,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
 Row │ x1     x2     x3     x4     x5     x6     output
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────
   1 │    90     91     92     93     94     95      97
   2 │    87     88     89     90     91     92      94
   3 │    84     85     86     87     88     89      91
   4 │    81     82     83     84     85     86      88
   5 │    78     79     80     81     82     83      85
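Comparing the two outputs, the output column (97, 94, 91, 88, 85) is identical; only the gap between each window and its target changed. With :ahead=>2 each window ends 2 hours before its output instead of 6, so the first window moves from 86 to 91 up to 90 to 95. The index arithmetic inferred from these two examples can be written as a small helper; row_indices is hypothetical and not part of TSML:

```julia
# Hypothetical helper inferred from the printed examples; not part of TSML.
# For row r (1 = most recent) of a series with n values, return the index
# range of the input window and the index of the output value.
function row_indices(n::Int, sz::Int, stride::Int, ahead::Int, r::Int)
    target = n - (r - 1) * stride      # output index moves back by `stride` per row
    window_end = target - ahead        # window ends `ahead` steps before the output
    return ((window_end - sz + 1):window_end, target)
end

row_indices(97, 6, 3, 6, 1)   # (86:91, 97)
row_indices(97, 6, 3, 2, 1)   # (90:95, 97)
row_indices(97, 6, 3, 2, 5)   # (78:83, 85)
```

Because each value in this example equals its index, these ranges can be read directly against the tables above.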