Value Preprocessing

To use a 1-D time series (TS) as input to an ML model, it must first be converted into matrix form, where each row is a slice of the series. Depending on the chunk size, the stride, and the number of steps ahead to predict, a row can capture an hourly, daily, or weekly pattern. The workflow below shows how to matrify a 1-D TS.

For illustration, the code below generates a Date,Value dataframe whose values are simply the sequence of integers from 1 to the length of the date sequence. This simple sequence makes it easier to see how the row size, the steps ahead, and the stride shape the matrified output.

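using TSML
using Dates, DataFrames   # explicit imports for DateTime, Hour, and DataFrame

# Hourly timestamps from 2017-01-01 to 2017-01-05 (97 points)
lower = DateTime(2017,1,1)
upper = DateTime(2017,1,5)
dat = collect(lower:Dates.Hour(1):upper)
vals = 1:length(dat)      # values are just 1, 2, ..., 97
x = DataFrame(Date=dat,Value=vals)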
julia> last(x,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Int64
─────┼────────────────────────────
   1 │ 2017-01-04T20:00:00     93
   2 │ 2017-01-04T21:00:00     94
   3 │ 2017-01-04T22:00:00     95
   4 │ 2017-01-04T23:00:00     96
   5 │ 2017-01-05T00:00:00     97

Matrifier

Let us create an instance of Matrifier, passing the row size, the stride, and the number of steps ahead to predict:

mtr = Matrifier(Dict(:ahead=>6,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
 Row │ x1     x2     x3     x4     x5     x6     output
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────
   1 │    86     87     88     89     90     91      97
   2 │    83     84     85     86     87     88      94
   3 │    80     81     82     83     84     85      91
   4 │    77     78     79     80     81     82      88
   5 │    74     75     76     77     78     79      85

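In this example the values are hourly. We asked Matrifier to generate a matrix in which each row spans 6 hours, the prediction target is 6 hours ahead, and consecutive rows are 3 hours apart. There are 7 columns because the first six hold the input window and the last column, output, holds the value that the steps-ahead argument points to. In row 1, for instance, the window ends at 91 and the output is 91 + 6 = 97, the last value in the series; row 2 starts 3 values earlier.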
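To make the index arithmetic concrete, here is a minimal plain-Julia sketch that reproduces the rows shown above. It is not TSML's implementation: `sliding_rows` and its keyword `width` (standing in for `:size`) are hypothetical names, and the backward walk from the last observation is an assumption inferred from the displayed output.

```julia
# Hypothetical helper, not part of TSML: rebuilds the windows by walking
# backwards from the last observation, as the displayed output suggests.
function sliding_rows(vals::AbstractVector; width::Int, stride::Int, ahead::Int)
    rows = Vector{Vector{eltype(vals)}}()
    target = length(vals)                  # index of the value to predict
    while target - ahead - width + 1 >= 1  # stop when the window would run off the start
        stop = target - ahead              # last input of the window
        start = stop - width + 1           # first input of the window
        push!(rows, vcat(vals[start:stop], vals[target]))
        target -= stride                   # shift window and target back by the stride
    end
    return rows
end

# Same toy series as above: the integers 1 to 97
rows = sliding_rows(collect(1:97); width=6, stride=3, ahead=6)
println(rows[1])   # inputs 86 to 91, then the target 97
println(rows[2])   # inputs 83 to 88, then the target 94
```

Each element of `rows` holds the six inputs followed by the target, so the gap between the sixth input and the target always equals `ahead`, and consecutive rows differ by `stride`.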

Now let us build a matrix with a row size of 6 hours, a prediction target 2 hours ahead, and a stride of 3 hours:

mtr = Matrifier(Dict(:ahead=>2,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
 Row │ x1     x2     x3     x4     x5     x6     output
     │ Int64  Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────
   1 │    90     91     92     93     94     95      97
   2 │    87     88     89     90     91     92      94
   3 │    84     85     86     87     88     89      91
   4 │    81     82     83     84     85     86      88
   5 │    78     79     80     81     82     83      85