Value Preprocessing

To use a 1-D time series (TS) as input to an ML model, it must first be converted into matrix form. Each row of the matrix is a slice of the series that captures an hourly, daily, or weekly pattern, depending on the chunk size, the stride, and the number of steps ahead to predict. The workflow below shows how to matrify a 1-D TS.

For illustration purposes, the code below generates a Date,Value dataframe whose values are simply the sequence of integers from 1 to the length of the date sequence. This simple sequence makes it easy to see how the row size, the steps ahead, and the stride determine the matrified output.

using TSML

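# Hourly timestamps from 2017-01-01T00:00 to 2017-01-05T00:00 (97 points)
lower = DateTime(2017,1,1)
upper = DateTime(2017,1,5)
dat = lower:Dates.Hour(1):upper |> collect
# Values are 1, 2, ..., 97 so every slice is easy to trace back to its position
vals = 1:length(dat)
x = DataFrame(Date=dat,Value=vals)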
julia> last(x,5)
5×2 DataFrame
│ Row │ Date                │ Value │
│     │ DateTime            │ Int64 │
├─────┼─────────────────────┼───────┤
│ 1   │ 2017-01-04T20:00:00 │ 93    │
│ 2   │ 2017-01-04T21:00:00 │ 94    │
│ 3   │ 2017-01-04T22:00:00 │ 95    │
│ 4   │ 2017-01-04T23:00:00 │ 96    │
│ 5   │ 2017-01-05T00:00:00 │ 97    │

Matrifier

Let us create an instance of Matrifier, passing the row size, the stride, and the number of steps ahead to predict:

mtr = Matrifier(Dict(:ahead=>6,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
│ Row │ x1    │ x2    │ x3    │ x4    │ x5    │ x6    │ output │
│     │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64  │
├─────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┤
│ 1   │ 86    │ 87    │ 88    │ 89    │ 90    │ 91    │ 97     │
│ 2   │ 83    │ 84    │ 85    │ 86    │ 87    │ 88    │ 94     │
│ 3   │ 80    │ 81    │ 82    │ 83    │ 84    │ 85    │ 91     │
│ 4   │ 77    │ 78    │ 79    │ 80    │ 81    │ 82    │ 88     │
│ 5   │ 74    │ 75    │ 76    │ 77    │ 78    │ 79    │ 85     │

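In this example the values are hourly. The Matrifier was configured to produce rows of 6 hourly values, with a stride of 3 hours between consecutive rows and a prediction target 6 hours ahead. The result has 7 columns: x1 to x6 hold the input window, and output holds the value 6 steps after x6 (in row 1, 91 + 6 = 97). The rows are listed starting from the end of the series, and each window begins 3 hours earlier than the one in the row above it.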
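
The slicing rule described above can be reproduced in a few lines of plain Julia. This is only a minimal sketch for building intuition, not TSML's implementation: matrify_sketch and its width keyword (standing in for :size) are hypothetical names, and the rule for stopping at the start of the series is an assumption that may differ from what Matrifier does.

```julia
# Hypothetical helper, not part of TSML: rebuilds the rows of the table
# from a plain vector, walking backward from the end of the series.
function matrify_sketch(vals::AbstractVector; ahead::Int, width::Int, stride::Int)
    rows = Vector{Vector{eltype(vals)}}()
    out = length(vals)                    # index of the value to predict
    while out - ahead - width + 1 >= 1    # assumed stop: input window must fit in the series
        last_in = out - ahead             # last input sits `ahead` steps before the output
        inputs = vals[(last_in - width + 1):last_in]
        push!(rows, vcat(inputs, vals[out]))
        out -= stride                     # the next row is `stride` steps earlier
    end
    return rows
end

vals = collect(1:97)                      # same toy values as the example
rows = matrify_sketch(vals; ahead=6, width=6, stride=3)
foreach(println, rows[1:5])
```

With these settings the first printed row is [86, 87, 88, 89, 90, 91, 97], matching row 1 of the table.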

Now let us build a matrix with the same row size of 6 hours and stride of 3 hours, but with a prediction target only 2 hours ahead:

mtr = Matrifier(Dict(:ahead=>2,:size=>6,:stride=>3))
res = fit_transform!(mtr,x)
julia> first(res,5)
5×7 DataFrame
│ Row │ x1    │ x2    │ x3    │ x4    │ x5    │ x6    │ output │
│     │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64  │
├─────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┤
│ 1   │ 90    │ 91    │ 92    │ 93    │ 94    │ 95    │ 97     │
│ 2   │ 87    │ 88    │ 89    │ 90    │ 91    │ 92    │ 94     │
│ 3   │ 84    │ 85    │ 86    │ 87    │ 88    │ 89    │ 91     │
│ 4   │ 81    │ 82    │ 83    │ 84    │ 85    │ 86    │ 88     │
│ 5   │ 78    │ 79    │ 80    │ 81    │ 82    │ 83    │ 85     │
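
Since the purpose of matrifying is to feed the series to an ML model, it may help to see how such a table splits into inputs and targets. The sketch below assumes only DataFrames and retypes the first three rows of the table above by hand; the names X and y are illustrative and not part of TSML.

```julia
using DataFrames

# First three rows of the ahead=2 result, retyped by hand for illustration
res = DataFrame(x1=[90, 87, 84], x2=[91, 88, 85], x3=[92, 89, 86],
                x4=[93, 90, 87], x5=[94, 91, 88], x6=[95, 92, 89],
                output=[97, 94, 91])

X = Matrix(res[:, Not(:output)])   # 3×6 matrix of input windows
y = res.output                     # vector of values to predict

# Each target lies exactly 2 steps after the last input of its row
println(all(y .- X[:, end] .== 2))
```

Each row of X is one input window and the matching entry of y is the value the model should learn to predict from it.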