Value Preprocessing
To use a 1-D time series (TS) as input to an ML model, it must first be converted into matrix form. Each row is a slice of the series that captures a daily, hourly, or weekly pattern, depending on the chunk size, the stride, and the number of steps ahead to predict. The workflow below shows how to matrify a 1-D TS.
For illustration, the code below generates a dataframe with Date and Value columns, where the values are simply the integers from 1 to the length of the date sequence. This simple sequence makes it easy to see how the row size, the steps ahead, and the stride determine the matrified output.
using Dates
using TSML, TSML.Utils, TSML.TSMLTypes
using TSML.TSMLTransformers
using DataFrames
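# hourly timestamps from 2017-01-01 00:00 to 2017-01-05 00:00 (97 points)
lower = DateTime(2017,1,1)
upper = DateTime(2017,1,5)
dat=lower:Dates.Hour(1):upper |> collect
# values are just 1, 2, ..., 97 so each output row is easy to trace
vals = 1:length(dat)
x = DataFrame(Date=dat,Value=vals)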
last(x,5)
| | Date | Value |
|---|---|---|
| | Dates… | Int64 |
| 1 | 2017-01-04T20:00:00 | 93 |
| 2 | 2017-01-04T21:00:00 | 94 |
| 3 | 2017-01-04T22:00:00 | 95 |
| 4 | 2017-01-04T23:00:00 | 96 |
| 5 | 2017-01-05T00:00:00 | 97 |
Matrifier
Let us create an instance of Matrifier, passing the row size, the stride, and the number of steps ahead to predict:
mtr = Matrifier(Dict(:ahead=>6,:size=>6,:stride=>3))
fit!(mtr,x)
res = transform!(mtr,x)
first(res,5)
| | x1 | x2 | x3 | x4 | x5 | x6 | output |
|---|---|---|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 86 | 87 | 88 | 89 | 90 | 91 | 97 |
| 2 | 83 | 84 | 85 | 86 | 87 | 88 | 94 |
| 3 | 80 | 81 | 82 | 83 | 84 | 85 | 91 |
| 4 | 77 | 78 | 79 | 80 | 81 | 82 | 88 |
| 5 | 74 | 75 | 76 | 77 | 78 | 79 | 85 |
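In this example, the values are hourly. We asked the Matrifier for rows of 6 hourly values (size), a target 6 hours ahead (ahead), and a stride of 3 hours between consecutive rows. There are 7 columns because the last column, output, holds the value that lies the given number of steps ahead of the last input in the row. As the output shows, rows are listed starting from the end of the series: the first row uses inputs 86 to 91 to predict 97, and each following row is shifted back by 3.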
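The row layout in the output above can be reproduced with plain index arithmetic. The sketch below is not TSML code: `window_rows` and its keyword `winsize` are hypothetical names chosen for illustration, and the stopping rule at the start of the series is an assumption. Only the first five rows have been compared against the table shown above.

```julia
# Hypothetical helper, not part of TSML: builds [inputs..., output] rows
# by walking backward from the end of the series.
function window_rows(vals::AbstractVector; winsize::Int, ahead::Int, stride::Int)
    rows = Vector{Vector{eltype(vals)}}()
    t = length(vals)                      # index of the target (output) value
    # Assumed stopping rule: stop once the input window would start before index 1.
    while t - ahead - winsize + 1 >= 1
        stop = t - ahead                  # last index of the input window
        start = stop - winsize + 1        # first index of the input window
        push!(rows, vcat(vals[start:stop], vals[t]))
        t -= stride                       # move the target back by the stride
    end
    return rows
end

rows = window_rows(collect(1:97); winsize=6, ahead=6, stride=3)
rows[1]   # [86, 87, 88, 89, 90, 91, 97]
rows[2]   # [83, 84, 85, 86, 87, 88, 94]
```

Because the values equal their own positions in the series, each row can be read directly as the indices it was built from.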
Let us now build a matrix with a row size of 6 hours, 2 steps ahead, and a stride of 3 hours:
mtr = Matrifier(Dict(:ahead=>2,:size=>6,:stride=>3))
fit!(mtr,x)
res = transform!(mtr,x)
first(res,5)
| | x1 | x2 | x3 | x4 | x5 | x6 | output |
|---|---|---|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 90 | 91 | 92 | 93 | 94 | 95 | 97 |
| 2 | 87 | 88 | 89 | 90 | 91 | 92 | 94 |
| 3 | 84 | 85 | 86 | 87 | 88 | 89 | 91 |
| 4 | 81 | 82 | 83 | 84 | 85 | 86 | 88 |
| 5 | 78 | 79 | 80 | 81 | 82 | 83 | 85 |
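Comparing the two outputs, the output column is unchanged (97, 94, 91, ...) while the inputs move closer to it as the steps ahead shrink. A minimal sketch of that relationship, using a hypothetical helper `input_window` that is not part of TSML and is inferred only from the tables shown here:

```julia
# Hypothetical helper, not part of TSML: index range of the inputs
# for a target at index t, given the row size and the steps ahead.
input_window(t; winsize, ahead) = (t - ahead - winsize + 1):(t - ahead)

input_window(97; winsize=6, ahead=6)   # 86:91, inputs end 6 steps before the target
input_window(97; winsize=6, ahead=2)   # 90:95, inputs end 2 steps before the target
```

With 2 steps ahead, the last input (95) and the target (97) are separated by a single unobserved value (96).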