Value Preprocessing

To use a 1-D time series (TS) as input to a machine learning (ML) model, it must first be converted into matrix form. Each row of the matrix is a slice of the series that captures a daily, hourly, or weekly pattern, depending on the size of the chunk, the stride, and the number of steps ahead to predict. The workflow below shows how to matrify a 1-D TS.

For illustration purposes, the code below generates a Date,Value dataframe whose values are simply the sequence of integers from 1 to the length of the date sequence. This simple sequence makes it easier to see how the row size, the steps ahead, and the stride determine the matrified output.

using Dates
using TSML, TSML.Utils, TSML.TSMLTypes
using TSML.TSMLTransformers
using DataFrames

lower = DateTime(2017,1,1)
upper = DateTime(2017,1,5)
dat=lower:Dates.Hour(1):upper |> collect
vals = 1:length(dat)
x = DataFrame(Date=dat,Value=vals)
last(x,5)

5 rows × 2 columns

Row  Date                 Value
     Dates…               Int64
1    2017-01-04T20:00:00  93
2    2017-01-04T21:00:00  94
3    2017-01-04T22:00:00  95
4    2017-01-04T23:00:00  96
5    2017-01-05T00:00:00  97

Matrifier

Let us create an instance of Matrifier, passing the row size, the stride, and the number of steps ahead to predict:

mtr = Matrifier(Dict(:ahead=>6,:size=>6,:stride=>3))
fit!(mtr,x)
res = transform!(mtr,x)
first(res,5)

5 rows × 7 columns

Row  x1     x2     x3     x4     x5     x6     output
     Int64  Int64  Int64  Int64  Int64  Int64  Int64
1    86     87     88     89     90     91     97
2    83     84     85     86     87     88     94
3    80     81     82     83     84     85     91
4    77     78     79     80     81     82     88
5    74     75     76     77     78     79     85

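In this example, the values are hourly. We configured the Matrifier to generate a matrix in which each row spans 6 hours, the prediction target is 6 hours ahead, and the stride is 3 hours. There are 7 columns because the last column, output, holds the value that lies the given number of steps ahead of the row's last input. The rows are listed from the most recent window backward: the first row uses values 86 to 91 to predict 97, and each subsequent row is shifted 3 hours earlier.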
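The windowing behind this output can be sketched in plain Julia without TSML. The function `matrify_sketch` below is a hypothetical helper written for illustration, not the library's implementation. It assumes, based only on the rows displayed above, that windows are taken from the end of the series backward and that generation stops once a window would start before the first value. The keyword `winsize` stands in for the `:size` argument.

```julia
# Illustrative sketch only: `matrify_sketch` is a hypothetical helper,
# not part of TSML. It mimics the rows displayed above.
function matrify_sketch(vals::AbstractVector; ahead::Int, winsize::Int, stride::Int)
    rows = Vector{Vector{eltype(vals)}}()
    target = lastindex(vals)          # index of the value to predict, newest first
    # Assumption: stop when the input window would start before the first value.
    while target - ahead - winsize + 1 >= firstindex(vals)
        stop = target - ahead         # last input sits `ahead` steps before the target
        start = stop - winsize + 1    # the window holds `winsize` consecutive values
        push!(rows, vcat(vals[start:stop], vals[target]))
        target -= stride              # slide the whole window back by `stride`
    end
    return rows
end

vals = 1:97                           # same values as the Value column of x
rows = matrify_sketch(vals; ahead=6, winsize=6, stride=3)
foreach(println, rows[1:5])           # print the five most recent windows
```

The first printed row holds 86 through 91 followed by 97, which matches the first row of the table above. How the real Matrifier handles the oldest, incomplete windows is not shown here, so the total row count of this sketch may differ from the library's.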

Now let us build a matrix with a row size of 6 hours, a prediction target 2 hours ahead, and a stride of 3 hours:

mtr = Matrifier(Dict(:ahead=>2,:size=>6,:stride=>3))
fit!(mtr,x)
res = transform!(mtr,x)
first(res,5)

5 rows × 7 columns

Row  x1     x2     x3     x4     x5     x6     output
     Int64  Int64  Int64  Int64  Int64  Int64  Int64
1    90     91     92     93     94     95     97
2    87     88     89     90     91     92     94
3    84     85     86     87     88     89     91
4    81     82     83     84     85     86     88
5    78     79     80     81     82     83     85
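
To see why only the input columns move when `:ahead` changes, the row positions can be written in closed form. The sketch below is plain Julia with a hypothetical helper, `row_indices`, that is not a TSML function. It assumes the newest-first row order visible in both tables, with `n` the series length, `k` the 1-based row number, and `winsize` standing in for `:size`.

```julia
# Hypothetical helper for illustration; not a TSML function.
# Returns the input index range and the output index of row k (1-based),
# assuming rows are generated from the end of the series backward.
function row_indices(n::Int, k::Int; ahead::Int, winsize::Int, stride::Int)
    target = n - (k - 1) * stride     # output index of row k
    stop = target - ahead             # last input index
    start = stop - winsize + 1        # first input index
    return (inputs = start:stop, output = target)
end

n = 97                                # number of hourly values in x
a6 = row_indices(n, 1; ahead=6, winsize=6, stride=3)
a2 = row_indices(n, 1; ahead=2, winsize=6, stride=3)
println(a6)
println(a2)
```

Because the values in x equal their positions, these indices can be compared directly with the tables: with `:ahead=>6` the first row's inputs are 86 to 91, with `:ahead=>2` they are 90 to 95, and the output is 97 in both cases. Reducing `:ahead` moves the input window closer to the target without changing which values are predicted.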