Monotonic Detection
One important preprocessing step for time series data is detecting monotonic data and transforming it into a non-monotonic series by applying the finite difference operator.
Let's create artificial monotonic data and apply our monotonic transformer to normalize it:
using Dates, DataFrames, Random
Random.seed!(123)
mdates = DateTime(2017,12,31,1):Dates.Hour(1):DateTime(2017,12,31,10) |> collect
mvals = rand(length(mdates)) |> cumsum
df = DataFrame(Date=mdates, Value=mvals)
| | Date | Value |
|---|---|---|
| | Dates… | Float64 |
| 1 | 2017-12-31T01:00:00 | 0.768448 |
| 2 | 2017-12-31T02:00:00 | 1.70896 |
| 3 | 2017-12-31T03:00:00 | 2.38292 |
| 4 | 2017-12-31T04:00:00 | 2.77837 |
| 5 | 2017-12-31T05:00:00 | 3.09162 |
| 6 | 2017-12-31T06:00:00 | 3.75417 |
| 7 | 2017-12-31T07:00:00 | 4.3402 |
| 8 | 2017-12-31T08:00:00 | 4.39233 |
| 9 | 2017-12-31T09:00:00 | 4.66097 |
| 10 | 2017-12-31T10:00:00 | 4.76984 |
Now that we have monotonic data, let's use the Monotonicer to normalize it:
using TSML, TSML.Utils, TSML.TSMLTypes
using TSML.TSMLTransformers
using TSML: Monotonicer
mono = Monotonicer(Dict())
fit!(mono,df)
res = transform!(mono,df)
| | Date | Value |
|---|---|---|
| | Dates… | Float64 |
| 1 | 2017-12-31T01:00:00 | 0.940515 |
| 2 | 2017-12-31T02:00:00 | 0.940515 |
| 3 | 2017-12-31T03:00:00 | 0.673959 |
| 4 | 2017-12-31T04:00:00 | 0.395453 |
| 5 | 2017-12-31T05:00:00 | 0.313244 |
| 6 | 2017-12-31T06:00:00 | 0.662555 |
| 7 | 2017-12-31T07:00:00 | 0.586022 |
| 8 | 2017-12-31T08:00:00 | 0.0521332 |
| 9 | 2017-12-31T09:00:00 | 0.26864 |
| 10 | 2017-12-31T10:00:00 | 0.108871 |
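The output above looks like a first-order finite difference of the input, with the first row back-filled by the first difference so that the length stays the same. Here is a minimal sketch of that idea in plain Julia. The back-fill rule is an assumption inferred from the two tables, not taken from the TSML source, and `normalized` is just an illustrative name. The input values are copied from the first table as displayed, so the results agree with the second table only up to rounding.

```julia
# Cumulative values copied from the first table (rounded as displayed).
mvals = [0.768448, 1.70896, 2.38292, 2.77837, 3.09162,
         3.75417, 4.3402, 4.39233, 4.66097, 4.76984]

# First-order finite difference: d[i] = mvals[i+1] - mvals[i].
d = diff(mvals)

# diff returns one element fewer than its input, so the first row is
# back-filled with the first difference to keep the original length.
# (Assumption inferred from the output table, not from the TSML source.)
normalized = vcat(d[1], d)
```

The differenced series is no longer monotonic, which is the point of the transformation.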
Real Data Example
We will now apply the entire pipeline: reading CSV data, aggregating, imputing, and normalizing if the data is monotonic. We will consider three different data types: regular time series data, monotonic data, and daily monotonic data. The difference between monotonic and daily monotonic is that the values in daily monotonic data increase cumulatively within a day and then reset to zero or some baseline value at the start of the next day. Monotonicer automatically detects these three types and applies the corresponding normalization.
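To make the three types concrete, here is a small synthetic sketch that builds one series of each kind. It only illustrates the definitions above; it is not part of TSML and says nothing about how Monotonicer detects the types. All names (`ts`, `increments`, `monotonic`, `dailymono`, `regular`) are made up for this example.

```julia
using Dates

# Hourly timestamps spanning two full days (hypothetical example data).
ts = collect(DateTime(2019,2,11,0):Hour(1):DateTime(2019,2,12,23))
increments = fill(1.5, length(ts))   # constant hourly increment, for simplicity

# Monotonic: a running total that never resets.
monotonic = cumsum(increments)

# Daily monotonic: the running total restarts at each new calendar day.
dailymono = similar(increments)
for (i, t) in enumerate(ts)
    newday = i == 1 || Date(t) != Date(ts[i-1])
    dailymono[i] = newday ? increments[i] : dailymono[i-1] + increments[i]
end

# Regular: values that fluctuate with no cumulative trend.
regular = [4.5 + 0.25 * sin(i) for i in eachindex(ts)]
```

In this sketch `dailymono` climbs to 36.0 in the last hour of the first day and drops back to 1.5 at midnight, while `monotonic` keeps growing.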
using TSML: DataReader
using TSML: DateValgator, DateValNNer, Statifier, Monotonicer
regularfile = joinpath(dirname(pathof(TSML)),"../data/typedetection/regular.csv")
monofile = joinpath(dirname(pathof(TSML)),"../data/typedetection/monotonic.csv")
dailymonofile = joinpath(dirname(pathof(TSML)),"../data/typedetection/dailymonotonic.csv")
regularfilecsv = DataReader(Dict(:filename=>regularfile,:dateformat=>"dd/mm/yyyy HH:MM"))
monofilecsv = DataReader(Dict(:filename=>monofile,:dateformat=>"dd/mm/yyyy HH:MM"))
dailymonofilecsv = DataReader(Dict(:filename=>dailymonofile,:dateformat=>"dd/mm/yyyy HH:MM"))
valgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1)))
valnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1)))
stfier = Statifier(Dict(:processmissing=>true))
mono = Monotonicer(Dict())
Regular TS Processing
Let's test by feeding the regular time series type to the pipeline. We expect that for this type, Monotonicer will not perform further processing:
- Pipeline with Monotonicer: regular time series
pipeline = Pipeline(Dict(
:transformers => [regularfilecsv,valgator,valnner,mono]
)
)
fit!(pipeline)
regulardf=transform!(pipeline)
first(regulardf,5)
| | Date | Value |
|---|---|---|
| | Dates… | Float64⍰ |
| 1 | 2014-01-01T01:00:00 | 4.5 |
| 2 | 2014-01-01T02:00:00 | 4.35 |
| 3 | 2014-01-01T03:00:00 | 4.05 |
| 4 | 2014-01-01T04:00:00 | 4.45 |
| 5 | 2014-01-01T05:00:00 | 4.2 |
- Pipeline without Monotonicer: regular time series
pipeline = Pipeline(Dict(
:transformers => [regularfilecsv,valgator,valnner]
)
)
fit!(pipeline)
regulardf=transform!(pipeline)
first(regulardf,5)
| | Date | Value |
|---|---|---|
| | Dates… | Float64⍰ |
| 1 | 2014-01-01T01:00:00 | 4.5 |
| 2 | 2014-01-01T02:00:00 | 4.35 |
| 3 | 2014-01-01T03:00:00 | 4.05 |
| 4 | 2014-01-01T04:00:00 | 4.45 |
| 5 | 2014-01-01T05:00:00 | 4.2 |
Notice that the outputs are the same with or without the Monotonicer instance.
Monotonic TS Processing
Let's now feed the same pipeline with monotonic CSV data.
- Pipeline with Monotonicer: monotonic time series
pipeline = Pipeline(Dict(
:transformers => [monofilecsv,valgator,valnner,mono]
)
)
fit!(pipeline)
monodf=transform!(pipeline)
first(monodf,10)
| | Date | Value |
|---|---|---|
| | Dates… | Float64 |
| 1 | 2016-01-06T17:00:00 | 230.0 |
| 2 | 2016-01-06T18:00:00 | 230.0 |
| 3 | 2016-01-06T19:00:00 | 264.0 |
| 4 | 2016-01-06T20:00:00 | 258.0 |
| 5 | 2016-01-06T21:00:00 | 244.0 |
| 6 | 2016-01-06T22:00:00 | 254.0 |
| 7 | 2016-01-06T23:00:00 | 242.0 |
| 8 | 2016-01-07T00:00:00 | 240.0 |
| 9 | 2016-01-07T01:00:00 | 240.0 |
| 10 | 2016-01-07T02:00:00 | 240.0 |
- Pipeline without Monotonicer: monotonic time series
pipeline = Pipeline(Dict(
:transformers => [monofilecsv,valgator,valnner]
)
)
fit!(pipeline)
monodf=transform!(pipeline)
first(monodf,10)
| | Date | Value |
|---|---|---|
| | Dates… | Float64⍰ |
| 1 | 2016-01-06T17:00:00 | 5.77291e7 |
| 2 | 2016-01-06T18:00:00 | 5.77294e7 |
| 3 | 2016-01-06T19:00:00 | 5.77296e7 |
| 4 | 2016-01-06T20:00:00 | 5.77299e7 |
| 5 | 2016-01-06T21:00:00 | 5.77301e7 |
| 6 | 2016-01-06T22:00:00 | 5.77304e7 |
| 7 | 2016-01-06T23:00:00 | 5.77306e7 |
| 8 | 2016-01-07T00:00:00 | 5.77309e7 |
| 9 | 2016-01-07T01:00:00 | 5.77311e7 |
| 10 | 2016-01-07T02:00:00 | 5.77313e7 |
Notice that without the Monotonicer instance, the output remains monotonic, whereas with the Monotonicer instance in the pipeline, it is converted into a regular time series.
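A quick way to confirm this observation is to check the sign of the first differences. The sketch below uses the values from the two tables above, copied as displayed (so they are rounded). The helper `isnondecreasing` is a hypothetical name introduced here, not a TSML function.

```julia
# Output without Monotonicer (values as displayed in the table above).
raw = [5.77291e7, 5.77294e7, 5.77296e7, 5.77299e7, 5.77301e7,
       5.77304e7, 5.77306e7, 5.77309e7, 5.77311e7, 5.77313e7]

# Output with Monotonicer.
normalized = [230.0, 230.0, 264.0, 258.0, 244.0,
              254.0, 242.0, 240.0, 240.0, 240.0]

# A series is non-decreasing when no first difference is negative.
isnondecreasing(v) = all(>=(0), diff(v))

isnondecreasing(raw)         # true
isnondecreasing(normalized)  # false
```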
Daily Monotonic TS Processing
Lastly, let's feed the daily monotonic data through a similar pipeline and examine its output.
- Pipeline with Monotonicer: daily monotonic time series
pipeline = Pipeline(Dict(
:transformers => [dailymonofilecsv,valgator,valnner,mono]
)
)
fit!(pipeline)
dailymonodf=transform!(pipeline)
first(dailymonodf,50)
| | Date | Value |
|---|---|---|
| | Dates… | Float64 |
| 1 | 2019-02-10T12:00:00 | 2.35 |
| 2 | 2019-02-10T13:00:00 | 2.35 |
| 3 | 2019-02-10T14:00:00 | 0.205 |
| 4 | 2019-02-10T15:00:00 | 0.205 |
| 5 | 2019-02-10T16:00:00 | 0.205 |
| 6 | 2019-02-10T17:00:00 | 0.18 |
| 7 | 2019-02-10T18:00:00 | 0.94 |
| 8 | 2019-02-10T19:00:00 | 0.24 |
| 9 | 2019-02-10T20:00:00 | 0.24 |
| 10 | 2019-02-10T21:00:00 | 0.24 |
| 11 | 2019-02-10T22:00:00 | 0.24 |
| 12 | 2019-02-10T23:00:00 | 1.75 |
| 13 | 2019-02-11T00:00:00 | 0.49 |
| 14 | 2019-02-11T01:00:00 | 0.49 |
| 15 | 2019-02-11T02:00:00 | 0.49 |
| 16 | 2019-02-11T03:00:00 | 0.475 |
| 17 | 2019-02-11T04:00:00 | 0.475 |
| 18 | 2019-02-11T05:00:00 | 0.475 |
| 19 | 2019-02-11T06:00:00 | 0.38 |
| 20 | 2019-02-11T07:00:00 | 0.38 |
| 21 | 2019-02-11T08:00:00 | 0.38 |
| 22 | 2019-02-11T09:00:00 | 0.38 |
| 23 | 2019-02-11T10:00:00 | 0.38 |
| 24 | 2019-02-11T11:00:00 | 3.41 |
| 25 | 2019-02-11T12:00:00 | 0.25 |
| 26 | 2019-02-11T13:00:00 | 0.25 |
| 27 | 2019-02-11T14:00:00 | 1.75 |
| 28 | 2019-02-11T15:00:00 | 863.62 |
| 29 | 2019-02-11T16:00:00 | 42.5 |
| 30 | 2019-02-11T17:00:00 | 42.5 |
| 31 | 2019-02-11T18:00:00 | 84.0 |
| 32 | 2019-02-11T19:00:00 | 81.0 |
| 33 | 2019-02-11T20:00:00 | 77.0 |
| 34 | 2019-02-11T21:00:00 | 48.0 |
| 35 | 2019-02-11T22:00:00 | 53.0 |
| 36 | 2019-02-11T23:00:00 | 60.0 |
| 37 | 2019-02-12T00:00:00 | 39.0 |
| 38 | 2019-02-12T01:00:00 | 67.0 |
| 39 | 2019-02-12T02:00:00 | 67.0 |
| 40 | 2019-02-12T03:00:00 | 47.0 |
| 41 | 2019-02-12T04:00:00 | 67.0 |
| 42 | 2019-02-12T05:00:00 | 72.5 |
| 43 | 2019-02-12T06:00:00 | 55.5 |
| 44 | 2019-02-12T07:00:00 | 69.5 |
| 45 | 2019-02-12T08:00:00 | 68.5 |
| 46 | 2019-02-12T09:00:00 | 67.0 |
| 47 | 2019-02-12T10:00:00 | 69.0 |
| 48 | 2019-02-12T11:00:00 | 52.0 |
| 49 | 2019-02-12T12:00:00 | 66.0 |
| 50 | 2019-02-12T13:00:00 | 87.0 |
- Pipeline without Monotonicer: daily monotonic time series
pipeline = Pipeline(Dict(
:transformers => [dailymonofilecsv,valgator,valnner]
)
)
fit!(pipeline)
dailymonodf=transform!(pipeline)
first(dailymonodf,50)
| | Date | Value |
|---|---|---|
| | Dates… | Float64⍰ |
| 1 | 2019-02-10T12:00:00 | 60.36 |
| 2 | 2019-02-10T13:00:00 | 62.71 |
| 3 | 2019-02-10T14:00:00 | 61.76 |
| 4 | 2019-02-10T15:00:00 | 61.965 |
| 5 | 2019-02-10T16:00:00 | 62.17 |
| 6 | 2019-02-10T17:00:00 | 62.35 |
| 7 | 2019-02-10T18:00:00 | 63.29 |
| 8 | 2019-02-10T19:00:00 | 62.57 |
| 9 | 2019-02-10T20:00:00 | 61.85 |
| 10 | 2019-02-10T21:00:00 | 60.73 |
| 11 | 2019-02-10T22:00:00 | 60.97 |
| 12 | 2019-02-10T23:00:00 | 62.72 |
| 13 | 2019-02-11T00:00:00 | 61.325 |
| 14 | 2019-02-11T01:00:00 | 59.93 |
| 15 | 2019-02-11T02:00:00 | 60.42 |
| 16 | 2019-02-11T03:00:00 | 60.09 |
| 17 | 2019-02-11T04:00:00 | 60.565 |
| 18 | 2019-02-11T05:00:00 | 61.04 |
| 19 | 2019-02-11T06:00:00 | 60.76 |
| 20 | 2019-02-11T07:00:00 | 60.18 |
| 21 | 2019-02-11T08:00:00 | 59.76 |
| 22 | 2019-02-11T09:00:00 | 59.34 |
| 23 | 2019-02-11T10:00:00 | 59.72 |
| 24 | 2019-02-11T11:00:00 | 63.13 |
| 25 | 2019-02-11T12:00:00 | 63.38 |
| 26 | 2019-02-11T13:00:00 | 63.63 |
| 27 | 2019-02-11T14:00:00 | 65.38 |
| 28 | 2019-02-11T15:00:00 | 929.0 |
| 29 | 2019-02-11T16:00:00 | 971.5 |
| 30 | 2019-02-11T17:00:00 | 1014.0 |
| 31 | 2019-02-11T18:00:00 | 1098.0 |
| 32 | 2019-02-11T19:00:00 | 1179.0 |
| 33 | 2019-02-11T20:00:00 | 1256.0 |
| 34 | 2019-02-11T21:00:00 | 1304.0 |
| 35 | 2019-02-11T22:00:00 | 1357.0 |
| 36 | 2019-02-11T23:00:00 | 1417.0 |
| 37 | 2019-02-12T00:00:00 | 1456.0 |
| 38 | 2019-02-12T01:00:00 | 62.0 |
| 39 | 2019-02-12T02:00:00 | 129.0 |
| 40 | 2019-02-12T03:00:00 | 176.0 |
| 41 | 2019-02-12T04:00:00 | 243.0 |
| 42 | 2019-02-12T05:00:00 | 315.5 |
| 43 | 2019-02-12T06:00:00 | 371.0 |
| 44 | 2019-02-12T07:00:00 | 440.5 |
| 45 | 2019-02-12T08:00:00 | 509.0 |
| 46 | 2019-02-12T09:00:00 | 576.0 |
| 47 | 2019-02-12T10:00:00 | 645.0 |
| 48 | 2019-02-12T11:00:00 | 697.0 |
| 49 | 2019-02-12T12:00:00 | 763.0 |
| 50 | 2019-02-12T13:00:00 | 850.0 |
Notice that the first 27 rows behave like a regular time series with no monotonic signature. Only after row 27 does the data behave monotonically. Notice further that the series resets to a baseline value in row 38 at 1:00 am. This daily monotonic pattern can be seen when the data is plotted. In the pipeline with Monotonicer, the normalization applies the finite difference operation and then replaces the values at the baseline resets with those of their immediate neighbors.
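The reset handling can be sketched as follows. Comparing the two tables suggests that negative differences (which occur at a reset) are replaced by the next non-negative difference. This rule is inferred from the displayed output only and is an assumption, not the TSML implementation. `diff_backfill` is a hypothetical helper name.

```julia
# Difference a series, then overwrite the first slot and any negative
# difference with the next valid value (assumed rule, inferred from the tables).
function diff_backfill(v::AbstractVector{<:Real})
    out = vcat(NaN, Float64.(diff(v)))   # NaN marks the undefined first row
    # Walk backwards so each invalid slot takes the value that follows it.
    for i in (length(out) - 1):-1:1
        if isnan(out[i]) || out[i] < 0
            out[i] = out[i + 1]
        end
    end
    return out   # a negative last difference is left as is in this sketch
end

# Rows 36 to 41 of the table without Monotonicer; the reset is at row 38.
v = [1417.0, 1456.0, 62.0, 129.0, 176.0, 243.0]
diff_backfill(v)   # [39.0, 39.0, 67.0, 67.0, 47.0, 67.0]
```

Elements 2 to 6 of the result match rows 37 to 41 of the table with Monotonicer. The first element differs from row 36 only because this slice starts there and has no earlier value to difference against.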