Monotonic Detection

An important preprocessing step in time series processing is detecting monotonic data and transforming it into a non-monotonic series with the finite difference operator.
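To make the idea concrete, here is a minimal sketch in plain Julia (no TSML needed) using the Base functions `diff` and `issorted`. The helper `is_nondecreasing` is hypothetical and only for illustration. It shows the property the finite difference exposes: a series never decreases exactly when all of its first differences are non-negative. Monotonicer's own detection criterion may differ.

```julia
# A cumulative (monotonically increasing) series, e.g. a meter reading
x = cumsum([0.5, 0.25, 1.0, 0.75])    # [0.5, 0.75, 1.75, 2.5]

# First-order finite difference: Δx[i] = x[i+1] - x[i]
Δx = diff(x)                           # [0.25, 1.0, 0.75]

# Hypothetical helper: non-decreasing iff every difference is >= 0.
# Base.issorted checks the same property directly.
is_nondecreasing(v) = all(>=(0), diff(v))

is_nondecreasing(x)    # true:  the raw series is monotonic
is_nondecreasing(Δx)   # false: its differences are not
```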

Let's create some artificial monotonic data and apply the monotonic transformer to normalize it:

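using Dates, DataFrames, Random

Random.seed!(123)
# hourly timestamps from 01:00 to 10:00 on 2017-12-31
mdates = DateTime(2017,12,31,1):Dates.Hour(1):DateTime(2017,12,31,10) |> collect
# the cumulative sum of non-negative random numbers never decreases
mvals = rand(length(mdates)) |> cumsum
df = DataFrame(Date = mdates, Value = mvals)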

10 rows × 2 columns

     Date                 Value
     Dates…               Float64
1    2017-12-31T01:00:00  0.768448
2    2017-12-31T02:00:00  1.70896
3    2017-12-31T03:00:00  2.38292
4    2017-12-31T04:00:00  2.77837
5    2017-12-31T05:00:00  3.09162
6    2017-12-31T06:00:00  3.75417
7    2017-12-31T07:00:00  4.3402
8    2017-12-31T08:00:00  4.39233
9    2017-12-31T09:00:00  4.66097
10   2017-12-31T10:00:00  4.76984

Now that we have monotonic data, let's use Monotonicer to normalize it:

using TSML, TSML.Utils, TSML.TSMLTypes
using TSML.TSMLTransformers
using TSML: Monotonicer

mono = Monotonicer(Dict())
fit!(mono, df)
res = transform!(mono, df)

10 rows × 2 columns

     Date                 Value
     Dates…               Float64
1    2017-12-31T01:00:00  0.940515
2    2017-12-31T02:00:00  0.940515
3    2017-12-31T03:00:00  0.673959
4    2017-12-31T04:00:00  0.395453
5    2017-12-31T05:00:00  0.313244
6    2017-12-31T06:00:00  0.662555
7    2017-12-31T07:00:00  0.586022
8    2017-12-31T08:00:00  0.0521332
9    2017-12-31T09:00:00  0.26864
10   2017-12-31T10:00:00  0.108871
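The output above is consistent with taking first differences and then repeating the first difference, so the result keeps all 10 rows. Rows 1 and 2 share the value 0.940515, and row 3 equals 2.38292 − 1.70896. The sketch below assumes that padding rule. `padded_diff` is a hypothetical helper, not Monotonicer's implementation.

```julia
# Hypothetical helper (not part of TSML): first-order finite difference that
# keeps the series length by repeating the first difference at the start.
function padded_diff(x::AbstractVector{<:Real})
    length(x) < 2 && throw(ArgumentError("need at least two values"))
    d = diff(x)             # d[i] = x[i+1] - x[i]
    return vcat(d[1], d)    # output length == input length
end

vals = [1.0, 3.0, 6.0, 10.0]   # monotonically increasing input
padded_diff(vals)              # [2.0, 2.0, 3.0, 4.0]
```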

Real Data Example

We will now apply the entire pipeline: reading CSV data, aggregating, imputing, and normalizing the result if it is monotonic. We consider three types of data: a regular time series, a monotonic time series, and a daily monotonic time series. Daily monotonic values differ from monotonic ones in that they reset to zero (or some baseline value) once a day. Within each day they increase cumulatively until the next reset. Monotonicer automatically detects these three types and applies the corresponding normalization.
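One way to tell the three types apart is sketched below in plain Julia. It is a hypothetical heuristic for illustration, not the rule Monotonicer uses. It assumes sorted timestamps and is strict: any decrease within a day yields `:regular`. A series whose first day fluctuates before the cumulative pattern starts would therefore be labeled `:regular`. A practical detector needs some tolerance for noise.

```julia
using Dates

# Hypothetical heuristic (NOT Monotonicer's actual rule), assuming the
# timestamps are sorted:
#   :monotonic       - values never decrease over the whole series
#   :dailymonotonic  - values never decrease within each calendar day,
#                      but the series as a whole does decrease (daily resets)
#   :regular         - anything else
function classify_series(dates::AbstractVector{DateTime}, vals::AbstractVector{<:Real})
    length(dates) == length(vals) || throw(ArgumentError("length mismatch"))
    issorted(vals) && return :monotonic
    days = Date.(dates)
    for day in unique(days)
        idx = findall(==(day), days)
        issorted(vals[idx]) || return :regular
    end
    return :dailymonotonic
end

# Two days of hourly readings that reset at midnight
ts = [DateTime(2019, 2, d, h) for d in 11:12 for h in 0:2]
classify_series(ts, [1.0, 2.0, 3.0, 0.5, 1.5, 2.5])   # :dailymonotonic
```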

using TSML: DataReader
using TSML: DateValgator, DateValNNer, Statifier, Monotonicer

# sample CSV files that ship with the TSML package
regularfile = joinpath(dirname(pathof(TSML)), "../data/typedetection/regular.csv")
monofile = joinpath(dirname(pathof(TSML)), "../data/typedetection/monotonic.csv")
dailymonofile = joinpath(dirname(pathof(TSML)), "../data/typedetection/dailymonotonic.csv")

# CSV readers; timestamps are parsed with the given date format
regularfilecsv = DataReader(Dict(:filename => regularfile, :dateformat => "dd/mm/yyyy HH:MM"))
monofilecsv = DataReader(Dict(:filename => monofile, :dateformat => "dd/mm/yyyy HH:MM"))
dailymonofilecsv = DataReader(Dict(:filename => dailymonofile, :dateformat => "dd/mm/yyyy HH:MM"))

valgator = DateValgator(Dict(:dateinterval => Dates.Hour(1)))  # aggregate to hourly values
valnner = DateValNNer(Dict(:dateinterval => Dates.Hour(1)))    # impute missing hourly values
stfier = Statifier(Dict(:processmissing => true))              # defined but not used in the pipelines below
mono = Monotonicer(Dict())                                     # detect and normalize monotonic series
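Judging from the pipeline order, DateValgator performs the aggregation step and DateValNNer the imputation step. The plain-Julia sketch below illustrates both under stated assumptions. Readings are bucketed by hour and summarized with the median. Empty hours are then filled from the single nearest hour that has data. The function names are hypothetical, and the statistics and neighbor settings TSML actually uses may differ.

```julia
using Dates, Statistics

# Hypothetical stand-ins for the aggregation and imputation steps; the
# median and single-nearest-neighbor choices are assumptions, not TSML's code.

# Aggregate: bucket readings by hour, summarize each bucket with the median,
# and return a regular hourly grid with `missing` for hours without data.
function hourly_grid(dates::AbstractVector{DateTime}, vals::AbstractVector{<:Real})
    isempty(dates) && throw(ArgumentError("no readings"))
    buckets = Dict{DateTime,Vector{Float64}}()
    for (t, v) in zip(dates, vals)
        push!(get!(buckets, floor(t, Hour(1)), Float64[]), v)
    end
    grid = collect(minimum(keys(buckets)):Hour(1):maximum(keys(buckets)))
    agg = Union{Missing,Float64}[haskey(buckets, h) ? median(buckets[h]) : missing for h in grid]
    return grid, agg
end

# Impute: replace each missing value with the nearest non-missing value
# (on a tie, the earlier neighbor wins).
function nearest_fill(v::AbstractVector)
    known = findall(!ismissing, v)
    isempty(known) && throw(ArgumentError("no non-missing values"))
    out = Vector{Float64}(undef, length(v))
    for i in eachindex(v)
        out[i] = v[known[argmin(abs.(known .- i))]]
    end
    return out
end

ts = [DateTime(2020, 1, 1, 1, 0), DateTime(2020, 1, 1, 1, 30), DateTime(2020, 1, 1, 3, 15)]
grid, agg = hourly_grid(ts, [1.0, 3.0, 10.0])   # agg holds 2.0, missing, 10.0
nearest_fill(agg)                               # [2.0, 2.0, 10.0]
```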

Regular TS Processing

Let's start by feeding the regular time series to the pipeline. For this type, we expect Monotonicer to perform no further processing:

pipeline = Pipeline(Dict(
    :transformers => [regularfilecsv, valgator, valnner, mono]
))
fit!(pipeline)
regulardf = transform!(pipeline)
first(regulardf, 5)

5 rows × 2 columns

     Date                 Value
     Dates…               Float64⍰
1    2014-01-01T01:00:00  4.5
2    2014-01-01T02:00:00  4.35
3    2014-01-01T03:00:00  4.05
4    2014-01-01T04:00:00  4.45
5    2014-01-01T05:00:00  4.2

For comparison, here is the same pipeline without Monotonicer:

pipeline = Pipeline(Dict(
    :transformers => [regularfilecsv, valgator, valnner]
))
fit!(pipeline)
regulardf = transform!(pipeline)
first(regulardf, 5)

5 rows × 2 columns

     Date                 Value
     Dates…               Float64⍰
1    2014-01-01T01:00:00  4.5
2    2014-01-01T02:00:00  4.35
3    2014-01-01T03:00:00  4.05
4    2014-01-01T04:00:00  4.45
5    2014-01-01T05:00:00  4.2

Notice that the outputs are identical with and without the Monotonicer instance: for a regular series, Monotonicer passes the data through unchanged.

Monotonic TS Processing

Let's now feed monotonic CSV data through the same pipeline.

pipeline = Pipeline(Dict(
    :transformers => [monofilecsv, valgator, valnner, mono]
))
fit!(pipeline)
monodf = transform!(pipeline)
first(monodf, 10)

10 rows × 2 columns

     Date                 Value
     Dates…               Float64
1    2016-01-06T17:00:00  230.0
2    2016-01-06T18:00:00  230.0
3    2016-01-06T19:00:00  264.0
4    2016-01-06T20:00:00  258.0
5    2016-01-06T21:00:00  244.0
6    2016-01-06T22:00:00  254.0
7    2016-01-06T23:00:00  242.0
8    2016-01-07T00:00:00  240.0
9    2016-01-07T01:00:00  240.0
10   2016-01-07T02:00:00  240.0

For comparison, here is the same pipeline without Monotonicer:

pipeline = Pipeline(Dict(
    :transformers => [monofilecsv, valgator, valnner]
))
fit!(pipeline)
monodf = transform!(pipeline)
first(monodf, 10)

10 rows × 2 columns

     Date                 Value
     Dates…               Float64⍰
1    2016-01-06T17:00:00  5.77291e7
2    2016-01-06T18:00:00  5.77294e7
3    2016-01-06T19:00:00  5.77296e7
4    2016-01-06T20:00:00  5.77299e7
5    2016-01-06T21:00:00  5.77301e7
6    2016-01-06T22:00:00  5.77304e7
7    2016-01-06T23:00:00  5.77306e7
8    2016-01-07T00:00:00  5.77309e7
9    2016-01-07T01:00:00  5.77311e7
10   2016-01-07T02:00:00  5.77313e7

Notice that without the Monotonicer instance the output remains monotonic: the cumulative values keep growing from about 5.77291e7. With Monotonicer in the pipeline, the finite differences turn it into a regular time series of hourly increments.

Daily Monotonic TS Processing

Lastly, let's feed the daily monotonic data to a similar pipeline and examine its output.

pipeline = Pipeline(Dict(
    :transformers => [dailymonofilecsv, valgator, valnner, mono]
))
fit!(pipeline)
dailymonodf = transform!(pipeline)
first(dailymonodf, 50)

50 rows × 2 columns

     Date                 Value
     Dates…               Float64
1    2019-02-10T12:00:00  2.35
2    2019-02-10T13:00:00  2.35
3    2019-02-10T14:00:00  0.205
4    2019-02-10T15:00:00  0.205
5    2019-02-10T16:00:00  0.205
6    2019-02-10T17:00:00  0.18
7    2019-02-10T18:00:00  0.94
8    2019-02-10T19:00:00  0.24
9    2019-02-10T20:00:00  0.24
10   2019-02-10T21:00:00  0.24
11   2019-02-10T22:00:00  0.24
12   2019-02-10T23:00:00  1.75
13   2019-02-11T00:00:00  0.49
14   2019-02-11T01:00:00  0.49
15   2019-02-11T02:00:00  0.49
16   2019-02-11T03:00:00  0.475
17   2019-02-11T04:00:00  0.475
18   2019-02-11T05:00:00  0.475
19   2019-02-11T06:00:00  0.38
20   2019-02-11T07:00:00  0.38
21   2019-02-11T08:00:00  0.38
22   2019-02-11T09:00:00  0.38
23   2019-02-11T10:00:00  0.38
24   2019-02-11T11:00:00  3.41
25   2019-02-11T12:00:00  0.25
26   2019-02-11T13:00:00  0.25
27   2019-02-11T14:00:00  1.75
28   2019-02-11T15:00:00  863.62
29   2019-02-11T16:00:00  42.5
30   2019-02-11T17:00:00  42.5
31   2019-02-11T18:00:00  84.0
32   2019-02-11T19:00:00  81.0
33   2019-02-11T20:00:00  77.0
34   2019-02-11T21:00:00  48.0
35   2019-02-11T22:00:00  53.0
36   2019-02-11T23:00:00  60.0
37   2019-02-12T00:00:00  39.0
38   2019-02-12T01:00:00  67.0
39   2019-02-12T02:00:00  67.0
40   2019-02-12T03:00:00  47.0
41   2019-02-12T04:00:00  67.0
42   2019-02-12T05:00:00  72.5
43   2019-02-12T06:00:00  55.5
44   2019-02-12T07:00:00  69.5
45   2019-02-12T08:00:00  68.5
46   2019-02-12T09:00:00  67.0
47   2019-02-12T10:00:00  69.0
48   2019-02-12T11:00:00  52.0
49   2019-02-12T12:00:00  66.0
50   2019-02-12T13:00:00  87.0

For comparison, here is the same pipeline without Monotonicer:

pipeline = Pipeline(Dict(
    :transformers => [dailymonofilecsv, valgator, valnner]
))
fit!(pipeline)
dailymonodf = transform!(pipeline)
first(dailymonodf, 50)

50 rows × 2 columns

     Date                 Value
     Dates…               Float64⍰
1    2019-02-10T12:00:00  60.36
2    2019-02-10T13:00:00  62.71
3    2019-02-10T14:00:00  61.76
4    2019-02-10T15:00:00  61.965
5    2019-02-10T16:00:00  62.17
6    2019-02-10T17:00:00  62.35
7    2019-02-10T18:00:00  63.29
8    2019-02-10T19:00:00  62.57
9    2019-02-10T20:00:00  61.85
10   2019-02-10T21:00:00  60.73
11   2019-02-10T22:00:00  60.97
12   2019-02-10T23:00:00  62.72
13   2019-02-11T00:00:00  61.325
14   2019-02-11T01:00:00  59.93
15   2019-02-11T02:00:00  60.42
16   2019-02-11T03:00:00  60.09
17   2019-02-11T04:00:00  60.565
18   2019-02-11T05:00:00  61.04
19   2019-02-11T06:00:00  60.76
20   2019-02-11T07:00:00  60.18
21   2019-02-11T08:00:00  59.76
22   2019-02-11T09:00:00  59.34
23   2019-02-11T10:00:00  59.72
24   2019-02-11T11:00:00  63.13
25   2019-02-11T12:00:00  63.38
26   2019-02-11T13:00:00  63.63
27   2019-02-11T14:00:00  65.38
28   2019-02-11T15:00:00  929.0
29   2019-02-11T16:00:00  971.5
30   2019-02-11T17:00:00  1014.0
31   2019-02-11T18:00:00  1098.0
32   2019-02-11T19:00:00  1179.0
33   2019-02-11T20:00:00  1256.0
34   2019-02-11T21:00:00  1304.0
35   2019-02-11T22:00:00  1357.0
36   2019-02-11T23:00:00  1417.0
37   2019-02-12T00:00:00  1456.0
38   2019-02-12T01:00:00  62.0
39   2019-02-12T02:00:00  129.0
40   2019-02-12T03:00:00  176.0
41   2019-02-12T04:00:00  243.0
42   2019-02-12T05:00:00  315.5
43   2019-02-12T06:00:00  371.0
44   2019-02-12T07:00:00  440.5
45   2019-02-12T08:00:00  509.0
46   2019-02-12T09:00:00  576.0
47   2019-02-12T10:00:00  645.0
48   2019-02-12T11:00:00  697.0
49   2019-02-12T12:00:00  763.0
50   2019-02-12T13:00:00  850.0

In the output without Monotonicer, the first 27 rows behave like a regular time series with no monotonic signature. From row 28 (2019-02-11T15:00) onward the data increases monotonically. Notice further that the series resets at row 38 (2019-02-12T01:00, i.e. 1:00 am), dropping from 1456.0 back to 62.0. This daily monotonic pattern is easiest to see when the data is plotted. In the pipeline with Monotonicer, the finite difference operator is applied, and the difference at each reset is replaced by that of its immediate neighbor. For example, at row 38 the difference 62.0 − 1456.0 is replaced by 67.0, the next difference (129.0 − 62.0).
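The replacement rule can be sketched in plain Julia. This is our reading of the output tables, not Monotonicer's source. After differencing (repeating the first difference to keep the length), every negative difference is treated as a reset. Each reset is overwritten with the nearest following non-negative difference. A negative value at the very end has no following neighbor, so it falls back to the nearest preceding one. Applied to rows 35 to 40 of the output without Monotonicer, it yields 60.0, 39.0, 67.0, 67.0, 47.0 for rows 36 to 40, matching the table with Monotonicer. `normalize_resets` is a hypothetical name.

```julia
# Hypothetical sketch (not TSML's code) of daily-monotonic normalization:
# difference the series, then overwrite negative differences (resets).
function normalize_resets(x::AbstractVector{<:Real})
    length(x) < 2 && throw(ArgumentError("need at least two values"))
    d = diff(float.(x))
    d = vcat(d[1], d)                 # keep the original length
    # Backward pass: each negative difference takes the value of the
    # nearest following non-negative difference.
    nextgood = nothing
    for i in length(d):-1:1
        if d[i] < 0
            nextgood === nothing || (d[i] = nextgood)
        else
            nextgood = d[i]
        end
    end
    # Forward pass: negatives at the very end have no following neighbor,
    # so use the nearest preceding non-negative difference instead.
    prevgood = nothing
    for i in eachindex(d)
        if d[i] < 0
            prevgood === nothing || (d[i] = prevgood)
        else
            prevgood = d[i]
        end
    end
    return d
end

# Rows 35-40 of the output without Monotonicer (reset at row 38)
normalize_resets([1357.0, 1417.0, 1456.0, 62.0, 129.0, 176.0])
# [60.0, 60.0, 39.0, 67.0, 67.0, 47.0]
```

The first element of the result is only padding for the sliced input. The remaining five values correspond to rows 36 to 40.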