Aggregation

DateValgator is a data type that performs aggregation to reduce noise and lessen the occurrence of missing data. It expects one argument: the date-time interval used to group values, with each group summarized by its median. For example, to aggregate by hourly medians, pass a Dict containing the entry :dateinterval => Dates.Hour(1).
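The kind of grouping described above can be sketched in plain Julia using only the Dates and Statistics standard libraries. This is not TSML's implementation: the helper median_aggregate is hypothetical. It also assumes that each timestamp is assigned to its nearest interval boundary via Dates.round (ties rounded up), and DateValgator's actual bucketing rule may differ.

```julia
using Dates, Statistics

# Hypothetical helper (not part of TSML): put each timestamp into a bucket of
# width `interval` and summarize each bucket by the median of its values.
# ASSUMPTION: bucket label = timestamp rounded to the nearest interval boundary
# (Dates.round, ties rounded up); TSML's exact rule may differ.
function median_aggregate(dates::AbstractVector{DateTime},
                          xs::AbstractVector{<:Real},
                          interval::Period)
    buckets = Dict{DateTime, Vector{Float64}}()
    for (d, x) in zip(dates, xs)
        key = round(d, interval)                 # bucket label for this sample
        push!(get!(buckets, key, Float64[]), x)  # collect the sample in its bucket
    end
    labels = sort!(collect(keys(buckets)))       # chronological bucket order
    return labels, [median(buckets[k]) for k in labels]
end

# Usage: six 5-minute samples (00:00 to 00:25) aggregated to hourly medians.
dates = collect(DateTime(2014,1,1):Minute(5):DateTime(2014,1,1,0,25))
vals  = [1.0, 5.0, 2.0, 4.0, 3.0, 6.0]
hours, meds = median_aggregate(dates, vals, Hour(1))
```

A Dict keyed by bucket label keeps the sketch short. For long, already sorted series, a single pass that closes a bucket whenever the label changes would avoid the hash table.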

To illustrate DateValgator usage, let's start by generating artificial data sampled every 5 minutes and print the first 10 rows.

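using TSML
using Dates, DataFrames   # TSML may already provide these; loading them explicitly is harmless

# 5-minute timestamps from 2014-01-01 to 2014-05-01 (end point included)
gdate = DateTime(2014,1,1):Dates.Minute(5):DateTime(2014,5,1)
gval = rand(length(gdate))   # random values, so your numbers will differ from those shown
df = DataFrame(Date=gdate,Value=gval)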
julia> first(df,10)
10×2 DataFrame
│ Row │ Date                │ Value    │
│     │ DateTime            │ Float64  │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.779611 │
│ 2   │ 2014-01-01T00:05:00 │ 0.512027 │
│ 3   │ 2014-01-01T00:10:00 │ 0.815486 │
│ 4   │ 2014-01-01T00:15:00 │ 0.92707  │
│ 5   │ 2014-01-01T00:20:00 │ 0.647006 │
│ 6   │ 2014-01-01T00:25:00 │ 0.626544 │
│ 7   │ 2014-01-01T00:30:00 │ 0.209695 │
│ 8   │ 2014-01-01T00:35:00 │ 0.781134 │
│ 9   │ 2014-01-01T00:40:00 │ 0.911039 │
│ 10  │ 2014-01-01T00:45:00 │ 0.956309 │
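The 10-row preview does not show how large df is, but the count follows from simple arithmetic: 2014 is not a leap year, so January 1 to May 1 spans 31 + 28 + 31 + 30 = 120 days of 288 five-minute steps each, plus one for the included end point, giving 34,561 rows. This Dates-only snippet confirms the count.

```julia
using Dates

# Same range as in the df example: 2014-01-01 to 2014-05-01 every 5 minutes.
gdate = DateTime(2014,1,1):Minute(5):DateTime(2014,5,1)

ndays = Dates.value(Date(2014,5,1) - Date(2014,1,1))  # 120 days
samples_per_day = 24 * 60 ÷ 5                         # 288 five-minute steps per day
expected = ndays * samples_per_day + 1                # +1: the range includes its end point

println(length(gdate), " rows (expected ", expected, ")")
```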

DateValgator

Let's apply the aggregator and try different groupings: half-hourly, hourly, and daily aggregates of the data.

using TSML

# One aggregator per grouping interval, each configured through :dateinterval
halfhourlyagg = DateValgator(Dict(:dateinterval => Dates.Minute(30)))
hourlyagg = DateValgator(Dict(:dateinterval => Dates.Hour(1)))
dailyagg = DateValgator(Dict(:dateinterval => Dates.Day(1)))

# For each aggregator: fit! on the data, then transform! returns the aggregated DataFrame
fit!(halfhourlyagg,df)
halfhourlyres = transform!(halfhourlyagg,df)

fit!(hourlyagg,df)
hourlyres = transform!(hourlyagg,df)

fit!(dailyagg,df)
dailyres = transform!(dailyagg,df)
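Coarser intervals trade time resolution for smoothing, which shows up directly in the number of output rows. The sketch below assumes that each timestamp is rounded to its nearest interval boundary (Dates.round, ties up). This is an assumption that this page does not confirm. Under that rule, the 5-minute range from January 1 to May 1, 2014 falls into 5,761 half-hourly, 2,881 hourly, and 121 daily buckets. If DateValgator emits one row per interval, comparing these counts with nrow(halfhourlyres), nrow(hourlyres), and nrow(dailyres) is a quick way to check the assumption.

```julia
using Dates

# Same 5-minute range as the df example.
gdate = DateTime(2014,1,1):Minute(5):DateTime(2014,5,1)

# ASSUMPTION: each timestamp is rounded to its nearest interval boundary
# (Dates.round, ties rounded up); TSML's actual rule may differ.
nbuckets(interval) = length(unique(round(d, interval) for d in gdate))

for (name, interval) in [("half-hourly", Minute(30)), ("hourly", Hour(1)), ("daily", Day(1))]
    println(rpad(name, 12), nbuckets(interval), " buckets")
end
```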

The first 5 rows of the half-hourly, hourly, and daily aggregates (the ⍰ in Float64⍰ marks a column that may contain missing values):

julia> first(halfhourlyres,5)
5×2 DataFrame
│ Row │ Date                │ Value    │
│     │ DateTime            │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.779611 │
│ 2   │ 2014-01-01T00:30:00 │ 0.209695 │
│ 3   │ 2014-01-01T01:00:00 │ 0.513259 │
│ 4   │ 2014-01-01T01:30:00 │ 0.802717 │
│ 5   │ 2014-01-01T02:00:00 │ 0.290366 │

julia> first(hourlyres,5)
5×2 DataFrame
│ Row │ Date                │ Value    │
│     │ DateTime            │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.713308 │
│ 2   │ 2014-01-01T01:00:00 │ 0.446031 │
│ 3   │ 2014-01-01T02:00:00 │ 0.480788 │
│ 4   │ 2014-01-01T03:00:00 │ 0.756309 │
│ 5   │ 2014-01-01T04:00:00 │ 0.229725 │

julia> first(dailyres,5)
5×2 DataFrame
│ Row │ Date                │ Value    │
│     │ DateTime            │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.51665  │
│ 2   │ 2014-01-02T00:00:00 │ 0.488274 │
│ 3   │ 2014-01-03T00:00:00 │ 0.49588  │
│ 4   │ 2014-01-04T00:00:00 │ 0.479018 │
│ 5   │ 2014-01-05T00:00:00 │ 0.450831 │
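The first aggregated rows can be checked by hand against the df preview. Assume that DateValgator labels each sample with its nearest interval boundary, with ties rounded up (an assumption, not confirmed by this page). Then the 00:00 hourly bucket holds the samples from 00:00 to 00:25, and the 00:00 half-hourly bucket holds those from 00:00 to 00:10. To display precision, their medians reproduce the first hourly row (0.713308) and the first half-hourly row (0.779611). The snippet uses only the Statistics standard library.

```julia
using Statistics

# First ten values printed in the df preview (already rounded to 6 decimals there).
v = [0.779611, 0.512027, 0.815486, 0.92707, 0.647006,
     0.626544, 0.209695, 0.781134, 0.911039, 0.956309]

# ASSUMPTION: round-to-nearest bucketing with ties rounded up.
# Hourly bucket 00:00 holds 00:00-00:25 (00:30 rounds up to 01:00).
hourly_first = median(v[1:6])
# Half-hourly bucket 00:00 holds 00:00-00:10 (00:15 rounds up to 00:30).
halfhourly_first = median(v[1:3])

println(hourly_first, "  ", halfhourly_first)
```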