Aggregation
DateValgator is a data type that aggregates time series values to reduce noise and lessen the occurrence of missing data. It expects one argument: the date-time interval used for grouping. The values in each group are replaced by their median. For example, hourly medians can be computed by passing this argument: :dateinterval => Dates.Hour(1)
To illustrate DateValgator usage, let's start by generating artificial data sampled every 5 minutes and printing the first 10 rows.
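The grouping-by-median idea can be sketched without TSML, using only the Dates and Statistics standard libraries. The helper median_by_interval below is hypothetical and not part of TSML. It assumes each timestamp is assigned to its nearest interval boundary via round, which may differ from how DateValgator bins its input.

```julia
using Dates, Statistics

# Hypothetical helper (not a TSML function): assign every timestamp to the
# nearest multiple of `interval`, then take the median of each group.
function median_by_interval(dates::AbstractVector{DateTime},
                            vals::AbstractVector{<:Real},
                            interval::Period)
    groups = Dict{DateTime,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        key = round(d, interval)  # assumption: nearest-boundary binning
        push!(get!(() -> Float64[], groups, key), v)
    end
    bins = sort!(collect(keys(groups)))
    return bins, [median(groups[b]) for b in bins]
end

# Two hours of 5-minute samples with values 1.0, 2.0, ..., 25.0
dates = collect(DateTime(2014,1,1):Minute(5):DateTime(2014,1,1,2))
vals = collect(1.0:length(dates))
bins, meds = median_by_interval(dates, vals, Hour(1))
```

With this toy input the sketch yields three hourly bins (00:00, 01:00, 02:00) whose medians are 3.5, 12.5, and 22.0.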
using Dates, DataFrames
gdate = DateTime(2014,1,1):Dates.Minute(5):DateTime(2014,5,1)
gval = rand(length(gdate))
df = DataFrame(Date=gdate,Value=gval)
first(df,10)
| | Date | Value |
|---|---|---|
| | Dates… | Float64 |
| 1 | 2014-01-01T00:00:00 | 0.317564 |
| 2 | 2014-01-01T00:05:00 | 0.286774 |
| 3 | 2014-01-01T00:10:00 | 0.702091 |
| 4 | 2014-01-01T00:15:00 | 0.497805 |
| 5 | 2014-01-01T00:20:00 | 0.0628874 |
| 6 | 2014-01-01T00:25:00 | 0.817261 |
| 7 | 2014-01-01T00:30:00 | 0.412464 |
| 8 | 2014-01-01T00:35:00 | 0.449599 |
| 9 | 2014-01-01T00:40:00 | 0.00206474 |
| 10 | 2014-01-01T00:45:00 | 0.513834 |
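As a quick sanity check on the generated data, the snippet below counts the raw rows and the number of 5-minute samples that fit into each aggregation interval. The helper samples_per_bin is a hypothetical name introduced for illustration, and it assumes a fixed sampling step with no gaps.

```julia
using Dates

# Hypothetical helper: number of raw samples that fit into one aggregation
# interval, assuming a fixed sampling step (both converted to milliseconds).
samples_per_bin(sampling::Period, interval::Period) =
    div(Dates.toms(interval), Dates.toms(sampling))

sampling = Minute(5)
gdate = DateTime(2014,1,1):sampling:DateTime(2014,5,1)

nrows = length(gdate)                              # 120 days of samples plus the end point
per_halfhour = samples_per_bin(sampling, Minute(30))
per_hour = samples_per_bin(sampling, Hour(1))
per_day = samples_per_bin(sampling, Day(1))

println("rows: ", nrows)
println("samples per bin: ", (per_halfhour, per_hour, per_day))
```

The range covers 120 days, so it holds 34561 timestamps, and the half-hourly, hourly, and daily intervals span 6, 12, and 288 samples respectively.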
DateValgator
Let's apply the aggregator and try different groupings: half-hourly, hourly, and daily aggregates of the data.
using TSML, TSML.TSMLTransformers, TSML.Utils, TSML.TSMLTypes
# one aggregator per grouping interval
hourlyagg = DateValgator(Dict(:dateinterval => Dates.Hour(1)))
halfhourlyagg = DateValgator(Dict(:dateinterval => Dates.Minute(30)))
dailyagg = DateValgator(Dict(:dateinterval => Dates.Day(1)))
# fit each aggregator on the data, then compute its aggregates
fit!(halfhourlyagg,df)
halfhourlyres = transform!(halfhourlyagg,df)
fit!(hourlyagg,df)
hourlyres = transform!(hourlyagg,df)
fit!(dailyagg,df)
dailyres = transform!(dailyagg,df)
The first 5 rows of half-hourly, hourly, and daily aggregates:
julia> first(halfhourlyres,5)
5×2 DataFrames.DataFrame
│ Row │ Date                │ Value    │
│     │ Dates.DateTime      │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.317564 │
│ 2   │ 2014-01-01T00:30:00 │ 0.412464 │
│ 3   │ 2014-01-01T01:00:00 │ 0.634207 │
│ 4   │ 2014-01-01T01:30:00 │ 0.576241 │
│ 5   │ 2014-01-01T02:00:00 │ 0.490236 │
julia> first(hourlyres,5)
5×2 DataFrames.DataFrame
│ Row │ Date                │ Value    │
│     │ Dates.DateTime      │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.407684 │
│ 2   │ 2014-01-01T01:00:00 │ 0.481717 │
│ 3   │ 2014-01-01T02:00:00 │ 0.626427 │
│ 4   │ 2014-01-01T03:00:00 │ 0.498177 │
│ 5   │ 2014-01-01T04:00:00 │ 0.285262 │
julia> first(dailyres,5)
5×2 DataFrames.DataFrame
│ Row │ Date                │ Value    │
│     │ Dates.DateTime      │ Float64⍰ │
├─────┼─────────────────────┼──────────┤
│ 1   │ 2014-01-01T00:00:00 │ 0.46017  │
│ 2   │ 2014-01-02T00:00:00 │ 0.503287 │
│ 3   │ 2014-01-03T00:00:00 │ 0.488554 │
│ 4   │ 2014-01-04T00:00:00 │ 0.515489 │
│ 5   │ 2014-01-05T00:00:00 │ 0.496321 │
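The daily values above sit closer to 0.5 than the half-hourly ones, which is the noise reduction that wider intervals give on uniform random data. The sketch below illustrates this effect without TSML. It is a simplification: it uses contiguous fixed-size bins of 6, 12, and 288 samples instead of timestamp-based grouping, the helper median_spread is hypothetical, and the seed is arbitrary.

```julia
using Random, Statistics

rng = MersenneTwister(2014)        # arbitrary seed, for reproducibility only
samples = rand(rng, 288 * 120)     # 120 days of 5-minute uniform samples

# Hypothetical helper: standard deviation of the medians of consecutive,
# non-overlapping bins of `n` samples each.
function median_spread(x::AbstractVector{<:Real}, n::Integer)
    nbins = div(length(x), n)
    meds = [median(@view x[(i - 1) * n + 1 : i * n]) for i in 1:nbins]
    return std(meds)
end

# Bin sizes matching half-hourly, hourly, and daily grouping of 5-minute data
spreads = [median_spread(samples, n) for n in (6, 12, 288)]
println(spreads)
```

The spread of the bin medians should shrink as the bin grows, so the daily aggregate is the smoothest of the three, at the cost of temporal resolution.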