Aggregation
DateValgator is a data type that supports aggregation to reduce noise and lessen the occurrence of missing data. It expects one argument: the date-time interval used to group values, each group being replaced by its median. For example, hourly-median aggregation can be carried out by passing this argument: :dateinterval => Dates.Hour(1)
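To make the idea of interval-based median aggregation concrete, here is a minimal sketch using only the Dates and Statistics standard libraries. The helper median_by_interval is hypothetical and not part of TSML, and rounding each timestamp to the nearest multiple of the interval is an assumption made for illustration; DateValgator may group timestamps differently.

```julia
using Dates, Statistics

# Hypothetical helper (not part of TSML): assign each timestamp to a group by
# rounding it to the nearest multiple of `interval`, then take the median of
# the values in each group.
function median_by_interval(dates, vals, interval)
    groups = Dict{DateTime,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        key = round(d, interval)   # assumption: round to nearest, ties go up
        push!(get!(groups, key, Float64[]), v)
    end
    sorted_keys = sort!(collect(keys(groups)))
    return [(k, median(groups[k])) for k in sorted_keys]
end

# Six samples taken every 20 minutes, aggregated to hourly medians.
dates = DateTime(2014,1,1,0,0):Dates.Minute(20):DateTime(2014,1,1,1,40)
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = median_by_interval(dates, vals, Dates.Hour(1))
# 00:00 group -> [1.0, 2.0]      -> median 1.5
# 01:00 group -> [3.0, 4.0, 5.0] -> median 4.0
# 02:00 group -> [6.0]           -> median 6.0
```

Six input samples collapse into three hourly values, which is the sense in which aggregation smooths noise and reduces the number of points that can be missing.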
To illustrate DateValgator usage, let's start by generating artificial data sampled every 5 minutes and printing the first 10 rows.
using TSML
gdate = DateTime(2014,1,1):Dates.Minute(5):DateTime(2014,5,1)
gval = rand(length(gdate))
df = DataFrame(Date=gdate,Value=gval)
julia> first(df,10)
10×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64
─────┼────────────────────────────────
   1 │ 2014-01-01T00:00:00  0.662278
   2 │ 2014-01-01T00:05:00  0.331899
   3 │ 2014-01-01T00:10:00  0.469472
   4 │ 2014-01-01T00:15:00  0.739479
   5 │ 2014-01-01T00:20:00  0.334865
   6 │ 2014-01-01T00:25:00  0.769361
   7 │ 2014-01-01T00:30:00  0.0128247
   8 │ 2014-01-01T00:35:00  0.897179
   9 │ 2014-01-01T00:40:00  0.117367
  10 │ 2014-01-01T00:45:00  0.242543
DateValgator
Let's apply the aggregator and try different groupings: hourly, half-hourly, and daily aggregates of the data.
using TSML
hourlyagg = DateValgator(Dict(:dateinterval => Dates.Hour(1)))
halfhourlyagg = DateValgator(Dict(:dateinterval => Dates.Minute(30)))
dailyagg = DateValgator(Dict(:dateinterval => Dates.Day(1)))
halfhourlyres = fit_transform!(halfhourlyagg,df)
hourlyres = fit_transform!(hourlyagg,df)
dailyres = fit_transform!(dailyagg,df)
The first 5 rows of half-hourly, hourly, and daily aggregates:
julia> first(halfhourlyres,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64?
─────┼────────────────────────────────
   1 │ 2014-01-01T00:00:00  0.662278
   2 │ 2014-01-01T00:30:00  0.0128247
   3 │ 2014-01-01T01:00:00  0.0531016
   4 │ 2014-01-01T01:30:00  0.631336
   5 │ 2014-01-01T02:00:00  0.817896
julia> first(hourlyres,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64?
─────┼───────────────────────────────
   1 │ 2014-01-01T00:00:00  0.565875
   2 │ 2014-01-01T01:00:00  0.567998
   3 │ 2014-01-01T02:00:00  0.42379
   4 │ 2014-01-01T03:00:00  0.437441
   5 │ 2014-01-01T04:00:00  0.61324
julia> first(dailyres,5)
5×2 DataFrame
 Row │ Date                 Value
     │ DateTime             Float64?
─────┼───────────────────────────────
   1 │ 2014-01-01T00:00:00  0.556626
   2 │ 2014-01-02T00:00:00  0.51652
   3 │ 2014-01-03T00:00:00  0.505215
   4 │ 2014-01-04T00:00:00  0.522375
   5 │ 2014-01-05T00:00:00  0.509797
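As a rough sanity check on the hourly output, the first hourly value can be reproduced by hand from the rows of df printed earlier. The sketch below uses only the Dates and Statistics standard libraries and assumes timestamps are rounded to the nearest hour. That rounding rule is inferred from the printed hourly numbers, not taken from the DateValgator implementation, and the variable names are illustrative.

```julia
using Dates, Statistics

# First six rows of the printed df: samples from 00:00 through 00:25.
stamps = DateTime(2014,1,1,0,0):Dates.Minute(5):DateTime(2014,1,1,0,25)
vals = [0.662278, 0.331899, 0.469472, 0.739479, 0.334865, 0.769361]

# Assumption: each timestamp is assigned to the nearest hour (ties round up),
# so these six samples fall into the 00:00 group and 00:30 onward does not.
rounded = [round(s, Dates.Hour(1)) for s in stamps]
all_midnight = all(r == DateTime(2014,1,1) for r in rounded)

# With an even number of values, the median is the mean of the middle two.
first_hour_median = median(vals)
println(first_hour_median)
```

The result is about 0.565875, the mean of 0.469472 and 0.662278, which agrees with the first row of hourlyres.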