Time Series Analysis and Plotting

Daru offers a host of functions for analyzing and visualizing time series data. This notebook walks through a few of them with examples.

For details on the statistical analysis functions offered by daru, see this blog post.

In [1]:
require 'distribution'
require 'daru'
require 'gnuplotrb'
Out[1]:
true
In [2]:
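# Generator that returns a standard-normal random number on each call
rng = Distribution::Normal.rng

# 1000 consecutive daily timestamps starting on 2 April 2012
index  = Daru::DateTimeIndex.date_range(:start => '2012-4-2', :periods => 1000, :freq => 'D')
vector = Daru::Vector.new(1000.times.map {rng.call}, index: index)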
Out[2]:
Daru::Vector:17602640 size: 1000
nil
2012-04-02T00:00:00+00:00  1.8929310014208862
2012-04-03T00:00:00+00:00  -1.7877272477227173
2012-04-04T00:00:00+00:00  0.13713043059789104
2012-04-05T00:00:00+00:00  0.08248983987417105
2012-04-06T00:00:00+00:00  0.5016724046503049
2012-04-07T00:00:00+00:00  1.4755023856805087
2012-04-08T00:00:00+00:00  1.3250840528296892
2012-04-09T00:00:00+00:00  -1.4449599562106255
2012-04-10T00:00:00+00:00  0.8618890299729314
2012-04-11T00:00:00+00:00  1.1155691054056835
2012-04-12T00:00:00+00:00  0.5335788810730238
2012-04-13T00:00:00+00:00  -0.7589896655108529
2012-04-14T00:00:00+00:00  0.043391207814487326
2012-04-15T00:00:00+00:00  0.3163352310810718
2012-04-16T00:00:00+00:00  -1.397212944702625
2012-04-17T00:00:00+00:00  0.11548292990715667
2012-04-18T00:00:00+00:00  -1.490174232857472
2012-04-19T00:00:00+00:00  -0.9393382026618173
2012-04-20T00:00:00+00:00  -1.4113215880007444
2012-04-21T00:00:00+00:00  0.6958532971395335
2012-04-22T00:00:00+00:00  -0.8904903017202142
2012-04-23T00:00:00+00:00  -0.2837001343416957
2012-04-24T00:00:00+00:00  -1.917197520319854
2012-04-25T00:00:00+00:00  0.7012337714486957
2012-04-26T00:00:00+00:00  1.1666246183803257
2012-04-27T00:00:00+00:00  0.29611920958332577
2012-04-28T00:00:00+00:00  0.8804928443175081
2012-04-29T00:00:00+00:00  -1.6403348075359634
2012-04-30T00:00:00+00:00  -0.31762253519595485
2012-05-01T00:00:00+00:00  -0.12294936853487166
2012-05-02T00:00:00+00:00  2.4227489569733893
2012-05-03T00:00:00+00:00  0.11947772841630783
...                        ...
2014-12-27T00:00:00+00:00  0.18341906625828117
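To make the index concrete: a daily date range with 1000 periods is simply 1000 consecutive calendar days, which is why the output above runs from 2012-04-02 to 2014-12-27. The following plain-Ruby sketch reproduces that with the standard date library only. It is an illustration, not how daru builds its index internally, and daily_range is a hypothetical helper name.

```ruby
require 'date'

# Hypothetical helper (not part of daru): `periods` consecutive days
# starting at the date given by the string `start`.
def daily_range(start, periods)
  first = Date.parse(start)
  Array.new(periods) { |i| first + i }
end

dates = daily_range('2012-4-2', 1000)
puts dates.first  # 2012-04-02
puts dates.last   # 2014-12-27
```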
In [3]:
# Cumulative sum: turns the independent noise values into a random walk
vector = vector.cumsum
Out[3]:
Daru::Vector:28616380 size: 1000
nil
2012-04-02T00:00:00+00:00  1.8929310014208862
2012-04-03T00:00:00+00:00  0.10520375369816892
2012-04-04T00:00:00+00:00  0.24233418429605996
2012-04-05T00:00:00+00:00  0.32482402417023104
2012-04-06T00:00:00+00:00  0.826496428820536
2012-04-07T00:00:00+00:00  2.3019988145010446
2012-04-08T00:00:00+00:00  3.627082867330734
2012-04-09T00:00:00+00:00  2.182122911120109
2012-04-10T00:00:00+00:00  3.04401194109304
2012-04-11T00:00:00+00:00  4.159581046498724
2012-04-12T00:00:00+00:00  4.693159927571748
2012-04-13T00:00:00+00:00  3.934170262060895
2012-04-14T00:00:00+00:00  3.9775614698753823
2012-04-15T00:00:00+00:00  4.293896700956454
2012-04-16T00:00:00+00:00  2.896683756253829
2012-04-17T00:00:00+00:00  3.0121666861609855
2012-04-18T00:00:00+00:00  1.5219924533035134
2012-04-19T00:00:00+00:00  0.5826542506416961
2012-04-20T00:00:00+00:00  -0.8286673373590483
2012-04-21T00:00:00+00:00  -0.13281404021951482
2012-04-22T00:00:00+00:00  -1.023304341939729
2012-04-23T00:00:00+00:00  -1.3070044762814246
2012-04-24T00:00:00+00:00  -3.2242019966012787
2012-04-25T00:00:00+00:00  -2.522968225152583
2012-04-26T00:00:00+00:00  -1.3563436067722572
2012-04-27T00:00:00+00:00  -1.0602243971889314
2012-04-28T00:00:00+00:00  -0.17973155287142328
2012-04-29T00:00:00+00:00  -1.8200663604073868
2012-04-30T00:00:00+00:00  -2.1376888956033415
2012-05-01T00:00:00+00:00  -2.2606382641382132
2012-05-02T00:00:00+00:00  0.1621106928351761
2012-05-03T00:00:00+00:00  0.2815884212514839
...                        ...
2014-12-27T00:00:00+00:00  -24.80984292989138
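The cumulative sum replaces each element with the running total of everything up to and including it, which is why the first value above is unchanged and the later values drift like a random walk. A minimal plain-Ruby sketch of the same computation on an ordinary array follows; cumulative_sum is an illustrative helper name, not part of daru.

```ruby
# Illustrative helper (not part of daru): running total of an array.
def cumulative_sum(values)
  total = 0.0
  # Each output element is the sum of all inputs seen so far.
  values.map { |v| total += v }
end

steps = [1.5, -0.5, 2.0, -3.0]
walk  = cumulative_sum(steps)
p walk  # [1.5, 1.0, 3.0, 0.0]
```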

Daru::Vector provides many methods for statistical analysis of time series data. See this blog post for a comprehensive overview of the statistics functions available on Daru::Vector.

For example, you can calculate the rolling mean of a Vector with the #rolling_mean method, passing the lookback length (the number of most recent observations averaged at each point) as the argument:

In [4]:
# Rolling mean over a 60-observation window
rolling = vector.rolling_mean 60
# Show only the last entries of the result
rolling.tail
Out[4]:
Daru::Vector:29080000 size: 10
nil
2014-12-18T00:00:00+00:00  -19.821585248153262
2014-12-19T00:00:00+00:00  -19.808139883503745
2014-12-20T00:00:00+00:00  -19.781216028083545
2014-12-21T00:00:00+00:00  -19.73020939331389
2014-12-22T00:00:00+00:00  -19.755920206890632
2014-12-23T00:00:00+00:00  -19.766270574147697
2014-12-24T00:00:00+00:00  -19.785619794017194
2014-12-25T00:00:00+00:00  -19.795712631164005
2014-12-26T00:00:00+00:00  -19.837207021104312
2014-12-27T00:00:00+00:00  -19.889931452290522
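To show what a rolling mean computes, here is a plain-Ruby sketch on an ordinary array: each output is the average of the current value and the window - 1 values before it. It assumes that positions without a full window are reported as nil; the helper name rolling_mean_sketch is hypothetical, and this is not daru's implementation.

```ruby
# Hypothetical helper (not daru's implementation): trailing mean over the
# last `window` values. Positions without a full window are nil.
def rolling_mean_sketch(values, window)
  # Leading positions that cannot fill a whole window yet.
  padding = Array.new([window - 1, values.size].min)
  # each_cons yields every run of `window` consecutive values.
  means = values.each_cons(window).map { |w| w.sum / window.to_f }
  padding + means
end

smoothed = rolling_mean_sketch([1, 2, 3, 4, 5], 3)
p smoothed  # [nil, nil, 2.0, 3.0, 4.0]
```

A longer window smooths more aggressively but leaves more undefined positions at the start of the series.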

With the GnuplotRB gem, you can plot the vector and its rolling mean directly as line plots on the same graph:

In [5]:
GnuplotRB::Plot.new([vector, with: 'lines', title: 'Vector'], [rolling, with: 'lines', title: 'Rolling Mean'])
Out[5]:
[Figure: line plot of 'Vector' and 'Rolling Mean' against date, April 2012 to January 2015, with the y-axis running from -30 to 5. Produced by gnuplot 5.0 patchlevel 3.]
In [6]:
# Three independent noise series sharing the same date index
df = Daru::DataFrame.new({
  a: 1000.times.map {rng.call},
  b: 1000.times.map {rng.call},
  c: 1000.times.map {rng.call}
}, index: index)
Out[6]:
Daru::DataFrame:18785760 rows: 1000 cols: 3
                           a                     b                     c
2012-04-02T00:00:00+00:00  -0.6947536205170924   2.047322364819309     -0.8312388511803154
2012-04-03T00:00:00+00:00  0.4288182884252429    -0.16866673109830102  1.8619871129533594
2012-04-04T00:00:00+00:00  1.1122260305119145    2.401373414519374     -0.22086231994040165
2012-04-05T00:00:00+00:00  -0.5491357553143638   1.0593306090541381    -1.2358490555536528
2012-04-06T00:00:00+00:00  -0.352435933142354    1.036125369825205     2.051707999011653
2012-04-07T00:00:00+00:00  -1.5459343423326082   1.4145046975407094    0.479501909302452
2012-04-08T00:00:00+00:00  -0.9126814651497038   0.08997451440228435   -0.33244467366719316
2012-04-09T00:00:00+00:00  1.0942284409025929    -0.5914256631603069   -0.10965690286961519
2012-04-10T00:00:00+00:00  -1.189599766618536    -1.6504362069360503   -1.7834684774901928
2012-04-11T00:00:00+00:00  0.2555596824456879    0.09444524135559265   -1.573776911863845
2012-04-12T00:00:00+00:00  0.08915341016926566   -0.26820372617029165  -0.9867829661340854
2012-04-13T00:00:00+00:00  0.06877942837600515   0.8308696093317287    0.9932475109122552
2012-04-14T00:00:00+00:00  0.5812462684559282    0.828455475213921     1.794974039065598
2012-04-15T00:00:00+00:00  0.5717653544338255    0.7968497134039435    -0.2137281706627073
2012-04-16T00:00:00+00:00  -0.1835472698237467   0.11375353633297033   1.3995365019881125
2012-04-17T00:00:00+00:00  -1.803631620231082    1.1202312653723845    -1.8772466220257593
2012-04-18T00:00:00+00:00  -0.2394511541502766   -0.35929726781506643  1.2625165836476817
2012-04-19T00:00:00+00:00  0.9449696065324419    -1.2238915741892322   -0.3445182971483625
2012-04-20T00:00:00+00:00  -2.6156262185361383   0.8665968401408657    0.41715577129962633
2012-04-21T00:00:00+00:00  0.2759161208244279    -0.02479736654918991  -1.0281218944966948
2012-04-22T00:00:00+00:00  2.2760135389649654    0.4279361038038636    -0.06980266563482719
2012-04-23T00:00:00+00:00  -0.12478400655401789  0.19299065428255166   0.672079999341098
2012-04-24T00:00:00+00:00  -1.0636558539565024   0.8628992768191084    0.08168988302828417
2012-04-25T00:00:00+00:00  1.1740110448821357    -0.39046297065410035  0.8258712835867261
2012-04-26T00:00:00+00:00  0.9265831360275995    -0.07846575584946325  -0.18251048994431143
2012-04-27T00:00:00+00:00  0.9879069917975268    -1.532686297548485    -1.3414889817000768
2012-04-28T00:00:00+00:00  -0.00995997219489966  1.856826171470321     0.01995383034537257
2012-04-29T00:00:00+00:00  -0.7934548909844359   1.2440565724873829    1.0453568260856172
2012-04-30T00:00:00+00:00  0.3819023421725202    0.3731272622353217    0.10998010272644088
2012-05-01T00:00:00+00:00  0.7911424676513545    -0.8697434179710826   0.4474602612770729
2012-05-02T00:00:00+00:00  1.8197435137607467    0.00131620355975935   0.8575841779364537
2012-05-03T00:00:00+00:00  -0.38642762200568864  0.5458009874522263    0.14343686573268757
...                        ...                   ...                   ...
2014-12-27T00:00:00+00:00  -1.3722456421067706   1.2428443635021815    0.3617790549036807
In [7]:
df = df.cumsum
Out[7]:
Daru::DataFrame:25605060 rows: 1000 cols: 3
                           a                     b                     c
2012-04-02T00:00:00+00:00  -0.6947536205170924   2.047322364819309     -0.8312388511803154
2012-04-03T00:00:00+00:00  -0.26593533209184955  1.8786556337210079    1.0307482617730441
2012-04-04T00:00:00+00:00  0.8462906984200649    4.280029048240381     0.8098859418326425
2012-04-05T00:00:00+00:00  0.2971549431057011    5.339359657294519     -0.42596311372101026
2012-04-06T00:00:00+00:00  -0.05528099003665288  6.3754850271197245    1.6257448852906426
2012-04-07T00:00:00+00:00  -1.601215332369261    7.789989724660434     2.1052467945930946
2012-04-08T00:00:00+00:00  -2.513896797518965    7.879964239062718     1.7728021209259015
2012-04-09T00:00:00+00:00  -1.4196683566163721   7.288538575902411     1.6631452180562862
2012-04-10T00:00:00+00:00  -2.609268123234908    5.6381023689663605    -0.12032325943390654
2012-04-11T00:00:00+00:00  -2.3537084407892204   5.732547610321953     -1.6941001712977515
2012-04-12T00:00:00+00:00  -2.2645550306199547   5.464343884151662     -2.680883137431837
2012-04-13T00:00:00+00:00  -2.1957756022439496   6.2952134934833905    -1.687635626519582
2012-04-14T00:00:00+00:00  -1.6145293337880213   7.123668968697311     0.10733841254601595
2012-04-15T00:00:00+00:00  -1.0427639793541958   7.920518682101255     -0.10638975811669135
2012-04-16T00:00:00+00:00  -1.2263112491779427   8.034272218434225     1.2931467438714213
2012-04-17T00:00:00+00:00  -3.029942869409025    9.154503483806609     -0.5840998781543381
2012-04-18T00:00:00+00:00  -3.2693940235593013   8.795206215991543     0.6784167054933437
2012-04-19T00:00:00+00:00  -2.3244244170268593   7.571314641802311     0.33389840834498113
2012-04-20T00:00:00+00:00  -4.940050635562997    8.437911481943177     0.7510541796446075
2012-04-21T00:00:00+00:00  -4.664134514738569    8.413114115393986     -0.27706771485208725
2012-04-22T00:00:00+00:00  -2.3881209757736035   8.84105021919785      -0.34687038048691443
2012-04-23T00:00:00+00:00  -2.5129049823276213   9.034040873480402     0.32520961885418354
2012-04-24T00:00:00+00:00  -3.5765608362841235   9.89694015029951      0.4068995018824677
2012-04-25T00:00:00+00:00  -2.402549791401988    9.50647717964541      1.2327707854691938
2012-04-26T00:00:00+00:00  -1.4759666553743886   9.428011423795947     1.0502602955248823
2012-04-27T00:00:00+00:00  -0.4880596635768618   7.895325126247462     -0.2912286861751945
2012-04-28T00:00:00+00:00  -0.49801963577176145  9.752151297717782     -0.2712748558298219
2012-04-29T00:00:00+00:00  -1.2914745267561973   10.996207870205165    0.7740819702557953
2012-04-30T00:00:00+00:00  -0.9095721845836771   11.369335132440487    0.8840620729822362
2012-05-01T00:00:00+00:00  -0.11842971693232252  10.499591714469403    1.331522334259309
2012-05-02T00:00:00+00:00  1.7013137968284242    10.500907918029162    2.189106512195763
2012-05-03T00:00:00+00:00  1.3148861748227356    11.046708905481388    2.3325433779284506
...                        ...                   ...                   ...
2014-12-27T00:00:00+00:004.228210982672566-7.57194387892858852.135586977188005
In [8]:
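# Rolling sum of every column over a 60-observation window
rs = df.rolling_sum(60)

# Build one line plot per column, titled with the column name
plots = []
rs.each_vector_with_index do |vec,n|
  plots << GnuplotRB::Plot.new([vec, with: 'lines', title: n])
end

# Stack the three plots vertically in a single figure
GnuplotRB::Multiplot.new(*plots, layout: [3,1], title: 'Rolling sums')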
Out[8]:
[Figure: 'Rolling sums' multiplot with three stacked panels, one each for columns a, b and c, plotted against date from April 2012 to January 2015. Produced by gnuplot 5.0 patchlevel 3.]
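The per-column rolling sum used in the last cell can be sketched in plain Ruby as follows. This version keeps a running total and subtracts the value that drops out of the window, rather than re-adding every window from scratch. It assumes positions without a full window are nil; rolling_sum_sketch and the sample columns are hypothetical, and this is not daru's implementation.

```ruby
# Hypothetical helper (not daru's implementation): trailing sum of the
# last `window` values, updated incrementally.
def rolling_sum_sketch(values, window)
  total = 0
  values.each_with_index.map do |v, i|
    total += v                                  # value entering the window
    total -= values[i - window] if i >= window  # value leaving the window
    i >= window - 1 ? total : nil               # nil until the window is full
  end
end

# Apply the same window to every column of a small hash of arrays.
columns = { a: [1, 2, 3, 4], b: [10, 20, 30, 40] }
sums = columns.transform_values { |col| rolling_sum_sketch(col, 2) }
sums.each { |name, col| puts "#{name}: #{col.inspect}" }
# a: [nil, 3, 5, 7]
# b: [nil, 30, 50, 70]
```

With floating-point data the incremental update can accumulate small rounding differences compared with summing each window afresh, so it trades a little precision for speed.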