Autumn School - Day 2

Dr. J. Kavanagh

2022-09-12

Exploring Time Series Data

We are going to use the rainfall.RData data file for this session. It is a saved R workspace rather than a package, so we bring it in with load().

library(tidyverse)  # assumed setup: provides %>%, dplyr and ggplot2, used throughout
load('rainfall.RData')

This data is a collection of rainfall levels captured across Ireland from 1850-2014. The data is in two parts: stations and rain. This is a truncated version of a longer lesson plan from Prof. Chris Brunsdon at the National Centre for Geocomputation, available here and here.

head(stations)
## # A tibble: 6 × 9
##   Station     Elevation Easting Northing   Lat  Long County  Abbreviation Source
##   <chr>           <int>   <dbl>    <dbl> <dbl> <dbl> <chr>   <chr>        <chr> 
## 1 Athboy             87  270400   261700  53.6 -6.93 Meath   AB           Met E…
## 2 Foulksmills        71  284100   118400  52.3 -6.77 Wexford F            Met E…
## 3 Mullingar         112  241780   247765  53.5 -7.37 Westme… M            Met E…
## 4 Portlaw             8  246600   115200  52.3 -7.31 Waterf… P            Met E…
## 5 Rathdrum          131  319700   186000  52.9 -6.22 Wicklow RD           Met E…
## 6 Strokestown        49  194500   279100  53.8 -8.1  Roscom… S            Met E…
head(rain)
## # A tibble: 6 × 4
##    Year Month Rainfall Station
##   <dbl> <ord>    <dbl> <chr>  
## 1  1850 Jan      169   Ardara 
## 2  1851 Jan      236.  Ardara 
## 3  1852 Jan      250.  Ardara 
## 4  1853 Jan      209.  Ardara 
## 5  1854 Jan      188.  Ardara 
## 6  1855 Jan       32.3 Ardara

Creating an Average Rainfall dataframe

rain %>% group_by(Station) %>% 
summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 × 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     140. 
## 2 Armagh      68.3
## 3 Athboy      74.7
## 4 Belfast     87.1
## 5 Birr        70.8
## 6 Cappoquinn 121.
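
One thing to watch with mean() inside summarise() is missing values: a single NA in a group makes that group's mean NA unless na.rm = TRUE is set. Whether rainfall.RData contains any NA readings is not checked here, so the sketch below uses a small made-up table (toy_rain, with hypothetical stations "A" and "B") to show the difference.

```r
library(dplyr)

# Hypothetical toy data: station A has one missing reading
toy_rain <- tibble(
  Station  = c("A", "A", "A", "B", "B", "B"),
  Rainfall = c(10, NA, 30, 5, 5, 20)
)

toy_summary <- toy_rain %>%
  group_by(Station) %>%
  summarise(
    mrain_raw = mean(Rainfall),                # NA if any reading is missing
    mrain     = mean(Rainfall, na.rm = TRUE)   # mean of the readings present
  )

toy_summary
```

If the real data turns out to have gaps, adding na.rm = TRUE to the summarise() calls in this session is probably the simplest fix.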

Grouping Rainfall by Months

This is simpler than the earlier examples using lubridate, but the principle is the same.

rain %>% group_by(Month) %>% 
summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 × 2
##   Month mrain
##   <ord> <dbl>
## 1 Jan   113. 
## 2 Feb    83.2
## 3 Mar    79.5
## 4 Apr    68.7
## 5 May    71.3
## 6 Jun    72.7

Plotting the results - Bar Charts

rain_months %>% 
ggplot(aes(x=Month,y=mrain)) + 
geom_bar(stat='identity') + 
labs(y='Mean Rainfall')

library(ggthemes)  # provides theme_economist()
rain_months %>% 
ggplot(aes(x=Month,y=mrain)) + 
geom_bar(stat='identity') + 
labs(y='Mean Rainfall') +
theme_economist()
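
As a side note, ggplot2 also offers geom_col(), which draws bars whose heights are the values already in the data, the same job geom_bar(stat='identity') does above. A minimal sketch on made-up values (toy_months is a hypothetical stand-in for rain_months):

```r
library(ggplot2)

# Hypothetical toy data: three months with arbitrary values
toy_months <- data.frame(
  Month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  mrain = c(30, 20, 25)
)

# geom_col() uses the y values as bar heights directly
p <- ggplot(toy_months, aes(x = Month, y = mrain)) +
  geom_col() +
  labs(y = "Mean Rainfall (toy values)")

p
```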

Average by month/station Combination

rain %>% group_by(Month,Station) %>% 
  summarise(mean_rain=mean(Rainfall)) -> rain_season_station
## `summarise()` has grouped output by 'Month'. You can override using the
## `.groups` argument.
head(rain_season_station)
## # A tibble: 6 × 3
## # Groups:   Month [1]
##   Month Station    mean_rain
##   <ord> <chr>          <dbl>
## 1 Jan   Ardara         175. 
## 2 Jan   Armagh          74.6
## 3 Jan   Athboy          84.9
## 4 Jan   Belfast        101. 
## 5 Jan   Birr            79.9
## 6 Jan   Cappoquinn     154.
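
The message above appears because summarise() removes only the last grouping level, so the result is still grouped by Month. If an ungrouped result is wanted, the .groups argument named in the message can be set to "drop". A small sketch on made-up data (toy_rain and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: two months, two stations
toy_rain <- tibble(
  Month    = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Station  = c("A", "A", "B", "A", "B", "B"),
  Rainfall = c(10, 30, 40, 6, 2, 4)
)

# .groups = "drop" returns an ungrouped tibble and silences the message
toy_means <- toy_rain %>%
  group_by(Month, Station) %>%
  summarise(mean_rain = mean(Rainfall), .groups = "drop")

toy_means
is_grouped_df(toy_means)
```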

Re-ordering the Data

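Data typically exists in one of two formats: long and wide. At present the rain_season_station dataframe is in the long format (one row per month/station combination) and we need a wide format, with one row per station and one column per month. We can use the reshape2 package for this; its acast function returns the wide result as a matrix.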

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
## Using mean_rain as value column: use value.var to override.
##                  Jan       Feb       Mar      Apr      May       Jun       Jul
## Ardara     174.82606 126.82303 123.02000 98.79333 96.90727 105.24061 123.70485
## Armagh      74.57242  55.97182  56.48879 53.67030 59.23182  62.72939  72.50636
## Athboy      84.94759  62.62133  62.44944 58.97874 62.16260  68.11460  76.28662
## Belfast    101.20718  74.50206  73.10221 65.90492 69.23426  74.48525  87.70003
## Birr        79.92074  57.88501  58.42056 54.07187 60.25831  61.96445  75.10084
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437  94.86357 104.09372
##                  Aug       Sep       Oct       Nov       Dec
## Ardara     145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh      81.92182  69.02576  80.94242  73.73121  79.05939
## Athboy      88.84710  76.35077  88.14627  80.95767  87.05997
## Belfast    102.44499  87.11622 106.35394 100.31760 102.95073
## Birr        86.75669  72.21722  83.80810  77.87266  81.74334
## Cappoquinn 125.42245 116.04466 146.70658 141.10012 154.87402
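
The "Using mean_rain as value column" message is acast guessing which column holds the values. Naming it with value.var, as the message suggests, makes the reshape explicit. A minimal sketch on a made-up long table (toy_long is hypothetical):

```r
library(reshape2)

# Hypothetical long-format data: one row per station/month pair
toy_long <- data.frame(
  Station   = c("A", "A", "B", "B"),
  Month     = factor(c("Jan", "Feb", "Jan", "Feb"), levels = c("Jan", "Feb")),
  mean_rain = c(100, 80, 60, 50)
)

# Rows come from the left of the formula, columns from the right
toy_wide <- acast(toy_long, Station ~ Month, value.var = "mean_rain")

toy_wide
is.matrix(toy_wide)
```

The columns follow the factor levels of Month, which is why the real data comes out in calendar order rather than alphabetical order.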

Creating a Heatmap

A heatmap is a useful visualisation for spotting patterns in a matrix of values. The colour scheme is fairly straightforward: the darkest reds mark the highest values and the lightest yellows the lowest.

rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.
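
It helps to know that base R's heatmap() standardises each row by default (scale = "row"), so the colours compare months within a station rather than stations with each other. The sketch below uses a made-up matrix with one wet and one dry station to show what that scaling does, and how scale = "none" colours the raw values instead.

```r
# Hypothetical toy matrix: the wet station has ten times the rainfall
m <- matrix(
  c(200, 100, 150,
     20,  10,  15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Wet", "Dry"), c("Jan", "Feb", "Mar"))
)

# Equivalent of the default row scaling: centre each row and divide by its sd
row_scaled <- t(scale(t(m)))
row_scaled

# Colour by the raw values instead, with no dendrograms
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```

After row scaling both stations have identical values, so the default heatmap would colour them the same even though their rainfall differs tenfold.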

Creating a Time Series data format in R

The rainfall data is clearly a time series dataset. However, R also has a dedicated object class called ts, which is designed for regularly spaced series and works with packages such as dygraphs. Financial data is also usually handled as time series objects. We will be exploring stock market data at a later stage using live data from Yahoo! Finance.

# This creates a rainfall ts object from the rain data, summarised as the sum of rainfall across all stations for each month of each year.

rain %>% group_by(Year,Month) %>% 
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>% 
ts(start=c(1850,1),frequency=12) -> rain_ts
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
rain_ts %>% window(c(1870,1),c(1871,12))
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8  789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5  657.6 2040.1 3705.0 1869.9 2083.4 2774.3
##         Nov    Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0
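
Note that ts() never looks at the Year or Month columns. It simply labels the values in the order they arrive, starting from start and stepping by 1/frequency, so the approach above relies on the summarised rows being sorted by year and month with no months missing. A small base R sketch with made-up values (toy_ts is hypothetical):

```r
# Hypothetical toy series: 24 monthly values, labelled from January 1850
toy_ts <- ts(1:24, start = c(1850, 1), frequency = 12)

# window() extracts a span by (year, period) pairs
toy_win <- window(toy_ts, start = c(1850, 11), end = c(1851, 2))

toy_win
start(toy_win)
frequency(toy_win)
```

Here the 11th to 14th values are returned, labelled November 1850 to February 1851.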

Creating an interactive dygraph

library(dygraphs)
rain_ts %>% dygraph

Dygraph with a selector option

rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector

Comparative Dygraphs

First, select the station in Birr, Co. Offaly and reuse the code that created the overall ts object.

rain %>%  group_by(Year,Month) %>% filter(Station=="Birr") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) ->  birr_ts
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.

Second, follow this up with a station to compare against, in this case Shannon Airport.

rain %>%  group_by(Year,Month) %>% filter(Station=="Shannon Airport") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) ->  shannon_ts
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.

Finally, merge these three ts objects together into one new object.

birr_shannon_natl_ts <- cbind(birr_ts,shannon_ts, rain_ts)

# Check your results! 
window(birr_shannon_natl_ts,c(1850,1),c(1850,5))
##          birr_ts shannon_ts rain_ts
## Jan 1850    93.3       85.2  2836.3
## Feb 1850    41.5       81.3  2158.9
## Mar 1850    10.3       27.1   964.1
## Apr 1850    81.7       87.2  3457.2
## May 1850    72.2       71.0  1492.1
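
When cbind() is given ts objects it aligns them on their time index and pads with NA wherever one series has no value. All three series here start in January 1850, so nothing is padded, but the behaviour matters for stations with shorter records. A base R sketch with two made-up series (a and b are hypothetical):

```r
# Hypothetical toy series: b starts two months after a
a <- ts(1:6, start = c(2000, 1), frequency = 12)
b <- ts(c(10, 20, 30, 40), start = c(2000, 3), frequency = 12)

# cbind() on ts objects takes the union of the two time spans
ab <- cbind(a, b)

ab
```

Column b is NA for January and February 2000 and holds its first value in March.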

Three Dygraphs

A cross-comparison of rainfall at two stations and the national total (rain_ts is the sum across all stations, not an average) over the entire dataset.

birr_shannon_natl_ts %>% dygraph(width=800,height=360) %>% dyRangeSelector

R Exercise

  1. Create a dygraph of four distinct stations across Ireland showing the mean of rainfall per year. Each station should be in a different province of Ireland.

  2. Create a dygraph of the mean and median rainfall per year and compare it to the sum of rainfall.

Financial Time Series Data

We are going to be using the quantmod package, which is useful for exploring financial data from the stock exchange. Specifically we’re going to be looking up the information from Yahoo! Finance.

library(quantmod)

Set the start and end dates. In this case I’ve chosen the span from January 1989 to September 2022.

start <- as.Date("1989-01-01")
end <- as.Date("2022-09-13")

Next, get the relevant information for the publicly listed company of your choice. In my case, Apple Inc. (ticker symbol AAPL).

getSymbols("AAPL", src = "yahoo", from = start, to = end)
## [1] "AAPL"

This creates a new xts object called AAPL.

AAPL %>% head()
##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 1989-01-03  0.359375  0.361607 0.357143   0.360491   100016000      0.284788
## 1989-01-04  0.363839  0.376116 0.361607   0.375000   239948800      0.296250
## 1989-01-05  0.375000  0.386161 0.368304   0.377232   307328000      0.298013
## 1989-01-06  0.377232  0.388393 0.377232   0.380580   198665600      0.300658
## 1989-01-09  0.383929  0.385045 0.377232   0.383929    79307200      0.303304
## 1989-01-10  0.379464  0.382813 0.370536   0.380580   103320000      0.300658

Some Examples of Charts

Use the OHLC function from quantmod to extract the open, high, low and close columns of the AAPL xts object into a new one called price.

OHLC(AAPL) -> price 
dygraph(price, main = "Apple Stock price") %>% dyOptions(colors = RColorBrewer::brewer.pal(4, "Dark2"))
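
To see what OHLC() keeps without needing a live download, here is a sketch on a small made-up xts object. The name toy, the TOY.* columns and all values are hypothetical; quantmod finds the columns by matching "Open", "High", "Low" and "Close" in the column names.

```r
library(quantmod)  # also attaches xts

# Hypothetical toy price data for three consecutive days
dates <- as.Date("2022-09-05") + 0:2
toy <- xts(
  cbind(
    TOY.Open   = c(10, 11, 12),
    TOY.High   = c(12, 13, 14),
    TOY.Low    = c( 9, 10, 11),
    TOY.Close  = c(11, 12, 13),
    TOY.Volume = c(1000, 1500, 1200)
  ),
  order.by = dates
)

# OHLC() keeps the four price columns and drops the rest
toy_price <- OHLC(toy)
colnames(toy_price)

# Cl() pulls out just the closing prices
Cl(toy)
```

This is why the Apple dygraph above shows four series: the Volume and Adjusted columns are dropped before plotting.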