This document contains the homework problems for the Data 624 course. Link: https://otexts.com/fpp3/graphics-exercises.html
Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.
Use autoplot() to plot each of these in separate plots
What is the time interval of each series?
if (!require('fpp3')) (install.packages('fpp3'))
if (!require('magrittr')) (install.packages('magrittr'))
if (!require('dplyr')) (install.packages('dplyr'))
if (!require('readxl')) (install.packages('readxl'))
if (!require('RCurl')) (install.packages('RCurl'))library(fpp3)
library(ggplot2)
#help("gafa_stock")
#help("PBS")
#help("vic_elec")
#help("pelt")The gafa_stock data represents the Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD.
summary(gafa_stock)## Symbol Date Open High
## Length:5032 Min. :2014-01-02 Min. : 54.02 Min. : 54.94
## Class :character 1st Qu.:2015-04-02 1st Qu.: 118.33 1st Qu.: 119.25
## Mode :character Median :2016-06-30 Median : 257.59 Median : 261.94
## Mean :2016-07-01 Mean : 465.75 Mean : 469.95
## 3rd Qu.:2017-09-29 3rd Qu.: 746.53 3rd Qu.: 750.96
## Max. :2018-12-31 Max. :2038.11 Max. :2050.50
## Low Close Adj_Close Volume
## Min. : 51.85 Min. : 53.53 Min. : 53.53 Min. : 7900
## 1st Qu.: 117.35 1st Qu.: 118.54 1st Qu.: 115.48 1st Qu.: 2519975
## Median : 256.89 Median : 259.51 Median : 258.61 Median : 10804400
## Mean : 460.92 Mean : 465.56 Mean : 464.24 Mean : 19493800
## 3rd Qu.: 738.01 3rd Qu.: 744.79 3rd Qu.: 744.79 3rd Qu.: 29399250
## Max. :2013.00 Max. :2039.51 Max. :2039.51 Max. :266380800
head(gafa_stock)## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
autoplot(gafa_stock) +
ggtitle("Historical stock prices from 2014-2018 (Google, Amazon, Facebook and Apple)") +
xlab("Symbol") +
ylab("Adj_Close")What is the time interval of these stocks?
interval(gafa_stock)## <interval[1]>
## [1] !
Interval cannot be calculated as it’s an irregular dataset. We can see that the date difference can range from 1 day to >1 day for each of the stock symbols
The PBS data represents the Monthly Medicare Australia prescription data.
head(PBS)## # A tsibble: 6 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_…¹ Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payments A Alimentar… A01 STOMAT… 18228 67877
## 2 1991 Aug Concessional Co-payments A Alimentar… A01 STOMAT… 15327 57011
## 3 1991 Sep Concessional Co-payments A Alimentar… A01 STOMAT… 14775 55020
## 4 1991 Oct Concessional Co-payments A Alimentar… A01 STOMAT… 15380 57222
## 5 1991 Nov Concessional Co-payments A Alimentar… A01 STOMAT… 14371 52120
## 6 1991 Dec Concessional Co-payments A Alimentar… A01 STOMAT… 15028 54299
## # … with abbreviated variable name ¹​ATC2_desc
PBS %>% filter(ATC2 == "A01")%>% autoplot(Cost) +
ggtitle("Cost of the scripts in $AUD for Anatomical Therapeutic Chemical index (level 2)") What is the time interval of these costs?
interval(PBS)## <interval[1]>
## [1] 1M
The time interval of the dataset is at monthly level.
The vic_elec data represents the Half-hourly electricity demand for Victoria, Australia.This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).
summary(vic_elec)## Time Demand Temperature
## Min. :2012-01-01 00:00:00 Min. :2858 Min. : 1.50
## 1st Qu.:2012-09-30 22:52:30 1st Qu.:3969 1st Qu.:12.30
## Median :2013-07-01 22:45:00 Median :4635 Median :15.40
## Mean :2013-07-01 22:45:00 Mean :4665 Mean :16.27
## 3rd Qu.:2014-04-01 23:37:30 3rd Qu.:5244 3rd Qu.:19.40
## Max. :2014-12-31 23:30:00 Max. :9345 Max. :43.20
## Date Holiday
## Min. :2012-01-01 Mode :logical
## 1st Qu.:2012-09-30 FALSE:51120
## Median :2013-07-01 TRUE :1488
## Mean :2013-07-01
## 3rd Qu.:2014-04-01
## Max. :2014-12-31
head(vic_elec)## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
autoplot(vic_elec) +
ggtitle("Half-hourly electricity demand for Victoria, Australia") +
xlab("Year") +
ylab("Temperature")What is the time interval of this demand?
interval(vic_elec)## <interval[1]>
## [1] 30m
The time interval is 30 minutes.
The pelt data represents the Pelt trading records.Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company.
summary(pelt)## Year Hare Lynx
## Min. :1845 Min. : 1800 Min. : 3190
## 1st Qu.:1868 1st Qu.: 12730 1st Qu.:11840
## Median :1890 Median : 40970 Median :29590
## Mean :1890 Mean : 45406 Mean :28337
## 3rd Qu.:1912 3rd Qu.: 71405 3rd Qu.:42520
## Max. :1935 Max. :152650 Max. :79350
head(pelt)## # A tsibble: 6 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
autoplot(pelt) +
ggtitle("The number of Snowshoe Hare pelts vs Canadian Lynx pelts traded)") +
xlab("Year") +
ylab("Trading records")What is the time interval of this demand?
interval(pelt)## <interval[1]>
## [1] 1Y
The time interval of the dataset is 1 year.
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
filter_gafa <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
arrange(desc(Close))
filter_gafa## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 2 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
## 3 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 4 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
Peak closing price for AMZN is with price 2039.51 Peak closing price for AAPL is with price 232.07 Peak closing price for FB is with price 217.50 Peak closing price for GOOG is with price 1268.33
Interesting to see lot changed from then to now!
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("https://raw.githubusercontent.com/BharaniNittala/DATA624/main/HW1/tute1.csv")
head(tute1)## Quarter Sales AdBudget GDP
## 1 1981-03-01 1020.2 659.2 251.8
## 2 1981-06-01 889.2 589.0 290.9
## 3 1981-09-01 795.0 512.5 290.8
## 4 1981-12-01 1003.9 614.1 292.4
## 5 1982-03-01 1057.7 647.2 279.1
## 6 1982-06-01 944.4 602.0 254.0
mytimeseries <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter)
head(mytimeseries)## # A tsibble: 6 x 4 [1Q]
## Quarter Sales AdBudget GDP
## <qtr> <dbl> <dbl> <dbl>
## 1 1981 Q1 1020. 659. 252.
## 2 1981 Q2 889. 589 291.
## 3 1981 Q3 795 512. 291.
## 4 1981 Q4 1004. 614. 292.
## 5 1982 Q1 1058. 647. 279.
## 6 1982 Q2 944. 602 254
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() Each of the time series have range of values. When facet_grid is not included, everything is plotted on one scale. While this makes comparison easy, we may miss out on specific time series learnings.
The USgas package contains data on the demand for natural gas in the US.
if (!require('USgas')) (install.packages('USgas'))
library(USgas)us_total_new <- us_total
us_total_new <- us_total_new %>%
as_tsibble(index = year, key = state)
head(us_total_new)## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
newengland_gas <- us_total_new %>%
filter(state == 'Maine' |
state == 'Vermont' |
state == 'New Hampshire' |
state == 'Massachusetts' |
state == 'Connecticut' |
state == 'Rhode Island') %>%
mutate(y = y/1000)
head(newengland_gas)## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <dbl>
## 1 1997 Connecticut 145.
## 2 1998 Connecticut 131.
## 3 1999 Connecticut 152.
## 4 2000 Connecticut 160.
## 5 2001 Connecticut 146.
## 6 2002 Connecticut 178.
autoplot(newengland_gas, y) +
labs(title = "The annual natural gas consumption by state",
subtitle = "New England Zone",
y = "Consumption in thousands")if (!require('readxl')) (install.packages('readxl'))
library("readxl") # library for read_excel() method
tourism_xlsx <- readxl::read_excel("C:/Users/nittalab/Documents/Personal/CUNY/DATA 624/HW1/tourism.xlsx")
head(tourism_xlsx)## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
tourism_xlsx_tb <- tourism_xlsx %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(Region, State, Purpose)) -> tourism_xlsx
head(tourism_xlsx_tb)## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
# summarize average trip length by region and purpose then finds combination with highest
tourism_xlsx%>%
group_by(Region, Purpose)%>%
summarize(avg_trip_length = mean(Trips), .groups = "keep")%>%
ungroup()%>%
filter(avg_trip_length == max(avg_trip_length))## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter avg_trip_length
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
# summarize average trip length by region and purpose then finds combination with highest
t_by_state <- tourism_xlsx_tb %>%
group_by(State) %>%
summarise(Trips = sum(Trips)) %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = State)
head(t_by_state)## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))autoplot(), ggseason(), ggsubseries(), gglag(), ACF() %>% autoplot()
autoplot(myseries) +
ggtitle("A3349767W")+
xlab("Time") +
ylab("Sales");myseries%>%
gg_season(Turnover)+labs(title = "Turnover for Clothing, footwear and personal accessory retailing",
subtitle = "Series: A3349767W",
y = "Turnover")There appears to be mostly increasing trend with the exception of a slight dip after 2010. The auto plot shows evidence of seasonal changes in the data, evident by the constant fluctuations within each period. We can use the seasonal plot to drill down further.The seasonal plot actually shows a spike in consumer spending between from Nov and December. The slope of each spike increases every year. This could be representative of an increasing consumer culture mindset. December confirmed to be a major retail month from the charts.