Homework 1

First, I’m loading some libraries that I think will be useful.

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tsibble)

## Warning: package 'tsibble' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## 
## Attaching package: 'tsibble'

## The following object is masked from 'package:lubridate':
## 
##     interval

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr   1.1.4     ✔ stringr 1.5.1
## ✔ forcats 1.0.0     ✔ tibble  3.2.1
## ✔ purrr   1.0.2     ✔ tidyr   1.3.1
## ✔ readr   2.1.5

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tsibbledata 0.4.1     ✔ fable       0.3.4
## ✔ feasts      0.3.2     ✔ fabletools  0.4.2

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

🗹 Use ? (or help()) to find out about the data in each series.

🗹 What is the time interval of each series?

🗹 Use autoplot() to produce a time plot of each series.

🗹 For the last plot, modify the axis labels and title.

?aus_production

## starting httpd help server ... done

The time interval of aus_production is quarterly.

?pelt

The time interval of pelt is yearly.

?gafa_stock

The time interval of gafa_stock is daily, but only trading days are recorded, so the series is irregular.

?vic_elec

The time interval of vic_elec is half-hourly.
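The intervals read from the help pages can also be checked in code. A minimal sketch, assuming the fpp3 packages loaded at the top are installed; `tsibble::interval()` is called with its namespace because lubridate also exports an `interval()` function:

```r
library(fpp3)  # attaches tsibble and tsibbledata, which provide the four data sets

# interval() reports the spacing that tsibble has recorded for the index
tsibble::interval(aus_production)
tsibble::interval(pelt)
tsibble::interval(vic_elec)

# gafa_stock has rows for trading days only, so it is stored as an irregular tsibble
is_regular(gafa_stock)
```

The irregular flag is also what the `[!]` in a printed gafa_stock header refers to.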

autoplot(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

autoplot(pelt,Lynx)

autoplot(gafa_stock, Close)

autoplot(vic_elec, Demand)

I’ll add a title to the plot, label the axes, and change the line colour.

autoplot(vic_elec, Demand) +
  ggtitle("Half-hourly electricity demand of Victoria") +
  xlab("Date") +
  ylab("Demand") +
  geom_line(colour = "purple")

┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅

2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock

library(dplyr)
gafa_stock %>%
  group_by(Symbol) %>%               # group by stock symbol
  filter(Close == max(Close)) %>%    # keep the row(s) with the maximum closing price per stock
  select(Symbol, Date, Close)

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅

3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
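The chunk that reads the file is not shown in the rendered output, but the readr message that follows and the later use of `tute1` suggest something along these lines. A minimal sketch, assuming tute1.csv has been saved in the working directory (the path and the name `tute1_path` are assumptions):

```r
library(readr)

# Assumed location: tute1.csv saved in the current working directory
tute1_path <- "tute1.csv"

if (file.exists(tute1_path)) {
  tute1 <- read_csv(tute1_path)   # read_csv() prints a column specification message
} else {
  message("tute1.csv not found; download it from the book website first")
}
```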

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅

4

The USgas package contains data on the demand for natural gas in the US.

🗹 Install the USgas package.

🗹 Create a tsibble from us_total with year as the index and state as the key.

🗹 Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

#install.packages('USgas')
library(USgas)

## Warning: package 'USgas' was built under R version 4.3.3

data("us_total")
us_gas_tsibble <- as_tsibble(us_total, key = state, index = year)

First, I’ll store the New England states in a character vector for convenience and clarity, then filter us_gas_tsibble to those states.

new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")

new_england_tsibble <- us_gas_tsibble %>% filter(state %in% new_england_states)

The plot:

autoplot(new_england_tsibble, y) +
  labs(title = "Annual Natural Gas Consumption by State in New England Area",
       x = "Year",
       y = "Natural Gas Consumption")

┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅

5

🗹 Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

🗹 Create a tsibble which is identical to the tourism tsibble from the tsibble package.

🗹 Find what combination of Region and Purpose had the maximum number of overnight trips on average.

🗹 Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism1<- readxl::read_excel("tourism.xlsx")
View(tourism1)
str(tourism1)

## tibble [24,320 × 5] (S3: tbl_df/tbl/data.frame)
##  $ Quarter: chr [1:24320] "1998-01-01" "1998-04-01" "1998-07-01" "1998-10-01" ...
##  $ Region : chr [1:24320] "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ State  : chr [1:24320] "South Australia" "South Australia" "South Australia" "South Australia" ...
##  $ Purpose: chr [1:24320] "Business" "Business" "Business" "Business" ...
##  $ Trips  : num [1:24320] 135 110 166 127 137 ...

Before creating the tsibble, I need to convert Quarter from character to a year-quarter format with yearquarter(). According to str(), Trips is already numeric, so the as.numeric() call is only a safeguard.

tourism1<-tourism1 %>%
  mutate(
    Quarter = yearquarter(Quarter),  
    Trips = as.numeric(Trips)        
  )

The tourism tsibble from the tsibble package uses the Quarter as the index and Region, State and Purpose as keys.

tourism1_tsibble <- tourism1 %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)

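I’ll group the data by Region and Purpose and then calculate the average number of trips. One caveat: summarize() on a tsibble always keeps the index, so the result below is still split by Quarter, and the top row is the highest single quarter rather than the highest average over the whole period.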

max_average_trips <- tourism1_tsibble %>%
  group_by(Region, Purpose) %>%
  summarize(max_average_trips = mean(Trips, na.rm = TRUE)) %>%
  arrange(desc(max_average_trips))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.

head(max_average_trips,1)

## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
## # Groups:    Region [1]
##   Region    Purpose  Quarter max_average_trips
##   <chr>     <chr>      <qtr>             <dbl>
## 1 Melbourne Visiting 2017 Q4              985.
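Because summarize() on a tsibble keeps the Quarter index, an average over all quarters needs the index dropped first. A minimal sketch of that variant, using `tsibble::tourism` as a stand-in for tourism1_tsibble (assumed here to have the same structure) and the illustrative names `avg_trips` and `mean_trips`:

```r
library(dplyr)
library(tsibble)

# tsibble::tourism stands in for tourism1_tsibble (assumed to match it)
avg_trips <- tourism %>%
  as_tibble() %>%                  # drop the tsibble structure so Quarter is no longer kept
  group_by(Region, Purpose) %>%
  summarise(mean_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_trips))        # largest average first

head(avg_trips, 1)
```

The first row of this table is the Region and Purpose combination with the highest average over the full sample, with no Quarter column in the result.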

I’ll group the data by State and Quarter, so I can calculate the total trips across all regions and purposes. Then I’ll convert the grouped data into a tsibble, using State as the key and Quarter as the time index.

statetrips_tsibble <- tourism1 %>%
  group_by(State, Quarter) %>%
  summarize(total_trips = sum(Trips, na.rm = TRUE)) %>%
  as_tsibble(key = State, index = Quarter)

## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.
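The same totals can be obtained without leaving the tsibble world, since summarise() on a tsibble keeps the index automatically. A minimal sketch, again using `tsibble::tourism` as a stand-in for tourism1_tsibble (an assumption) and the illustrative name `state_trips`:

```r
library(dplyr)
library(tsibble)

# tsibble::tourism stands in for tourism1_tsibble (assumed to match it)
state_trips <- tourism %>%
  group_by(State) %>%                                  # Quarter stays as the index
  summarise(total_trips = sum(Trips, na.rm = TRUE))    # sums over Region and Purpose

key_vars(state_trips)   # the key of the resulting tsibble
```

With this route there is no need to call as_tsibble() again, because the result is already a tsibble keyed by State.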

┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅

8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

🗹 Can you spot any seasonality, cyclicity and trend?

🗹 What do you learn about the series?

🗹 What can you say about the seasonal patterns?

🗹 Can you identify any unusual years?

Employed

fixed <- us_employment %>%
  filter(Title == "Total Private")
fixed %>% autoplot(Employed)

fixed %>% gg_season(Employed)

fixed %>% gg_subseries(Employed)

fixed %>% gg_lag(Employed)

ACF(fixed, Employed) %>%
  autoplot()

The time plot shows a clear upward trend over the years, with a decline around 2008-2010. For seasonality, we look at the second and third plots, which show a clear seasonal pattern: employment increases over the first six months of the year (January to June). The lag plots show a positive correlation at every lag.

Bricks

autoplot(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks)

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values (gg_lag).

ACF(aus_production, Bricks) %>%
  autoplot()

Hare

autoplot(pelt, Hare)

gg_subseries(pelt, Hare)

gg_lag(pelt, Hare)

ACF(pelt, Hare) %>%
  autoplot()

In this dataset, there is no apparent trend, and the data exhibit a strong cyclical pattern. Because the series is annual, there is no seasonality to examine. The autocorrelation plot is cyclical as well: the autocorrelation moves in one direction for about five years and then reverses for about five, which corresponds to a full cycle of roughly ten years. In other words, the data fluctuate in a repeating pattern.
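To read the cycle length from numbers instead of from the plot, the autocorrelations can be listed directly. A minimal sketch; `lag_max = 20` is an arbitrary choice made here so that more than one cycle is covered, and `hare_acf` is an illustrative name:

```r
library(fpp3)

# Autocorrelations of the Hare series for lags 1 to 20
hare_acf <- pelt %>% ACF(Hare, lag_max = 20)

# Sort so that the lags with the largest positive autocorrelation come first
hare_acf %>%
  as_tibble() %>%
  arrange(desc(acf))
```

Apart from the first few lags, the lag with the highest autocorrelation gives an estimate of the cycle length in years.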

H02 Cost

fixed2 <- PBS %>%
  filter(ATC2 == "H02")
fixed2 %>% autoplot(Cost)

fixed2 %>% gg_season(Cost)

fixed2 %>% gg_subseries(Cost)

ACF(fixed2, Cost) %>%
  autoplot()

It is difficult to tell whether there is a trend, but most of the series show heavy seasonality. Concessional co-payments tend to have higher values from February to early April. The opposite holds for the safety net series: both of them are at their lowest values during the same months.

Barrels

autoplot(us_gasoline, Barrels)

gg_season(us_gasoline, Barrels)

gg_subseries(us_gasoline, Barrels)

gg_lag(us_gasoline, Barrels)

ACF(us_gasoline, Barrels) %>%
  autoplot()

Lastly, the first plot of barrels of gasoline reveals a clear upward trend, indicating that the quantity of gasoline barrels has generally increased over time. There is also some noticeable seasonality, meaning the data follow regular patterns at certain times of the year. However, the data are very noisy, which makes it challenging to draw more precise conclusions. The lag plots also indicate positive autocorrelation.

Homework 1

Nikoleta Emanouilidi

2024-09-07

In exercise 3, when facet_grid() is left out, all three series are drawn on the same axes and share one y scale. This makes the plot harder to interpret, especially when the series differ in level or trend.
