1. Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

# Loading packages
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.0     ✔ fabletools  0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
# Loading datasets
data("aus_production")
data("pelt")
data("gafa_stock")
data("vic_elec")

Use ? (or help()) to find out about the data in each series.

?aus_production
?pelt
?gafa_stock
?vic_elec

What is the time interval of each series?

aus_production: Quarterly
pelt: Yearly
gafa_stock: Daily (trading days only, so the interval is irregular)
vic_elec: Half-hourly
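
As a quick check, these intervals can also be read from each tsibble's metadata; a small sketch using tsibble's interval():

# Reading the interval directly from each tsibble
interval(aus_production)  # 1Q (quarterly)
interval(pelt)            # 1Y (annual)
interval(gafa_stock)      # irregular ([!]), since only trading days are present
interval(vic_elec)        # 30m (half-hourly)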

Use autoplot() to produce a time plot of each series.

autoplot(aus_production, Bricks) +
  labs(title = "Quarterly Production of Bricks in Australia")

# This series shows an increasing trend overall, but more recent years show a gradual decline.

autoplot(pelt, Lynx) +
  labs(title = "Lynx Pelts Traded, 1845-1935")

# There is no clear overall trend, but the series shows strong cycles of roughly ten years. Since the data are annual, this is cyclicity rather than seasonality.
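
To back up the cycle claim, we can plot the ACF; a small sketch (peaks around lag 10 would reflect the roughly ten-year cycle):

pelt |> ACF(Lynx) |> autoplot()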

autoplot(gafa_stock, Close) +
  labs(title = "GAFA Stock Closing Prices")

# Google had the highest closing price early in the series, but Amazon overtook it partway through (around 2016-2017) and had the highest price thereafter.

autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand in Victoria, Australia")

# Because the interval is half-hourly, the plot is very crowded, but demand clearly spikes near the start of each year, during the Australian summer.
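
One way to reduce the crowding is to zoom in on a shorter window; a sketch, with January 2014 as an arbitrary choice:

# Filter the half-hourly data to a single month so the daily pattern is visible
vic_elec |>
  filter(yearmonth(Time) == yearmonth("2014 Jan")) |>
  autoplot(Demand) +
  labs(title = "Electricity Demand, January 2014", y = "Demand (MWh)")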

For the last plot, modify the axis labels and title.

autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand: Victoria, Australia",
       x = "Time", y = "Demand (MWh)")

2. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
## 4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600
# Group by Symbol, then keep only the rows where Close equals that stock's maximum closing price.

We can see that Apple peaked on 2018-10-03, Amazon on 2018-09-04, Facebook on 2018-07-25, and Google on 2018-07-26.
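
An equivalent result can be obtained with dplyr's slice_max(); a sketch (converting to a tibble first to keep the slicing simple):

# Alternative: take the single highest-Close row per stock
gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>
  ungroup()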

3. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("C:/Users/natal/Documents/Masters/Cuny SPS MDS/Fall 2024/Data 624/week 2/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

When you don’t use facet_grid(), the three series are drawn on a single set of axes. Because their value ranges differ so much, patterns in the series with smaller fluctuations (e.g., the GDP series) become much harder to see.

4. The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.

library(USgas)

data(us_total)

Create a tsibble from us_total with year as the index and state as the key.

us_total <- us_total |>
  as_tsibble(key = state, index = year)

view(us_total)

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_england_gas <- us_total |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island"))

autoplot(new_england_gas, y) +
  labs(title = "New England Annual Gas Consumption",
       y = "Gas Consumption (Million Cubic Feet)")
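
Because the larger states (Massachusetts and Connecticut) dominate the scale, the smaller states are compressed in a single panel; an optional sketch faceting by state with free y-scales:

new_england_gas |>
  autoplot(y) +
  facet_wrap(~state, scales = "free_y") +
  labs(title = "New England Annual Gas Consumption by State",
       y = "Gas Consumption (Million Cubic Feet)")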

5. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

library(readxl)

tourism <- readxl::read_excel("C:/Users/natal/Documents/Masters/Cuny SPS MDS/Fall 2024/Data 624/week 2/tourism.xlsx")

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_tibble <- tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)


# yearquarter() converts the raw dates (1/1, 4/1, 7/1, 10/1) into quarter labels (e.g., 1998 Q1), which are easier to read and match the index of the tourism tsibble from the tsibble package.
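
Since the goal is a tsibble identical to the built-in object, a quick sanity check is possible; a sketch, assuming tsibble's tourism object is available:

# TRUE (or a description of the differences) when compared against tsibble::tourism
all.equal(tourism_tibble, tsibble::tourism)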

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_tibble |>
  group_by(Region, Purpose) |>
  mutate(Trips = mean(Trips)) |>
  ungroup() |>
  filter(Trips == max(Trips)) |>
  distinct(Region, Purpose, Trips)
## # A tibble: 1 × 3
##   Region Purpose  Trips
##   <chr>  <chr>    <dbl>
## 1 Sydney Visiting  747.
# distinct() collapses the repeated quarterly rows to a single row per Region/Purpose combination.

Sydney, with the Visiting purpose, has the maximum average number of overnight trips (about 747).
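
The same answer can be reached by computing the group means directly; an equivalent sketch:

# Compute mean trips per Region/Purpose, then keep the largest
tourism_tibble |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  slice_max(Trips, n = 1)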

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

library(dplyr)

tourism_state <- tourism_tibble |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  ungroup()
tourism_state
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter Trips
##    <chr>   <qtr> <dbl>
##  1 ACT   1998 Q1  551.
##  2 ACT   1998 Q2  416.
##  3 ACT   1998 Q3  436.
##  4 ACT   1998 Q4  450.
##  5 ACT   1999 Q1  379.
##  6 ACT   1999 Q2  558.
##  7 ACT   1999 Q3  449.
##  8 ACT   1999 Q4  595.
##  9 ACT   2000 Q1  600.
## 10 ACT   2000 Q2  557.
## # ℹ 630 more rows
# We group by State (the Quarter index is retained automatically), so summarise() totals Trips per state per quarter. "Combining the Purposes and Regions" just means aggregating over those keys, which happens implicitly once they are dropped from the grouping.
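
As a quick visual check of the aggregation, the state totals can be plotted directly; a minimal sketch:

tourism_state |> autoplot(Trips)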

8. Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

total_private <- us_employment |>
  filter(Title == "Total Private")
total_private |> autoplot(Employed)

total_private |> gg_season(Employed)

total_private |> gg_subseries(Employed)

total_private |> gg_lag(Employed)

total_private |> ACF(Employed)
## # A tsibble: 29 x 3 [1M]
## # Key:       Series_ID [1]
##    Series_ID          lag   acf
##    <chr>         <cf_lag> <dbl>
##  1 CEU0500000001       1M 0.997
##  2 CEU0500000001       2M 0.993
##  3 CEU0500000001       3M 0.990
##  4 CEU0500000001       4M 0.986
##  5 CEU0500000001       5M 0.983
##  6 CEU0500000001       6M 0.980
##  7 CEU0500000001       7M 0.977
##  8 CEU0500000001       8M 0.974
##  9 CEU0500000001       9M 0.971
## 10 CEU0500000001      10M 0.968
## # ℹ 19 more rows
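
The ACF table is easier to interpret as a correlogram; a sketch piping it into autoplot() (the very slow decay from values near 1 reflects the strong trend in employment):

total_private |> ACF(Employed) |> autoplot()
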
aus_production |> autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production |> ACF(Bricks)
## # A tsibble: 22 x 2 [1Q]
##         lag   acf
##    <cf_lag> <dbl>
##  1       1Q 0.900
##  2       2Q 0.815
##  3       3Q 0.813
##  4       4Q 0.828
##  5       5Q 0.720
##  6       6Q 0.642
##  7       7Q 0.655
##  8       8Q 0.692
##  9       9Q 0.609
## 10      10Q 0.556
## # ℹ 12 more rows
pelt |> autoplot(Hare)
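# Note: gg_season() is skipped for pelt because the data are annual, so there is no within-year season to plot.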

pelt |> gg_subseries(Hare)

pelt |> gg_lag(Hare)

pelt |> ACF(Hare)
## # A tsibble: 19 x 2 [1Y]
##         lag     acf
##    <cf_lag>   <dbl>
##  1       1Y  0.658 
##  2       2Y  0.214 
##  3       3Y -0.155 
##  4       4Y -0.401 
##  5       5Y -0.493 
##  6       6Y -0.401 
##  7       7Y -0.168 
##  8       8Y  0.113 
##  9       9Y  0.307 
## 10      10Y  0.340 
## 11      11Y  0.296 
## 12      12Y  0.206 
## 13      13Y  0.0372
## 14      14Y -0.153 
## 15      15Y -0.285 
## 16      16Y -0.295 
## 17      17Y -0.202 
## 18      18Y -0.0676
## 19      19Y  0.0956
h_02 <- PBS |> filter(ATC2 == "H02") 
h_02 |> autoplot(Cost)

h_02 |> gg_season(Cost)

h_02 |> gg_subseries(Cost)
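# Note: gg_lag() is skipped here because it requires a single series, and the "H02" subset of PBS contains four (one per Concession/Type combination).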

h_02 |> ACF(Cost)
## # A tsibble: 92 x 6 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [4]
##    Concession   Type        ATC1  ATC2       lag   acf
##    <chr>        <chr>       <chr> <chr> <cf_lag> <dbl>
##  1 Concessional Co-payments H     H02         1M 0.834
##  2 Concessional Co-payments H     H02         2M 0.679
##  3 Concessional Co-payments H     H02         3M 0.514
##  4 Concessional Co-payments H     H02         4M 0.352
##  5 Concessional Co-payments H     H02         5M 0.264
##  6 Concessional Co-payments H     H02         6M 0.219
##  7 Concessional Co-payments H     H02         7M 0.253
##  8 Concessional Co-payments H     H02         8M 0.337
##  9 Concessional Co-payments H     H02         9M 0.464
## 10 Concessional Co-payments H     H02        10M 0.574
## # ℹ 82 more rows
us_gasoline |> autoplot(Barrels)

us_gasoline |> gg_season(Barrels)

us_gasoline |> gg_subseries(Barrels)

us_gasoline |> gg_lag(Barrels)

us_gasoline |> ACF(Barrels)
## # A tsibble: 31 x 2 [1W]
##         lag   acf
##    <cf_lag> <dbl>
##  1       1W 0.893
##  2       2W 0.882
##  3       3W 0.873
##  4       4W 0.866
##  5       5W 0.847
##  6       6W 0.844
##  7       7W 0.832
##  8       8W 0.831
##  9       9W 0.822
## 10      10W 0.808
## # ℹ 21 more rows

Can you spot any seasonality, cyclicity and trend?

Overall, Total Private employment shows a strong upward trend with mild seasonality: employment rises during the summer months each year. The dip around 2008-2010 (noted below) is cyclic rather than seasonal.

What do you learn about the series?

The series is highly persistent: the ACF decays very slowly from values near 1, and employment grows steadily over time, so the pattern remains quite predictable.

What can you say about the seasonal patterns?

Employment tends to increase each year from roughly April through August, followed by a small decrease in the months after.

Can you identify any unusual years?

From 2008 to 2010 there is a much sharper dip in employment than in other years, corresponding to the Global Financial Crisis.
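
A zoom on that window makes the dip easier to see; a sketch using tsibble's filter_index():

# Subset the tsibble to the years around the downturn
total_private |>
  filter_index("2006 Jan" ~ "2012 Dec") |>
  autoplot(Employed) +
  labs(title = "Total Private Employment, 2006-2012")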