Data 624 Homework 1

Exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

?aus_production


Quarterly production of selected commodities in Australia.

Description: Quarterly estimates of selected indicators of manufacturing production in Australia.

Format: Time series of class tsibble.

Details: aus_production is a quarterly tsibble with six values:

Beer: Beer production in megalitres.
Tobacco: Tobacco and cigarette production in tonnes.
Bricks: Clay brick production in millions of bricks.
Cement: Portland cement production in thousands of tonnes.
Electricity: Electricity production in gigawatt hours.
Gas: Gas production in petajoules.

aus_production |> select(Bricks) |> head()

aus_production |> select(Bricks) |> autoplot()

## Plot variable not specified, automatically selected `.vars = Bricks`

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The time series has a quarterly index.

?pelt

Description: Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company.

Format: Time series of class tsibble.

Details: pelt is an annual tsibble with two values:

Hare: The number of Snowshoe Hare pelts traded.
Lynx: The number of Canadian Lynx pelts traded.

pelt |> 
  select(Lynx) |> 
  head()

The index is annual.

pelt |> select(Lynx) |> autoplot()

## Plot variable not specified, automatically selected `.vars = Lynx`

?gafa_stock

GAFA stock prices

Description: Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD.

Format: Time series of class tsibble.

Details: gafa_stock is a tsibble containing data on irregular trading days:

Open: The opening price for the stock.
High: The stock’s highest trading price.
Low: The stock’s lowest trading price.
Close: The closing price for the stock.
Adj_Close: The adjusted closing price for the stock.
Volume: The amount of stock traded.

Each stock is uniquely identified by one key:

Symbol: The ticker symbol for the stock.

Source: Yahoo Finance historical data.

gafa_stock |>
  filter(Symbol == "AAPL") |>
  select(Close) |>
  head()

The index of the time series is daily, but only trading days are recorded, so the tsibble is irregular.

gafa_stock|> filter(Symbol=="AAPL")|>
  select(Close) |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Close`

?vic_elec

vic_elec is a half-hourly tsibble with three values:

Demand: Total electricity demand in MWh.
Temperature: Temperature of Melbourne (BOM site 086071).
Holiday: Indicator for if that day is a public holiday.

Format: Time series of class tsibble.

Details: This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).

Source: Australian Energy Market Operator.

vic_elec |> select(Demand)|> head()

The index of this time series is half-hourly (one observation every 30 minutes).

vic_elec |>
  select(Demand) |>
  autoplot() +
  labs(title = "Half-Hourly Electricity Demand in Victoria",
       x = "Time", y = "Demand (MWh)") +
  theme_classic()

## Plot variable not specified, automatically selected `.vars = Demand`
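The intervals noted above can also be checked programmatically rather than read off the help pages. The following is a minimal sketch, assuming the tsibble and tsibbledata packages are installed; `tsibble::interval()` is called with its namespace because lubridate exports a function of the same name.

```r
library(tsibble)
library(tsibbledata)

# interval() reports the spacing tsibble inferred for the index of each series
tsibble::interval(aus_production)
tsibble::interval(pelt)
tsibble::interval(vic_elec)

# is_regular() is FALSE for data recorded at irregular times,
# such as trading days only
is_regular(gafa_stock)
is_regular(vic_elec)
```

The same information appears in square brackets in the header line when a tsibble is printed.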

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

peak_closing_price_dates <- gafa_stock |>
  group_by(Symbol)|>
  filter(Close ==max(Close, na.rm = TRUE))|>
  select(Symbol, Date, Close)
peak_closing_price_dates

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
1. You can read the data into R with the following script:

 tute1 <- readr::read_csv("tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

        View(tute1)

2.  Convert the data to time series

        mytimeseries <- tute1 |>   
        mutate(Quarter = yearquarter(Quarter)) |>
        as_tsibble(index = Quarter)

3.  Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

Without facet_grid(), all three series are drawn in a single panel with a shared y-axis.

The USgas package contains data on the demand for natural gas in the US.
1. Install the USgas package.
```
library(USgas)
```
```
## Warning: package 'USgas' was built under R version 4.4.1
```
2. Create a tsibble from us_total with year as the index and state as the key.

us_total_tsibble <- us_total |>
  as_tsibble(index = year, key = state)

3.  Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

us_total_tsibble |> 
  filter(state %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont")) |>
  ggplot(aes(x = year, y = y, colour = state)) + 
  geom_line() + 
  facet_wrap(state ~ ., scales = "free_y",ncol=2) + 
  labs(title = "Time Series of US Gas Consumption for Selected States",
       x = "Year", 
       y = "Gas Consumption") +
  theme_minimal()

1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
```
readxl::read_excel("tourism.xlsx")
```
2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_data <- readxl::read_excel("tourism.xlsx")

# Trips is the measured variable, so it must not be part of the key
tourism_tsibble <- tourism_data |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)
tourism_tsibble

3.  Find what combination of `Region` and `Purpose` had the maximum number of overnight trips on average.

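# Drop to a plain tibble first: on a tsibble, summarise() keeps the Quarter
# index, which would give one value per quarter instead of an overall average
max_mean_trip_RP <- tourism_tsibble |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(mean_trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  filter(mean_trips == max(mean_trips))

max_mean_trip_RP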
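One thing to keep in mind when averaging over time: `summarise()` behaves differently on a tsibble than on a plain tibble, because a tsibble always keeps its time index. The sketch below illustrates this on a small made-up data set (the `toy` object and its values are hypothetical, not the tourism data), assuming dplyr and tsibble are installed.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two regions observed over two quarters
toy <- tibble(
  Quarter = rep(yearquarter(c("2020 Q1", "2020 Q2")), times = 2),
  Region  = rep(c("A", "B"), each = 2),
  Trips   = c(1, 3, 10, 20)
) |>
  as_tsibble(index = Quarter, key = Region)

# On a tsibble the index is kept: one row per Region per Quarter
by_quarter <- toy |>
  group_by(Region) |>
  summarise(mean_trips = mean(Trips))

# On a plain tibble the index is an ordinary column: one row per Region
overall <- toy |>
  as_tibble() |>
  group_by(Region) |>
  summarise(mean_trips = mean(Trips))

nrow(by_quarter)
overall
```

Only the second form averages across all quarters, which is what "on average" asks for here.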

4.  Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

state_trips_tsibble <- tourism_tsibble |>
  group_by(State) |>
  summarise(state_trips = sum(Trips, na.rm = TRUE), .groups = 'drop') |>
  as_tsibble(index = Quarter, key = State)
head(state_trips_tsibble)

8.  The Employment data for Total Private from us_employment shows a generally upward trend. However, when zoomed in, seasonal patterns become apparent. Notable dips are observed in the mid-1970s, early 1980s, and after 2007, which correspond to real-world economic downturns in the US economy.

total_private<- us_employment|>
  filter(Title =="Total Private")
autoplot(total_private,Employed)

gg_season(total_private,Employed)

gg_subseries(total_private,Employed)

gg_lag(total_private,Employed,geom = "point", lags = 1:12)

ACF(total_private,Employed)|>
  autoplot()

Bricks from aus_production: Based on the ACF chart, there is a positive spike every 4 quarters, which makes sense for a series with annual seasonality.

autoplot(aus_production,Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production,Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production,Bricks)

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production,Bricks,geom = "point", lags =  1:4)

## Warning: Removed 20 rows containing missing values (gg_lag).

ACF(aus_production,Bricks)|>
  autoplot()
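To make the ACF reading above more concrete, here is a minimal base R sketch of how the sample autocorrelation at a given lag is computed. It uses a synthetic quarterly series (an assumption for illustration, not the Bricks data), and `sample_acf()` is a hypothetical helper name. A series with a repeating four-quarter pattern should show large positive values at lags 4 and 8.

```r
# Synthetic quarterly series (not the Bricks data): a fixed seasonal
# pattern repeated over 20 years, plus a little noise
set.seed(1)
seasonal <- rep(c(10, 14, 12, 8), times = 20)
y <- seasonal + rnorm(length(seasonal), sd = 0.5)

# Sample autocorrelation at lag k:
# r_k = sum((y_t - ybar) * (y_{t-k} - ybar)) / sum((y_t - ybar)^2)
sample_acf <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / sum((y - ybar)^2)
}

r <- sapply(1:8, function(k) sample_acf(y, k))
round(r, 2)

# Cross-check against stats::acf(), dropping its lag-0 entry
r_base <- as.numeric(acf(y, lag.max = 8, plot = FALSE)$acf[-1])
all.equal(r, r_base)
```

The hand-rolled values should agree with `acf()`, with lags 4 and 8 strongly positive and lag 2 strongly negative for this particular pattern.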

Hare from pelt: The series shows a cyclical pattern, with the number of pelts traded spiking every 8 years or so, and there are other, smaller recurring patterns about every two years. I suspect population growth and shipping schedules factor into this pattern.

autoplot(pelt, Hare)

# gg_season() is skipped because annual data has no within-year seasonal period
#gg_season(pelt, Hare)
gg_subseries(pelt, Hare)

gg_lag(pelt, Hare,geom = "point")

ACF(pelt, Hare)|>
  autoplot()

Barrels from us_gasoline

There do seem to be some patterns in the data. It might be better to convert this to a monthly or yearly time series.

autoplot(us_gasoline, Barrels)

gg_season(us_gasoline, Barrels)

gg_subseries(us_gasoline, Barrels)

gg_lag(us_gasoline, Barrels, lags = 1:4, geom = "point")

ACF(us_gasoline, Barrels)|>
  autoplot()
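One way to try the monthly conversion suggested above is `index_by()` from tsibble, which regroups the index before summarising. This is a sketch under two assumptions: that the fpp3 package is installed (it provides us_gasoline and attaches tsibble, dplyr and feasts), and that the mean of the weekly values within each month is a reasonable monthly summary. Each week is assigned to the month containing its start date.

```r
library(fpp3)

# Regroup the weekly index by calendar month, then average within each month
gas_monthly <- us_gasoline |>
  index_by(Month = yearmonth(as.Date(Week))) |>
  summarise(Barrels = mean(Barrels))

# Repeat the exploratory plots at the coarser resolution
gas_monthly |> autoplot(Barrels)
gas_monthly |> gg_season(Barrels)
gas_monthly |> ACF(Barrels) |> autoplot()
```

Replacing `yearmonth()` with `lubridate::year()` in the `index_by()` call would give a yearly series instead.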

“H02” Cost from PBS: The ACF shows strong seasonality for every series except General co-payment.

h02 <- PBS |> filter(ATC2 == "H02")
# PBS is already a monthly tsibble keyed by Concession, Type, ATC1 and ATC2
h02ts <- h02 |>
  mutate(Month = yearmonth(Month)) |>
  as_tsibble(index = Month)

autoplot(h02ts,Cost)

gg_season(h02ts,Cost)

gg_subseries(h02ts,Cost)

ACF(h02ts,Cost)|>
  autoplot()

</div>

Darwhin Gomez

2024-09-11
