DATA624

(2.1) Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

a. Use autoplot() to plot some of the series in these data sets.

# fpp3 attaches tsibble, tsibbledata, feasts, ggplot2, dplyr and tidyr
library(fpp3)
autoplot(gafa_stock, Open) +
  ggtitle("Historical stock prices", subtitle = "2014-2018")

autoplot(vic_elec, Demand) +
  ggtitle("Half-hourly Electricity Demand", subtitle = "Victoria, Australia")

autoplot(pelt, Hare) +
  ggtitle("Timeline of the pelt Hare series", subtitle = "1845-1935")

b. What is the time interval of each series?

gafa_stock: The time interval is 1 day, but only trading days are recorded, so the tsibble is reported as irregular ([!]).
PBS: The time interval is 1 month.
vic_elec: The time interval is 30 minutes (half-hourly).
pelt: The time interval is 1 year.
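
The intervals can also be checked programmatically rather than read off the help pages. A minimal sketch, assuming the fpp3 package is installed; tsibble::interval() is written with its namespace because lubridate exports a different function under the same name:

```r
library(fpp3)

# is_regular() is FALSE when the index is not evenly spaced
# (gafa_stock only has observations on trading days)
is_regular(gafa_stock)

# For regularly spaced series, interval() reports the spacing of the index
tsibble::interval(PBS)
tsibble::interval(vic_elec)
tsibble::interval(pelt)
```

The same information appears in square brackets in the first line of the printed tsibble header.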

(2.2) Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

view(gafa_stock)
sum(is.na(gafa_stock))

## [1] 0

gafa_stock_close <- gafa_stock %>%
  dplyr::select(Symbol,Date,Close) %>%
  group_by(Symbol)%>%
  filter(Close == max(Close)) %>%
  arrange(desc(Close))
gafa_stock_close

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AMZN   2018-09-04 2040.
## 2 GOOG   2018-07-26 1268.
## 3 AAPL   2018-10-03  232.
## 4 FB     2018-07-25  218.
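
As a cross-check, the same peaks can be found with slice_max(). This is a sketch of an alternative, not the filter() approach the exercise asks for; the name peak_close is hypothetical, and the tsibble is converted to a plain tibble first so that ordinary dplyr grouping applies:

```r
library(fpp3)

peak_close <- gafa_stock %>%
  as_tibble() %>%                  # drop the tsibble structure
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%      # row with the highest Close per stock
  ungroup() %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))

peak_close
```

Both approaches should return one row per stock as long as no stock reaches its maximum close on more than one day.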

(2.3) Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)
sum(is.na(tute1))

## [1] 0

b. Convert the data to time series

# The data are quarterly, so the index is built with yearquarter()
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
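
The choice of index class determines the interval that the tsibble reports. The sketch below uses a small made-up table (the dates and the Sales values are illustrative assumptions, not the contents of tute1.csv) to compare yearquarter() and yearmonth() on the same quarter-start dates:

```r
library(fpp3)

# Illustrative quarter-start dates; toy, q_ts and m_ts are hypothetical names
d <- as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01"))
toy <- tibble(Quarter = d, Sales = c(1, 2, 3, 4))

q_ts <- toy %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

m_ts <- toy %>%
  mutate(Quarter = yearmonth(Quarter)) %>%
  as_tsibble(index = Quarter)

tsibble::interval(q_ts)  # quarterly index
tsibble::interval(m_ts)  # monthly index observed every third month
```

With a quarterly index, seasonal plots and models treat the period as four observations per year, which matches how the data were collected.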

c. Construct time series plots of each of the three series

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  ggtitle("Facet grid")

Check what happens when you don’t include facet_grid().

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  ggtitle("No facet grid")

(2.4) The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

b. Create a tsibble from us_total with year as the index and state as the key.

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

library(USgas)
view(us_total)
glimpse(us_total)

## Rows: 1,266
## Columns: 3
## $ year  <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007…
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"…
## $ y     <int> 324158, 329134, 337270, 353614, 332693, 379343, 350345, 382367, …

us_total1 <- us_total %>%
  as_tsibble(index = year, key = state) %>%
  filter(state %in% c("Connecticut", "Maine", "Massachusetts",
                      "New Hampshire", "Rhode Island", "Vermont"))
head(us_total1)

## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state            y
##   <int> <chr>        <int>
## 1  1997 Connecticut 144708
## 2  1998 Connecticut 131497
## 3  1999 Connecticut 152237
## 4  2000 Connecticut 159712
## 5  2001 Connecticut 146278
## 6  2002 Connecticut 177587

ggplot(data= us_total1, aes(x = year, y = y, col = state)) + 
  geom_line() +
  facet_grid(state ~ ., scales = "free_y") +
  labs(title='Annual Natural Gas Consumption of New England Region')

The plots show that annual natural gas consumption rises over the period in Connecticut, Massachusetts, New Hampshire and Vermont. In contrast, consumption in Rhode Island declines.

(2.5) a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism <- readxl::read_excel("tourism.xlsx")
View(tourism)
sum(is.na(tourism))

## [1] 0

tourism$Quarter <- yearquarter(as.Date(tourism$Quarter))
glimpse(tourism)

## Rows: 24,320
## Columns: 5
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…

tourism1 <- tourism %>%
   as_tsibble( index = Quarter, key = c(Region, State, Purpose))
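
One way to check that a rebuilt tsibble matches the packaged one is to compare the pieces that define a tsibble: the index, the key and the dimensions. The sketch below is self-contained and therefore does not read tourism.xlsx; it assumes the spreadsheet holds the same rows as the packaged data and uses the packaged data with its tsibble structure removed as a stand-in. The packaged version is referred to as tsibble::tourism because the name tourism is reused for the spreadsheet import in this report; raw and rebuilt are hypothetical names.

```r
library(fpp3)

# Stand-in for the spreadsheet: packaged data as a plain tibble
raw <- as_tibble(tsibble::tourism)

rebuilt <- raw %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Compare the structural pieces that define a tsibble
identical(index_var(rebuilt), index_var(tsibble::tourism))
identical(key_vars(rebuilt), key_vars(tsibble::tourism))
identical(dim(rebuilt), dim(tsibble::tourism))
```

The same three comparisons can be run on tourism1 once the spreadsheet has been imported.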

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism %>%
  group_by(Region, Purpose) %>%
  # one row per Region/Purpose pair, holding its average trips
  summarise(Avg_Trips = mean(Trips), .groups = "drop") %>%
  filter(Avg_Trips == max(Avg_Trips)) %>%
  select(Region, Purpose)

## # A tibble: 1 × 2
##   Region Purpose 
##   <chr>  <chr>   
## 1 Sydney Visiting

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism %>%
  group_by(Quarter, State) %>%
  mutate(Quarter = yearquarter(Quarter),
         Total_Trips = sum(Trips)) %>%
  select(Quarter, State, Total_Trips) %>%
  distinct() %>%
  as_tsibble(index = Quarter,
             key = State)

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
## # Groups:    State @ Quarter [640]
##    Quarter State Total_Trips
##      <qtr> <chr>       <dbl>
##  1 1998 Q1 ACT          551.
##  2 1998 Q2 ACT          416.
##  3 1998 Q3 ACT          436.
##  4 1998 Q4 ACT          450.
##  5 1999 Q1 ACT          379.
##  6 1999 Q2 ACT          558.
##  7 1999 Q3 ACT          449.
##  8 1999 Q4 ACT          595.
##  9 2000 Q1 ACT          600.
## 10 2000 Q2 ACT          557.
## # … with 630 more rows
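
When the data are already a tsibble, the same aggregation can be written more compactly, because summarise() on a tsibble keeps the index and aggregates within each time period. A minimal sketch using the packaged tsibble::tourism data (assumed to match the spreadsheet); state_trips is a hypothetical name:

```r
library(fpp3)

state_trips <- tsibble::tourism %>%
  group_by(State) %>%                    # State becomes the new key
  summarise(Total_Trips = sum(Trips))    # summed per State and Quarter

state_trips
```

The result is ungrouped, and its columns appear in a different order from the output above (State before Quarter).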

(2.8) Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

Explore your chosen retail time series using the following functions:

set.seed(1975)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

gg_season(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

gg_subseries(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

gg_lag(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

myseries %>%
  ACF(Turnover)%>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

The graphs show an increasing trend in turnover and a seasonal pattern with a period of one year. Because this pattern repeats at a fixed interval, it is seasonality rather than cyclicity, and its shape is similar from one year to the next. Turnover is highest in November and December each year.
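
The visual impression of which months are highest can be backed up with a simple table of average turnover by calendar month. This is a rough sketch only: averaging over all years ignores the trend, and the names monthly_avg, month_name and avg_turnover are hypothetical. The series is re-created with the same seed so that the block runs on its own.

```r
library(fpp3)

set.seed(1975)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

monthly_avg <- myseries %>%
  as_tibble() %>%
  # convert the yearmonth index to Date before extracting the month label
  mutate(month_name = month(as.Date(Month), label = TRUE)) %>%
  group_by(month_name) %>%
  summarise(avg_turnover = mean(Turnover)) %>%
  arrange(desc(avg_turnover))

monthly_avg
```

The months at the top of the table are the ones that tend to have the highest turnover across the sample.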

DATA624_HW1

Gabriel Santos

2023-02-04


Check what happens when you don’t include facet_grid().

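When facet_grid() is used, each series is drawn in its own panel, and with scales = "free_y" each panel gets its own y-axis scale, which makes the series easier to view and analyze. Without it, all three series share one panel and one y-axis.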
