Data624

Exercise 1

Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent. Use autoplot() to plot some of the series in these data sets. What is the time interval of each series?

## [1] "Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD."

## [1] "PBS: Monthly Medicare Australia prescription data."

## [1] "Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company."

## [1] "Half-hourly electricity demand for Victoria, Australia"

## [1] "The gafa_stock Volume series runs from 2014-01-02 to 2018-12-31. Observations are daily, on trading days only."

## [1] "autoplot() did not produce a plot for the PBS Cost and Scripts series, but the series run monthly from July 1991 to June 2008."

## [1] "The vic_elec demand plot shows an almost identical pattern each year. The series runs half-hourly from 2012-01-01 AEDT to 2014-12-31 23:30:00 AEDT."

## [1] "The pelt Hare series runs annually from 1845 to 1935."
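The intervals can also be read directly from the tsibbles rather than inferred from the plots. The following is a minimal sketch, assuming the tsibble and tsibbledata packages are installed; the names `series` and `intervals` are introduced here for illustration only.

```r
library(tsibble)      # interval(), is_regular()
library(tsibbledata)  # gafa_stock, PBS, vic_elec, pelt

# Collect the four data sets in a named list (helper name is illustrative)
series <- list(gafa_stock = gafa_stock, PBS = PBS,
               vic_elec = vic_elec, pelt = pelt)

# format() turns each interval object into a short label
intervals <- vapply(series, function(x) format(interval(x)), character(1))
print(intervals)

# is_regular() reports whether the observations are evenly spaced
regular <- vapply(series, is_regular, logical(1))
print(regular)
```

An irregular tsibble, such as one that skips weekends and holidays, is flagged by is_regular() returning FALSE.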

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

view(gafa_stock)

sum(is.na(gafa_stock))

## [1] 0

gafa_stock_close <- gafa_stock %>%
  dplyr::select(Symbol,Date,Close) %>%
  group_by(Symbol)%>%
  filter(Close == max(Close)) %>%
  arrange(desc(Close))

gafa_stock_close

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AMZN   2018-09-04 2040.
## 2 GOOG   2018-07-26 1268.
## 3 AAPL   2018-10-03  232.
## 4 FB     2018-07-25  218.

  #%>%filter(complete.cases(.))
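As an alternative to filter(Close == max(Close)), the same per-stock peaks can be picked out with slice_max(). This is a sketch under the assumption that dplyr (version 1.0.0 or later), tsibble and tsibbledata are installed; `peak_close` is a name introduced here.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

peak_close <- gafa_stock %>%
  as_tibble() %>%                # a plain table is enough for this lookup
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%    # keeps ties, like filter(Close == max(Close))
  ungroup() %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))

peak_close
```

Converting to a tibble first avoids carrying the tsibble index and key through the summary; whether that is preferable depends on what is done with the result afterwards.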

Exercise 3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a-You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   Quarter = col_date(format = ""),
##   Sales = col_double(),
##   AdBudget = col_double(),
##   GDP = col_double()
## )

View(tute1)
sum(is.na(tute1))

## [1] 0

b-Convert the data to time series

mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%  # the data are quarterly, so use yearquarter() rather than yearmonth()
  as_tsibble(index = Quarter)

c-Construct time series plots of each of the three series

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

### The only difference is the grouping: without facet_grid(), all three series share one panel and one y-axis. I think facet_grid() would be redundant if the three variables had similar ranges.
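One way to decide whether separate panels are needed is to compare the range of each series before plotting. The sketch below uses a small made-up table in place of tute1 (the values and the names `toy` and `ranges` are illustrative assumptions, not the real data).

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for tute1: three series on different scales
toy <- tibble(
  Quarter  = 1:4,
  Sales    = c(1020, 889, 795, 1003),
  AdBudget = c(659, 589, 512, 614),
  GDP      = c(251.8, 290.9, 290.8, 292.4)
)

# One row per series with its lowest value, highest value and spread
ranges <- toy %>%
  pivot_longer(-Quarter) %>%
  group_by(name) %>%
  summarise(low = min(value), high = max(value), spread = high - low)

ranges
```

If the spreads differ by a large factor, a shared y-axis flattens the smaller series, which is the case where facet_grid(name ~ ., scales = "free_y") helps.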

Exercise 4

The USgas package contains data on the demand for natural gas in the US.

a-Install the USgas package.

b-Create a tsibble from us_total with year as the index and state as the key.

c-Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

view(us_total)
glimpse(us_total)

## Rows: 1,266
## Columns: 3
## $ year  <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007~
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"~
## $ y     <int> 324158, 329134, 337270, 353614, 332693, 379343, 350345, 382367, ~

us_total1 <- us_total %>%
  # Part b asks for a tsibble, so use as_tsibble() rather than as_tibble()
  as_tsibble(index = year, key = state) %>%
  filter(state %in% c('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont'))

head(us_total1)

## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state            y
##   <int> <chr>        <int>
## 1  1997 Connecticut 144708
## 2  1998 Connecticut 131497
## 3  1999 Connecticut 152237
## 4  2000 Connecticut 159712
## 5  2001 Connecticut 146278
## 6  2002 Connecticut 177587
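Because part b asks specifically for a tsibble, it can help to confirm the class, key and index before plotting. The following is a minimal sketch, assuming the dplyr, tsibble and USgas packages are installed; `new_england` and `ne_gas` are names introduced here.

```r
library(dplyr)
library(tsibble)
library(USgas)

new_england <- c("Connecticut", "Maine", "Massachusetts",
                 "New Hampshire", "Rhode Island", "Vermont")

# Keep the six New England states, then declare the index and key
ne_gas <- us_total %>%
  filter(state %in% new_england) %>%
  as_tsibble(index = year, key = state)

is_tsibble(ne_gas)  # TRUE when the object is a tsibble, FALSE for a plain tibble
key_vars(ne_gas)    # name(s) of the key column(s)
index_var(ne_gas)   # name of the index column
n_keys(ne_gas)      # number of distinct series
```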

ggplot(data = us_total1, aes(x = year, y = y, colour = state)) +
  geom_line() +
  labs(x = 'Year', y = 'Natural Gas Consumption', title = 'Annual Natural Gas Consumption by State: New England Region')

Exercise 5

a-Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

b-Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism <- readxl::read_excel("tourism.xlsx")
View(tourism)
#colSums(is.na(tourism))%>% kable()
sum(is.na(tourism))

## [1] 0

# tourism1 <- tourism %>%
#   mutate(Quarter = yearquarter(as.Date(tourism$Quarter)))%>%
#   as_tsibble( index = Quarter, key = c(Region, State, Purpose))

tourism$Quarter <- yearquarter(as.Date(tourism$Quarter))
glimpse(tourism)

## Rows: 24,320
## Columns: 5
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,~
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A~
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout~
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B~
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16~

tourism1 <- tourism %>%
   as_tsibble( index = Quarter, key = c(Region, State, Purpose))

c-Find what combination of Region and Purpose had the maximum number of overnight trips on average.

# The original attempt most likely failed because summarise() used `==` (a comparison)
# instead of `=` (an assignment), so MeanTrip was never created. The datatype of
# Quarter does not matter for this step.
tourism2c <- tourism %>%
  as_tibble() %>%                              # work on a plain tibble, not a tsibble
  dplyr::select(Region, Purpose, Trips) %>%
  group_by(Region, Purpose) %>%
  summarise(MeanTrip = mean(Trips), .groups = "drop") %>%
  filter(MeanTrip == max(MeanTrip))            # keep the single highest average
tourism2c

d-Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

# Summing Trips within each State (and Quarter, the tsibble index) collapses
# the Region and Purpose keys. The original attempt used `==` instead of `=`
# and referred to a column (RPurpose) that was never created.
tourism3d <- tourism1 %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))
tourism3d

Exercise 6

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

set.seed(53566)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

gg_season(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

gg_subseries(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

gg_lag(myseries)

## Plot variable not specified, automatically selected `y = Turnover`

myseries %>%
  ACF(Turnover)%>%
  autoplot()

print("autoplot() gives a global view of the whole timeline, and the plot shows that turnover never settles down. gg_season() zooms in on the timeline to show how turnover progresses by month. gg_subseries() and gg_lag() go a little deeper than the other functions for anyone interested in a particular pattern.")

## [1] "autoplot() gives a global view of the whole timeline, and the plot shows that turnover never settles down. gg_season() zooms in on the timeline to show how turnover progresses by month. gg_subseries() and gg_lag() go a little deeper than the other functions for anyone interested in a particular pattern."
