Email             :
Instagram     : https://www.instagram.com/sherlytaurin
RPubs            : https://rpubs.com/sherlytaurin/
Github           : https://github.com/sherlytaurin/
Telegram       : @Sherlytaurin



1 Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

  1. Use autoplot() to plot some of the series in these data sets.
  2. What is the time interval of each series?

1.1 gafa_stock


gafa_stock is a tsibble of historical stock prices from 2014 to 2018 for Google, Amazon, Facebook, and Apple. All prices are in USD.

a. Autoplot

b. Time Interval

## Warning: tz(): Don't know how to compute timezone for object of class tbl_ts/
## tbl_df/tbl/data.frame; returning "UTC". This warning will become an error in the
## next major version of lubridate.
## <Interval[0]>

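The exclamation mark in the tsibble header, [!], means that gafa_stock has no fixed interval: prices are recorded on trading days only, so the spacing between observations is irregular.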
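The `<Interval[0]>` result and the `tz()` warning shown above are consistent with lubridate's `interval()` being called on a tsibble instead of the tsibble version. A minimal sketch of the intended check, assuming the tsibble and tsibbledata packages are installed, namespaces the call explicitly:

```r
library(tsibble)
library(tsibbledata)  # provides gafa_stock, PBS, vic_elec and pelt

# Namespace the call so that lubridate::interval() cannot mask it.
tsibble::interval(gafa_stock)
tsibble::interval(PBS)
tsibble::interval(vic_elec)
tsibble::interval(pelt)

# is_regular() reports whether a tsibble has a fixed interval at all.
gafa_regular <- tsibble::is_regular(gafa_stock)
pelt_regular <- tsibble::is_regular(pelt)
```

The same information appears in square brackets in the first line of the printed tsibble, which is what the notes in this section read from.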

1.2 PBS

PBS is a monthly tsibble with two values:
Scripts: Total number of scripts
Cost: Cost of the scripts in $AUD
The data come from Medicare Australia.

## # A tsibble: 10 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [1]
##       Month Concession  Type   ATC1  ATC1_desc   ATC2  ATC2_desc   Scripts  Cost
##       <mth> <chr>       <chr>  <chr> <chr>       <chr> <chr>         <dbl> <dbl>
##  1 1991 Jul Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   18228 67877
##  2 1991 Aug Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   15327 57011
##  3 1991 Sep Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   14775 55020
##  4 1991 Oct Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   15380 57222
##  5 1991 Nov Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   14371 52120
##  6 1991 Dec Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   15028 54299
##  7 1992 Jan Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   11040 39753
##  8 1992 Feb Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   15165 54405
##  9 1992 Mar Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   16898 61108
## 10 1992 Apr Concession~ Co-pa~ A     Alimentary~ A01   STOMATOLOG~   18141 65356

a. Autoplot

b. Time Interval


The [1M] in the tsibble header means the time interval is one month.

1.3 vic_elec

vic_elec is a half-hourly tsibble with 3 values:
Demand: Total electricity demand in MW.
Temperature: Temperature of Melbourne (BOM site 086071).
Holiday: Indicator for whether that day is a public holiday.
The data come from the Australian Energy Market Operator.

This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MW, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MW, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).


a. Autoplot

b. Time Interval


30m reads as 30 minutes, which means the time interval is half-hourly.

1.4 pelt

pelt contains Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. The data cover trade records for all areas of the company.

pelt is an annual tsibble with two values:
Hare: The number of Snowshoe Hare pelts traded.
Lynx: The number of Canadian Lynx pelts traded.
The data come from the Hudson Bay Company.

a. Autoplot

b. Time Interval


1Y reads as one year, which means the time interval is yearly.

2 Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
## 4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600
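A table like the one above can be produced by grouping on the stock symbol and keeping the rows where the closing price equals the group maximum. This is a sketch assuming the fpp3 package is installed; the object name `peaks` is only illustrative.

```r
library(fpp3)

# For each stock, keep the row(s) where Close reaches its maximum.
peaks <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))

peaks
```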

3 Download the file tute1.csv, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series labelled Sales, AdBudget, and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget, and GDP is the gross domestic product. All series have been adjusted for inflation.

3.1 You can read the data into R with the following script:

Convert the data to a time series
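The reading and conversion steps can be sketched as follows. This assumes that tute1.csv has already been downloaded into the working directory and that its first column is named `Quarter`; the object names `tute1` and `mytimeseries` are illustrative.

```r
library(fpp3)

# Read the raw CSV (assumed to sit in the working directory).
tute1 <- readr::read_csv("tute1.csv")

# Turn the Quarter column into a quarterly index and build the tsibble.
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

mytimeseries
```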

3.2 Construct time series plots of each of the three series

Check what happens when you don’t include facet_grid().

Without facet_grid(), all three series are drawn in a single panel on a shared y-axis. There is no panel label for each series, so the series can only be told apart by colour.
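The comparison can be sketched by building the plot once and printing it with and without the facet layer. As before, this assumes tute1.csv is in the working directory; `long` and `p` are illustrative names.

```r
library(fpp3)

# Reshape to long form so that each series becomes a level of `name`.
long <- readr::read_csv("tute1.csv") %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter) %>%
  pivot_longer(-Quarter)

p <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()

# One panel per series, each with its own y-axis scale.
p + facet_grid(name ~ ., scales = "free_y")

# No facets: all series share one panel and one y-axis.
p
```

Because the three series are on different scales, the shared axis in the second plot tends to flatten the smaller ones, which is the main reason to facet here.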

4 The USgas package contains data on the demand for natural gas in the US.

4.1 Install the USgas package

4.2 Create a tsibble from us_total with year as the index and state as the key.

4.3 Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
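Steps 4.1 to 4.3 can be sketched together. This assumes that `us_total` has the columns `year`, `state` and `y` (the consumption figure), which should be checked against the installed version of USgas; `us_tsibble` and `new_england` are illustrative names.

```r
# install.packages("USgas")  # run once if the package is missing
library(fpp3)
library(USgas)

# Year as the index, state as the key.
us_tsibble <- us_total %>%
  as_tsibble(index = year, key = state)

new_england <- c("Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island")

# One line per New England state.
us_tsibble %>%
  filter(state %in% new_england) %>%
  autoplot(y) +
  labs(x = "Year", y = "Natural gas consumption")
```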

5 Follow the instructions below:

5.1 Download tourism.xlsx and read it into R using readxl::read_excel().


5.3 Find what combination of Region and Purpose had the maximum number of overnight trips on average.

## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
##   Region    Purpose  Quarter Trips
##   <chr>     <chr>      <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4  985.
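The row above is a single quarter (2017 Q4), so it identifies the largest individual observation rather than the largest average. A hedged sketch of the averaged version follows; it uses the `tourism` data set that ships with tsibble, on the assumption that it matches tourism.xlsx, and `top_avg` is an illustrative name.

```r
library(fpp3)

# Drop the time index first so that summarise() averages over all quarters.
top_avg <- tourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(mean_trips = mean(Trips), .groups = "drop") %>%
  slice_max(mean_trips, n = 1, with_ties = FALSE)

top_avg
```

Converting to a tibble matters here: summarising a tsibble directly keeps the Quarter index, so the mean would be taken within each quarter instead of across them.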

5.4 Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter Trips
##    <chr>   <qtr> <dbl>
##  1 ACT   1998 Q1  551.
##  2 ACT   1998 Q2  416.
##  3 ACT   1998 Q3  436.
##  4 ACT   1998 Q4  450.
##  5 ACT   1999 Q1  379.
##  6 ACT   1999 Q2  558.
##  7 ACT   1999 Q3  449.
##  8 ACT   1999 Q4  595.
##  9 ACT   2000 Q1  600.
## 10 ACT   2000 Q2  557.
## # ... with 630 more rows
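A tsibble of this shape can be obtained by grouping on State and summing, since summarise() on a tsibble keeps the time index. The sketch below again assumes the built-in `tourism` data stands in for tourism.xlsx; `state_trips` is an illustrative name.

```r
library(fpp3)

# Collapse Region and Purpose; Quarter is kept automatically as the index.
state_trips <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

state_trips
```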

6 Create time plots of the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

7 The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.

7.1 Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.

## $title
## [1] "Seasonal plot: Quarterly Arrivals from Japan"
## 
## attr(,"class")
## [1] "labels"
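The `$title` printout above looks like what R shows when a `labs()` call is evaluated on its own instead of being added to a plot with `+`, so the title may not have reached the plot. A sketch of the three plots, assuming fpp3 is installed and using the `Arrivals` and `Origin` columns of aus_arrivals:

```r
library(fpp3)

# Time plot of all four origins.
p_time <- aus_arrivals %>% autoplot(Arrivals)
p_time

# Seasonal plot for one origin, with the title attached using `+`.
p_season <- aus_arrivals %>%
  filter(Origin == "Japan") %>%
  gg_season(Arrivals) +
  labs(title = "Seasonal plot: Quarterly Arrivals from Japan")
p_season

# Subseries plot: one row of panels per origin, one panel per quarter.
p_sub <- aus_arrivals %>% gg_subseries(Arrivals)
p_sub
```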

7.2 Can you identify any unusual observations?

All four series show seasonality. With the exception of New Zealand, arrivals peak in the fourth quarter and then fall in the first quarter of the following year. This makes sense: Australia is in the southern hemisphere, so December is a summer month there while it is a winter month in Japan, the UK and the US. For New Zealand the peak quarter is Q3, which covers late winter and early spring in both countries. There is a generally upward trend for every country except Japan, where the trend seems to have reversed in the late 1990s. Historically this is plausible, since Japan was swept up in the Asian Financial Crisis, which started in 1997 and lasted until late 1998, and then went through a prolonged period of economic stagnation in the first decade of the 2000s. There does not appear to be any sign.

In the seasonal and subseries plots:

The top-left panel is the seasonal plot for arrivals from Japan. Interestingly, arrivals were more or less flat throughout the year until 1987, after which they started to zigzag: high in Q1, falling in Q2, rising again in Q3, then falling in Q4.
To the right is New Zealand. Unlike Japan, New Zealand shows a consistent seasonal pattern in all years, starting low in Q1 and gradually increasing through the year in most years. In the first decade of the 2000s, this within-year rise peaks in Q3 and then falls slightly in Q4.
Below is the seasonal plot for the UK. In stark contrast to the two earlier countries, arrivals from the UK follow a U-shaped pattern in all years: high in Q1, dipping sharply in Q2, staying flat through Q3, then rising sharply in Q4. This pattern holds for every year in the sample, which makes the UK a highly seasonal source of visitors to Australia.
In the US plot, there is a clear upward trend in annual volume, visible as a steady shift between the lines from earlier to later years. The quarterly pattern within each year is more varied than the one for the UK, although the U shape is followed in most years, with the exception of the early 1990s. In particular, the sudden spike in Q3 of 1991 stands out as a clear anomaly. This is somewhat surprising, because the oil price had surged after the US entered a war against Iraq in that period, and one would expect that to have raised the cost of travel.

8 Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

Explore your chosen retail time series using the following functions:

  • autoplot(),
  • gg_season(),
  • gg_subseries(),
  • gg_lag(),
  • ACF() %>% autoplot()
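The selection step and the five functions listed above can be sketched as follows. The seed value here is a placeholder and not the one used for the discussion in this section, so the series it selects will probably differ; `myseries` is an illustrative name.

```r
library(fpp3)

set.seed(12345678)  # placeholder seed; replace with your own value

# Pick one retail series at random by its Series ID.
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries %>% autoplot(Turnover)
myseries %>% gg_season(Turnover)
myseries %>% gg_subseries(Turnover)
myseries %>% gg_lag(Turnover, geom = "point")
myseries %>% ACF(Turnover) %>% autoplot()
```

Using `geom = "point"` in gg_lag() is a stylistic choice; with monthly data the default line geometry can be harder to read.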

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

From the time plot, we can see a clear seasonal pattern and an upward trend.

The seasonal plot confirms the seasonal pattern. There is typically a big jump every December and a drop in February. Sales rise towards the end of the year, peak in November and December, and then decline after January. This probably coincides with holiday shopping and Christmas sales.

The seasonal subseries plot offers another perspective on seasonality by showing the mean value for each month. We can see a large increase from November to December and a decrease from December to February, but also a small decreasing trend in turnover from January to June and a similar increase from July to November, before the big spike from November to December.

The lag plot was difficult to interpret. There are some negative and some positive relationships, but because of the large number of panels and the monthly frequency, it is hard to tell them apart.

The ACF shows strong, statistically significant autocorrelation, which indicates that the series is linearly related to its lagged values. The slow decay of the ACF as the lag increases is consistent with a trend, and the spikes at lags 12 and 24 reflect the yearly seasonality.

9 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.

9.2 Bricks from aus_production

autoplot()

## Warning: Removed 20 row(s) containing missing values (geom_path).

gg_season()

## Warning: Removed 20 row(s) containing missing values (geom_path).

gg_subseries()

## Warning: Removed 5 row(s) containing missing values (geom_path).

gg_lag()

## Warning: Removed 20 rows containing missing values (gg_lag).

ACF()

9.3 Hare from pelt

autoplot()

gg_subseries()

gg_lag()

ACF()

9.5 us_gasoline

autoplot()

## Plot variable not specified, automatically selected `.vars = Barrels`

gg_season()

## Plot variable not specified, automatically selected `y = Barrels`

gg_subseries()

## Plot variable not specified, automatically selected `y = Barrels`

gg_lag()

## Plot variable not specified, automatically selected `y = Barrels`

ACF()

## Response variable not specified, automatically selected `var = Barrels`

10 The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.

  • 1 with B
  • 2 with A
  • 3 with D
  • 4 with C

11 The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot() and ACF() for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?

Almost all of the spikes lie outside the bounds, which confirms that the series is not white noise.
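The extraction and the two plots can be sketched as follows, assuming fpp3 is installed; `pigs` is an illustrative name.

```r
library(fpp3)

# Pigs slaughtered in Victoria, January 1990 to December 1995.
pigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria",
         between(year(Month), 1990, 1995))

pigs %>% autoplot(Count)
pigs %>% ACF(Count) %>% autoplot()
```

Removing the `between()` condition and re-running the ACF line is one way to look at the longer-period part of the question.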

12 Use the following code to compute the daily changes in Google closing stock prices.
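The code referred to here is not shown in this write-up. A sketch along the lines of the textbook exercise, assuming fpp3 is installed and that the series is restricted to 2018 onwards, would be:

```r
library(fpp3)

dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  # Replace the irregular Date index with a regular trading-day counter.
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  # Daily change in the closing price; the first value is NA.
  mutate(diff = difference(Close))

dgoog %>% autoplot(diff)
dgoog %>% ACF(diff) %>% autoplot()
```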

12.1 Why was it necessary to re-index the tsibble?

gafa_stock only has observations on trading days, so after filtering for the GOOG symbol the tsibble is still indexed by Date with irregular gaps (weekends and holidays), and the interval between rows is not constant. Re-indexing by the row sequence number (the trading day) gives every row the same interval, which lag-based tools such as ACF() expect.

12.3 Do the changes in the stock prices look like white noise?

There are 30 lags in total. 5% of 30 is:

## [1] 1.5

So if more than 1.5 of the 30 spikes lie outside the bounds, the series is unlikely to be white noise.

The plot shows 3 spikes outside the bounds, so the series is probably not white noise.
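The counting rule can be sketched in base R. The approximate 95% bounds for white noise are ±1.96/√T, where T is the series length. The series `x` below is simulated noise used only as a stand-in, so the count it produces says nothing about the Google data.

```r
n_lags <- 30
expected_outside <- 0.05 * n_lags  # about 5% of spikes may fall outside by chance

set.seed(1)
x <- rnorm(250)                    # stand-in series, NOT the Google data
bound <- 1.96 / sqrt(length(x))    # approximate 95% white-noise bound

# Sample autocorrelations at lags 1..30 (drop lag 0, which is always 1).
r <- acf(x, lag.max = n_lags, plot = FALSE)$acf[-1]

n_outside <- sum(abs(r) > bound)
n_outside > expected_outside       # TRUE hints that the series is not white noise
```

This count is a rule of thumb only, and a borderline result such as 3 of 30 is weak evidence on its own.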