Email : putriangelina865@gmail.com
Instagram : https://www.instagram.com/putriangelinaw
RPubs : https://rpubs.com/putriangelinaw/
1. Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.
?gafa_stock
?PBS
?vic_elec
?pelt
for more information:
a. Use autoplot() to plot some of the series in these data sets.
# PBS is a keyed tsibble, so sum Cost over all keys to get a single monthly series
PBS %>%
  summarise(Cost = sum(Cost)) %>%
  autoplot(Cost) + ggtitle("Monthly Medicare Australia Prescription Data") + ylab("Cost")
b. What is the time interval of each series?
The time interval of gafa_stock is daily (trading days only, so it is stored as an irregular series), PBS is monthly, vic_elec is half-hourly, and pelt is annual.
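The intervals can also be read off programmatically rather than from the help pages. A minimal sketch, assuming the tsibble and tsibbledata packages are installed:

```r
library(tsibble)
library(tsibbledata)

# interval() reports the spacing of the index for each tsibble
interval(PBS)
interval(vic_elec)
interval(pelt)

# gafa_stock only has rows for trading days, so it is flagged as irregular
interval(gafa_stock)
is_regular(gafa_stock)
```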
2. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
VarsO <- c("Open")
CondO <- c(max(gafa_stock$Open))
gafa_stock %>% filter(.data[[VarsO[[1]]]]==CondO[[1]])
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-05 2038. 2040. 1990. 1995. 1995. 8220600
VarsH <- c("High")
CondH <- c(max(gafa_stock$High))
gafa_stock %>% filter(.data[[VarsH[[1]]]]==CondH[[1]])
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
VarsL <- c("Low")
CondL <- c(max(gafa_stock$Low))
gafa_stock %>% filter(.data[[VarsL[[1]]]]==CondL[[1]])
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
VarsC <- c("Close")
CondC <- c(max(gafa_stock$Close))
gafa_stock %>% filter(.data[[VarsC[[1]]]]==CondC[[1]])
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
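The filters above return the overall maxima across the whole data set, which all belong to AMZN. To get the peak closing day for each of the four stocks, as the question asks, one option is to group by Symbol before filtering. A minimal sketch, assuming the fpp3 package is installed (the name peak_close is only illustrative):

```r
library(fpp3)

# For each stock, keep the row(s) whose closing price equals that stock's own maximum
peak_close <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup() %>%
  select(Symbol, Date, Close)

peak_close
```

Grouping first means max(Close) is evaluated within each Symbol, so the result has at least one row per stock.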
3. Download the file tute1.csv here, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
a. You can read the data into R with the following script:
b. Convert the data to time series
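The code for parts a and b is missing above, but the plotting code that follows expects a tsibble called mytimeseries with a Quarter index. A minimal sketch of how it could be built, assuming tute1.csv has been saved in the working directory and that the readr package is installed (the path variable is only illustrative):

```r
library(fpp3)

path <- "tute1.csv"  # assumed location of the downloaded file

if (file.exists(path)) {
  # a. read the CSV into a tibble
  tute1 <- readr::read_csv(path)

  # b. turn the Quarter column into a quarterly index and convert to a tsibble
  mytimeseries <- tute1 %>%
    mutate(Quarter = yearquarter(Quarter)) %>%
    as_tsibble(index = Quarter)

  print(mytimeseries)
}
```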
c. Construct time series plots of each of the three series
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
d. Check what happens when you don’t include facet_grid().
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
4. The USgas package contains data on the demand for natural gas in the US.
b. Create a tsibble from us_total with year as the index and state as the key.
c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
library(USgas)
# Keep only the six New England states, then convert to a tsibble keyed by state
myus_total2 <- us_total %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire',
                      'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  as_tsibble(key = state, index = year)
autoplot(myus_total2, y)
5. Follow the instructions below:
a. Download tourism.xlsx here and read it into R using readxl::read_excel().
b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
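No code is given for this exercise, so the following is only a sketch of one possible approach. It assumes tourism.xlsx has been saved in the working directory with columns Quarter, Region, State, Purpose and Trips; if the file is not found it falls back to the built-in tourism tsibble, which part b says should be identical. The names my_tourism, top_combo and state_trips are illustrative.

```r
library(fpp3)

if (file.exists("tourism.xlsx")) {
  # a./b. read the spreadsheet and rebuild the tsibble structure
  my_tourism <- readxl::read_excel("tourism.xlsx") %>%
    mutate(Quarter = yearquarter(Quarter)) %>%
    as_tsibble(index = Quarter, key = c(Region, State, Purpose))
} else {
  # Fallback (assumption): use the packaged data set instead of the download
  my_tourism <- tsibble::tourism
}

# c. average trips for every Region/Purpose pair, keeping the largest
top_combo <- my_tourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)
top_combo

# d. total trips per State and Quarter, summed over Region and Purpose
state_trips <- my_tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))
state_trips
```

Dropping to a plain tibble in part c avoids summarising within each quarter; in part d the tsibble is kept so that the result is still indexed by Quarter and keyed by State.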
6. Create time plots of the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
a. Use ? (or help()) to find out about the data in each series.
?aus_production
?pelt
?gafa_stock
?vic_elec
for more information;
b. For the last plot, modify the axis labels and title.
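The plots themselves are not shown above. A minimal sketch of how the four time plots could be produced, assuming the fpp3 package is installed; the title and axis labels for the last plot are illustrative choices, not the author's:

```r
library(fpp3)

# One time plot per series
aus_production %>% autoplot(Bricks)
pelt %>% autoplot(Lynx)
gafa_stock %>% autoplot(Close)

# b. the last plot with custom labels (wording is an assumption)
p_demand <- vic_elec %>%
  autoplot(Demand) +
  labs(
    title = "Half-hourly electricity demand in Victoria",
    x = "Time",
    y = "Demand (MWh)"
  )
p_demand
```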
7. The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.
a. Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.
b. Can you identify any unusual observations?
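For part a, the three plots can be drawn directly from the keyed tsibble, since each function handles the four Origin series automatically. A minimal sketch, assuming the fpp3 package is installed and that gg_season() and gg_subseries() are available from feasts in the installed version:

```r
library(fpp3)

# Time plot: one line per country of origin
aus_arrivals %>% autoplot(Arrivals)

# Seasonal plot: one panel per country, one line per year
aus_arrivals %>% gg_season(Arrivals)

# Subseries plot: one panel per quarter within each country
aus_arrivals %>% gg_subseries(Arrivals)
```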
8. Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
a. Explore your chosen retail time series using the following functions:
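The list of functions is missing here, but the discussion in part b refers to a time plot, a seasonal plot, a subseries plot and a lag plot. A minimal sketch of those calls, assuming the fpp3 package is installed and using the same seed as above; the ACF is included as an assumption because it is the usual companion to the lag plot:

```r
library(fpp3)

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries %>% autoplot(Turnover)       # trend and overall pattern
myseries %>% gg_season(Turnover)      # one line per year, by month
myseries %>% gg_subseries(Turnover)   # one panel per month, with monthly means
myseries %>% gg_lag(Turnover)         # scatterplots against lagged values
myseries %>% ACF(Turnover) %>% autoplot()
```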
b. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
From the autoplot, we can see a clear seasonal pattern in the time series and an upward trend.
The seasonal plot confirms the seasonal pattern. It shows a large jump every year in December and a drop in February. Sales begin to rise towards the end of the year, peak between November and December, and fall after January, which likely reflects Christmas holiday shopping and sales.
The seasonal subseries plot offers another perspective on seasonality by showing the mean for each month. We see a large increase from November to December and a decrease from December to February, as well as a small decline in turnover from January to June and a similar rise from July to November, before the big spike from November to December.
The lag plot is harder to interpret. Some panels show positive relationships and others negative ones, but with monthly data and this many panels it is hard to draw firm conclusions.
9. Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series:
10. The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.
11. The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot() and ACF() for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?
victoria <- aus_livestock %>%
  filter(State == "Victoria",
         Animal == "Pigs",
         between(year(Month), 1990, 1995))
autoplot(victoria, Count)
ACF(victoria, Count) %>% autoplot()
12. Use the following code to compute the daily changes in Google closing stock prices.
dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  mutate(diff = difference(Close))
a. Why was it necessary to re-index the tsibble?
Because the stock market trades only on business days, the Date index has gaps for weekends and holidays, so the GOOG observations are not regularly spaced. Re-indexing by row number (trading_day) gives every observation the same interval from the previous one, which lag-based tools such as ACF() expect.
b. Plot these differences and their ACF.
google <- gafa_stock %>%
  filter(Symbol == "GOOG") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
ACF(google,difference(Close)) %>% autoplot()
c. Do the changes in the stock prices look like white noise?
The ACF plot shows 30 lags. For white noise, about 5% of the spikes are expected to fall outside the bounds, and 5% of 30 is:
## [1] 1.5
If more than 1.5 spikes lie outside the bounds, the series is probably not white noise. In the plot above, 4 spikes lie outside the bounds, so the changes in the stock price do not look like white noise.
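Counting spikes by eye can be complemented with a time plot of the differences and a portmanteau test. A minimal sketch, assuming the fpp3 package is installed; it uses the dgoog series from the exercise code (2018 onwards), not the full GOOG history plotted above, and the choice of lag = 10 is an assumption:

```r
library(fpp3)

dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  mutate(diff = difference(Close))

# Time plot of the daily changes (the first value is NA by construction)
dgoog %>% autoplot(diff)

# ACF of the daily changes
dgoog %>% ACF(diff) %>% autoplot()

# Ljung-Box test: a large p-value is consistent with white noise
lb <- dgoog %>% features(diff, ljung_box, lag = 10)
lb
```

The test summarises the first 10 autocorrelations in a single statistic, which is less sensitive to one or two borderline spikes than a manual count.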