Email : sherlytaurinsiri@gmail.com
Instagram : https://www.instagram.com/sherlytaurin
RPubs : https://rpubs.com/sherlytaurin/
Github : https://github.com/sherlytaurin/
Telegram : @Sherlytaurin
Explore what the series gafa_stock, PBS, vic_elec and pelt represent, and use autoplot() to plot some of the series in these data sets.
gafa_stock is a time series of historical stock prices from 2014 to 2018 for Google, Amazon, Facebook, and Apple. All prices are in $USD.
datatable(gafa_stock,
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('gafa stock')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE)
)
a. Autoplot
b. Time interval
## Warning: tz(): Don't know how to compute timezone for object of class tbl_ts/
## tbl_df/tbl/data.frame; returning "UTC". This warning will become an error in the
## next major version of lubridate.
## <Interval[0]>
The exclamation mark (!) in the tsibble header means gafa_stock has no fixed interval: trading days are irregularly spaced.
PBS is a monthly tsibble with two values:
Scripts: Total number of scripts
Cost: Cost of the scripts in $AUD.
The data come from Medicare Australia.
## # A tsibble: 10 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 18228 67877
## 2 1991 Aug Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15327 57011
## 3 1991 Sep Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 14775 55020
## 4 1991 Oct Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15380 57222
## 5 1991 Nov Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 14371 52120
## 6 1991 Dec Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15028 54299
## 7 1992 Jan Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 11040 39753
## 8 1992 Feb Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15165 54405
## 9 1992 Mar Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 16898 61108
## 10 1992 Apr Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 18141 65356
a. Autoplot
b. Time Interval
1M means the time interval is monthly.
vic_elec is a half-hourly tsibble with 3 values:
Demand: Total electricity demand in MW.
Temperature: Temperature of Melbourne (BOM site 086071).
Holiday: Indicator for if that day is a public holiday.
The data come from the Australian Energy Market Operator.
This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MW, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MW, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).
datatable(vic_elec, caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('vic elec')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE))
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
a. Autoplot
b. Time Interval
30m stands for 30 minutes, so the time interval is half-hourly.
Pelt is a time series of Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company.
pelt is an annual tsibble with two values:
Hare: The number of Snowshoe Hare pelts traded.
Lynx: The number of Canadian Lynx pelts traded.
The data come from the Hudson Bay Company.
datatable(pelt,
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('pelt')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE))
a. Autoplot
b. Time Interval
1Y stands for one year, so the time interval is yearly.
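The `<Interval[0]>` result and the `tz()` warning shown above appear to come from lubridate's `interval()` being applied to a whole tsibble, so they do not report the sampling interval. A minimal sketch, assuming the tsibble and tsibbledata packages are installed, that calls tsibble's own `interval()` with an explicit namespace (the `series`, `intervals` and `regular` names are only illustrative):

```r
library(tsibble)
library(tsibbledata)

# Collect the four data sets discussed above in a named list.
series <- list(gafa_stock = gafa_stock, PBS = PBS,
               vic_elec = vic_elec, pelt = pelt)

# Namespace-qualify interval() so lubridate::interval() cannot mask it,
# then format each interval as the short label used in the tsibble header.
intervals <- vapply(series,
                    function(x) format(tsibble::interval(x)),
                    character(1))
print(intervals)

# is_regular() is FALSE when the index has no fixed interval.
regular <- vapply(series, tsibble::is_regular, logical(1))
print(regular)
```

Qualifying the call as `tsibble::interval()` keeps the result independent of the order in which packages were attached.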
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
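The table above has one row per stock and keeps the `Symbol` grouping, which is what a grouped filter for the peak closing price would produce. The code that generated it is not shown, so the following is only a sketch of one way to obtain such a table (`peak_close` is an illustrative name):

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

# Within each stock, keep only the row(s) where Close equals
# that stock's maximum closing price.
peak_close <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))

print(peak_close)
```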
Open tute1.csv in Excel (or some other spreadsheet application) and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series labelled Sales, AdBudget, and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget, and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("tute1.csv", header = TRUE)
datatable(tute1,
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('tute1')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE))
Convert the data to time series
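The plotting code that follows uses `mytimeseries`, whose definition is not shown. A minimal sketch of the conversion, assuming the first column is named `Quarter` and holds ISO dates; the small data frame below is a made-up stand-in for tute1.csv, and the `_demo` names are illustrative:

```r
library(dplyr)
library(tsibble)

# Made-up stand-in for tute1.csv: a date column plus three numeric series.
tute1_demo <- data.frame(
  Quarter  = c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01"),
  Sales    = c(1000, 1010, 990, 1025),
  AdBudget = c(500, 510, 495, 520),
  GDP      = c(270, 272, 269, 275)
)

# Turn the date strings into a quarterly index, then declare it as the index.
mytimeseries_demo <- tute1_demo %>%
  mutate(Quarter = yearquarter(as.Date(Quarter))) %>%
  as_tsibble(index = Quarter)

print(mytimeseries_demo)
```

With the real file, the same two steps would be applied to `tute1` and the result assigned to `mytimeseries`.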
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid().
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
Without facet_grid(), all of the series are drawn in a single panel on a shared y-axis. There are no panel labels, so each series can only be identified by its colour.
USgas Package
# autoplot() needs a tsibble; us_total ships with USgas as a plain data frame
new_england <- us_total %>%
as_tsibble(index = year, key = state) %>%
filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island'))
autoplot(new_england, y)
Read tourism.xlsx into R using readxl::read_excel().
tourism <- readxl::read_excel("tourism.xlsx")
datatable(tourism,
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('tourism')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE))
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
Create a tsibble identical to the tourism tsibble from the tsibble package.
Find which combination of Region and Purpose had the maximum number of overnight trips on average.
tourism_tsibble %>% group_by(Region, Purpose) %>%
summarise(Trips = mean(Trips)) %>%
ungroup() %>%
filter(Trips == max(Trips))
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
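The result above names a single quarter (2017 Q4) rather than an average. On a tsibble, `summarise()` keeps the `Quarter` index, so `mean(Trips)` is taken within each quarter and the filter returns the largest single quarter. A sketch of the average over all quarters, using `tsibble::tourism` as a stand-in for `tourism_tsibble` (`avg_trips` is an illustrative name):

```r
library(dplyr)
library(tsibble)

avg_trips <- tourism %>%
  as_tibble() %>%                      # drop the time index first
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips), .groups = "drop") %>%
  filter(Trips == max(Trips))          # keep the highest average

print(avg_trips)
```

Converting with `as_tibble()` is what lets the mean run across all quarters of each Region and Purpose pair.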
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ... with 630 more rows
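The table above has one row per State and Quarter, which is consistent with totalling Trips over Region and Purpose. The generating code is not shown; a sketch of one way to build such a table, again using `tsibble::tourism` as a stand-in (`state_trips` is an illustrative name):

```r
library(dplyr)
library(tsibble)

# Here keeping the Quarter index is the desired behaviour:
# summarise() on a tsibble totals Trips per State within each quarter.
state_trips <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

print(state_trips)
```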
Time plots of the following series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Demand from vic_elec
vic_elec %>% ggplot(aes(x = Date, y = Demand, group = Holiday)) +
geom_line(aes(col = Holiday)) +
facet_grid(Holiday ~ ., scales = "free")
The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.
datatable(aus_arrivals, caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
htmltools::em('aus arrivals')),
extensions = 'FixedColumns',
options = list(scrollX = TRUE, fixedColumns = TRUE))
Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.
p1 <- ggseasonplot(myts[,1], year.labels=TRUE, year.labels.left=TRUE) +
theme(plot.title = element_text(hjust = 0.5)) + ylab("Arrivals in thousands") +
ggtitle("Seasonal plot: Quarterly Arrivals from Japan")
p2 <- ggseasonplot(myts[,2], year.labels=TRUE, year.labels.left=TRUE) +
theme(plot.title = element_text(hjust = 0.5)) + ylab("Arrivals in thousands") +
ggtitle("Seasonal plot: Quarterly Arrivals from New Zealand")
p3 <- ggseasonplot(myts[,3], year.labels=TRUE, year.labels.left=TRUE) +
theme(plot.title = element_text(hjust = 0.5)) + ylab("Arrivals in thousands") +
ggtitle("Seasonal plot: Quarterly Arrivals from U.K.")
p4 <- ggseasonplot(myts[,4], year.labels=TRUE, year.labels.left=TRUE) +
theme(plot.title = element_text(hjust = 0.5)) + ylab("Arrivals in thousands") +
ggtitle("Seasonal plot: Quarterly Arrivals from U.S.")
gridExtra::grid.arrange(p1, p2, p3, p4, ncol = 2)
p1 <- ggsubseriesplot(myts[,1]) +
theme(plot.title = element_text(hjust = 0.5)) + # center the plot title
ylab("Arrivals in thousands") +
ggtitle("Seasonal subseries plot: Japan")
p2 <- ggsubseriesplot(myts[,2]) +
theme(plot.title = element_text(hjust = 0.5)) + # center the plot title
ylab("Arrivals in thousands") +
ggtitle("Seasonal subseries plot: New Zealand")
p3 <- ggsubseriesplot(myts[,3]) +
theme(plot.title = element_text(hjust = 0.5)) + # center the plot title
ylab("Arrivals in thousands") +
ggtitle("Seasonal subseries plot: U.K.")
p4 <- ggsubseriesplot(myts[,4]) +
theme(plot.title = element_text(hjust = 0.5)) + # center the plot title
ylab("Arrivals in thousands") +
ggtitle("Seasonal subseries plot: U.S.")
gridExtra::grid.arrange(p1, p2, p3, p4, ncol = 2)
There appears to be seasonality in the series for all four countries. With the exception of New Zealand, arrivals peak in the fourth quarter and then fall in the first quarter of the following year. This makes sense given that Australia is in the southern hemisphere, so December is a summer month there while it is a winter month in the three northern-hemisphere countries. For New Zealand, the peak quarter is Q3, which would be spring for both countries. There is a generally upward trend for all but Japan, for which the trend seems to have reversed in the late 1990s. Historically, this makes sense: Japan was swept up in the Asian Financial Crisis that started in 1997 and lasted until late 1998, after which its prolonged economic stagnation continued through the first decade of the 2000s. There does not appear to be any sign.
In the seasonal plots and subseries plots:
The top-left panel is the seasonal plot for tourists arriving from Japan. Interestingly, the number of arrivals was more or less flat throughout the year for all years until 1987, after which arrivals started to zigzag: high in Q1, falling in Q2, rising again in Q3, then falling in Q4.
The top-right panel is New Zealand. Unlike Japan, New Zealand arrivals show a consistent seasonal pattern in all years, starting low in Q1 and then increasing gradually through the year in most years. In the first decade of the 2000s, this within-year increase seems to have peaked in Q3 and then fallen slightly in Q4.
The bottom-left panel is the seasonal plot for the U.K. In stark contrast to the two earlier countries, the volume of tourists from the U.K. follows a U-shaped pattern in all years: high in Q1, dipping sharply in Q2, staying flat through Q3, then rising sharply in Q4. This pattern holds for every year in the sample, which makes the U.K. a highly seasonal source of tourists for Australia.
The bottom-right panel is the U.S. There is a clear upward trend in annual volume, seen in the shift of each line from earlier years to later years. The quarterly pattern within each year is more varied than the one for the U.K., although the U-shaped pattern is followed in most years, with the exception of the early 1990s. In particular, the sudden spike in Q3 of 1991 stands out as a clear anomaly. This is a bit strange given that the oil price had surged when the U.S. entered a war against Iraq during that period. One would surmise that this would have made travel more expensive.
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF().
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
From the time plot, we can see a clear seasonal pattern and an upward trend.
The seasonal plot confirms the seasonal pattern. It shows a big jump every year in December and a drop in February. Sales rise towards the end of the year, peak in November and December, and fall away after January. This probably coincides with Christmas holiday shopping and sales.
The seasonal subseries plot offers another perspective on seasonality by showing the mean value for each month. We can see a large increase from November to December and a decrease from December to February, but also a small decreasing trend in turnover from January to June and a similar increase from July to November, before the big spike from November to December.
The lag plot is difficult to interpret. There are some negative and positive relationships, but with so many panels and monthly data, it is hard to tell them apart.
The ACF shows strong, statistically significant autocorrelation, which indicates that the lagged values have a linear relationship with the series. The slow decay of the ACF as the lag increases reflects the trend, and the spikes at lags 12 and 24 reflect the seasonality.
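The five views discussed above can be produced along the following lines. This is a sketch, assuming the fpp3 package is installed and that `Turnover` is the variable of interest; the `plots` list is only an illustrative way to hold the results:

```r
library(fpp3)

# Same series selection as in the text.
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# One ggplot object per view of the series.
plots <- list(
  time      = autoplot(myseries, Turnover),
  season    = gg_season(myseries, Turnover),
  subseries = gg_subseries(myseries, Turnover),
  lag       = gg_lag(myseries, Turnover, geom = "point"),
  acf       = autoplot(ACF(myseries, Turnover))
)

# Print any one of them to draw it, for example the seasonal plot.
print(plots$season)
```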
Use autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF() to explore features of the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.
“Total Private” Employed from us_employment: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() (plots not shown)
Bricks from aus_production: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() (plots not shown; the plotting functions warned that rows containing missing values were removed)
Hare from pelt: autoplot(), gg_subseries(), gg_lag(), ACF() (plots not shown)
“H02” Cost from PBS: autoplot(), gg_season(), gg_subseries(), ACF() (plots not shown)
us_gasoline: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() (plots not shown; Barrels was selected automatically as the plot variable)
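Two of the series listed above have to be extracted from larger tsibbles before they can be plotted. A sketch of that step, assuming the fpp3 package is installed; summing the H02 cost over Concession and Type is an assumption made here so that a single series remains, and may differ from what was plotted originally (`total_private` and `h02` are illustrative names):

```r
library(fpp3)

# "Total Private" is one of many Title values in us_employment.
total_private <- us_employment %>%
  filter(Title == "Total Private")

# H02 appears once per Concession/Type combination; add them up
# (assumption) so that one monthly series remains.
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

# One example call per series; the other functions are used the same way.
p_emp    <- total_private %>% autoplot(Employed)
p_bricks <- aus_production %>% autoplot(Bricks)
p_hare   <- pelt %>% gg_lag(Hare, geom = "point")
p_h02    <- h02 %>% gg_season(Cost)
p_gas    <- us_gasoline %>% gg_subseries(Barrels)

print(p_h02)
```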
* 1 with B
* 2 with A
* 3 with D
* 4 with C
The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot() and ACF() for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?
VicPig <- aus_livestock %>%
filter(State == "Victoria",
Animal == "Pigs",
between(year(Month),1990,1995))
VicPig %>% ACF(Count) %>% autoplot()
Almost all of the spikes lie outside the bounds, which confirms that the series is not white noise.
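The bounds referred to above are the usual approximate 95% limits for white noise, plus or minus 1.96/sqrt(T) for a series of length T. A small base R sketch of how they depend on the length of the series, using the two periods named in the exercise (the variable names are illustrative):

```r
# Months from January 1990 to December 1995 inclusive.
n_short <- 6 * 12

# Months from July 1972 to December 2018 inclusive.
n_full <- 46 * 12 + 6

# Approximate 95% white-noise limits for the sample autocorrelations.
bound_short <- qnorm(0.975) / sqrt(n_short)
bound_full  <- qnorm(0.975) / sqrt(n_full)

print(c(short = bound_short, full = bound_full))
```

Because the limits shrink as T grows, a longer period makes the same autocorrelation easier to distinguish from white noise.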
dgoog <- gafa_stock %>%
filter(Symbol == "GOOG", year(Date) >= 2018) %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index = trading_day, regular = TRUE) %>%
mutate(diff = difference(Close))
Filtering for the GOOG symbol leaves Date as the index, but trading days are irregularly spaced (there are no observations on weekends or holidays), so the interval between rows is not constant. Re-indexing by the row sequence number gives every row the same interval, which makes the tsibble regular.
google_stock <- gafa_stock %>%
filter(Symbol == "GOOG") %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index = trading_day, regular = TRUE)
google_stock %>%
ACF(difference(Close)) %>%
autoplot()
There are 30 lags in total. 5% of 30 is:
## [1] 1.5
Under white noise we expect about 5% of the spikes, i.e. about 1.5 of the 30, to fall outside the bounds, so more than that suggests the series is not white noise.
From the plot, 3 lags fall outside the bounds, so the series is not white noise.
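The counting rule above can be written out in base R. This is only a sketch: `count_outside()` is a hypothetical helper, and the simulated series stands in for the differenced closing prices so that the block runs without extra packages:

```r
n_lags <- 30

# Number of spikes expected outside the bounds under white noise.
expected_outside <- 0.05 * n_lags

# Hypothetical helper: count autocorrelations beyond the
# approximate 95% limits of +/- qnorm(0.975) / sqrt(n_obs).
count_outside <- function(r, n_obs) {
  sum(abs(r) > qnorm(0.975) / sqrt(n_obs))
}

# Simulated white noise as a stand-in series.
set.seed(1)
wn <- rnorm(1000)

# Sample autocorrelations for lags 1..30 (the first element is lag 0).
r <- acf(wn, lag.max = n_lags, plot = FALSE)$acf[-1]

print(expected_outside)
print(count_outside(r, length(wn)))
```

The same helper could be applied to the autocorrelations of the differenced GOOG series to reproduce the count read off the plot.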