Email : vanessasupit0910@gmail.com
RPubs : https://rpubs.com/vanessasupit/
Department : Business Statistics
Address : ARA Center, Matana University Tower
Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.
Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

help("gafa_stock")
## starting httpd help server ... done
help("PBS")
help("vic_elec")
help("pelt")

gafa_stock %>% autoplot(Open)
gafa_stock is a time series of stock prices in US dollars from 2014 to 2018 for Google, Amazon, Facebook and Apple.
PBS %>% autoplot(Scripts)
PBS is a monthly time series of Australian Medicare prescription data with two measured values, Scripts and Cost.
vic_elec %>% autoplot(Demand)
vic_elec is a time series of half-hourly electricity demand for Victoria, Australia, with three variables: Demand, Temperature, and Holiday.
pelt %>% autoplot(Hare)
pelt is an annual time series of Snowshoe Hare and Canadian Lynx pelt trading records from 1845 to 1935.
gafa_stock
interval(gafa_stock)
## <interval[1]>
## [1] !
There is no fixed interval for gafa_stock because stocks are only traded on business days.
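As a quick check (this snippet is my own addition, not part of the original solution), the gaps between consecutive trading dates show why the interval is irregular:

gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  mutate(days_since_prev = as.integer(Date - lag(Date)))  # 1 on weekdays, 3 after a weekend, more over holidays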
PBS
interval(PBS)
## <interval[1]>
## [1] 1M
The time interval of PBS is monthly (1M).
vic_elec
interval(vic_elec)
## <interval[1]>
## [1] 30m
The time interval of vic_elec is half-hourly (30m).
pelt
interval(pelt)
## <interval[1]>
## [1] 1Y
The time interval of pelt is annual (1Y).
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Download the file tute1.csv here, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

setwd("C:/Users/User_Pro/Downloads")
library(readr)  # for read_csv()
library(DT)     # for datatable()

tute1 <- read_csv("tute1.csv")
datatable(tute1,
  extensions = 'FixedColumns',
  options = list(scrollX = TRUE, fixedColumns = TRUE)
)
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%  # the data is quarterly, so index it with yearquarter()
  as_tsibble(index = Quarter)
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
When you don’t include facet_grid(), all three series are drawn on one set of axes rather than in three separate panels that share the same x-axis. Because the range of values differs significantly between the series, it is difficult to see the patterns in each individual series when they are all plotted on one plane.
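If you do want all three series on a single panel, one workaround (my own sketch, not part of the original exercise; the column name scaled is made up here) is to standardise each series so they share a comparable scale:

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  group_by(name) %>%
  mutate(scaled = as.numeric(scale(value))) %>%  # centre each series at 0 with unit variance
  ungroup() %>%
  ggplot(aes(x = Quarter, y = scaled, colour = name)) +
  geom_line()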
Install the USgas package.

# install.packages("USgas")
library(USgas)
## Warning: package 'USgas' was built under R version 4.0.4
tsibble_us <- us_total %>%
  as_tsibble(index = year, key = state)

new_england <- tsibble_us %>%
  group_by(state) %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire',
                      'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  ungroup()
new_england %>% autoplot(y)
Download tourism.xlsx here and read it into R using readxl::read_excel().

tourism <- readxl::read_excel("tourism.xlsx")
tsibble_tourism <- tourism %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
tsibble_tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ... with 24,310 more rows
tsibble_tourism %>%
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips)) %>%
  ungroup() %>%
  filter(Trips == max(Trips))
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
new_tsibble <- tsibble_tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  ungroup()
new_tsibble
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ... with 630 more rows
aus_production
aus_production %>% autoplot(Bricks)
pelt
pelt %>% autoplot(Lynx)
gafa_stock
gafa_stock %>% autoplot(Close)
vic_elec
vic_elec %>% autoplot(Demand)
vic_elec %>%
  ggplot(aes(x = Date, y = Demand, group = Holiday)) +
  geom_line(aes(col = Holiday)) +
  facet_grid(Holiday ~ ., scales = "free")
The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US.

datatable(aus_arrivals,
  caption = htmltools::tags$caption(
    style = 'caption-side: bottom; text-align: center;',
    htmltools::em('data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US')),
  extensions = 'FixedColumns',
  options = list(scrollX = TRUE, fixedColumns = TRUE)
)
Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.

aus_arrivals %>% autoplot(Arrivals)
The largest number of arrivals came from New Zealand in the 1980s, the lead shifted to Japan in the 1990s, and then to the UK in the first decade of the twenty-first century. The UK arrivals also show the biggest quarterly fluctuation.
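To check which origin contributed the most arrivals in each decade, a small summary can be computed directly (a sketch I am adding for verification; the decade column is my own construction):

aus_arrivals %>%
  as_tibble() %>%
  mutate(decade = 10 * (year(Quarter) %/% 10)) %>%  # e.g. 1995 Q2 -> 1990
  group_by(decade, Origin) %>%
  summarise(Arrivals = sum(Arrivals), .groups = "drop_last") %>%
  slice_max(Arrivals, n = 1)  # largest origin per decade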
aus_arrivals %>% gg_season(Arrivals)
aus_arrivals %>% gg_subseries(Arrivals)
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
myseries %>% autoplot(Turnover)
myseries %>% gg_season(Turnover)
myseries %>% gg_subseries(Turnover)
myseries %>% gg_lag(Turnover)
myseries %>% ACF(Turnover) %>% autoplot()
From the autoplot, we can see a clear seasonal or cyclic pattern in the time series, as well as an upward trend.
The seasonal plot confirms that there are indeed seasonal patterns. It also reveals a typical big jump every year in December and a drop in February. Sales begin to increase in the fall and peak between November and December before decreasing after January, which likely coincides with Christmas holiday shopping and sales.
The seasonal subseries plot offers a new perspective on the seasonality by showing the monthly mean values. We again see a large increase from November to December and a decrease from December to February, but also a small downward trend in turnover from January to June and a similar rise from July to November, before the big spike from November to December.
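The monthly means behind the subseries plot can also be computed directly; this snippet is my own addition for verification:

myseries %>%
  as_tibble() %>%
  mutate(month = month(Month, label = TRUE)) %>%
  group_by(month) %>%
  summarise(mean_turnover = mean(Turnover))  # these are the blue mean lines drawn by gg_subseries()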
In this lag plot, the data is difficult to analyze. We can see some negative and positive relationships, but with so many panels of monthly data it is hard to tell much more.
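One way to reduce the clutter (my suggestion, not in the original) is to ask gg_lag() for only the most interesting lags and to plot points instead of paths:

myseries %>%
  gg_lag(Turnover, lags = c(1, 12), geom = "point")  # lag 12 highlights the annual seasonality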
Use autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF() to explore features of the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.

“Total Private” Employed from us_employment
Priv <- us_employment %>%
  filter(Title == "Total Private")

Priv %>% autoplot(Employed)
Priv %>% gg_season(Employed)
Priv %>% gg_subseries(Employed)
Priv %>% gg_lag(Employed)
Priv %>% ACF(Employed)
## # A tsibble: 29 x 3 [1M]
## # Key: Series_ID [1]
## Series_ID lag acf
## <chr> <lag> <dbl>
## 1 CEU0500000001 1M 0.997
## 2 CEU0500000001 2M 0.993
## 3 CEU0500000001 3M 0.990
## 4 CEU0500000001 4M 0.986
## 5 CEU0500000001 5M 0.983
## 6 CEU0500000001 6M 0.980
## 7 CEU0500000001 7M 0.977
## 8 CEU0500000001 8M 0.974
## 9 CEU0500000001 9M 0.971
## 10 CEU0500000001 10M 0.968
## # ... with 19 more rows
Bricks from aus_production
aus_production %>% autoplot(Bricks)
aus_production %>% gg_season(Bricks)
aus_production %>% gg_subseries(Bricks)
aus_production %>% gg_lag(Bricks)
aus_production %>% ACF(Bricks)
## # A tsibble: 22 x 2 [1Q]
## lag acf
## <lag> <dbl>
## 1 1Q 0.900
## 2 2Q 0.815
## 3 3Q 0.813
## 4 4Q 0.828
## 5 5Q 0.720
## 6 6Q 0.642
## 7 7Q 0.655
## 8 8Q 0.692
## 9 9Q 0.609
## 10 10Q 0.556
## # ... with 12 more rows
Hare from pelt
pelt %>% autoplot(Hare)
pelt %>% gg_subseries(Hare)
pelt %>% gg_lag(Hare)
pelt %>% ACF(Hare)
## # A tsibble: 19 x 2 [1Y]
## lag acf
## <lag> <dbl>
## 1 1Y 0.658
## 2 2Y 0.214
## 3 3Y -0.155
## 4 4Y -0.401
## 5 5Y -0.493
## 6 6Y -0.401
## 7 7Y -0.168
## 8 8Y 0.113
## 9 9Y 0.307
## 10 10Y 0.340
## 11 11Y 0.296
## 12 12Y 0.206
## 13 13Y 0.0372
## 14 14Y -0.153
## 15 15Y -0.285
## 16 16Y -0.295
## 17 17Y -0.202
## 18 18Y -0.0676
## 19 19Y 0.0956
“H02” Cost from PBS
H02 <- PBS %>% filter(ATC2 == "H02")

H02 %>% autoplot(Cost)
H02 %>% gg_season(Cost)
H02 %>% gg_subseries(Cost)
H02 %>% ACF(Cost)
## # A tsibble: 92 x 6 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Concession Type ATC1 ATC2 lag acf
## <chr> <chr> <chr> <chr> <lag> <dbl>
## 1 Concessional Co-payments H H02 1M 0.834
## 2 Concessional Co-payments H H02 2M 0.679
## 3 Concessional Co-payments H H02 3M 0.514
## 4 Concessional Co-payments H H02 4M 0.352
## 5 Concessional Co-payments H H02 5M 0.264
## 6 Concessional Co-payments H H02 6M 0.219
## 7 Concessional Co-payments H H02 7M 0.253
## 8 Concessional Co-payments H H02 8M 0.337
## 9 Concessional Co-payments H H02 9M 0.464
## 10 Concessional Co-payments H H02 10M 0.574
## # ... with 82 more rows
us_gasoline
us_gasoline %>% autoplot(Barrels)
us_gasoline %>% gg_season(Barrels)
us_gasoline %>% gg_subseries(Barrels)
us_gasoline %>% gg_lag(Barrels)
us_gasoline %>% ACF(Barrels)
## # A tsibble: 31 x 2 [1W]
## lag acf
## <lag> <dbl>
## 1 1W 0.893
## 2 2W 0.882
## 3 3W 0.873
## 4 4W 0.866
## 5 5W 0.847
## 6 6W 0.844
## 7 7W 0.832
## 8 8W 0.831
## 9 9W 0.822
## 10 10W 0.808
## # ... with 21 more rows
Victoria_Pig <- aus_livestock %>%
  filter(State == "Victoria",
         Animal == "Pigs",
         between(year(Month), 1990, 1995))

Victoria_Pig %>% ACF(Count) %>% autoplot()
Almost all of the spikes are outside the bounds, which means the series is not white noise.
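A more formal complement to eyeballing the ACF (added here as a sketch) is the Ljung-Box portmanteau test; a very small lb_pvalue likewise indicates that the series is not white noise:

Victoria_Pig %>%
  features(Count, ljung_box, lag = 24)  # small lb_pvalue => reject the white-noise hypothesis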
dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  mutate(diff = difference(Close))
Because stocks are only traded on business days, the Date index for GOOG is irregular, with gaps of different lengths between consecutive rows. Re-indexing by the row number (trading_day) gives every row the same interval of one trading day.
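A quick way to confirm the effect of the re-indexing (my own addition) is to compare the intervals before and after:

interval(gafa_stock)  # irregular ([!]) because of the Date index
interval(dgoog)       # regular: one step per trading day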
google_stock <- gafa_stock %>%
  filter(Symbol == "GOOG") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)

google_stock %>%
  ACF(difference(Close)) %>%
  autoplot()
5/100*30
## [1] 1.5
For white noise we expect about 5% of the autocorrelation spikes to lie outside the bounds, so with around 30 lags that is 5/100 × 30 = 1.5 spikes; if noticeably more than that fall outside the bounds, the series is not white noise.
We can see from the plot above that there are three lags that are out of bounds, meaning that the data series is not white noise.
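For reference, the dashed bounds that autoplot() draws on an ACF sit at ±1.96/sqrt(T), where T is the number of observations; a quick sketch of that calculation (my own addition):

n_obs <- nrow(google_stock)  # number of trading days, T
1.96 / sqrt(n_obs)           # spikes beyond +/- this value are significant at the 5% level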