library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.1
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.1
## Warning: package 'feasts' was built under R version 4.4.1
## Warning: package 'fabletools' was built under R version 4.4.1
## Warning: package 'fable' was built under R version 4.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(ggplot2)
library(USgas)
library(readxl)
2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
Time series for Bricks from aus_production
?aus_production
autoplot(aus_production,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
Time series for Lynx from pelt
?pelt
autoplot(pelt,Lynx)
Time series for Close from gafa_stock
?gafa_stock
autoplot(gafa_stock,Close)
Time series for Demand from vic_elec
?vic_elec
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ℹ 52,598 more rows
autoplot(vic_elec,Demand)+
labs(x="Half-hour intervals", y= "Electricity Demand [MWh]", title = "Electricity demand, Victoria")
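The exercise also asks for the time interval of each series. The tsibble header already shows it in square brackets (for example [30m] for vic_elec above, [1Q] for aus_production, [1Y] for pelt, and [!] for the irregularly spaced trading days in gafa_stock). As a minimal sketch, the same information can be read programmatically with tsibble's interval() and is_regular(); nothing here is part of the original answer.

```r
library(fpp3)

# interval() reports the spacing between consecutive observations
interval(aus_production) # quarterly series
interval(pelt)           # yearly series
interval(gafa_stock)     # irregular: trading days only
interval(vic_elec)       # half-hourly series

# is_regular() is FALSE for series without a fixed spacing
is_regular(gafa_stock)
is_regular(vic_elec)
```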
2.2 Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
gafa_stock |>
  group_by(Symbol) |> # group by symbol to find the maximum close for each company
  filter(Close == max(Close)) |>
  select(Symbol, Date, Close) # show the company symbol, the date, and the peak closing price
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
2.3 Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1<-read.csv("https://otexts.com/fpp3/extrafiles/tute1.csv")
view(tute1)
#convert data into a time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
#Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Without facet_grid(), the three time series are drawn together in a single
panel on one shared y-axis. With facet_grid(), each series gets its own panel
with its own y-axis scale, while all panels share the same x-axis.
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
2.4 The USgas package contains data on the demand for natural gas in the US. Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
head(us_total)
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
GasConsumption<-us_total|>
as_tsibble(key=state, index = year)
New_England_area <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island") # states in the New England area
GasConsumption |>
  filter(state %in% New_England_area) |> # keep only the New England states
  autoplot(y) + # plot the yearly total gas consumption
  labs(x = "Year", y = "Gas Consumption [Million Cubic Feet]", title = "Natural Gas Consumption by State in the New England Area")
2.5 Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). Create a tsibble which is identical to the tourism tsibble from the tsibble package. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
tourism_copy<-read_excel("Downloads/tourism.xlsx")
TourTsibble<- tourism_copy|>
mutate(Quarter = yearquarter(Quarter))|>
as_tsibble(index= Quarter , key=c(Region, Purpose))# created tsibble
# find the Region/Purpose combination with the largest mean of Trips
Max_Tour <- TourTsibble |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips)) |> # mean of Trips per group; on a tsibble the Quarter index is kept as a grouping level too
  ungroup() |> # drop the grouping so max() is taken over all rows
  filter(Trips == max(Trips)) # keep the row with the largest value
print(Max_Tour)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
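One caveat about the result above: it still carries a Quarter column, because summarise() on a tsibble always keeps the time index. The mean is therefore taken within each quarter, and the row shown is the largest single-quarter value, not the average over all quarters. A minimal sketch of the difference, using made-up toy data (the names toy, per_quarter and overall are hypothetical), converts to a tibble before averaging:

```r
library(fpp3)

# Toy data (hypothetical values): two regions observed over two quarters
toy <- tibble(
  Quarter = yearquarter(c("2020 Q1", "2020 Q2", "2020 Q1", "2020 Q2")),
  Region  = c("A", "A", "B", "B"),
  Trips   = c(10, 30, 25, 5)
) |>
  as_tsibble(index = Quarter, key = Region)

# On a tsibble, summarise() returns one row per Region per Quarter
per_quarter <- toy |>
  group_by(Region) |>
  summarise(Trips = mean(Trips))

# Dropping to a tibble first averages over time: one row per Region
overall <- toy |>
  as_tibble() |>
  group_by(Region) |>
  summarise(Trips = mean(Trips))

# Region with the highest average number of trips
overall |> filter(Trips == max(Trips))
```

Applied to the tourism data, the same pattern would insert as_tibble() before group_by(Region, Purpose), so that the mean is taken across all quarters for each combination.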
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State
New_Tour<-TourTsibble|>
group_by(State)|>
summarise(Trips=sum(Trips))|>
ungroup()
print(New_Tour)
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
2.8 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
us_employment
## # A tsibble: 143,412 x 4 [1M]
## # Key: Series_ID [148]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
## 7 1939 Jul CEU0500000001 Total Private 26481
## 8 1939 Aug CEU0500000001 Total Private 26848
## 9 1939 Sep CEU0500000001 Total Private 27468
## 10 1939 Oct CEU0500000001 Total Private 27830
## # ℹ 143,402 more rows
Private<-us_employment|>
filter(Title=="Total Private")
Private|>
autoplot(Employed)
Private|>
gg_season(Employed)
Private|>
gg_subseries(Employed)
Private|>
gg_lag(Employed)
Private|>
ACF(Employed)|>
autoplot()
Can you spot any seasonality, cyclicity and trend? US private employment has a clear upward trend over time. There is also some seasonality, with employment peaking around summer. The lag plot also reflects the increasing trend over the years.
What do you learn about the series? I learned that a series can have both trend and seasonality while still showing a broadly repetitive pattern.
What can you say about the seasonal patterns? Using the subseries plot, I could see that employment starts to increase in April and starts to decrease after August.
Can you identify any unusual years? Employment decreased in 2008, and after 2010 it started to increase again, back toward its earlier level.
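To look more closely at the unusual years around 2008, one option is to restrict the series to a shorter window before plotting. This is only a sketch: the 2005 to 2012 window and the name recession_window are choices made here for illustration, not part of the exercise.

```r
library(fpp3)

# Keep the "Total Private" series and zoom in on 2005 to 2012
recession_window <- us_employment |>
  filter(Title == "Total Private") |>
  filter(between(year(Month), 2005, 2012))

# Time plot of the shorter window
recession_window |>
  autoplot(Employed) +
  labs(x = "Month", y = "Employed", title = "Total Private employment, 2005 to 2012")
```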
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
aus_production|>
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production|>
gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production|>
gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production|>
gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production|>
ACF(Bricks)|>
autoplot()
Can you spot any seasonality, cyclicity and trend? For Bricks in Australian production, there is an upward trend until the series reaches its peak around 1980. There seems to be seasonality, with a peak in every third quarter. The data also seem to be cyclic, since some of the repeating patterns do not have a fixed length.
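As a rough numerical check on the quarterly pattern read from the plots, the mean of Bricks can be computed for each quarter of the year. This is a sketch only; the names bricks_by_quarter, Qtr and MeanBricks are introduced here, and a simple mean ignores the trend, so it complements the seasonal plots and does not replace them.

```r
library(fpp3)

# Average brick production for each quarter of the year (1 to 4)
bricks_by_quarter <- aus_production |>
  as_tibble() |>                    # drop the time index so we can average over years
  filter(!is.na(Bricks)) |>         # Bricks has missing values at the end of the series
  mutate(Qtr = quarter(Quarter)) |> # quarter number from the yearquarter index
  group_by(Qtr) |>
  summarise(MeanBricks = mean(Bricks))

bricks_by_quarter
```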
pelt
## # A tsibble: 91 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
## 7 1851 74600 5560
## 8 1852 75090 5080
## 9 1853 88480 10170
## 10 1854 61280 19600
## # ℹ 81 more rows
pelt|>
autoplot(Hare)
pelt|>
gg_subseries(Hare)
pelt|>
gg_lag(Hare)
pelt|>
ACF(Hare)|>
autoplot()
Using the lag plot, we can see that Hare in pelt has a cyclic pattern: the peaks repeat across years, not within a year, and the cycle seems to repeat about every 10 years. Because the data are annual, there is no within-year pattern, which explains why a season plot cannot be drawn.
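The cycle length suggested by the plots can also be estimated from the autocorrelations. The sketch below assumes that feasts' ACF() returns lags 1 to lag_max in order; it then looks for the largest autocorrelation at lag 5 or beyond, skipping the short lags that are high simply because neighbouring years are similar. The cutoff of 5 and the names hare_acf, acf_values and peak_lag are choices made here for illustration.

```r
library(fpp3)

# Autocorrelations of Hare for lags 1 to 20 (in years)
hare_acf <- pelt |> ACF(Hare, lag_max = 20)
acf_values <- hare_acf$acf

# Lag (>= 5) with the largest autocorrelation: a rough estimate of the cycle length
peak_lag <- which.max(acf_values[5:20]) + 4
peak_lag
```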
HO2 <- PBS|>
filter(ATC2 == "H02")
HO2|>
autoplot(Cost)
HO2|>
gg_season(Cost)
HO2|>
gg_subseries(Cost)
HO2|>
ACF(Cost)|>
autoplot()
There seems to be an upward trend in Cost for H02, which makes sense given inflation: as time goes by, things become more expensive. There is also seasonality, with a pattern that repeats every 12 months for Concessional co-payments, Concessional safety net, and General safety net. General co-payments do not seem to have a clear pattern.
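Filtering PBS on ATC2 == "H02" leaves four separate series (each combination of Concession and Type), which is why the plots above show four lines. If a single H02 cost series is wanted, one possible approach is to sum Cost across those series for each month. This is a sketch; the name h02_total is introduced here.

```r
library(fpp3)

# Total H02 cost per month, summed over all Concession/Type combinations
h02_total <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost)) # on a tsibble this sums within each Month

h02_total |> autoplot(Cost)
h02_total |> gg_season(Cost)
```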
us_gasoline|>
autoplot(Barrels)
us_gasoline|>
gg_season(Barrels)
us_gasoline|>
gg_subseries(Barrels)
us_gasoline|>
gg_lag(Barrels)
us_gasoline|>
ACF(Barrels)|>
autoplot()
The US gasoline series seems to have an upward trend, with a drop around 2007. There also seems to be a cyclic pattern that does not repeat at a fixed interval.