── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(USgas)
library(readxl)
Chapter 2: Exercises
Question 1
Bricks from aus_production
Time interval is 1 Quarter
# Using '?' gets metadata for data that's within packages
?aus_production
# Calling the dataset allows us to observe features such as the time interval
# Limiting print to 5 rows to keep the report more concise
aus_production |> print(n = 5)
# max() finds the maximum value of a numeric variable,
# and .by groups by the selected variable
gafa_stock |> filter(Close == max(Close), .by = Symbol)
# A tsibble: 4 x 8 [!]
# Key: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Question 3
tute1 <- read_csv("~/Downloads/tute1.csv")
Rows: 100 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): Sales, AdBudget, GDP
date (1): Quarter
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_total_timeseries <- tsibble(us_total, key = state, index = year)
# Observe the format for state names
us_total_timeseries |> count(state)
# A tibble: 53 × 2
state n
<chr> <int>
1 Alabama 23
2 Alaska 23
3 Arizona 23
4 Arkansas 23
5 California 23
6 Colorado 23
7 Connecticut 23
8 Delaware 23
9 District of Columbia 23
10 Federal Offshore -- Gulf of Mexico 21
# ℹ 43 more rows
# Use grepl() in filter() to match state names by short patterns
us_total_timeseries |>
  filter(grepl("main|verm|new hamp|massac|connec|rhode", state, ignore.case = TRUE)) |>
  mutate(y = y / 1e3) |>
  autoplot(y) +
  labs(title = "New England Gas Consumption by State") +
  ylab("Gas Consumption (thousands)")
# Observe time interval and key
tourism |> print(n = 5)
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
# ℹ 24,315 more rows
# Convert the downloaded dataset to a tsibble with the same index and key as the
# tourism df from the tsibble package
textbook_tourism <- textbook_tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  tsibble(key = c(Region, State, Purpose), index = Quarter)
# Check that our datasets are nearly identical
all.equal(tourism, textbook_tourism)
[1] TRUE
# Find the avg trips by region and purpose and keep the highest value
# (on a tsibble, summarise() keeps the Quarter index, so each row is one quarter)
tourism |>
  group_by(Region, Purpose) |>
  summarise(avg_trips = mean(Trips)) |>
  arrange(desc(avg_trips)) |>
  head(1)
# A tsibble: 1 x 4 [1Q]
# Key: Region, Purpose [1]
# Groups: Region [1]
Region Purpose Quarter avg_trips
<chr> <chr> <qtr> <dbl>
1 Melbourne Visiting 2017 Q4 985.
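The output above still carries a Quarter column, so the value shown is the highest single-quarter mean, not an average over the whole sample. A minimal sketch of one way to average over time as well, assuming it is acceptable to drop the time index with `as_tibble()` first (the name `avg_trips_overall` is only illustrative):

```r
library(fpp3)

# Drop the tsibble index so that the mean is taken over every quarter
# for each Region/Purpose pair, not within each quarter.
avg_trips_overall <- tourism |>
  as_tibble() |>
  summarise(avg_trips = mean(Trips), .by = c(Region, Purpose)) |>
  slice_max(avg_trips, n = 1)

avg_trips_overall
```

`slice_max()` keeps ties, so more than one row is possible in principle.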
time plot
Based on the time plot, we can see a clear positive trend in this time series, as well as seasonality. There also appears to be a cycle of steady rises followed by short periods of decline.
seasonal plot
Based on the seasonal plot, we can see that there is a positive trend in this time series, because the lines for later years generally sit above those for earlier years. It’s hard to see a clear seasonal pattern, partly because the chart is quite busy and partly because there may not be one.
total_private |> gg_season(Employed)
seasonal subseries plot
Based on the seasonal subseries plot, we see a consistent positive trend in all months. The averages are fairly similar across all months, reinforcing our suspicion that there isn’t actually seasonality.
total_private |> gg_subseries(Employed)
lag plot
Every lag plot is nearly perfectly linear, which is further evidence that there is little or no seasonality. I chose to transform the tsibble, changing the index from 1M to 1Q, so that I could see seasonal multiples in a 9-panel grid.
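The monthly-to-quarterly transformation described above is not shown in the report. A rough sketch of how it could be done, assuming `total_private` is the "Total Private" series from `us_employment` and that a quarterly mean is a suitable aggregate (both are assumptions, as is the name `total_private_q`):

```r
library(fpp3)

# Assumption: total_private is the "Total Private" series of us_employment.
total_private <- us_employment |>
  filter(Title == "Total Private")

# Re-index from months to quarters; taking the mean is an assumed choice.
total_private_q <- total_private |>
  index_by(Quarter = yearquarter(Month)) |>
  summarise(Employed = mean(Employed))

# The default lags 1 to 9 fill a 3 x 3 grid; with quarterly data,
# lags 4 and 8 are the seasonal multiples.
total_private_q |>
  gg_lag(Employed, geom = "point")
```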
ACF and ACF plot
Based on the autocorrelation coefficients, we can see a strong positive correlation at every lag. Once again, it is hard to see any seasonality, but there is a clear trend.
time plot
The time plot shows that this time series has a steep upward trend up to about the halfway point, then a mild downward trend. There appears to be strong seasonality, as well as cyclic deep depressions. This cycle seems to start prominently in 1975 and recurs about every 5-10 years from then on. Notably, production has its biggest fall in about 1983.
aus_production |> autoplot(Bricks)
Warning: Removed 20 rows containing missing values or values outside the scale range
(`geom_line()`).
seasonal plot
Based on the seasonal plot, production seems to peak in Q2 and Q3, especially in Q3 (with some exceptions). There are also several years with a sharp decline in production in Q3 and Q4.
aus_production |> gg_season(Bricks)
Warning: Removed 20 rows containing missing values or values outside the scale range
(`geom_line()`).
seasonal subseries plot
The subseries plot shows us what the seasonality looks like. Production tends to increase from Q1 to Q3 and then decrease from Q3 to Q1.
ACF and ACF plot
Based on the lag plot and ACF, we can see there is a strong positive correlation across the lags, which provides further evidence for a trend in the time series. However, the autocorrelation coefficient declines as the lag increases, until it eventually dips below the significance bound at a lag of 38 quarters (9.5 years). This suggests that past values may not be a good predictor of values 10 or more years ahead. The ACF plot also displays peaks at seasonal lags (multiples of 4), providing more evidence for seasonality.
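The ACF call is not shown in the report. A minimal sketch of how the coefficients could be computed far enough out to inspect lag 38, assuming `lag_max = 40` and the usual approximate 95% bound of 1.96/sqrt(T), where T is the number of non-missing observations (the names `bricks_acf`, `n_obs` and `bound` are illustrative):

```r
library(fpp3)

# Compute autocorrelations out to lag 40 (an assumed choice of lag_max).
bricks_acf <- aus_production |>
  ACF(Bricks, lag_max = 40)

# Approximate 95% bound, based on the count of non-missing Bricks values.
n_obs <- sum(!is.na(aus_production$Bricks))
bound <- qnorm(0.975) / sqrt(n_obs)

# First lag, if any, whose coefficient falls inside the bound.
bricks_acf |>
  as_tibble() |>
  filter(abs(acf) < bound) |>
  head(1)

bricks_acf |> autoplot()
```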
time plot
Based on the time plot, we can see a strong cyclical pattern in this time series. There appear to be lull periods of about 3-5 years, after which trading shoots up and stays elevated for a few years.
pelt |> autoplot(Hare)
seasonal plot
subseries plot
The seasonal and subseries plots don’t appear to work with the pelt data, because the index is 1 year. Annual data have no within-year seasonal period to plot against, and I couldn’t find a solution to this.
pelt |> gg_subseries(Hare)
lag plot
The lag plots show us that there is a strong positive correlation at lag 1. That correlation diminishes with each lag, but seemingly returns a bit after lag 8.
pelt |> gg_lag(Hare, geom = "point")
ACF and ACF plot
The ACF reveals that the autocorrelation coefficient alternates between positive and negative at an interval of about 3-5 lags. I’m not entirely sure, but I think this is evidence of a cyclic effect.
time plot
In this time plot, we see a variety of behaviors depending on the group. Most of the groups don’t appear to have a trend, but most appear to have seasonality and/or a cyclic pattern.
seasonal plot
Again, we’re seeing a variety of seasonal patterns. The Safety net groups are fairly similar in that they peak around Q3/Q4, but their yearly trends are different. The Co-payments groups are more variable.
ho2 |> mutate(Cost = Cost / 1e3) |> gg_season(Cost)
subseries plot
The subseries plots support my earlier analysis. Most of the plots display seasonality, albeit in a variety of ways.
lag plot
I realized I needed to adjust the dataset so that the key was just ATC2. After running the lag plot, I can see a strong positive relationship at every lag. The correlation looks especially strong at the seasonal multiples.
`summarise()` has grouped output by 'Quarter'. You can override using the
`.groups` argument.
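The code that reduced the key to ATC2 is not shown. One possible sketch, assuming `ho2` is the `PBS` data restricted to `ATC2 == "H02"`, that costs are summed across the remaining groups, and that the data are re-indexed to quarters (all three are assumptions, and the name `ho2_total` is illustrative, so this is not necessarily what produced the message above):

```r
library(fpp3)

# Assumption: the series of interest is PBS filtered to ATC2 == "H02".
ho2_total <- PBS |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>                                # keep ATC2 as the only key
  index_by(Quarter = yearquarter(Month)) |>        # monthly -> quarterly index
  summarise(Cost = sum(Cost))                      # total cost across groups

# gg_lag() needs a single series, which one ATC2 value now provides.
ho2_total |>
  gg_lag(Cost, geom = "point")
```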
ACF and ACF plot
The ACF plot shows us that the autocorrelation coefficient rises and falls between very high and very low positive values. I think this suggests a cyclic pattern.
time plot
Based on the time plot, we can see the time series starts with an upward trend, then eventually plateaus. There also appears to be seasonality.
us_gasoline |> autoplot(Barrels)
seasonality plot
In the seasonality plot, barrels look lower in the fall/winter months and then build higher into the summer months. I decided to look at a monthly view to clear up some of the noise. This view is quite interesting, as there seems to be a consistent seasonal pattern at 2-month intervals.
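The construction of the monthly view is not shown. A minimal sketch of how a `gas_month` object could be built, assuming the weekly values are averaged within each calendar month (the aggregation choice is an assumption):

```r
library(fpp3)

# Re-index the weekly series by month; the mean is an assumed aggregate.
gas_month <- us_gasoline |>
  index_by(Month = yearmonth(as.Date(Week))) |>
  summarise(Barrels = mean(Barrels))

gas_month |> gg_season(Barrels)
```

Averaging keeps the units comparable with the weekly series. Summing would instead make months with more weeks look larger.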
subseries plot
The subseries plot shows further evidence for seasonality. February looks interesting because it has a very different pattern from the other months. It mostly rises over the span of the time series, but it has several lull years where growth in Barrels slows and then drops sharply before rising again.
gas_month |> gg_subseries(Barrels)
lag plot
us_gasoline |> gg_lag(Barrels, geom = "point")
ACF and ACF plot
The lag plot and ACF show us that there is a strong positive correlation across all the lags.