Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
help(aus_production)
help(pelt)
help(gafa_stock)
help(vic_elec)
#glimpse(aus_production)
autoplot(aus_production, Bricks) + ggtitle("Bricks Production Per Quarter")
Quarterly
#glimpse(pelt)
autoplot(pelt, Lynx) + ggtitle("Lynx Pelts Traded Per Year")
Yearly
#glimpse(gafa_stock)
autoplot(gafa_stock, Close) + ggtitle("GAFA Stock Close Prices Daily")
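Daily, but only on trading days. Weekends and holidays are missing, so the tsibble reports an irregular interval ([!]).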
#glimpse(vic_elec)
autoplot(vic_elec, Demand) +
  ggtitle("Electricity Demand Per Half Hour") +
  labs(x = "Time (half hour)", y = "Elec Demand (Megawatts)")
Half-hourly
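The intervals read off the help pages can also be checked programmatically. The sketch below assumes the four datasets come from the tsibbledata package and uses tsibble's interval() and is_regular(); the list name series is only for illustration.

```r
library(tsibble)
library(tsibbledata)  # assumed source of aus_production, pelt, gafa_stock, vic_elec

# Collect the four series in a named list (the name `series` is illustrative)
series <- list(
  aus_production = aus_production,
  pelt = pelt,
  gafa_stock = gafa_stock,
  vic_elec = vic_elec
)

# is_regular() is FALSE when the observations are not evenly spaced
regular <- vapply(series, is_regular, logical(1))
print(regular)

# interval() reports the spacing tsibble inferred for each series
for (name in names(series)) {
  cat(name, ":\n")
  print(interval(series[[name]]))
}
```

A series that is not regular, such as one recorded only on trading days, is the case where the tsibble header shows [!] instead of a fixed interval.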
peak_closing_prices <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close, na.rm = TRUE)) %>%
  select(Symbol, Date, Close)
peak_closing_prices
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
Peak closing prices: AAPL about 232 on 2018-10-03, AMZN about 2040 on 2018-09-04, FB about 218 on 2018-07-25, and GOOG about 1268 on 2018-07-26.
tute1 <- readr::read_csv("/Users/aelsaeyed/Downloads/tute1.csv")
View(tute1)
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
facet_grid() draws each series in its own panel, and scales = "free_y" gives every panel its own y-axis. Without it, all three lines are drawn on a single plot with a shared y-axis.
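To see the difference concretely, here is a small sketch using made-up data in place of tute1.csv. The data frame toy and its columns A, B and C are hypothetical stand-ins chosen so the three series sit on very different scales.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for tute1.csv: three series on very different scales
toy <- data.frame(
  Quarter = 1:8,
  A = c(100, 102, 101, 105, 107, 106, 110, 112),
  B = c(5, 6, 5, 7, 6, 8, 7, 9),
  C = c(0.2, 0.3, 0.25, 0.4, 0.35, 0.5, 0.45, 0.6)
)
toy_long <- pivot_longer(toy, -Quarter)

# One panel: all three lines share a y-axis, so B and C look almost flat
p_single <- ggplot(toy_long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()

# One panel per series, each with its own y-axis
p_facet <- p_single + facet_grid(name ~ ., scales = "free_y")

p_single
p_facet
```

In the single-panel version the largest series dominates the axis, which is the main reason to facet when the series have different units or magnitudes.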
Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
us_gas_tsibble <- as_tsibble(us_total, key = state, index = year)
new_england_states <- c("Maine", "Vermont", "New Hampshire",
                        "Massachusetts", "Connecticut", "Rhode Island")
new_england_gas <- us_gas_tsibble %>%
  filter(state %in% new_england_states)
autoplot(new_england_gas, y) +
  facet_grid(state ~ .) +
  ggtitle("Annual Natural Gas Consumption in New England States") +
  labs(x = "Year", y = "Total Consumption (Million Cubic Feet)")
tourism <- readxl::read_excel("/Users/aelsaeyed/Downloads/tourism.xlsx")
View(tourism)
help(tourism)
tourism <- tourism %>%
  mutate(Quarter = yearquarter(Quarter)) # Quarter is a string in the downloaded file; convert it to yearquarter
tourism_tsibble <- tourism %>%
  as_tsibble(key = c(Region, Purpose), index = Quarter)
max_average_trips <- tourism_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE)) %>%
  arrange(desc(avg_trips)) %>%
  slice(1)
max_average_trips
## # A tsibble: 76 x 4 [1Q]
## # Key: Region, Purpose [76]
## # Groups: Region [76]
## Region Purpose Quarter avg_trips
## <chr> <chr> <qtr> <dbl>
## 1 Adelaide Visiting 2017 Q1 270.
## 2 Adelaide Hills Visiting 2002 Q4 81.1
## 3 Alice Springs Holiday 1998 Q3 76.5
## 4 Australia's Coral Coast Holiday 2014 Q3 198.
## 5 Australia's Golden Outback Business 2017 Q3 174.
## 6 Australia's North West Business 2016 Q3 297.
## 7 Australia's South West Holiday 2016 Q1 612.
## 8 Ballarat Visiting 2004 Q1 103.
## 9 Barkly Holiday 1998 Q3 37.9
## 10 Barossa Holiday 2006 Q1 51.0
## # ℹ 66 more rows
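This output does not actually answer the question. Because summarise() on a tsibble keeps the Quarter index, avg_trips is computed within each Region, Purpose and Quarter, so it is just the quarterly Trips value, and slice(1) then returns the single busiest quarter for each of the 76 regions. “Visiting” Adelaide is simply the first row alphabetically, not the maximum. To find the combination with the highest average, the data should be converted with as_tibble() before grouping so that the quarters are averaged out, and then the row with the largest avg_trips selected.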
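A minimal sketch of averaging across quarters, rather than within each quarter, is given here. It assumes the tourism dataset bundled with the tsibble package has the same columns as the downloaded file (Quarter, Region, Purpose, Trips); the object names avg_by_combo and top_combo are illustrative.

```r
library(dplyr)
library(tsibble)

# Assumption: tsibble::tourism matches the columns of the downloaded tourism.xlsx
avg_by_combo <- tsibble::tourism %>%
  as_tibble() %>%                      # drop the time index so quarters are collapsed
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop")

# Single Region/Purpose combination with the highest average trips per quarter
top_combo <- avg_by_combo %>%
  slice_max(avg_trips, n = 1, with_ties = FALSE)
top_combo
```

Dropping the index first matters because a tsibble always summarises within each time point, so the grouped mean would otherwise reproduce the quarterly values.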
state_total_trips_tsibble <- tourism_tsibble %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips, na.rm = TRUE)) %>%
  as_tsibble(key = State, index = Quarter)
state_total_trips_tsibble
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter total_trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
#glimpse(us_employment)
total_private_employment <- us_employment %>%
  filter(Title == "Total Private")
autoplot(total_private_employment, Employed) +
  ggtitle("Total Monthly Private Employment") +
  labs(x = "Month", y = "Number Employed")
gg_season(total_private_employment, Employed) +
  ggtitle("Seasonal Plot of Total Private Employment")
gg_subseries(total_private_employment, Employed) +
  ggtitle("Subseries Plot of Total Private Employment")
gg_lag(total_private_employment, Employed) +
  ggtitle("Lag Plot of Total Private Employment")
ACF(total_private_employment, Employed) %>%
  autoplot() +
  ggtitle("ACF of Total Private Employment")
These graphs show a strong upward trend in total private employment. Employment rises in most years, but the trend is interrupted by dips roughly once a decade: in the 1940s, 1950s, 1970s, 1980s, 1990s and 2000s. Within each year employment also rises and falls, which indicates a seasonal pattern. The longer dips that recur every decade or so are better described as cyclic behaviour than as seasonality.
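One way to look for unusual years is to average out the within-year swing and see where the annual level falls. The sketch below assumes fpp3 is loaded and uses tsibble's index_by(); the object names annual and falling_years are illustrative.

```r
library(fpp3)

total_private <- us_employment %>%
  filter(Title == "Total Private")

# Average over each calendar year to remove the within-year seasonal swing
annual <- total_private %>%
  index_by(Year = year(Month)) %>%
  summarise(Employed = mean(Employed, na.rm = TRUE))

# Years in which the annual average fell relative to the previous year
falling_years <- annual %>%
  as_tibble() %>%
  filter(Employed < lag(Employed))
falling_years
```

The years listed in falling_years are candidates for the dips visible in the time plot. This is only a rough screen, since the first and last years in the data may be incomplete.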
# Autoplot
autoplot(aus_production, Bricks) + ggtitle("Bricks Production Over Time")
# Seasonality
gg_season(aus_production, Bricks)
# Subseries plot
gg_subseries(aus_production, Bricks)
# Lag plot
gg_lag(aus_production, Bricks)
# Autocorrelation
ACF(aus_production, Bricks) %>% autoplot()
These graphs show that brick production trended upward from the 1960s onward and peaked in the early 1980s, after which it started to fall. There are also dips roughly mid-decade, which look cyclic rather than seasonal. Within each year production is seasonal: Q1 is markedly lower than Q2, Q3 is the peak, and Q4 falls back slightly.
# Autoplot
autoplot(pelt, Hare) + ggtitle("Hare Pelts Traded Over Time")
# Subseries plot
gg_subseries(pelt, Hare)
# Lag plot
gg_lag(pelt, Hare)
# Autocorrelation
ACF(pelt, Hare) %>% autoplot()
#gg_season(pelt, Hare)
When trying to run gg_season(), I got the error: “The data must contain at least one observation per seasonal period.” The pelt data are annual, so there is no within-year seasonal period to plot. The problem is that the data are not granular enough, not that observations are missing.
Overall, pelt trading rises to a spike roughly once a decade and then falls quickly, with particularly big spikes in the mid-1860s and mid-1880s.
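The length of that cycle can be estimated roughly from the autocorrelations. This is a sketch only: it assumes that ACF() from feasts returns one row per lag starting at lag 1, so the row number equals the lag in years, and the cutoff of 5 lags is an arbitrary choice to skip the short-lag correlations.

```r
library(fpp3)

hare_acf <- pelt %>%
  ACF(Hare, lag_max = 20) %>%
  as_tibble() %>%
  mutate(lag_years = row_number())  # assumption: rows are lags 1, 2, ..., 20

# Skip the first few lags, which are high simply because adjacent years are similar
peak <- hare_acf %>%
  filter(lag_years > 5) %>%
  slice_max(acf, n = 1, with_ties = FALSE)
peak
```

The lag in peak is the spacing at which the series is most similar to itself again, which gives an approximate cycle length in years.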
pbs_ho2_only <- PBS %>%
  filter(ATC2 == "H02")
#glimpse(pbs_ho2_only)
# Autoplot
autoplot(pbs_ho2_only, Cost) + ggtitle("H02 Cost Over Time")
# Seasonality
gg_season(pbs_ho2_only, Cost)
# Subseries plot
gg_subseries(pbs_ho2_only, Cost)
# Autocorrelation
ACF(pbs_ho2_only, Cost) %>% autoplot()
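I originally filtered only the ATC2 column for “H02”, but the result still contains several series because the Concession and Type columns each take more than one value. The ACF plot above made this visible, since it is drawn separately for each combination of Concession and Type. gg_lag() needs a single series, so I filtered on those two columns before drawing the lag plot.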
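A quick way to see how many series remain after filtering on ATC2 is to list the distinct key combinations. The sketch assumes fpp3 is loaded; h02 and h02_series are illustrative names.

```r
library(fpp3)

h02 <- PBS %>%
  filter(ATC2 == "H02")

# Distinct Concession/Type combinations, one row per remaining series
h02_series <- h02 %>%
  as_tibble() %>%
  distinct(Concession, Type)
h02_series

# Number of separate series (keys) in the filtered tsibble
n_keys(h02)
```

If n_keys() returns more than 1, functions that expect a single series need a further filter() on the key columns.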
pbs_ho2_concession_type <- pbs_ho2_only %>%
  filter(Concession == "Concessional") %>%
  filter(Type == "Co-payments")
#glimpse(pbs_ho2_concession_type)
# Lag plot
gg_lag(pbs_ho2_concession_type, Cost)
There is a general upward trend in all of the H02 series. The safety net series are lower early in the year and higher late in the year. Concessional co-payments show the opposite seasonal pattern, and general co-payments show no easily identifiable seasonal pattern.
# Autoplot
autoplot(us_gasoline, Barrels) + ggtitle("Barrels of Gasoline Over Time")
# Seasonality
gg_season(us_gasoline, Barrels)
# Subseries plot
gg_subseries(us_gasoline, Barrels)
# Lag plot
gg_lag(us_gasoline, Barrels)
# Autocorrelation
ACF(us_gasoline, Barrels) %>% autoplot()
The general trend in gasoline supplied is upward, and there is some seasonality: the series rises and falls within every year. The trend also plateaus in the years leading up to 2009, after which it dips.