Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?
bricks <- aus_production |>
  select(Bricks)
lynx <- pelt |>
  select(Lynx)
close <- gafa_stock |>
  select(Close)
demand <- vic_elec |>
  select(Demand)

The time intervals of each series:
## # A tsibble: 6 x 2 [1Q]
## Bricks Quarter
## <dbl> <qtr>
## 1 189 1956 Q1
## 2 204 1956 Q2
## 3 208 1956 Q3
## 4 197 1956 Q4
## 5 187 1957 Q1
## 6 214 1957 Q2
## # A tsibble: 6 x 2 [1Y]
## Lynx Year
## <dbl> <dbl>
## 1 30090 1845
## 2 45150 1846
## 3 49150 1847
## 4 39520 1848
## 5 21230 1849
## 6 8420 1850
## # A tsibble: 6 x 3 [1D]
## # Key: Symbol [1]
## Close Date Symbol
## <dbl> <date> <chr>
## 1 79.0 2014-01-02 AAPL
## 2 77.3 2014-01-03 AAPL
## 3 77.7 2014-01-06 AAPL
## 4 77.1 2014-01-07 AAPL
## 5 77.6 2014-01-08 AAPL
## 6 76.6 2014-01-09 AAPL
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
## Demand Time
## <dbl> <dttm>
## 1 4383. 2012-01-01 00:00:00
## 2 4263. 2012-01-01 00:30:00
## 3 4049. 2012-01-01 01:00:00
## 4 3878. 2012-01-01 01:30:00
## 5 4036. 2012-01-01 02:00:00
## 6 3866. 2012-01-01 02:30:00
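The interval can also be queried directly instead of being read off the printed header. The following is a minimal sketch, assuming the fpp3 meta-package is installed; interval() comes from tsibble, and the object names match those defined above. Because the header for Close above comes from a six-row preview, it is worth checking the full gafa_stock series as well, since it only records trading days.

```r
library(fpp3)

# Rebuild the four series so the sketch is self-contained
bricks <- aus_production |> select(Bricks)
lynx   <- pelt |> select(Lynx)
close  <- gafa_stock |> select(Close)
demand <- vic_elec |> select(Demand)

# interval() reports the time interval stored in each tsibble
interval(bricks)
interval(lynx)
interval(close)   # check this one on the full series, not only on head()
interval(demand)
```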
Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.
autoplot(bricks, Bricks) +
  labs(title = "Quarterly Bricks Production (AUS)",
       x = "Quarter",
       y = "# of Bricks Produced (Millions)") +
  theme_minimal()

autoplot(lynx, Lynx) +
  labs(title = "Lynx Data",
       x = "Year (Annual)",
       y = "Number of Lynx Pelts Traded") +
  theme_minimal()

First, we group the data by Symbol so that we can filter Close down to
the maximum (peak) closing price for each stock. We then select the
columns we want to see: Date, Symbol, and Close.
## # A tsibble: 4 x 3 [1D]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Date Symbol Close
## <date> <chr> <dbl>
## 1 2018-10-03 AAPL 232.
## 2 2018-09-04 AMZN 2040.
## 3 2018-07-25 FB 218.
## 4 2018-07-26 GOOG 1268.
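The grouping and filtering steps described above can be sketched as follows. This is a minimal sketch, assuming the fpp3 packages are installed; peak_close is a name introduced here for illustration only and is not taken from the original code.

```r
library(fpp3)

# For each stock, keep the row(s) where Close reaches its maximum
peak_close <- gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  select(Date, Symbol, Close)

peak_close
```

Using filter() inside group_by() evaluates max(Close) separately for each Symbol, which is why one peak day per stock is returned.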
The code was provided by the textbook. We downloaded the tute1.csv file
and used view() to examine the data. Using as_tsibble(), we converted
the data to a quarterly time series by setting the index to Quarter. We
then plotted the three series as line plots using ggplot() and
geom_line(), and used facet_grid() to place each series in its own
panel.
You can read the data into R with the following script:
Convert the data to time series
Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  labs(title = "No facet_grid()") +
  geom_line()

Check what happens when you don’t include facet_grid().
First, we installed the USgas package and converted us_total to a
time series using as_tsibble(), setting the index to year
and the key to state. We then wanted to see the gas
consumption for the following states: Maine, Vermont, New Hampshire,
Massachusetts, Connecticut and Rhode Island. To do that, we used
filter() to keep those states and autoplot() to visualize the time series.
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
filtered_us_total <- us_total |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island"))

autoplot(filtered_us_total, y) +
  labs(title = "US Annual Total Natural Gas Consumption",
       x = "Year",
       y = "Total Natural Gas Consumption (MMcf)") +
  theme_minimal()

We used read_excel() from the readxl package to load the data, in order
to create a tsibble object identical to the tourism tsibble from
the tsibble package. We created the tourism tsibble object by setting the
index to Quarter, giving a time series with a quarterly interval, and setting
the key to Region, State, and
Purpose to mimic the original tourism tsibble. Then, we
wanted to find which combination of Region and
Purpose had the highest average number of overnight trips. We did
that by first grouping the time series by Region and
Purpose, then using summarize() to calculate the mean() of
Trips. Next, we filtered the time series for the maximum overnight
trips of each combination. Finally, we used arrange() to sort the output in
descending order, so that the combination with the highest average of
overnight trips appears first.
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
tourism_2 <- data |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism_2)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
tourism_2 |>
  group_by(Region, Purpose) |>
  summarize(AverageTrips = mean(Trips, na.rm = TRUE)) |>
  filter(AverageTrips == max(AverageTrips)) |>
  arrange(desc(AverageTrips)) |>
  head()
## # A tsibble: 6 x 4 [1Q]
## # Key: Region, Purpose [6]
## # Groups: Region [6]
## Region Purpose Quarter AverageTrips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
## 2 Sydney Business 2001 Q4 948.
## 3 South Coast Holiday 1998 Q1 915.
## 4 North Coast NSW Holiday 2016 Q1 906.
## 5 Brisbane Visiting 2016 Q4 796.
## 6 Gold Coast Holiday 2002 Q1 711.
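Note that summarize() on a tsibble keeps the time index, which is why the output above still has a Quarter column: the mean is taken within each quarter, and the filter then picks the peak quarter for each group. To average across the whole period instead, the index has to be dropped first. The following is a minimal sketch of that alternative, using the built-in tourism tsibble as a stand-in for tourism_2 (an assumption made so the example is self-contained); avg_trips is a name introduced here for illustration.

```r
library(fpp3)

# Drop the tsibble structure so summarise() averages over all quarters
avg_trips <- tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(AverageTrips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  arrange(desc(AverageTrips))

# The first row is the Region/Purpose combination with the highest average
head(avg_trips, 1)
```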
tourism_2 |>
  group_by(State) |>
  summarize(TotalTrips = sum(Trips)) |>
  as_tsibble(index = Quarter) |>
  head()
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter TotalTrips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Since we will be performing similar operations on multiple time series, we created a function that generates the following plots for each input series: autoplot(), gg_season(), gg_subseries(), gg_lag(), and ACF(). We use these plots to answer the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
# Produce the standard set of exploratory plots for a single series
plot_time_series <- function(data) {
  p1 <- autoplot(data) + theme_minimal()
  p2 <- gg_season(data) + theme_minimal()
  p3 <- gg_subseries(data) + theme_minimal()
  p4 <- gg_lag(data, geom = "point") + theme_minimal()
  p5 <- ACF(data) |>
    autoplot() + theme_minimal()
  print(p1)
  print(p2)
  print(p3)
  print(p4)
  print(p5)
}

# Same as above, but skips gg_season() for series with no sub-yearly period
plot_time_series_without_gg_season <- function(data) {
  p1 <- autoplot(data) + theme_minimal()
  p3 <- gg_subseries(data) + theme_minimal()
  p4 <- gg_lag(data, geom = "point") + theme_minimal()
  p5 <- ACF(data) |>
    autoplot() + theme_minimal()
  print(p1)
  print(p3)
  print(p4)
  print(p5)
}

Overall, this time series shows only a linearly increasing trend. Some may argue that there is a seasonal aspect to it, given the peaks and valleys in the autoplot() output, but in the gg_season() plot the yearly lines are almost perfectly parallel, indicating little to no seasonality. Looking at the plots, we can say that private-sector employment in the US has been increasing roughly linearly over the years. The years 2001 and 2008 appear to have affected job growth, as we see a decrease in the number of people employed in the private sector. Moreover, there is a strong positive autocorrelation at lags 1 through 9, due to the general increasing trend of the time series.
From the 1960s through the 1970s, brick production in Australia follows an increasing trend. That trend then starts to decrease in the 1980s. The plots do not suggest any cyclic behavior, but I suspected there was some quarterly seasonality. The plots support this suspicion of strong seasonality in the time series, with strong positive autocorrelation at lags 1 through 9.
This time series shows neither an increasing nor a decreasing trend, but rather a cyclic pattern. Looking at the lag plots, there is no obvious correlation with prior time intervals. Moreover, the ACF() plot supports the observation that this time series exhibits cyclic behavior.
gg_season() doesn't work on this particular time series, since a seasonal plot requires sub-yearly data and this series is annual.
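The cyclic behavior can be inspected directly from the autocorrelation function. The following is a minimal sketch, assuming this series is Hare from pelt (the exercise's annual series) and that the fpp3 packages are installed; hare_acf and the choice of lag_max = 20 are introduced here for illustration only.

```r
library(fpp3)

# Autocorrelations of the annual Hare series for lags 1 to 20
hare_acf <- pelt |>
  ACF(Hare, lag_max = 20)

# A cyclic series tends to show a wave-like pattern across the lags
hare_acf |>
  autoplot() +
  theme_minimal()
```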
The PBS time series exhibits a somewhat increasing trend along with a seasonal pattern in which cost usually peaks in December. From this time series, we learned that Australian prescription cost is at its lowest in February, a pattern that repeats every year.
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  as_tibble() |>
  select(Month, Cost) |>
  group_by(Month) |>
  summarize(Cost = sum(Cost)) |>
  as_tsibble(index = Month)

The us_gasoline time series shows an increasing trend with a seasonal pattern. In the lag plot, the points for all 52 weeks are grouped closely along the positive correlation line, indicating that US gasoline supply (Barrels) generally increases from year to year. This is supported by the ACF() plot, where all autocorrelations are above 0.50.