Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
# Load required packages
library(fpp3)
library(tsibble)
library(ggplot2)
Quarterly estimates of selected indicators of manufacturing production in Australia. aus_production is a half-hourly tsibble with six values:
Beer: Beer production in megalitres. Tobacco: Tobacco and cigarette production in tonnes. Bricks: Clay brick production in millions of bricks. Cement: Portland cement production in thousands of tonnes. Electricity: Electricity production in gigawatt hours. Gas: Gas production in petajoules.
# View the first few rows of the aus_production dataset
head(aus_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
# Get the list of dates from the 'Quarter' column.
start_date <- min(aus_production$Quarter)
end_date <- max(aus_production$Quarter)
start_date
## <yearquarter[1]>
## [1] "1956 Q1"
## # Year starts on: January
end_date
## <yearquarter[1]>
## [1] "2010 Q2"
## # Year starts on: January
The time interval for Aus_production is Quarterly and it extends from 1956 to 2010.
# Create a time series plot of Bricks production
aus_production |>
autoplot(Bricks)
In aus_production time series the frist half from mid 1900 to about 1975
there was a strong increasing trend. Lots of beers were being produced
and trend is going upward. Then we have a big decrease in productin
during mid 70s and mid 80s during Australia’s recession. From then on
there was increase and decrease in production. This is very cyclincal,
the data exhibit rises and falls that are not of a fixed frequency.
Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company. Pelt is an annual tsibble with two values:
Hare: The number of Snowshoe Hare pelts traded. Lynx: The number of Canadian Lynx pelts traded.
# View the first few rows of the Pelt dataset
head(pelt)
## # A tsibble: 6 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
# Get the list of dates from the 'Year' column.
start_date <- min(pelt$Year)
end_date <- max(pelt$Year)
start_date
## [1] 1845
end_date
## [1] 1935
The time interval for Pelt is Yearly and it extends from 1845 to 1935.
# Create a time series plot of Lynx
pelt |>
autoplot(Lynx)
In Pelt time series the pattern seems to be seasonal but is not seasonal cause annual data does not have seasonality. This time series has a smooth pattern so this makes it a cycle and a life cycle of the lynx
Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD. gafa_stock is a tsibble containing data on irregular trading days:
Open: The opening price for the stock. High: The stock’s highest trading price. Low: The stock’s lowest trading price. Close: The closing price for the stock. Adj_Close: The adjusted closing price for the stock. Volume: The amount of stock traded. Each stock is uniquely identified by one key:
Symbol: The ticker symbol for the stock.
# View the first few rows of the gafa_stock dataset
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
# Get the list of dates from the 'Year' column.
start_date <- min(gafa_stock$Date)
end_date <- max(gafa_stock$Date)
start_date
## [1] "2014-01-02"
end_date
## [1] "2018-12-31"
The time interval for gafa_stock is Yearly and it extends from 1845 to 1935.
# Plot the closing prices from the gafa_stock dataset
gafa_stock |>
autoplot(Close)
vic_elec is a half-hourly tsibble with three values:
Demand: Total electricity demand in MWh. Temperature: Temperature of Melbourne (BOM site 086071). Holiday: Indicator for if that day is a public holiday.
This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).
# View the first few rows of the vic_elec dataset
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
# Get the list of dates from the 'Year' column.
start_date <- min(vic_elec$Date)
end_date <- max(vic_elec$Date)
start_date
## [1] "2012-01-01"
end_date
## [1] "2014-12-31"
The time interval for vic_elec is half-hourly and it extends from 1/1/2012 to 12/31/2014.
# Modifying Axis and Title
vic_elec %>%
autoplot(Demand) +
labs(x = "Date", y = "Demand") +
ggtitle("Half-hourly Electricity Demand for Victoria, Australia")
In vic_elec time series is multiple seasonality. Dec., Jan., and Feb., summer month in Australia we see alot of variations and spikes in demand on very hot days, and alot of air conditioner happening. This is why we see the increase variation in the summer. Then as we move along the x axis to autum we have a lower demand thatn summer and higher in winter months, June, July, and August. Then we have spring which is fairly stable season in Victoria.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
Apple had a Max at 232.07 on 10/03/2018. Amazon had a max of 2039.51 on 9/4/2018. Facebook had a max of 217.50 on 7/25/2018. Google had a max of 1268.33 on 7/26/2018.
# Upload csv file and view the first few rows of the aus_production dataset
tute1 <- read.csv("https://raw.githubusercontent.com/enidroman/Data-624-Predictive-Analytics/main/tute1.csv")
head(tute1)
## Quarter Sales AdBudget GDP
## 1 3/1/1981 1020.2 659.2 251.8
## 2 6/1/1981 889.2 589.0 290.9
## 3 9/1/1981 795.0 512.5 290.8
## 4 12/1/1981 1003.9 614.1 292.4
## 5 3/1/1982 1057.7 647.2 279.1
## 6 6/1/1982 944.4 602.0 254.0
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
Check what happens when you don’t include facet_grid().
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y") +
ggtitle("facet_grid")
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
ggtitle("No facet_grid")
When you don’t include facet_grid(), the plot will not be divided into separate panels; instead, you will get a single plot that displays all the data points together.
When you include facet_grid(), the plot will be split into multiple panels based on the levels of the categorical variable, making it easier to compare groups visually.
#install.packages('USgas')
library(USgas)
data("us_total")
str(us_total)
## 'data.frame': 1266 obs. of 3 variables:
## $ year : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ state: chr "Alabama" "Alabama" "Alabama" "Alabama" ...
## $ y : int 324158 329134 337270 353614 332693 379343 350345 382367 353156 391093 ...
us_total <- us_total %>%
as_tibble(key = state,
index = year)
us_total %>%
filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
ggplot(aes(x = year, y = y, colour = state)) +
geom_line() +
facet_grid(state ~., scales = "free_y") +
labs(title = "Annual Natural Gas Consumption in New England",
y = "Consumption")
tourism <- read.csv("https://raw.githubusercontent.com/enidroman/Data-624-Predictive-Analytics/main/tourism.csv")
head(tourism)
## Quarter Region State Purpose Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
tourism_ts <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(key = c(Region, State, Purpose),
index = Quarter)
tourism_ts %>%
group_by(Region, Purpose) %>%
mutate(Avg_Trips = mean(Trips)) %>%
ungroup() %>%
filter(Avg_Trips == max(Avg_Trips)) %>%
distinct(Region, Purpose)
## # A tibble: 1 × 2
## Region Purpose
## <chr> <chr>
## 1 Sydney Visiting
Syndey, Australia has the maximum number of overnight trips on average for Visiting.
# Tsibble for Total Trips by State
total_trips_by_state <- tourism %>%
group_by(State) %>%
summarise(total_trips = sum(Trips)) %>%
arrange(desc(total_trips))
head(total_trips_by_state)
## # A tibble: 6 × 2
## State total_trips
## <chr> <dbl>
## 1 New South Wales 557367.
## 2 Victoria 390463.
## 3 Queensland 386643.
## 4 Western Australia 147820.
## 5 South Australia 118151.
## 6 Tasmania 54137.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
# This will produce a time series plot showing the number of people employed in the "Total Private" sector over time. The plot visualizes employment data over time.
data("us_employment")
us_employment %>%
filter(Title == "Total Private") %>%
autoplot(Employed) +
ggtitle("Auto Plot")
# This generates a seasonal decomposition plot for the Employed variable in the us_employment dataset, filtered to include only the "Total Private" sector. This plot helps visualize the seasonal patterns within the employment data.
us_employment %>% filter(Title == "Total Private") %>% gg_season(Employed) +
ggtitle("Seasonal Decompositon Plot")
# This generates a subseries plot for the Employed variable in the us_employment dataset, filtered to include only the "Total Private" sector. A subseries plot helps visualize and compare seasonal patterns across different periods, such as months or quarters.
us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed) +
ggtitle("Subseries Plot")
# This is to generate a lag plot for the "Total Private" employment data from the us_employment dataset. A lag plot is useful for detecting autocorrelation in time series data by plotting the values of a variable against lagged versions of itself.
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed) +
ggtitle("Lag Plot")
# This generates an Autocorrelation Function (ACF) plot for private-sector employment data using the us_employment dataset. An ACF plot is helpful in determining the degree of autocorrelation in a time series.
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot() +
ggtitle("Autocorrelation Function Plot")
In regards to “Total Private” Employed from us_employment, the data shows a strong upward trend with clear seasonal fluctuations. Meaning both seasonality and trend are present.There is no evidence of cyclical patterns in the series. Employment levels steadily increase over time, with stable seasonal patterns throughout the year. However, a slight rise occurs in June, which tapers off after December. Anomalies were observed in the years 2008/2010.
# This will produce a time series plot showing the number of people employed in the "Total Private" sector over time
data("us_employment")
data("aus_production")
aus_production %>%
autoplot(Bricks) +
ggtitle("Auto Plot")
# This generates a seasonal decomposition plot for the Bricks variable from the aus_production dataset. The plot visualizes seasonal patterns in the production of bricks over time.
aus_production %>%
gg_season(Bricks) +
ggtitle("Seasonal Decompositon Plot")
# This generates a subseries plot for the Bricks variable from the aus_production dataset.
aus_production %>%
gg_subseries(Bricks) +
ggtitle("Subseries Plot")
# This generates a lag plot for the Bricks variable from the aus_production dataset.
aus_production %>%
gg_lag(Bricks) +
ggtitle("Lag Plot")
# This generates an Autocorrelation Function (ACF) plot for the Bricks variable from the aus_production dataset.
aus_production %>%
ACF(Bricks) %>%
autoplot() +
ggtitle("Autocorrelation Function Plot")
In regards to Bricks from aus_production, the data reveals significant seasonality each year and shows clear cyclic behavior with a 40-year period. No noticeable trend is present over this timeframe. Brick production rises from the first to the third quarter, then declines in the fourth quarter. A marked drop in brick production is evident around 1983.
# This generates a time series plot for the Hare variable from the pelt dataset.
data("pelt")
pelt %>%
autoplot(Hare) +
ggtitle("Auto Plot")
For gg_season: The data must contain at least one observation per seasonal period.
#pelt %>%
#gg_season(Hare)
#ggtitle("Seasonal Decomposition")
# This generates a subseries plot for the Hare variable from the pelt dataset.
pelt %>%
gg_subseries(Hare)+
ggtitle("Subseries Plot")
# This generates a lag plot for the Hare variable from the pelt dataset.
pelt %>%
gg_lag(Hare) +
ggtitle("Lag Plot")
# This generates an Autocorrelation Function (ACF) plot for the Hare variable from the pelt dataset.
pelt %>%
ACF(Hare) %>%
autoplot() +
ggtitle("Autocorrelation Function Plot")
In regards to Hare from pelt, the data demonstrates cyclical behavior with no apparent trend or clear seasonality. The ACF plot indicates a recurring cycle approximately every ten years. The trend shows notable fluctuations over time. There is a significant drop before the 1860s. A marked increase occurs after 1860. There is another drop after the mid-1860s. An increase is observed before 1870. These fluctuations appear to follow a pattern of varying significantly approximately every 5 years. The significant drops and increases at these points, especially around the 1860s, can be considered unusual or noteworthy.
Please note for Seasonal Decomposition Plot, the data must contain at least one observation per seasonal period. In which it lacks. For a Seasonal Decomposition Plot to be effective, the dataset needs to include at least one data point for each season or period within the cycle. However, the current dataset lacks this requirement, meaning there are not enough observations to properly capture or analyze the seasonal patterns. Without sufficient data for each season, the decomposition may not yield accurate or meaningful results.
# This generates a time series plot for the Cost variable within the H02 category from the PBS dataset. This plot visualizes how the cost associated with the H02 category changes over time.
h_02 <- PBS %>% filter(ATC2 == "H02")
h_02 %>% autoplot(Cost) +
ggtitle("Auto Plot")
# This generates a seasonal decomposition plot for the Cost variable within the H02 category from the h_02 dataset.
h_02 %>% gg_season(Cost) +
ggtitle("Seasonal Decompositon Plot")
# This generates a subseries plot for the Cost variable within the H02 category from the h_02 dataset.
h_02 %>% gg_subseries(Cost) +
ggtitle("Subseries Plot")
The data provided to contains more than one time series. Please filter a single time series to use `gg_lag()
#h_02 %>% gg_lag(Cost) +
#ggtitle("Lag Plot")
# This generates an Autocorrelation Function (ACF) plot for the Cost variable within the H02 category from the h_02 dataset.
h_02 %>%
ACF(Cost) %>%
autoplot() +
ggtitle("Autocorrelation Function Plot")
In regards to “H02” Cost from PBS, The data displays strong annual seasonality and distinct cyclic behavior, but no noticeable trend.From 1995 to 2005, each month shows an upward fluctuation for Concessional Co-Payments, General Co-Payments, and Concessional/Safety Co-Payments. However, General Safety Co-Payments experienced a significant drop in the first few months of 1995, followed by fluctuations until 2005. This period, especially the drop in early 1995 and the fluctuations throughout the rest of the decade, can be considered unusual or noteworthy.
Please note in regards to Lag Plot, the data provided needs to contains more than one time series. Please filter a single time series to use `gg_lag(). In which this data lacks. A Lag Plot is typically used to identify patterns or relationships within a single time series by plotting values at different time lags. However, the provided data contains more than one time series, which is not suitable for this type of analysis. To use the gg_lag() function, a single time series needs to be selected or filtered from the dataset. Since the current data lacks a singular time series, it cannot be used directly for creating a lag plot until a single time series is isolated.
# This generates a time series plot for the us_gasoline dataset.
data("us_gasoline")
us_gasoline %>%
autoplot() +
ggtitle("Autoplot")
# This generates a seasonal decomposition plot for the us_gasoline dataset.
us_gasoline %>%
gg_season() +
ggtitle("Seasonal Decomposition Plot")
# This generates a subseries plot for the us_gasoline dataset.
us_gasoline %>%
gg_subseries()+
ggtitle("Subseries Plot")
# This generates a lag plot for the us_gasoline dataset.
us_gasoline %>%
gg_lag() +
ggtitle("Lag Plot")
# This generates a lag plot for the us_gasoline dataset.
us_gasoline %>%
ACF() %>%
autoplot() +
ggtitle("Autocorrelation Function Plot")
In regards to Barrels from us_gasoline, There is no trend, seasonality, or cyclic behavior in the data. It shows random fluctuations that seem unpredictable, with no clear patterns to support the creation of a forecasting model.From 1995 to 2015, the data shows an increase in fluctuation over time. There is a slight downward trend from 2008 to after 2009, followed by an increase again. These periods of increased fluctuation and the specific downturn from 2008 to after 2009 could be considered unusual or noteworthy years.