knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
setwd("C:/Users/tangl/Desktop/241225")

##########################################################
# Question 1

# Load the necessary library
library(fpp2)
options(width = 50)

# Part 1a: Read the help pages, print the series, and plot them
?gold
?woolyrnq
?gas

gold
woolyrnq
gas

autoplot(gold) + ggtitle("Gold Prices")
autoplot(woolyrnq) + ggtitle("Wool Yarn Production")
autoplot(gas) + ggtitle("Gas Production")

# Part 1b: Find the Frequency of Each Series
frequency(gold)      # 1: daily data stored without a seasonal period
frequency(woolyrnq)  # 4: quarterly data
frequency(gas)       # 12: monthly data
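
# Illustration (not part of the exercise): a minimal sketch of what
# frequency() reports, using a made-up numeric vector rather than the fpp2
# data. The `frequency` argument of ts() sets the number of observations per
# seasonal period, and cycle() shows each observation's position within it.
toy_values <- 1:8                                   # hypothetical data
toy_quarterly <- ts(toy_values, start = c(2000, 1), frequency = 4)
toy_monthly <- ts(toy_values, start = c(2000, 1), frequency = 12)
frequency(toy_quarterly)
frequency(toy_monthly)
cycle(toy_quarterly)   # position of each observation within the year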

# Part 1c: Use which.max() to Find the Outlier in the gold Series
# Index of the observation with the maximum value (the outlier)
outlier_index <- which.max(gold)
outlier_index

# Value of the series at that index
outlier_value <- gold[outlier_index]
outlier_value

cat("The outlier is at index:", outlier_index, "with value:", outlier_value, "\n")
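
# Illustration (not part of the exercise): a small sketch, on a made-up
# series, of two things the step above relies on. which.max() skips missing
# values, and time() translates an index into the series' own time scale.
toy_series <- ts(c(10, NA, 42, 7, 15), start = 2001)  # hypothetical annual data
toy_peak <- which.max(toy_series)                     # position of the largest non-NA value
toy_peak
time(toy_series)[toy_peak]                            # time stamp of that observation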

# Question 2
# Part 2a: Read the Data into R
# Read the CSV file into R
tute1 <- read.csv("tute1.csv", header = TRUE)
# View the dataset to understand its structure
View(tute1)
str(tute1)

# Part 2b: Convert the Data to a Time Series
# Drop the first column (assumed to contain the quarter labels) and convert
# the remaining columns to a quarterly time series starting in 1981 Q1
mytimeseries <- ts(tute1[, -1], start = 1981, frequency = 4)
mytimeseries
str(mytimeseries)
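
# Illustration (not part of the exercise): a minimal sketch of the same
# conversion on a made-up data frame, in case tute1.csv is not at hand. The
# column names and values below are hypothetical.
toy_df <- data.frame(
  Quarter = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales   = c(100, 110, 120, 130, 105, 115),
  AdSpend = c(50, 55, 60, 65, 52, 57)
)
toy_ts <- ts(toy_df[, -1], start = 1981, frequency = 4)  # drop the label column
start(toy_ts)      # first period as c(year, quarter)
end(toy_ts)        # last period as c(year, quarter)
colnames(toy_ts)   # one series per remaining column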

# Part 2c: Construct Time Series Plots
# Plot the time series with a separate facet for each variable
autoplot(mytimeseries, facets = TRUE) + ggtitle("Time Series Plots with Facets")
# Plot all variables together in a single panel
autoplot(mytimeseries, facets = FALSE) + ggtitle("Combined Time Series Plot")

# Question 3
# Part 3a: Read the Data into R
# Install and load the readxl package if not already installed
# install.packages("readxl")
library(readxl)

retaildata <- read_excel("retail.xlsx", skip = 1)
# View the dataset structure
View(retaildata)
str(retaildata)

# Part 3b: Select a Time Series
# Replace "A3349873A" with the actual column name you wish to analyze
# Convert the selected column into a monthly time series starting in April 1982
myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
myts

# Part 3c: Explore the Retail Time Series
# Plot the time series
autoplot(myts) + ggtitle("Time Series Plot")

ggseasonplot(myts) + ggtitle("Seasonal Plot")

ggsubseriesplot(myts) + ggtitle("Subseries Plot")
## This subseries plot suggests that any forecasting model should account for
## both the trend (growth over time) and seasonality (repeated monthly patterns).
## Months like November and December might require special attention due to
## their consistently higher values, possibly indicating a holiday or
## event-related surge.

gglagplot(myts) + ggtitle("Lag Plot")
## High Autocorrelation: The strong relationships across all lags indicate that
##   the time series is suitable for models that leverage lagged values, such as
##   ARIMA or SARIMA.
## Seasonality: The clear pattern in lag 12 confirms the presence of
##   seasonality, which should be explicitly modeled in forecasting.
## Trend and Cyclicity: The high correlation across multiple lags also suggests
##   the presence of an underlying trend or cyclic component.

ggAcf(myts) + ggtitle("Autocorrelation Function (ACF)")
## How to read the ACF plot:
## Y-axis (ACF Values): The correlation of the time series with its lagged
##   values, ranging from -1 to 1.
## X-axis (Lags): The number of lagged time periods. For monthly data, lags
##   represent months (e.g., Lag 12 corresponds to one year).
## Significant Lags: Bars that extend beyond the dashed lines (confidence
##   bounds) represent statistically significant correlations.
## Seasonality: Spikes at regular intervals (e.g., Lag 12, 24, 36) indicate
##   seasonality in the data.
## Trend: A slow decay in the ACF (bars gradually decreasing over many lags)
##   suggests a trend in the data.

## What the ACF plot shows for this series:
## High Autocorrelation at Small Lags: The ACF values are close to 1 for small
##   lags (1-6). This suggests a strong correlation between consecutive
##   observations, which is typical for time series with a trend.
## Gradual Decline: The slow decay of ACF values as the lag increases indicates
##   a trend. The series is therefore not stationary and may require
##   differencing to remove the trend before modeling.
## Seasonal Pattern: Spikes at Lag 12 and multiples of 12 (e.g., Lag 24)
##   suggest a seasonal cycle. This is consistent with monthly data, where
##   seasonality repeats annually.
## Statistical Significance: Bars that extend beyond the blue dashed lines are
##   statistically significant. Many bars remain significant even at higher
##   lags, which supports the presence of long-term dependencies in the data.
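
# Illustration (not part of the exercise): a rough sketch of how the ACF
# values and the dashed significance bounds can be reproduced with base R. It
# uses the built-in AirPassengers series as a stand-in for `myts`, and the
# bounds assume the usual large-sample approximation of +/- 1.96 / sqrt(T).
acf_demo <- acf(AirPassengers, lag.max = 24, plot = FALSE)
acf_values <- acf_demo$acf[-1]                     # drop lag 0, which is always 1
acf_bound <- qnorm(0.975) / sqrt(length(AirPassengers))
round(acf_values[c(1, 12, 24)], 2)                 # lags 1, 12, and 24
sum(abs(acf_values) > acf_bound)                   # number of lags outside the bounds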

# Question 4
# Step 1: Use help() to Find Details About Each Time Series
?bicoal
?chicken
?dole
?usdeaths
?lynx
?goog
?writing
?fancy
?a10
?h02

# Step 2: Create Time Plots for Each Time Series
# Load the necessary libraries (these series ship with the fpp2 package)
library(ggplot2)
library(fpp2)

autoplot(bicoal) + ggtitle("Bicoal Time Series")
autoplot(chicken) + ggtitle("Chicken Time Series")
autoplot(dole) + ggtitle("Dole Time Series")
autoplot(usdeaths) + ggtitle("US Deaths Time Series")
autoplot(lynx) + ggtitle("Lynx Time Series")
autoplot(goog) + ggtitle("Google Stock Prices Time Series")
autoplot(writing) + ggtitle("Writing Time Series")
autoplot(fancy) + ggtitle("Fancy Time Series")
autoplot(a10) + ggtitle("A10 Time Series")
autoplot(h02) + ggtitle("H02 Time Series")

# Step 3: Modify the Axis Labels and Title for the goog Plot
# goog is indexed by consecutive trading day, not by calendar year, so the
# x-axis is labelled accordingly
autoplot(goog) +
  ggtitle("Google Stock Prices") +
  xlab("Trading Day") +
  ylab("Price (USD)")

# Question 5
# Load the necessary libraries
library(ggplot2)
library(fpp2)

?writing

ggseasonplot(writing) +
  ggtitle("Seasonal Plot: Writing Time Series") + ylab("Values") + xlab("Month")

ggsubseriesplot(writing) +
  ggtitle("Subseries Plot: Writing Time Series") + ylab("Values") + xlab("Month")

ggseasonplot(fancy) +
  ggtitle("Seasonal Plot: Fancy Time Series") + ylab("Values") + xlab("Month")

ggsubseriesplot(fancy) +
  ggtitle("Subseries Plot: Fancy Time Series") + ylab("Values") + xlab("Month")

ggseasonplot(a10) +
  ggtitle("Seasonal Plot: A10 Time Series") + ylab("Values") + xlab("Month")

ggsubseriesplot(a10) +
  ggtitle("Subseries Plot: A10 Time Series") + ylab("Values") + xlab("Month")

ggseasonplot(h02) +
  ggtitle("Seasonal Plot: H02 Time Series") + ylab("Values") + xlab("Month")

ggsubseriesplot(h02) +
  ggtitle("Subseries Plot: H02 Time Series") + ylab("Values") + xlab("Month")

## Seasonal Patterns: The seasonal plot (ggseasonplot) shows how values vary by
##   month across years. Look for consistent peaks and troughs at specific
##   months to identify seasonal patterns.
### The subseries plot (ggsubseriesplot) groups all values by month and marks
###   the average for each month, which highlights seasonal patterns more
###   clearly.
## Unusual Years: Look for years in the seasonal plot where the pattern
##   deviates significantly from the others.
### For example, if one year has unusually high or low values during certain
###   months, that may indicate an anomaly or special event.
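
# Illustration (not part of the exercise): a small sketch of the quantity that
# the horizontal lines in a subseries plot summarize, namely the mean of each
# month across years. It uses the built-in AirPassengers series as a stand-in
# for the fpp2 series above.
monthly_means <- tapply(AirPassengers, cycle(AirPassengers), mean)
round(monthly_means, 1)
names(which.max(monthly_means))   # month number with the highest average
names(which.min(monthly_means))   # month number with the lowest average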

# Question 6
# Load the necessary libraries
library(ggplot2)
library(fpp2)

autoplot(hsales) + ggtitle("Autoplot: Hsales Time Series")
ggseasonplot(hsales) + ggtitle("Seasonal Plot: Hsales Time Series")
ggsubseriesplot(hsales) + ggtitle("Subseries Plot: Hsales Time Series")
gglagplot(hsales) + ggtitle("Lag Plot: Hsales Time Series")
ggAcf(hsales) + ggtitle("Autocorrelation Function: Hsales Time Series")

autoplot(usdeaths) + ggtitle("Autoplot: US Deaths Time Series")
ggseasonplot(usdeaths) + ggtitle("Seasonal Plot: US Deaths Time Series")
ggsubseriesplot(usdeaths) + ggtitle("Subseries Plot: US Deaths Time Series")
gglagplot(usdeaths) + ggtitle("Lag Plot: US Deaths Time Series")
ggAcf(usdeaths) + ggtitle("Autocorrelation Function: US Deaths Time Series")

autoplot(bricksq) + ggtitle("Autoplot: Bricksq Time Series")
ggseasonplot(bricksq) + ggtitle("Seasonal Plot: Bricksq Time Series")
ggsubseriesplot(bricksq) + ggtitle("Subseries Plot: Bricksq Time Series")
gglagplot(bricksq) + ggtitle("Lag Plot: Bricksq Time Series")
ggAcf(bricksq) + ggtitle("Autocorrelation Function: Bricksq Time Series")

autoplot(sunspotarea) + ggtitle("Autoplot: Sunspot Area Time Series")
# sunspotarea is annual (frequency 1), so it has no seasonal period; the
# seasonal and subseries plots stop with an error and are left commented out
# ggseasonplot(sunspotarea) + ggtitle("Seasonal Plot: Sunspot Area Time Series")
# ggsubseriesplot(sunspotarea) + ggtitle("Subseries Plot: Sunspot Area Time Series")
gglagplot(sunspotarea) + ggtitle("Lag Plot: Sunspot Area Time Series")
ggAcf(sunspotarea) + ggtitle("Autocorrelation Function: Sunspot Area Time Series")

autoplot(gasoline) + ggtitle("Autoplot: Gasoline Time Series")
ggseasonplot(gasoline) + ggtitle("Seasonal Plot: Gasoline Time Series")
# gasoline is weekly with a non-integer frequency (about 52.18), which the
# subseries plot cannot handle, so this call is left commented out
# ggsubseriesplot(gasoline) + ggtitle("Subseries Plot: Gasoline Time Series")
gglagplot(gasoline) + ggtitle("Lag Plot: Gasoline Time Series")
ggAcf(gasoline) + ggtitle("Autocorrelation Function: Gasoline Time Series")

## Seasonality: Look for recurring patterns in the seasonal plot (ggseasonplot)
##   and subseries plot (ggsubseriesplot).
### If patterns repeat at regular intervals, this indicates seasonality.
## Cyclicity: Use the autoplot to identify long-term cycles beyond seasonal
##   patterns.
## Trend: Check the autoplot and subseries plot for overall increasing or
##   decreasing trends.
## Autocorrelation: Use the ACF plot (ggAcf) to observe how values at different
##   lags are correlated. Significant spikes suggest meaningful relationships
##   at those lags.
## Lag Structure: The lag plot (gglagplot) helps visualize the relationship
##   between each observation and its lagged values and confirms the presence
##   of autocorrelation.
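
# Illustration (not part of the exercise): a minimal sketch of what one panel
# of a lag plot measures, i.e. the relationship between y[t] and y[t - k].
# `lag_cor` is a hypothetical helper, and the built-in AirPassengers series
# stands in for the series above. The plain Pearson correlation computed here
# is close to, but not identical to, the value that an ACF plot reports.
lag_cor <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])   # pair each value with the one k steps earlier
}
sapply(c(1, 6, 12), function(k) lag_cor(AirPassengers, k))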

# Question 7
# Load the necessary library
library(fpp2)

autoplot(arrivals) +
  ggtitle("Quarterly International Arrivals to Australia") +
  ylab("Arrivals (in thousands)") + xlab("Year")

ggseasonplot(arrivals[, "Japan"]) +
  ggtitle("Seasonal Plot: Arrivals from Japan") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggseasonplot(arrivals[, "NZ"]) +
  ggtitle("Seasonal Plot: Arrivals from New Zealand") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggseasonplot(arrivals[, "UK"]) +
  ggtitle("Seasonal Plot: Arrivals from the UK") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggseasonplot(arrivals[, "US"]) +
  ggtitle("Seasonal Plot: Arrivals from the US") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggsubseriesplot(arrivals[, "Japan"]) +
  ggtitle("Subseries Plot: Arrivals from Japan") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggsubseriesplot(arrivals[, "NZ"]) +
  ggtitle("Subseries Plot: Arrivals from New Zealand") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggsubseriesplot(arrivals[, "UK"]) +
  ggtitle("Subseries Plot: Arrivals from the UK") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

ggsubseriesplot(arrivals[, "US"]) +
  ggtitle("Subseries Plot: Arrivals from the US") +
  ylab("Arrivals (in thousands)") + xlab("Quarter")

## Seasonality: Use the seasonal plots to compare patterns across quarters for
##   each country. Look for peaks and troughs to identify high and low travel
##   seasons.
### For example, increased arrivals in specific quarters may indicate
###   holiday-related travel.
## Trend: The autoplot shows the long-term trend for arrivals from each
##   country. Check for increasing or decreasing trends over the years.
## Unusual Observations: Look for outliers or deviations in specific years or
##   quarters in both the seasonal and subseries plots.
### Significant dips or spikes could indicate events like policy changes,
###   economic shifts, or global crises. The series ends in 2012, so only
###   events up to that point can appear in these plots.

# Question 8
## Observations:
## Time Plot 1 (Daily Temperature of Cow): This shows no clear trend or
##   seasonality. The data appears random or stationary with short-term
##   dependencies.
### ACF Match: Likely Plot C, as it shows a rapid drop-off in autocorrelation,
###   consistent with random/stationary data.

### ACF Match: Likely Plot B, as it shows periodic spikes indicating strong
###   seasonality.

# Question 9
# Load the necessary library
library(fpp2)

# Keep the observations from 1990 onwards
mypigs <- window(pigs, start = 1990)

autoplot(mypigs) +
  ggtitle("Monthly Total Number of Pigs Slaughtered (1990-1995)") +
  ylab("Number of Pigs") + xlab("Year")

ggAcf(mypigs) +
  ggtitle("ACF of Monthly Total Number of Pigs Slaughtered (1990-1995)") +
  ylab("ACF")

## Visualize the Data: The autoplot() function gives a time plot to observe
##   trends, seasonality, and potential irregularities in the series.
## Autocorrelation Function (ACF): The ggAcf() function displays the
##   autocorrelation of the series.
## Compare the ACF plot to white noise characteristics:
### White noise typically has no significant autocorrelation at any lag.
### If the ACF shows significant spikes at specific lags, the series is not
###   white noise.
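
# Illustration (not part of the exercise): a rough sketch of a numerical
# complement to eyeballing the ACF. For white noise, roughly 95% of the sample
# autocorrelations should fall within +/- 1.96 / sqrt(T), and a Ljung-Box
# test (Box.test() in base R) should not reject. The simulated series below
# is made up; it is not the pigs data.
set.seed(123)
wn <- ts(rnorm(200))                               # simulated white noise
wn_acf <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]
wn_bound <- qnorm(0.975) / sqrt(length(wn))
mean(abs(wn_acf) <= wn_bound)                      # share of lags inside the bounds
wn_test <- Box.test(wn, lag = 10, type = "Ljung-Box")
wn_test$p.value                                    # a large p-value is consistent with white noise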

# Question 10
# Compute daily changes in the Dow Jones Index
ddj <- diff(dj)

autoplot(ddj) +
  ggtitle("Daily Changes in the Dow Jones Index") +
  ylab("Change") + xlab("Day")

ggAcf(ddj) +
  ggtitle("ACF of Daily Changes in the Dow Jones Index") +
  ylab("ACF")

## Differencing: The diff() function calculates the daily changes in the index
##   by subtracting the previous day's value from each value (y[t] - y[t-1]).
##   This helps remove any trend in the data.
## Plot Analysis: Use autoplot() to visualize the daily changes. For white
##   noise, the plot should show random fluctuations with no discernible
##   pattern.
## ACF Analysis: Use ggAcf() to examine the autocorrelation of the changes.
##   White noise should have no significant autocorrelations (values close to
##   0) at any lag.

## Significance of Autocorrelation: Most of the spikes in the ACF plot are
##   within the confidence bounds (blue dashed lines), suggesting that the
##   series resembles white noise.
### A few minor spikes exceed the bounds, but they are not concentrated at
###   lower lags, which indicates weak or negligible autocorrelation.
## White Noise Characteristics: White noise exhibits no significant correlation
##   at any lag. This plot largely aligns with that behavior, with most lags
##   showing values near zero.
## Deviations: Some small deviations may exist due to market factors or
##   randomness in the data, but they are not strong enough to indicate a
##   systematic pattern or trend.
## Conclusion: The daily changes in the Dow Jones Index exhibit
##   characteristics similar to white noise.
### A forecasting model built on these daily changes would have little to work
###   with, as there is little structure or predictability in the series.
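
# Illustration (not part of the exercise): a minimal sketch of what diff()
# does, on a made-up series with an upward trend. Differencing replaces the
# trending levels with period-to-period changes and shortens the series by
# one observation.
toy_levels <- ts(c(100, 103, 105, 110, 112))       # hypothetical index levels
toy_changes <- diff(toy_levels)                    # y[t] - y[t - 1]
toy_changes
length(toy_levels) - length(toy_changes)           # one observation is lost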