library(forecast)
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(AER)
## Warning: package 'AER' was built under R version 4.3.2
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
library(seasonal)
library(tseries)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(dynlm)
library(ggfortify)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Registered S3 methods overwritten by 'ggfortify':
##   method                 from    
##   autoplot.Arima         forecast
##   autoplot.acf           forecast
##   autoplot.ar            forecast
##   autoplot.bats          forecast
##   autoplot.decomposed.ts forecast
##   autoplot.ets           forecast
##   autoplot.forecast      forecast
##   autoplot.stl           forecast
##   autoplot.ts            forecast
##   fitted.ar              forecast
##   fortify.ts             forecast
##   residuals.ar           forecast
library(ggplot2)
library(seasonal)

Part 1)

data(GermanUnemployment)
head(GermanUnemployment) 
##         unadjusted adjusted
## 1962 Q1        1.1      0.6
## 1962 Q2        0.5      0.7
## 1962 Q3        0.4      0.7
## 1962 Q4        0.7      0.8
## 1963 Q1        1.6      0.9
## 1963 Q2        0.6      0.8
#the data function calls out the data we will utilize, the head function gives a short preview.

The GermanUnemployment data set spans from 1962 to 1991 with quarterly observations per year. The variables are unadjusted for the raw unemployment rate, and the adjusted for the seasonally adjusted employment rate in Germany. Because we will be conducting seasonal analysis anyway, we will analyze the unadjusted variable in this project.

Part 2)

A)

is.ts(GermanUnemployment)
## [1] TRUE
#checks if the data set is a time series model
unemployment <- GermanUnemployment[, "unadjusted"]
#now we can just call the unemployment object instead of recalling the data set for the variable.
plot(unemployment, main = "Time series plot of unemployment")

#plot function graphs a plot of the relationship between time and unemployment. #the main argument is what we name the plot. 

B)

Based on the plot, the unemployment’s mean does not seem to be stationary, since around 1970 and 1980, there is a big upward trend. The variance looks pretty starionary, as the seasonality within the model seems consistent.

C)

ggtsdisplay(unemployment)

#plots the time series plot, as well as the corresponding ACF and PACF

The ACF shows that all of the lags of the model exhibit autocorrelation. Because the ACF does not filter the previous lags when calculating each of the autocorrelation values like the PACF does, the PACF is more useful for determining the lag order. For the PACF, we have the first lag order clearly passing the threshold for autocorrelation, as well as the third, fourth, fifth, and ninth lags also exhibiting autocorrelation.

D)

 time_index <- c(1:length(unemployment)) 

ggplot(unemployment, aes(x=time_index, y=unemployment)) +
  geom_point() +  # Scatter plot of the original data
  geom_smooth(method="lm", formula = y ~ x, se=FALSE, color="red") +  
#Linear regression line in red
  geom_smooth(method="lm", formula = y ~ poly(x, 2), se=FALSE, color="blue") +   #Quadratic regression line in blue
  annotate("text", x = Inf, y = Inf, label = "Linear Fit", hjust = 1.1, vjust = 2, color = "red", 
           size = 4) +
  annotate("text", x = Inf, y = Inf, label = "Quadratic Fit", hjust = 1.1, vjust = 1, color = "blue", 
           size = 4) +
  theme_minimal() +  # Minimal theme for the plot
  labs(title="Linear and Quadratic Regression of Unemployment over Time",
       x="Time Index",
       y="Unemployment")

#the annotate function allows details for the plot, and labs is short for labels. 

The blue fit (linear) and red fit (quadratic) almost overlap each other because they are so similar (The plot looks like an upwards sin/cos graph would be the most appropriate).

E)

linear_model <- lm(unemployment ~ time_index)
# Fitting a linear model, with time as the explanatory variable 
#and unemployment as the dependent variable. 

quad_model <- lm(unemployment ~ time_index + I(time_index^2))
# Fitting a quadratic model,  

linear_fitted <- fitted(linear_model)
linear_residuals <- residuals(linear_model)


quad_fitted <- fitted(quad_model)
quad_residuals <- residuals(quad_model)
#created an object for the quadratic model's estimated plot and its residuals, or the y differences. 


plot(linear_fitted, linear_residuals, xlab = "Fitted Values", ylab = "Residuals", main = "Linear Model 
     Residuals vs. Fitted Values")
abline(h = 0, col = "red")

# Plotting Residuals vs Fitted Values for Linear Model


plot(quad_fitted, quad_residuals, xlab = "Fitted Values", ylab = "Residuals", main = "Quadratic Model 
     Residuals vs. Fitted Values")
abline(h = 0, col = "red")

# Plotting Residuals vs Fitted Values for Quadratic Model

Again, the linear and quadratic residuals visually almost look identical, since their fits were almost the same too. The residuals, which are the y differences between the y values and the y fitted, are from -3 to 4, and they demonstrate a sin-like cycle, where the highest absolute value of the residuals are also the peak of the cycle. The residuals also seem to increase as the time increases.

F)

hist(linear_residuals, main = "Histogram of Residuals (Linear Model)", xlab = "Residuals", col = "blue",
     border = "black", probability = TRUE)
lines(density(linear_residuals))

# Histogram of Residuals for the Linear Model using density/relativity as the y-value

hist(quad_residuals, main = "Histogram of Residuals (Quadratic Model)", xlab = "Residuals", col = "green", border = "black", probability = TRUE)
lines(density(quad_residuals))

# Histogram of Residuals for the Quadratic Model using density/relativity as the  y-value

#The color and border arguments act as aesthetic arguments, xlab is short for x-labels, and probability = TRUE makes it a density histogram. The lines function fits the best curve over the histogram. 

Both of the residuals for the linear model and quadratic model’s histogram have a slightly postiive kurtosis (heavier tails towards the middle), where the majority of the differences (over 70% of the data) are around -2 to 2. A normal distribution of the residuals suggests that the fit of the model is efficient and unbiased.

G)

summary(linear_model)
## 
## Call:
## lm(formula = unemployment ~ time_index)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1029 -1.2027 -0.0121  0.9091  3.7435 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.717521   0.273643  -2.622  0.00989 ** 
## time_index   0.083223   0.003925  21.203  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.489 on 118 degrees of freedom
## Multiple R-squared:  0.7921, Adjusted R-squared:  0.7903 
## F-statistic: 449.5 on 1 and 118 DF,  p-value: < 2.2e-16
summary(quad_model)
## 
## Call:
## lm(formula = unemployment ~ time_index + I(time_index^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2577 -1.1488 -0.0199  0.9314  3.7876 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -5.366e-01  4.160e-01  -1.290    0.200    
## time_index       7.433e-02  1.587e-02   4.683 7.66e-06 ***
## I(time_index^2)  7.352e-05  1.271e-04   0.579    0.564    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.494 on 117 degrees of freedom
## Multiple R-squared:  0.7927, Adjusted R-squared:  0.7891 
## F-statistic: 223.7 on 2 and 117 DF,  p-value: < 2.2e-16
#summarizes the data from the linear_model including residuals, coefficients, and significance levels.

The R^2 for the models are very similar, both having around an 80% goodness of fit. Both models also have significant overall p-values, but the p-values for the intercept and exponentiated time variable are very high, although that is not as relevant because we are more concerned with how well the model fits the data overall. The F-statistic for the quadratic model is smaller because it takes on two variables (time and time^2), so it is compensated by the degrees of freedom.

H)

BIC(linear_model, quad_model)
##              df      BIC
## linear_model  3 448.5079
## quad_model    4 452.9525
AIC(linear_model, quad_model)
##              df      AIC
## linear_model  3 440.1454
## quad_model    4 441.8025
#BIC and AIC are used to find the model of best fit

Both the AIC and BIC shows that the linear model is a very marginally a slightly better fit than the quadratic model. For the sake of simplicity, we weill choose our linear model for forecasting.

I)

t <- seq(1962, 1991, length = length(unemployment))
tn=data.frame(t=seq(1982,1994))
pred=predict(lm(unemployment ~ t), tn, se.fit = TRUE)
pred.plim = predict(lm(unemployment ~ t),tn, level =0.95, interval="prediction")
pred.clim = predict(lm(unemployment ~ t), tn,level=0.95, interval="confidence")
matplot(tn$t,cbind(pred.clim, pred.plim[,-1]),
        lty=c(1,1,1,3,3), type="l", lwd=2, ylab="predicted y",xlab="Time")

#We create two objects with the predict function that utilizes our linear regression of unemployment and time, with the 95% confidence interval, one with the prediction argument and one with the confidence argument. The prediction argument has a greater range since the forecasting values involves more uncertainty and needs to account for a wider range of values. Because our time units are in quarters, we have enough data to provide at least 12 units ahead (1991 + 3 years). Based on our observations, we predict that unemployment in Germany will increase in the next three years by about 3-6%. 

Part 2)

A)

additive_decomp <- decompose(unemployment, type = c("additive"))
residuals_only <- unemployment - additive_decomp$trend - additive_decomp$seasonal
acf(residuals_only, main = "ACF of only residuals", na.action = na.pass)

pacf(residuals_only, main = "PACF of only residuals", na.action = na.pass)

#plots the ACF and PACF of only residuals specifically for additive from the type argument. The decompose function is used to split the data into the trend, seasonality, and remainder. The residuals_only object is the leftovers of the remainder after removing the trend and seasonal component of the data. The na.action=na.pass argument ignores any NA values that have been formed from our computing. 

Both the ACF and PACF clearly has some autocorrelation left in its lags since many of them pass the dotted threshold, so there is still some noise/dynamics left in the remainder. The autocorrelation values are also decreasing as the lags increase.

B)

multi_decomp <- decompose(unemployment, type = c("multiplicative"))
residuals_only <- unemployment / multi_decomp$trend / multi_decomp$seasonal
acf(residuals_only, main = "ACF of only residuals", na.action = na.pass)

pacf(residuals_only, main = "PACF of only residuals", na.action = na.pass)

#plots the ACF and PACF of only residuals specifically for multiplicative. Again, the type = multiplicative specifies the type of decomposition, and residuals_only obtains the division leftovers after removing the trend and seasonality. 

Similarily to the additive decomposition, both the ACF and PACF clearly has some autocorrelation left in its lags since many of them pass the dotted threshold, especially the ACF, so it must still be fixed so that all of the lags are below the threshold.

C)

autoplot(decompose(unemployment))

#plots the data, remainder, seasonal, and trend, after decomposing the unemployment data. 

Clearly, the seasonal fluctuations do not vary across time (very consistent amplitude and wavelength), so we prefer the additive decomposition.

D)

autoplot(decompose(unemployment), type = c("additive"))

autoplot(decompose(unemployment), type = c("multiplicative"))

#These functions were already used in the previous question, we are just doing this to compare any differences between the additive and multiplicative models. 

Based on the two decompositions, both present consistent seasonality and a gradual increasing trend. The random component, or the remainder, does not seem to exhibit any pattern in either of the model, meaning that neither decomposition consists of cycles. Based on these three factors, as well as the similarities in the ACF & PACF, the additive and multiplicative models are very similar.

E)

decomposition <- decompose(unemployment, type = c("additive"))
autoplot(decomposition$seasonal)

#We created the decomposition object from its function adn then plotted it by extracting the seasonal part of the decomposition. 

There is a clear and consistent fluctuation. The amplitude is around -.3 to .75, and there is about a frequency of 10 per decade, or once per year.

F)

fit1=tslm(unemployment ~ trend + season+0)
fit2=tslm(unemployment ~ trend+0)
fit3=tslm(unemployment ~ season)
#Created three fits, where fit1 has both the trend and season as the explanatory variables, fit2 only using trend, and fit3 only using season. 
AIC(fit1, fit2, fit3)
##      df      AIC
## fit1  6 434.3649
## fit2  2 444.9413
## fit3  5 630.8659
BIC(fit1, fit2, fit3)
##      df      BIC
## fit1  6 451.0898
## fit2  2 450.5163
## fit3  5 644.8034
#We use the goodness of fit determinators to decide which model is the most ideal. 

The AIC shows that fit1 is preferred, while the BIC slightly prefers fit2. We will use fit1 for forecasting because it utilizes both trend and seasonality

plot(forecast(fit1, level = c(50, 80, 95), h = 12), main = "Forecast of seasonality and trend")

#The forecast function is similar to the predict function with the prediction argument, which forecasts 12 units ahead, or 3 years ahead. There is once again a clear upwards trend based on our forecasting, meaning that the unemployment in Germany will increase about 6-12%. 

Part 3)

For Modeling and Forecasting, we used the linear and quadratic functional forms to regress our time explanatory variable onto our unemployment data. Based on our time-series plots, residual plots, and goodness of fit indicators, it was evident that those models were very similar in results. We could have potentially improved our findings by trying out a different functional form, perhaps a cubic model, or a trig functional form would have been more accurate to minimize the sum of squared errors.

For Trend and Seasonal Adjustments, we found that the additive decomposition was more appropriate than the multiplicative decomposition, since the seasonality fluctuations were very consistent. However based on the analysis of the remainder of both the decompositions, we concluded that the two models were very similar in their decomposition patterns.

Based on our forecasting for both models, although our two models show different ranges of forecasted values, we predict that unemployment in Germany will overall increase, 3-6% based on our linear model, and 6-12% based on our seasonal + trend decomposed model.

Part 4)

References: Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.