response_like_column <- "unemployment_rate"
library(tsibble)
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(ggplot2)
unemployment_rate <- read.csv('./Downloads/students_dropout_and_academic_success.csv')
page_views <- read.csv('./Downloads/pageviews.csv')
page_views$date <- as.Date(page_views$Date)
# Subset the data
unemployment_rate_subset <- unemployment_rate[1:nrow(page_views), ]
# Create tsibble
timeseries <- tsibble(
  Date = page_views$date,
  Unemployment_rate = unemployment_rate_subset$Unemployment_rate,
  PageViews = page_views$PageViews
)
## Using `Date` as index variable.
# Visualize
ggplot(timeseries, aes(x = Date, y = Unemployment_rate)) +
  geom_line() +
  labs(title = "Time Series Plot of Unemployment Rate",
       x = "Date",
       y = "Unemployment Rate")
Over the period plotted, most unemployment rate values appear to lie between 10.0 and 12.5.
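A reading like this can be checked numerically instead of by eye. The following is a minimal sketch that uses a simulated vector `rate` as a stand-in for `timeseries$Unemployment_rate`; the values are hypothetical and are not the report's data.

```r
# Hypothetical stand-in for timeseries$Unemployment_rate (simulated values)
set.seed(1)
rate <- runif(500, min = 7, max = 17)

# Flag observations between 10.0 and 12.5 (inclusive)
in_band <- rate >= 10.0 & rate <= 12.5

# Proportion of observations that fall inside the band
share_in_band <- mean(in_band)

# Quartiles for comparison with the band limits
summary(rate)
share_in_band
```

Replacing `rate` with the actual column would give the share of observations that the sentence above describes.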
timeseries$Date <- as.Date(timeseries$Date)
# Fit linear regression model
lm_model <- lm(Unemployment_rate ~ Date, data = timeseries)
# Print summary
summary(lm_model)
##
## Call:
## lm(formula = Unemployment_rate ~ Date, data = timeseries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1321 -2.2622 -0.5187 2.2079 4.6485
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.047e+01 1.133e+00 9.237 <2e-16 ***
## Date 6.474e-05 6.242e-05 1.037 0.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.667 on 2796 degrees of freedom
## Multiple R-squared: 0.0003846, Adjusted R-squared: 2.711e-05
## F-statistic: 1.076 on 1 and 2796 DF, p-value: 0.2997
The linear regression of the unemployment rate on time does not show a statistically significant relationship. The intercept is approximately 10.47. Because R stores a date as the number of days since 1970-01-01, this is the fitted unemployment rate at that origin date, which coincides with the start of the time series only if the data begin on that day. The coefficient for the ‘Date’ variable is 6.474e-05 per day (roughly 0.024 per year), a very small positive slope. Its p-value is 0.3, so the trend is not statistically significant at conventional levels.
The residual standard error is 2.667 on 2796 degrees of freedom, which describes the typical size of the gap between observed and fitted values. The R-squared value is very close to zero (0.0003846), so the model explains almost none of the variability in the unemployment rate. The adjusted R-squared, which accounts for the number of predictors in the model, is also essentially zero (2.711e-05).
In summary, the linear regression analysis does not provide strong evidence of a meaningful trend in the unemployment rate over the given time period. The model’s lack of significance suggests that other factors beyond a simple linear relationship with time may influence the unemployment rate.
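When a `Date` is used directly as a predictor, the intercept and slope are expressed on R's internal scale of days since 1970-01-01. The sketch below shows one way to re-express time as years since the first observation, so that the intercept is the fitted value at the start of the series and the slope is per year. The data frame `df` and the column `years_since_start` are hypothetical names, and the data are simulated rather than taken from the report.

```r
# Simulated stand-in for the report's data (hypothetical values)
set.seed(42)
dates <- seq(as.Date("2015-07-01"), by = "day", length.out = 1000)
df <- data.frame(
  Date = dates,
  Unemployment_rate = 11 + rnorm(1000, sd = 2.5)
)

# Original parameterization: the intercept is the fitted value at 1970-01-01
# and the slope is the change per day
fit_raw <- lm(Unemployment_rate ~ Date, data = df)

# Re-centered time: years elapsed since the first observation
df$years_since_start <- as.numeric(df$Date - min(df$Date)) / 365.25
fit_centered <- lm(Unemployment_rate ~ years_since_start, data = df)

coef(fit_raw)       # intercept at the 1970 origin, slope per day
coef(fit_centered)  # intercept at the first date, slope per year
```

Both models give the same fitted values and the same p-value for the slope; only the units of the coefficients change.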
library(ggplot2)
# Calculate the centered 12-observation moving average (12 months if the series is monthly)
timeseries$MA12 <- zoo::rollmean(timeseries$Unemployment_rate, k = 12, fill = NA)
# Visualize the original and smoothed data
ggplot(timeseries, aes(x = Date)) +
  geom_line(aes(y = Unemployment_rate), color = "blue", linewidth = 1, alpha = 0.7) +
  geom_line(aes(y = MA12), color = "red", linewidth = 1) +
  labs(title = "Unemployment Rate with 12-Month Moving Average",
       x = "Date",
       y = "Unemployment Rate") +
  theme_minimal()
## Warning: Removed 11 rows containing missing values (`geom_line()`).
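The 11 removed rows are the positions where a 12-term window does not fit entirely inside the series, so the moving average is `NA` there (k - 1 = 11). The sketch below reproduces that count with base R's `stats::filter()` on a simulated series. It is an illustration under stated assumptions, not the report's computation: the vector `x` is hypothetical, and the exact placement of an even-length window may differ between implementations.

```r
# Hypothetical stand-in series (simulated values)
set.seed(7)
x <- 11 + rnorm(120, sd = 2.5)

# 12-term moving average with equal weights, using base R only
k  <- 12
ma <- stats::filter(x, rep(1 / k, k), sides = 2)

# An even window cannot be exactly centered: here the first average
# covers observations 1 to 12 and is stored at position 6
first_valid <- which(!is.na(ma))[1]

# Number of positions without a full window (k - 1)
n_missing <- sum(is.na(ma))
n_missing
```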
timeseries$Date <- as.Date(timeseries$Date)
# Create a time series object
ts_data <- ts(timeseries$Unemployment_rate, frequency = 12)
# Plot ACF
acf(ts_data)
# Plot PACF
pacf(ts_data)
The ACF plot shows no significant autocorrelation. The autocorrelation at each lag is close to zero, indicating that observations in the series are not correlated with one another at the lags examined.
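The visual reading of the ACF can be backed by numbers. The sketch below computes the approximate 95% bounds that `acf()` draws by default, lists any lags outside them, and runs a Ljung-Box test of joint zero autocorrelation. It uses simulated white noise as a stand-in, so the vector `x` and the choice of 20 lags are assumptions for illustration only.

```r
# Hypothetical stand-in series: simulated white noise (not the report's data)
set.seed(123)
x <- rnorm(300, mean = 11, sd = 2.5)

# Sample autocorrelations for lags 0 to 20, without drawing the plot
acf_vals <- acf(x, lag.max = 20, plot = FALSE)$acf[, 1, 1]

# Approximate 95% bounds under the white-noise null: +/- 1.96 / sqrt(n)
bound <- qnorm(0.975) / sqrt(length(x))

# Lags (excluding lag 0) whose autocorrelation falls outside the bounds
outside <- which(abs(acf_vals[-1]) > bound)

# Ljung-Box test of joint zero autocorrelation up to lag 20
lb <- Box.test(x, lag = 20, type = "Ljung-Box")
lb$p.value
```

A large p-value from the Ljung-Box test, together with few or no lags in `outside`, would be consistent with the interpretation above.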