This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Importing libraries and cleaning data

library(nlme)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:nlme':
## 
##     collapse
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(faraway)
## Warning: package 'faraway' was built under R version 4.2.3
## Warning in check_dep_version(): ABI version mismatch: 
## lme4 was built with Matrix ABI version 1
## Current Matrix ABI version is 0
## Please re-install lme4 from source or restore original 'Matrix' package
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
psr <- read.csv("C:/Users/dylz3/Downloads/PSAVERT - PSAVERT.csv")
fsi <- read.csv("C:/Users/dylz3/Downloads/Financial stress - Monthly Financial Stress (1).csv")
smv <- read.csv("C:/Users/dylz3/Downloads/VXOCLS (1).csv")


fsi$date <- as.Date(fsi$date, format = "%m/%d/%Y")

psr$date <- as.Date(psr$date, format = "%m/%d/%Y")
smv$date <- as.Date(smv$observation_date, format = "%Y-%m-%d")

index <- nrow(smv)  
for(i in 2:(index-1)) {  
  if(is.na(smv[i,2])) {
    smv[i,2] <- mean(c(smv[i-1,2], smv[i+1,2]), na.rm = TRUE)  
  }
}


psr_fsi <- inner_join(psr, fsi, by = "date")
model_data <- inner_join(psr_fsi, smv, by = "date")
model_data$ID <- 0
for(i in 1:nrow(model_data)){
  model_data$ID[i] <- i
}

Creating models and testing

fsi_model <- lm(PSAVERT ~ indicator, data = model_data)
summary(fsi_model)
## 
## Call:
## lm(formula = PSAVERT ~ indicator, data = model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7349 -1.1129  0.0195  1.3667  4.4094 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  24.4817    14.7687   1.658   0.0986 .
## indicator    -0.1826     0.1460  -1.251   0.2122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.935 on 263 degrees of freedom
## Multiple R-squared:  0.005912,   Adjusted R-squared:  0.002133 
## F-statistic: 1.564 on 1 and 263 DF,  p-value: 0.2122
plot(fsi_model)

vxo_model <- lm(PSAVERT ~ VXOCLS, data = model_data)
summary(vxo_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS, data = model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4799 -1.1043 -0.0851  1.5003  4.5818 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.73433    0.32315  17.745   <2e-16 ***
## VXOCLS       0.01369    0.01485   0.922    0.357    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.938 on 263 degrees of freedom
## Multiple R-squared:  0.003222,   Adjusted R-squared:  -0.0005682 
## F-statistic: 0.8501 on 1 and 263 DF,  p-value: 0.3574
plot(vxo_model)

both_model <- lm(PSAVERT ~ VXOCLS + indicator, data = model_data)
summary(both_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS + indicator, data = model_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.534 -1.140 -0.116  1.298  4.560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 41.79363   17.26907   2.420   0.0162 *
## VXOCLS       0.03346    0.01753   1.909   0.0574 .
## indicator   -0.36048    0.17261  -2.088   0.0377 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.925 on 262 degrees of freedom
## Multiple R-squared:  0.01954,    Adjusted R-squared:  0.01206 
## F-statistic: 2.611 on 2 and 262 DF,  p-value: 0.07535
plot(both_model)

shapiro.test(residuals(both_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(both_model)
## W = 0.98915, p-value = 0.0447
vif(both_model)
##    VXOCLS indicator 
##  1.411409  1.411409

Residual vs fitted plot isn’t great. r-value is low. QQplot and shapiro test imply there isn’t normality. Halfnorm plot implies there may be a few outliers. VIF suggests the predictors are not correlated.

Checking outliers

halfnorm(cooks.distance(both_model), nlab = 7)

# partial regression plot for outliers and influential points

# Testing FSI
pred_model <- lm(indicator ~ VXOCLS, data = model_data)
plot(residuals(pred_model), residuals(vxo_model))
text(residuals(pred_model), residuals(vxo_model))

# Testing Stock market vol
pred_model2 <- lm(VXOCLS ~ indicator, data = model_data)
plot(residuals(pred_model2), residuals(fsi_model))
text(residuals(pred_model2), residuals(fsi_model))

# Testing studentized residuals
count = 1
for(i in rstudent(both_model)){
  if(i < qt(.025, 264) | i > qt(.975, 264)){
  print(paste(count, i))
  }
  count = count + 1
}
## [1] "49 1.97541105847782"
## [1] "52 2.03913355398463"
## [1] "53 2.02812185522543"
## [1] "54 2.14314166761037"
## [1] "58 2.39727856989977"
## [1] "166 -2.04151417214631"
## [1] "167 -2.3869667995535"
## [1] "168 -2.00805694135515"
## [1] "169 -2.06820546448736"
## [1] "187 -2.1378006369969"

Results: halfnorm plot reveals a few outliers. Rows 135 and 196 appear to be outliers. Rows 167, 60, 49, 42, and 15 may be outliers. Partial regression plot for FSI reveals that row 196 is most likely an outlier. Row 15, may be an outlier. Partial regression plot for stock market volatility reveals that 195 and 194 are most likely outliers. 219, 197, 220, and 135 may be outliers. Based on studentized residuals and a .05 t test rows 49, 52, 53, 54, 58, 166, 167, 168, 169, and 187 are significant outliers.

Removing outliers

out_model_data <- model_data[-c(15, 49, 52, 53, 54, 58, 166, 167, 168, 169, 187, 135, 196, 42, 60, 195, 194, 219, 197, 220),]
out_model <- lm(PSAVERT ~ VXOCLS + indicator, data = out_model_data)
summary(out_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS + indicator, data = out_model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6223 -1.0853 -0.1218  1.1879  3.7930 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 49.59371   19.56207   2.535   0.0119 *
## VXOCLS       0.04243    0.01898   2.236   0.0263 *
## indicator   -0.43969    0.19541  -2.250   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.779 on 242 degrees of freedom
## Multiple R-squared:  0.02694,    Adjusted R-squared:  0.0189 
## F-statistic:  3.35 on 2 and 242 DF,  p-value: 0.03673
plot(out_model)

plot(out_model_data$PSAVERT, 49.5937066 + 0.0424310*out_model_data$VXOCLS - 0.4396887*out_model_data$indicator)

Residual plots look much better. QQplot looks better, appears to be short-tailed.

Checking for linearity with partial residual plot

termplot(out_model, partial.resid = TRUE, terms = 1:2, col.res = "black")

Plots don’t appear linear. Perhaps transformation is needed.

Testing transformations

# Log
log_model <- lm(log(PSAVERT) ~ VXOCLS + indicator, data = out_model_data)
summary(log_model)
## 
## Call:
## lm(formula = log(PSAVERT) ~ VXOCLS + indicator, data = out_model_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92863 -0.14745  0.02575  0.23390  0.57363 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 10.799788   3.706997   2.913  0.00391 **
## VXOCLS       0.010165   0.003597   2.826  0.00510 **
## indicator   -0.091652   0.037031  -2.475  0.01401 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.337 on 242 degrees of freedom
## Multiple R-squared:  0.0377, Adjusted R-squared:  0.02975 
## F-statistic: 4.741 on 2 and 242 DF,  p-value: 0.00956
plot(log_model)

# squared model
squared_model <- lm(PSAVERT^2 ~ VXOCLS + indicator, data = out_model_data)
summary(squared_model)
## 
## Call:
## lm(formula = PSAVERT^2 ~ VXOCLS + indicator, data = out_model_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33.23 -15.04  -3.96  12.33  54.10 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 499.8936   235.2666   2.125   0.0346 *
## VXOCLS        0.3880     0.2283   1.700   0.0905 .
## indicator    -4.6345     2.3502  -1.972   0.0497 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.39 on 242 degrees of freedom
## Multiple R-squared:  0.01851,    Adjusted R-squared:  0.0104 
## F-statistic: 2.282 on 2 and 242 DF,  p-value: 0.1043
plot(squared_model)

# square root model 
sqrt_model <- lm(PSAVERT^.5 ~ VXOCLS + indicator, data = out_model_data)
summary(sqrt_model)
## 
## Call:
## lm(formula = PSAVERT^0.5 ~ VXOCLS + indicator, data = out_model_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90021 -0.19784  0.00472  0.26751  0.72928 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 12.230082   4.179482   2.926  0.00376 **
## VXOCLS       0.010266   0.004055   2.532  0.01199 * 
## indicator   -0.099095   0.041751  -2.373  0.01840 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.38 on 242 degrees of freedom
## Multiple R-squared:  0.03214,    Adjusted R-squared:  0.02414 
## F-statistic: 4.018 on 2 and 242 DF,  p-value: 0.0192
plot(sqrt_model)

# Interaction model
int_model <- lm(PSAVERT ~ VXOCLS*indicator, data = out_model_data)
summary(int_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS * indicator, data = out_model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7729 -1.0426  0.0307  1.1516  3.7676 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -114.97498   49.59467  -2.318 0.021271 *  
## VXOCLS              7.95279    2.20014   3.615 0.000366 ***
## indicator           1.18665    0.49091   2.417 0.016381 *  
## VXOCLS:indicator   -0.07808    0.02172  -3.596 0.000393 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.736 on 241 degrees of freedom
## Multiple R-squared:  0.07648,    Adjusted R-squared:  0.06498 
## F-statistic: 6.653 on 3 and 241 DF,  p-value: 0.0002469
plot(int_model)

# Polynomial regression on FSI
polyf_model <- lm(PSAVERT ~ VXOCLS + indicator^2, data = out_model_data)
summary(polyf_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS + indicator^2, data = out_model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6223 -1.0853 -0.1218  1.1879  3.7930 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 49.59371   19.56207   2.535   0.0119 *
## VXOCLS       0.04243    0.01898   2.236   0.0263 *
## indicator   -0.43969    0.19541  -2.250   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.779 on 242 degrees of freedom
## Multiple R-squared:  0.02694,    Adjusted R-squared:  0.0189 
## F-statistic:  3.35 on 2 and 242 DF,  p-value: 0.03673
plot(polyf_model)

# Polynomial regression on stock market volatility
polysmv_model <- lm(PSAVERT ~ VXOCLS^2 + indicator, data = out_model_data)
summary(polysmv_model)
## 
## Call:
## lm(formula = PSAVERT ~ VXOCLS^2 + indicator, data = out_model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6223 -1.0853 -0.1218  1.1879  3.7930 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 49.59371   19.56207   2.535   0.0119 *
## VXOCLS       0.04243    0.01898   2.236   0.0263 *
## indicator   -0.43969    0.19541  -2.250   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.779 on 242 degrees of freedom
## Multiple R-squared:  0.02694,    Adjusted R-squared:  0.0189 
## F-statistic:  3.35 on 2 and 242 DF,  p-value: 0.03673
plot(polysmv_model)

Log: r-value is slightly higher, but residual and QQ plots look worse. Squared: r-value is lower. Residual and QQ plots look worse. Sqrt: r-value is slightly higher, but residual and qq plots look worse. Interaction: r-value is higher, but residual plots prove to have outliers. Polynomial fsi: r-value and plots stayed roughly the same Polynomial smv: r-value and plots stayed roughly the same

Conclusion: Ultimately, the variables do not appear to be fit for linear regression. It appears that a machine learning model may be better for prediction.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.