Questions / Answers

  1. First regress Unemp on Income, then add Educ to the model as a second predictor.
  1. Using the results write out the prediction equation for the Model 2 regression (be specific). Predicted Unemployment = 12.75+2.706e-05(Income)-1.057e-01(Education)
  2. What is the proportion of variability in Unemp that is accounted for by (only) Educ? R=0.3704 R2=.608605. In other words, roughly 60 percent of the variability in Unemp is accounted for by only Educ.
  3. Does the addition of the Educ predictor lead to a significant change in R2? How do you know? The SLR between Unemp and Income produces an R2 of 0.03244 whereas the MR including Educ increases the R2 to 0.1696. Pvalue increased from 0.2205 to 0.01525 as well
  4. What is the correlation between Unemp and Income controlling for Educ? 0.08240113
  5. Which of the two predictores in the two-predictor model should be considered relatively more important? How do you know? In the two-predictor model, Education has a larger t-value and is significant compared to income.
  6. Of the two models, which model should we select as the “best” model? Why is this the “best” model? Model 2 has a higher R2 value meaning that it explains more variability in Unemployment than model one. Therefore, using both Income and Education as predictors (model 2) is a better model than just Income. Model 2 also has a greater F statistic (4.597) and significant p-value (0.01525) compared to model 2 (F=1.542, p-value: 0.2205).
  1. Regress Unemp on Educ, Income and Poor.
  1. Does the model fit the data? How do you know? Yes. F-statistic is significant (pvalue = 0.0003174)
  2. Plot Cook’s D against studentized deleted residuals. DONE
  3. Identify any outliers that exist. Yes. To find influential outliers I looked for an states with Cook’s distance greater than 3 times the mean Cook’s Distance value. Also used a threshold vale of 4 / the number of observations (48) with all states having a Cook’s Distance value above this threshold as an influential outlier. Both methods indicated the same 4 states as outliers: California, Main, Rhode Island and West Virginia.
  4. If any outliers exist determine if they are influential using the F statistic approach (i.e., Dc1 > 50% point from the F distribution) for determining if the case is influential (the proportion to be cut off is 0.5). There are no influential outliers using the F statisti approach.

1. a-c & e

#library(ggm,ppcor,gvmla)
data<- read.csv("F:/Grad School/FALL 2019/EDPS 942_Correlation Methods/Assignmnet 2/data.csv")
fit1 <- lm(Unemp ~ Income, data=data)
summary(fit1)

Call:
lm(formula = Unemp ~ Income, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6896 -0.7229 -0.0954  0.7641  3.1944 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.244e+00  1.353e+00   5.356 2.64e-06 ***
Income      -5.203e-05  4.189e-05  -1.242    0.221    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.258 on 46 degrees of freedom
Multiple R-squared:  0.03244,   Adjusted R-squared:  0.01141 
F-statistic: 1.542 on 1 and 46 DF,  p-value: 0.2205
fit2 <- lm(Unemp ~ Income + Educ, data=data)
summary(fit2)

Call:
lm(formula = Unemp ~ Income + Educ, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.10082 -0.86434  0.01264  0.65182  2.95255 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.275e+01  2.382e+00   5.350 2.85e-06 ***
Income       2.706e-05  4.879e-05   0.555  0.58188    
Educ        -1.057e-01  3.876e-02  -2.727  0.00908 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.178 on 45 degrees of freedom
Multiple R-squared:  0.1696,    Adjusted R-squared:  0.1327 
F-statistic: 4.597 on 2 and 45 DF,  p-value: 0.01525
ppcor::spcor.test(data$Unemp,data$Educ,data$Income)#semi-partial only educ
ppcor::spcor.test(data$Unemp,data$Income,data$Educ)#semi-partial only income

1.d

#Partial controling for Education
ppcor::pcor.test(data$Unemp,data$Income,data$Educ)
#alternative version
reg1 <- lm(Unemp~Educ,data=data)   # run linear regression
resid1 <- resid(reg1)     # find the residuals - Unemp free of Educ
reg2 <- lm(Income ~ Educ,data=data)   # second regression
resid2 <- resid(reg2)     # second set of residuals - Income free of Educ
cor(resid1, resid2)       # correlation of residuals - partial correlation
[1] 0.08240113

2.a

fit2.1<- lm(Unemp ~ Educ+Income+Poor, data=data)
summary(fit2.1)

Call:
lm(formula = Unemp ~ Educ + Income + Poor, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.26480 -0.69444 -0.07819  0.50527  2.88918 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.763e+00  3.630e+00   0.761  0.45056   
Educ        -4.125e-02  3.967e-02  -1.040  0.30412   
Income       1.014e-04  4.902e-05   2.070  0.04439 * 
Poor         2.045e-01  6.001e-02   3.408  0.00141 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.06 on 44 degrees of freedom
Multiple R-squared:  0.343, Adjusted R-squared:  0.2982 
F-statistic: 7.658 on 3 and 44 DF,  p-value: 0.0003174

2.b-c

residfit2.1<-resid(fit2.1)
plot(cooks.distance(fit2.1),rstudent(fit2.1))+
abline(v= 4/nrow(data), col="red")+
abline(v=0.072,col="red",lty=2)
integer(0)
  abline(v=.801)

#finding outliers
data %>% mutate(cooks=cooks.distance(fit2.1))%>% group_by() %>% summarise(mean=mean(cooks))
data %>% mutate(cooks=cooks.distance(fit2.1))%>% filter(cooks>=3*0.024)#Dashed line indicating states with Cook's value 3 times greater than mean Cook's value. 
data %>% mutate(cooks=cooks.distance(fit2.1))%>% filter(cooks>=4/nrow(data))#Solid line indicating states over a threshold value of 4 / number of observations. 
#both methods indicate the same outlier states. California, Main, Rhode Island and West Virginia 

2.d

# Excel function  =FINV(0.5,3,44) = 0.801 providing the DCi >50% OR
qf(.5,3,44)
[1] 0.8010265
data %>% mutate(cooks=cooks.distance(fit2.1))%>% filter(cooks>=.801)#no influential outliers
gvmodel<-gvlma(fit2.1)
summary(gvmodel)

Call:
lm(formula = Unemp ~ Educ + Income + Poor, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.26480 -0.69444 -0.07819  0.50527  2.88918 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.763e+00  3.630e+00   0.761  0.45056   
Educ        -4.125e-02  3.967e-02  -1.040  0.30412   
Income       1.014e-04  4.902e-05   2.070  0.04439 * 
Poor         2.045e-01  6.001e-02   3.408  0.00141 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.06 on 44 degrees of freedom
Multiple R-squared:  0.343, Adjusted R-squared:  0.2982 
F-statistic: 7.658 on 3 and 44 DF,  p-value: 0.0003174


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = fit2.1) 
influencePlot(fit2.1)

LS0tDQp0aXRsZTogIkVEUFMtOTQyIEFzc2lnbm1lbnQgMiINCmF1dGhvcjogIlF1aW50aW4gRGVhbiINCmRhdGU6ICJOb3ZlbWJlciAxOSwgMjAxOSINCm91dHB1dDoNCiAgaHRtbF9ub3RlYm9vazogZGVmYXVsdA0KICBwZGZfZG9jdW1lbnQ6IGRlZmF1bHQNCiAgaHRtbF9kb2N1bWVudDogZGVmYXVsdA0KICBoZWFkZXItaW5jbHVkZXM6DQogICAgLSBcdXNlcGFja2FnZXtzZXRzcGFjZX1cZG91Ymxlc3BhY2luZw0KLS0tDQoNCiNRdWVzdGlvbnMgLyBBbnN3ZXJzDQoxLiBGaXJzdCByZWdyZXNzICpVbmVtcCogb24gKkluY29tZSosIHRoZW4gYWRkICpFZHVjKiB0byB0aGUgbW9kZWwgYXMgYSBzZWNvbmQgcHJlZGljdG9yLiANCiAgYS4gVXNpbmcgdGhlIHJlc3VsdHMgd3JpdGUgb3V0IHRoZSBwcmVkaWN0aW9uIGVxdWF0aW9uIGZvciB0aGUgTW9kZWwgMiByZWdyZXNzaW9uIChiZSBzcGVjaWZpYykuDQogICAgKipQcmVkaWN0ZWQgVW5lbXBsb3ltZW50ID0gMTIuNzUrMi43MDZlLTA1KEluY29tZSktMS4wNTdlLTAxKEVkdWNhdGlvbikqKg0KICBiLiBXaGF0IGlzIHRoZSBwcm9wb3J0aW9uIG9mIHZhcmlhYmlsaXR5IGluICpVbmVtcCogdGhhdCBpcyBhY2NvdW50ZWQgZm9yIGJ5IChvbmx5KSAqRWR1Yyo/DQogICAgKipSPTAuMzcwNCBSXjJePS42MDg2MDUuIEluIG90aGVyIHdvcmRzLCByb3VnaGx5IDYwIHBlcmNlbnQgb2YgdGhlIHZhcmlhYmlsaXR5IGluICpVbmVtcCogaXMgYWNjb3VudGVkIGZvciBieSBvbmx5ICpFZHVjKi4gKioNCiAgYy4gRG9lcyB0aGUgYWRkaXRpb24gb2YgdGhlICpFZHVjKiBwcmVkaWN0b3IgbGVhZCB0byBhIHNpZ25pZmljYW50IGNoYW5nZSBpbiBSXjJePyBIb3cgZG8geW91IGtub3c/IA0KICAgICoqVGhlIFNMUiBiZXR3ZWVuICpVbmVtcCogYW5kICpJbmNvbWUqIHByb2R1Y2VzIGFuIFJeMl4gb2YgMC4wMzI0NCB3aGVyZWFzIHRoZSBNUiBpbmNsdWRpbmcgKkVkdWMqIGluY3JlYXNlcyB0aGUgIFJeMl4gdG8gIDAuMTY5Ni4gUHZhbHVlIGluY3JlYXNlZCBmcm9tIDAuMjIwNSB0byAwLjAxNTI1IGFzIHdlbGwqKg0KICBkLiBXaGF0IGlzIHRoZSBjb3JyZWxhdGlvbiBiZXR3ZWVuICpVbmVtcCogYW5kICpJbmNvbWUqIGNvbnRyb2xsaW5nIGZvciAqRWR1Yyo/DQogICAgKiowLjA4MjQwMTEzICoqDQogIGUuIFdoaWNoIG9mIHRoZSB0d28gcHJlZGljdG9yZXMgaW4gdGhlIHR3by1wcmVkaWN0b3IgbW9kZWwgc2hvdWxkIGJlIGNvbnNpZGVyZWQgcmVsYXRpdmVseSBtb3JlIGltcG9ydGFudD8gSG93IGRvIHlvdSBrbm93Pw0KICAgICoqSW4gdGhlIHR3by1wcmVkaWN0b3IgbW9kZWwsIEVkdWNhdGlvbiBoYXMgYSBsYXJnZXIgdC12YWx1ZSBhbmQgaXMgc2lnbmlmaWNhbnQgY29tcGFyZWQgdG8gaW5jb21lLiAgICoqDQogIGYuIE9mIHRoZSB0d28gbW9kZWxzLCB3aGljaCBtb2RlbCBzaG91bGQgd2Ugc2VsZWN0IGFzIHRoZSAiYmVzdCIgbW9kZWw/IFdoeSBpcyB0aGlzIHRoZSAiYmVzdCIgbW9kZWw/DQogICAgKipNb2RlbCAyIGhhcyBhIGhpZ2hlciBSXjJeIHZhbHVlIG1lYW5pbmcgdGhhdCBpdCBleHBsYWlucyBtb3JlIHZhcmlhYmlsaXR5IGluIFVuZW1wbG95bWVudCB0aGFuIG1vZGVsIG9uZS4gVGhlcmVmb3JlLCB1c2luZyBib3RoIEluY29tZSBhbmQgRWR1Y2F0aW9uIGFzIHByZWRpY3RvcnMgKG1vZGVsIDIpIGlzIGEgYmV0dGVyIG1vZGVsIHRoYW4ganVzdCBJbmNvbWUuIE1vZGVsIDIgYWxzbyBoYXMgYSBncmVhdGVyIEYgc3RhdGlzdGljICg0LjU5NykgYW5kIHNpZ25pZmljYW50IHAtdmFsdWUgKDAuMDE1MjUpIGNvbXBhcmVkIHRvIG1vZGVsIDIgKEY9MS41NDIsIHAtdmFsdWU6IDAuMjIwNSkuICoqDQogICAgDQoyLiBSZWdyZXNzICpVbmVtcCogb24gKkVkdWMqLCAqSW5jb21lKiBhbmQgKlBvb3IqLg0KICBhLiBEb2VzIHRoZSBtb2RlbCBmaXQgdGhlIGRhdGE/IEhvdyBkbyB5b3Uga25vdz8NCiAgICAqKlllcy4gRi1zdGF0aXN0aWMgaXMgc2lnbmlmaWNhbnQgKHB2YWx1ZSA9IDAuMDAwMzE3NCkqKg0KICBiLiBQbG90IENvb2sncyBEIGFnYWluc3Qgc3R1ZGVudGl6ZWQgZGVsZXRlZCByZXNpZHVhbHMuICoqRE9ORSoqDQogIGMuIElkZW50aWZ5IGFueSBvdXRsaWVycyB0aGF0IGV4aXN0Lg0KICAgICoqWWVzLiBUbyBmaW5kIGluZmx1ZW50aWFsIG91dGxpZXJzIEkgbG9va2VkIGZvciBhbiBzdGF0ZXMgd2l0aCBDb29rJ3MgZGlzdGFuY2UgZ3JlYXRlciB0aGFuIDMgdGltZXMgdGhlIG1lYW4gQ29vaydzIERpc3RhbmNlIHZhbHVlLiBBbHNvIHVzZWQgYSB0aHJlc2hvbGQgdmFsZSBvZiA0IC8gdGhlIG51bWJlciBvZiBvYnNlcnZhdGlvbnMgKDQ4KSB3aXRoIGFsbCBzdGF0ZXMgaGF2aW5nIGEgQ29vaydzIERpc3RhbmNlIHZhbHVlIGFib3ZlIHRoaXMgdGhyZXNob2xkIGFzIGFuIGluZmx1ZW50aWFsIG91dGxpZXIuIEJvdGggbWV0aG9kcyBpbmRpY2F0ZWQgdGhlIHNhbWUgNCBzdGF0ZXMgYXMgb3V0bGllcnM6ICpDYWxpZm9ybmlhLCBNYWluLCBSaG9kZSBJc2xhbmQqIGFuZCAqV2VzdCBWaXJnaW5pYSouKioNCiAgZC4gSWYgYW55IG91dGxpZXJzIGV4aXN0IGRldGVybWluZSBpZiB0aGV5IGFyZSBpbmZsdWVudGlhbCB1c2luZyB0aGUgRiBzdGF0aXN0aWMgYXBwcm9hY2ggKGkuZS4sIERjMSA+IDUwJSBwb2ludCBmcm9tIHRoZSBGIGRpc3RyaWJ1dGlvbikgZm9yIGRldGVybWluaW5nIGlmIHRoZSBjYXNlIGlzIGluZmx1ZW50aWFsICh0aGUgcHJvcG9ydGlvbiB0byBiZSBjdXQgb2ZmIGlzIDAuNSkuIA0KICAgKipUaGVyZSBhcmUgbm8gaW5mbHVlbnRpYWwgb3V0bGllcnMgdXNpbmcgdGhlIEYgc3RhdGlzdGkgYXBwcm9hY2guICoqDQoNCioqMS4gYS1jICYgZSoqDQpgYGB7cn0NCiNsaWJyYXJ5KGdnbSxwcGNvcixndm1sYSkNCmRhdGE8LSByZWFkLmNzdigiRjovR3JhZCBTY2hvb2wvRkFMTCAyMDE5L0VEUFMgOTQyX0NvcnJlbGF0aW9uIE1ldGhvZHMvQXNzaWdubW5ldCAyL2RhdGEuY3N2IikNCmZpdDEgPC0gbG0oVW5lbXAgfiBJbmNvbWUsIGRhdGE9ZGF0YSkNCnN1bW1hcnkoZml0MSkNCg0KZml0MiA8LSBsbShVbmVtcCB+IEluY29tZSArIEVkdWMsIGRhdGE9ZGF0YSkNCnN1bW1hcnkoZml0MikNCg0KcHBjb3I6OnNwY29yLnRlc3QoZGF0YSRVbmVtcCxkYXRhJEVkdWMsZGF0YSRJbmNvbWUpI3NlbWktcGFydGlhbCBvbmx5IGVkdWMNCnBwY29yOjpzcGNvci50ZXN0KGRhdGEkVW5lbXAsZGF0YSRJbmNvbWUsZGF0YSRFZHVjKSNzZW1pLXBhcnRpYWwgb25seSBpbmNvbWUNCg0KDQpgYGANCioqMS5kKioNCmBgYHtyfQ0KI1BhcnRpYWwgY29udHJvbGluZyBmb3IgRWR1Y2F0aW9uDQpwcGNvcjo6cGNvci50ZXN0KGRhdGEkVW5lbXAsZGF0YSRJbmNvbWUsZGF0YSRFZHVjKQ0KI2FsdGVybmF0aXZlIHZlcnNpb24NCnJlZzEgPC0gbG0oVW5lbXB+RWR1YyxkYXRhPWRhdGEpICAgIyBydW4gbGluZWFyIHJlZ3Jlc3Npb24NCnJlc2lkMSA8LSByZXNpZChyZWcxKSAgICAgIyBmaW5kIHRoZSByZXNpZHVhbHMgLSBVbmVtcCBmcmVlIG9mIEVkdWMNCnJlZzIgPC0gbG0oSW5jb21lIH4gRWR1YyxkYXRhPWRhdGEpICAgIyBzZWNvbmQgcmVncmVzc2lvbg0KcmVzaWQyIDwtIHJlc2lkKHJlZzIpICAgICAjIHNlY29uZCBzZXQgb2YgcmVzaWR1YWxzIC0gSW5jb21lIGZyZWUgb2YgRWR1Yw0KY29yKHJlc2lkMSwgcmVzaWQyKSAgICAgICAjIGNvcnJlbGF0aW9uIG9mIHJlc2lkdWFscyAtIHBhcnRpYWwgY29ycmVsYXRpb24NCg0KYGBgDQoqKjIuYSoqIA0KYGBge3J9DQpmaXQyLjE8LSBsbShVbmVtcCB+IEVkdWMrSW5jb21lK1Bvb3IsIGRhdGE9ZGF0YSkNCnN1bW1hcnkoZml0Mi4xKQ0KYGBgDQoqKjIuYi1jKioNCmBgYHtyfQ0KcmVzaWRmaXQyLjE8LXJlc2lkKGZpdDIuMSkNCnBsb3QoY29va3MuZGlzdGFuY2UoZml0Mi4xKSxyc3R1ZGVudChmaXQyLjEpKSsNCmFibGluZSh2PSA0L25yb3coZGF0YSksIGNvbD0icmVkIikrDQphYmxpbmUodj0wLjA3Mixjb2w9InJlZCIsbHR5PTIpDQogIGFibGluZSh2PS44MDEpDQoNCiNmaW5kaW5nIG91dGxpZXJzDQpkYXRhICU+JSBtdXRhdGUoY29va3M9Y29va3MuZGlzdGFuY2UoZml0Mi4xKSklPiUgZ3JvdXBfYnkoKSAlPiUgc3VtbWFyaXNlKG1lYW49bWVhbihjb29rcykpDQpkYXRhICU+JSBtdXRhdGUoY29va3M9Y29va3MuZGlzdGFuY2UoZml0Mi4xKSklPiUgZmlsdGVyKGNvb2tzPj0zKjAuMDI0KSNEYXNoZWQgbGluZSBpbmRpY2F0aW5nIHN0YXRlcyB3aXRoIENvb2sncyB2YWx1ZSAzIHRpbWVzIGdyZWF0ZXIgdGhhbiBtZWFuIENvb2sncyB2YWx1ZS4gDQpkYXRhICU+JSBtdXRhdGUoY29va3M9Y29va3MuZGlzdGFuY2UoZml0Mi4xKSklPiUgZmlsdGVyKGNvb2tzPj00L25yb3coZGF0YSkpI1NvbGlkIGxpbmUgaW5kaWNhdGluZyBzdGF0ZXMgb3ZlciBhIHRocmVzaG9sZCB2YWx1ZSBvZiA0IC8gbnVtYmVyIG9mIG9ic2VydmF0aW9ucy4gDQojYm90aCBtZXRob2RzIGluZGljYXRlIHRoZSBzYW1lIG91dGxpZXIgc3RhdGVzLiBDYWxpZm9ybmlhLCBNYWluLCBSaG9kZSBJc2xhbmQgYW5kIFdlc3QgVmlyZ2luaWEgDQpgYGANCg0KDQoqKjIuZCoqDQpgYGB7cn0NCiMgRXhjZWwgZnVuY3Rpb24gID1GSU5WKDAuNSwzLDQ0KSA9IDAuODAxIHByb3ZpZGluZyB0aGUgRENpID41MCUgT1INCnFmKC41LDMsNDQpDQoNCmRhdGEgJT4lIG11dGF0ZShjb29rcz1jb29rcy5kaXN0YW5jZShmaXQyLjEpKSU+JSBmaWx0ZXIoY29va3M+PS44MDEpI25vIGluZmx1ZW50aWFsIG91dGxpZXJzDQoNCmd2bW9kZWw8LWd2bG1hKGZpdDIuMSkNCnN1bW1hcnkoZ3Ztb2RlbCkNCmluZmx1ZW5jZVBsb3QoZml0Mi4xKQ0KYGBgDQoNCiAgIA==