Responses to the subsections of Question 1 are given first, followed by the code and output for all parts of the question.
The parameter estimates in the full model (i.e. prior to excluding outliers) imply that 1) a province with no Catholic residents and no males involved in agriculture as an occupation would be expected to educate 24.6% of its residents beyond primary school; 2) for every one-point increase in a province’s percentage of males involved in agriculture as an occupation, that province would be expected to exhibit a .292 percentage-point decrease in those educated beyond primary school; and 3) for every one-point increase in a province’s percentage of Catholic residents, that province would be expected to exhibit a .028 percentage-point increase in those educated beyond primary school. However, this last parameter did not reach significance (p = .334).
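As a worked illustration of how these estimates combine, the fitted equation can be written out and evaluated for a single province. The values below use the rounded coefficients from the output, so the result agrees with the reported fitted value only approximately.

\[
\widehat{Education} = 24.587 - 0.292 \cdot Agriculture + 0.028 \cdot Catholic
\]

\[
\text{V. De Geneve: } 24.587 - 0.292(1.2) + 0.028(42.34) \approx 25.4
\]

The observed value for this province is 53, which leaves a residual of roughly 27.6 percentage points.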
The expected values for the leverages and studentized deleted residuals are .064 (i.e. \(PA/n = 3/47\)) and 0, respectively.
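The expected leverage can be checked numerically, because the hat values of a linear model sum to the number of estimated parameters. The following is a minimal sketch, assuming R's built-in `swiss` dataset and the same model formula used in the output; the object names (`fit`, `n_params`, `n_obs`) are illustrative only.

```r
# Refit the full model on R's built-in swiss data
fit <- lm(Education ~ Agriculture + Catholic, data = swiss)

n_params <- length(coef(fit))  # PA: intercept plus two slopes
n_obs    <- nrow(swiss)        # n: number of provinces

expected_lever <- n_params / n_obs      # PA / n
observed_mean  <- mean(hatvalues(fit))  # mean of the observed leverages

# The two quantities should agree up to floating-point error
round(c(expected = expected_lever, observed = observed_mean), 4)
```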
The datapoint corresponding to province V. De Geneve is potentially problematic for a number of reasons. For one, this datapoint clearly stands out in the 3d plot above: its education value places it well above all other points, which exist in two poorly-educated clusters (one Catholic and one Protestant). Moreover, a brief foray into Swiss history reveals that the historical canton of Geneva housed the University of Geneva, then by far the largest in the country. Given this fact, it seems likely that, at least educationally, the province of V. De Geneve was quite unlike its 1888 counterparts.
Statistically, this province also appears to be an outlier. Its rstudent value is by far the highest in the dataset and remains highly significant after Bonferroni correction (\(p_{corrected} = .00059\)). Its Cook’s D (.903) is well above the threshold for this measure (\(D_{threshold} = 4/47 = .0851\)) and is roughly nine times the next-largest Cook’s D in the dataset (.100). Importantly, no other datapoint shows consistent evidence of being an outlier: only the V. De Geneve datapoint attains a significant rstudent, and the only other datapoint above the Cook’s D threshold, La Chauxdfnd, exceeds it only slightly. Visually, no datapoint other than that for V. De Geneve is easily distinguishable from the points in the poorly-educated Catholic and Protestant clusters.
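The two screening rules used here can be sketched in base R. This is an illustration under stated assumptions, not the exact internals of `outlierTest`: it assumes two-sided p-values from a t distribution with \(n - PA - 1\) degrees of freedom, a Bonferroni correction that multiplies each p-value by \(n\), and the \(4/n\) rule of thumb for Cook’s D. The object names are illustrative.

```r
fit <- lm(Education ~ Agriculture + Catholic, data = swiss)

n_obs    <- nrow(swiss)
n_params <- length(coef(fit))

# Studentized deleted residuals and their two-sided p-values
t_del   <- rstudent(fit)
p_unadj <- 2 * pt(abs(t_del), df = n_obs - n_params - 1, lower.tail = FALSE)

# Bonferroni correction: multiply by the number of tests, cap at 1
p_bonf <- pmin(1, n_obs * p_unadj)

# Cook's D compared against the 4/n rule of thumb
cook_d         <- cooks.distance(fit)
cook_threshold <- 4 / n_obs

flag_rstudent <- names(which(p_bonf < .05))
flag_cook     <- names(which(cook_d > cook_threshold))

flag_rstudent  # provinces with a significant corrected rstudent
flag_cook      # provinces above the Cook's D threshold
```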
The final two models in the output show the regression results for models excluding the V. De Geneve datapoint (or explicitly modeling it as a separate variable).
The parameter estimates in the model excluding V. De Geneve imply that 1) a province with no Catholic residents and no males involved in agriculture as an occupation would be expected to educate 20.6% of its residents beyond primary school; 2) for every one-point increase in a province’s percentage of males involved in agriculture as an occupation, that province would be expected to exhibit a .211 percentage-point decrease in those educated beyond primary school; and 3) for every one-point increase in a province’s percentage of Catholic residents, that province would be expected to exhibit a .010 percentage-point increase in those educated beyond primary school. As in the full model, this last parameter did not reach significance (p = .67).
Importantly, these changes are entirely expected given the demographic characteristics of V. De Geneve (that is, V. De Geneve is a highly educated, very low-agriculture province whose Catholic share, 42.3%, is well above that of most other low-agriculture provinces). In particular, excluding such a province results in a subset of provinces with a substantially lower mean education level: this explains why the intercept estimate decreases from 24.6 to 20.6. Likewise, the combination of very high education and a comparatively high Catholic share in V. De Geneve helps explain why the “Catholic” slope estimate decreased from .028 to .010. Finally, the low level of agriculture in V. De Geneve causes the negative relationship between agriculture and education to be overestimated in the full model: thus, excluding the province moves the “Agriculture” slope estimate toward zero, from -.292 to -.211.
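The two ways of handling V. De Geneve (dropping the row versus giving it its own dummy variable) yield the same estimates for the remaining parameters, and the t statistic for the dummy equals the province's studentized deleted residual in the full model. A minimal sketch of that check follows, assuming the built-in `swiss` data; the object names (`d`, `fit_dummy`, `fit_subset`) are illustrative.

```r
d <- swiss
d$obs_DeGeneve <- as.numeric(rownames(d) == "V. De Geneve")

fit_full   <- lm(Education ~ Agriculture + Catholic, data = d)
fit_dummy  <- lm(Education ~ Agriculture + Catholic + obs_DeGeneve, data = d)
fit_subset <- lm(Education ~ Agriculture + Catholic,
                 data = d[d$obs_DeGeneve == 0, ])

# Intercept and slopes agree between the dummy-coded and subset models
same_coefs <- all.equal(coef(fit_dummy)[1:3], coef(fit_subset))

# t for the dummy equals the studentized deleted residual in the full model
t_dummy <- summary(fit_dummy)$coefficients["obs_DeGeneve", "t value"]
t_del   <- unname(rstudent(fit_full)["V. De Geneve"])

same_coefs
c(t_dummy = t_dummy, rstudent = t_del)
```

Because the dummy absorbs the single observation entirely, that observation contributes nothing to the error term, which is why the Error SS and RMSE are identical in the last two models of the output.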
###################################################
#build and estimate full model to identify outliers
###################################################
swissTest = lm(Education ~ Agriculture + Catholic, data = swiss)
mcSummary(swissTest)
## lm(formula = Education ~ Agriculture + Catholic, data = swiss)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 1792.828 2 896.414 0.422 16.032 0
## Error 2460.151 44 55.913
## Corr Total 4252.979 46 92.456
##
## RMSE AdjEtaSq
## 7.477 0.395
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 24.587 2.693 9.132 4662.443 0.655 NA 19.161 30.014 0.000
## Agriculture -0.292 0.053 -5.501 1692.149 0.408 0.839 -0.398 -0.185 0.000
## Catholic 0.028 0.029 0.977 53.406 0.021 0.839 -0.030 0.086 0.334
#plotting
plot3d(swiss$Catholic, swiss$Agriculture, swiss$Education, type="p", col="red", xlab="Catholic", ylab="Agriculture", zlab="Education", size=5, lwd=15)
#V. De Geneve is clearly separated from other provinces --> suggests outlier status
leveragePlots(swissTest)
#retrieve outlier diagnostics
fitted <- fitted(swissTest)
resid <- resid(swissTest)
rstudent <- rstudent(swissTest)
lever <- hatvalues(swissTest)
cookd <- cooks.distance(swissTest) #4/47 = .0851
diagnostics1 <- cbind(swiss[, c(2, 5, 4)], fitted, resid, rstudent, lever, cookd)
diagnostics1[with(diagnostics1, order(-cookd)), ] #print in order of descending cookd
## Agriculture Catholic Education fitted resid
## V. De Geneve 1.2 42.34 53 25.431540 27.5684595
## La Chauxdfnd 7.7 13.79 11 22.731344 -11.7313439
## ValdeTravers 18.7 8.65 7 19.379514 -12.3795145
## Neuchatel 17.6 16.92 32 19.933450 12.0665495
## Franches-Mnt 39.7 93.40 5 15.647649 -10.6476489
## Porrentruy 35.3 90.57 7 16.850574 -9.8505741
## Rive Gauche 27.7 58.33 29 18.156914 10.8430865
## Rive Droite 46.6 50.43 29 12.424132 16.5758677
## Lausanne 19.4 12.11 28 19.273029 8.7269713
## Courtelary 17.0 9.96 12 19.912068 -7.9120677
## Le Locle 16.7 11.22 13 20.035065 -7.0350650
## Lavaux 73.0 2.84 9 3.385425 5.6145747
## Grandson 34.0 3.30 8 14.768172 -6.7681719
## Monthey 64.9 98.22 3 8.436972 -5.4369716
## Moutier 36.5 33.77 7 14.898727 -7.8987272
## Val de Ruz 37.6 4.97 7 13.765756 -6.7657557
## Gruyere 53.3 97.67 7 11.803238 -4.8032378
## Moudon 55.1 4.52 3 8.651243 -5.6512427
## Aigle 62.0 8.52 12 6.752485 5.2475150
## Avenches 60.7 4.43 12 7.016122 4.9838784
## Delemont 45.1 84.84 9 13.831943 -4.8319434
## Entremont 84.9 99.68 6 2.647497 3.3525026
## St Maurice 75.9 99.06 9 5.253804 3.7461956
## Sion 63.1 96.83 13 8.922526 4.0774737
## Oron 71.2 2.40 1 3.897774 -2.8977741
## Paysd'enhaut 63.5 2.56 3 6.147088 -3.1470881
## Rolle 60.8 7.72 10 7.079761 2.9202393
## Veveyse 64.5 98.61 6 8.564584 -2.5645844
## Orbe 54.1 4.20 6 8.933750 -2.9337500
## Morges 59.8 5.23 10 7.301064 2.6989355
## Neuveville 43.5 5.16 15 12.051072 2.9489280
## Aubonne 67.5 2.27 7 4.972778 2.0272217
## Echallens 72.6 24.20 2 4.104484 -2.1044835
## Yverdon 49.5 6.10 8 10.328388 -2.3283883
## Martigwy 78.2 98.96 6 4.580459 1.4195411
## Vevey 26.8 18.46 19 17.294785 1.7052150
## Nyone 50.9 15.14 12 10.175210 1.8247899
## Boudry 38.4 5.62 12 13.550862 -1.5508625
## Sarine 45.2 91.38 13 13.987247 -0.9872466
## Herens 89.7 100.00 2 1.257166 0.7428338
## Cossonay 69.3 2.82 5 4.463532 0.5364680
## Conthey 85.9 99.71 2 2.356811 -0.3568109
## Glane 67.8 97.16 8 7.561630 0.4383696
## Sierre 84.6 99.46 3 2.728752 0.2712478
## Broye 70.2 92.85 7 6.740391 0.2596087
## La Vallee 15.2 2.15 20 20.216550 -0.2165504
## Payerne 58.1 5.23 8 7.796670 0.2033301
## rstudent lever cookd
## V. De Geneve 4.93431543 0.14546339 9.025827e-01
## La Chauxdfnd -1.68749786 0.09933278 1.004683e-01
## ValdeTravers -1.75247710 0.06551995 6.855046e-02
## Neuchatel 1.70691150 0.06734022 6.719916e-02
## Franches-Mnt -1.50629042 0.08054966 6.439976e-02
## Porrentruy -1.39271777 0.08617112 5.969318e-02
## Rive Gauche 1.51834435 0.06081060 4.832239e-02
## Rive Droite 2.35747761 0.02421594 4.165969e-02
## Lausanne 1.21211055 0.06299852 3.257966e-02
## Courtelary -1.09960993 0.06963582 3.002454e-02
## Le Locle -0.97518368 0.07024144 2.397501e-02
## Lavaux 0.78234404 0.08697428 1.960777e-02
## Grandson -0.92356936 0.04271542 1.272961e-02
## Monthey -0.74704064 0.06215186 1.245297e-02
## Moutier -1.07435168 0.02986573 1.180302e-02
## Val de Ruz -0.92139589 0.03896833 1.151433e-02
## Gruyere -0.66037762 0.06594488 1.039618e-02
## Moudon -0.77011454 0.04581406 9.580548e-03
## Aigle 0.71666137 0.05170677 9.439285e-03
## Avenches 0.68127511 0.05450530 9.028697e-03
## Delemont -0.66103217 0.05659940 8.851825e-03
## Entremont 0.46508222 0.08722321 7.014727e-03
## St Maurice 0.51547986 0.07116269 6.901184e-03
## Sion 0.55802849 0.06003867 6.735395e-03
## Oron -0.40065797 0.08228796 4.891275e-03
## Paysd'enhaut -0.43066048 0.06260290 4.206633e-03
## Rolle 0.39693798 0.05051812 2.848920e-03
## Veveyse -0.35070044 0.06263559 2.795168e-03
## Orbe -0.39762098 0.04498946 2.531106e-03
## Morges 0.36702452 0.05188736 2.506666e-03
## Neuveville 0.39811883 0.03748514 2.097695e-03
## Aubonne 0.27852503 0.07239409 2.061333e-03
## Echallens -0.28692140 0.05788806 1.722046e-03
## Yverdon -0.31433140 0.03874475 1.355239e-03
## Martigwy 0.19513756 0.07422530 1.040416e-03
## Vevey 0.23091372 0.04565872 8.690490e-04
## Nyone 0.24531130 0.03148873 6.664096e-04
## Boudry -0.20915822 0.03806662 5.898893e-04
## Sarine -0.13510970 0.06638194 4.425195e-04
## Herens 0.10347592 0.09901084 4.012330e-04
## Cossonay 0.07379968 0.07627979 1.533861e-04
## Conthey -0.04943719 0.08945777 8.189620e-05
## Glane 0.05983316 0.06170401 8.029420e-05
## Sierre 0.03751799 0.08636209 4.538130e-05
## Broye 0.03536539 0.05810824 2.631746e-05
## La Vallee -0.02979578 0.07674051 2.516892e-05
## Payerne 0.02756758 0.04913199 1.339358e-05
#test outliers
outlierTest(swissTest) #concluding/confirming that data point for V. De Geneve is outlier
## rstudent unadjusted p-value Bonferroni p
## V. De Geneve 4.934315 1.2549e-05 0.00058978
######################################################
#build and estimate model without V. De Geneve outlier
######################################################
# compute a dummy code for the V. De Geneve observation
swiss_no_outlier = swiss #create new dataset ultimately excluding outlier province
swiss_no_outlier$obs_DeGeneve = as.numeric(rownames(swiss_no_outlier)=="V. De Geneve")
swissTest_outlier = lm(formula = Education ~ Agriculture + Catholic + obs_DeGeneve, data = swiss_no_outlier)
mcSummary(swissTest_outlier)
## lm(formula = Education ~ Agriculture + Catholic + obs_DeGeneve,
## data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 2682.222 3 894.074 0.631 24.476 0
## Error 1570.757 43 36.529
## Corr Total 4252.979 46 92.456
##
## RMSE AdjEtaSq
## 6.044 0.605
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.563 2.324 8.848 2859.651 0.645 NA 15.876 25.250 0.00
## Agriculture -0.211 0.046 -4.602 773.690 0.330 0.733 -0.303 -0.119 0.00
## Catholic 0.010 0.024 0.429 6.716 0.004 0.819 -0.037 0.058 0.67
## obs_DeGeneve 32.261 6.538 4.934 889.394 0.362 0.873 19.076 45.447 0.00
swissTest_no_outlier = lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0, ])
mcSummary(swissTest_no_outlier)
## lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve ==
## 0, ])
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 878.047 2 439.024 0.359 12.018 0
## Error 1570.757 43 36.529
## Corr Total 2448.804 45 54.418
##
## RMSE AdjEtaSq
## 6.044 0.329
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.563 2.324 8.848 2859.651 0.645 NA 15.876 25.250 0.00
## Agriculture -0.211 0.046 -4.602 773.690 0.330 0.819 -0.303 -0.119 0.00
## Catholic 0.010 0.024 0.429 6.716 0.004 0.819 -0.037 0.058 0.67
Responses to the subsections of Question 2 are given first, followed by the code and output for all parts of the question.
The diagnostic plots are given in the output below.
A number of concerns arise when inspecting the diagnostic plots. Perhaps most importantly, the “Residuals vs. Fitted” plot exhibits a funnel shape and the Scale-Location trend line has a substantially positive slope. These observations suggest that the model errors are heteroscedastic: the variance of the errors increases with the predicted values. Second, the Q-Q plot diverges from theoretical expectation for the largest residuals, which suggests that the residuals are not normally distributed but substantially positively skewed. Finally, the “Residuals vs. Leverage” plot confirms that the V. De Geneve datapoint is a likely outlier with a Cook’s D well above the acceptable threshold.
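The visual impressions described above can be supplemented with simple numerical checks. The sketch below is not part of the original analysis and is offered only as one possible cross-check: it regresses the square root of the absolute standardized residuals on the fitted values (a linear analogue of the Scale-Location trend) and runs a Shapiro-Wilk test on the residuals. The object names are illustrative.

```r
fit <- lm(Education ~ Agriculture + Catholic, data = swiss)

# Scale-Location analogue: does residual spread rise with the fitted values?
spread       <- sqrt(abs(rstandard(fit)))
fitted_vals  <- fitted(fit)
spread_trend <- unname(coef(lm(spread ~ fitted_vals))["fitted_vals"])

# Shapiro-Wilk test of normality on the raw residuals
sw <- shapiro.test(resid(fit))

spread_trend  # a positive slope is consistent with a funnel shape
sw$p.value    # a small p-value is consistent with non-normal residuals
```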
Note: The Swiss Education percent variable was converted to a proportion so that the transformations described below would be mathematically possible.
The arcsin, logit, folded-root and square-root transformations were applied to the proportion of a province’s residents educated beyond primary school, and each transformed criterion was regressed on the percentage of Catholic residents and the percentage of males involved in agriculture as an occupation. Every model was fit both with and without the V. De Geneve outlier. Diagnostic plots indicated that each of the transformations corrected many of the assumption violations described in Question 2b - this was especially true when excluding the V. De Geneve outlier.
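For reference, the four transformations as they are computed in the code below can be written as follows, where \(p = Education/100\) is the proportion of residents educated beyond primary school:

\[
\begin{aligned}
\text{arcsine:} \quad & \arcsin\left(\sqrt{p}\right) \\
\text{logit:} \quad & \ln\left(\frac{p}{1-p}\right) \\
\text{folded root:} \quad & \sqrt{p} - \sqrt{1-p} \\
\text{square root:} \quad & \sqrt{p}
\end{aligned}
\]

The arcsine, logit and folded-root forms are only defined for values between 0 and 1 (strictly between, in the case of the logit), which is why the percentage was first converted to a proportion.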
The square root-transformed model excluding the V. De Geneve outlier arguably yielded the best corrections and is therefore taken as the final version of the analysis. This model resulted in slightly different parameter estimates from the original. The intercept, for instance, decreased from 24.6% to 21.3% after back-transformation to the untransformed criterion variable (i.e. \(.462^2 = .213\)). The “Agriculture” slope estimate remained negative and reliable (-.292 percentage points in the original version and -.003 square-root-proportion units in the final version); because the final model is linear in the square root of the proportion, its slope does not convert to a single constant change in percentage points. Notably, however, the influence of Catholicism on education levels became almost imperceptible in the final model and registered an even larger p-value (.683 vs. .334 in the original).
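Back-transforming predictions from the square-root model makes the point about the slope concrete: squaring the predicted value recovers a proportion, and the implied change in percentage points per unit of agriculture depends on where on the scale it is evaluated. The sketch below assumes the built-in `swiss` data; `pred_pct` is a hypothetical helper introduced only for this illustration, and squaring a prediction gives a back-transformed fitted value rather than an exact conditional mean.

```r
# Final model: square-root criterion, V. De Geneve excluded
d <- swiss[rownames(swiss) != "V. De Geneve", ]
d$sqrtEDU <- sqrt(d$Education / 100)
fit_sqrt <- lm(sqrtEDU ~ Agriculture + Catholic, data = d)

# Hypothetical helper: predicted education in percent, back-transformed
pred_pct <- function(agriculture, catholic = 0) {
  new_data <- data.frame(Agriculture = agriculture, Catholic = catholic)
  unname(100 * predict(fit_sqrt, newdata = new_data)^2)
}

intercept_pct <- pred_pct(0)                  # no agriculture, no Catholics
drop_at_0     <- pred_pct(0)  - pred_pct(1)   # change from 0% to 1% agriculture
drop_at_50    <- pred_pct(50) - pred_pct(51)  # change from 50% to 51% agriculture

round(c(intercept = intercept_pct,
        drop_at_0 = drop_at_0,
        drop_at_50 = drop_at_50), 2)
```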
The Swiss R dataset includes demographic variables for each of the country’s 47 provinces circa 1888. In particular, the dataset includes variables for the percentage of residents educated beyond primary school (percEDU), the percentage of Catholic residents (percCATH) and the percentage of males involved in agriculture as an occupation (percAG). In an initial model regressing percEDU on percCATH and percAG, diagnostic plots revealed substantial heteroscedasticity and non-normality of errors as well as an outlier province, the historically well-educated V. De Geneve. To correct for these problems and return interpretable parameter estimates, a corrected model regressed the square root of the proportion of educated residents on percCATH and percAG while excluding the V. De Geneve outlier province. This model estimated that a province with no Catholic residents or agriculture could be expected to educate about 21.3% of its population (F(1, 43) = 199.5, PRE = .823, p < .001) and that each one-percentage-point increase in percAG decreased the predicted square root of the education proportion by .003 (F(1, 43) = 25.5, PRE = .372, p < .001). However, above and beyond agriculture percentages, the percentage of Catholic residents did not reliably predict educational levels (F(1, 43) = .17, PRE = .004, p = .683).
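The F and PRE values reported for individual parameters follow from the t statistics in the coefficient table of the final model. For a single-parameter test, \(F = t^2\), and with 43 error degrees of freedom \(PRE = F/(F + 43)\):

\[
\begin{aligned}
\text{Intercept:} \quad & F = 14.125^2 \approx 199.5, & PRE &= \frac{199.5}{199.5 + 43} \approx .823 \\
\text{Agriculture:} \quad & F = (-5.049)^2 \approx 25.5, & PRE &= \frac{25.5}{25.5 + 43} \approx .372 \\
\text{Catholic:} \quad & F = 0.412^2 \approx 0.17, & PRE &= \frac{0.17}{0.17 + 43} \approx .004
\end{aligned}
\]

These values match the EtaSq column of the coefficient table for the square-root model excluding V. De Geneve.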
##############################
#original model for comparison
swiss_no_outlier = swiss #create new dataset ultimately excluding outlier province
swiss_no_outlier$obs_DeGeneve = as.numeric(rownames(swiss_no_outlier)=="V. De Geneve")
swiss_no_outlier$EDU_proportions = swiss_no_outlier$Education/100
swissTest = lm(Education ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest)
## lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 1792.828 2 896.414 0.422 16.032 0
## Error 2460.151 44 55.913
## Corr Total 4252.979 46 92.456
##
## RMSE AdjEtaSq
## 7.477 0.395
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 24.587 2.693 9.132 4662.443 0.655 NA 19.161 30.014 0.000
## Agriculture -0.292 0.053 -5.501 1692.149 0.408 0.839 -0.398 -0.185 0.000
## Catholic 0.028 0.029 0.977 53.406 0.021 0.839 -0.030 0.086 0.334
#generating plots for assessing types of error
plot(swissTest)
##############################
##############################
#*logit* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$logitEDU = logit(swiss_no_outlier$EDU_proportions)
swissTest_logit = lm(logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_logit)
## lm(formula = logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 17.662 2 8.831 0.473 19.777 0
## Error 19.647 44 0.447
## Corr Total 37.309 46 0.811
##
## RMSE AdjEtaSq
## 0.668 0.449
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -1.019 0.241 -4.233 8.001 0.289 NA -1.503 -0.534 0.000
## Agriculture -0.029 0.005 -6.028 16.225 0.452 0.839 -0.038 -0.019 0.000
## Catholic 0.002 0.003 0.775 0.268 0.013 0.839 -0.003 0.007 0.443
plot(swissTest_logit)
#*logit* transform data and re-run model+diagnostics w/o outlier
swiss_no_outlier$logitEDU = logit(swiss_no_outlier$EDU_proportions)
swissTest_logit_NO = lm(logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_logit_NO)
## lm(formula = logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve ==
## 0, ])
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 12.649 2 6.324 0.409 14.892 0
## Error 18.261 43 0.425
## Corr Total 30.910 45 0.687
##
## RMSE AdjEtaSq
## 0.652 0.382
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -1.177 0.251 -4.698 9.375 0.339 NA -1.683 -0.672 0.000
## Agriculture -0.025 0.005 -5.132 11.186 0.380 0.819 -0.035 -0.015 0.000
## Catholic 0.001 0.003 0.504 0.108 0.006 0.819 -0.004 0.006 0.617
plot(swissTest_logit_NO)
##############################
##############################
#*folded root* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$foldRootEDU = sqrt(swiss_no_outlier$EDU_proportions)-sqrt(1-swiss_no_outlier$EDU_proportions)
swissTest_foldRoot = lm(foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_foldRoot)
## lm(formula = foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.663 2 0.332 0.46 18.758 0
## Error 0.778 44 0.018
## Corr Total 1.441 46 0.031
##
## RMSE AdjEtaSq
## 0.133 0.436
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -0.371 0.048 -7.744 1.060 0.577 NA -0.467 -0.274 0.000
## Agriculture -0.006 0.001 -5.908 0.617 0.442 0.839 -0.007 -0.004 0.000
## Catholic 0.000 0.001 0.891 0.014 0.018 0.839 -0.001 0.001 0.378
plot(swissTest_foldRoot)
#*folded root* transform data and re-run model+diagnostics w/o outlier
swissTest_foldRoot_NO = lm(foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_foldRoot_NO)
## lm(formula = foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve ==
## 0, ])
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.383 2 0.192 0.394 13.97 0
## Error 0.590 43 0.014
## Corr Total 0.973 45 0.022
##
## RMSE AdjEtaSq
## 0.117 0.366
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -0.429 0.045 -9.528 1.246 0.679 NA -0.520 -0.338 0.000
## Agriculture -0.004 0.001 -4.949 0.336 0.363 0.819 -0.006 -0.003 0.000
## Catholic 0.000 0.000 0.425 0.002 0.004 0.819 -0.001 0.001 0.673
plot(swissTest_foldRoot_NO)
##############################
##############################
#*arcsine* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$arcsinEDU = asin(sqrt(swiss_no_outlier$EDU_proportions))
swissTest_arcsin = lm(arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_arcsin)
## lm(formula = arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.398 2 0.199 0.467 19.282 0
## Error 0.455 44 0.010
## Corr Total 0.853 46 0.019
##
## RMSE AdjEtaSq
## 0.102 0.443
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 0.521 0.037 14.232 2.092 0.822 NA 0.447 0.595 0.000
## Agriculture -0.004 0.001 -5.979 0.369 0.448 0.839 -0.006 -0.003 0.000
## Catholic 0.000 0.000 0.862 0.008 0.017 0.839 0.000 0.001 0.393
plot(swissTest_arcsin)
#*arcsine* transform data and re-run model+diagnostics w/o outlier
swissTest_arcsin_NO = lm(arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_arcsin_NO)
## lm(formula = arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve ==
## 0, ])
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.239 2 0.120 0.4 14.332 0
## Error 0.359 43 0.008
## Corr Total 0.599 45 0.013
##
## RMSE AdjEtaSq
## 0.091 0.372
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 0.479 0.035 13.634 1.553 0.812 NA 0.408 0.550 0.000
## Agriculture -0.003 0.001 -5.010 0.210 0.369 0.819 -0.005 -0.002 0.000
## Catholic 0.000 0.000 0.423 0.001 0.004 0.819 -0.001 0.001 0.674
plot(swissTest_arcsin_NO)
##############################
##############################
#*sqrt* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$sqrtEDU = sqrt(swiss_no_outlier$EDU_proportions)
swissTest_sqrt = lm(sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_sqrt)
## lm(formula = sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.331 2 0.166 0.472 19.675 0
## Error 0.370 44 0.008
## Corr Total 0.702 46 0.015
##
## RMSE AdjEtaSq
## 0.092 0.448
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 0.495 0.033 14.972 1.887 0.836 NA 0.428 0.561 0.000
## Agriculture -0.004 0.001 -6.022 0.305 0.452 0.839 -0.005 -0.003 0.000
## Catholic 0.000 0.000 0.805 0.005 0.015 0.839 0.000 0.001 0.425
plot(swissTest_sqrt)
#*sqrt* transform data and re-run model+diagnostics w/o outlier
swissTest_sqrt_NO = lm(sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_sqrt_NO)
## lm(formula = sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve ==
## 0, ])
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.211 2 0.105 0.404 14.59 0
## Error 0.311 43 0.007
## Corr Total 0.521 45 0.012
##
## RMSE AdjEtaSq
## 0.085 0.377
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 0.462 0.033 14.125 1.441 0.823 NA 0.396 0.528 0.000
## Agriculture -0.003 0.001 -5.049 0.184 0.372 0.819 -0.005 -0.002 0.000
## Catholic 0.000 0.000 0.412 0.001 0.004 0.819 -0.001 0.001 0.683
plot(swissTest_sqrt_NO)
##############################