Question 1

Responses to the subsections of Question 1 are given first; the code and output for all parts of Question 1 follow them.

Question 1a

The parameter estimates in the full model (i.e. prior to excluding outliers) imply that 1) a province with no Catholic residents and no males involved in agriculture as an occupation would be expected to educate 24.6% of its residents beyond primary school; 2) for every one-point increase in a province’s percentage of males involved in agriculture as an occupation, that province would be expected to exhibit a .292 percentage-point decrease in those educated beyond primary school; and 3) for every one-point increase in a province’s percentage of Catholic residents, that province would be expected to exhibit a .028 percentage-point increase in those educated beyond primary school. However, this last parameter did not reach significance (p = .334).
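As a minimal sketch of how these estimates translate into predictions, the following refits the full model on R's built-in `swiss` data and computes one predicted value by hand and with `predict()`. The object names (`fit_full`, `hypothetical`) and the example province values are illustrative assumptions, not part of the original analysis.

```r
# Refit the full model on the built-in swiss data (47 provinces)
fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)
b <- coef(fit_full)

# Intercept: predicted Education when Agriculture = 0 and Catholic = 0
round(b["(Intercept)"], 1)

# Predicted Education for a hypothetical province (illustrative values only)
hypothetical <- data.frame(Agriculture = 50, Catholic = 10)
by_hand <- b["(Intercept)"] + b["Agriculture"] * 50 + b["Catholic"] * 10
via_predict <- predict(fit_full, newdata = hypothetical)
c(by_hand = unname(by_hand), via_predict = unname(via_predict))
```

The two predictions agree because `predict()` applies the same linear combination of coefficients.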

Question 1b

The expected values for the leverages and studentized deleted residuals are .064 (i.e. \(PA/n = 3/47\)) and 0, respectively.
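The expected leverage can be checked numerically. The sketch below assumes the same full model refit on the built-in `swiss` data; the names `n`, `p` and `h` are illustrative.

```r
fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)
n <- nrow(swiss)             # 47 provinces
p <- length(coef(fit_full))  # 3 estimated parameters (PA)
h <- hatvalues(fit_full)

# Leverages sum to the number of parameters, so their mean is p / n
c(mean_leverage = mean(h), p_over_n = p / n)
```

Under the model's assumptions, the studentized deleted residuals follow a t distribution with \(n - PA - 1 = 43\) degrees of freedom, which is centered at 0.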

Question 1c

The datapoint corresponding to the province V. De Geneve is potentially problematic for a number of reasons. For one, this datapoint clearly stands out in the 3D plot of the data: its education value places it well above all other points, which fall into two poorly educated clusters (one Catholic and one Protestant). Moreover, a brief foray into Swiss history reveals that the historical canton of Geneva housed the University of Geneva, then by far the largest in the country. Given this fact, it seems likely that, at least educationally, the province of V. De Geneve was quite unlike its 1888 counterparts.

Statistically, this province also appears to be an outlier. Its rstudent value (4.93) is by far the highest in the dataset and remains highly significant after Bonferroni correction (\(p_{corrected} = .00059\)). Its Cook’s D (.903) is well above the conventional threshold for this measure (\(D_{threshold} = 4/47 = .0851\)) and is roughly nine times the next-largest Cook’s D in the dataset. Importantly, no other datapoint shows consistent evidence of being an outlier: only the V. De Geneve datapoint attains a significant rstudent after correction, and the only other datapoint above the Cook’s D threshold, La Chauxdfnd (D = .100), exceeds it only slightly. Visually, no datapoint other than that for V. De Geneve is easily distinguishable from the points in the poorly educated Catholic and Protestant clusters.
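The Bonferroni-corrected p-value and the Cook's D screen can be reproduced with base R alone. This is a sketch under the assumption that the correction multiplies each two-sided p-value by \(n = 47\), which matches the `outlierTest` output reported in this document; the names `p_raw`, `p_bonf` and `flagged` are illustrative.

```r
fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)
n <- nrow(swiss)
rs <- rstudent(fit_full)

# Two-sided p-value for each studentized deleted residual (df = residual df - 1)
p_raw <- 2 * pt(abs(rs), df = df.residual(fit_full) - 1, lower.tail = FALSE)
p_bonf <- pmin(1, p_raw * n)  # Bonferroni: multiply by the number of tests

cd <- cooks.distance(fit_full)
flagged <- data.frame(rstudent = rs, p_bonf = p_bonf, cookd = cd,
                      over_cook = cd > 4 / n)

# Provinces flagged by either criterion
flagged[flagged$p_bonf < .05 | flagged$over_cook, ]
```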

Question 1d

The final two models in the output show the regression results for a model that explicitly codes the V. De Geneve datapoint with its own dummy variable and for a model that excludes the datapoint altogether.
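The two approaches yield the same estimates for the shared parameters, and the t statistic for the dummy variable equals the studentized deleted residual of the flagged case. The following sketch illustrates this on the built-in `swiss` data; the object names `d`, `fit_dummy` and `fit_drop` are illustrative.

```r
d <- swiss
d$obs_DeGeneve <- as.numeric(rownames(d) == "V. De Geneve")

fit_dummy <- lm(Education ~ Agriculture + Catholic + obs_DeGeneve, data = d)
fit_drop  <- lm(Education ~ Agriculture + Catholic,
                data = d[d$obs_DeGeneve == 0, ])

# Shared coefficients are identical across the two models
cbind(dummy = coef(fit_dummy)[1:3], dropped = coef(fit_drop))

# The dummy's t statistic equals that case's rstudent in the full model
fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)
t_dummy <- summary(fit_dummy)$coefficients["obs_DeGeneve", "t value"]
rs_geneve <- unname(rstudent(fit_full)["V. De Geneve"])
c(t_dummy = t_dummy, rstudent = rs_geneve)
```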

Question 1e

The parameter estimates in the model excluding the V. De Geneve outlier imply that 1) a province with no Catholic residents and no males involved in agriculture as an occupation would be expected to educate 20.6% of its residents beyond primary school; 2) for every one-point increase in a province’s percentage of males involved in agriculture as an occupation, that province would be expected to exhibit a .211 percentage-point decrease in those educated beyond primary school; and 3) for every one-point increase in a province’s percentage of Catholic residents, that province would be expected to exhibit a .010 percentage-point increase in those educated beyond primary school. As in the full model, this last parameter did not reach significance (p = .67).

Importantly, these changes are expected given the demographic characteristics of V. De Geneve: it is a highly educated (53%), low-agriculture (1.2%) province whose Catholic share (42.34%) is far higher than that of any other province with under 20% agriculture. In particular, excluding this province removes an extremely well-educated case located near zero agriculture and leaves a subset of provinces with a lower mean education level: this explains why the intercept estimate decreases from 24.6 to 20.6. Likewise, the comparatively high level of Catholicism in V. De Geneve for a low-agriculture province explains why the “Catholic” slope estimate decreased from .028 to .010. Finally, the low level of agriculture in V. De Geneve causes the strength of the negative relationship between agriculture and education to be overestimated in the full model: thus, excluding the province moves the “Agriculture” slope estimate toward zero, from -.292 to -.211.
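The size of each change, and the values that set V. De Geneve apart, can be tabulated directly. This is a sketch on the built-in `swiss` data; `is_geneve`, `vars` and `change` are illustrative names.

```r
is_geneve <- rownames(swiss) == "V. De Geneve"
vars <- c("Education", "Agriculture", "Catholic")

# V. De Geneve's values next to the means of the remaining 46 provinces
rbind(geneve = unlist(swiss[is_geneve, vars]),
      others_mean = colMeans(swiss[!is_geneve, vars]))

fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)
fit_drop <- lm(Education ~ Agriculture + Catholic, data = swiss[!is_geneve, ])

# Full-model estimate minus the estimate with V. De Geneve removed
change <- coef(fit_full) - coef(fit_drop)
round(rbind(full = coef(fit_full), dropped = coef(fit_drop), change = change), 3)
```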

###################################################
#build and estimate full model to identify outliers
###################################################
swissTest = lm(Education ~ Agriculture + Catholic, data = swiss)
mcSummary(swissTest)

## Loading required package: car

## Warning: package 'car' was built under R version 3.6.3

## Loading required package: carData

## Warning: package 'carData' was built under R version 3.6.3

## Registered S3 methods overwritten by 'car':
##   method                          from
##   influence.merMod                lme4
##   cooks.distance.influence.merMod lme4
##   dfbeta.influence.merMod         lme4
##   dfbetas.influence.merMod        lme4

## lm(formula = Education ~ Agriculture + Catholic, data = swiss)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      1792.828  2 896.414 0.422 16.032 0
## Error      2460.151 44  55.913               
## Corr Total 4252.979 46  92.456               
## 
##   RMSE AdjEtaSq
##  7.477    0.395
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) 24.587 2.693  9.132 4662.443 0.655    NA 19.161  30.014 0.000
## Agriculture -0.292 0.053 -5.501 1692.149 0.408 0.839 -0.398  -0.185 0.000
## Catholic     0.028 0.029  0.977   53.406 0.021 0.839 -0.030   0.086 0.334

#plotting
#plot3d() comes from the rgl package; the point-size argument is size (not site)
plot3d(swiss$Catholic, swiss$Agriculture, swiss$Education, type="p", col="red", xlab="Catholic", ylab="Agriculture", zlab="Education", size=5, lwd=15)
#V. De Geneve is clearly separated from other provinces --> suggests outlier status
leveragePlots(swissTest)

#retrieve outlier diagnostics
fitted <- fitted(swissTest)
resid <- resid(swissTest)
rstudent <- rstudent(swissTest)
lever <- hatvalues(swissTest)
cookd <- cooks.distance(swissTest) #4/47 = .0851
diagnostics1 <- cbind(swiss[, c(2, 5, 4)], fitted, resid, rstudent, lever, cookd)
diagnostics1[with(diagnostics1, order(-cookd)), ] #print in order of descending cookd

##              Agriculture Catholic Education    fitted       resid
## V. De Geneve         1.2    42.34        53 25.431540  27.5684595
## La Chauxdfnd         7.7    13.79        11 22.731344 -11.7313439
## ValdeTravers        18.7     8.65         7 19.379514 -12.3795145
## Neuchatel           17.6    16.92        32 19.933450  12.0665495
## Franches-Mnt        39.7    93.40         5 15.647649 -10.6476489
## Porrentruy          35.3    90.57         7 16.850574  -9.8505741
## Rive Gauche         27.7    58.33        29 18.156914  10.8430865
## Rive Droite         46.6    50.43        29 12.424132  16.5758677
## Lausanne            19.4    12.11        28 19.273029   8.7269713
## Courtelary          17.0     9.96        12 19.912068  -7.9120677
## Le Locle            16.7    11.22        13 20.035065  -7.0350650
## Lavaux              73.0     2.84         9  3.385425   5.6145747
## Grandson            34.0     3.30         8 14.768172  -6.7681719
## Monthey             64.9    98.22         3  8.436972  -5.4369716
## Moutier             36.5    33.77         7 14.898727  -7.8987272
## Val de Ruz          37.6     4.97         7 13.765756  -6.7657557
## Gruyere             53.3    97.67         7 11.803238  -4.8032378
## Moudon              55.1     4.52         3  8.651243  -5.6512427
## Aigle               62.0     8.52        12  6.752485   5.2475150
## Avenches            60.7     4.43        12  7.016122   4.9838784
## Delemont            45.1    84.84         9 13.831943  -4.8319434
## Entremont           84.9    99.68         6  2.647497   3.3525026
## St Maurice          75.9    99.06         9  5.253804   3.7461956
## Sion                63.1    96.83        13  8.922526   4.0774737
## Oron                71.2     2.40         1  3.897774  -2.8977741
## Paysd'enhaut        63.5     2.56         3  6.147088  -3.1470881
## Rolle               60.8     7.72        10  7.079761   2.9202393
## Veveyse             64.5    98.61         6  8.564584  -2.5645844
## Orbe                54.1     4.20         6  8.933750  -2.9337500
## Morges              59.8     5.23        10  7.301064   2.6989355
## Neuveville          43.5     5.16        15 12.051072   2.9489280
## Aubonne             67.5     2.27         7  4.972778   2.0272217
## Echallens           72.6    24.20         2  4.104484  -2.1044835
## Yverdon             49.5     6.10         8 10.328388  -2.3283883
## Martigwy            78.2    98.96         6  4.580459   1.4195411
## Vevey               26.8    18.46        19 17.294785   1.7052150
## Nyone               50.9    15.14        12 10.175210   1.8247899
## Boudry              38.4     5.62        12 13.550862  -1.5508625
## Sarine              45.2    91.38        13 13.987247  -0.9872466
## Herens              89.7   100.00         2  1.257166   0.7428338
## Cossonay            69.3     2.82         5  4.463532   0.5364680
## Conthey             85.9    99.71         2  2.356811  -0.3568109
## Glane               67.8    97.16         8  7.561630   0.4383696
## Sierre              84.6    99.46         3  2.728752   0.2712478
## Broye               70.2    92.85         7  6.740391   0.2596087
## La Vallee           15.2     2.15        20 20.216550  -0.2165504
## Payerne             58.1     5.23         8  7.796670   0.2033301
##                 rstudent      lever        cookd
## V. De Geneve  4.93431543 0.14546339 9.025827e-01
## La Chauxdfnd -1.68749786 0.09933278 1.004683e-01
## ValdeTravers -1.75247710 0.06551995 6.855046e-02
## Neuchatel     1.70691150 0.06734022 6.719916e-02
## Franches-Mnt -1.50629042 0.08054966 6.439976e-02
## Porrentruy   -1.39271777 0.08617112 5.969318e-02
## Rive Gauche   1.51834435 0.06081060 4.832239e-02
## Rive Droite   2.35747761 0.02421594 4.165969e-02
## Lausanne      1.21211055 0.06299852 3.257966e-02
## Courtelary   -1.09960993 0.06963582 3.002454e-02
## Le Locle     -0.97518368 0.07024144 2.397501e-02
## Lavaux        0.78234404 0.08697428 1.960777e-02
## Grandson     -0.92356936 0.04271542 1.272961e-02
## Monthey      -0.74704064 0.06215186 1.245297e-02
## Moutier      -1.07435168 0.02986573 1.180302e-02
## Val de Ruz   -0.92139589 0.03896833 1.151433e-02
## Gruyere      -0.66037762 0.06594488 1.039618e-02
## Moudon       -0.77011454 0.04581406 9.580548e-03
## Aigle         0.71666137 0.05170677 9.439285e-03
## Avenches      0.68127511 0.05450530 9.028697e-03
## Delemont     -0.66103217 0.05659940 8.851825e-03
## Entremont     0.46508222 0.08722321 7.014727e-03
## St Maurice    0.51547986 0.07116269 6.901184e-03
## Sion          0.55802849 0.06003867 6.735395e-03
## Oron         -0.40065797 0.08228796 4.891275e-03
## Paysd'enhaut -0.43066048 0.06260290 4.206633e-03
## Rolle         0.39693798 0.05051812 2.848920e-03
## Veveyse      -0.35070044 0.06263559 2.795168e-03
## Orbe         -0.39762098 0.04498946 2.531106e-03
## Morges        0.36702452 0.05188736 2.506666e-03
## Neuveville    0.39811883 0.03748514 2.097695e-03
## Aubonne       0.27852503 0.07239409 2.061333e-03
## Echallens    -0.28692140 0.05788806 1.722046e-03
## Yverdon      -0.31433140 0.03874475 1.355239e-03
## Martigwy      0.19513756 0.07422530 1.040416e-03
## Vevey         0.23091372 0.04565872 8.690490e-04
## Nyone         0.24531130 0.03148873 6.664096e-04
## Boudry       -0.20915822 0.03806662 5.898893e-04
## Sarine       -0.13510970 0.06638194 4.425195e-04
## Herens        0.10347592 0.09901084 4.012330e-04
## Cossonay      0.07379968 0.07627979 1.533861e-04
## Conthey      -0.04943719 0.08945777 8.189620e-05
## Glane         0.05983316 0.06170401 8.029420e-05
## Sierre        0.03751799 0.08636209 4.538130e-05
## Broye         0.03536539 0.05810824 2.631746e-05
## La Vallee    -0.02979578 0.07674051 2.516892e-05
## Payerne       0.02756758 0.04913199 1.339358e-05

#test outliers
outlierTest(swissTest) #concluding/confirming that data point for V. De Geneve is outlier

##              rstudent unadjusted p-value Bonferroni p
## V. De Geneve 4.934315         1.2549e-05   0.00058978

######################################################
#build and estimate model without V. De Geneve outlier
######################################################
# compute a dummy code flagging the V. De Geneve observation
swiss_no_outlier = swiss #create new dataset ultimately excluding outlier province
swiss_no_outlier$obs_DeGeneve = as.numeric(rownames(swiss_no_outlier)=="V. De Geneve")


swissTest_outlier = lm(formula = Education ~ Agriculture + Catholic + obs_DeGeneve, data = swiss_no_outlier)
mcSummary(swissTest_outlier)

## lm(formula = Education ~ Agriculture + Catholic + obs_DeGeneve, 
##     data = swiss_no_outlier)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      2682.222  3 894.074 0.631 24.476 0
## Error      1570.757 43  36.529               
## Corr Total 4252.979 46  92.456               
## 
##   RMSE AdjEtaSq
##  6.044    0.605
## 
## Coefficients
##                 Est StErr      t   SSR(3) EtaSq   tol CI_2.5 CI_97.5    p
## (Intercept)  20.563 2.324  8.848 2859.651 0.645    NA 15.876  25.250 0.00
## Agriculture  -0.211 0.046 -4.602  773.690 0.330 0.733 -0.303  -0.119 0.00
## Catholic      0.010 0.024  0.429    6.716 0.004 0.819 -0.037   0.058 0.67
## obs_DeGeneve 32.261 6.538  4.934  889.394 0.362 0.873 19.076  45.447 0.00

swissTest_no_outlier = lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0, ])
mcSummary(swissTest_no_outlier)

## lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve == 
##     0, ])
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       878.047  2 439.024 0.359 12.018 0
## Error      1570.757 43  36.529               
## Corr Total 2448.804 45  54.418               
## 
##   RMSE AdjEtaSq
##  6.044    0.329
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq   tol CI_2.5 CI_97.5    p
## (Intercept) 20.563 2.324  8.848 2859.651 0.645    NA 15.876  25.250 0.00
## Agriculture -0.211 0.046 -4.602  773.690 0.330 0.819 -0.303  -0.119 0.00
## Catholic     0.010 0.024  0.429    6.716 0.004 0.819 -0.037   0.058 0.67

Question 2

Responses to the subsections of Question 2 are given first; the code and output for all parts of Question 2 follow them.

Question 2a

The diagnostic plots are given in the output below.

Question 2b

A number of concerns arise when inspecting the diagnostic plots. Perhaps most importantly, the “Residuals vs. Fitted” plot exhibits a funnel shape while the Scale-Location line has a substantially positive slope. These observations suggest that the model errors are heteroscedastic: the variance of the errors increases with increasing predicted values. Secondly, the Q-Q plot diverges from theoretical expectation for the largest residuals; this suggests that the residuals are not normally distributed but substantially positively skewed. Finally, the “Residuals vs. Leverage” plot confirms that the V. De Geneve datapoint is a likely outlier with a Cook’s D well above the acceptable threshold.
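As a rough numeric companion to the plots (not the method used in the original analysis), the sketch below fits a straight line to the Scale-Location quantities and runs a Shapiro-Wilk test on the residuals. Note that `plot()` draws a smoother rather than a linear fit, so the linear slope here is only an approximation of what the plot shows; `scale_loc` is an illustrative name.

```r
fit_full <- lm(Education ~ Agriculture + Catholic, data = swiss)

# Scale-Location quantities: sqrt(|standardized residuals|) vs. fitted values
scale_loc <- lm(sqrt(abs(rstandard(fit_full))) ~ fitted(fit_full))
coef(scale_loc)[2]  # a positive slope suggests spread grows with fitted values

# Shapiro-Wilk test of residual normality (small p suggests non-normality)
shapiro.test(resid(fit_full))

# Province with the largest Cook's distance, and whether it exceeds 4 / n
cd <- cooks.distance(fit_full)
names(which.max(cd))
max(cd) > 4 / nrow(swiss)
```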

Question 2c

Note: The Swiss Education percent variable was converted to a proportion so that the transformations described below would be mathematically possible.

The arcsine, logit, folded-root and square-root transformations were applied to models relating the proportion of a province’s residents educated beyond primary school to the percentage of Catholic residents and the percentage of males involved in agriculture as an occupation, both with and without the V. De Geneve outlier. Diagnostic plots indicated that each of the transformations corrected many of the assumption violations described in Question 2b; this was especially true when excluding the V. De Geneve outlier.
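For reference, the four transformations can be written compactly as functions of the proportion \(p\). This sketch uses base R's `qlogis()` in place of the `logit()` function used in the analysis code; the two agree for proportions strictly between 0 and 1, which holds for these data. The data frame name `transforms` is illustrative.

```r
p <- swiss$Education / 100  # convert percentages to proportions

transforms <- data.frame(
  logit    = qlogis(p),                # log(p / (1 - p))
  foldRoot = sqrt(p) - sqrt(1 - p),    # folded root
  arcsin   = asin(sqrt(p)),            # arcsine square root
  sqrt     = sqrt(p),                  # square root
  row.names = rownames(swiss)
)
head(transforms)
```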

Question 2d

The square root-transformed model excluding the V. De Geneve outlier arguably yielded the best corrections and is therefore taken as the final version of the analysis. This model resulted in slightly different parameter estimates from the original. The intercept, for instance, decreased from 24.6% to 21.3% after back-transformation to the untransformed criterion variable (i.e. \(.462^2 = .213\)). The “Agriculture” slope estimate remained similar in direction and strength: -.292 in the original version versus -.003 on the square-root scale in the final version, which corresponds to roughly -.3 percentage points per one-point increase in agriculture near the intercept. Notably, however, the influence of Catholicism on education levels became almost imperceptible in the final model and registered an even larger p-value (.683 vs. .334 in the original model).
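The back-transformation can be sketched as follows. Because the criterion is \(\sqrt{p}\), the intercept is squared to return to the proportion scale, and the slope on the original scale is no longer constant: differentiating \((b_0 + b_1 x)^2\) gives a local slope of \(2(b_0 + b_1 x)b_1\). The object names below are illustrative.

```r
d <- swiss[rownames(swiss) != "V. De Geneve", ]
d$sqrtEDU <- sqrt(d$Education / 100)
fit_sqrt <- lm(sqrtEDU ~ Agriculture + Catholic, data = d)
b <- coef(fit_sqrt)

# Intercept on the percentage scale: square, then multiply by 100
intercept_pct <- unname(100 * b["(Intercept)"]^2)

# Local Agriculture slope on the percentage scale at Agriculture = 0, Catholic = 0
local_slope_pct <- unname(100 * 2 * b["(Intercept)"] * b["Agriculture"])

c(intercept_pct = intercept_pct, local_slope_pct = local_slope_pct)
```

The local slope shrinks in magnitude as predicted education falls, so a single percentage-point figure describes the relationship only near the chosen reference point.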

Journal Summary:

The Swiss R dataset includes demographic variables for each of the country’s 47 provinces circa 1888; in particular, the dataset includes variables for the percentage of residents educated beyond primary school (percEDU), the percentage of Catholic residents (percCATH) and the percentage of males involved in agriculture as an occupation (percAG). In an initial model regressing percEDU on percCATH and percAG, diagnostic plots revealed substantial heteroscedasticity and non-normality of errors as well as an outlier province, the historically well-educated V. De Geneve. To address these violations and return interpretable parameter estimates, a corrected model regressed a square root-transformed proportion of educated residents on percCATH and percAG while excluding the V. De Geneve outlier province. This model estimated that a province with no Catholic residents or agriculture could be expected to educate about 21.3% of its population (F(1, 43) = 199.5, PRE = .823, p < .001) and that a one percentage point increase in males involved in agriculture decreased the predicted square root of the proportion educated by .003, or roughly .3 percentage points near the intercept (F(1, 43) = 25.5, PRE = .372, p < .001). However, above and beyond agriculture percentages, the percentage of Catholic residents did not reliably predict educational levels (F(1, 43) = .17, PRE = .004, p = .683).
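The F and PRE values in the summary follow from the t statistics reported for the final model: for a single-parameter test, \(F = t^2\) and \(PRE = F / (F + df_{error})\). The sketch below recomputes them from the rounded t values in the output, so the last digit may differ slightly from values computed on unrounded statistics.

```r
# t statistics copied from the final (square-root, no outlier) model output
t_vals <- c(Intercept = 14.125, Agriculture = -5.049, Catholic = 0.412)
df_error <- 43

F_vals <- t_vals^2
PRE    <- F_vals / (F_vals + df_error)
p_vals <- pf(F_vals, df1 = 1, df2 = df_error, lower.tail = FALSE)

round(cbind(F = F_vals, PRE = PRE, p = p_vals), 3)
```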

##############################
#original model for comparison
swiss_no_outlier = swiss #create new dataset ultimately excluding outlier province
swiss_no_outlier$obs_DeGeneve = as.numeric(rownames(swiss_no_outlier)=="V. De Geneve")
swiss_no_outlier$EDU_proportions = swiss_no_outlier$Education/100

swissTest = lm(Education ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest)

## lm(formula = Education ~ Agriculture + Catholic, data = swiss_no_outlier)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      1792.828  2 896.414 0.422 16.032 0
## Error      2460.151 44  55.913               
## Corr Total 4252.979 46  92.456               
## 
##   RMSE AdjEtaSq
##  7.477    0.395
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) 24.587 2.693  9.132 4662.443 0.655    NA 19.161  30.014 0.000
## Agriculture -0.292 0.053 -5.501 1692.149 0.408 0.839 -0.398  -0.185 0.000
## Catholic     0.028 0.029  0.977   53.406 0.021 0.839 -0.030   0.086 0.334

#generating plots for assessing types of error
plot(swissTest)

##############################


##############################
#*logit* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$logitEDU = logit(swiss_no_outlier$EDU_proportions)
swissTest_logit = lm(logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_logit)

## lm(formula = logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
## 
## Omnibus ANOVA
##                SS df    MS EtaSq      F p
## Model      17.662  2 8.831 0.473 19.777 0
## Error      19.647 44 0.447               
## Corr Total 37.309 46 0.811               
## 
##   RMSE AdjEtaSq
##  0.668    0.449
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) -1.019 0.241 -4.233  8.001 0.289    NA -1.503  -0.534 0.000
## Agriculture -0.029 0.005 -6.028 16.225 0.452 0.839 -0.038  -0.019 0.000
## Catholic     0.002 0.003  0.775  0.268 0.013 0.839 -0.003   0.007 0.443

plot(swissTest_logit)

#*logit* transform data and re-run model+diagnostics w/o outlier
swiss_no_outlier$logitEDU = logit(swiss_no_outlier$EDU_proportions)
swissTest_logit_NO = lm(logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_logit_NO)

## lm(formula = logitEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve == 
##     0, ])
## 
## Omnibus ANOVA
##                SS df    MS EtaSq      F p
## Model      12.649  2 6.324 0.409 14.892 0
## Error      18.261 43 0.425               
## Corr Total 30.910 45 0.687               
## 
##   RMSE AdjEtaSq
##  0.652    0.382
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) -1.177 0.251 -4.698  9.375 0.339    NA -1.683  -0.672 0.000
## Agriculture -0.025 0.005 -5.132 11.186 0.380 0.819 -0.035  -0.015 0.000
## Catholic     0.001 0.003  0.504  0.108 0.006 0.819 -0.004   0.006 0.617

plot(swissTest_logit_NO)

##############################


##############################
#*folded root* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$foldRootEDU = sqrt(swiss_no_outlier$EDU_proportions)-sqrt(1-swiss_no_outlier$EDU_proportions)
swissTest_foldRoot = lm(foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_foldRoot)

## lm(formula = foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
## 
## Omnibus ANOVA
##               SS df    MS EtaSq      F p
## Model      0.663  2 0.332  0.46 18.758 0
## Error      0.778 44 0.018               
## Corr Total 1.441 46 0.031               
## 
##   RMSE AdjEtaSq
##  0.133    0.436
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) -0.371 0.048 -7.744  1.060 0.577    NA -0.467  -0.274 0.000
## Agriculture -0.006 0.001 -5.908  0.617 0.442 0.839 -0.007  -0.004 0.000
## Catholic     0.000 0.001  0.891  0.014 0.018 0.839 -0.001   0.001 0.378

plot(swissTest_foldRoot)

#*folded root* transform data and re-run model+diagnostics w/o outlier
swissTest_foldRoot_NO = lm(foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_foldRoot_NO)

## lm(formula = foldRootEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve == 
##     0, ])
## 
## Omnibus ANOVA
##               SS df    MS EtaSq     F p
## Model      0.383  2 0.192 0.394 13.97 0
## Error      0.590 43 0.014              
## Corr Total 0.973 45 0.022              
## 
##   RMSE AdjEtaSq
##  0.117    0.366
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) -0.429 0.045 -9.528  1.246 0.679    NA -0.520  -0.338 0.000
## Agriculture -0.004 0.001 -4.949  0.336 0.363 0.819 -0.006  -0.003 0.000
## Catholic     0.000 0.000  0.425  0.002 0.004 0.819 -0.001   0.001 0.673

plot(swissTest_foldRoot_NO)

##############################


##############################
#*arcsine* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$arcsinEDU = asin(sqrt(swiss_no_outlier$EDU_proportions))
swissTest_arcsin = lm(arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_arcsin)

## lm(formula = arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
## 
## Omnibus ANOVA
##               SS df    MS EtaSq      F p
## Model      0.398  2 0.199 0.467 19.282 0
## Error      0.455 44 0.010               
## Corr Total 0.853 46 0.019               
## 
##   RMSE AdjEtaSq
##  0.102    0.443
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept)  0.521 0.037 14.232  2.092 0.822    NA  0.447   0.595 0.000
## Agriculture -0.004 0.001 -5.979  0.369 0.448 0.839 -0.006  -0.003 0.000
## Catholic     0.000 0.000  0.862  0.008 0.017 0.839  0.000   0.001 0.393

plot(swissTest_arcsin)

#*arcsine* transform data and re-run model+diagnostics w/o outlier
swissTest_arcsin_NO = lm(arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_arcsin_NO)

## lm(formula = arcsinEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve == 
##     0, ])
## 
## Omnibus ANOVA
##               SS df    MS EtaSq      F p
## Model      0.239  2 0.120   0.4 14.332 0
## Error      0.359 43 0.008               
## Corr Total 0.599 45 0.013               
## 
##   RMSE AdjEtaSq
##  0.091    0.372
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept)  0.479 0.035 13.634  1.553 0.812    NA  0.408   0.550 0.000
## Agriculture -0.003 0.001 -5.010  0.210 0.369 0.819 -0.005  -0.002 0.000
## Catholic     0.000 0.000  0.423  0.001 0.004 0.819 -0.001   0.001 0.674

plot(swissTest_arcsin_NO)

##############################


##############################
#*sqrt* transform data and re-run model+diagnostics w/ outlier
swiss_no_outlier$sqrtEDU = sqrt(swiss_no_outlier$EDU_proportions)
swissTest_sqrt = lm(sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
mcSummary(swissTest_sqrt)

## lm(formula = sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier)
## 
## Omnibus ANOVA
##               SS df    MS EtaSq      F p
## Model      0.331  2 0.166 0.472 19.675 0
## Error      0.370 44 0.008               
## Corr Total 0.702 46 0.015               
## 
##   RMSE AdjEtaSq
##  0.092    0.448
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept)  0.495 0.033 14.972  1.887 0.836    NA  0.428   0.561 0.000
## Agriculture -0.004 0.001 -6.022  0.305 0.452 0.839 -0.005  -0.003 0.000
## Catholic     0.000 0.000  0.805  0.005 0.015 0.839  0.000   0.001 0.425

plot(swissTest_sqrt)

#*sqrt* transform data and re-run model+diagnostics w/o outlier
swissTest_sqrt_NO = lm(sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve==0,])
mcSummary(swissTest_sqrt_NO)

## lm(formula = sqrtEDU ~ Agriculture + Catholic, data = swiss_no_outlier[swiss_no_outlier$obs_DeGeneve == 
##     0, ])
## 
## Omnibus ANOVA
##               SS df    MS EtaSq     F p
## Model      0.211  2 0.105 0.404 14.59 0
## Error      0.311 43 0.007              
## Corr Total 0.521 45 0.012              
## 
##   RMSE AdjEtaSq
##  0.085    0.377
## 
## Coefficients
##                Est StErr      t SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept)  0.462 0.033 14.125  1.441 0.823    NA  0.396   0.528 0.000
## Agriculture -0.003 0.001 -5.049  0.184 0.372 0.819 -0.005  -0.002 0.000
## Catholic     0.000 0.000  0.412  0.001 0.004 0.819 -0.001   0.001 0.683

plot(swissTest_sqrt_NO)

##############################

HM21_SMM

Spencer Moore

4/19/2021