Question 1

Let's try to create a regression model that predicts the percentage of college-educated residents (percollege) from a number of factors describing a neighborhood in the Midwest.

Create Dummy Variables
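
The code for this step is not shown in the report. As a minimal sketch of one way to get a full set of indicator columns with names like state.IL and inmetro.0, the base R snippet below builds them by hand. The `toy` data frame and the `make_dummies` helper are hypothetical and only stand in for the real data and whatever function was actually used.

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  state   = factor(c("IL", "IN", "MI", "OH", "WI", "IL")),
  inmetro = factor(c(0, 1, 0, 1, 1, 0)),
  perchsd = c(75.1, 70.3, 78.8, 74.0, 80.2, 72.5)
)

# Hypothetical helper: one 0/1 column per factor level, named "<prefix>.<level>".
make_dummies <- function(x, prefix) {
  lv  <- levels(x)
  out <- outer(as.character(x), lv, "==") * 1L  # logical matrix -> integer matrix
  colnames(out) <- paste(prefix, lv, sep = ".")
  as.data.frame(out)
}

# Replace the two factors with their indicator columns, keep the numeric column.
toy2 <- cbind(
  make_dummies(toy$state, "state"),
  make_dummies(toy$inmetro, "inmetro"),
  toy["perchsd"]
)
print(toy2)
```

Every level gets its own column here, so each row has exactly one 1 among the state columns and one 1 among the inmetro columns.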

Generate Model

## Reordering variables and trying again:
## Subset selection object
## Call: regsubsets.formula(percollege ~ ., data = midwest2, method = "exhaustive", 
##     nvmax = NULL, nbest = 1, really.big = T)
## 20 Variables  (and intercept)
##                      Forced in Forced out
## state.IL                 FALSE      FALSE
## state.IN                 FALSE      FALSE
## state.MI                 FALSE      FALSE
## state.OH                 FALSE      FALSE
## area                     FALSE      FALSE
## percwhite                FALSE      FALSE
## percblack                FALSE      FALSE
## percamerindan            FALSE      FALSE
## percasian                FALSE      FALSE
## percother                FALSE      FALSE
## perchsd                  FALSE      FALSE
## percprof                 FALSE      FALSE
## percpovertyknown         FALSE      FALSE
## percbelowpoverty         FALSE      FALSE
## percchildbelowpovert     FALSE      FALSE
## percadultpoverty         FALSE      FALSE
## percelderlypoverty       FALSE      FALSE
## inmetro.0                FALSE      FALSE
## state.WI                 FALSE      FALSE
## inmetro.1                FALSE      FALSE
## 1 subsets of each size up to 18
## Selection Algorithm: exhaustive
##           state.IL state.IN state.MI state.OH state.WI area percwhite
## 1  ( 1 )  "*"      " "      " "      " "      " "      " "  " "      
## 2  ( 1 )  "*"      "*"      " "      " "      " "      " "  " "      
## 3  ( 1 )  "*"      "*"      "*"      " "      " "      " "  " "      
## 4  ( 1 )  "*"      "*"      "*"      "*"      " "      " "  " "      
## 5  ( 1 )  "*"      "*"      "*"      "*"      " "      "*"  " "      
## 6  ( 1 )  "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 7  ( 1 )  "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 8  ( 1 )  "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 9  ( 1 )  "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 10  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 11  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 12  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 13  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 14  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 15  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 16  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 17  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
## 18  ( 1 ) "*"      "*"      "*"      "*"      " "      "*"  "*"      
##           percblack percamerindan percasian percother perchsd percprof
## 1  ( 1 )  " "       " "           " "       " "       " "     " "     
## 2  ( 1 )  " "       " "           " "       " "       " "     " "     
## 3  ( 1 )  " "       " "           " "       " "       " "     " "     
## 4  ( 1 )  " "       " "           " "       " "       " "     " "     
## 5  ( 1 )  " "       " "           " "       " "       " "     " "     
## 6  ( 1 )  " "       " "           " "       " "       " "     " "     
## 7  ( 1 )  "*"       " "           " "       " "       " "     " "     
## 8  ( 1 )  "*"       "*"           " "       " "       " "     " "     
## 9  ( 1 )  "*"       "*"           "*"       " "       " "     " "     
## 10  ( 1 ) "*"       "*"           "*"       "*"       " "     " "     
## 11  ( 1 ) "*"       "*"           "*"       "*"       "*"     " "     
## 12  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 13  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 14  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 15  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 16  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 17  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
## 18  ( 1 ) "*"       "*"           "*"       "*"       "*"     "*"     
##           percpovertyknown percbelowpoverty percchildbelowpovert
## 1  ( 1 )  " "              " "              " "                 
## 2  ( 1 )  " "              " "              " "                 
## 3  ( 1 )  " "              " "              " "                 
## 4  ( 1 )  " "              " "              " "                 
## 5  ( 1 )  " "              " "              " "                 
## 6  ( 1 )  " "              " "              " "                 
## 7  ( 1 )  " "              " "              " "                 
## 8  ( 1 )  " "              " "              " "                 
## 9  ( 1 )  " "              " "              " "                 
## 10  ( 1 ) " "              " "              " "                 
## 11  ( 1 ) " "              " "              " "                 
## 12  ( 1 ) " "              " "              " "                 
## 13  ( 1 ) "*"              " "              " "                 
## 14  ( 1 ) "*"              "*"              " "                 
## 15  ( 1 ) "*"              "*"              "*"                 
## 16  ( 1 ) "*"              "*"              "*"                 
## 17  ( 1 ) "*"              "*"              "*"                 
## 18  ( 1 ) "*"              "*"              "*"                 
##           percadultpoverty percelderlypoverty inmetro.0 inmetro.1
## 1  ( 1 )  " "              " "                " "       " "      
## 2  ( 1 )  " "              " "                " "       " "      
## 3  ( 1 )  " "              " "                " "       " "      
## 4  ( 1 )  " "              " "                " "       " "      
## 5  ( 1 )  " "              " "                " "       " "      
## 6  ( 1 )  " "              " "                " "       " "      
## 7  ( 1 )  " "              " "                " "       " "      
## 8  ( 1 )  " "              " "                " "       " "      
## 9  ( 1 )  " "              " "                " "       " "      
## 10  ( 1 ) " "              " "                " "       " "      
## 11  ( 1 ) " "              " "                " "       " "      
## 12  ( 1 ) " "              " "                " "       " "      
## 13  ( 1 ) " "              " "                " "       " "      
## 14  ( 1 ) " "              " "                " "       " "      
## 15  ( 1 ) " "              " "                " "       " "      
## 16  ( 1 ) "*"              " "                " "       " "      
## 17  ( 1 ) "*"              "*"                " "       " "      
## 18  ( 1 ) "*"              "*"                "*"       " "
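
Each row of the table above marks with "*" the predictors in the best model of that size. To illustrate what an exhaustive search means, the sketch below fits every subset of a few simulated predictors by brute force and keeps the one with the lowest residual sum of squares at each size. It is a base R illustration of the idea only, not the algorithm regsubsets uses internally, and the data and the `best_subsets` helper are hypothetical.

```r
set.seed(7)
n <- 100
# Simulated predictors; only x1 and x3 actually drive y.
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- 3 * X$x1 - 2 * X$x3 + rnorm(n, sd = 0.5)

# Hypothetical helper: for each model size k, fit every k-variable subset
# and keep the subset with the smallest residual sum of squares (RSS).
best_subsets <- function(X, y) {
  vars <- names(X)
  lapply(seq_along(vars), function(k) {
    combos <- combn(vars, k, simplify = FALSE)
    rss <- vapply(combos, function(v) {
      fit <- lm(y ~ ., data = data.frame(y = y, X[v]))
      sum(resid(fit)^2)
    }, numeric(1))
    list(size = k, vars = combos[[which.min(rss)]], rss = min(rss))
  })
}

res <- best_subsets(X, y)
for (m in res) {
  cat(m$size, ":", paste(m$vars, collapse = ", "), " RSS =", round(m$rss, 2), "\n")
}
```

The best RSS can only fall as the size grows, so the table alone does not say which size to prefer. A penalized criterion such as adjusted R^2 or BIC is normally used for that choice.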

## 
## Call:
## lm(formula = percollege ~ . - state.WI - inmetro.0 - inmetro.1, 
##     data = midwest2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5848 -0.9852 -0.0471  0.8787  6.1165 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -24.90566   11.20392  -2.223 0.026751 *  
## state.IL              -1.33901    0.27098  -4.941 1.12e-06 ***
## state.IN              -5.46053    0.31923 -17.105  < 2e-16 ***
## state.MI              -1.75952    0.27151  -6.481 2.56e-10 ***
## state.OH              -2.83819    0.28494  -9.961  < 2e-16 ***
## area                 -12.02948    6.29205  -1.912 0.056575 .  
## percwhite              0.06079    0.10428   0.583 0.560248    
## percblack              0.11849    0.11029   1.074 0.283290    
## percamerindan          0.12937    0.10679   1.211 0.226389    
## percasian              0.26477    0.26645   0.994 0.320946    
## percother                   NA         NA      NA       NA    
## perchsd                0.24683    0.02729   9.044  < 2e-16 ***
## percprof               2.11693    0.07401  28.603  < 2e-16 ***
## percpovertyknown       0.12903    0.03617   3.567 0.000402 ***
## percbelowpoverty      -1.17554    0.34538  -3.404 0.000729 ***
## percchildbelowpovert   0.34523    0.08983   3.843 0.000140 ***
## percadultpoverty       0.46888    0.20294   2.311 0.021344 *  
## percelderlypoverty     0.28925    0.08149   3.550 0.000429 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.576 on 420 degrees of freedom
## Multiple R-squared:  0.939,  Adjusted R-squared:  0.9367 
## F-statistic:   404 on 16 and 420 DF,  p-value: < 2.2e-16
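
The NA row for percother and the "1 not defined because of singularities" note in the summary above are what lm reports when a predictor is an exact linear combination of the other predictors and the intercept. The sketch below reproduces that behaviour on simulated data with a full set of dummy columns. The names g.A, g.B, g.C and all values are made up for illustration.

```r
set.seed(1)
# Simulated data: three groups, each coded with its own 0/1 column.
grp <- factor(rep(c("A", "B", "C"), each = 20))
d <- data.frame(
  g.A = as.integer(grp == "A"),
  g.B = as.integer(grp == "B"),
  g.C = as.integer(grp == "C"),
  x   = rnorm(60)
)
d$y <- 2 + 1.5 * d$g.A - 0.5 * d$g.B + 0.8 * d$x + rnorm(60, sd = 0.3)

# g.A + g.B + g.C equals 1 in every row, which duplicates the intercept,
# so one coefficient cannot be estimated and is returned as NA.
full <- lm(y ~ ., data = d)
print(coef(full))

# Dropping the redundant column gives exactly the same fitted values.
reduced <- lm(y ~ . - g.C, data = d)
print(all.equal(fitted(full), fitted(reduced)))
```

Dropping one dummy per factor loses no information: the omitted level becomes the baseline that the remaining coefficients are measured against.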

Evaluation

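The subset search indicates that three variables (inmetro.0, inmetro.1, and state.WI) add little or nothing to the model: state.WI and inmetro.1 are never selected, and inmetro.0 enters only in the largest, 18-variable model. Because a full set of dummy variables was created, state.WI and inmetro.1 are exact linear combinations of the other dummies and the intercept, so they cannot carry any additional information. Those three variables were therefore left out of the final model. percother is reported as NA for the same kind of reason: it is linearly dependent on the other predictors. With an R^2 of about 0.94 (adjusted R^2 of 0.9367), the model explains most of the variation in the percentage of college-educated residents. Analysis of the residuals, however, gives the impression of three distinct clusters, which suggests there is structure the model does not capture.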
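
One way to follow up on clustered residuals is to compare them across a candidate grouping variable. The sketch below uses simulated data in which a three-level grouping is deliberately left out of the model, then checks the mean residual per group. The grouping, the variable names, and the numbers are all assumptions made for illustration. Nothing here identifies what causes the clusters in the actual model.

```r
set.seed(42)
# Simulated data only: three hypothetical groups with different baselines.
grp      <- factor(rep(c("g1", "g2", "g3"), each = 50))
baseline <- unname(c(g1 = 0, g2 = 4, g3 = 8)[as.character(grp)])
x        <- runif(150, 0, 10)
y        <- baseline + 1.2 * x + rnorm(150, sd = 0.5)

# Fit without the grouping variable, so its effect ends up in the residuals.
fit <- lm(y ~ x)
res <- resid(fit)

# Mean residual per group: values far from zero flag structure the model misses.
group_means <- tapply(res, grp, mean)
print(round(group_means, 2))
```

For the visual version of the same check, `plot(fitted(fit), res, col = as.integer(grp))` colours the residuals-versus-fitted plot by group.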