Question 1

Create Dummy Variables
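
The code for this step is not shown in the report. A minimal sketch of one way to build it, assuming `iris2` is the built-in `iris` data with `Species` replaced by one 0/1 indicator column per level (the column names match the output below; the helper name `dummies` is hypothetical):

```r
# Assumption: iris2 = the four numeric iris columns plus one dummy per species.
# model.matrix() with "- 1" drops the intercept so every level gets a column:
# Speciessetosa, Speciesversicolor, Speciesvirginica.
dummies <- model.matrix(~ Species - 1, data = iris)

iris2 <- cbind(
  iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
  dummies
)

# Each flower belongs to exactly one species, so the three dummies sum to 1.
stopifnot(all(rowSums(iris2[, colnames(dummies)]) == 1))
str(iris2)
```

Because the three dummies always sum to 1, they are perfectly collinear with an intercept, which matters for the model selection that follows.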

Generate Model

## Subset selection object
## Call: regsubsets.formula(Petal.Width ~ ., data = iris2, method = "exhaustive", 
##     nvmax = NULL, nbest = 1)
## 6 Variables  (and intercept)
##                   Forced in Forced out
## Sepal.Length          FALSE      FALSE
## Sepal.Width           FALSE      FALSE
## Petal.Length          FALSE      FALSE
## Speciessetosa         FALSE      FALSE
## Speciesversicolor     FALSE      FALSE
## Speciesvirginica      FALSE      FALSE
## 1 subsets of each size up to 5
## Selection Algorithm: exhaustive
##          Sepal.Length Sepal.Width Petal.Length Speciessetosa
## 1  ( 1 ) " "          " "         "*"          " "          
## 2  ( 1 ) " "          " "         "*"          " "          
## 3  ( 1 ) " "          " "         "*"          " "          
## 4  ( 1 ) " "          "*"         "*"          "*"          
## 5  ( 1 ) "*"          "*"         "*"          "*"          
##          Speciesversicolor Speciesvirginica
## 1  ( 1 ) " "               " "             
## 2  ( 1 ) " "               "*"             
## 3  ( 1 ) "*"               "*"             
## 4  ( 1 ) " "               "*"             
## 5  ( 1 ) "*"               " "
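
The subset search above could be reproduced roughly as follows. This is a sketch, assuming the `leaps` package is installed and that `iris2` was built from dummy columns as described earlier; only the `regsubsets()` call itself is taken from the output.

```r
library(leaps)

# Assumed construction of iris2 (not shown in the report).
dummies <- model.matrix(~ Species - 1, data = iris)
iris2 <- cbind(iris[, 1:4], dummies)

# Same call as printed in the output. leaps warns about a linear dependency
# because the three species dummies sum to the intercept column.
fit <- regsubsets(Petal.Width ~ ., data = iris2,
                  method = "exhaustive", nvmax = NULL, nbest = 1)

s <- summary(fit)
s$cp             # Mallows' Cp for the best model of each size
which.min(s$cp)  # size of the model with the lowest Cp
```

Comparing `s$cp` across sizes is what the Evaluation section refers to when it mentions the Cp value.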

## 
## Call:
## lm(formula = Petal.Width ~ . - Speciesvirginica, data = iris2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59239 -0.08288 -0.01349  0.08773  0.45239 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.57323    0.19308   2.969   0.0035 ** 
## Sepal.Length      -0.09293    0.04458  -2.084   0.0389 *  
## Sepal.Width        0.24220    0.04776   5.072 1.20e-06 ***
## Petal.Length       0.24220    0.04884   4.959 1.97e-06 ***
## Speciessetosa     -1.04637    0.16548  -6.323 3.03e-09 ***
## Speciesversicolor -0.39826    0.05707  -6.978 1.01e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1666 on 144 degrees of freedom
## Multiple R-squared:  0.9538, Adjusted R-squared:  0.9522 
## F-statistic: 594.9 on 5 and 144 DF,  p-value: < 2.2e-16
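
For reference, a sketch of how the fit above could be reproduced, again assuming the `iris2` construction described earlier (the object name `model` is hypothetical). Dropping `Speciesvirginica` makes virginica the reference level, so the two remaining species coefficients are differences from virginica.

```r
# Assumed construction of iris2 (not shown in the report).
dummies <- model.matrix(~ Species - 1, data = iris)
iris2 <- cbind(iris[, 1:4], dummies)

# Formula taken from the Call: line in the summary output.
model <- lm(Petal.Width ~ . - Speciesvirginica, data = iris2)
summary(model)

# Equivalent parameterisation using the original factor with virginica
# as the baseline; the fitted values are identical.
iris_ref <- transform(iris, Species = relevel(Species, ref = "virginica"))
model_ref <- lm(Petal.Width ~ ., data = iris_ref)
all.equal(unname(fitted(model)), unname(fitted(model_ref)))
```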

Evaluation

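Based on all the visuals, the data clearly form three clusters, one per species. The model with the best Cp value omits Speciesvirginica. This is expected rather than surprising: the three species dummies always sum to 1, so once an intercept is in the model any one of them is redundant, and virginica simply becomes the reference level. Among the coefficients, Speciessetosa is the largest in magnitude (-1.046): holding the other predictors fixed, a setosa petal is about 1.05 cm narrower than a virginica petal. The species coefficients are not directly comparable with those of the continuous predictors, which are per-centimetre slopes. A residual analysis shows a roughly normal distribution, consistent with the near-zero median (-0.013) and nearly symmetric quartiles (-0.083 and 0.088) in the summary.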
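
The residual check mentioned above is not shown in the report. One possible way to carry it out, as a sketch under the same assumed `iris2` construction (the names `model` and `res` are hypothetical):

```r
# Assumed construction of iris2 and the final model (not shown in the report).
dummies <- model.matrix(~ Species - 1, data = iris)
iris2 <- cbind(iris[, 1:4], dummies)
model <- lm(Petal.Width ~ . - Speciesvirginica, data = iris2)

res <- residuals(model)

# Visual check: points close to the reference line suggest normal residuals.
qqnorm(res)
qqline(res)

# Formal check: a large p-value gives no evidence against normality.
shapiro.test(res)

# Residuals against fitted values, to look for leftover structure by species.
plot(fitted(model), res, col = iris$Species,
     xlab = "Fitted Petal.Width", ylab = "Residual")
abline(h = 0, lty = 2)
```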