Question 1
Create Dummy Variables
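The code for this step is not shown. The sketch below is one way to build `iris2` with base R's `model.matrix()`. It assumes that `iris2` is simply `iris` with the `Species` factor replaced by three 0/1 columns. The column names were chosen to match the `Speciessetosa`, `Speciesversicolor` and `Speciesvirginica` names in the output below.

```r
# Sketch only: the original dummy-variable code is not shown, so this is one
# assumed way to build iris2. Column names are chosen to match the output.
data(iris)

# "~ Species - 1" drops the intercept, so model.matrix() returns one 0/1
# indicator column per level, named Speciessetosa, Speciesversicolor and
# Speciesvirginica.
dummies <- as.data.frame(model.matrix(~ Species - 1, data = iris))

# Keep the four measurements and replace the Species factor with the dummies.
iris2 <- cbind(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")],
               dummies)

head(iris2)
```

In every row exactly one of the three indicators equals 1. That is why the three columns together duplicate the intercept.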
Generate Model
## Subset selection object
## Call: regsubsets.formula(Petal.Width ~ ., data = iris2, method = "exhaustive",
## nvmax = NULL, nbest = 1)
## 6 Variables (and intercept)
##                   Forced in Forced out
## Sepal.Length          FALSE      FALSE
## Sepal.Width           FALSE      FALSE
## Petal.Length          FALSE      FALSE
## Speciessetosa         FALSE      FALSE
## Speciesversicolor     FALSE      FALSE
## Speciesvirginica      FALSE      FALSE
## 1 subsets of each size up to 5
## Selection Algorithm: exhaustive
##          Sepal.Length Sepal.Width Petal.Length Speciessetosa Speciesversicolor Speciesvirginica
## 1  ( 1 ) " "          " "         "*"          " "           " "               " "
## 2  ( 1 ) " "          " "         "*"          " "           " "               "*"
## 3  ( 1 ) " "          " "         "*"          " "           "*"               "*"
## 4  ( 1 ) " "          "*"         "*"          "*"           " "               "*"
## 5  ( 1 ) "*"          "*"         "*"          "*"           "*"               " "
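The table above shows the best subset of each size but not the Cp values used to choose among them. Six variables are offered, yet subsets only go up to size 5, because the three dummies plus the intercept are linearly dependent. As a hedged cross-check, the sketch below recomputes Mallows' Cp in base R for the five listed subsets. It uses the common definition Cp = RSS_p / sigma^2 - n + 2p, where p counts coefficients including the intercept, and it assumes sigma^2 is estimated from the largest model. With the leaps package loaded, `summary()` of the regsubsets object also has a `cp` component that should be comparable. The helper `fit_subset` is a hypothetical name.

```r
# Sketch: Mallows' Cp for the five best subsets listed above, in base R.
# Assumptions: iris2 is built with model.matrix() dummies, and sigma^2 is
# estimated from the largest (5-predictor) model.
dummies <- as.data.frame(model.matrix(~ Species - 1, data = iris))
iris2 <- cbind(iris[, 1:4], dummies)

# Predictor sets copied from the asterisks in the selection table.
subsets <- list(
  c("Petal.Length"),
  c("Petal.Length", "Speciesvirginica"),
  c("Petal.Length", "Speciesversicolor", "Speciesvirginica"),
  c("Sepal.Width", "Petal.Length", "Speciessetosa", "Speciesvirginica"),
  c("Sepal.Length", "Sepal.Width", "Petal.Length", "Speciessetosa",
    "Speciesversicolor")
)

n <- nrow(iris2)

# Hypothetical helper: fit Petal.Width on one subset of predictors.
fit_subset <- function(vars) {
  lm(reformulate(vars, response = "Petal.Width"), data = iris2)
}
fits <- lapply(subsets, fit_subset)

# Residual variance of the largest model serves as the estimate of sigma^2.
sigma2 <- summary(fits[[5]])$sigma^2

# Cp = RSS_p / sigma^2 - n + 2p, with p counting the intercept.
cp <- sapply(fits, function(f) {
  p <- length(coef(f))
  sum(residuals(f)^2) / sigma2 - n + 2 * p
})
round(cp, 2)
```

By construction, the largest model's Cp equals its number of coefficients (6). A good smaller subset would need a Cp close to its own p to compete with it.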
##
## Call:
## lm(formula = Petal.Width ~ . - Speciesvirginica, data = iris2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.59239 -0.08288 -0.01349  0.08773  0.45239
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        0.57323    0.19308   2.969   0.0035 **
## Sepal.Length      -0.09293    0.04458  -2.084   0.0389 *
## Sepal.Width        0.24220    0.04776   5.072 1.20e-06 ***
## Petal.Length       0.24220    0.04884   4.959 1.97e-06 ***
## Speciessetosa     -1.04637    0.16548  -6.323 3.03e-09 ***
## Speciesversicolor -0.39826    0.05707  -6.978 1.01e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1666 on 144 degrees of freedom
## Multiple R-squared: 0.9538, Adjusted R-squared: 0.9522
## F-statistic: 594.9 on 5 and 144 DF, p-value: < 2.2e-16
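The call drops `Speciesvirginica`, so virginica is the reference level. This fit should therefore be the same model that `lm()` produces from the original `Species` factor once `relevel()` makes virginica the baseline. The sketch below checks that. The names `iris_ref`, `fit_dummy` and `fit_factor` are illustrative and do not come from the original.

```r
# Sketch: the dummy-coded model is a reparameterisation of the ordinary
# factor model with virginica as the reference level.
dummies <- as.data.frame(model.matrix(~ Species - 1, data = iris))
iris2 <- cbind(iris[, 1:4], dummies)

# Dummy version, mirroring the call shown above.
fit_dummy <- lm(Petal.Width ~ . - Speciesvirginica, data = iris2)

# Factor version: relevel() makes virginica the baseline, so R's default
# treatment contrasts create Speciessetosa and Speciesversicolor terms.
iris_ref <- transform(iris, Species = relevel(Species, ref = "virginica"))
fit_factor <- lm(Petal.Width ~ ., data = iris_ref)

# Both fits should give identical fitted values and coefficients.
all.equal(fitted(fit_dummy), fitted(fit_factor))
cbind(dummy = coef(fit_dummy), factor = coef(fit_factor))
```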
Evaluation
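Based on the plots, the data clearly fall into 3 clusters corresponding to the species. The subset with the lowest Mallows' Cp leaves out Speciesvirginica. This is expected rather than surprising: the three species dummies always sum to 1, so together they are perfectly collinear with the intercept, and one of them has to be dropped. Virginica therefore serves as the reference level, and the other species coefficients are differences from it. Speciessetosa has the largest coefficient in magnitude (-1.046). Holding the sepal and petal measurements fixed, a setosa flower is predicted to have a petal width about 1.05 cm smaller than a virginica flower. The continuous predictors' coefficients are per cm of change, so comparing them with this dummy coefficient is only rough. A residual analysis suggests the residuals are roughly normally distributed. The residual summary is consistent with this, with a median close to zero (-0.013) and nearly symmetric quartiles (-0.083 and 0.088), although this supports normality rather than confirming it.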
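The residual plots behind the normality claim are not shown. Below is a minimal sketch of one such check, assuming the selected model above. It produces a numeric summary, a Shapiro-Wilk test, a normal Q-Q plot, and a plot of residuals against fitted values. The test result is not reproduced here, so treat its output as something to inspect rather than a given.

```r
# Sketch of a residual check for the selected model. The original plots are
# not shown, so this only illustrates one way to do it.
dummies <- as.data.frame(model.matrix(~ Species - 1, data = iris))
iris2 <- cbind(iris[, 1:4], dummies)
fit <- lm(Petal.Width ~ . - Speciesvirginica, data = iris2)

res <- residuals(fit)

# Numeric summary: compare with the "Residuals:" block of summary(fit).
print(summary(res))

# Shapiro-Wilk test of normality; a small p-value would argue against it.
print(shapiro.test(res))

# Visual checks: normal Q-Q plot, then residuals against fitted values.
qqnorm(res)
qqline(res)
plot(fitted(fit), res, xlab = "Fitted Petal.Width", ylab = "Residual")
abline(h = 0, lty = 2)
```

Points close to the Q-Q line and a patternless band around zero in the second plot are what "roughly normal" should look like. The three species clusters may also show up as groups along the fitted-value axis.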