model = lm(Sepal.Length~Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
summary(model)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
## Species, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.79424 -0.21874  0.00899  0.20255  0.73103
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.17127    0.27979   7.760 1.43e-12 ***
## Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
## Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
## Petal.Width       -0.31516    0.15120  -2.084  0.03889 *
## Speciesversicolor -0.72356    0.24017  -3.013  0.00306 **
## Speciesvirginica  -1.02350    0.33373  -3.067  0.00258 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3068 on 144 degrees of freedom
## Multiple R-squared: 0.8673, Adjusted R-squared: 0.8627
## F-statistic: 188.3 on 5 and 144 DF, p-value: < 2.2e-16
In this multiple regression model, each numeric coefficient is the expected change in Sepal.Length for a one-unit increase in Sepal.Width, Petal.Length, or Petal.Width, holding all other variables constant. The Species coefficients give the expected difference in Sepal.Length for versicolor or virginica relative to setosa, the reference level, with the other predictors held fixed. Petal.Width, versicolor, and virginica have a negative relationship with Sepal.Length, while Sepal.Width and Petal.Length have a positive relationship.
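To make the "one-unit increase" reading concrete, here is a minimal sketch that refits the same model and compares predictions for two flowers that differ only in Petal.Length. The two flowers (`flower_a`, `flower_b`) and their measurements are hypothetical values chosen for illustration, not rows from the data.

```r
# Refit the full model from above (iris ships with base R).
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
            data = iris)

# The first factor level is the reference level absorbed into the intercept.
reference_level <- levels(iris$Species)[1]
reference_level

# Two hypothetical flowers that differ by exactly one unit of Petal.Length.
flower_a <- data.frame(
  Sepal.Width  = 3,
  Petal.Length = 4,
  Petal.Width  = 1.3,
  Species      = factor("versicolor", levels = levels(iris$Species))
)
flower_b <- transform(flower_a, Petal.Length = Petal.Length + 1)

# The gap between the two predictions equals the Petal.Length coefficient.
diff_pred <- unname(predict(model, flower_b) - predict(model, flower_a))
all.equal(diff_pred, unname(coef(model)["Petal.Length"]))
```

The same comparison with `Species` switched from setosa to versicolor, and everything else held fixed, would reproduce the `Speciesversicolor` coefficient.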
The R squared of 0.8673 indicates that about 87% of the variation in Sepal.Length is explained by this linear model. All p-values are below 0.05 (the largest, for Petal.Width, is 0.039), so we can reject the null hypothesis that the coefficient is 0 for each of the variables.
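The R squared and the p-values can also be pulled out of the fitted object instead of being read off the printout. The sketch below does that and, as an addition that is not part of the analysis above, runs a partial F-test with `anova()` to check Species as a whole rather than one dummy variable at a time.

```r
# Refit the full model from above.
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
            data = iris)
fit_summary <- summary(model)

# Share of the variance in Sepal.Length explained by the model.
fit_summary$r.squared

# One p-value per coefficient, taken from the coefficient table.
p_values <- coef(fit_summary)[, "Pr(>|t|)"]
p_values < 0.05

# Partial F-test: does dropping Species (both dummies at once) hurt the fit?
reduced <- update(model, . ~ . - Species)
species_test <- anova(reduced, model)
species_test
```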
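# Residuals in observation order
plot(model$residuals)
# Standard lm diagnostic plots in a 2x2 grid
par(mfrow=c(2,2))
plot(model)
# Residuals against each predictor, again in a 2x2 grid
par(mfrow=c(2,2))
boxplot(model$residuals ~ iris$Species)
plot(model$residuals ~ iris$Petal.Width)
plot(model$residuals ~ iris$Petal.Length)
plot(model$residuals ~ iris$Sepal.Width)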
The diagnostic plots suggest that our assumptions of normality and constant variance of the residuals both hold. I then looked at plots of the residuals against each predictor variable. The residuals show no obvious pattern or change in spread across the species or across the numeric predictors.
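The visual checks can be supplemented with formal tests. The sketch below is an addition and not part of the analysis above: it assumes the Shapiro-Wilk test is an acceptable check of normality and the Bartlett test an acceptable check of equal residual variance across species. Both functions are in base R's stats package, and a large p-value in either test is consistent with the assumption holding.

```r
# Refit the full model from above and extract its residuals.
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
            data = iris)
res <- residuals(model)

# Shapiro-Wilk: the null hypothesis is that the residuals are normal.
normality <- shapiro.test(res)

# Bartlett: the null hypothesis is equal residual variance in every species.
equal_var <- bartlett.test(x = res, g = iris$Species)

# Collect the two p-values side by side.
c(shapiro_p = normality$p.value, bartlett_p = equal_var$p.value)
```

These tests do not replace the plots. With 150 observations they can flag departures too small to matter, so the two are best read together.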
model2 = lm(Sepal.Length~Sepal.Width + Petal.Length + Petal.Width, data = iris)
summary(model2)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
## data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.82816 -0.21989  0.01875  0.19709  0.84570
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.85600    0.25078   7.401 9.85e-12 ***
## Sepal.Width   0.65084    0.06665   9.765  < 2e-16 ***
## Petal.Length  0.70913    0.05672  12.502  < 2e-16 ***
## Petal.Width  -0.55648    0.12755  -4.363 2.41e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3145 on 146 degrees of freedom
## Multiple R-squared: 0.8586, Adjusted R-squared: 0.8557
## F-statistic: 295.5 on 3 and 146 DF, p-value: < 2.2e-16
model3 = lm(Sepal.Length~Sepal.Width + Petal.Length + Species, data = iris)
summary(model3)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Species,
## data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.82156 -0.20530  0.00638  0.22645  0.74999
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.39039    0.26227   9.114 5.94e-16 ***
## Sepal.Width        0.43222    0.08139   5.310 4.03e-07 ***
## Petal.Length       0.77563    0.06425  12.073  < 2e-16 ***
## Speciesversicolor -0.95581    0.21520  -4.442 1.76e-05 ***
## Speciesvirginica  -1.39410    0.28566  -4.880 2.76e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3103 on 145 degrees of freedom
## Multiple R-squared: 0.8633, Adjusted R-squared: 0.8595
## F-statistic: 228.9 on 4 and 145 DF, p-value: < 2.2e-16
model4 = lm(Sepal.Length~Sepal.Width + Petal.Width + Species, data = iris)
summary(model4)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Width + Species,
## data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.23591 -0.23740 -0.02601  0.18813  1.27132
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         2.5211     0.3939   6.401 2.01e-09 ***
## Sepal.Width         0.6982     0.1195   5.843 3.24e-08 ***
## Petal.Width         0.3716     0.1983   1.873  0.06302 .
## Speciesversicolor   0.9881     0.2747   3.597  0.00044 ***
## Speciesvirginica    1.2376     0.3913   3.162  0.00191 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4342 on 145 degrees of freedom
## Multiple R-squared: 0.7324, Adjusted R-squared: 0.725
## F-statistic: 99.21 on 4 and 145 DF, p-value: < 2.2e-16
model5 = lm(Sepal.Length~ Petal.Length + Petal.Width + Species, data = iris)
summary(model5)
##
## Call:
## lm(formula = Sepal.Length ~ Petal.Length + Petal.Width + Species,
## data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.75238 -0.23089 -0.00211  0.23100  1.03108
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)        3.682982   0.107403  34.291  < 2e-16 ***
## Petal.Length       0.905946   0.074311  12.191  < 2e-16 ***
## Petal.Width       -0.005995   0.156260  -0.038    0.969
## Speciesversicolor -1.598362   0.205706  -7.770 1.32e-12 ***
## Speciesvirginica  -2.112647   0.304024  -6.949 1.16e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3392 on 145 degrees of freedom
## Multiple R-squared: 0.8367, Adjusted R-squared: 0.8322
## F-statistic: 185.8 on 4 and 145 DF, p-value: < 2.2e-16
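I ran the regression model leaving out each of the four predictor variables in turn, and no reduced model produced a higher adjusted R squared than the full model (0.8627). Adjusted R squared is the fairer comparison here, because multiple R squared can never decrease when a predictor is added. This suggests that all four variables contribute to predicting Sepal.Length, and that the full model gives the best fit of the models considered.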
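The leave-one-out comparison can be sketched as a loop instead of four separate fits. The names `full`, `terms_to_drop`, and `adj_r2` are introduced here for illustration. The `drop1()` call at the end is an addition that is not part of the analysis above: it reports an F-test for removing each term from the full model.

```r
# Full model, as fitted at the top of this section.
full <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data = iris)

# Refit once per predictor, each time leaving that predictor out.
terms_to_drop <- c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")
adj_r2 <- sapply(terms_to_drop, function(term) {
  reduced <- update(full, as.formula(paste(". ~ . -", term)))
  summary(reduced)$adj.r.squared
})

# Adjusted R squared of the full model next to each reduced model.
round(c(full = summary(full)$adj.r.squared, adj_r2), 4)

# Single-term deletions with an F-test for each dropped term.
drop1(full, test = "F")
```

Each entry of `adj_r2` is named after the predictor that was left out, so it lines up with the matching summary printed above.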