library(datasets)
data(iris)
# Species is already a factor in iris; as.factor() only makes that explicit
iris$Species = as.factor(iris$Species)
# Refer to Species by name so lm() looks it up in `data` (needed for predict() on new data)
model1 = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
summary(model1)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
##     Species, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.79424 -0.21874  0.00899  0.20255  0.73103
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.17127    0.27979   7.760 1.43e-12 ***
## Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
## Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
## Petal.Width       -0.31516    0.15120  -2.084  0.03889 *
## Speciesversicolor -0.72356    0.24017  -3.013  0.00306 **
## Speciesvirginica  -1.02350    0.33373  -3.067  0.00258 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3068 on 144 degrees of freedom
## Multiple R-squared:  0.8673, Adjusted R-squared:  0.8627
## F-statistic: 188.3 on 5 and 144 DF,  p-value: < 2.2e-16
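# Draw the four standard lm diagnostic plots on a 2 x 2 grid
op = par(mfrow = c(2, 2))
plot(model1)
par(op)  # restore the single-panel layout before drawing the histogram
hist(residuals(model1))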
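I fitted a linear model with lm() that included all variables, with Species treated as a factor (it is already stored as a factor in iris, so the conversion is a no-op). The model returned an \(R^2\) of 0.8673 (adjusted 0.8627), which suggests a very good fit: the predictors together explain about 87% of the variance in Sepal Length. The Intercept, Sepal Width, and Petal Length were all highly significant (p < 0.001). Petal Width was significant only at the 0.05 level (p = 0.039), and the coefficients for the species Versicolor and Virginica were significant at the 0.01 level. The species Setosa does not appear in the coefficient table because it is the reference level of the factor, not because it was dropped as non-significant: its effect is absorbed into the intercept, and the Versicolor and Virginica coefficients measure differences from Setosa with the other predictors held fixed.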
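By default lm() encodes a factor with treatment contrasts, so the first level (Setosa here) serves as the baseline and gets no coefficient of its own. The sketch below is one way to check this and to reproduce \(R^2\) by hand; the names `fit`, `fit_ref`, `fit_nosp`, and `iris_ref` are illustrative and not part of the original analysis, and the choice of Virginica as an alternative baseline is arbitrary.

```r
library(datasets)

# Refit the model from the text, referring to Species by name
fit = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
         data = iris)

# The first factor level is the baseline absorbed into the intercept
levels(iris$Species)

# R-squared by hand: 1 - RSS / TSS
rss = sum(residuals(fit)^2)
tss = sum((iris$Sepal.Length - mean(iris$Sepal.Length))^2)
r2 = 1 - rss / tss
r2

# Changing the baseline changes which species coefficients appear,
# but leaves the fitted values unchanged
iris_ref = transform(iris, Species = relevel(Species, ref = "virginica"))
fit_ref = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data = iris_ref)
names(coef(fit_ref))

# Partial F-test for Species as a whole (both dummy coefficients at once)
fit_nosp = update(fit, . ~ . - Species)
anova(fit_nosp, fit)
```

The partial F-test is arguably the more natural way to ask whether Species matters, since the individual t-tests only compare each species against the baseline.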
In the Residuals vs Fitted plot, the data appear to be randomly scattered apart from points 15, 85, and 136, which may explain the non-linear pattern on the right-hand side of the plot. The Q-Q plot follows the reference line very well, with some minor deviation at the tails. The histogram skews a bit to the right, but overall the residuals appear to be approximately normally distributed. The Scale-Location plot suggests some heteroscedasticity, with points clustering at fitted values between 6.0 and 6.5. Lastly, the Residuals vs Leverage plot shows the data falling within the band of 1 and -1, with a few observations (107, 135, 142) that may be influencing the model.
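The visual impressions from the diagnostic plots can be backed up with a few numerical checks from base R. This is a minimal sketch rather than a full diagnostic workflow: the object names (`fit`, `sw`, `cd`, `top_cd`, `h`, `high_lev`, `sl`) are illustrative, and the 2p/n leverage cutoff is a common rule of thumb, not something taken from the original analysis.

```r
library(datasets)

fit = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
         data = iris)

# Normality of residuals: Shapiro-Wilk test
# (a small p-value would suggest departure from normality)
sw = shapiro.test(residuals(fit))
sw$p.value

# Influence: Cook's distance per observation, three largest first
cd = cooks.distance(fit)
top_cd = head(sort(cd, decreasing = TRUE), 3)
top_cd

# Leverage: hat values above the rule-of-thumb cutoff 2 * p / n
h = hatvalues(fit)
p = length(coef(fit))
n = nrow(iris)
high_lev = which(h > 2 * p / n)
high_lev

# Rough check for non-constant variance: regress the Scale-Location
# quantity sqrt(|standardized residual|) on the fitted values
sl = lm(sqrt(abs(rstandard(fit))) ~ fitted(fit))
coef(summary(sl))
```

A clearly non-zero slope in the last regression would point in the same direction as the trend seen in the Scale-Location plot.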