Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?
I will use R's built-in iris data set.
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# Build a linear regression model
iris_lm <- lm(Petal.Width ~ Sepal.Length + Sepal.Width, data = iris)
# Conduct residual analysis
# Residuals vs Fitted plot
plot(iris_lm, which = 1)
# Normal Q-Q plot
plot(iris_lm, which = 2)
# Scale-Location plot
plot(iris_lm, which = 3)
# Residuals vs Leverage plot
plot(iris_lm, which = 5)
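The four plots above are judged by eye. As a rough numeric complement, here is a minimal sketch (an addition to the original analysis, assuming only base R's stats package) that runs a Shapiro-Wilk test on the residuals and regresses the squared residuals on the fitted values as an informal check for non-constant variance. The names `res`, `sw`, `sq_res`, `fit`, and `var_check` are illustrative.

```r
# Refit the model so this block runs on its own
data(iris)
iris_lm <- lm(Petal.Width ~ Sepal.Length + Sepal.Width, data = iris)

res <- residuals(iris_lm)

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal
sw <- shapiro.test(res)
print(sw)

# Informal variance check: regress squared residuals on fitted values.
# A clearly non-zero slope would hint at heteroscedasticity.
sq_res <- res^2
fit <- fitted(iris_lm)
var_check <- lm(sq_res ~ fit)
print(summary(var_check)$coefficients)
```

Neither check replaces the plots; they only give a number to set beside the visual impression.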
# Summary of the linear model
summary(iris_lm)
##
## Call:
## lm(formula = Petal.Width ~ Sepal.Length + Sepal.Width, data = iris)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.7231 -0.2973 -0.0566  0.1929  1.1088
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.56349    0.33911  -4.611 8.66e-06 ***
## Sepal.Length  0.72329    0.03876  18.659  < 2e-16 ***
## Sepal.Width  -0.47872    0.07364  -6.501 1.17e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3891 on 147 degrees of freedom
## Multiple R-squared: 0.7429, Adjusted R-squared: 0.7394
## F-statistic: 212.4 on 2 and 147 DF, p-value: < 2.2e-16
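To make the link between the residuals and the fit statistics explicit, the sketch below recomputes R-squared, adjusted R-squared, and the residual standard error directly from the residuals. It is an illustration rather than part of the original analysis; the variable names (`rss`, `tss`, `r2`, `adj_r2`, `rse`) are my own, and the results should agree with the values printed by `summary(iris_lm)`.

```r
# Refit the model so this block runs on its own
data(iris)
iris_lm <- lm(Petal.Width ~ Sepal.Length + Sepal.Width, data = iris)

res <- residuals(iris_lm)
y <- iris$Petal.Width

rss <- sum(res^2)             # residual sum of squares
tss <- sum((y - mean(y))^2)   # total sum of squares
n <- length(y)                # number of observations
p <- 2                        # number of predictors (excluding the intercept)

r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
rse <- sqrt(rss / (n - p - 1))  # residual standard error on n - p - 1 df

print(c(r2 = r2, adj_r2 = adj_r2, rse = rse))
```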
The linear regression model fitted to the iris data set indicates that sepal length and sepal width are both significantly associated with petal width (p < 0.001 for each coefficient). Holding the other predictor fixed, petal width increases with sepal length (estimate 0.72) and decreases with sepal width (estimate -0.48). The model explains approximately 74% of the variability in petal width (R-squared = 0.7429, adjusted R-squared = 0.7394), which suggests a good fit to the data. The residual plots show no apparent patterns, which suggests that the model assumptions are reasonably met. The linear model therefore appears appropriate for predicting petal width from sepal measurements.
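The iris data contain three species, and the model does not use Species as a predictor. One further way to probe the "no apparent patterns" reading is to compare the residuals across species. This is only a sketch under that assumption, not part of the original analysis, and `species_means` is an illustrative name.

```r
# Refit the model so this block runs on its own
data(iris)
iris_lm <- lm(Petal.Width ~ Sepal.Length + Sepal.Width, data = iris)
res <- residuals(iris_lm)

# Mean residual per species; values far from zero would suggest structure
# that the two sepal predictors do not capture
species_means <- tapply(res, iris$Species, mean)
print(species_means)

# Boxplot of residuals by species, with a dashed reference line at zero
boxplot(res ~ iris$Species, xlab = "Species", ylab = "Residual",
        main = "Residuals by species")
abline(h = 0, lty = 2)
```

If the boxes sit at clearly different levels, adding Species to the model would be a natural next step to consider.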