Predict petal length (Petal.Length) based on sepal length (Sepal.Length)
library(ggplot2)
data(iris)
model <- lm(Petal.Length ~ Sepal.Length, data=iris)
summary(model)
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.47747 -0.59072 -0.00668  0.60484  2.49512
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared: 0.76, Adjusted R-squared: 0.7583
## F-statistic: 468.6 on 1 and 148 DF, p-value: < 2.2e-16
plot(Petal.Length ~ Sepal.Length, data=iris)
abline(model)
par(mfrow=c(2,2))
plot(model)
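residuals <- resid(model)
# Put the residuals in their own data frame so aes() maps a real column
# instead of silently falling back to a global variable outside iris
ggplot(data = data.frame(residuals = residuals), aes(x = residuals)) +
  geom_histogram(binwidth = 0.2, color = 'black', fill = 'skyblue') +
  ggtitle("Histogram of Residuals")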
shapiro.test(residuals)
##
## Shapiro-Wilk normality test
##
## data: residuals
## W = 0.99437, p-value = 0.831
The slope is 1.85843: for every one-unit increase in Sepal.Length, Petal.Length is expected to increase by about 1.86 units. Both coefficients are highly significant (p-value < 2e-16), which is strong evidence of a linear relationship between the two variables.
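The slope interpretation can be checked with a small sketch. The two sepal lengths below (5 and 6) are hypothetical values chosen only for illustration; the model is refit so the block runs on its own.

```r
# Refit the model from the analysis so this block is self-contained
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Hypothetical sepal lengths exactly one unit apart (illustration only)
new_data <- data.frame(Sepal.Length = c(5, 6))
predicted <- predict(model, newdata = new_data)

# The gap between the two predictions should equal the fitted slope
slope_from_predictions <- unname(diff(predicted))
slope_from_model <- unname(coef(model)["Sepal.Length"])
print(c(from_predictions = slope_from_predictions,
        from_model = slope_from_model))
```

Any pair of sepal lengths one unit apart should give the same difference, because the fitted line has a constant slope.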
The R-squared value is 0.76, meaning the model explains 76% of the variability in petal length, which is quite high.
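As a rough check on what "explains 76% of the variability" means, R-squared can be recomputed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. This is a minimal sketch; the variable names are introduced here for illustration.

```r
# Refit the model from the analysis so this block is self-contained
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Variation left unexplained by the model
ss_residual <- sum(resid(model)^2)

# Total variation of petal length around its mean
ss_total <- sum((iris$Petal.Length - mean(iris$Petal.Length))^2)

# Share of the total variation that the model accounts for
r_squared_manual <- 1 - ss_residual / ss_total
r_squared_reported <- summary(model)$r.squared
print(c(manual = r_squared_manual, reported = r_squared_reported))
```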
The residuals in the Q-Q plot follow a straight line closely. The histogram of residuals supports the same conclusion.
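The Q-Q panel produced by plot(model) uses standardized residuals. A standalone version on the raw residuals can be sketched as follows. The correlation between theoretical and sample quantiles is added here only as a rough numeric summary of how straight the line is; it is not part of the original analysis.

```r
# Refit the model from the analysis so this block is self-contained
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
model_residuals <- resid(model)

# Draw the Q-Q plot; qqnorm() also returns the plotted coordinates
qq <- qqnorm(model_residuals, main = "Normal Q-Q Plot of Residuals")
qqline(model_residuals, col = "red")

# Correlation between theoretical (x) and sample (y) quantiles:
# values close to 1 mean the points lie close to a straight line
qq_correlation <- cor(qq$x, qq$y)
print(qq_correlation)
```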
The Shapiro-Wilk test on the residuals gives a p-value of 0.831. Since this p-value is greater than 0.05, we fail to reject the null hypothesis: there is not enough evidence to say the residuals are not normally distributed. The residuals are therefore consistent with a normal distribution, and the normality assumption of linear regression appears to hold.
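The decision rule can be written out explicitly. This sketch assumes the conventional 0.05 significance level used in the text; the names alpha and reject_normality are introduced here for illustration.

```r
# Refit the model from the analysis so this block is self-contained
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
test_result <- shapiro.test(resid(model))

# Significance level assumed in the text
alpha <- 0.05

# Reject the null hypothesis of normality only if the p-value is below alpha
reject_normality <- test_result$p.value < alpha
print(c(W = unname(test_result$statistic), p_value = test_result$p.value))
print(reject_normality)
```

Failing to reject here does not prove the residuals are normal; it only means the test found no evidence against normality.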