R Markdown
This post discusses four assumptions of a linear regression model (linearity, normality, homoscedasticity, and independence) using the gifted dataset from the openintro library. The model below uses two variables from gifted: count (age in months when the child first counted to 10 successfully) and score (the analytical score). Here count is the independent variable and score is the dependent variable.
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)
gifted
## # A tibble: 36 x 8
##    score fatheriq motheriq speak count  read edutv cartoons
##    <int>    <int>    <int> <int> <int> <dbl> <dbl>    <dbl>
##  1   159      115      117    18    26   1.9  3        2
##  2   164      117      113    20    37   2.5  1.75     3.25
##  3   154      115      118    20    32   2.2  2.75     2.5
##  4   157      113      131    12    24   1.7  2.75     2.25
##  5   156      110      109    17    34   2.2  2.25     2.5
##  6   150      113      109    13    28   1.9  1.25     3.75
##  7   155      118      119    19    24   1.8  2        3
##  8   161      117      120    18    32   2.3  2.25     2.5
##  9   163      111      128    22    28   2.1  1        4
## 10   162      122      120    18    27   2.1  2.25     2.75
## # ... with 26 more rows
plot(gifted$count, gifted$score)
my_lm <- lm(gifted$score ~ gifted$count)
summary(my_lm)
##
## Call:
## lm(formula = gifted$score ~ gifted$count)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.5655 -1.8192  0.9747  2.0805  6.7629
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  141.2147     4.7842  29.517  < 2e-16 ***
## gifted$count   0.5840     0.1544   3.782 0.000601 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.941 on 34 degrees of freedom
## Multiple R-squared: 0.2962, Adjusted R-squared: 0.2755
## F-statistic: 14.31 on 1 and 34 DF, p-value: 0.0006013
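To make the coefficient table concrete, here is a minimal sketch that plugs the two estimates from the summary into the fitted line, score = 141.2147 + 0.5840 × count. The helper name `predict_score` and the example age of 30 months are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(my_lm) output above
intercept <- 141.2147
slope <- 0.5840

# Hypothetical helper: predicted analytical score for a child who first
# counted to 10 at `count_months` months of age
predict_score <- function(count_months) {
  intercept + slope * count_months
}

predict_score(30)  # 141.2147 + 0.5840 * 30 = 158.7347
```

Read this way, each additional month in count is associated with a predicted score about 0.58 points higher, and the multiple R-squared of 0.2962 says that count accounts for roughly 30% of the variation in score.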
Residual Analysis:
gifted$predicted <- predict(my_lm) # Save the predicted values
gifted$residuals <- residuals(my_lm) # Save the residual values
ggplot(gifted, aes(x = count, y = score)) + # count is the predictor used in my_lm
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") + # regression line
  geom_segment(aes(xend = count, yend = predicted), alpha = .2) + # segment from each point to the line
  geom_point(aes(color = abs(residuals), size = abs(residuals))) + # point size mapped to residual magnitude
  scale_color_continuous(low = "green", high = "red") + # point colour mapped to residual magnitude: green smaller, red larger
  guides(color = "none", size = "none") + # remove the colour and size legends
  geom_point(aes(y = predicted), shape = 1) + # fitted values as open circles
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
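As a quick sanity check on what the residual plot shows, the sketch below recomputes the residuals by hand as observed score minus fitted score. It refits the same model with the formula `score ~ count` and `data = gifted` so that the block runs on its own; the names `fit` and `manual_residuals` are assumptions introduced for illustration.

```r
library(openintro)  # provides the gifted dataset

# Same model as my_lm, written with a data argument
fit <- lm(score ~ count, data = gifted)

# A residual is the observed score minus the fitted (predicted) score
manual_residuals <- gifted$score - fitted(fit)

# Should agree with residuals() up to floating-point tolerance
all.equal(unname(manual_residuals), unname(residuals(fit)))

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero
sum(residuals(fit))
```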
plot(my_lm)
hist(my_lm$residuals, col = "blue") # hist() has no "type" argument, so it is dropped
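The diagnostic plots from `plot()` are shown one at a time by default. The sketch below is one possible way to view them together on a single page, with a Shapiro-Wilk test added as a numeric complement to the Q-Q plot. The Shapiro-Wilk test is an addition here rather than part of the original analysis, and the names `fit`, `old_par`, and `sw` are assumptions.

```r
library(openintro)  # provides the gifted dataset

# Same model as my_lm, refit here so the block runs on its own
fit <- lm(score ~ count, data = gifted)

# Show the four default diagnostic plots on one 2 x 2 page:
# 1 = residuals vs fitted, 2 = normal Q-Q,
# 3 = scale-location,      5 = residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(fit, which = c(1, 2, 3, 5))
par(old_par)  # restore the previous plotting layout

# Shapiro-Wilk test of the residuals; a large p-value gives no
# evidence against normality
sw <- shapiro.test(residuals(fit))
sw$p.value
```

With only 36 observations, a formal test like this has limited power, so it is best read alongside the Q-Q plot and histogram rather than in place of them.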
### Is the linear model appropriate?

To determine whether the linear model is appropriate, we must consider the four criteria listed and explained below.