R Markdown

This blog post discusses the four assumptions associated with a linear regression model: linearity, normality, homoscedasticity, and independence. It uses the gifted dataset from the openintro library.

The two variables used in the regression model below are count (the age in months when the child first counted to 10 successfully) and score (the analytical score). Here count is the independent variable and score is the dependent variable.

library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)
gifted
## # A tibble: 36 x 8
##    score fatheriq motheriq speak count  read edutv cartoons
##    <int>    <int>    <int> <int> <int> <dbl> <dbl>    <dbl>
##  1   159      115      117    18    26   1.9  3        2   
##  2   164      117      113    20    37   2.5  1.75     3.25
##  3   154      115      118    20    32   2.2  2.75     2.5 
##  4   157      113      131    12    24   1.7  2.75     2.25
##  5   156      110      109    17    34   2.2  2.25     2.5 
##  6   150      113      109    13    28   1.9  1.25     3.75
##  7   155      118      119    19    24   1.8  2        3   
##  8   161      117      120    18    32   2.3  2.25     2.5 
##  9   163      111      128    22    28   2.1  1        4   
## 10   162      122      120    18    27   2.1  2.25     2.75
## # ... with 26 more rows
plot(gifted$count, gifted$score)

my_lm <- lm(gifted$score ~ gifted$count)
summary(my_lm)
## 
## Call:
## lm(formula = gifted$score ~ gifted$count)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5655 -1.8192  0.9747  2.0805  6.7629 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  141.2147     4.7842  29.517  < 2e-16 ***
## gifted$count   0.5840     0.1544   3.782 0.000601 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.941 on 34 degrees of freedom
## Multiple R-squared:  0.2962, Adjusted R-squared:  0.2755 
## F-statistic: 14.31 on 1 and 34 DF,  p-value: 0.0006013

Residual Analysis:

gifted$predicted <- predict(my_lm)   # Save the predicted values
gifted$residuals <- residuals(my_lm) # Save the residual values
ggplot(gifted, aes(x = count, y = score)) +                         # count is the predictor used in my_lm
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") +     # regression line
  geom_segment(aes(xend = count, yend = predicted), alpha = .2) +   # segment from each point to its fitted value
  geom_point(aes(color = abs(residuals), size = abs(residuals))) +  # point size and colour mapped to |residual|
  scale_color_continuous(low = "green", high = "red") +             # green = smaller residual, red = larger
  guides(color = "none", size = "none") +                           # remove the colour and size legends
  geom_point(aes(y = predicted), shape = 1) +                       # fitted values as open circles
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

plot(my_lm)

hist(residuals(my_lm), col = "blue")   # histogram of the residuals

### Is the linear model appropriate? ###

In order to determine the appropriateness of the linear model, we must take into consideration the four criteria listed and explained below.

There are four assumptions associated with a linear regression model:

Linearity

Normality

Homoscedasticity

Independence
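
Each of these assumptions can be inspected with a diagnostic plot. The following is a minimal sketch, not a definitive procedure. It refits the same model with the `data` argument under the name `fit` (a name chosen here for illustration) and draws one plot per assumption. The residuals-by-row-order plot is only a rough independence check, and it assumes the row order of the dataset is meaningful, which the source does not state.

```r
library(openintro)  # provides the gifted dataset

# Refit the model from this post; `fit` is an illustrative name
fit <- lm(score ~ count, data = gifted)

op <- par(mfrow = c(2, 2))  # arrange the four plots in a 2 x 2 grid

plot(fit, which = 1)  # Residuals vs Fitted: linearity
plot(fit, which = 2)  # Normal Q-Q: normality of the residuals
plot(fit, which = 3)  # Scale-Location: homoscedasticity

# Residuals in row order: a rough look at independence
# (assumption: row order carries some meaning, e.g. collection order)
plot(residuals(fit), type = "b",
     xlab = "Row index", ylab = "Residual",
     main = "Residuals vs Order")
abline(h = 0, lty = 2)

par(op)  # restore the previous graphics settings
```

A flat trend in the first plot, points close to the reference line in the second, an even spread in the third, and no visible pattern in the fourth would each be consistent with the corresponding assumption.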

The relationship between analytical score and count appears to be linear: in the Residuals vs. Fitted plot, the red line is approximately horizontal at zero. However, the histogram and the Normal Q-Q plot suggest that the residuals are not normally distributed, so the normality assumption appears to be violated. Therefore, the linear model is not appropriate here.
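
The visual judgement about normality can be complemented with a formal test. The sketch below applies the Shapiro-Wilk test from base R to the residuals. This test is not part of the original analysis and is shown only as one possible cross-check. The names `fit`, `res` and `sw` are illustrative.

```r
library(openintro)  # provides the gifted dataset

# Refit the model from this post and extract its residuals
fit <- lm(score ~ count, data = gifted)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw)

# Q-Q plot of the residuals with a reference line, for comparison
qqnorm(res)
qqline(res)
```

A small p-value would point to a departure from normality. With only 36 observations the test has limited power, so it is best read alongside the plots rather than on its own.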