15-12-2023                       >eR-BioStat


Simple linear regression using R

Ziv Shkedy et al: Edited by Halima

## load/install libraries
.libPaths(c("./Rpackages",.libPaths()))
library(knitr)
library(tidyverse)
library(deSolve)
library(minpack.lm)
library(ggpubr)
library(readxl)
library(gamlss)
library(data.table)
library(grid)
library(png)
library(nlme)
library(gridExtra)
library(mvtnorm)
library(e1071)
library(lattice)
library(ggplot2)
library(dslabs)
library(NHANES)
library(plyr)
library(dplyr)
library(nasaweather)
library(ggplot2)
library(gganimate)
library(av)
library(gifski)
library(foreach)
library("DAAG")
library(DT)

1. The data

The mtcars dataset

dim(mtcars)
## [1] 32 11
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Miles/(US) gallon vs. the car’s Weight

Scaterplot

#plot(mtcars$wt,mtcars$mpg, ylab = "mpg", xlab = "weight (1000 lbs)")
qplot(wt,mpg,data = mtcars)
mog vs. weight

Figure 1: mog vs. weight

Correlation

cor(mtcars$wt,mtcars$mpg)
## [1] -0.8676594

2. Simple linear regression using R

The lm() R function

For the mtcars dataset, we consider the model

\[mpg_{i}=\beta_{0}+\beta_{1} \times weight_{i}+\varepsilon_{i}\].

fit.lm<-lm(mtcars$mpg~mtcars$wt)
summary(fit.lm)
## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$wt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## mtcars$wt    -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

The parametr estimates for the intercept and slope are equal. respectivly, to \(\hat{\beta}_{0}=37.28\) and \(\hat{\beta}_{1}=-5.34\)

3. Data and estimated model

Figure 2 shows the data (mpg vs. weight) and fitted regression line, \(\hat{mpg}_{i}=37.28-5.34 \times wt_{i}\)

qplot(wt,mpg,data = mtcars)+
geom_smooth(method = "lm",se = F)
Data and fitted model

Figure 2: Data and fitted model

4. Model diagnostic

The mtcars dataset

For the mtcars data, the residuals from the model can be obtained by calling to the object resid, Figure 5 shows the diagnostic plots for the regression model.

fit.lm$resid
##          1          2          3          4          5          6          7 
## -2.2826106 -0.9197704 -2.0859521  1.2973499 -0.2001440 -0.6932545 -3.9053627 
##          8          9         10         11         12         13         14 
##  4.1637381  2.3499593  0.2998560 -1.1001440  0.8668731 -0.0502472 -1.8830236 
##         15         16         17         18         19         20         21 
##  1.1733496  2.1032876  5.9810744  6.8727113  1.7461954  6.4219792 -2.6110037 
##         22         23         24         25         26         27         28 
## -2.9725862 -3.7268663 -3.4623553  2.4643670  0.3564263  0.1520430  1.2010593 
##         29         30         31         32 
## -4.5431513 -2.7809399 -3.2053627 -1.0274952
par(mfrow=c(1,2))
qqnorm(fit.lm$resid)
abline(0,1)
plot(fit.lm$fit,fit.lm$resid)
abline(0,0,col=2)
Diagnostic plots for the residuals

Figure 3: Diagnostic plots for the residuals