Basic Linear Regression

Consider the data

  x |  y
----|----
  5 |  7
 10 | 11
 13 | 12
 17 | 15
 20 | 21

Initialize the vectors x and y in R with these data:

x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)

Make a scatter plot with

plot(x, y)

To estimate the parameters a and b in the model \( E(y|x)=a+b\ x \), use the function lm. The following code fits the model, saves the result in the variable mymodel.lm, and shows a summary of the fit:

mymodel.lm <- lm(y ~ x)  ## mymodel.lm is an arbitrary model name
summary(mymodel.lm)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      1      2      3      4      5 
##  0.641  0.365 -1.200 -1.620  1.814 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    2.084      1.937    1.08   0.3608   
## x              0.855      0.138    6.19   0.0085 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.62 on 3 degrees of freedom
## Multiple R-squared: 0.927,   Adjusted R-squared: 0.903 
## F-statistic: 38.3 on 1 and 3 DF,  p-value: 0.00849 
## 

The result shows that the fitted model is \( \hat E(y|x)=2.084+0.855 x \). The p-value 0.0085 (column Pr(>|t|)) associated with the slope estimate 0.855 is below 0.01, so the null hypothesis \( b=0 \) is rejected at the 1% significance level.
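As a check on where these numbers come from, the estimates can be reproduced from the closed-form least-squares formulas \( \hat b = S_{xy}/S_{xx} \) and \( \hat a = \bar y - \hat b\,\bar x \). The following is a minimal sketch; the names sxx, sxy, a_hat, b_hat, se_b, t_b, p_b and fit are illustrative choices, not part of the original example.

```r
# Data from the example above
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)

# Closed-form least-squares estimates for simple linear regression
sxx <- sum((x - mean(x))^2)                # sum of squares of x about its mean
sxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross-products
b_hat <- sxy / sxx                         # slope estimate
a_hat <- mean(y) - b_hat * mean(x)         # intercept estimate
round(c(a_hat, b_hat), 3)                  # 2.084 0.855

# Standard error, t statistic and two-sided p-value for the slope
n    <- length(x)
rss  <- sum((y - a_hat - b_hat * x)^2)     # residual sum of squares
se_b <- sqrt((rss / (n - 2)) / sxx)        # standard error of the slope
t_b  <- b_hat / se_b                       # t statistic for H0: b = 0
p_b  <- 2 * pt(-abs(t_b), df = n - 2)      # two-sided p-value

# Compare with the values reported by lm
fit <- lm(y ~ x)
coef(summary(fit))
```

The row for x in the matrix returned by coef(summary(fit)) should agree with b_hat, se_b, t_b and p_b up to rounding.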

To add the fitted line to the scatter plot when the linear regression has a single explanatory variable, use

plot(x, y)
abline(mymodel.lm)

For a more general approach, which also works for curve fitting beyond the straight-line case, use

newx <- seq(5, 20, 1)  # produce a vector with 5,6,7,...,20
newy <- predict(mymodel.lm, newdata = data.frame(x = newx), type = "response")
plot(x, y)
lines(newx, newy, type = "l")
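To illustrate why the predict approach generalizes, here is a hedged sketch that applies the same steps to a curve that is not a straight line. The quadratic model and the name quad.lm are assumptions made only for illustration; with five data points this is not a recommendation to prefer the quadratic fit.

```r
# Data from the example above
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)

# Hypothetical quadratic model, used only to illustrate the technique;
# I() makes x^2 an arithmetic square inside the formula
quad.lm <- lm(y ~ x + I(x^2))

# Predict on a finer grid so the plotted curve looks smooth
newx <- seq(5, 20, 0.5)
newy <- predict(quad.lm, newdata = data.frame(x = newx))

plot(x, y)
lines(newx, newy)   # abline would not work here: the fit is not a straight line
```

The column name in the data frame passed as newdata must match the variable name used in the model formula (here x).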
