Consider the data
x | y
----|----
5 | 7
10 | 11
13 | 12
17 | 15
20 | 21
Create the vectors x and y in R from these data:
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)
Draw a scatter plot with
plot(x, y)
To estimate the parameters \( a \) and \( b \) in the model \( E(y|x)=a+b\ x \), use the function lm. The following code fits the model, stores the result in the variable mymodel.lm, and prints a summary of the fit:
mymodel.lm <- lm(y ~ x) ## mymodel.lm is an arbitrary model name
summary(mymodel.lm)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##      1      2      3      4      5
##  0.641  0.365 -1.200 -1.620  1.814
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    2.084      1.937    1.08   0.3608
## x              0.855      0.138    6.19   0.0085 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.62 on 3 degrees of freedom
## Multiple R-squared: 0.927, Adjusted R-squared: 0.903
## F-statistic: 38.3 on 1 and 3 DF, p-value: 0.00849
##
The output shows that \( \hat E(y|x)=2.084+0.855\ x \). The p-value 0.0085 (column Pr(>|t|)) associated with the slope estimate 0.855 is below 0.01, so the null hypothesis \( b=0 \) is rejected at the 1% significance level.
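The slope row of the summary can be checked by hand. The following is a minimal sketch, assuming the usual closed-form least-squares estimates \( \hat b = S_{xy}/S_{xx} \) and \( \hat a = \bar y - \hat b\,\bar x \), with the slope tested by a t statistic on \( n-2 \) degrees of freedom. The names sxx, sxy, a_hat, b_hat, sigma_hat, se_b, t_b and p_b are illustrative and do not come from the original text.

```r
# Data from the example above
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)
n <- length(x)

# Closed-form least-squares estimates
sxx <- sum((x - mean(x))^2)                # sum of squares of x about its mean
sxy <- sum((x - mean(x)) * (y - mean(y)))  # cross-product sum of x and y
b_hat <- sxy / sxx                         # slope
a_hat <- mean(y) - b_hat * mean(x)         # intercept

# Residual standard error on n - 2 degrees of freedom
res <- y - (a_hat + b_hat * x)
sigma_hat <- sqrt(sum(res^2) / (n - 2))

# Standard error, t statistic and two-sided p-value for the slope
se_b <- sigma_hat / sqrt(sxx)
t_b <- b_hat / se_b
p_b <- 2 * pt(-abs(t_b), df = n - 2)

# Compare with the Coefficients table printed by summary()
round(c(a = a_hat, b = b_hat, se_b = se_b, t = t_b, p = p_b), 4)
```

Up to rounding, these values should agree with the Estimate, Std. Error, t value and Pr(>|t|) entries reported by summary for the same data.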
To add the fitted line to the scatter plot when the linear regression has a single explanatory variable, use
plot(x, y)
abline(mymodel.lm)
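Passing the fitted model to abline works because abline reads the intercept and slope from it. As a small sketch of the same step written out explicitly, the coefficients can be extracted with coef and passed as the a and b arguments. The name cf and the dashed line type are illustrative choices, not part of the original text.

```r
# Refit the model so this snippet runs on its own
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)
mymodel.lm <- lm(y ~ x)

# coef() returns a named vector with entries "(Intercept)" and "x"
cf <- coef(mymodel.lm)

plot(x, y)
# Draw the line y = a + b * x from the extracted estimates
abline(a = cf[["(Intercept)"]], b = cf[["x"]], lty = 2)
```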
A more general approach, which also works when the fitted curve is not a straight line, is to predict the response on a grid of x values and draw the result:
newx <- seq(5, 20, 1) # produce a vector with 5,6,7,...,20
newy <- predict(mymodel.lm, newdata = data.frame(x = newx), type = "response")
plot(x, y)
lines(newx, newy, type = "l")
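The same predict-and-lines pattern carries over to a fit that is not a straight line. The sketch below assumes a quadratic model, chosen only to illustrate the technique; with five observations it is not meant as a recommended model for these data. The name myquad.lm and the finer grid step of 0.5 are illustrative.

```r
# Data from the example above
x <- c(5, 10, 13, 17, 20)
y <- c(7, 11, 12, 15, 21)

# Hypothetical quadratic model: E(y|x) = a + b x + c x^2
# I() protects x^2 so that ^ is treated as arithmetic inside the formula
myquad.lm <- lm(y ~ x + I(x^2))

# Predict on a fine grid so the curve looks smooth
newx <- seq(5, 20, by = 0.5)
newy <- predict(myquad.lm, newdata = data.frame(x = newx))

plot(x, y)
lines(newx, newy)
```

Here abline would not help, because it can only draw straight lines; predicting on a grid works for any model that has a predict method.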