R Markdown Implementation
| x (sound pressure level, dB) | y (blood pressure rise, mm Hg) |
|---|---|
| 60 | 1 |
| 63 | 0 |
| 65 | 1 |
| 70 | 2 |
| 70 | 5 |
| 70 | 1 |
| 80 | 4 |
| 90 | 6 |
| 80 | 2 |
| 80 | 3 |
| 85 | 5 |
| 89 | 4 |
| 90 | 6 |
| 90 | 8 |
| 90 | 4 |
| 90 | 5 |
| 94 | 7 |
| 100 | 9 |
| 100 | 7 |
| 100 | 6 |
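The chunk that loaded this table into R is not shown. The sketch below assumes the data frame is called `df`, which is the name used in the later `lm()` calls. It also assumes `x` and `y` exist as standalone vectors, because the first `lm()` call has no `data =` argument. Under those assumptions, the data could be entered like this:

```r
# Hypothetical reconstruction: the original data-loading chunk is not shown.
# x = sound pressure level (dB), y = blood pressure rise (mm Hg), copied row by row from the table.
df <- data.frame(
  x = c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100),
  y = c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
)
# Standalone copies of the columns, so that lm(y ~ x) also works without data = df
x <- df$x
y <- df$y
df
```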
The following quantities can be computed from the data: \[ \begin{aligned} n&=20 \\ \sum_{i = 1}^{20}x_i&=1{,}656\\ \sum_{i = 1}^{20}y_i&=86\\ \bar{x}&=82.8 \\ \bar{y}&=4.3\\ \sum_{i = 1}^{20}x_i^2&=140{,}176\\ \sum_{i = 1}^{20}y_i^2&=494\\ \sum_{i = 1}^{20}x_iy_i&=7{,}654\\ \end{aligned} \]
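These sums are easy to double-check in R. The sketch below re-enters the two columns of the table as plain vectors so that it runs on its own:

```r
# Columns of the table: x in dB, y in mm Hg
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
n <- length(x)
# Summary quantities used in the hand calculation
sums <- c(n = n,
          sum_x = sum(x), sum_y = sum(y),
          mean_x = mean(x), mean_y = mean(y),
          sum_x2 = sum(x^2), sum_y2 = sum(y^2),
          sum_xy = sum(x * y))
sums
```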
We then calculate \(S_{xx}\) and \(S_{xy}\) using the following formulas: \[ \begin{aligned} S_{xx}&=\sum_{i = 1}^{n}x_i^2-\frac{(\sum_{i = 1}^{n}x_i)^2}{n}\\ S_{xy}&=\sum_{i = 1}^{n}x_iy_i-\frac{(\sum_{i = 1}^{n}x_i)(\sum_{i = 1}^{n}y_i)}{n}\\ \end{aligned} \] Substituting the known values gives: \[ \begin{aligned} S_{xx}&=140{,}176-\frac{(1{,}656)^2}{20}\\ &=140{,}176-\frac{2{,}742{,}336}{20}\\ &=140{,}176-137{,}116.8\\ &=3{,}059.2\\ \end{aligned} \] \[ \begin{aligned} S_{xy}&=7{,}654-\frac{(1{,}656)(86)}{20}\\ &=7{,}654-\frac{142{,}416}{20}\\ &=7{,}654-7{,}120.8\\ &=533.2\\ \end{aligned} \]
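The same shortcut (computational) formulas translate directly into R. This is a small sketch that reuses the table's columns:

```r
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
n <- length(x)
# Shortcut forms: raw sums minus the correction term
Sxx <- sum(x^2) - sum(x)^2 / n
Sxy <- sum(x * y) - sum(x) * sum(y) / n
c(Sxx = Sxx, Sxy = Sxy)
```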
Therefore, the least squares estimates of the slope and intercept are: \[ \begin{aligned} \hat{\beta}_1&=\frac{S_{xy}}{S_{xx}}\\ &=\frac{533.2}{3{,}059.2}\\ &=0.1742939\\ \end{aligned} \] and \[ \begin{aligned} \hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x}\\ &=4.3-(0.1742939)(82.8)\\ &=-10.1315349\\ \end{aligned} \]
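The intercept depends on how many digits of \(\hat{\beta}_1\) are carried, because the slope is multiplied by \(\bar{x}=82.8\). The sketch below compares two intercepts. The first uses the full-precision slope. The second uses the slope rounded to seven decimal places, as in the hand calculation.

```r
Sxx <- 3059.2
Sxy <- 533.2
x_bar <- 82.8
y_bar <- 4.3
b1_full <- Sxy / Sxx
b1_rounded <- round(b1_full, 7)            # 0.1742939, the value used by hand
b0_full <- y_bar - b1_full * x_bar         # intercept from the unrounded slope
b0_rounded <- y_bar - b1_rounded * x_bar   # intercept from the rounded slope
print(c(b0_full = b0_full, b0_rounded = b0_rounded), digits = 10)
```

The full-precision intercept rounds to -10.13154, the value `lm` reports. The rounded slope gives -10.13153.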
Thus, the fitted simple linear regression model (with the coefficients reported to five decimal places) is: \[\hat{y}=-10.13153+0.17429x\]

```
##      x y
## 1   60 1
## 2   63 0
## 3   65 1
## 4   70 2
## 5   70 5
## 6   70 1
## 7   80 4
## 8   90 6
## 9   80 2
## 10  80 3
## 11  85 5
## 12  89 4
## 13  90 6
## 14  90 8
## 15  90 4
## 16  90 5
## 17  94 7
## 18 100 9
## 19 100 7
## 20 100 6
```
```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##    -10.1315       0.1743
```
```
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.8120 -0.9040 -0.1333  0.5023  2.9310
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.13154    1.99490  -5.079 7.83e-05 ***
## x             0.17429    0.02383   7.314 8.57e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.318 on 18 degrees of freedom
## Multiple R-squared:  0.7483, Adjusted R-squared:  0.7343
## F-statistic:  53.5 on 1 and 18 DF,  p-value: 8.567e-07
```
Graph 1.b Scatter plot of y (Blood Pressure Rise in mm Hg) versus x (Sound Pressure Level in dB) with fitted simple linear regression model \(\hat{y}=-10.13153+0.17429x\).
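The plotting code for Graph 1.b is not shown. One way to draw a similar figure in base R is a scatter plot of the data with the fitted line added by `abline()`. The axis labels come from the caption. The point style and line colour are illustrative choices, not the original settings.

```r
df <- data.frame(
  x = c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100),
  y = c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
)
fit <- lm(y ~ x, data = df)
# Scatter plot of the raw data
plot(y ~ x, data = df,
     xlab = "Sound Pressure Level (dB)",
     ylab = "Blood Pressure Rise (mm Hg)",
     pch = 19)
# abline() accepts an lm object and draws the line using its intercept and slope
abline(fit, col = "red")
```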
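We can also compute the corrected sums directly in R using the deviation form, for example \(S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2\). This form is algebraically equal to the shortcut formula used above, and the same approach gives \(S_{yy}=\sum_{i=1}^{n}(y_i-\bar{y})^2\).

```r
# Pull the columns out of the data frame
x = df$x
y = df$y
# Corrected sums of squares and cross-products (deviation form)
Sxy = sum((x - mean(x)) * (y - mean(y)))
Sxx = sum((x - mean(x)) ^ 2)
Syy = sum((y - mean(y)) ^ 2)
c(Sxy, Sxx, Syy)
## [1]  533.2 3059.2  124.2
```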
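\(S_{yy}\) is not used in the slope or intercept. It does provide a cross-check, though: the squared correlation \(r^2 = S_{xy}^2/(S_{xx}S_{yy})\) should equal the Multiple R-squared of 0.7483 in the `lm` summary. A short sketch using the three sums printed above:

```r
Sxy <- 533.2
Sxx <- 3059.2
Syy <- 124.2
# Sample correlation between x and y, and its square (coefficient of determination)
r <- Sxy / sqrt(Sxx * Syy)
R2 <- r^2
c(r = r, R2 = R2)
```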
Next, we calculate \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
```r
x = df$x
y = df$y
# Least squares slope and intercept from the corrected sums
beta_1_hat = Sxy / Sxx
beta_0_hat = mean(y) - beta_1_hat * mean(x)
c(beta_0_hat, beta_1_hat)
## [1] -10.1315377   0.1742939
```
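These values agree with the hand calculation. The hand-computed intercept, -10.1315349, differs only in the last digits because it used the slope rounded to seven decimal places. The fitted simple linear regression model is therefore again \[\hat{y}=-10.13153+0.17429x\]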
Estimating the Variance, \(\sigma^2\)
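We can estimate the error variance in R by dividing the residual sum of squares by the residual degrees of freedom, \(n-2\): \(\hat{\sigma}^2=\frac{1}{n-2}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\).

```r
# Fitted values and residuals from the least squares line
y_hat = beta_0_hat + beta_1_hat * x
e = y - y_hat
n = length(e)
# Unbiased estimate of sigma^2: residual sum of squares / (n - 2)
s2_e = sum(e^2) / (n - 2)
s2_e
## [1] 1.737026
```

The estimated variance, \(\hat{\sigma}^2\), is 1.737026.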
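The same estimate can be obtained without forming the residuals. The identity \(SSE = S_{yy} - \hat{\beta}_1 S_{xy}\) gives the residual sum of squares directly from the corrected sums. A short sketch:

```r
Sxy <- 533.2
Sxx <- 3059.2
Syy <- 124.2
n <- 20
beta_1_hat <- Sxy / Sxx
# Residual sum of squares from the corrected sums, no residuals needed
SSE <- Syy - beta_1_hat * Sxy
s2 <- SSE / (n - 2)
s2
```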
The lm Function
How can we check that the values we computed are consistent with the fitted line in Graph 1.b? We use R's lm function. The same summary was already shown with Graph 1.b, but we reproduce it here as a check.

```r
fit <- lm(y ~ x, data = df)
summary(fit)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.8120 -0.9040 -0.1333  0.5023  2.9310
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.13154    1.99490  -5.079 7.83e-05 ***
## x             0.17429    0.02383   7.314 8.57e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.318 on 18 degrees of freedom
## Multiple R-squared:  0.7483, Adjusted R-squared:  0.7343
## F-statistic:  53.5 on 1 and 18 DF,  p-value: 8.567e-07
```
According to the lm function, the intercept and slope are -10.13154 and 0.17429, respectively. These agree with the fitted model \(\hat{y}=-10.13153+0.17429x\) up to rounding in the fifth decimal place of the intercept. The residual standard error, 1.318, is the estimate of \(\sigma\). Squaring it recovers the estimated variance, \((1.318)^2 \approx 1.737\) to three decimal places, which matches the value of \(\hat{\sigma}^2\) computed above.
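The residual standard error can also be extracted from the fitted model instead of being read off the printout. The `sigma` component of `summary()` for an `lm` object holds this value. A minimal sketch, rebuilding `fit` from the table so it runs standalone:

```r
df <- data.frame(
  x = c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100),
  y = c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
)
fit <- lm(y ~ x, data = df)
# Residual standard error (the estimate of sigma) and its square (the estimate of sigma^2)
rse <- summary(fit)$sigma
c(rse = rse, variance = rse^2)
```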
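To estimate the mean rise in blood pressure at a sound pressure level of 85 dB, we substitute \(x=85\) into the fitted model: \[ \begin{aligned} \hat{y}&=-10.13153+0.17429x\\ &=-10.13153+(0.17429)(85)\\ &=4.68312 \end{aligned} \] Doing the same in R with the unrounded coefficients gives:

```r
# Estimated mean response at x = 85 dB
y_hat <- beta_0_hat + beta_1_hat * 85
y_hat
## [1] 4.683447
```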
The R result is 4.683447.
Both values are approximately 4.68. The estimated mean rise in blood pressure at a sound pressure level of 85 dB is therefore about 4.68 mm Hg, or roughly 5 mm Hg when rounded to the nearest whole number.
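R's `predict()` function gives the same estimate directly from the fitted model object, so the coefficients do not have to be typed by hand. A minimal sketch, rebuilding `fit` from the table so it runs standalone:

```r
df <- data.frame(
  x = c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100),
  y = c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
)
fit <- lm(y ~ x, data = df)
# Estimated mean response at a sound pressure level of 85 dB
y_85 <- predict(fit, newdata = data.frame(x = 85))
y_85
```

The column name in `newdata` must match the predictor name used in the model formula (`x`). Otherwise `predict()` cannot find the value to plug in.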