## x y
## 1 60 1
## 2 63 0
## 3 65 1
## 4 70 2
## 5 70 5
## 6 70 1
## 7 80 4
## 8 90 6
## 9 80 2
## 10 80 3
## 11 85 5
## 12 89 4
## 13 90 6
## 14 90 8
## 15 90 4
## 16 90 5
## 17 94 7
## 18 100 9
## 19 100 7
## 20 100 6
Judging from the scatter plot, there appears to be a positive linear relationship between sound pressure level and blood pressure rise.
plot(x, y, pch = 16, cex = 1, col = "red",
     main = "Scatter Plot of Blood Pressure Rise and Sound Pressure Level",
     xlab = "Sound Pressure Level (dB)", ylab = "Blood Pressure Rise (mm)")
model <- lm(y ~ x)
model
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## -10.1315 0.1743
abline(model)
The plot shows the same scatter plot with the fitted regression line overlaid. The lm() function fits a linear regression model by least squares. Using the coefficients printed above, the fitted model is:
\[\begin{aligned} \hat{y} = -10.1315 + 0.1743x \end{aligned}\]
An estimate of \(\sigma^2\) is given by:
\[\begin{aligned} \hat{\sigma}^2 = \frac{SS_E}{n - 2} \end{aligned}\]where \(SS_E\) is the error sum of squares and \(n = 20\) is the number of observations.
The function deviance() gives the \(SS_E\) of the linear regression model of the dataset.
deviance(model)
## [1] 31.26647
\[\begin{aligned}
\hat{\sigma}^2 &= \frac{31.26647}{20 - 2}
\\
\hat{\sigma}^2 &= 1.737026
\end{aligned}\]
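The same estimate can be reproduced directly in R. The following is a minimal sketch, assuming the vectors `x` and `y` are entered exactly as printed in the data table; the names `sse` and `sigma2_hat` are illustrative and not part of the original analysis.

```r
# Data transcribed from the table (assumption: x = sound pressure level,
# y = blood pressure rise, in the order printed)
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
model <- lm(y ~ x)

# Error sum of squares: sum of squared residuals, the same quantity deviance() returns
sse <- sum(residuals(model)^2)

# Divide by the residual degrees of freedom (n - 2 for simple linear regression)
sigma2_hat <- sse / df.residual(model)
sigma2_hat
```

The square of the residual standard error, `summary(model)$sigma^2`, should give the same value.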
The predicted mean rise in blood pressure at a sound pressure level of 85 dB is obtained by substituting \(x = 85\) into the fitted regression model.
\[\begin{aligned} \hat{y} = -10.1315 + 0.1743(85) = 4.684 \end{aligned}\]At a sound pressure level of \(85\) dB, the predicted mean rise in blood pressure is \(4.684\) mm.
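As a cross-check on the hand calculation, the prediction can also be obtained with `predict()`. This is a sketch under the assumption that the data are entered as in the table; `pred` is an illustrative name.

```r
# Data transcribed from the table (assumed order: as printed)
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
model <- lm(y ~ x)

# newdata must be a data frame whose column name matches the predictor in the formula
pred <- predict(model, newdata = data.frame(x = 85))
pred
```

Because `predict()` uses the unrounded coefficients, its result can differ slightly from the value computed by hand with the rounded coefficients.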
## usefulrange brightness contrast
## 1 96 54 56
## 2 50 61 80
## 3 50 65 70
## 4 112 100 50
## 5 96 100 65
## 6 80 100 80
## 7 155 50 25
## 8 144 57 35
## 9 255 54 26
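The code that builds this data set and fits the model is not shown in the report. A minimal sketch follows, assuming the data frame is named `correlator.data` (as in the `Call:` line of the summary) and the fitted object is named `multiple.regression` (the name used when extracting `sigma`).

```r
# Data transcribed from the printed table
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)

# Least-squares fit of useful range on brightness and contrast
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

# Coefficient table, residual standard error, R-squared and F-statistic
summary(multiple.regression)
```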
The data is visualized in a 3D scatter plot below, with a fitted multiple regression model.
##
## Call:
## lm(formula = usefulrange ~ brightness + contrast, data = correlator.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.334 -20.090 -8.451 8.413 69.047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 238.5569 45.2285 5.274 0.00188 **
## brightness 0.3339 0.6763 0.494 0.63904
## contrast -2.7167 0.6887 -3.945 0.00759 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared: 0.7557, Adjusted R-squared: 0.6742
## F-statistic: 9.278 on 2 and 6 DF, p-value: 0.01459
The summary of the fitted regression model gives the estimated regression coefficient for each regressor variable, along with the \(t\)-values used to check the individual significance of each coefficient. It also reports the \(F\)-statistic, which is used to test whether a linear relationship exists between the response variable and the regressor variables.
The formula for a multiple linear regression model is given as:
\[y = \beta_0 + \beta_1 X_1 + \beta_2 X_2\] where:
\(y\) = Useful range (ng)
\(\beta_0\) = intercept
\(\beta_1\) = regression coefficient for brightness
\(\beta_2\) = regression coefficient for contrast
\(X_1\) = Brightness (%)
\(X_2\) = Contrast (%)
From the summary of the fitted regression model, the fitted equation is:
\[\begin{aligned} \hat{y} = 238.56 + 0.3339 X_1 - 2.7167 X_2 \end{aligned}\]
An estimate of \(\sigma^2\) can be obtained from the residual standard error reported in the summary, \(\hat{\sigma} = 36.35\). Squaring this value gives the estimated variance \(\hat{\sigma}^2\):
summary(multiple.regression)$sigma^2
## [1] 1321.273
As seen from the summary of the model, the standard errors of the estimated regression coefficients are:
\[\begin{aligned} SE(\hat{\beta}_0) &= 45.2285 \\ SE(\hat{\beta}_1) &= 0.6763 \\ SE(\hat{\beta}_2) &= 0.6887 \end{aligned}\]
Substituting \(X_1 = 80\) and \(X_2 = 75\) into the fitted multiple regression model gives the predicted useful range:
\[\begin{aligned} \hat{y} &= 238.56 + 0.3339 X_1 - 2.7167 X_2 \\ &= 238.56 + 0.3339(80) - 2.7167(75) \\ &= 61.5195 \end{aligned}\]
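The substitution can be checked with `predict()`. This sketch assumes the data frame and model names used in the summary output; `new.obs` and `pred` are hypothetical names introduced only for illustration.

```r
# Data transcribed from the printed table
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

# Column names in newdata must match the regressor names in the formula
new.obs <- data.frame(brightness = 80, contrast = 75)
pred <- predict(multiple.regression, newdata = new.obs)
pred
```

The result agrees with the hand calculation to about two decimal places; the small difference comes from rounding the coefficients before substituting.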
The test for significance of regression is used to determine whether a linear relationship exists between the response variable \(y\) and the set of regressor variables \(x_1, x_2\).
The hypotheses are:
Null Hypothesis \(H_0: \beta_1 = \beta_2 = 0\)
Alternative Hypothesis \(H_1: \beta_j \neq 0\) for at least one \(j\)
Rejecting the null hypothesis implies that at least one of the regressor variables contributes significantly to the regression model.
From the model summary, the \(P\)-value for this test is \(0.01459\). The null hypothesis is rejected when the \(P\)-value is less than the significance level, which is given as \(\alpha = 0.05\):
\[0.01459 < 0.05\]
Since the \(P\)-value is less than \(\alpha\), the null hypothesis is rejected. Therefore, at least one of the regressor variables is significant to the model.
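The reported \(P\)-value can be reproduced from the \(F\)-statistic and its degrees of freedom in the summary. The sketch below uses only those reported values; the variable names are illustrative.

```r
# Values read from the model summary: F-statistic 9.278 on 2 and 6 DF
f0    <- 9.278
df1   <- 2      # number of regressors
df2   <- 6      # residual degrees of freedom, n - p = 9 - 3
alpha <- 0.05

# Upper-tail probability of the F distribution gives the P-value
p.value <- pf(f0, df1, df2, lower.tail = FALSE)

# Equivalent decision rule: compare f0 with the upper-alpha critical value
f.crit <- qf(alpha, df1, df2, lower.tail = FALSE)

p.value < alpha   # TRUE means the null hypothesis is rejected
f0 > f.crit       # same decision via the critical value
```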
The \(t\)-test is used to check the significance of each regression coefficient in the multiple linear regression model. The hypotheses for this test are:
Null Hypothesis \(H_0: \beta_j = 0\)
The tested regression coefficient is not significant to the regression model.
Alternative Hypothesis \(H_1: \beta_j \neq 0\)
The tested regression coefficient is significant to the regression model.
From the summary of the regression model, the \(t\)-value for each regression coefficient is given:
For \(\beta_1\) or “brightness,” \(t_0 = 0.494\)
For \(\beta_2\) or “contrast,” \(t_0 = -3.945\)
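Each of these statistics is the coefficient estimate divided by its standard error, so the values can be checked against the summary table (up to rounding):
\[\begin{aligned} t_0 = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \qquad \frac{0.3339}{0.6763} \approx 0.494, \qquad \frac{-2.7167}{0.6887} \approx -3.945 \end{aligned}\]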
The degrees of freedom for the \(t\)-test are \(n - p\), where \(n\) is the number of observations and \(p\) is the number of estimated parameters, including the intercept.
\(n = 9, p = 3\), giving \(6\) degrees of freedom.
At \(\alpha = 0.05\), the null hypothesis is rejected if \(|t_0| > t_{\alpha/2, n - p}\)
qt(0.025, 6, lower.tail = FALSE)
## [1] 2.446912
\[\begin{aligned} |(t_0)_{\hat{\beta}_1}| &= 0.494 \ngtr 2.446912 \\ |(t_0)_{\hat{\beta}_2}| &= 3.945 > 2.446912 \end{aligned}\]
For \(\beta_1\), \(|t_0|\) does not exceed the critical value, so there is not enough evidence to reject the null hypothesis. Therefore, the brightness variable is not significant to the regression model at \(\alpha = 0.05\).
For \(\beta_2\), \(|t_0|\) exceeds the critical value, so there is enough evidence to reject the null hypothesis. Therefore, the contrast variable is significant to the regression model.
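Both decisions can be reproduced from the reported \(t\)-values alone. A minimal sketch, using only numbers from the summary and illustrative variable names:

```r
# t-values read from the model summary
t0    <- c(brightness = 0.494, contrast = -3.945)
df    <- 9 - 3     # n - p
alpha <- 0.05

# Two-sided critical value t_{alpha/2, n - p}
t.crit <- qt(alpha / 2, df, lower.tail = FALSE)

# TRUE where the null hypothesis beta_j = 0 is rejected
reject <- abs(t0) > t.crit
reject

# Matching two-sided P-values, comparable to the Pr(>|t|) column
p.values <- 2 * pt(abs(t0), df, lower.tail = FALSE)
p.values
```

Comparing `p.values` with \(\alpha\) leads to the same conclusions as comparing \(|t_0|\) with the critical value.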