Module 4 Pair Quiz

Question 1

An article in the Journal of Sound and Vibration [“Measurement of Noise-Evoked Blood Pressure by Means of Averaging Method: Relation between Blood Pressure Rise and PSL” (1991, Vol. 151(3), pp. 383-394)] described a study investigating the relationship between noise exposure and hypertension. The following data are representative of those reported in the article.

##      x y
## 1   60 1
## 2   63 0
## 3   65 1
## 4   70 2
## 5   70 5
## 6   70 1
## 7   80 4
## 8   90 6
## 9   80 2
## 10  80 3
## 11  85 5
## 12  89 4
## 13  90 6
## 14  90 8
## 15  90 4
## 16  90 5
## 17  94 7
## 18 100 9
## 19 100 7
## 20 100 6

A. Draw a scatter diagram of \(y\) (blood pressure rise in millimeters of mercury) versus \(x\) (sound pressure level in decibels). Does a simple linear regression model seem reasonable in this situation?

The points in the scatter plot rise roughly along a straight line as sound pressure level increases, so a simple linear regression model seems reasonable.
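The code in this question uses two numeric vectors, x and y, whose creation is not shown. A minimal sketch of entering them in base R, assuming those names and the row order of the table above:

```r
# Sound pressure level (dB), in the row order of the data table
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)

# Blood pressure rise (mmHg), matched row by row to x
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)

# Sanity check: 20 paired observations
stopifnot(length(x) == 20, length(y) == 20)
```

With x and y defined this way, the plot() and lm() calls in this question can be run as written.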

B. Fit the simple linear regression model using least squares. Find an estimate of \(\sigma^2\).

plot(x, y, pch = 16, cex = 1, col = "red",
     main = "Scatter Plot of Blood Pressure Rise and Sound Pressure Level",
     xlab = "Sound Pressure Level (dB)", ylab = "Blood Pressure Rise (mmHg)")

model <- lm(y ~ x)

model

## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##    -10.1315       0.1743

abline(model)

The abline() call overlays the fitted regression line on the scatter plot. The lm() function fits the linear regression model by least squares. The fitted model is:

\[\begin{aligned} \hat{y} = -10.1315 + 0.1743x \end{aligned}\]
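The coefficients reported by lm() can be checked against the closed-form least squares formulas. The sketch below is only a cross-check; the names Sxx, Sxy, b0, and b1 are illustrative and do not appear in the original code.

```r
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
n <- length(x)

# Corrected sums of squares and cross products
Sxx <- sum(x^2) - sum(x)^2 / n
Sxy <- sum(x * y) - sum(x) * sum(y) / n

# Least squares estimates of the slope and the intercept
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)
round(c(intercept = b0, slope = b1), 4)

# Should agree with lm() up to floating-point error
all.equal(unname(coef(lm(y ~ x))), c(b0, b1))
```

Rounded to four decimals, these estimates match the intercept and slope printed by lm().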

An estimate of \(\sigma^2\) is given by:

\[\begin{aligned} \hat{\sigma}^2 = \frac{SS_E}{n - 2} \end{aligned}\]

where \(SS_E\) is the error sum of squares and \(n = 20\) is the number of observations.

For a model fitted with lm(), the function deviance() returns the error sum of squares \(SS_E\).

deviance(model)

## [1] 31.26647

\[\begin{aligned} \hat{\sigma}^2 = \frac{31.26647}{20 - 2} = 1.737026 \end{aligned}\]
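The same estimate can be computed entirely in R rather than by hand. This is a sketch that assumes the data are entered as shown; the names sse and sigma2_hat are illustrative.

```r
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
model <- lm(y ~ x)

sse <- deviance(model)                     # error sum of squares SS_E
sigma2_hat <- sse / df.residual(model)     # SS_E / (n - 2)
sigma2_hat

# Equivalent: square the residual standard error from the summary
all.equal(sigma2_hat, summary(model)$sigma^2)
```

Using df.residual() avoids hard-coding \(n - 2\) and gives the same denominator of 18.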

C. Find the predicted mean rise in blood pressure level associated with a sound pressure level of \(85\) decibels.

This is obtained simply by substituting \(x = 85\) in the linear regression model.

\[\begin{aligned} \hat{y} = -10.1315 + 0.1743(85) = 4.684 \end{aligned}\]

With a sound pressure level of \(85\) dB, the predicted mean rise in blood pressure is \(4.684\) mmHg.
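The substitution can also be done with predict(), which uses the unrounded coefficients, so its result may differ slightly in the third decimal place from the hand calculation. A minimal sketch, assuming the same x and y vectors (the name pred is illustrative):

```r
x <- c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
       85, 89, 90, 90, 90, 90, 94, 100, 100, 100)
y <- c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3,
       5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
model <- lm(y ~ x)

# The column name in newdata must match the regressor name in the formula
pred <- predict(model, newdata = data.frame(x = 85))
pred
```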

Question 2

An article in Optical Engineering [“Operating Curve Extraction of a Correlator’s Filter” (2004, Vol. 43, pp. 2775-2779)] reported on the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data follow:

##   usefulrange brightness contrast
## 1          96         54       56
## 2          50         61       80
## 3          50         65       70
## 4         112        100       50
## 5          96        100       65
## 6          80        100       80
## 7         155         50       25
## 8         144         57       35
## 9         255         54       26

A. Fit a multiple linear regression model to these data.

The data are visualized in a 3D scatter plot together with the fitted multiple regression model, and the model summary is shown below.
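The code that produced the summary is not shown. A minimal sketch that should reproduce it, assuming the data frame is typed in by hand under the names correlator.data and multiple.regression that appear elsewhere in this question (the 3D plot is omitted here):

```r
# Data from the table above
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)

# Multiple linear regression of useful range on brightness and contrast
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)
summary(multiple.regression)
```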

## 
## Call:
## lm(formula = usefulrange ~ brightness + contrast, data = correlator.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## brightness    0.3339     0.6763   0.494  0.63904   
## contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459

The summary of the fitted model gives the estimated regression coefficient for each regressor variable, along with the \(t\)-values used to test the significance of each coefficient individually. It also gives the \(F\)-statistic, which tests whether the response variable is linearly related to at least one of the regressor variables.

With two regressor variables, the multiple linear regression model is:

\[y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\] where:

\(y\) = Useful range (ng)
\(\beta_0\) = intercept
\(\beta_1\) = regression coefficient for brightness
\(\beta_2\) = regression coefficient for contrast
\(X_1\) = Brightness (%)
\(X_2\) = Contrast (%)
\(\epsilon\) = random error term

From the summary of the fitted regression model, the fitted equation is:

\[ \hat{y} = 238.56 + 0.3339 X_1 - 2.7167 X_2 \]

B. Estimate \(\sigma^2\).

An estimate of \(\sigma^2\) is obtained by squaring the residual standard error \(\hat{\sigma}\) reported in the model summary. From the summary, \(\hat{\sigma} = 36.35\). Squaring the unrounded value gives the estimate \(\hat{\sigma}^2\):

summary(multiple.regression)$sigma^2

## [1] 1321.273
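Equivalently, the estimate is the error sum of squares divided by the residual degrees of freedom, \(n - p = 9 - 3 = 6\). A sketch of that cross-check, assuming the data are entered by hand as below (sse and sigma2_hat are illustrative names):

```r
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

sse <- deviance(multiple.regression)                  # SS_E
sigma2_hat <- sse / df.residual(multiple.regression)  # SS_E / (n - p)
sigma2_hat

# Should match the squared residual standard error
all.equal(sigma2_hat, summary(multiple.regression)$sigma^2)
```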

C. Compute the standard errors of the regression coefficients.

As seen from the summary of the model, the individual standard errors for each regression coefficient are:

\[ SE(\hat{\beta}_0)= 45.2285 \\ SE(\hat{\beta}_1)= 0.6763 \\ SE(\hat{\beta}_2)= 0.6887\]
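The standard errors can also be extracted programmatically instead of being read off the printed summary. A minimal sketch, assuming the data are entered by hand as below; se_summary and se_vcov are illustrative names:

```r
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

# Standard errors as stored in the coefficient table of the summary
se_summary <- summary(multiple.regression)$coefficients[, "Std. Error"]

# Same values from the estimated covariance matrix of the coefficients
se_vcov <- sqrt(diag(vcov(multiple.regression)))

round(se_summary, 4)
all.equal(se_summary, se_vcov)
```

The second form shows where the values come from: each standard error is the square root of a diagonal element of the estimated covariance matrix of the coefficient estimates.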

D. Predict the useful range when brightness= 80 and contrast= 75.

Substituting \(X_1 = 80\) and \(X_2 = 75\) into the fitted model:

\[ \hat{y} = 238.56 + 0.3339 X_1 - 2.7167 X_2 \\ = 238.56 + 0.3339(80) - 2.7167(75) \\ \hat{y} = 61.5195 \]
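The same prediction can be obtained with predict(). Because it uses the unrounded coefficients, the result may differ from the hand calculation in the second or third decimal place. A sketch, assuming the data are entered by hand as below (pred is an illustrative name):

```r
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

# Column names in newdata must match the regressor names in the formula
pred <- predict(multiple.regression,
                newdata = data.frame(brightness = 80, contrast = 75))
pred
```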

E. Test for significance of regression using \(\alpha=0.05\). What is the \(P\)-value for this test?

The test for significance of regression is used to determine whether a linear relationship exists between the response variable \(y\) and at least one of the regressor variables \(x_1, x_2\).

The hypotheses are:

Null Hypothesis \(H_0: \beta_1 = \beta_2 = 0\)

Alternative Hypothesis \(H_1: \beta_j \neq 0\) for at least one \(j\)

Rejection of the null hypothesis implies that at least one of the regressor variables contributes significantly to the regression model.

From the model summary, the \(P\)-value for this test is \(0.01459\). The null hypothesis is rejected when the \(P\)-value is less than the significance level. With the given significance level \(\alpha=0.05\),

\[0.01459 < 0.05\]

Since the \(P\)-value is less than \(\alpha\), the null hypothesis is rejected. Therefore, at least one of the regressor variables contributes significantly to the model.
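The \(P\)-value is printed by summary() but is not stored as a single element of the summary object. One way to recompute it is from the stored \(F\)-statistic and its degrees of freedom. A sketch, assuming the data are entered by hand as below (fs and p_value are illustrative names):

```r
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

# Named vector with elements value, numdf, dendf
fs <- summary(multiple.regression)$fstatistic

# Upper-tail probability of the F distribution with (numdf, dendf) df
p_value <- unname(pf(fs["value"], fs["numdf"], fs["dendf"],
                     lower.tail = FALSE))
p_value

# Decision at alpha = 0.05
p_value < 0.05
```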

F. Construct a t-test on each regression coefficient. What conclusions can you draw about the variables in this model? Use \(\alpha = 0.05\).

The \(t\)-test is used to check the significance of each regression coefficient in the multiple linear regression model. The hypotheses for this test are:

Null Hypothesis \(H_0: \beta_j = 0\)

The tested regression coefficient is not significant to the regression model.

Alternative Hypothesis \(H_1: \beta_j \neq 0\)

The tested regression coefficient is significant to the regression model.

From the summary of the regression model, the \(t\)-value for each regression coefficient is given:

For \(\beta_1\) or “brightness,” \(t_0 = 0.494\)

For \(\beta_2\) or “contrast,” \(t_0 = -3.945\)

The \(t\)-statistic has \(n - p\) degrees of freedom, where \(n\) is the number of observations and \(p\) is the number of estimated parameters, including the intercept.

\(n = 9, p = 3\), giving \(6\) degrees of freedom.

The null hypothesis is rejected if \(|t_0| > t_{\alpha/2, n - p}\)

qt(0.025, 6, lower.tail = FALSE)

## [1] 2.446912

\[ |(t_0)_{\hat{\beta}_1}| = 0.494 \ngtr 2.446912 \\ |(t_0)_{\hat{\beta}_2}| = 3.945 > 2.446912 \]

For \(\beta_1\), there is not enough evidence to reject the null hypothesis. Therefore, the brightness variable is not significant to the regression model when contrast is already included.

For \(\beta_2\), there is enough evidence to reject the null hypothesis. Therefore, the contrast variable is significant to the regression model.
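The comparison of each \(|t_0|\) with the critical value can be scripted as well. A sketch, assuming the data are entered by hand as below; t0, t_crit, and reject are illustrative names:

```r
correlator.data <- data.frame(
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
multiple.regression <- lm(usefulrange ~ brightness + contrast,
                          data = correlator.data)

alpha <- 0.05

# t statistics for the two regressors from the coefficient table
coefs <- summary(multiple.regression)$coefficients
t0 <- coefs[c("brightness", "contrast"), "t value"]

# Two-sided critical value with n - p residual degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df.residual(multiple.regression))

# TRUE means H0: beta_j = 0 is rejected at level alpha
reject <- abs(t0) > t_crit
reject
```

The same decisions follow from comparing the Pr(>|t|) column of the summary with \(\alpha\).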
