Question 1:

An article in the Journal of Sound and Vibration [“Measurement of Noise-Evoked Blood Pressure by Means of Averaging Method: Relation between Blood Pressure Rise and PSL” (1991, Vol. 151(3), pp. 383-394)] described a study investigating the relationship between noise exposure and hypertension. The following data are representative of those reported in the article.

##    psl bpr
## 1   60   1
## 2   63   0
## 3   65   1
## 4   70   2
## 5   70   5
## 6   70   1
## 7   80   4
## 8   90   6
## 9   80   2
## 10  80   3
## 11  85   5
## 12  89   4
## 13  90   6
## 14  90   8
## 15  90   4
## 16  90   5
## 17  94   7
## 18 100   9
## 19 100   7
## 20 100   6
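
For reference, the data can be entered into R as a data frame. Below is a minimal sketch; the name BPR_PSL matches the one used in the plotting code later, and attach() is assumed here so that later chunks can refer to psl and bpr directly.

# sound pressure level (dB) and blood pressure rise (mmHg)
BPR_PSL = data.frame(
  psl = c(60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100),
  bpr = c(1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6)
)
attach(BPR_PSL)  # makes psl and bpr available by name in the chunks below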

A. Draw a scatter diagram of y (blood pressure rise in millimeters of mercury) versus x (sound pressure level in decibels). Does a simple linear regression model seem reasonable in this situation?

A simple linear regression model seems reasonable here: the question of interest is how blood pressure rise (mmHg) changes with sound pressure level (dB), and the scatter diagram below shows a roughly linear, increasing relationship between the two variables.

We can use the ggplot2 library to draw the scatter diagram of blood pressure rise (mmHg) against sound pressure level (dB). We do so by loading ggplot2, mapping the variables with ggplot(data = BPR_PSL, aes(x = psl, y = bpr)), and adding geom_point().

library(ggplot2)
scatterBPR_PSL = ggplot(data = BPR_PSL, aes(x = psl, y = bpr)) +
  geom_point(size = 2) +
  xlab("PSL (dB)") +
  ylab("Blood Pressure Rise (mmHg)") +
  ggtitle("Measurement of Noise-Evoked Blood Pressure")
scatterBPR_PSL

B. Fit the simple linear regression model using least squares. Find an estimate of \(\sigma^2\).

Step 1:

To fit a simple linear regression model, we can use the equation of the fitted or estimated regression line:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1x \]

where \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the least squares estimates of the intercept and slope:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \]

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]

where \(S_{xx} = \sum_{i = 1}^{n} x^2_i - \frac{(\sum_{i = 1}^{n} x_i)^2}{n}\), \(S_{xy} = \sum_{i = 1}^{n} x_iy_i - \frac{(\sum_{i = 1}^{n} x_i)(\sum_{i = 1}^{n} y_i)}{n}\), \(\bar{y} = (1/n) \sum_{i = 1}^{n} y_i\), and \(\bar{x} = (1/n) \sum_{i = 1}^{n} x_i\)

Step 2:

Let us compute the required values for the regression line using R:

The sample size:

\(n = 20\) ;

\[\sum_{i = 1}^{20} x_i = 1656\]

sumX = sum(psl); sumX
## [1] 1656

\[\sum_{i = 1}^{20} y_i = 86\]

sumY = sum(bpr); sumY
## [1] 86

\[\bar{x} = 82.8\]

barX = mean(psl); barX
## [1] 82.8

\[\bar{y} = 4.3\]

barY = mean(bpr); barY
## [1] 4.3

\[\sum_{i = 1}^{20} x_i^2 = 140176\]

sumXsqr = sum(psl^2); sumXsqr
## [1] 140176

\[\sum_{i = 1}^{20} y_i^2 = 494\]

sumYsqr =sum(bpr^2); sumYsqr
## [1] 494

\[\sum_{i = 1}^{20} x_i y_i = 7654\]

sumXY = sum(bpr*psl); sumXY
## [1] 7654

\[S_{xx} = \sum_{i = 1}^{20} x^2_i - \frac{(\sum_{i = 1}^{20} x_i)^2}{n} = 140176 - \frac{1656^2}{20}= 3059.2\]

Sxx = sumXsqr- ((sumX^2)/20); Sxx
## [1] 3059.2

\[S_{xy} = \sum_{i = 1}^{n} x_iy_i - \frac{(\sum_{i = 1}^{n} x_i)(\sum_{i = 1}^{n} y_i)}{n} = 7654 - \frac{(1656)(86)}{20} = 533.2 \]

Sxy = sumXY - ((sumX*sumY)/20); Sxy
## [1] 533.2
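
It will also be convenient to have \(S_{yy}\), computed in the same way as \(S_{xx}\); it is needed for the estimate of \(\sigma^2\) in part B:

\[S_{yy} = \sum_{i = 1}^{20} y^2_i - \frac{(\sum_{i = 1}^{20} y_i)^2}{n} = 494 - \frac{86^2}{20} = 124.2\]

Syy = sumYsqr - ((sumY^2)/20); Syy
## [1] 124.2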

Step 3:

Therefore, the least squares estimates of the slope and intercept are:

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{533.2}{3059.2} = 0.1742939 \]

betaHat1 = Sxy/Sxx; betaHat1
## [1] 0.1742939

and

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 4.3 - (0.1742939)(82.8) = -10.13154 \]

betaHat0 = barY - (betaHat1*barX); betaHat0
## [1] -10.13154

Step 4:

The fitted simple linear regression model (with the coefficients reported to three decimal places) is \[ \hat{y} = -10.132 + 0.174x \] We can also check that our coefficients are correct by fitting the model with R's lm() function and extracting the estimates with coef():

coeftest = lm(bpr~psl)
coef(coeftest)
## (Intercept)         psl 
## -10.1315377   0.1742939

Here is the scatter plot with the fitted linear regression line overlaid:
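
The chunk that produced the figure is not echoed above; a minimal sketch that reproduces it, reusing the scatterBPR_PSL object built earlier, is:

scatterBPR_PSL + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'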

Step 5: Let us now find the estimate of \(\sigma^2\):

To find the estimate of \(\sigma^2\), we need to look for the value of \(SS_E\) where

\[SS_E = \sum_{i = 1}^{n} (y_i - \hat{y}_i)^2 = S_{yy} - \hat{\beta}_1 S_{xy}\]

where \(S_{yy} = 124.2\) was computed in Step 2. We can also obtain the error sum of squares directly from the residuals of the fitted model object coeftest (note that the residuals belong to the lm fit, not to the ggplot object):

SSe = sum(coeftest$residuals^2); SSe
## [1] 31.26647

\[SS_E = 124.2 - (0.1742939)(533.2) \approx 31.266 \]

Finally, we can solve for the estimate of \(\sigma^2\) with

\[ \hat{\sigma}^2 = \frac{SS_E}{n-2} \]

Since \(SS_E \approx 31.266\) and \(n = 20\), the estimate is \(\hat{\sigma}^2 = 31.266/18 \approx 1.737\).
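
The same value follows in one line of R from the SSe computed above:

sigmaSqrHat = SSe/(20 - 2); sigmaSqrHat   # estimate of sigma^2 = SSE/(n - 2)
## [1] 1.737026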

Interpretation:

The estimate \(\hat{\sigma}^2 \approx 1.74\) (in mmHg\(^2\)) measures the variability of the blood pressure rise observations about the fitted regression line. Because it is greater than zero, the observations do not all lie exactly on the line; there is residual scatter about the fit, which is consistent with the scatter plot.

C. Find the predicted mean rise in blood pressure level associated with a sound pressure level of 85 decibels.

Since we have already established the equation of the regression line, we can easily obtain the predicted mean rise in blood pressure (\(y\)) associated with a sound pressure level of 85 decibels.

\[ \hat{y} = -10.132 + 0.174x = -10.132 + 0.174(85) \approx 4.658 \]

Using the unrounded coefficients, \(\hat{y} = -10.1315377 + 0.1742939(85) \approx 4.68\) mmHg.
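
The same prediction can be obtained directly from the fitted lm object with predict():

predict(coeftest, newdata = data.frame(psl = 85))
##        1 
## 4.683444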


Question 2:

An article in Optical Engineering [“Operating Curve Extraction of a Correlator’s Filter” (2004, Vol. 43, pp. 2775-2779)] reported on the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data follow:

##   brightness contrast usefulrange
## 1         54       56          96
## 2         61       80          50
## 3         65       70          50
## 4        100       50         112
## 5        100       65          96
## 6        100       80          80
## 7         50       25         155
## 8         57       35         144
## 9         54       26         255
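
For the R checks below, the data can be entered as a data frame. This is a minimal sketch; the name optical is an assumption, since the original does not give the data frame's name.

optical = data.frame(
  brightness  = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  contrast    = c(56, 80, 70, 50, 65, 80, 25, 35, 26),
  usefulrange = c(96, 50, 50, 112, 96, 80, 155, 144, 255)
)
attach(optical)  # makes usefulrange, brightness and contrast available by name below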

A. Fit a multiple linear regression model to these data.

Multiple Linear Regression Model:

\[ {Y} = {\beta}_0 + {\beta}_1x_1 + {\beta}_2x_2 + \epsilon \]

where \(x_1 = \text{brightness}\), \(x_2 = \text{contrast}\), and \(Y = \text{useful range}\).

Using the given data, we calculate:

\[n=9, \quad \sum_{i = 1}^{9} y_i = 1038 \]
\[\sum_{i=1}^9 x_{i1} = 641, \quad \sum_{i=1}^9 x_{i2} = 487\]
\[\sum_{i=1}^9 x^2_{i1} = 49,527, \quad \sum_{i=1}^9 x^2_{i2} = 30,087\]
\[\sum_{i=1}^9 x_{i1}x_{i2} = 36,603, \quad \sum_{i=1}^9 x_{i1}y_i = 70,012, \quad \sum_{i=1}^9 x_{i2}y_i = 46,661\]

For this model \({Y} = {\beta}_0 + {\beta}_1x_1 + {\beta}_2x_2 + \epsilon\), the least squares normal equations (Equations 12-10) are

\[n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^9 x_{i1} + \hat{\beta}_2\sum_{i=1}^9 x_{i2} = \sum_{i=1}^9 y_i\]
\[\hat{\beta}_0 \sum_{i=1}^9 x_{i1} + \hat{\beta}_1\sum_{i=1}^9 x^2_{i1} + \hat{\beta}_2\sum_{i=1}^9 x_{i1}x_{i2} = \sum_{i=1}^9 x_{i1}y_i\]
\[\hat{\beta}_0 \sum_{i=1}^9 x_{i2} + \hat{\beta}_1\sum_{i=1}^9 x_{i1}x_{i2} + \hat{\beta}_2\sum_{i=1}^9 x^2_{i2} = \sum_{i=1}^9 x_{i2}y_i\]

Inserting the computed summations into the normal equations, we obtain:

\[9\hat{\beta}_0 + 641 \hat{\beta}_1 + 487\hat{\beta}_2 = 1038\]
\[641\hat{\beta}_0 + 49,527\hat{\beta}_1 + 36,603\hat{\beta}_2 = 70,012\]
\[487\hat{\beta}_0 + 36,603\hat{\beta}_1 + 30,087\hat{\beta}_2 = 46,661\]

The solution to this set of equations is

\[\hat{\beta}_0 = 238.556911, \quad \hat{\beta}_1 = 0.333912, \quad \hat{\beta}_2 = -2.716735\]

Therefore, the fitted regression equation is

\[\hat{y} = 238.556911 + 0.333912x_1 - 2.716735x_2\]

Here is the matrix of scatter plots:

pairs(~usefulrange + brightness + contrast)
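
As a check on the hand calculation, the same coefficients can be obtained with R's lm() function. This is a sketch using the optical data frame defined after the data listing; the printed coefficients are the ones used above.

fitQ2 = lm(usefulrange ~ brightness + contrast, data = optical)
coef(fitQ2)
## (Intercept)  brightness    contrast 
##  238.556911    0.333912   -2.716735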

B. Estimate \(\sigma^2\).

For the fitted multiple regression model, \(\sigma^2\) is estimated by the error (residual) mean square:

\[\hat{\sigma}^{2} = \frac{SS_E}{n - p} = \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n - p}\]

where \(p = 3\) is the number of estimated coefficients. Here

\[SS_E = \sum_{i=1}^{9} y_i^2 - \hat{\beta}_0\sum_{i=1}^{9} y_i - \hat{\beta}_1\sum_{i=1}^{9} x_{i1}y_i - \hat{\beta}_2\sum_{i=1}^{9} x_{i2}y_i \approx 152,162 - 144,234.4 = 7927.6\]

so

\[\hat{\sigma}^{2} = \frac{SS_E}{n - p} = \frac{7927.6}{9 - 3} \approx 1321.3\]
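
The same estimate can be read off the lm fit; a sketch using the fitQ2 object from part A:

SSeQ2 = sum(residuals(fitQ2)^2)   # error (residual) sum of squares
sigmaSqrHatQ2 = SSeQ2/(9 - 3)     # MSE = SSE/(n - p) estimates sigma^2
sigmaSqrHatQ2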

C. Compute the standard errors of the regression coefficients.

The standard error of each coefficient is \(se(\hat{\beta}_j) = \sqrt{\hat{\sigma}^{2} C_{jj}}\), where \(C_{jj}\) is the \(j\)th diagonal element of \((\mathbf{X}'\mathbf{X})^{-1}\). Using \(\hat{\sigma}^{2} \approx 1321.3\), this gives approximately

\[se(\hat{\beta}_0) \approx 45.2, \quad se(\hat{\beta}_1) \approx 0.676, \quad se(\hat{\beta}_2) \approx 0.689\]

D. Predict the useful range when brightness = 80 and contrast = 75.

\[\hat{y} = 238.556911 + 0.333912(80) - 2.716735(75) \approx 61.5\]

E. Test for significance of regression using \(\alpha = 0.05\). What is the P-value for this test?

\[F_0 = \frac{SS_R/k}{SS_E/(n-p)} = \frac{MS_R}{MS_E} = \frac{24,518.4/2}{7927.6/6} \approx 9.28\]

where \(SS_R = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - (\sum_{i=1}^{9} y_i)^2/n \approx 144,234.4 - 119,716 = 24,518.4\) and \(k = 2\). Since \(F_0 = 9.28 > F_{0.05,2,6} = 5.14\), we reject \(H_0: \beta_1 = \beta_2 = 0\) and conclude that the regression is significant. The P-value is approximately 0.015.

F. Construct a t-test on each regression coefficient. What conclusions can you draw about the variables in this model? Use \(\alpha = 0.05\).

\[T_0 = \frac{\hat{\beta}_j - \beta_{j,0}}{se(\hat{\beta}_j)}\]

For brightness, \(t_0 = 0.333912/0.676 \approx 0.49\); for contrast, \(t_0 = -2.716735/0.689 \approx -3.94\). Comparing with \(t_{0.025,6} = 2.447\), contrast contributes significantly to the model at \(\alpha = 0.05\), while brightness does not.
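
All of these quantities can also be read off the R output for the fitted model; a sketch using the fitQ2 object (the printed values should agree with the hand calculations up to rounding):

summary(fitQ2)   # coefficient estimates, standard errors, t statistics and p-values (parts C and F),
                 # plus the overall F statistic and its p-value (part E)
predict(fitQ2, newdata = data.frame(brightness = 80, contrast = 75))   # predicted useful range (part D)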