Problem

An article in Optical Engineering11 Read the article by Operating Curve Extraction of a Correlator’s Filter (⊕Letendre, Bergeron, and Arsenault 2004Letendre, Mathias, Alain Bergeron, and Henri H Arsenault. 2004. “Operating Curve Extraction of a Correlator’s Filter.” Optical Engineering 43 (11): 2775–79.) reported on the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data follow:

Table 1.1 Useful Range Data

Useful Range (ng) \(y_i\)	Brightness % (\(x_i1\))	Contrast %(\(x_{i2}\))
96	54	56
50	61	80
50	65	70
112	100	50
96	100	65
80	100	80
155	50	25
144	57	35
255	54	26

Matrix of Scatter plots for the Useful Range Data in Table 1.1

## Loading required package: scatterplot3d

Questions

Fit a multiple linear regression model to these data.
Estimate \(σ^2\).
Compute the standard errors of the regression coefficients.
Predict the useful range when brightness = 80 and contrast = 75.
Test for significance of regression using α=0.05. What is the P-value for this test?
Construct a t-test on each regression coefficient. What conclusions can you draw about the variables in this model? Use α=0.05.

Answer to Question 1:

Question: Fit a multiple linear regression model to these data.

We will fit the linear regression model model:

\(y=\beta_0\ + \beta_1x_1 +\beta_2x_2 + \epsilon\)

where \(y\) = useful range of gray levels, \(x_1\) = brightness percentage and \(x_2\) = contrast percentage.

R Computation

Optical.data <- data.frame(
  Useful_Range = c(96,50,50,112,96,80,155,144,255),
  Brightness = c(54,61,65,100,100,100,50,57,54),
  Contrast = c(56,80,70,50,65,80,25,35,26)
)
Eyi_<- sum(Optical.data$Useful_Range)
Exi_1<- sum(Optical.data$Brightness) 
Exi_2<- sum(Optical.data$Contrast)
Exi_1_xi_2<- sum(Optical.data$Brightness*Optical.data$Contrast)
Exi_1Squared<- sum(Optical.data$Brightness^2)
Exi_2Squared<- sum(Optical.data$Contrast^2)
Exi_1_yi_<- sum(Optical.data$Brightness*Optical.data$Useful_Range) 
Exi_2_yi_<- sum(Optical.data$Contrast*Optical.data$Useful_Range)

## [1] 1038

## [1] 641

## [1] 487

## [1] 36603

## [1] 49527

## [1] 30087

## [1] 70012

## [1] 46661

Manual Computation

Extending the table using the values of Brightness percentage, contrast percentage and useful range of gray levels:

Table 1.2 Extended Version of Table 1.1 Useful Range Data
Observation Number	Useful Range (ng) \(y_i\)	Brightness % (\(x_i1\))	Contrast %(\(x_{i2}\))	\(x_{i1}x_{i2}\)	\({x_{i1}}^2\)	\({x_{i2}}^2\)	\(x_{i1}y_i\)	\(x_{i2}y_i\)
1	96	54	56	3024	2916	3136	5184	5376
2	50	61	80	4880	3721	6400	3050	4000
3	50	65	70	4550	4225	4900	3250	3500
4	112	100	50	5000	10000	2500	11200	5600
5	96	100	65	6500	10000	4225	9600	6240
6	80	100	80	8000	10000	6400	8000	6400
7	155	50	25	1250	2500	625	7750	3875
8	144	57	35	1995	3249	1225	8208	5040
9	255	54	26	1404	2916	676	13770	6630

From the data in the Table 1.2, we could calculate that:

\(n=9\)
\(\sum_{i=1}^{9}y_{i} = 1038\)
\(\sum_{i=1}^{9}x_{i1} = 641\)
\(\sum_{i=1}^{9}x_{i2} = 487\)
\(\sum_{i=1}^{9}x_{i1}x_{i2} = 36603\)
\(\sum_{i=1}^{9}x_{i1}^2 = 49527\)
\(\sum_{i=1}^{9}x_{i2}^2 = 30087\)
\(\sum_{i=1}^{9}x_{i1}y_i = 70012\)
\(\sum_{i=1}^{9}x_{i2}y_i = 46661\)

For the model \(Y=\beta_0\ + \beta_1x_1 +\beta_2x_2 + \epsilon\), the least square normal equations on the book of22 Equations on Chap 12.1(⊕Montgomery and Runger 2010Montgomery, Douglas C, and George C Runger. 2010. Applied Statistics and Probability for Engineers. John Wiley & Sons.) are:

\(n{\hat{\beta}}_0 + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}= \sum_{i=1}^{n}y_{i}\)

\({\hat{\beta}}_0\sum_{i=1}^{n}x_{i1} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}^2+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i1}x_{i2}= \sum_{i=1}^{n}x_{i1}y_{i}\)

\({\hat{\beta}}_0\sum_{i=1}^{n}x_{i2} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}x_{i2}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}^2 = \sum_{i=1}^{n}x_{i2}y_{i}\)

Inserting the computed summations into the normal equations, we obtain:

\(9{\hat{\beta}}_0 + 643{\hat{\beta}}_1+ 487{\hat{\beta}_2}= 1038\)

\({641\hat{\beta}}_0 + 49527{\hat{\beta}}_1 + 36603{\hat{\beta}}_2 = 70012\)

\({487\hat{\beta}}_0 + 36603{\hat{\beta}}_1 + {30087\hat{\beta}}_2 = 46661\)

The solution to this set of equations is:

\({\hat{\beta}}_0=238.5569\)
\({\hat{\beta}}_1=0.3339\)
\({\hat{\beta}}_2=-2.7167\)

DOUBLE CHECKING WITH R

model <- lm (Useful_Range ~ Brightness + Contrast, data=Optical.data)
model

## 
## Call:
## lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
## 
## Coefficients:
## (Intercept)   Brightness     Contrast  
##    238.5569       0.3339      -2.7167

Therefore, the fitted equation is

\(\hat{y}=238.5569\ +\ 0.3339x_1-2.7167x_2\)

Practical Interpretation: The fitted equation above can be used in predicting the userful range of gray level, given that the value for the regressor variables, which are the brightness percentage \((x_1)\) and contrast percentage, are known.

# Answer to Question 2:

Estimate \(σ^2\).

The mean squared error of the useful range data points from the fitted linear model is represented by \(σ^2\). This unbiased estimator, \(σ^2\), is given by the equation:

\(\sigma^2=\frac{\sum_{i=1}^{n}{e_i}^2}{n-p}\)

where the \(\sum_{i=1}^{n}{e_i}^2\) or the sum of squares of the residual error can also be represented as \(SS_E\)

\(\sigma^2=\frac{SS_E}{n-p}\)

Note that n is the number of observations and p is the number of regressors plus one. Thus, in this problem, n = 9 and p=2+1=3.

The residual error \((e_i)\) is the difference of the observed value \((y)\) of the useful range and the fitted value \((\hat{y_i})\) of it from the fitted linear equation:

\(\hat{y}=238.5569\ +\ 0.3339x_1-2.7167x_2\)

Manual Computation

Table 1.3 displays all 9 fitted values of useful range \((\hat{y_i})\) and the corresponding residuals \(e\) and its squares \(e^2\).

Table: Table 1.3 Observations, Fitted Values and Residuals for Useful Range Data

| Observation Number| \(y_i\)| \(\hat{y_i}\)| \(e_i=y_i-\hat{y_i}\)| \(e_i^2\)| |——————:|—–:|———–:|——————-:|———-:| | 1| 96| 104.4523| -8.4523| 71.44138| | 2| 50| 41.5888| 8.4112| 70.74829| | 3| 50| 70.0914| -20.0914| 403.66435| | 4| 112| 136.1190| -24.1190| 581.72616| | 5| 96| 95.3614| 0.6386| 0.40781| | 6| 80| 54.6109| 25.3891| 644.60640| | 7| 155| 187.3344| -32.3340| 1045.48776| | 8| 144| 162.5047| -18.5047| 342.42392| | 9| 255| 185.9533| 69.0467| 4767.44678|

From the data in the Table 1.3, we could calculate for the \(SS_E\) or the Sum of the Squares of the Residuals \(e^2\):

* \(\sum_{i=1}^{9}e_{i}^2 = 7927.952843\)

We know that n = 9 and p = 3. Substituting the known values:

\(\sigma^2=\frac{SS_E}{n-p}\)
\(\sigma^2=\frac{7927.952843}{9-3}\)
\(\sigma^2=1321.325414\)

* \(\sigma^2\approx1321.3\)

Double Check with R

r lm2 <- lm(Useful_Range~Brightness+Contrast,data=Optical.data) anova(lm2)

## Analysis of Variance Table ## ## Response: Useful_Range ##            Df  Sum Sq Mean Sq F value   Pr(>F) ## Brightness  1  3960.3  3960.3  2.9973 0.134119 ## Contrast    1 20558.1 20558.1 15.5593 0.007585 ** ## Residuals   6  7927.6  1321.3 ## --- ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Practical Interpretation: This implies that the average of the squared difference between each predicted values and the actual values for the useful range is \(\sigma^2\approx1321.3\).

Answer to Question 3:

Question: Compute the standard errors of the regression coefficients.

“The standard deviation of an estimate is called the standard error. The standard error of the coefficient measures how the regression model precisely estimates the coefficient’s unknown value.”33 Read What is the standard error of the coefficient? (⊕Minitab, n.d.Minitab. n.d. “What Is the Standard Error of the Coefficient?” https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/regression/supporting-topics/regression-models/what-is-the-standard-error-of-the-coefficient/.)

This means that the larger the standard error of the regression coefficients, the less fit the regression model is to our observed values.

MATRIX-APPROACH COMPUTATION

In matrix-approach computation, the variances of the regression coefficients is given by the diagonal elements of the matrix \(\sigma^2\left(X'X\right)^{-1}\)

Our matrix \(X\) is given by:

library(knitr)
x.data <- c(1,54,56,1,61,80,1,65,70,1,100,50,1,100,
            65,1,100,80,1,50,25,1,57,35,1,54,26
            ) 
x <- matrix(x.data,nrow=9,ncol=3,byrow=TRUE) #Matrix of X-Values
x

\(\left[\begin{matrix}1&54&56\\1&61&80\\1&65&70\\1&100&50\\1&100&65\\1&100&80\\1&50&25\\1&57&35\\1&54&26\\\end{matrix}\right]\)

Meanwhile, the transposed version of it \(X'\) is given by:

xp <- t(x)  #Transpose Matrix of X-Values
xp

\(\left[\begin{matrix}1&1&1&1&1&1&1&1&1\\54&61&65&100&100&100&50&57&54\\96&50&50&112&96&80&115&144&255\\\end{matrix}\right]\)

Multiplying these two matrices, \(X'X\):

xxp <- (xp %*% x) #Product of Original and Transposed Matrix
xxp

\(\left[\begin{matrix}1&1&1&1&1&1&1&1&1\\54&61&65&100&100&100&50&57&54\\96&50&50&112&96&80&115&144&255\\\end{matrix}\right]\left[\begin{matrix}1&54&56\\1&61&80\\1&65&70\\1&100&50\\1&100&65\\1&100&80\\1&50&25\\1&57&35\\1&54&26\\\end{matrix}\right]=\left[\begin{matrix}9&641&487\\641&49527&36603\\487&36603&30087\\\end{matrix}\right]\)

Solving the inverse of the product,\(\left(X'X\right)^{-1}\)

xxp_I <- solve(xxp) #Inverse of Two Matrices' Product
xxp_I

\(\left[\begin{matrix}9&641&487\\641&49527&36603\\487&36603&30087\\\end{matrix}\right]^{-1}=\left[\begin{matrix}1.54821553\ &-0.0150363856\ &-0.006767180\\-0.01503639&0.0003461619&-0.000177746\\-0.00676718&-0.0001777460&0.000359014\\\end{matrix}\right]\)

Finally, multiply the derived mean square error from question 2, \(σ^2=1321.3\), to the inverse of the product of the two matrices \(\left(X'X\right)^{-1}\):

MSE <- 1321.325414
C <- (xxp_I *MSE )

\(\left[\begin{matrix}1.54821553\ &-0.0150363856\ &-0.006767180\\-0.01503639&0.0003461619&-0.000177746\\-0.00676718&-0.0001777460&0.000359014\\\end{matrix}\right]\times1321.3=\left[\begin{matrix}2045.696527\ &-19.8679585\ &-8.9416467\\-19.867959&0.4573925&-0.2348603\\-8.941647&-0.2348603&0.4743743\\\end{matrix}\right]\)

From the matrix \(\sigma^2\left(X'X\right)^{-1}\), knowing that the variances of the regression coefficients are given by the diagonal elements of it means that:

\(\left[\begin{matrix}2045.696527\ &-19.8679585\ &-8.9416467\\-19.867959&0.4573925&-0.2348603\\-8.941647&-0.2348603&0.4743743\\\end{matrix}\right]\)

The Standard error for the intercept of the regression model is \(S_E({\hat{B}}_0)=\sigma^2C_{00}=\sqrt{2045.696527}=45.22937861\)
The Standard error for the Brightness Percentage, one of the two regressors of the model \(\beta_1\) is \(S_E({\hat{B}}_1)=\sigma^2C_{11}=\sqrt{0.4573925}=0.676307992\)
The Standard error for the Contrast Percentage, one of the two regressors of the modelis \(S_E({\hat{B}}_1)=\sigma^2C_{11}=\sqrt{0.4573925}=0.6887483575\)

R COMPUTATION

multiple.regression <- lm (Useful_Range ~ Brightness + 
                          Contrast, data=Optical.data)
summary(multiple.regression)


Call:
lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.334 -20.090  -8.451   8.413  69.047 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 238.5569    45.2285   5.274  0.00188 **
Brightness    0.3339     0.6763   0.494  0.63904   
Contrast     -2.7167     0.6887  -3.945  0.00759 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 36.35 on 6 degrees of freedom
Multiple R-squared:  0.7557,    Adjusted R-squared:  0.6742 
F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459

Answer to Question 4:

Question: Predict the useful range when brightness = 80 and contrast = 75 .

Multiple Linear Regression Analysis

An implementation in R Markdown

Ma. Angelika C. Regoso, Angelica T. Masa

2021-07-29