An article in Optical Engineering1 Read the article by Operating Curve Extraction of a Correlator’s Filter (Letendre, Bergeron, and Arsenault 2004) reported on the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data follow:
Table 1.1 Useful Range Data
| Useful Range (ng) \(y_i\) | Brightness % (\(x_i1\)) | Contrast %(\(x_{i2}\)) |
|---|---|---|
| 96 | 54 | 56 |
| 50 | 61 | 80 |
| 50 | 65 | 70 |
| 112 | 100 | 50 |
| 96 | 100 | 65 |
| 80 | 100 | 80 |
| 155 | 50 | 25 |
| 144 | 57 | 35 |
| 255 | 54 | 26 |
Matrix of Scatter plots for the Useful Range Data in Table 1.1
## Loading required package: scatterplot3d
Interactive Three Dimensional Scatter Plot of the Useful Range Data
Task: Fit a multiple linear regression model to these data.
We will fit the linear regression model model:
\(y=\beta_0\ + \beta_1x_1 +\beta_2x_2 + \epsilon\)
where \(y\) = useful range of gray levels, \(x_1\) = brightness percentage and \(x_2\) = contrast percentage.
R Computation
Optical.data <- data.frame(
Useful_Range = c(96,50,50,112,96,80,155,144,255),
Brightness = c(54,61,65,100,100,100,50,57,54),
Contrast = c(56,80,70,50,65,80,25,35,26)
)
Eyi_<- sum(Optical.data$Useful_Range)
Exi_1<- sum(Optical.data$Brightness)
Exi_2<- sum(Optical.data$Contrast)
Exi_1_xi_2<- sum(Optical.data$Brightness*Optical.data$Contrast)
Exi_1Squared<- sum(Optical.data$Brightness^2)
Exi_2Squared<- sum(Optical.data$Contrast^2)
Exi_1_yi_<- sum(Optical.data$Brightness*Optical.data$Useful_Range)
Exi_2_yi_<- sum(Optical.data$Contrast*Optical.data$Useful_Range)
## [1] 1038
## [1] 641
## [1] 487
## [1] 36603
## [1] 49527
## [1] 30087
## [1] 70012
## [1] 46661
Manual Computation
Extending the table using the values of Brightness percentage, contrast percentage and useful range of gray levels:
| Observation Number | Useful Range (ng) \(y_i\) | Brightness % (\(x_i1\)) | Contrast %(\(x_{i2}\)) | \(x_{i1}x_{i2}\) | \({x_{i1}}^2\) | \({x_{i2}}^2\) | \(x_{i1}y_i\) | \(x_{i2}y_i\) |
|---|---|---|---|---|---|---|---|---|
| 1 | 96 | 54 | 56 | 3024 | 2916 | 3136 | 5184 | 5376 |
| 2 | 50 | 61 | 80 | 4880 | 3721 | 6400 | 3050 | 4000 |
| 3 | 50 | 65 | 70 | 4550 | 4225 | 4900 | 3250 | 3500 |
| 4 | 112 | 100 | 50 | 5000 | 10000 | 2500 | 11200 | 5600 |
| 5 | 96 | 100 | 65 | 6500 | 10000 | 4225 | 9600 | 6240 |
| 6 | 80 | 100 | 80 | 8000 | 10000 | 6400 | 8000 | 6400 |
| 7 | 155 | 50 | 25 | 1250 | 2500 | 625 | 7750 | 3875 |
| 8 | 144 | 57 | 35 | 1995 | 3249 | 1225 | 8208 | 5040 |
| 9 | 255 | 54 | 26 | 1404 | 2916 | 676 | 13770 | 6630 |
From the data in the Table 1.2, we could calculate that:
For the model \(Y=\beta_0\ + \beta_1x_1 +\beta_2x_2 + \epsilon\), the least square normal equations on the book of2 Equations on Chap 12.1(Montgomery and Runger 2010) are:
\(n{\hat{\beta}}_0 + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}= \sum_{i=1}^{n}y_{i}\)
\({\hat{\beta}}_0\sum_{i=1}^{n}x_{i1} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}^2+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i1}x_{i2}= \sum_{i=1}^{n}x_{i1}y_{i}\)
\({\hat{\beta}}_0\sum_{i=1}^{n}x_{i2} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}x_{i2}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}^2 = \sum_{i=1}^{n}x_{i2}y_{i}\)
Inserting the computed summations into the normal equations, we obtain:
\(9{\hat{\beta}}_0 + 643{\hat{\beta}}_1+ 487{\hat{\beta}_2}= 1038\)
\({641\hat{\beta}}_0 + 49527{\hat{\beta}}_1 + 36603{\hat{\beta}}_2 = 70012\)
\({487\hat{\beta}}_0 + 36603{\hat{\beta}}_1 + {30087\hat{\beta}}_2 = 46661\)
The solution to this set of equations is:
\({\hat{\beta}}_0=238.5569\)
\({\hat{\beta}}_1=0.3339\)
\({\hat{\beta}}_2=-2.7167\)
DOUBLE CHECKING WITH R
model <- lm (Useful_Range ~ Brightness + Contrast, data=Optical.data)
model
##
## Call:
## lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
##
## Coefficients:
## (Intercept) Brightness Contrast
## 238.5569 0.3339 -2.7167
\(\hat{y}=238.5569\ +\ 0.3339x_1-2.7167x_2\)
Practical Interpretation: The fitted equation above can be used in predicting the useful range of gray level, given that the value for the regressorS, which are the brightness percentage \((x_1)\) and contrast percentage \((x_2)\), are known.
Task:Estimate \(σ^2\).
The mean squared error of the useful range data points from the fitted linear model is represented by \(σ^2\). This unbiased estimator, \(σ^2\), is given by the equation:
\(\sigma^2=\frac{\sum_{i=1}^{n}{e_i}^2}{n-p}\)
where the \(\sum_{i=1}^{n}{e_i}^2\) or the sum of squares of the residual error can also be represented as \(SS_E\)
\(\sigma^2=\frac{SS_E}{n-p}\)
Note that n is the number of observations and p is the number of regressors plus one. Thus, in this problem, n = 9 and p=2+1=3.
The residual error \((e_i)\) is the difference of the observed value \((y_i)\) of the useful range and the fitted value \((\hat{y_i})\) of it from the fitted linear equation:
\(\hat{y}=238.5569\ +\ 0.3339x_1-2.7167x_2\)
Manual Computation
Table 1.3 displays all 9 fitted values of useful range \((\hat{y_i})\) and the corresponding residuals \(e\) and its squares \(e^2\).
Table 1.3 Observations, Fitted Values and Residuals for Useful Range Data
| Observation Number | \(y_i\) | \(\hat{y_i}\) | \(e_i=y_i-\hat{y_i}\) | \(e_i^2\) |
|---|---|---|---|---|
| 1 | 96 | 104.4523 | -8.4523 | 71.44138 |
| 2 | 50 | 41.5888 | 8.4112 | 70.74829 |
| 3 | 50 | 70.0914 | -20.0914 | 403.66435 |
| 4 | 112 | 136.1190 | -24.1190 | 581.72616 |
| 5 | 96 | 95.3614 | 0.6386 | 0.40781 |
| 6 | 80 | 54.6109 | 25.3891 | 644.60640 |
| 7 | 155 | 187.3344 | -32.3340 | 1045.48776 |
| 8 | 144 | 162.5047 | -18.5047 | 342.42392 |
| 9 | 255 | 185.9533 | 69.0467 | 4767.44678 |
From the data in the Table 1.3, we could calculate for the \(SS_E\) or the Sum of the Squares of the Residuals \(e_i^2\):
We know that n = 9 and p = 3. Substituting the known values:
\(\sigma^2=\frac{SS_E}{n-p}\)
\(\sigma^2=\frac{7927.952843}{9-3}\)
\(\sigma^2=1321.325414\)
Double Check with R
lm2 <- lm(Useful_Range~Brightness+Contrast,data=Optical.data)
anova(lm2)
## Analysis of Variance Table
##
## Response: Useful_Range
## Df Sum Sq Mean Sq F value Pr(>F)
## Brightness 1 3960.3 3960.3 2.9973 0.134119
## Contrast 1 20558.1 20558.1 15.5593 0.007585 **
## Residuals 6 7927.6 1321.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Practical Interpretation: This implies that the average of the squared difference between each predicted values and the actual values for the useful range is \(\sigma^2\approx1321.3\).
Task: Compute the standard errors of the regression coefficients.
“The standard deviation of an estimate is called the standard error. The standard error of the coefficient measures how the regression model precisely estimates the coefficient’s unknown value.”3 Read What is the standard error of the coefficient? (Minitab, n.d.)
This means that the larger the standard error of the regression coefficients, the less precise is the estimation of the coefficient’s value.
MATRIX-APPROACH COMPUTATION
In matrix-approach computation, the variances of the regression coefficients is given by the diagonal elements of the matrix \(\sigma^2\left(X'X\right)^{-1}\)
Our matrix \(X\) is given by:
library(knitr)
x.data <- c(1,54,56,1,61,80,1,65,70,1,100,50,1,100,
65,1,100,80,1,50,25,1,57,35,1,54,26
)
x <- matrix(x.data,nrow=9,ncol=3,byrow=TRUE) #Matrix of X-Values
x
\(\left[\begin{matrix}1&54&56\\1&61&80\\1&65&70\\1&100&50\\1&100&65\\1&100&80\\1&50&25\\1&57&35\\1&54&26\\\end{matrix}\right]\)
Meanwhile, the transposed version of it \(X'\) is given by:
xp <- t(x) #Transpose Matrix of X-Values
xp
\(\left[\begin{matrix}1&1&1&1&1&1&1&1&1\\54&61&65&100&100&100&50&57&54\\96&50&50&112&96&80&115&144&255\\\end{matrix}\right]\)
Multiplying these two matrices, \(X'X\):
xxp <- (xp %*% x) #Product of Original and Transposed Matrix
xxp
\(\left[\begin{matrix}1&1&1&1&1&1&1&1&1\\54&61&65&100&100&100&50&57&54\\96&50&50&112&96&80&115&144&255\\\end{matrix}\right]\left[\begin{matrix}1&54&56\\1&61&80\\1&65&70\\1&100&50\\1&100&65\\1&100&80\\1&50&25\\1&57&35\\1&54&26\\\end{matrix}\right]=\left[\begin{matrix}9&641&487\\641&49527&36603\\487&36603&30087\\\end{matrix}\right]\)
Solving the inverse of the product,\(\left(X'X\right)^{-1}\)
xxp_I <- solve(xxp) #Inverse of Two Matrices' Product
xxp_I
\(\left[\begin{matrix}9&641&487\\641&49527&36603\\487&36603&30087\\\end{matrix}\right]^{-1}=\left[\begin{matrix}1.54821553\ &-0.0150363856\ &-0.006767180\\-0.01503639&0.0003461619&-0.000177746\\-0.00676718&-0.0001777460&0.000359014\\\end{matrix}\right]\)
Finally, multiply the derived mean square error from question 2, \(σ^2=1321.3\), to the inverse of the product of the two matrices \(\left(X'X\right)^{-1}\):
MSE <- 1321.325414
C <- (xxp_I *MSE )
\(\left[\begin{matrix}1.54821553\ &-0.0150363856\ &-0.006767180\\-0.01503639&0.0003461619&-0.000177746\\-0.00676718&-0.0001777460&0.000359014\\\end{matrix}\right]\times1321.3=\left[\begin{matrix}2045.696527\ &-19.8679585\ &-8.9416467\\-19.867959&0.4573925&-0.2348603\\-8.941647&-0.2348603&0.4743743\\\end{matrix}\right]\)
From the matrix \(\sigma^2\left(X'X\right)^{-1}\), knowing that the variances of the regression coefficients are given by the diagonal elements of it means that:
\(\left[\begin{matrix}2045.696527\ &-19.8679585\ &-8.9416467\\-19.867959&0.4573925&-0.2348603\\-8.941647&-0.2348603&0.4743743\\\end{matrix}\right]\)
R COMPUTATION
multiple.regression <- lm (Useful_Range ~ Brightness +
Contrast, data=Optical.data)
summary(multiple.regression)
Call:
lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
Residuals:
Min 1Q Median 3Q Max
-32.334 -20.090 -8.451 8.413 69.047
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 238.5569 45.2285 5.274 0.00188 **
Brightness 0.3339 0.6763 0.494 0.63904
Contrast -2.7167 0.6887 -3.945 0.00759 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36.35 on 6 degrees of freedom
Multiple R-squared: 0.7557, Adjusted R-squared: 0.6742
F-statistic: 9.278 on 2 and 6 DF, p-value: 0.01459
The computed values for the standard errors on the regression coefficients- intercept, brightness, and contrast - through the matrix-approach are equal to the computed values of R using the function multiple.regression.
Task: Predict the useful range when brightness = 80 and contrast = 75
From what we had solved in part 1, the fitted regression model of the Useful Range Data in Table 1.1 is given by the equation:
\(\hat{y}=238.5569\ +\ 0.3339x_1-2.7167x_2\)
This regression model can be used to predict values of useful range \((y)\) for various percentages of Brightness \((x_1)\) and Contrast \((x_2)\)
If the observation has a brightness \((x_1)\) of 80 and contrast \((x_2)\) of 75:
This implies that the corresponding fitted value of the useful range is \(75.0999\) ng when the brightness percentage \((x_1)\) is \(80\) and the contrast percentage \((x_2)\) is \(75\).
Task: Test for significance of regression using α=0.05. What is the P-value for this test.
The appropriate hypotheses are:
The test statistic is:
Manual Computation
From part 2, we know that the mean square error \(MS_E\) or the \(σ^2\) is \(1321.3\). We also know that \(k=2\) because there are two regressors. However, we don’t know \(SS_R\) or the sum of squares of regression. But we know that \(SS_R=SS_T-SS_E\). And we could calculate this \(SS_T\) through the equation:
\({\rm SS}_T=y'y-\frac{\left(\sum_{i=}^{n}y_i\right)^2}{n}\)
Range.data <- data.frame(
Useful_Range = c(96,50,50,112,96,80,155,144,255)
)
Eyi_<- sum(Range.data$Useful_Range)
(Eyi_)^2
[1] 1077444
Based on this R-computation,\(\left(\sum_{i=}^{9}y_i\right)^2=1077444\). We also know that \(n=9\)
To find the \(SS_T\), we would just need the product of the matrix \(y'y\).
library(knitr)
y.data <- c(96,50,50,112,96,80,155,144,255
)
y <- matrix(y.data,nrow=9,ncol=1,byrow=TRUE) #Matrix of y-Values
yp <- t(y) #Transpose Matrix of X-Values
ypy <- (yp %*% y) #Product of Original and Transposed Matrix
ypy
[,1]
[1,] 152162
As computed by R, \(y'y\) is given by:
\(\left[\begin{matrix}96&50&50&112&96&80&115&144&255\\\end{matrix}\right]\times\left[\begin{matrix}96\\50\\50\\112\\96\\80\\115\\144\\255\\\end{matrix}\right]=152162\)
Putting all these values together to find the \(SS_T\),:
Now that we have the value for \(SS_T\), we can now compute for the value of \(SS_R\), given that \(SS_E=7927.952843\)
Computing for the test statistic:
\(f_0=\frac{24518.04157/2}{{7927.952843}/(9-3)}= 9.277999535\) \(f_0=9.278\)
R Computation
Call:
lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
Residuals:
Min 1Q Median 3Q Max
-32.334 -20.090 -8.451 8.413 69.047
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 238.5569 45.2285 5.274 0.00188 **
Brightness 0.3339 0.6763 0.494 0.63904
Contrast -2.7167 0.6887 -3.945 0.00759 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36.35 on 6 degrees of freedom
Multiple R-squared: 0.7557, Adjusted R-squared: 0.6742
F-statistic: 9.278 on 2 and 6 DF, p-value: 0.01459
Through manual and R-computation, we had verified that the F-statistic is \(f_0=9.278\) on \(2\) and \(6\) degrees of freedom.
#find P-value
pf(9.278, 2, 6, lower.tail = FALSE)
[1] 0.0145875
#find F-critical Value
qf(0.05, 2, 6, lower.tail = FALSE)
[1] 5.143253
Since the p-value is relatively smaller than the significance level, the \(\alpha=0.05\), and \(f_0>f_{0.05,2,6}\), we reject the null hypothesis. Thus, we could then conclude that the useful range is linearly related to either the brightness percentage, the contrast percentage or both.
Practical Interpretation: Being able to reject the null hypothesis does not necessarily imply that the regression model found is an accurate function of the brightness percentage \((x_1)\) and the contrast percentage \((x_2)\). There are other tests of model adequacy needed before we became fully confident on using the regression model.
Task: Construct a t-test on each regression coefficient. What conclusions can you draw about the variables in this model? Use α=0.05.
T-test on Regression Coefficient for Brightness Percentage \(\hat\beta_1\)
The hypotheses are:
The test-statistic is:
Reject \(H_0\): \(\beta_1\) = 0 if \(|t_0|\) > \(t_{\alpha/2, n-p}\).
#critical-value
qt(p=.025, df=6, lower.tail = FALSE)
[1] 2.446912
Manual Computation
We know that the standard error on the regression coefficient of Brightness Percentage \((x_1)\) is \(se({\hat{B}}_1)=0.676307992\) from Part 3. We also know that the regression coefficient of it is \({\hat{\beta}}_1=0.3339\) from Part 1. Thus:
\(t_0=\frac{0.3339-0}{0.676307992}\)
#test-statistic
B_x <- 0.3339
B_0 <- 0.0
seBx <-0.676307992
t_0 <- {(B_x)-(B_0)}/seBx
t_0
[1] 0.49371
\(t_0=0.49371\)
\(t_0\approx0.494\)
[1] 0.639
\(p-value = 0.639\)
R Computation
##
## Call:
## lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.334 -20.090 -8.451 8.413 69.047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 238.5569 45.2285 5.274 0.00188 **
## Brightness 0.3339 0.6763 0.494 0.63904
## Contrast -2.7167 0.6887 -3.945 0.00759 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared: 0.7557, Adjusted R-squared: 0.6742
## F-statistic: 9.278 on 2 and 6 DF, p-value: 0.01459
Since \(t_0\) < \(t_{0.025,6}\), which means that the computed t-statistic did not fall in the rejection criteria, we then failed to reject the \(H_0\): \(\beta_1\) = 0. This implies that the Brightness Pecentage \((x_1)\) is not statistically significant.
P-value is also another basis to draw conclusions. The P-value for \(t_0\) = 0.494 is \(P\approx0.639\). The P-value is greater than the significance level, which is the \(\alpha=0.05\). Thus, we failed to reject the null hypothesis.
Practical Interpretation: This test is a marginal test, which means, it measures the marginal contribution of Brightness Percentage \((x_1)\), given that the Contrast Percentage \((x_2)\) is in the regression model.
T-test on Regression Coefficient for Contrast Percentage \(\hat\beta_2\)
The hypotheses are:
The test-statistic is:
Reject \(H_0\): \(\beta_2\) = 0 if \(|t_0|\) > \(t_{\alpha/2, n-p}\).
#critical-value
qt(p=.025, df=6, lower.tail = FALSE)
[1] 2.446912
Manual Computation
We know that the standard error on the regression coefficient of Brightness Percentage \((x_2)\) is \(se({\hat{B}}_2)=0.6887483575\) from Part 3. We also know that the regression coefficient of it is \({\hat{\beta}}_2=-2.7167\) from Part 1. Thus:
\(t_0=\frac{-2.7167-0}{0.6887483575}\)
#test-statistic
B_x <- -2.7167
B_0 <- 0.0
seBx <-0.6887483575
t_0 <- {(B_x)-(B_0)}/seBx
t_0
[1] -3.944401
\(t_0= -3.9444\)
\(|t_0|= 3.9444\)
#find two-tailed p-value
2*pt(q=3.9444, df=6, lower.tail=FALSE)
[1] 0.007586218
\(p-value = 0.00759\)
R Computation
##
## Call:
## lm(formula = Useful_Range ~ Brightness + Contrast, data = Optical.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.334 -20.090 -8.451 8.413 69.047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 238.5569 45.2285 5.274 0.00188 **
## Brightness 0.3339 0.6763 0.494 0.63904
## Contrast -2.7167 0.6887 -3.945 0.00759 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared: 0.7557, Adjusted R-squared: 0.6742
## F-statistic: 9.278 on 2 and 6 DF, p-value: 0.01459
Since \(|t_0|\) > \(t_{0.025,6}\), we reject the \(H_0\): \(\beta_2\) = 0. This implies that the Contrast percentage \(x_2\) is statistically significant.
P-value is also another basis to draw conclusions. The P-value for \(|t_0|\) = 3.9444 is P = 0.00759. The P-value is less than the significance level, which is the \(\alpha=0.05\). Thus, we reject the null hypothesis.
Practical Interpretation: This test is a marginal test. Meaning, it measures the marginal contribution of Contrast Percentage \((x_2)\), given that the Brightness Percentage \((x_1)\) is in the model.