Multiple Linear Regression

R Markdown Implementation


Question 2

An article in Optical Engineering reported on the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data follow:

Useful Range (ng)   Brightness (%)   Contrast (%)
       96                 54               56
       50                 61               80
       50                 65               70
      112                100               50
       96                100               65
       80                100               80
      155                 50               25
      144                 57               35
      255                 54               26
Table 1. Useful range of gray levels for varying brightness and contrast.


A. Fit a multiple linear regression to these data.
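The code chunks themselves are not echoed in this output, so the following is a minimal sketch of how the data frame and model summary shown below could be produced (the names `df` and `lm2` match those used later in this document):

# Enter the data from Table 1 and fit the multiple linear regression model
df <- data.frame(
  Range      = c(96, 50, 50, 112, 96, 80, 155, 144, 255),
  Brightness = c(54, 61, 65, 100, 100, 100, 50, 57, 54),
  Contrast   = c(56, 80, 70, 50, 65, 80, 25, 35, 26)
)
df                                        # print the data
lm2 <- lm(Range ~ Brightness + Contrast, data = df)
summary(lm2)                              # coefficients, standard errors, and F test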

##   Range Brightness Contrast
## 1    96         54       56
## 2    50         61       80
## 3    50         65       70
## 4   112        100       50
## 5    96        100       65
## 6    80        100       80
## 7   155         50       25
## 8   144         57       35
## 9   255         54       26

## 
## Call:
## lm(formula = Range ~ Brightness + Contrast, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## Brightness    0.3339     0.6763   0.494  0.63904   
## Contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459

Graph 1.a Scatter plots of values from Table 1.
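The plots are not reproduced in this text; a minimal sketch that would generate comparable scatter plots of the useful range against each regressor (exact styling is assumed) is:

# Scatter plots of the useful range against brightness and contrast
par(mfrow = c(1, 2))
plot(df$Brightness, df$Range, xlab = "Brightness (%)", ylab = "Useful Range (ng)")
plot(df$Contrast, df$Range, xlab = "Contrast (%)", ylab = "Useful Range (ng)")
par(mfrow = c(1, 1))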

We will fit a multiple linear regression model to these data of the form: \[y_i=\beta_0 +\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i\]

From the R output, the estimated coefficients are: \[ \begin{aligned} \hat{\beta}_0 & =238.5569\\ \hat{\beta}_1 & =0.3339\\ \hat{\beta}_2 & =-2.7167\\ \end{aligned} \] Substituting these into the multiple linear regression model gives the fitted equation: \[ \begin{aligned} \hat{y}&=\hat{\beta}_0 +\hat{\beta}_1x_{i1}+\hat{\beta}_2x_{i2}\\ \hat{y}&=238.5569 +0.3339x_{i1}-2.7167x_{i2}\\ \end{aligned} \]

Thus, the fitted multiple linear regression model is \(\hat{y}=238.5569 +0.3339x_{i1}-2.7167x_{i2}\).
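These estimates can also be read directly from the fitted model object as a quick check:

# Coefficient estimates beta0_hat, beta1_hat, beta2_hat
round(coef(lm2), 4)   # (Intercept) 238.5569, Brightness 0.3339, Contrast -2.7167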

Practical Interpretation: This equation can be used to predict the useful range of gray levels for pairs of values of the regressor variables Brightness (\(x_{i1}\)) and Contrast (\(x_{i2}\)). It summarizes the same relationships that are visible in the scatter plots of Graph 1.a.


B. Estimate \(\sigma^2\).

lm2 <- lm(Range~Brightness+Contrast,data=df)
anova(lm2)
## Analysis of Variance Table
## 
## Response: Range
##            Df  Sum Sq Mean Sq F value   Pr(>F)   
## Brightness  1  3960.3  3960.3  2.9973 0.134119   
## Contrast    1 20558.1 20558.1 15.5593 0.007585 **
## Residuals   6  7927.6  1321.3                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The residual mean square is the estimate of the error variance: \(\hat{\sigma}^2 = MS_E = SS_E/(n-p) = 7927.6/6 \approx 1321.3\).
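As a quick check in R, using the fitted model `lm2` from above:

# sigma^2 estimate: residual mean square MS_E = SS_E / (n - p)
sum(residuals(lm2)^2) / df.residual(lm2)   # 7927.6 / 6, about 1321.3
sigma(lm2)^2                               # same value from the residual standard error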


C. Compute the standard errors of the regression coefficients.

We can use R again to compute the standard errors of the regression coefficients.

## 
## Call:
## lm(formula = Range ~ Brightness + Contrast, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## Brightness    0.3339     0.6763   0.494  0.63904   
## Contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459
The Std. Error column gives the standard errors of the regression coefficients. Thus, the standard errors are \(se(\hat{\beta}_0)=45.2285\), \(se(\hat{\beta}_1)=0.6763\), and \(se(\hat{\beta}_2)=0.6887\).
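The same values can be extracted programmatically; the standard errors are the square roots of the diagonal entries of the estimated covariance matrix of \(\hat{\beta}\):

# Standard errors of the regression coefficients
sqrt(diag(vcov(lm2)))                         # 45.2285  0.6763  0.6887
summary(lm2)$coefficients[, "Std. Error"]     # same values, from the summary table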


D. Predict the useful range when brightness = 80 and contrast = 75.

Recall that the fitted multiple regression equation is \(\hat{y}=238.5569 +0.3339x_{i1}-2.7167x_{i2}\). Substituting the given brightness and contrast values into this equation yields the predicted useful range. The given values are: \[ \begin{aligned} \hat{y}&=238.5569 +0.3339x_{i1}-2.7167x_{i2}\\ \text{brightness } (x_{i1})&=80\\ \text{contrast } (x_{i2})&=75\\ \end{aligned} \] The computation is as follows: \[ \begin{aligned} \hat{y}&=238.5569 +0.3339x_{i1}-2.7167x_{i2}\\ \hat{y}&=238.5569 +0.3339(80)-2.7167(75)\\ \hat{y}&=61.5164\\ \end{aligned} \] The predicted useful range is 61.5164 when the brightness is 80 and the contrast is 75.
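The same prediction can be obtained with `predict()` on the fitted model (small rounding differences aside):

# Predicted useful range at Brightness = 80 and Contrast = 75
predict(lm2, newdata = data.frame(Brightness = 80, Contrast = 75))   # about 61.5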


E. Test for significance of regression using \(\alpha=0.05\). What is the P-value for this test?

The test for significance of regression is a test done to determine whether a linear relationship exists between the response variable \(y\) and the subset of regressor variables \(x_1, x_2, ..., x_k\). The appropriate hypotheses are given as: \[H_0:\beta_1=\beta_2=...=\beta_k=0\]
\(H_1: \beta_j \neq 0\) for at least one \(j\).

The test statistic for \(H_0:\beta_1=\beta_2=...=\beta_k=0\) is given by: \[F_0=\frac{SS_R/k}{SS_E/(n-p)}=\frac{MS_R}{MS_E}\]

We reject the null hypothesis if the computed value of the test statistic exceeds \(f_{\alpha,k,n-p}\).

Using R, we can easily determine the F-statistic, its degrees of freedom, and the associated p-value.

## 
## Call:
## lm(formula = Range ~ Brightness + Contrast, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## Brightness    0.3339     0.6763   0.494  0.63904   
## Contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459
From the output, the following can now be determined: \(f_0 = 9.278\) on 2 and 6 degrees of freedom, with a p-value of 0.01459.

Conclusions: Because \(f_0 > f_{0.05,2,6}\) (\(9.278 > 5.1433\)) and the p-value (\(0.01459\)) is smaller than the significance level \(\alpha=0.05\), we reject the null hypothesis and conclude that the useful range is linearly related to brightness, contrast, or both. In other words, at least one population slope coefficient is nonzero, so there is a significant linear relationship between the response and the regressor variables.
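The critical value \(f_{0.05,2,6}\) and the p-value quoted above can be verified in R:

# Critical value and p-value of the overall F test (k = 2, n - p = 6)
qf(0.95, df1 = 2, df2 = 6)                        # about 5.1433, the critical value
pf(9.278, df1 = 2, df2 = 6, lower.tail = FALSE)   # about 0.0146, the reported p-value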

Practical Interpretation: Rejecting \(H_0\) does not necessarily imply that the relationship we found is an appropriate model for predicting the useful range as a function of brightness and contrast. Further tests of model adequacy are required before we can be comfortable using this model in practice.


F. Construct a t-test on each regression coefficient. What conclusions can you draw about the variables in this model? Use \(\alpha=0.05\).

The tests on the individual regression coefficients are useful in determining the potential value of each of the regressor variables in the regression model.

BRIGHTNESS COEFFICIENT TEST, \(\beta_1\)

Formulate the hypotheses: \[ \begin{aligned} H_0&:\beta_1=0\\ H_1&:\beta_1\neq0 \end{aligned} \] The test statistic is given by the formula: \[t_0 = \frac{\hat{\beta}_j-\beta_{j0}}{se(\hat{\beta}_j)}\] The null hypothesis will be rejected if \(|t_0|>t_{\alpha/2,n-p}\).

To solve for the test statistic, the following values are already known from the earlier parts:

\[ \begin{aligned} se(\hat{\beta}_1)&= 0.6763\\ \hat{\beta}_1&=0.3339\\ \end{aligned} \] and, since the hypothesized value is \(\beta_{10}=0\), substituting the known values gives: \[ \begin{aligned} t_0 &= \frac{\hat{\beta}_1-\beta_{10}}{se(\hat{\beta}_1)}\\ t_0 &= \frac{0.3339-0}{0.6763}\\ t_0 &= 0.4937 \end{aligned} \]

The test statistic is \(t_0=0.4937\), the critical value is \(t_{0.025,6}=2.4469\), and the p-value is 0.63904.

Again, we can check these values using R:

## 
## Call:
## lm(formula = Range ~ Brightness + Contrast, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## Brightness    0.3339     0.6763   0.494  0.63904   
## Contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459
From the R output, we can confirm the values obtained by manual computation: for Brightness, \(t_0 = 0.494\) and the p-value is 0.63904.

After this, we can now proceed to the conclusion.

Conclusion: Because \(|t_0|<t_{0.025,6}\) (\(0.4937 < 2.4469\)) and the p-value is greater than the significance level (0.63904 > 0.05), we fail to reject the null hypothesis \(H_0:\beta_1=0\) and conclude that the variable \(x_1\) (brightness) does not contribute significantly to the model.
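The brightness test can be reproduced from the coefficient table of the fitted model:

# t test for the Brightness coefficient (H0: beta1 = 0)
cf <- coef(summary(lm2))                                              # coefficient table
t0 <- cf["Brightness", "Estimate"] / cf["Brightness", "Std. Error"]   # about 0.494
qt(0.975, df = 6)                                                     # critical value 2.4469
2 * pt(abs(t0), df = 6, lower.tail = FALSE)                           # two-sided p-value 0.639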

Practical Interpretation: We have to remember that this test is measuring the marginal or partial contribution of \(x_1\) (brightness) given that \(x_2\) (contrast) is present in the model.

CONTRAST COEFFICIENT TEST, \(\beta_2\)

Formulate the hypotheses: \[ \begin{aligned} H_0&:\beta_2=0\\ H_1&:\beta_2\neq0 \end{aligned} \] The test statistic is given by the formula: \[t_0 = \frac{\hat{\beta}_j-\beta_{j0}}{se(\hat{\beta}_j)}\] The null hypothesis will be rejected if \(|t_0|>t_{\alpha/2,n-p}\).

To solve for the test statistic, the following values are already known from the earlier parts:

\[ \begin{aligned} se(\hat{\beta}_2)&= 0.6887\\ \hat{\beta}_2&=-2.7167\\ \end{aligned} \] and, since the hypothesized value is \(\beta_{20}=0\), substituting the known values gives: \[ \begin{aligned} t_0 &= \frac{\hat{\beta}_2-\beta_{20}}{se(\hat{\beta}_2)}\\ t_0 &= \frac{-2.7167-0}{0.6887}\\ t_0 &= -3.9447\\ |t_0| &= 3.9447\\ \end{aligned} \]

The test statistic is \(t_0=-3.9447\) (so \(|t_0|=3.9447\)), the critical value is \(t_{0.025,6}=2.4469\), and the p-value is 0.00759.

Again, we can check these values using R:

## 
## Call:
## lm(formula = Range ~ Brightness + Contrast, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.334 -20.090  -8.451   8.413  69.047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 238.5569    45.2285   5.274  0.00188 **
## Brightness    0.3339     0.6763   0.494  0.63904   
## Contrast     -2.7167     0.6887  -3.945  0.00759 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.35 on 6 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.6742 
## F-statistic: 9.278 on 2 and 6 DF,  p-value: 0.01459
From the R output, we can confirm the values obtained by manual computation: for Contrast, \(t_0 = -3.945\) and the p-value is 0.00759.

After this, we can now proceed to the conclusion.

Conclusion: Because \(|t_0|>t_{0.025,6}\) (\(3.9447 > 2.4469\)) and the p-value is less than the significance level (0.00759 < 0.05), we reject the null hypothesis \(H_0:\beta_2=0\) and conclude that the variable \(x_2\) (contrast) contributes significantly to the model.
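Likewise, the contrast test can be reproduced from the coefficient table:

# t test for the Contrast coefficient (H0: beta2 = 0)
cf <- coef(summary(lm2))                                          # coefficient table
t0 <- cf["Contrast", "Estimate"] / cf["Contrast", "Std. Error"]   # about -3.945
abs(t0) > qt(0.975, df = 6)                                       # TRUE, so reject H0
2 * pt(abs(t0), df = 6, lower.tail = FALSE)                       # two-sided p-value 0.0076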

Practical Interpretation: We have to remember that this test is measuring the marginal or partial contribution of \(x_2\) (contrast) given that \(x_1\) (brightness) is present in the model.

