Researchers at National Institutes of Standards and Technology (NIST) collected pipeline data on ultrasonic measurements of the depth of defects in the Alaska pipeline in the field. The depth of the defects were then remeasured in the laboratory. These measurements were performed in six different batches. It turns out that this batch effect is not significant and so can be ignored in the analysis that follows. The laboratory measurements are more accurate than the in-field measurements, but more time consuming and expensive. We want to develop a regression equation for correcting the in-field measurements.
## Field Lab Batch
## 1 18 20.2 1
## 2 38 56.0 1
## 3 15 12.5 1
## 4 20 21.2 1
## 5 18 15.5 1
## 6 36 39.0 1
## [1] 107 3
## [1] "Field" "Lab" "Batch"
##
## Call:
## lm(formula = Lab ~ Field, data = pipeline)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.985 -4.072 -1.431 2.504 24.334
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.96750 1.57479 -1.249 0.214
## Field 1.22297 0.04107 29.778 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.865 on 105 degrees of freedom
## Multiple R-squared: 0.8941, Adjusted R-squared: 0.8931
## F-statistic: 886.7 on 1 and 105 DF, p-value: < 2.2e-16
## [1] 0.08754773
##
## Shapiro-Wilk normality test
##
## data: residuals(pipeline_lm)
## W = 0.92697, p-value = 1.841e-05
There doesn’t appear to be any autocorrelation from repeated experiments, however the Shapiro-Wilk test supports rejecting the null hypothesis that residuals are normally distributed.
i <- order(pipeline\(Field) npipe <- pipeline[i,] ff <- gl(12,9)[-108] meanfield <- unlist(lapply(split(npipe\)Field,ff),mean)) varlab <- unlist(lapply(split(npipe$Lab,ff),var))
Suppose we guess that the the variance in the response is linked to the predictor in the following way:
\[var(Lab) = a_0 Field^{a_1}\]
Regress log(varlab) on log(meanfield) to estimate a0 and a1. (You might choose to remove the last point.) Use this to determine appropriate weights in a WLS fit of Lab on Field. Show the regression summary.
## (Intercept) log(meanfield)
## -0.3537921 1.1244202
In this case, the intercept is log(a0) and the slope of meanfield is a1.
## Generalized least squares fit by REML
## Model: Lab ~ Field
## Data: pipeline
## AIC BIC logLik
## 708.1764 716.1383 -351.0882
##
## Variance function:
## Structure: fixed weights
## Formula: ~(var_lab)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -1.494365 0.9070661 -1.64747 0.1025
## Field 1.208276 0.0348839 34.63704 0.0000
##
## Correlation:
## (Intr)
## Field -0.809
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.7814680 -0.6930709 -0.2727923 0.5313567 2.9450736
##
## Residual standard error: 1.169044
## Degrees of freedom: 107 total; 105 residual
##
## Call:
## lm(formula = sqrt(Lab) ~ Field, data = pipeline)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1688 -0.4060 -0.0514 0.2766 1.7103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.558536 0.122530 20.88 <2e-16 ***
## Field 0.100642 0.003195 31.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6119 on 105 degrees of freedom
## Multiple R-squared: 0.9043, Adjusted R-squared: 0.9034
## F-statistic: 991.9 on 1 and 105 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(Lab) ~ Field, data = pipeline)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97034 -0.13220 0.00857 0.15797 0.49898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.251322 0.052269 43.07 <2e-16 ***
## Field 0.035526 0.001363 26.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.261 on 105 degrees of freedom
## Multiple R-squared: 0.8661, Adjusted R-squared: 0.8648
## F-statistic: 679.2 on 1 and 105 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = 1/(Lab) ~ (Field), data = pipeline)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023296 -0.012976 -0.005380 0.005615 0.150379
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0892885 0.0044632 20.00 <2e-16 ***
## Field -0.0014220 0.0001164 -12.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02229 on 105 degrees of freedom
## Multiple R-squared: 0.587, Adjusted R-squared: 0.5831
## F-statistic: 149.2 on 1 and 105 DF, p-value: < 2.2e-16