2.32

The diameter of a ball bearing was measured by 12 inspectors, each using two different kinds of calipers. The results were

# Read in the data table for 2.32
dat232<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/232Table.csv")
dat232
##    Inspector Caliper1 Caliper2
## 1          1    0.265    0.264
## 2          2    0.265    0.265
## 3          3    0.266    0.264
## 4          4    0.267    0.266
## 5          5    0.267    0.267
## 6          6    0.265    0.268
## 7          7    0.267    0.264
## 8          8    0.267    0.265
## 9          9    0.265    0.265
## 10        10    0.268    0.267
## 11        11    0.268    0.268
## 12        12    0.265    0.269

(a) Is there a significant difference between the means of the population of measurements from which the two samples were selected? Use α = 0.05.

No. Each inspector measured with both calipers, so the measurements are paired by inspector and we use a paired t-test. With p-value = 0.6742 > 0.05 = α, we cannot reject the null hypothesis that the two population means are equal.

# paired t-test with default 95% confidence level
t.test(dat232[,2], dat232[,3], paired=TRUE)
## 
##  Paired t-test
## 
## data:  dat232[, 2] and dat232[, 3]
## t = 0.43179, df = 11, p-value = 0.6742
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.001024344  0.001524344
## sample estimates:
## mean of the differences 
##                 0.00025
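As a cross-check on the output above, the paired t statistic and its two-sided p-value can be reproduced by hand from the per-inspector differences. This is a minimal sketch; the vector names `caliper1` and `caliper2` are illustrative, and the values are transcribed from the table above rather than read from the CSV.

```r
# Caliper readings transcribed from the table above (inspectors 1-12)
caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265,
              0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268,
              0.264, 0.265, 0.265, 0.267, 0.268, 0.269)

d <- caliper1 - caliper2                 # per-inspector differences
n <- length(d)                           # number of pairs
t_stat <- mean(d) / (sd(d) / sqrt(n))    # paired t statistic
p_val <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value

c(t = t_stat, df = n - 1, p = p_val)
```

The values should agree with the `t`, `df` and `p-value` lines reported by `t.test` above.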

(b) Find the P-value for the test in part (a).

The p-value = 0.6742 > 0.05 = α.

(c) Construct a 95 percent confidence interval on the difference in mean diameter measurements for the two types of calipers.

95 percent confidence interval: [-0.001024344, 0.001524344]
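The interval contains 0, which is consistent with failing to reject the null hypothesis at α = 0.05. As a sketch of where the bounds come from, the interval is the mean difference plus or minus the t critical value times the standard error; the vector names below are illustrative and the values are transcribed from the table above.

```r
# Caliper readings transcribed from the table above (inspectors 1-12)
caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265,
              0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268,
              0.264, 0.265, 0.265, 0.267, 0.268, 0.269)

d <- caliper1 - caliper2            # per-inspector differences
n <- length(d)
se <- sd(d) / sqrt(n)               # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value
ci <- mean(d) + c(-1, 1) * t_crit * se
ci
```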

2.34

An article in the Journal of Strain Analysis (vol. 18, no. 2, 1983) compares several procedures for predicting the shear strength for steel plate girders. Data for nine girders in the form of the ratio of predicted to observed load for two of these procedures, the Karlsruhe and Lehigh methods, are as follows:

# Read in the data table for 2.34
dat234<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/234Table.csv")
dat234
##   Girder Karlsruhe Lehigh
## 1   S1/1     1.186  1.061
## 2   S2/1     1.151  0.992
## 3   S3/1     1.322  1.063
## 4   S4/1     1.339  1.062
## 5   S5/1     1.200  1.065
## 6   S2/1     1.402  1.178
## 7   S2/2     1.365  1.037
## 8   S2/3     1.537  1.086
## 9   S2/4     1.559  1.052

(a) Is there any evidence to support a claim that there is a difference in mean performance between the two methods? Use α = 0.05.

Yes. Both methods were applied to the same girders, so the ratios are paired by girder and we use a paired t-test. With p-value = 0.0002953 < 0.05 = α, we reject the null hypothesis of equal means.

# paired t-test with default 95% confidence level
t.test(dat234[,2], dat234[,3], paired=TRUE)
## 
##  Paired t-test
## 
## data:  dat234[, 2] and dat234[, 3]
## t = 6.0819, df = 8, p-value = 0.0002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1700423 0.3777355
## sample estimates:
## mean of the differences 
##               0.2738889

(b) What is the P-value for the test in part (a)?

The p-value = 0.0002953 < 0.05 = α.

(c) Construct a 95 percent confidence interval for the difference in mean predicted to observed load.

95 percent confidence interval: [0.1700423, 0.3777355]
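The interval lies entirely above 0, which agrees with rejecting the null hypothesis in part (a). The same quantities can be pulled out of the `t.test` result programmatically instead of being read off the printed output. The sketch below assumes a local data frame named `girders` (a hypothetical name) with values transcribed from the table above, and uses column names rather than positional indices.

```r
# Ratios transcribed from the table above
girders <- data.frame(
  Karlsruhe = c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559),
  Lehigh    = c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)
)

res <- t.test(girders$Karlsruhe, girders$Lehigh, paired = TRUE)

res$p.value                       # part (b): p-value
res$conf.int                      # part (c): 95% confidence interval
reject_null <- res$p.value < 0.05 # part (a): decision at alpha = 0.05
reject_null
```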

(d) Investigate the normality assumption for both samples.

The normal probability plots indicate that both data sets are approximately normal, but the Lehigh data contain a possible outlier (1.178) that should be investigated before deciding whether to keep it in a two-sample t-test.

# ratio of predicted to observed load for steel plate girders developed using the Karlsruhe and Lehigh methods on normal probability plots
qqnorm(dat234[,2], ylab="ratio of predicted to observed load, Sample Quantiles", xlab="Standard Normal Quantiles", main="Karlsruhe steel Normal Probability Plot", col="steelblue") 
qqline(dat234[,2], col="steelblue")

qqnorm(dat234[,3], ylab="ratio of predicted to observed load, Sample Quantiles", xlab="Standard Normal Quantiles", main="Lehigh steel Normal Probability Plot", col="firebrick2") 
qqline(dat234[,3], col="firebrick2")
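A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test (`shapiro.test` from base R's stats package) to each sample; this is an addition to the plots, not part of the original analysis, and with only nine observations per sample the test has little power, so a large p-value is weak evidence of normality. The vector names are illustrative and the values are transcribed from the table above.

```r
# Ratios transcribed from the table above
karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
sw_karlsruhe <- shapiro.test(karlsruhe)
sw_lehigh    <- shapiro.test(lehigh)

c(Karlsruhe = sw_karlsruhe$p.value, Lehigh = sw_lehigh$p.value)
```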

(e) Investigate the normality assumption for the difference in ratios for the two methods.

The differences of the ratios of the two methods are generally normally distributed.

# Calculate the paired differences (Karlsruhe - Lehigh) and put them into a normal probability plot
difdat234<-dat234[,2] - dat234[,3]
difdat234
## [1] 0.125 0.159 0.259 0.277 0.135 0.224 0.328 0.451 0.507
qqnorm(difdat234, ylab="difference of the ratios, Sample Quantiles", xlab="Standard Normal Quantiles", main="Difference between Karlsruhe and Lehigh steel Normal Probability Plot", col="forestgreen") 
qqline(difdat234, col="forestgreen")

(f) Discuss the role of the normality assumption in the paired t-test.

In a paired t-test, the assumption of normality applies to the distribution of the differences, not the distribution of the samples measured separately, so the normality of the difference fulfills the assumption of normality.
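One way to see this is that a paired t-test is the same calculation as a one-sample t-test on the differences, so only the differences enter the test. A minimal sketch, with illustrative vector names and values transcribed from the 2.34 table:

```r
# Ratios transcribed from the 2.34 table
karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

paired     <- t.test(karlsruhe, lehigh, paired = TRUE)  # paired t-test
one_sample <- t.test(karlsruhe - lehigh, mu = 0)        # one-sample test on the differences

# Both calls give the same t statistic and p-value
same_t <- isTRUE(all.equal(unname(paired$statistic), unname(one_sample$statistic)))
same_p <- isTRUE(all.equal(paired$p.value, one_sample$p.value))
c(same_t = same_t, same_p = same_p)
```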

2.29 (e,f)

Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged on to the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are measurements of photoresist thickness (in kA) for eight wafers baked at two different temperatures. Assume that all of the runs were made in random order.

# Read in the data table for 2.29
dat229<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/229Table.csv")
names(dat229)[names(dat229) == "X95.C"] <- "95°C"
names(dat229)[names(dat229) == "X100.C"] <- "100°C"
dat229
##     95°C 100°C
## 1 11.176 5.263
## 2  7.089 6.748
## 3  8.097 7.461
## 4 11.739 7.015
## 5 11.291 8.133
## 6 10.759 7.418
## 7  6.467 3.772
## 8  8.315 8.963

(e) Check the assumption of normality of the photoresist thickness.

The normal probability plots indicate that both data sets are approximately normal, but the 100°C data contain a possible outlier (3.772) that should be investigated before deciding whether to keep it in a two-sample t-test.

# photoresist thickness for wafers baked at 95°C and 100°C on normal probability plots
qqnorm(dat229[,1], ylab="photoresist thickness (in kA), Sample Quantiles", xlab="Standard Normal Quantiles", main="95°C Normal Probability Plot", col="steelblue") 
qqline(dat229[,1], col="steelblue")

qqnorm(dat229[,2], ylab="photoresist thickness (in kA), Sample Quantiles", xlab="Standard Normal Quantiles", main="100°C Normal Probability Plot", col="firebrick2") 
qqline(dat229[,2], col="firebrick2")

(f) Find the power of this test for detecting an actual difference in means of 2.5 kA.

Given:

# The standard deviations of each distribution, σ1 and σ2
sd(dat229[,1])
## [1] 2.099564
sd(dat229[,2])
## [1] 1.640427
# Pooled standard deviation (the denominator of Cohen's d), which should lie between the two sample standard deviations
sqrt((7*(sd(dat229[,1])^2) + 7*(sd(dat229[,2])^2))/14)
## [1] 1.884034
# Checking the answer with an R effectsize library function
library(effectsize)
sd_pooled(dat229[,1], dat229[,2])
## [1] 1.884034
library(pwr)
#power for 2-Sample t-test to detect difference in the means of 2.5 kA.
pwr.t.test(n=8, d=sqrt((7*(sd(dat229[,1])^2) + 7*(sd(dat229[,2])^2))/14), sig.level=0.05, power=NULL, type="two.sample")
## 
##      Two-sample t test power calculation 
## 
##               n = 8
##               d = 1.884034
##       sig.level = 0.05
##           power = 0.9381252
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

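Solution: Power = 0.94 [2 significant figures] for the call as written. Note that pwr.t.test expects d to be the standardized effect size, that is, the difference in means divided by the pooled standard deviation: d = 2.5/1.884034 ≈ 1.327. The call above passes the pooled standard deviation itself (d = 1.884034), so 0.94 overstates the power for detecting a 2.5 kA difference.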
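For comparison, here is a sketch of the power calculation with the effect expressed as a 2.5 kA difference in means relative to the pooled standard deviation. It uses `power.t.test` from base R's stats package rather than `pwr`, so that the difference (`delta`) and standard deviation (`sd`) are passed separately. The vector names are illustrative and the values are transcribed from the table above; no expected power is quoted here because it was not part of the original output.

```r
# Photoresist thickness (kA) transcribed from the table above
temp95  <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315)
temp100 <- c(5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)

n <- length(temp95)   # wafers per temperature
# Pooled standard deviation for two equal-sized groups
s_pooled <- sqrt(((n - 1) * var(temp95) + (n - 1) * var(temp100)) / (2 * n - 2))

d <- 2.5 / s_pooled   # standardized effect size for a 2.5 kA difference

# Power of a two-sided, two-sample t-test to detect a 2.5 kA difference
pw <- power.t.test(n = n, delta = 2.5, sd = s_pooled, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
c(s_pooled = s_pooled, d = d, power = pw$power)
```

The equivalent `pwr` call would pass `d = 2.5 / s_pooled` in place of the pooled standard deviation.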

2.27 (a - Using a non-parametric method)

An article in Solid State Technology, “Orthogonal Design for Process Optimization and Its Application to Plasma Etching” by G. Z. Yin and D. W. Jillie (May 1987) describes an experiment to determine the effect of the C2F6 flow rate on the uniformity of the etch on a silicon wafer used in integrated circuit manufacturing. All of the runs were made in random order. Data for two flow rates are as follows:

C2F6 Flow Uniformity Observation

# Read in the data table for 2.27
dat227<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/227Table.csv")
names(dat227)[names(dat227) == "ï...SCCM."] <- "(SCCM)"
names(dat227)[names(dat227) == "X125"] <- "125"
names(dat227)[names(dat227) == "X200"] <- "200"
dat227
##   (SCCM) 125 200
## 1      1 2.7 4.6
## 2      2 4.6 3.4
## 3      3 2.6 2.9
## 4      4 3.0 3.5
## 5      5 3.2 4.1
## 6      6 3.8 5.1
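The first column name `ï...SCCM.` is what a UTF-8 byte-order mark at the start of the file turns into on some platforms, so the rename above only matches where the mark is decoded that way. A possibly more portable sketch is to let `read.csv` strip the mark with `fileEncoding = "UTF-8-BOM"` and keep the headers as written with `check.names = FALSE`. The example writes a small temporary file to demonstrate; its header and rows are assumptions modelled on the table above, not a copy of the actual CSV.

```r
# Build a small CSV that starts with a UTF-8 byte-order mark (hypothetical content)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("(SCCM),125,200\n1,2.7,4.6\n2,4.6,3.4\n")), tmp)

# fileEncoding = "UTF-8-BOM" drops the mark; check.names = FALSE keeps headers unchanged
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM", check.names = FALSE)
names(demo)

unlink(tmp)  # clean up the temporary file
```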

(a) Does the C2F6 flow rate affect average etch uniformity? Use α = 0.05.

No. We test the null hypothesis using a Mann-Whitney U test. Because p-value = 0.1994 > 0.05 = α, we cannot reject the null hypothesis, so this data does not show that the C2F6 flow rate affects average etch uniformity.

# Run the two-sample Wilcoxon rank-sum test, a.k.a. the Mann-Whitney U test
wilcox.test(dat227[,2], dat227[,3])
## Warning in wilcox.test.default(dat227[, 2], dat227[, 3]): cannot compute exact
## p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  dat227[, 2] and dat227[, 3]
## W = 9.5, p-value = 0.1994
## alternative hypothesis: true location shift is not equal to 0
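The statistic W reported above can be reproduced from the ranks of the pooled observations: it is the rank sum of the first sample minus its smallest possible value, n1(n1 + 1)/2. The sketch below uses illustrative vector names with values transcribed from the table above, and passes `exact = FALSE` to request the normal approximation directly, which avoids the warning about ties.

```r
# Etch uniformity transcribed from the table above
flow125 <- c(2.7, 4.6, 2.6, 3.0, 3.2, 3.8)
flow200 <- c(4.6, 3.4, 2.9, 3.5, 4.1, 5.1)

r  <- rank(c(flow125, flow200))   # ranks of the pooled data; ties get midranks
n1 <- length(flow125)
W  <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2   # rank sum of sample 1 minus its minimum
W

# Same test via wilcox.test, using the normal approximation explicitly
res <- wilcox.test(flow125, flow200, exact = FALSE)
res$p.value
```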

Complete Code

Here we display the complete R code used in this analysis.

# Read in the data table for 2.32
dat232<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/232Table.csv")
dat232

# paired t-test with default 95% confidence level
t.test(dat232[,2], dat232[,3], paired=TRUE)

# Read in the data table for 2.34
dat234<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/234Table.csv")
dat234

# paired t-test with default 95% confidence level
t.test(dat234[,2], dat234[,3], paired=TRUE)

# ratio of predicted to observed load for steel plate girders developed using the Karlsruhe and Lehigh methods on normal probability plots
qqnorm(dat234[,2], ylab="ratio of predicted to observed load, Sample Quantiles", xlab="Standard Normal Quantiles", main="Karlsruhe steel Normal Probability Plot", col="steelblue") 
qqline(dat234[,2], col="steelblue")
qqnorm(dat234[,3], ylab="ratio of predicted to observed load, Sample Quantiles", xlab="Standard Normal Quantiles", main="Lehigh steel Normal Probability Plot", col="firebrick2") 
qqline(dat234[,3], col="firebrick2")

# Calculate the paired differences (Karlsruhe - Lehigh) and put them into a normal probability plot
difdat234<-dat234[,2] - dat234[,3]
difdat234
qqnorm(difdat234, ylab="difference of the ratios, Sample Quantiles", xlab="Standard Normal Quantiles", main="Difference between Karlsruhe and Lehigh steel Normal Probability Plot", col="forestgreen") 
qqline(difdat234, col="forestgreen")

# Read in the data table for 2.29
dat229<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/229Table.csv")
names(dat229)[names(dat229) == "X95.C"] <- "95°C"
names(dat229)[names(dat229) == "X100.C"] <- "100°C"
dat229

# photoresist thickness for wafers baked at 95°C and 100°C on normal probability plots
qqnorm(dat229[,1], ylab="photoresist thickness (in kA), Sample Quantiles", xlab="Standard Normal Quantiles", main="95°C Normal Probability Plot", col="steelblue") 
qqline(dat229[,1], col="steelblue")
qqnorm(dat229[,2], ylab="photoresist thickness (in kA), Sample Quantiles", xlab="Standard Normal Quantiles", main="100°C Normal Probability Plot", col="firebrick2") 
qqline(dat229[,2], col="firebrick2")

# The standard deviations of each distribution, σ1 and σ2
sd(dat229[,1])
sd(dat229[,2])
# Pooled standard deviation (the denominator of Cohen's d), which should lie between the two sample standard deviations
sqrt((7*(sd(dat229[,1])^2) + 7*(sd(dat229[,2])^2))/14)
# Checking the answer with an R effectsize library function
library(effectsize)
sd_pooled(dat229[,1], dat229[,2])
library(pwr)
#power for 2-Sample t-test to detect difference in the means of 2.5 kA.
pwr.t.test(n=8, d=sqrt((7*(sd(dat229[,1])^2) + 7*(sd(dat229[,2])^2))/14), sig.level=0.05, power=NULL, type="two.sample")

# Read in the data table for 2.27
dat227<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/227Table.csv")
names(dat227)[names(dat227) == "ï...SCCM."] <- "(SCCM)"
names(dat227)[names(dat227) == "X125"] <- "125"
names(dat227)[names(dat227) == "X200"] <- "200"
dat227

# Run the two-sample Wilcoxon rank-sum test, a.k.a. the Mann-Whitney U test
wilcox.test(dat227[,2], dat227[,3])