Question 1

The diameter of a ball bearing was measured by 12 inspectors, each using two different kinds of calipers. The results were:

Inspector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265, 0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
Caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268, 0.264, 0.265, 0.265, 0.267, 0.268, 0.269)
dafr <- data.frame(Inspector,Caliper1,Caliper2)

(a) Is there a significant difference between the means of the population of measurements from which the two samples were selected? Use a= 0.05.

qqnorm(Caliper1 - Caliper2, main='Difference Between Points Q-Q plot')
qqline(Caliper1 - Caliper2)

Because the two samples are paired (each inspector used both calipers), we check normality on the differences between them. From the Q-Q plot, the differences do not appear to be normally distributed, so we use a Wilcoxon signed-rank test instead of a paired t-test.
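As a supplement to the Q-Q plot, here is a minimal sketch (not part of the original analysis) that plots the signed differences against a reference line and runs base R's Shapiro-Wilk test. The names `d`, `sw`, and `n_zero` are illustrative. With only 12 pairs and many tied values, both checks have limited power, so treat the result as a rough guide rather than a verdict.

```r
# Question 1 caliper data (values from the table above)
Caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265,
              0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
Caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268,
              0.264, 0.265, 0.265, 0.267, 0.268, 0.269)

# Signed paired differences; abs() would discard the sign
d <- Caliper1 - Caliper2

# Normal Q-Q plot of the differences with a reference line
qqnorm(d, main = "Caliper1 - Caliper2 Q-Q plot")
qqline(d)

# Shapiro-Wilk test of normality for the differences (supplementary check)
sw <- shapiro.test(d)
print(sw)

# Number of zero differences, which the signed-rank test drops
n_zero <- sum(abs(d) < 1e-9)
print(n_zero)
```

The zero differences (and ties among the nonzero ones) are why `wilcox.test` falls back to a normal approximation here.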

Ho: mu1=mu2 or Caliper1’s mean and Caliper2’s mean are not different

Ha: mu1≠mu2 or Caliper1’s mean and Caliper2’s mean are different

wilcox.test(Caliper1,Caliper2,paired=TRUE,conf.int=TRUE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  Caliper1 and Caliper2
## V = 21.5, p-value = 0.6721
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.001529272  0.002024144
## sample estimates:
## (pseudo)median 
##   0.0009999638

Because our p-value (0.6721) is greater than our alpha (0.05), we do not reject the null hypothesis and conclude that there is no significant difference between the measurements from the two calipers.

(b) Find the P-value for the test in part (a).

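The p-value is 0.6721. Because the differences contain ties and zeroes, R cannot compute an exact p-value and instead uses a normal approximation with continuity correction.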

(c) Construct a 95 percent confidence interval on the difference in mean diameter measurements for the two types of calipers.

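The 95 percent confidence interval reported by wilcox.test is (-0.0015, 0.0020). Strictly, this is an interval for the location shift (pseudomedian) of the differences rather than for the mean difference; because it contains 0, it agrees with the conclusion in part (a).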
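Since the question asks about the difference in means, a paired t interval is the direct answer when normality of the differences is acceptable. The sketch below computes it for comparison only; it is hedged because part (a) judged the differences non-normal, and `tt` is an illustrative name.

```r
# Question 1 caliper data
Caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265,
              0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
Caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268,
              0.264, 0.265, 0.265, 0.267, 0.268, 0.269)

# Mean-based 95% interval for mu_D = mu1 - mu2 (assumes normal differences)
tt <- t.test(Caliper1, Caliper2, paired = TRUE, conf.level = 0.95)
print(tt$estimate)   # mean of the differences
print(tt$conf.int)   # 95% confidence interval for the mean difference
```

If both intervals contain 0, the choice of test does not change the conclusion for these data.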

Question 2

An article in the Journal of Strain Analysis (vol. 18, no. 2, 1983) compares several procedures for predicting the shear strength for steel plate girders. Data for nine girders in the form of the ratio of predicted to observed load for two of these procedures, the Karlsruhe and Lehigh methods, are as follows:

Girder <- c('S1/1', 'S2/1', 'S3/1', 'S4/1', 'S5/1', 'S2/1', 'S2/2', 'S2/3', 'S2/4')
Karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.2, 1.402, 1.365, 1.537, 1.559)
Lehigh <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)
dafr <- data.frame(Girder,Karlsruhe,Lehigh)

(a) Is there any evidence to support a claim that there is a difference in mean performance between the two methods? Use a= 0.05.

Testing for equal variances:

var.test(Karlsruhe, Lehigh, alternative = "two.sided")

## 
##  F test to compare two variances
## 
## data:  Karlsruhe and Lehigh
## F = 8.7454, num df = 8, denom df = 8, p-value = 0.006008
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   1.972674 38.770520
## sample estimates:
## ratio of variances 
##           8.745375

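By this F test (p = 0.006), the two variances are not equal. Note that the F test assumes each sample is normally distributed; that assumption is examined in part (d).

For an independent two-sample test we would therefore use unpooled variances. However, each girder was evaluated by both methods, so the observations are paired, and the paired t-test works only with the within-girder differences. The equal-variance question therefore does not affect it (R ignores var.equal when paired = TRUE).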
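For reference, the F statistic above is simply the ratio of the two sample variances. Here is a minimal sketch that reproduces it from the formula, with illustrative names `F_stat` and `p_two`:

```r
# Question 2 data
Karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
Lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

# F statistic = ratio of sample variances, with (n1 - 1, n2 - 1) df
F_stat <- var(Karlsruhe) / var(Lehigh)
df1 <- length(Karlsruhe) - 1
df2 <- length(Lehigh) - 1

# Two-sided p-value: double the smaller tail probability
p_two <- 2 * min(pf(F_stat, df1, df2),
                 pf(F_stat, df1, df2, lower.tail = FALSE))
print(c(F = F_stat, p = p_two))
```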

Ho: mu1=mu2 or Karlsruhe’s mean and Lehigh’s mean are not different

Ha: mu1≠mu2 or Karlsruhe’s mean and Lehigh’s mean are different

t.test(Karlsruhe, Lehigh, paired = TRUE)

## 
##  Paired t-test
## 
## data:  Karlsruhe and Lehigh
## t = 6.0819, df = 8, p-value = 0.0002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1700423 0.3777355
## sample estimates:
## mean of the differences 
##               0.2738889

Yes, there is enough evidence to conclude that the two means are not equal, because our p-value (0.0003) is below our alpha (0.05).
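A paired t-test is a one-sample t-test on the within-pair differences, which is why the variance comparison does not enter into it. This short sketch confirms that the two calls give the same statistic; `diffs`, `paired_fit`, and `one_sample` are illustrative names.

```r
# Question 2 data
Karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
Lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

# Differences within each girder
diffs <- Karlsruhe - Lehigh

# Paired test versus one-sample test of mean(diffs) = 0
paired_fit <- t.test(Karlsruhe, Lehigh, paired = TRUE)
one_sample <- t.test(diffs, mu = 0)

print(c(paired = unname(paired_fit$statistic),
        one_sample = unname(one_sample$statistic)))
```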

(b) What is the P-value for the test in part (a)?

Our p-value is 0.0002953.

(c) Construct a 95 percent confidence interval for the difference in mean predicted to observed load.

Our 95 percent confidence interval on the mean difference in the predicted-to-observed load ratio (Karlsruhe minus Lehigh) is (0.1700, 0.3777).
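The interval follows from the formula $\bar{d} \pm t_{0.025,\,n-1}\, s_d/\sqrt{n}$. Here is a minimal sketch that computes it by hand to show where the `t.test` numbers come from (`dbar`, `se`, `tcrit`, and `ci` are illustrative names):

```r
# Question 2 data
Karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
Lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

diffs <- Karlsruhe - Lehigh
n     <- length(diffs)

dbar  <- mean(diffs)                 # mean difference
se    <- sd(diffs) / sqrt(n)         # standard error of the mean difference
tcrit <- qt(0.975, df = n - 1)       # two-sided 95% critical value

ci <- c(lower = dbar - tcrit * se, upper = dbar + tcrit * se)
print(ci)
```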

(d) Investigate the normality assumption for both samples.

qqnorm(Karlsruhe,main='Karlsruhe Q-Q plot')

qqnorm(Lehigh,main='Lehigh Q-Q plot')

It is a little hard to tell with so few data points, but the points do not fall closely along a straight line; instead, they drift away from it toward the ends.

(e) Investigate the normality assumption for the difference in ratios for the two methods.

qqnorm(Karlsruhe - Lehigh, main='Difference Between both Q-Q Plot')
qqline(Karlsruhe - Lehigh)

This is arguable with only nine differences, but the points fall mostly along a straight line, so the differences appear approximately normally distributed.

(f) Discuss the role of the normality assumption in the paired t-test.

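For the paired t-test, the normality assumption applies to the differences between the paired observations, not to each sample separately. If the differences are not approximately normal, the t-based p-value and confidence interval may be inaccurate, especially with a small sample such as these nine girders. In that case, a nonparametric alternative such as the Wilcoxon signed-rank test (used in Question 1) is safer.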
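To illustrate why normality of the differences matters for small samples, here is a hypothetical simulation that is not part of the assignment. It draws nine differences from a right-skewed distribution (a centered exponential) with true mean 0, so every rejection is a Type I error. The estimated rejection rate can then be compared with the nominal 0.05. The distribution, seed, and number of replications are arbitrary assumptions.

```r
set.seed(1)                 # arbitrary seed for reproducibility
n_pairs <- 9                # same number of pairs as the girder data
n_sims  <- 5000             # number of simulated experiments
alpha   <- 0.05

# Hypothetical skewed differences: Exp(1) - 1 has mean 0
reject <- replicate(n_sims, {
  d <- rexp(n_pairs, rate = 1) - 1
  t.test(d, mu = 0)$p.value < alpha
})

type1_rate <- mean(reject)  # estimated Type I error rate
print(type1_rate)
```

A rate noticeably different from 0.05 indicates that the t-test's nominal error rate is not being met for that kind of data.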

Question 3

Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged on to the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are measurements of photoresist thickness (in kA) for eight wafers baked at two different temperatures. Assume that all of the runs were made in random order.

data <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315, 5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
temp <- c(rep(95, 8), rep(100, 8))

(e) Check the assumption of normality of the photoresist thickness.

qqnorm(data[1:8], main='Temp 95 Q-Q Plot')
qqline(data[1:8])

qqnorm(data[9:16], main='Temp 100 Q-Q Plot')
qqline(data[9:16])

The data at both temperatures appear to be approximately normally distributed.
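Hand-typed index ranges are easy to get wrong; for example, `data[8:16]` would include one 95-degree wafer. A sketch that splits the thickness values by the `temp` vector avoids this. The name `groups` is illustrative.

```r
# Question 3 data
data <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315,
          5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
temp <- c(rep(95, 8), rep(100, 8))

# One vector of thicknesses per temperature, named "95" and "100"
groups <- split(data, temp)
print(sapply(groups, length))
print(sapply(groups, mean))

# Side-by-side Q-Q plots, one per temperature
op <- par(mfrow = c(1, 2))
for (lvl in names(groups)) {
  qqnorm(groups[[lvl]], main = paste("Temp", lvl, "Q-Q Plot"))
  qqline(groups[[lvl]])
}
par(op)
```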

(f) Find the power of this test for detecting an actual difference in means of 2.5 kA.

library(pwr)
pwr.t.test(n=8,d=(abs(mean(data[1:8])-mean(data[8:16]))/sd(data)),sig.level=0.05,power=NULL,type="two.sample",alternative="two.sided")

## 
##      Two-sample t test power calculation 
## 
##               n = 8
##               d = 1.053342
##       sig.level = 0.05
##           power = 0.5006574
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

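The power reported here is about 0.50. If the true effect size equals the d used in the call, the test would have roughly a 50% probability of correctly rejecting the null hypothesis that the two means are equal. Note, however, that this call sets d from the observed difference in means divided by the standard deviation of all 16 values, and data[8:16] also includes one 95-degree wafer. To answer the question as asked, d should be the target difference of 2.5 kA divided by the pooled within-group standard deviation.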
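Here is a sketch of the power calculation for the stated 2.5 kA difference using base R's `power.t.test`. It assumes that sigma is estimated by the pooled standard deviation of the two temperature groups. If the earlier parts of the original problem supply a different sigma, substitute that value. `x95`, `x100`, `sp`, and `pw` are illustrative names.

```r
# Question 3 data
data <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315,
          5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
temp <- c(rep(95, 8), rep(100, 8))

x95  <- data[temp == 95]
x100 <- data[temp == 100]

# Pooled within-group standard deviation (assumed estimate of sigma)
sp <- sqrt(((length(x95) - 1) * var(x95) + (length(x100) - 1) * var(x100)) /
           (length(x95) + length(x100) - 2))

# Power to detect a true difference of 2.5 kA with 8 wafers per group
pw <- power.t.test(n = 8, delta = 2.5, sd = sp, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
print(pw$power)
print(2.5 / sp)   # corresponding standardized effect size d
```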

Question 4

An article in Solid State Technology, “Orthogonal Design for Process Optimization and Its Application to Plasma Etching” by G. Z. Yin and D. W. Jillie (May 1987) describes an experiment to determine the effect of the C2F6 flow rate on the uniformity of the etch on a silicon wafer used in integrated circuit manufacturing. All of the runs were made in random order. Data for two flow rates are as follows:

cf125 <- c(2.7, 4.6, 2.6, 3.0, 3.2, 3.8)
cf200 <- c(4.6, 3.4, 2.9, 3.5, 4.1, 5.1)

(a) Does the C2F6 flow rate affect average etch uniformity? Use a= 0.05

qqnorm(cf125,main='C2F6 Flow Rate 125 Q-Q Plot')

qqnorm(cf200,main='C2F6 Flow Rate 200 Q-Q Plot')

It's somewhat difficult to tell with only six points per group, but there is no strong evidence that either sample is non-normal.

Next, we check whether the variances are equal.

boxplot(cf125,cf200,names=c('CF125','CF200'))

Based on visual inspection, the variances appear equal: both boxplots show a similar spread between the quartiles.
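A formal check could complement the boxplot. Here is a minimal sketch using `var.test`, which, like the F test in Question 2, assumes both samples are normal. `vt` is an illustrative name.

```r
# Question 4 data
cf125 <- c(2.7, 4.6, 2.6, 3.0, 3.2, 3.8)
cf200 <- c(4.6, 3.4, 2.9, 3.5, 4.1, 5.1)

# F test for equality of the two variances
vt <- var.test(cf125, cf200, alternative = "two.sided")
print(vt$estimate)   # ratio of sample variances
print(vt$p.value)
```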

t.test(cf125, cf200, var.equal = TRUE)

For these data, the pooled two-sample t-test gives sample means of 3.317 (flow rate 125) and 3.933 (flow rate 200), a pooled standard deviation of about 0.791, and t ≈ -1.35 on 10 degrees of freedom, for a two-sided p-value of about 0.21.

Since our p-value is above our alpha of 0.05, we fail to reject the null hypothesis. There is not enough evidence to say that the C2F6 flow rate affects average etch uniformity.
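As a check on `t.test`, here is a minimal sketch of the pooled two-sample t-test for `cf125` versus `cf200`, computed from the formula $t_0 = (\bar{y}_1 - \bar{y}_2)/(s_p\sqrt{1/n_1 + 1/n_2})$. The intermediate names `sp2`, `t0`, `dof`, and `p_two` are illustrative.

```r
# Question 4 data
cf125 <- c(2.7, 4.6, 2.6, 3.0, 3.2, 3.8)
cf200 <- c(4.6, 3.4, 2.9, 3.5, 4.1, 5.1)

n1 <- length(cf125)
n2 <- length(cf200)

# Pooled variance from the two sample variances
sp2 <- ((n1 - 1) * var(cf125) + (n2 - 1) * var(cf200)) / (n1 + n2 - 2)

# Test statistic, degrees of freedom, and two-sided p-value
t0    <- (mean(cf125) - mean(cf200)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof   <- n1 + n2 - 2
p_two <- 2 * pt(-abs(t0), dof)

# Same test through t.test for comparison
tt <- t.test(cf125, cf200, var.equal = TRUE)
print(c(t0 = t0, df = dof, p = p_two))
```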

All Code Used:

Inspector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Caliper1 <- c(0.265, 0.265, 0.266, 0.267, 0.267, 0.265, 0.267, 0.267, 0.265, 0.268, 0.268, 0.265)
Caliper2 <- c(0.264, 0.265, 0.264, 0.266, 0.267, 0.268, 0.264, 0.265, 0.265, 0.267, 0.268, 0.269)
dafr <- data.frame(Inspector,Caliper1,Caliper2)

qqnorm(Caliper1 - Caliper2, main='Difference Between Points Q-Q plot')
qqline(Caliper1 - Caliper2)

wilcox.test(Caliper1,Caliper2,paired=TRUE,conf.int=TRUE)

## Warning in wilcox.test.default(Caliper1, Caliper2, paired = TRUE, conf.int =
## TRUE): cannot compute exact p-value with ties

## Warning in wilcox.test.default(Caliper1, Caliper2, paired = TRUE, conf.int =
## TRUE): cannot compute exact confidence interval with ties

## Warning in wilcox.test.default(Caliper1, Caliper2, paired = TRUE, conf.int =
## TRUE): cannot compute exact p-value with zeroes

## Warning in wilcox.test.default(Caliper1, Caliper2, paired = TRUE, conf.int =
## TRUE): cannot compute exact confidence interval with zeroes

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  Caliper1 and Caliper2
## V = 21.5, p-value = 0.6721
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.001529272  0.002024144
## sample estimates:
## (pseudo)median 
##   0.0009999638

Girder <- c('S1/1', 'S2/1', 'S3/1', 'S4/1', 'S5/1', 'S2/1', 'S2/2', 'S2/3', 'S2/4')
Karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.2, 1.402, 1.365, 1.537, 1.559)
Lehigh <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)
dafr <- data.frame(Girder,Karlsruhe,Lehigh)

var.test(Karlsruhe, Lehigh, alternative = "two.sided")

## 
##  F test to compare two variances
## 
## data:  Karlsruhe and Lehigh
## F = 8.7454, num df = 8, denom df = 8, p-value = 0.006008
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   1.972674 38.770520
## sample estimates:
## ratio of variances 
##           8.745375

t.test(Karlsruhe, Lehigh, paired = TRUE)

## 
##  Paired t-test
## 
## data:  Karlsruhe and Lehigh
## t = 6.0819, df = 8, p-value = 0.0002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1700423 0.3777355
## sample estimates:
## mean of the differences 
##               0.2738889

qqnorm(Karlsruhe,main='Karlsruhe Q-Q plot')

qqnorm(Lehigh,main='Lehigh Q-Q plot')

qqnorm(Karlsruhe - Lehigh, main='Difference Between Both Q-Q Plot')
qqline(Karlsruhe - Lehigh)

data <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315, 5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
temp <- c(rep(95, 8), rep(100, 8))

qqnorm(data[1:8], main='Temp 95 Q-Q Plot')
qqline(data[1:8])

qqnorm(data[9:16], main='Temp 100 Q-Q Plot')
qqline(data[9:16])

library(pwr)
pwr.t.test(n=8,d=(abs(mean(data[1:8])-mean(data[8:16]))/sd(data)),sig.level=0.05,power=NULL,type="two.sample",alternative="two.sided")

## 
##      Two-sample t test power calculation 
## 
##               n = 8
##               d = 1.053342
##       sig.level = 0.05
##           power = 0.5006574
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

cf125 <- c(2.7, 4.6, 2.6, 3.0, 3.2, 3.8)
cf200 <- c(4.6, 3.4, 2.9, 3.5, 4.1, 5.1)

qqnorm(cf125,main='C2F6 Flow Rate 125 Q-Q Plot')

qqnorm(cf200,main='C2F6 Flow Rate 200 Q-Q Plot')

boxplot(cf125,cf200,names=c('CF125','CF200'))

t.test(cf125, cf200, var.equal = TRUE)

Homework 2

Kathryn Vernon

9/17/2021