2.24

Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The filling processes can be assumed to be normal, with standard deviations of σ1 = 0.015 and σ2 = 0.018. The quality engineering department suspects that both machines fill to the same net volume, whether or not this volume is 16.0 ounces. An experiment is performed by taking a random sample from the output of each machine.

Machine 1          Machine 2
16.03    16.01     16.02    16.03
16.04    15.96     15.97    16.04
16.05    15.98     15.96    16.02
16.05    16.02     16.01    16.01
16.02    15.99     15.99    16.00


(a) State the hypotheses that should be tested in this experiment.

The hypotheses to test are:

  H0: µ1 - µ2 = 0
  H1: µ1 - µ2 ≠ 0

(b) Test these hypotheses using α = 0.05. What are your conclusions?

Let’s calculate the sample means:

# input 2.24 data
# the mean of the sample from machine 1; x1_bar estimates µ1
x1 <- c(16.03, 16.01, 16.04, 15.96, 16.05, 15.98, 16.05, 16.02, 16.02, 15.99)
mean(x1)
## [1] 16.015
# the mean of the sample from machine 2; x2_bar estimates µ2
x2 <- c(16.02, 16.03, 15.97, 16.04, 15.96, 16.02, 16.01, 16.01, 15.99, 16.00)
mean(x2)
## [1] 16.005

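Because the filling processes are assumed normal and the standard deviations σ1 and σ2 are known, we use the two-sample z statistic, z0 = (x1_bar - x2_bar) / sqrt(σ1²/n1 + σ2²/n2), with n1 = n2 = 10: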

# test statistic z0
z0.score <- round((mean(x1)-mean(x2))/sqrt((0.015^2 + 0.018^2)/10),4)
z0.score
## [1] 1.3496

This yields a p-value of:

# p-value for two-tailed test
2*pnorm(q=z0.score, lower.tail=FALSE)
## [1] 0.1771443

0.177 > 0.05 = α, so we fail to reject the null hypothesis H0.

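Conclusion: At the 5% significance level there is insufficient evidence that the two machines fill to different mean net volumes. Failing to reject H0 does not prove that the means are equal; it only means the data are consistent with equal means.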
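The same decision can be reached by comparing |z0| with the critical value instead of comparing the p-value with α. The following is a minimal sketch of that check; the names fill1, fill2, z_crit and reject_h0 are illustrative and not part of the original analysis.

```r
# Critical-value version of the two-sided z-test (sketch; names are illustrative)
fill1 <- c(16.03, 16.01, 16.04, 15.96, 16.05, 15.98, 16.05, 16.02, 16.02, 15.99)
fill2 <- c(16.02, 16.03, 15.97, 16.04, 15.96, 16.02, 16.01, 16.01, 15.99, 16.00)
sigma1 <- 0.015   # known standard deviation, machine 1
sigma2 <- 0.018   # known standard deviation, machine 2
alpha  <- 0.05

# standard error of the difference in sample means
se_diff <- sqrt(sigma1^2 / length(fill1) + sigma2^2 / length(fill2))
z0      <- (mean(fill1) - mean(fill2)) / se_diff

# two-sided critical value z_{alpha/2}
z_crit    <- qnorm(1 - alpha / 2)
reject_h0 <- abs(z0) > z_crit
reject_h0
```

Here |z0| is about 1.35, below the critical value of about 1.96, so reject_h0 is FALSE, in agreement with the p-value comparison.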

(c) Find the P-value for this test.

The P-value is 0.1771443, as computed in part (b).

(d) Find a 95 percent confidence interval on the difference in mean fill volume for the two machines.

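For a 95% confidence interval the critical value is z = 1.96, so the confidence interval on the difference in mean fill volume, µ1 - µ2, is:

  (x1_bar - x2_bar) ± 1.96 * sqrt(σ1²/n1 + σ2²/n2) = 0.010 ± 0.0145

  -0.0045 ≤ µ1 - µ2 ≤ 0.0245

The interval contains zero, which agrees with failing to reject H0 in part (b).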
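A minimal sketch of computing this interval in R, under the same known-σ assumption as the z-test above. The names fill1, fill2, se_diff and ci are illustrative and not part of the original analysis.

```r
# 95% z-interval on mu1 - mu2 with known standard deviations (sketch)
fill1 <- c(16.03, 16.01, 16.04, 15.96, 16.05, 15.98, 16.05, 16.02, 16.02, 15.99)
fill2 <- c(16.02, 16.03, 15.97, 16.04, 15.96, 16.02, 16.01, 16.01, 15.99, 16.00)
sigma1 <- 0.015
sigma2 <- 0.018

diff_hat <- mean(fill1) - mean(fill2)                       # point estimate of mu1 - mu2
se_diff  <- sqrt(sigma1^2 / length(fill1) + sigma2^2 / length(fill2))
z_crit   <- qnorm(0.975)                                    # about 1.96

ci <- diff_hat + c(-1, 1) * z_crit * se_diff                # lower and upper limits
round(ci, 4)
```

The limits come out at roughly -0.0045 and 0.0245 ounces.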

2.26

The following are the burning times (in minutes) of chemical flares of two different formulations. The design engineers are interested in both the mean and variance of the burning times.

Type 1             Type 2
65    82           64    56
81    67           71    69
57    59           83    74
66    75           59    82
82    70           65    79

# input 2.26 data into a dataframe
x1 <- c(65, 82, 81, 67, 57, 59, 66, 75, 82, 70)
x2 <- c(64, 56, 71, 69, 83, 74, 59, 82, 65, 79)
df<-data.frame(as.numeric(c(x1,x2)),as.factor(rep(c(1,2),each=10,len=20)))
colnames(df) <- c("duration","formulation")
head(df)
##   duration formulation
## 1       65           1
## 2       82           1
## 3       81           1
## 4       67           1
## 5       57           1
## 6       59           1

(a) Test the hypothesis that the two variances are equal. Use α = 0.05 (use Levene’s test)

# run Levene's test to compare variances (tested at α = 0.05)
# trim.alpha only applies when location="trim.mean"; it is not the significance level, so it is omitted
library(lawstat)
levene.test(df$duration, df$formulation, location="mean")
## 
##  Classical Levene's test based on the absolute deviations from the mean
##  ( none not applied because the location is not set to median )
## 
## data:  df$duration
## Test Statistic = 0.0014598, p-value = 0.9699

The p-value = 0.9699 > 0.05 = α, so we fail to reject the null hypothesis of equal variances. The data give no evidence that the burning-time variances of the two formulations differ.
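The classical (mean-based) Levene test is a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below reproduces the statistic with base R only, as a cross-check that does not depend on lawstat. The names burn, type, abs_dev and levene_fit are illustrative.

```r
# Mean-based Levene test by hand: ANOVA on absolute deviations (sketch)
type1 <- c(65, 82, 81, 67, 57, 59, 66, 75, 82, 70)
type2 <- c(64, 56, 71, 69, 83, 74, 59, 82, 65, 79)
burn  <- c(type1, type2)
type  <- factor(rep(c(1, 2), each = 10))

# absolute deviation of each burning time from its own group mean
abs_dev <- abs(burn - ave(burn, type))   # ave() defaults to the group mean

# one-way ANOVA of the deviations on formulation
levene_fit <- anova(lm(abs_dev ~ type))
f_stat <- levene_fit[["F value"]][1]
p_val  <- levene_fit[["Pr(>F)"]][1]
c(F = f_stat, p = p_val)
```

The F statistic and p-value should agree with the levene.test output above (about 0.00146 and 0.97).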

(b) Using the results of (a), test the hypothesis that the mean burning times are equal. Use α = 0.05. What is the P-value for this test?

The population variances are unknown and the samples are small, so we use a two-sample t-test, which assumes the burning times are approximately normally distributed. Part (a) gave no evidence that the variances differ, so we pool the variances.

# perform 2 sample t-test with default 95% confidence level
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
form1<-df %>% filter(formulation==1) %>% select(duration)
form2<-df %>% filter(formulation==2) %>% select(duration)
t.test(form1,form2,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  form1 and form2
## t = 0.048008, df = 18, p-value = 0.9622
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.552441  8.952441
## sample estimates:
## mean of x mean of y 
##      70.4      70.2

The P-value is 0.9622 > 0.05 = α, so we fail to reject the null hypothesis. The data give no evidence that the mean burning times of the two formulations differ.
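To make the pooling step explicit, the pooled t statistic can be computed directly from the sample means and variances. This is a minimal sketch; the names type1, type2, sp2, t0 and p_value are illustrative.

```r
# Pooled two-sample t-test by hand (sketch; equal variances assumed)
type1 <- c(65, 82, 81, 67, 57, 59, 66, 75, 82, 70)
type2 <- c(64, 56, 71, 69, 83, 74, 59, 82, 65, 79)
n1 <- length(type1)
n2 <- length(type2)

# pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(type1) + (n2 - 1) * var(type2)) / (n1 + n2 - 2)

# test statistic and two-sided p-value on n1 + n2 - 2 degrees of freedom
t0      <- (mean(type1) - mean(type2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_value <- 2 * pt(abs(t0), df = n1 + n2 - 2, lower.tail = FALSE)
c(t = t0, p = p_value)
```

These values should match the t.test output above (t about 0.048 on 18 degrees of freedom, p about 0.96).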

2.29

Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged on to the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are measurements of photoresist thickness (in kA) for eight wafers baked at two different temperatures. Assume that all of the runs were made in random order.

95 °C     100 °C
11.176    5.263
7.089     6.748
8.097     7.461
11.739    7.015
11.291    8.133
10.759    7.418
6.467     3.772
8.315     8.963

# input 2.29 photoresist data
x1 <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315)
x2 <- c(5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
dfwafers<-data.frame(as.numeric(c(x1,x2)),as.factor(rep(c(95,100),each=8,len=16)))
colnames(dfwafers) <- c("thickness","temperature")
head(dfwafers)
##   thickness temperature
## 1    11.176          95
## 2     7.089          95
## 3     8.097          95
## 4    11.739          95
## 5    11.291          95
## 6    10.759          95

(a) Is there evidence to support the claim that the higher baking temperature results in wafers with a lower mean photoresist thickness? Use α = 0.05.

The population variances are unknown and the samples are small, so we use a two-sample t-test, which assumes the thickness measurements are approximately normally distributed. We have not established that the two variances are equal, so we do not pool them and use Welch's test instead.

# perform 2 sample t-test with default 95% confidence level
library(dplyr)
bake95<-dfwafers %>% filter(temperature==95) %>% select(thickness)
bake100<-dfwafers %>% filter(temperature==100) %>% select(thickness)
t.test(bake95,bake100,var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  bake95 and bake100
## t = 2.6751, df = 13.226, p-value = 0.01885
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4884278 4.5515722
## sample estimates:
## mean of x mean of y 
##  9.366625  6.846625

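Yes. The claim corresponds to the one-sided alternative H1: µ95 > µ100. The observed t = 2.6751 lies in that direction, so the one-sided p-value is half the two-sided value reported above, about 0.0094 < 0.05 = α. We reject H0 and conclude that the mean thickness of wafers baked at 100°C is lower than the mean thickness of wafers baked at 95°C.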
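Part (a) asks a directional question, so the Welch test can also be run one-sided through the alternative argument of t.test. The sketch below uses plain numeric vectors; the names thick95, thick100 and one_sided are illustrative.

```r
# One-sided Welch t-test: H1 is mean(95 C) > mean(100 C) (sketch)
thick95  <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315)
thick100 <- c(5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)

one_sided <- t.test(thick95, thick100,
                    alternative = "greater",  # tests mu95 - mu100 > 0
                    var.equal = FALSE)        # Welch: variances not pooled
one_sided$statistic
one_sided$p.value
```

The t statistic is the same as in the two-sided test, and the p-value is half the two-sided value. The confidence interval reported by this call is one-sided, so the two-sided call is still the one to use for an interval with both limits.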

(b) What is the P-value for the test conducted in part (a)?

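The two-sided P-value reported by R is 0.018852. For the one-sided alternative in part (a), H1: µ95 > µ100, the P-value is half of that, about 0.0094. Either value is below 0.05 = α, so we reject the null hypothesis.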

(c) Find a 95 percent confidence interval on the difference in means. Provide a practical interpretation of this interval.

The 95 percent confidence interval from R is (0.4884278, 4.5515722) kA.

We are 95% confident that the mean thickness of wafers baked at 95°C exceeds the mean thickness of wafers baked at 100°C by between about 0.488 kA and 4.552 kA. The whole interval lies above zero, which is consistent with the higher baking temperature producing a thinner photoresist layer.

(e) Check the assumption of normality of the photoresist thickness.

With such a small sample size (eight wafers per temperature), a density plot gives only a rough indication of whether each distribution is normal.

# density distribution to check for normality
plot(density(subset(dfwafers, temperature == 95)$thickness), main="thickness (kA) of Wafers cooked at 95°C")

plot(density(subset(dfwafers, temperature == 100)$thickness), main="thickness (kA) of Wafers cooked at 100°C")

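Neither density plot looks clearly normal, although with only eight observations per group a density estimate is a rough guide at best. The central limit theorem offers limited support at n = 8, but the t-test is reasonably robust to moderate departures from normality, so the conclusions above are unlikely to change.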
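Normal Q-Q plots and the Shapiro-Wilk test are common complements to density plots for small samples. The following is a minimal sketch using base R only; the names thick95, thick100, sw95 and sw100 are illustrative, and no particular result is claimed here.

```r
# Normal Q-Q plots and Shapiro-Wilk tests for each temperature (sketch)
thick95  <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315)
thick100 <- c(5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)

# side-by-side Q-Q plots; points near the reference line suggest normality
op <- par(mfrow = c(1, 2))
qqnorm(thick95, main = "Thickness (kA), 95 C")
qqline(thick95)
qqnorm(thick100, main = "Thickness (kA), 100 C")
qqline(thick100)
par(op)

# Shapiro-Wilk tests: H0 is that the sample comes from a normal distribution
sw95  <- shapiro.test(thick95)
sw100 <- shapiro.test(thick100)
c(p95 = sw95$p.value, p100 = sw100$p.value)
```

A small Shapiro-Wilk p-value would count as evidence against normality. With n = 8 the test has little power, so a large p-value should not be read as proof that the data are normal.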

Complete Code

Here we display the complete R code used in this analysis.

# input 2.24 data
# calculate the mean of the sample from machine 1; x1_bar estimates µ1
x1 <- c(16.03, 16.01, 16.04, 15.96, 16.05, 15.98, 16.05, 16.02, 16.02, 15.99)
mean(x1)
# the mean of the sample from machine 2; x2_bar estimates µ2
x2 <- c(16.02, 16.03, 15.97, 16.04, 15.96, 16.02, 16.01, 16.01, 15.99, 16.00)
mean(x2)

# test statistic z0
z0.score <- round((mean(x1)-mean(x2))/sqrt((0.015^2 + 0.018^2)/10),4)
z0.score

# p-value for two-tailed test
2*pnorm(q=z0.score, lower.tail=FALSE)

# input 2.26 data into a dataframe
x1 <- c(65, 82, 81, 67, 57, 59, 66, 75, 82, 70)
x2 <- c(64, 56, 71, 69, 83, 74, 59, 82, 65, 79)
df<-data.frame(as.numeric(c(x1,x2)),as.factor(rep(c(1,2),each=10,len=20)))
colnames(df) <- c("duration","formulation")
head(df)

# run Levene's test to compare variances (tested at α = 0.05)
# trim.alpha only applies when location="trim.mean"; it is not the significance level, so it is omitted
library(lawstat)
levene.test(df$duration, df$formulation, location="mean")

# perform 2 sample t-test with default 95% confidence level
library(dplyr)
form1<-df %>% filter(formulation==1) %>% select(duration)
form2<-df %>% filter(formulation==2) %>% select(duration)
t.test(form1,form2,var.equal=TRUE)

# input 2.29 photoresist data
x1 <- c(11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315)
x2 <- c(5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963)
dfwafers<-data.frame(as.numeric(c(x1,x2)),as.factor(rep(c(95,100),each=8,len=16)))
colnames(dfwafers) <- c("thickness","temperature")
head(dfwafers)

# perform 2 sample t-test with default 95% confidence level
library(dplyr)
bake95<-dfwafers %>% filter(temperature==95) %>% select(thickness)
bake100<-dfwafers %>% filter(temperature==100) %>% select(thickness)
t.test(bake95,bake100,var.equal=FALSE)

# density distribution to check for normality
plot(density(subset(dfwafers, temperature == 95)$thickness), main="thickness (kA) of Wafers cooked at 95°C")
plot(density(subset(dfwafers, temperature == 100)$thickness), main="thickness (kA) of Wafers cooked at 100°C")