a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: there is no difference in mean diameter between the two calipers
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: there is a difference in mean diameter between the two calipers
where:
\(\sigma^2_1\) = 0.0000014772, \(\sigma^2_2\) = 0.00000309, \(n_1\) = 12, \(n_2\) = 12
\(\mu_1\) is the mean of caliper 1
\(\mu_2\) is the mean of caliper 2
\(\sigma^2_1\) is variance of caliper 1
\(\sigma^2_2\) is variance of caliper 2
Note that the sample sizes are not large, so we cannot assume normality of the data without some data visualization.
dball<-data.frame("caliper1"=c(0.265,0.265,0.266,0.267,0.267,0.265,0.267,0.267,0.265,0.268,0.268,0.265),"caliper2"=c(0.264,0.265,0.264,0.266,0.267,0.268,0.264,0.265,0.265,0.267,0.268,0.269))
dm1<-mean(dball$caliper1)
dm2<-mean(dball$caliper2)
dv1<-var(dball$caliper1)
dv2<-var(dball$caliper2)
c(dm1,dm2,dv1,dv2)
## [1] 2.662500e-01 2.660000e-01 1.477273e-06 3.090909e-06
Testing Hypothesis
Before testing the hypothesis, we need to check the assumption that the data are normally distributed. First we examine a boxplot to see how spread out the samples are, which gives a rough idea of normality.
#boxplot to compare variance
boxplot(dball$caliper1, dball$caliper2, names = c("caliper1", "caliper2"), main="Comparing Boxplot for caliper1 and caliper2")
From the boxplot, caliper 2 is more spread out than caliper 1, and neither sample shows obvious skewness.
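As a supplementary check (a sketch, not part of the original analysis), a Shapiro-Wilk test could be run on the paired differences, since a paired t-test relies on normality of the differences rather than of each sample separately:
# Supplementary check: Shapiro-Wilk test on the paired differences
ddiff <- dball$caliper1 - dball$caliper2
shapiro.test(ddiff)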
T-Statistic of original Data for paired t-test
t.test(dball$caliper1,dball$caliper2,paired = TRUE)
##
## Paired t-test
##
## data: dball$caliper1 and dball$caliper2
## t = 0.43179, df = 11, p-value = 0.6742
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.001024344 0.001524344
## sample estimates:
## mean of the differences
## 0.00025
Conclusion
In conclusion, we fail to reject the null hypothesis Ho at \(t_0\) = 0.43179, since the p-value of 0.6742 is greater than \(\alpha\) = 0.05.
The 95% confidence interval for the mean difference is (-0.001024344, 0.001524344), which contains zero and is consistent with failing to reject Ho.
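As a cross-check (a sketch, not part of the original analysis), the paired t statistic and p-value can be recomputed directly from the differences and should reproduce the t.test() output:
# Manual paired t statistic and two-sided p-value
d  <- dball$caliper1 - dball$caliper2
n  <- length(d)
t0 <- mean(d) / (sd(d) / sqrt(n))    # should match t = 0.43179
p  <- 2 * pt(-abs(t0), df = n - 1)   # should match p-value = 0.6742
c(t0, p)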
a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis that there is no difference in mean
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis that there is a difference in the mean
where:
\(\sigma^2_1\) = 0.0213, \(\sigma^2_2\) = 0.0024, \(n_1\) = 9, \(n_2\) = 9
\(\mu_1\) is the mean of the Karlsruhe method
\(\mu_2\) is the mean of the Lehigh method
\(\sigma^2_1\) is variance of the Karlsruhe method
\(\sigma^2_2\) is variance of the Lehigh method
Note that the sample sizes are not large, so we cannot rely on the central limit theorem; we also want to check normality of the data.
girdata<-data.frame("Girder"=c("S1/1","S2/1","S3/1","S4/1","S5/1","S2/1","S2/2","S2/3","S2/4"),"karlsruhe"=c(1.186,1.151,1.322,1.339,1.200,1.402,1.365,1.537,1.559),"lehigh"=c(1.061,0.992,1.063,1.062,1.065,1.178,1.037,1.086,1.052))
#head(girdata)
gm1<-mean(girdata$karlsruhe)
gm2<-mean(girdata$lehigh)
gv1<-var(girdata$karlsruhe)
gv2<-var(girdata$lehigh)
c(gm1,gm2,gv1,gv2)
## [1] 1.340111111 1.066222222 0.021325111 0.002438444
Testing Hypothesis
Before testing the hypothesis, we need to check the assumption that the data are normally distributed. First we examine a boxplot to see how spread out the samples are, and we can also run normal probability plots.
cor(girdata$karlsruhe, girdata$lehigh)
## [1] 0.3821669
The correlation shows that the measurements from the two methods are positively correlated, which supports analyzing them with a paired t-test.
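As an aside (a sketch, not part of the original write-up), the benefit of pairing follows from the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y): positive correlation shrinks the variance of the differences.
# Variance of the paired differences ...
var(girdata$karlsruhe - girdata$lehigh)
# ... equals the sum of the variances minus twice the covariance
var(girdata$karlsruhe) + var(girdata$lehigh) - 2 * cov(girdata$karlsruhe, girdata$lehigh)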
boxplot(girdata$karlsruhe, girdata$lehigh, names = c("karlsruhe", "lehigh"), main="Comparing Boxplot for both methods")
From the boxplot, the Karlsruhe data appear more spread out than the Lehigh data, and there is some evidence of skewness in the Lehigh sample at the upper and lower tails of the boxplot.
T-Statistic for Paired T-test
t.test(girdata$karlsruhe, girdata$lehigh,paired=TRUE)
##
## Paired t-test
##
## data: girdata$karlsruhe and girdata$lehigh
## t = 6.0819, df = 8, p-value = 0.0002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1700423 0.3777355
## sample estimates:
## mean of the differences
## 0.2738889
Conclusion
In conclusion, we reject the null hypothesis Ho at \(t_0\) = 6.0819, since the p-value of 0.0002953 is lower than \(\alpha\) = 0.05.
The 95% confidence interval for the mean difference is (0.1700423, 0.3777355), and the mean of the differences is 0.2738889.
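As a cross-check (a sketch, not part of the original analysis), the confidence interval can be recomputed from the paired differences:
# Manual 95% CI for the mean of the paired differences
d <- girdata$karlsruhe - girdata$lehigh
n <- length(d)
mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)   # should match (0.1700423, 0.3777355)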
qqnorm(girdata$lehigh, main = "Normal Distribution of Lehigh Method", xlab = "x", ylab = "ratio points")
qqline(girdata$lehigh)
The Lehigh data appear right-skewed, with outliers at both extremes, while the points between the 25th and 75th quantiles cluster close to the mean.
qqnorm(girdata$karlsruhe, main = "Normal Distribution of karlsruhe Method", xlab = "x", ylab = "ratio points")
qqline(girdata$karlsruhe)
From the plot, the Karlsruhe data appear approximately normal, with points lying closer to the line than in the Lehigh plot.
Conclusion
In conclusion, certain assumptions must hold before we perform a paired t-test. We must assume normality of the data, and this can be checked by visualizing the data with methods such as a normal probability plot or a boxplot. This is usually done when the sample size is small (below about 30). For larger samples (above about 30), the central limit theorem lets us proceed without this check.
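As a supplementary sketch (not in the original write-up), the normality check most directly relevant to the paired t-test is on the differences themselves:
# Normal probability plot of the paired differences (Karlsruhe - Lehigh)
d <- girdata$karlsruhe - girdata$lehigh
qqnorm(d, main = "Normal plot of paired differences")
qqline(d)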
e)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: the means are equal
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: the low-temperature mean differs from the high-temperature mean
Note that the sample sizes are not large, and we cannot be certain whether the population variances are equal.
Temp1<-data.frame("kA95C"=c(11.176,7.089,8.097,11.739,11.291,10.759,6.467,8.315))
Temp2<-data.frame("kA100C"=c(5.263,6.748,7.461,7.015,8.133,7.418,3.772,8.963))
mean(Temp1$kA95C)
## [1] 9.366625
mean(Temp2$kA100C)
## [1] 6.846625
var(Temp1$kA95C)
## [1] 4.40817
var(Temp2$kA100C)
## [1] 2.690999
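As an optional check (a sketch, not part of the original write-up), an F test can compare the two sample variances before choosing between a pooled and a Welch two-sample t-test:
# F test for equality of variances between the two temperature groups
var.test(Temp1$kA95C, Temp2$kA100C)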
Testing normality assumption
qqnorm(Temp1$kA95C,main="Normal plot for low temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp1$kA95C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
From the plot, the low-temperature data fall roughly on a straight line, with the exception of one extreme value; otherwise the low-temperature sample appears approximately normal.
qqnorm(Temp2$kA100C,main="Normal plot for high temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp2$kA100C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
From the plot, the high-temperature data fall roughly on a straight line, with a few possible outliers; otherwise the high-temperature sample appears approximately normal.
Also, it is important to note that with small sample sizes a normal probability plot is harder to judge, since a few points can strongly affect the apparent shape of the distribution.
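For completeness (a sketch, not shown in this section), the stated hypothesis could then be tested with a Welch two-sample t-test, which does not assume equal variances:
# Welch two-sample t-test of the low-temperature vs high-temperature means
t.test(Temp1$kA95C, Temp2$kA100C, var.equal = FALSE)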
f) Power
library(pwr)
pwr.t.test(n=NULL,d=-1.5,sig.level=0.05,power=.9,type="two.sample",alternative="two.sided")
##
## Two-sample t test power calculation
##
## n = 10.40147
## d = 1.5
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
From the result, we would need about 11 observations per group (n = 10.4, rounded up) to detect a mean difference of 1.5 kA with 90% power to reject the null hypothesis Ho at \(\alpha\) = 0.05 using a two-sided two-sample t-test.
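As an illustration (a sketch, not in the original write-up; the range of group sizes is arbitrary), the same pwr function can show how power grows with the per-group sample size at this effect size:
# Power of the two-sided two-sample t-test at d = 1.5 for several group sizes
sapply(5:12, function(n)
  pwr.t.test(n = n, d = 1.5, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$power)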
a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: the flow-rate means are equal
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: the flow-rate means are not equal
Note that the sample sizes are not large, and we cannot be certain whether the population variances are equal. Since the data are continuous, we also check normality with a normal probability plot.
cfgas<-data.frame("125flrate"=c(2.7,4.6,2.6,3.0,3.2,3.8), "200flrate"=c(4.6,3.4,2.9,3.5,4.1,5.1))
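# Note: R's default check.names = TRUE prefixes these numeric column names with "X",
# so the columns are referenced below as cfgas$X125flrate and cfgas$X200flrate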
qqnorm(cfgas$X125flrate,main="Normal plot for 125 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X125flrate,probs = c(0.25, 0.75), qtype = 7)
From the plot, the data sample for the 125 flow rate appears to fall roughly on a straight line, with a few possible outliers at the tails of the distribution.
qqnorm(cfgas$X200flrate,main="Normal plot for 200 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X200flrate,probs = c(0.25, 0.75), qtype = 7)
From the plot, the data sample for the 200 flow rate appears to fall on a straight line, which suggests it is approximately normally distributed.
T-Statistic using a Non-parametric Method
To test the hypothesis, we use the Mann-Whitney U test (Wilcoxon rank-sum test).
wilcox.test(cfgas$X125flrate, cfgas$X200flrate)
## Warning in wilcox.test.default(cfgas$X125flrate, cfgas$X200flrate): cannot
## compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: cfgas$X125flrate and cfgas$X200flrate
## W = 9.5, p-value = 0.1994
## alternative hypothesis: true location shift is not equal to 0
Conclusion
In conclusion, we fail to reject the null hypothesis Ho (W = 9.5), since the p-value of 0.1994 is greater than \(\alpha\) = 0.05.
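For comparison only (a sketch, not part of the original analysis), the parametric two-sample t-test could be run on the same data:
# Welch two-sample t-test on the flow-rate data, for comparison with the rank-sum test
t.test(cfgas$X125flrate, cfgas$X200flrate, var.equal = FALSE)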
### Question 2.32
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
dball<-data.frame("caliper1"=c(0.265,0.265,0.266,0.267,0.267,0.265,0.267,0.267,0.265,0.268,0.268,0.265),"caliper2"=c(0.264,0.265,0.264,0.266,0.267,0.268,0.264,0.265,0.265,0.267,0.268,0.269))
dm1<-mean(dball$caliper1)
dm2<-mean(dball$caliper2)
dv1<-var(dball$caliper1)
dv2<-var(dball$caliper2)
c(dm1,dm2,dv1,dv2)
#***Testing Hypothesis***
#boxplot to compare variance
boxplot(dball$caliper1, dball$caliper2, names = c("caliper1", "caliper2"), main="Comparing Boxplot for caliper1 and caliper2")
#***T-Statistic of original Data for paired t-test***
t.test(dball$caliper1,dball$caliper2,paired = TRUE)
### Question 2.34
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
girdata<-data.frame("Girder"=c("S1/1","S2/1","S3/1","S4/1","S5/1","S2/1","S2/2","S2/3","S2/4"),"karlsruhe"=c(1.186,1.151,1.322,1.339,1.200,1.402,1.365,1.537,1.559),"lehigh"=c(1.061,0.992,1.063,1.062,1.065,1.178,1.037,1.086,1.052))
#head(girdata)
gm1<-mean(girdata$karlsruhe)
gm2<-mean(girdata$lehigh)
gv1<-var(girdata$karlsruhe)
gv2<-var(girdata$lehigh)
c(gm1,gm2,gv1,gv2)
#***Testing Hypothesis***
cor(girdata$karlsruhe, girdata$lehigh)
boxplot(girdata$karlsruhe, girdata$lehigh, names = c("karlsruhe", "lehigh"), main="Comparing Boxplot for both methods")
#***T-Statistic for Paired T-test***
t.test(girdata$karlsruhe, girdata$lehigh,paired=TRUE)
qqnorm(girdata$lehigh, main = "Normal Distribution of Lehigh Method", xlab = "x", ylab = "ratio points")
qqline(girdata$lehigh)
qqnorm(girdata$karlsruhe, main = "Normal Distribution of karlsruhe Method", xlab = "x", ylab = "ratio points")
qqline(girdata$karlsruhe)
### Question 2.29
#**e)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
Temp1<-data.frame("kA95C"=c(11.176,7.089,8.097,11.739,11.291,10.759,6.467,8.315))
Temp2<-data.frame("kA100C"=c(5.263,6.748,7.461,7.015,8.133,7.418,3.772,8.963))
mean(Temp1$kA95C)
mean(Temp2$kA100C)
var(Temp1$kA95C)
var(Temp2$kA100C)
#***Testing normality assumption***
qqnorm(Temp1$kA95C,main="Normal plot for low temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp1$kA95C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
qqnorm(Temp2$kA100C,main="Normal plot for high temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp2$kA100C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
#**f)**
#***Power***
library(pwr)
pwr.t.test(n=NULL,d=-1.5,sig.level=0.05,power=.9,type="two.sample",alternative="two.sided")
### Question 2.27
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
# Checking normality
cfgas<-data.frame("125flrate"=c(2.7,4.6,2.6,3.0,3.2,3.8), "200flrate"=c(4.6,3.4,2.9,3.5,4.1,5.1))
qqnorm(cfgas$X125flrate,main="Normal plot for 125 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X125flrate,probs = c(0.25, 0.75), qtype = 7)
qqnorm(cfgas$X200flrate,main="Normal plot for 200 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X200flrate,probs = c(0.25, 0.75), qtype = 7)
#***T-Statistic using a Non-parametric Method***
#To test the hypothesis, we use the Mann-Whitney U test
wilcox.test(cfgas$X125flrate, cfgas$X200flrate)