a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: there is no difference in mean diameter between the two calipers
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: there is a difference in mean diameter between the two calipers
where:
\(\sigma^2_1\) = 0.0000014772, \(\sigma^2_2\) = 0.00000309, \(n_1\) = 12, \(n_2\) = 12
\(\mu_1\) is the mean of caliper 1
\(\mu_2\) is the mean of caliper 2
\(\sigma^2_1\) is variance of caliper 1
\(\sigma^2_2\) is variance of caliper 2
Note that the sample sizes are not large, so we cannot assume normality of the data without some data visualization.
dball<-data.frame("caliper1"=c(0.265,0.265,0.266,0.267,0.267,0.265,0.267,0.267,0.265,0.268,0.268,0.265),"caliper2"=c(0.264,0.265,0.264,0.266,0.267,0.268,0.264,0.265,0.265,0.267,0.268,0.269))
dm1<-mean(dball$caliper1)
dm2<-mean(dball$caliper2)
dv1<-var(dball$caliper1)
dv2<-var(dball$caliper2)
c(dm1,dm2,dv1,dv2)
## [1] 2.662500e-01 2.660000e-01 1.477273e-06 3.090909e-06
Testing Hypothesis
Before testing the hypothesis, we need to check the assumption that the data are normally distributed. First we examine a boxplot to see how spread out the samples are, which gives a rough idea of normality.
#boxplot to compare variance
boxplot(dball$caliper1, dball$caliper2, names = c("caliper1", "caliper2"), main="Comparing Boxplot for caliper1 and caliper2")
From the boxplot, caliper 2 is more spread out than caliper 1, and neither sample shows obvious skewness.
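As a supplementary check (a sketch, not part of the original analysis), a Shapiro-Wilk test could be run on the paired differences, since a paired t-test relies on normality of the differences rather than of each sample separately:
# Supplementary check: Shapiro-Wilk test on the paired differences
ddiff <- dball$caliper1 - dball$caliper2
shapiro.test(ddiff)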
T-Statistic of original Data for paired t-test
t.test(dball$caliper1,dball$caliper2,paired = TRUE)
##
## Paired t-test
##
## data: dball$caliper1 and dball$caliper2
## t = 0.43179, df = 11, p-value = 0.6742
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.001024344 0.001524344
## sample estimates:
## mean of the differences
## 0.00025
Conclusion
In conclusion, we fail to reject the null hypothesis Ho at \(t_0\) = 0.43179, since the p-value of 0.6742 is greater than \(\alpha\) = 0.05.
The 95% confidence interval for the mean difference is (-0.001024344, 0.001524344), which contains zero and is consistent with failing to reject Ho.
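As a cross-check (a sketch, not part of the original analysis), the paired t statistic and p-value can be recomputed directly from the differences and should reproduce the t.test() output:
# Manual paired t statistic and two-sided p-value
d  <- dball$caliper1 - dball$caliper2
n  <- length(d)
t0 <- mean(d) / (sd(d) / sqrt(n))    # should match t = 0.43179
p  <- 2 * pt(-abs(t0), df = n - 1)   # should match p-value = 0.6742
c(t0, p)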
a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis that there is no difference in mean
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis that there is a difference in the mean
where:
\(\sigma^2_1\) = 0.0213, \(\sigma^2_2\) = 0.0024, \(n_1\) = 9, \(n_2\) = 9
\(\mu_1\) is the mean of the Karlsruhe method
\(\mu_2\) is the mean of the Lehigh method
\(\sigma^2_1\) is variance of the Karlsruhe method
\(\sigma^2_2\) is variance of the Lehigh method
Note that the sample sizes are not large, so we cannot rely on the central limit theorem; we also want to check normality of the data.
girdata<-data.frame("Girder"=c("S1/1","S2/1","S3/1","S4/1","S5/1","S2/1","S2/2","S2/3","S2/4"),"karlsruhe"=c(1.186,1.151,1.322,1.339,1.200,1.402,1.365,1.537,1.559),"lehigh"=c(1.061,0.992,1.063,1.062,1.065,1.178,1.037,1.086,1.052))
#head(girdata)
gm1<-mean(girdata$karlsruhe)
gm2<-mean(girdata$lehigh)
gv1<-var(girdata$karlsruhe)
gv2<-var(girdata$lehigh)
c(gm1,gm2,gv1,gv2)
## [1] 1.340111111 1.066222222 0.021325111 0.002438444
Testing Hypothesis
Before testing the hypothesis, we need to check the assumption that the data are normally distributed. First we examine a boxplot to see how spread out the samples are, and we can also run normal probability plots.
cor(girdata$karlsruhe, girdata$lehigh)
## [1] 0.3821669
The correlation shows that the measurements from the two methods are positively correlated, which supports analyzing them with a paired t-test.
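As an aside (a sketch, not part of the original write-up), the benefit of pairing follows from the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y): positive correlation shrinks the variance of the differences.
# Variance of the paired differences ...
var(girdata$karlsruhe - girdata$lehigh)
# ... equals the sum of the variances minus twice the covariance
var(girdata$karlsruhe) + var(girdata$lehigh) - 2 * cov(girdata$karlsruhe, girdata$lehigh)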
boxplot(girdata$karlsruhe, girdata$lehigh, names = c("karlsruhe", "lehigh"), main="Comparing Boxplot for both methods")
From the boxplot, the Karlsruhe data appear more spread out than the Lehigh data, and there is some evidence of skewness in the Lehigh sample at the upper and lower tails of the boxplot.
T-Statistic for Paired T-test
t.test(girdata$karlsruhe, girdata$lehigh,paired=TRUE)
##
## Paired t-test
##
## data: girdata$karlsruhe and girdata$lehigh
## t = 6.0819, df = 8, p-value = 0.0002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1700423 0.3777355
## sample estimates:
## mean of the differences
## 0.2738889
Conclusion
In conclusion, we reject the null hypothesis Ho at \(t_0\) = 6.0819, since the p-value of 0.0002953 is lower than \(\alpha\) = 0.05.
The 95% confidence interval for the mean difference is (0.1700423, 0.3777355), and the mean of the differences is 0.2738889.
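As a cross-check (a sketch, not part of the original analysis), the confidence interval can be recomputed from the paired differences:
# Manual 95% CI for the mean of the paired differences
d <- girdata$karlsruhe - girdata$lehigh
n <- length(d)
mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)   # should match (0.1700423, 0.3777355)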
qqnorm(girdata$lehigh, main = "Normal Distribution of Lehigh Method", xlab = "x", ylab = "ratio points")
qqline(girdata$lehigh)
The Lehigh data appear right-skewed, with outliers at both extremes, while the points between the 25th and 75th quantiles cluster close to the mean.
qqnorm(girdata$karlsruhe, main = "Normal Distribution of karlsruhe Method", xlab = "x", ylab = "ratio points")
qqline(girdata$karlsruhe)
From the plot, the Karlsruhe data appear approximately normal, with points lying closer to the line than in the Lehigh plot.
Conclusion
In conclusion, certain assumptions must hold before we perform a paired t-test. We must assume normality of the data, and this can be checked by visualizing the data with methods such as a normal probability plot or a boxplot. This is usually done when the sample size is small (below about 30). For larger samples (above about 30), the central limit theorem lets us proceed without this check.
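As a supplementary sketch (not in the original write-up), the normality check most directly relevant to the paired t-test is on the differences themselves:
# Normal probability plot of the paired differences (Karlsruhe - Lehigh)
d <- girdata$karlsruhe - girdata$lehigh
qqnorm(d, main = "Normal plot of paired differences")
qqline(d)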
e)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: the means are equal
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: the low-temperature mean differs from the high-temperature mean
Note that the sample sizes are not large, and we cannot be certain whether the population variances are equal.
Temp1<-data.frame("kA95C"=c(11.176,7.089,8.097,11.739,11.291,10.759,6.467,8.315))
Temp2<-data.frame("kA100C"=c(5.263,6.748,7.461,7.015,8.133,7.418,3.772,8.963))
mean(Temp1$kA95C)
## [1] 9.366625
mean(Temp2$kA100C)
## [1] 6.846625
var(Temp1$kA95C)
## [1] 4.40817
var(Temp2$kA100C)
## [1] 2.690999
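As an optional check (a sketch, not part of the original write-up), an F test can compare the two sample variances before choosing between a pooled and a Welch two-sample t-test:
# F test for equality of variances between the two temperature groups
var.test(Temp1$kA95C, Temp2$kA100C)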
Testing normality assumption
qqnorm(Temp1$kA95C,main="Normal plot for low temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp1$kA95C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
From the plot, the low-temperature data fall roughly on a straight line, with the exception of one extreme value; otherwise the low-temperature sample appears approximately normal.
qqnorm(Temp2$kA100C,main="Normal plot for high temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp2$kA100C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
From the plot, the high-temperature data fall roughly on a straight line, with a few possible outliers; otherwise the high-temperature sample appears approximately normal.
Also, it is important to note that with small sample sizes a normal probability plot is harder to judge, since a few points can strongly affect the apparent shape of the distribution.
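For completeness (a sketch, not shown in this section), the stated hypothesis could then be tested with a Welch two-sample t-test, which does not assume equal variances:
# Welch two-sample t-test of the low-temperature vs high-temperature means
t.test(Temp1$kA95C, Temp2$kA100C, var.equal = FALSE)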
f) Power
library(pwr)
pwr.t.test(n=NULL,d=-1.5,sig.level=0.05,power=.9,type="two.sample",alternative="two.sided")
##
## Two-sample t test power calculation
##
## n = 10.40147
## d = 1.5
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
From the result, we would need about 11 observations per group (n = 10.4, rounded up) to detect a mean difference of 1.5 kA with 90% power to reject the null hypothesis Ho at \(\alpha\) = 0.05 using a two-sided two-sample t-test.
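As an illustration (a sketch, not in the original write-up; the range of group sizes is arbitrary), the same pwr function can show how power grows with the per-group sample size at this effect size:
# Power of the two-sided two-sample t-test at d = 1.5 for several group sizes
sapply(5:12, function(n)
  pwr.t.test(n = n, d = 1.5, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$power)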
a)
Hypothesis to be tested
Ho: \(\mu_1 = \mu_2\) - Null Hypothesis: the flow-rate means are equal
Ha: \(\mu_1 \neq \mu_2\) - Alternative Hypothesis: the flow-rate means are not equal
Note that the sample sizes are not large, and we cannot be certain whether the population variances are equal. Since the data are continuous, we also check normality with a normal probability plot.
cfgas<-data.frame("125flrate"=c(2.7,4.6,2.6,3.0,3.2,3.8), "200flrate"=c(4.6,3.4,2.9,3.5,4.1,5.1))
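# Note: R's default check.names = TRUE prefixes these numeric column names with "X",
# so the columns are referenced below as cfgas$X125flrate and cfgas$X200flrate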
qqnorm(cfgas$X125flrate,main="Normal plot for 125 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X125flrate,probs = c(0.25, 0.75), qtype = 7)
From the plot, the data sample for the 125 flow rate appears to fall roughly on a straight line, with a few possible outliers at the tails of the distribution.
qqnorm(cfgas$X200flrate,main="Normal plot for 200 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X200flrate,probs = c(0.25, 0.75), qtype = 7)
From the plot, the data sample for the 200 flow rate appears to fall on a straight line, which suggests it is approximately normally distributed.
T-Statistic using a Non-parametric Method
To test the hypothesis, we use the Mann-Whitney U test (Wilcoxon rank-sum test).
wilcox.test(cfgas$X125flrate, cfgas$X200flrate)
## Warning in wilcox.test.default(cfgas$X125flrate, cfgas$X200flrate): cannot
## compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: cfgas$X125flrate and cfgas$X200flrate
## W = 9.5, p-value = 0.1994
## alternative hypothesis: true location shift is not equal to 0
Conclusion
In conclusion, we fail to reject the null hypothesis Ho (W = 9.5), since the p-value of 0.1994 is greater than \(\alpha\) = 0.05.
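For comparison only (a sketch, not part of the original analysis), the parametric two-sample t-test could be run on the same data:
# Welch two-sample t-test on the flow-rate data, for comparison with the rank-sum test
t.test(cfgas$X125flrate, cfgas$X200flrate, var.equal = FALSE)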
### Question 2.32
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
dball<-data.frame("caliper1"=c(0.265,0.265,0.266,0.267,0.267,0.265,0.267,0.267,0.265,0.268,0.268,0.265),"caliper2"=c(0.264,0.265,0.264,0.266,0.267,0.268,0.264,0.265,0.265,0.267,0.268,0.269))
dm1<-mean(dball$caliper1)
dm2<-mean(dball$caliper2)
dv1<-var(dball$caliper1)
dv2<-var(dball$caliper2)
c(dm1,dm2,dv1,dv2)
#***Testing Hypothesis***
#boxplot to compare variance
boxplot(dball$caliper1, dball$caliper2, names = c("caliper1", "caliper2"), main="Comparing Boxplot for caliper1 and caliper2")
#***T-Statistic of original Data for paired t-test***
t.test(dball$caliper1,dball$caliper2,paired = TRUE)
### Question 2.34
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
girdata<-data.frame("Girder"=c("S1/1","S2/1","S3/1","S4/1","S5/1","S2/1","S2/2","S2/3","S2/4"),"karlsruhe"=c(1.186,1.151,1.322,1.339,1.200,1.402,1.365,1.537,1.559),"lehigh"=c(1.061,0.992,1.063,1.062,1.065,1.178,1.037,1.086,1.052))
#head(girdata)
gm1<-mean(girdata$karlsruhe)
gm2<-mean(girdata$lehigh)
gv1<-var(girdata$karlsruhe)
gv2<-var(girdata$lehigh)
c(gm1,gm2,gv1,gv2)
#***Testing Hypothesis***
cor(girdata$karlsruhe, girdata$lehigh)
boxplot(girdata$karlsruhe, girdata$lehigh, names = c("karlsruhe", "lehigh"), main="Comparing Boxplot for both methods")
#***T-Statistic for Paired T-test***
t.test(girdata$karlsruhe, girdata$lehigh,paired=TRUE)
qqnorm(girdata$lehigh, main = "Normal Distribution of Lehigh Method", xlab = "x", ylab = "ratio points")
qqline(girdata$lehigh)
qqnorm(girdata$karlsruhe, main = "Normal Distribution of karlsruhe Method", xlab = "x", ylab = "ratio points")
qqline(girdata$karlsruhe)
### Question 2.29
#**e)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
Temp1<-data.frame("kA95C"=c(11.176,7.089,8.097,11.739,11.291,10.759,6.467,8.315))
Temp2<-data.frame("kA100C"=c(5.263,6.748,7.461,7.015,8.133,7.418,3.772,8.963))
mean(Temp1$kA95C)
mean(Temp2$kA100C)
var(Temp1$kA95C)
var(Temp2$kA100C)
#***Testing normality assumption***
qqnorm(Temp1$kA95C,main="Normal plot for low temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp1$kA95C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
qqnorm(Temp2$kA100C,main="Normal plot for high temperature", xlab = 'temp', ylab = 'kA')
qqline(Temp2$kA100C, datax = FALSE, distribution = qnorm,
probs = c(0.25, 0.75), qtype = 7)
#**f)**
#***Power***
library(pwr)
pwr.t.test(n=NULL,d=-1.5,sig.level=0.05,power=.9,type="two.sample",alternative="two.sided")
### Question 2.27
#**a)**
#***Hypothesis to be tested***
#**H~o~**: $\mu_1 = \mu_2$
#**H~a~**: $\mu_1 \neq \mu_2$
# Checking normality
cfgas<-data.frame("125flrate"=c(2.7,4.6,2.6,3.0,3.2,3.8), "200flrate"=c(4.6,3.4,2.9,3.5,4.1,5.1))
qqnorm(cfgas$X125flrate,main="Normal plot for 125 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X125flrate,probs = c(0.25, 0.75), qtype = 7)
qqnorm(cfgas$X200flrate,main="Normal plot for 200 flow rate", xlab = 'x', ylab = "cf*")
qqline(cfgas$X200flrate,probs = c(0.25, 0.75), qtype = 7)
#***T-Statistic using a Non-parametric Method***
#To test the hypothesis, we use the Mann-Whitney U test
wilcox.test(cfgas$X125flrate, cfgas$X200flrate)