library(readr)
semiconductorFrame_1_ <- read_csv("D:/Education/Statistics/hw5/semiconductorFrame(1).csv")
New names:
* `` -> ...1
Rows: 20 Columns: 3
-- Column specification ----------------------------------------------------------------------------------------
Delimiter: ","
chr (1): solution
dbl (2): ...1, etchRate
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(semiconductorFrame_1_)
deflectionFrame <- read_csv("D:/Education/Statistics/hw5/deflectionFrame.csv")
New names:
* `` -> ...1
Rows: 30 Columns: 3
-- Column specification ----------------------------------------------------------------------------------------
Delimiter: ","
dbl (3): ...1, temp, type
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(deflectionFrame)
# Unlabeled sample (exercise not identified): standard deviation and mean of sOne
sOne = c(91.5, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21)
(var(sOne))^(1/2)
[1] 2.385019
mean(sOne)
[1] 92.255
#5-22 (a)
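# Q-Q plots per pipe type, taken from deflectionFrame (DeflectionTemp1/2 are only defined further down)
qqnorm(deflectionFrame$temp[deflectionFrame$type == 1])   # type 1 pipe
qqline(deflectionFrame$temp[deflectionFrame$type == 1])
qqnorm(deflectionFrame$temp[deflectionFrame$type == 2])   # type 2 pipe
qqline(deflectionFrame$temp[deflectionFrame$type == 2])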
library(ggplot2)
ggplot(deflectionFrame) + geom_boxplot(aes(x=as.factor(type), y=temp))
–From the normal Q-Q plots, neither sample deviates too far from its line, so it is reasonable to treat each sample as approximately normal. The box plots show that the medians are not equal. The plots give only a rough sense of spread, so I computed the variances directly in R below, along with the means and standard deviations.
DeflectionTemp1 = c(206, 193, 192, 188, 207, 210, 205, 185, 194, 187, 189, 178, 194, 213, 205)
#mean(DeflectionTemp1)
#(var(DeflectionTemp1))^(1/2)
print(c("Mean 1", mean(DeflectionTemp1)))
[1] "Mean 1" "196.4"
print(c("Standard Deviation 1", (var(DeflectionTemp1))^(1/2)))
[1] "Standard Deviation 1" "10.4799127586336"
print(c("variance 1", (var(DeflectionTemp1))))
[1] "variance 1" "109.828571428571"
DeflectionTemp2 = c(177, 176, 198, 197, 185, 188, 206, 200, 189, 201, 197, 203, 180, 192, 192)
#mean(DeflectionTemp2)
#(var(DeflectionTemp2))^(1/2)
print(c("Mean 2", mean(DeflectionTemp2)))
[1] "Mean 2" "192.066666666667"
print(c("Standard Deviation 2", (var(DeflectionTemp2))^(1/2)))
[1] "Standard Deviation 2" "9.43751379689941"
print(c("variance 2", (var(DeflectionTemp2))))
[1] "variance 2" "89.0666666666667"
t.test(temp~type, deflectionFrame, alternative = "g")
Welch Two Sample t-test
data: temp by type
t = 1.19, df = 27.698, p-value = 0.1221
alternative hypothesis: true difference in means between group 1 and group 2 is greater than 0
95 percent confidence interval:
-1.863448 Inf
sample estimates:
mean in group 1 mean in group 2
196.4000 192.0667
– Using the t-test, the p-value (0.1221) is larger than our assumed alpha of 0.05, so we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the deflection temperature of type 1 pipe exceeds that of type 2.
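To see where the Welch output above comes from, the statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value can be computed by hand. This is a minimal sketch that re-enters the 5-22 data; the variable names are illustrative, not from the original.

```r
DeflectionTemp1 <- c(206, 193, 192, 188, 207, 210, 205, 185, 194, 187, 189, 178, 194, 213, 205)
DeflectionTemp2 <- c(177, 176, 198, 197, 185, 188, 206, 200, 189, 201, 197, 203, 180, 192, 192)

n1 <- length(DeflectionTemp1)
n2 <- length(DeflectionTemp2)
v1 <- var(DeflectionTemp1) / n1   # squared standard error of mean 1
v2 <- var(DeflectionTemp2) / n2   # squared standard error of mean 2

# Welch statistic for H0: mu1 - mu2 = 0
tStat <- (mean(DeflectionTemp1) - mean(DeflectionTemp2)) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
dfWelch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
# One-sided p-value for H1: mu1 - mu2 > 0 (upper tail)
pUpper <- pt(tStat, dfWelch, lower.tail = FALSE)
c(t = tStat, df = dfWelch, p = pUpper)
```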
t.test(DeflectionTemp1, DeflectionTemp2, mu = mean(DeflectionTemp1) - mean(DeflectionTemp2), alternative = "t", conf.level = 0.95)
Welch Two Sample t-test
data: DeflectionTemp1 and DeflectionTemp2
t = 0, df = 27.698, p-value = 1
alternative hypothesis: true difference in means is not equal to 4.333333
95 percent confidence interval:
-3.129368 11.796035
sample estimates:
mean of x mean of y
196.4000 192.0667
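– This call tests H0: mu1 - mu2 = 4.333 (the observed difference) against a two-sided alternative, which is not the question asked. Because mu was set to mean(DeflectionTemp1) - mean(DeflectionTemp2), the numerator of the statistic, (mean 1 - mean 2) - mu, is exactly 0, so t = 0 and the two-sided p-value is 1 by construction. The test that answers this problem is the first one, with mu = 0 and alternative = "greater". The 95% confidence interval (-3.13, 11.80) does not depend on mu, and it contains 0.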
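The effect of mu can be checked directly: with mu equal to the observed difference the statistic is exactly 0, while the confidence interval is the same as with mu = 0. A short sketch using the 5-22 data; fit0, fitObs and obsDiff are illustrative names.

```r
x <- c(206, 193, 192, 188, 207, 210, 205, 185, 194, 187, 189, 178, 194, 213, 205)
y <- c(177, 176, 198, 197, 185, 188, 206, 200, 189, 201, 197, 203, 180, 192, 192)
obsDiff <- mean(x) - mean(y)

# mu = 0: the usual null hypothesis of no difference (two-sided here for comparison)
fit0 <- t.test(x, y, mu = 0)
# mu = observed difference: the numerator mean(x) - mean(y) - mu is exactly 0
fitObs <- t.test(x, y, mu = obsDiff)

fitObs$statistic   # 0
fitObs$p.value     # 1
# The confidence interval is built around mean(x) - mean(y), so mu does not change it
rbind(fit0$conf.int, fitObs$conf.int)
```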
shapiro.test(DeflectionTemp1)
Shapiro-Wilk normality test
data: DeflectionTemp1
W = 0.93989, p-value = 0.381
shapiro.test(DeflectionTemp2)
Shapiro-Wilk normality test
data: DeflectionTemp2
W = 0.94774, p-value = 0.4895
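– Something extra (above): both Shapiro-Wilk p-values (0.381 and 0.4895) are well above 0.05, so neither test rejects normality, which agrees with the Q-Q plots.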
power.t.test(delta = 5, power = 0.9, sig.level = 0.05, sd = sd(deflectionFrame$temp), type = "t", alternative = "o")
Two-sample t test power calculation
n = 69.79702
delta = 5
sd = 10.04364
sig.level = 0.05
power = 0.9
alternative = one.sided
NOTE: n is number in *each* group
The choice of n1 = n2 = 15 is not adequate. To detect a true difference of 5 with probability 0.90 at alpha = 0.05 (one-sided), power.t.test gives n ≈ 69.8, so about 70 specimens are needed in each group. With only 15 per group, the chance of detecting a difference of 5 is well below 0.90.
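The same function can report the power actually achieved with 15 specimens per group. This sketch assumes deflectionFrame$temp holds the same 30 values as DeflectionTemp1 and DeflectionTemp2 (their combined sd matches the 10.04364 printed above); power15 and nReq are illustrative names. Note that this sd includes the between-group difference; a pooled within-group estimate (about 9.97) would be slightly smaller and would lower the required n a little.

```r
DeflectionTemp1 <- c(206, 193, 192, 188, 207, 210, 205, 185, 194, 187, 189, 178, 194, 213, 205)
DeflectionTemp2 <- c(177, 176, 198, 197, 185, 188, 206, 200, 189, 201, 197, 203, 180, 192, 192)
# Same sd as the original call: sd of all 30 temperatures
sdAll <- sd(c(DeflectionTemp1, DeflectionTemp2))

# Power achieved with the planned n1 = n2 = 15
power15 <- power.t.test(n = 15, delta = 5, sd = sdAll, sig.level = 0.05,
                        type = "two.sample", alternative = "one.sided")
power15$power

# Sample size per group for power 0.9, rounded up to a whole specimen
nReq <- power.t.test(delta = 5, power = 0.9, sd = sdAll, sig.level = 0.05,
                     type = "two.sample", alternative = "one.sided")$n
ceiling(nReq)
```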
#5-23
sol1 = c(9.9, 9.4, 9.3, 9.6, 10.2, 10.6, 10.3, 10.0, 10.3, 10.1)
sol2 = c(10.2, 10.6, 10.7, 10.4, 10.5, 10.0, 10.2, 10.7, 10.4, 10.3)
etchRates = c(sol1, sol2)
rep("sol1", 10)
rep("sol2", 10)
soltype = c(rep("sol1", 10), rep("sol2",10 ))
etchFrame = data.frame(etchRates, soltype)
etchFrame
mean(sol1)
[1] 9.97
mean(sol2)
[1] 10.4
– The sample means differ (9.97 for solution 1 vs 10.40 for solution 2); the t-test below checks whether this difference is statistically significant.
t.test(etchRates~soltype, etchFrame)
Welch Two Sample t-test
data: etchRates by soltype
t = -2.8278, df = 13.952, p-value = 0.01346
alternative hypothesis: true difference in means between group sol1 and group sol2 is not equal to 0
95 percent confidence interval:
-0.7562424 -0.1037576
sample estimates:
mean in group sol1 mean in group sol2
9.97 10.40
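– I also tried t.test(etchRates, soltype, mu = (mean(sol1) - mean(sol2)), alternative = "t", conf.level = 0.95), thinking it fit this situation, but it does not work as intended: the two-sample form t.test(x, y) expects two numeric samples, while soltype is a character vector of group labels. The samples have to be passed separately, as in t.test(sol1, sol2). Also, mu is the hypothesized difference under the null (0 here); setting it to the observed difference forces t = 0 and a p-value of 1, as in 5-22.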
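– The p-value of 0.01346 is less than 0.05, so we reject the null hypothesis of equal mean etch rates and conclude that the two solutions give different mean etch rates. The 95% confidence interval for the difference (-0.756, -0.104) excludes 0, which agrees with this.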
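The call the note above was aiming for passes the two solutions as separate numeric vectors. This sketch rebuilds the 5-23 data and checks that this form gives the same result as the formula interface; fitFormula and fitVectors are illustrative names.

```r
sol1 <- c(9.9, 9.4, 9.3, 9.6, 10.2, 10.6, 10.3, 10.0, 10.3, 10.1)
sol2 <- c(10.2, 10.6, 10.7, 10.4, 10.5, 10.0, 10.2, 10.7, 10.4, 10.3)
etchFrame <- data.frame(etchRates = c(sol1, sol2),
                        soltype = rep(c("sol1", "sol2"), each = 10))

# Formula interface: response ~ group, as used above
fitFormula <- t.test(etchRates ~ soltype, data = etchFrame)
# Default interface: two numeric samples; mu = 0 is the hypothesized difference
fitVectors <- t.test(sol1, sol2, mu = 0, alternative = "two.sided", conf.level = 0.95)

c(formula = fitFormula$p.value, vectors = fitVectors$p.value)
```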
library(ggplot2)
ggplot(etchFrame)+stat_qq(aes(sample = etchRates, col = soltype))+stat_qq_line(aes(sample = etchRates, col = soltype))
#5-33
dose20 = c(24, 28, 37, 30)
dose30 = c(37, 44, 31, 35)
bioActive = c(dose20, dose30)
dose = c(rep("dose20", 4), rep("dose30", 4))
DoseFrame = data.frame(bioActive, dose)
qqnorm(dose20)
qqline(dose20)
qqnorm(dose30)
qqline(dose30)
ggplot(DoseFrame) + geom_boxplot(aes(x=as.factor(dose), y=bioActive))
t.test(dose20, dose30, alternative = "g")
Welch Two Sample t-test
data: dose20 and dose30
t = -1.8201, df = 6, p-value = 0.9407
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-14.47346 Inf
sample estimates:
mean of x mean of y
29.75 36.75
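With alternative = "g" (greater), the alternative hypothesis is that mean bioactivity at 20 mg exceeds that at 30 mg. The p-value of 0.9407 is far above 0.05, so we fail to reject the null hypothesis; in fact the sample mean at 30 mg (36.75) is higher than at 20 mg (29.75). This test gives no evidence of a difference in the direction tested.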
I do have some concern about the normality of each sample: in the top right corner of each Q-Q plot, one value deviates far from the line compared with the other values. With only four observations per dose, normality is hard to judge from these plots.
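If the question of interest is whether the higher dose increases bioactivity (an assumption here, since the exercise text is not reproduced), the alternative should point the other way. For the same t statistic and degrees of freedom, the two one-sided p-values add to 1, so this one is about 1 - 0.9407 ≈ 0.059, still above 0.05. A sketch; fitLess is an illustrative name.

```r
dose20 <- c(24, 28, 37, 30)
dose30 <- c(37, 44, 31, 35)

# H1: mean(dose20) - mean(dose30) < 0, i.e. the 30 mg dose gives higher bioactivity
fitLess <- t.test(dose20, dose30, alternative = "less")
fitLess$p.value
```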
#5-43
Car=c(1:7)
Finite_elements = c(14.58,48.52, 97.22,113.99,174.73,212.72,277.38)
Equivalent_plate=c(14.76,49.1,99.99,117.53,181.22,220.14,294.80)
JournalofAircraft = data.frame(Car, Finite_elements, Equivalent_plate)
qqnorm(Finite_elements)
qqline(Finite_elements)
qqnorm(Equivalent_plate)
qqline(Equivalent_plate)
#5-43 (reading the data from the Excel file)
library(readxl)
JournAir <- read_excel("D:/Education/Statistics/hw5/JournAir.xlsx")
View(JournAir)
qqnorm(JournAir$finiteElement)
qqline(JournAir$finiteElement)
qqnorm(JournAir$equiPLate)
qqline(JournAir$equiPLate)
t.test(JournAir$finiteElement, JournAir$equiPLate, paired = T)
Paired t-test
data: JournAir$finiteElement and JournAir$equiPLate
t = -2.4501, df = 6, p-value = 0.04979
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.964397921 -0.007030651
sample estimates:
mean of the differences
-5.485714
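(a) The p-value (0.04979) is just below alpha = 0.05 (one minus the 0.95 confidence level), so we reject the null hypothesis that the two methods give the same mean value. The mean of the differences is -5.49, and the 95% confidence interval (-10.96, -0.007) excludes 0, so we conclude that the means are not the same. Because the p-value is so close to 0.05, this evidence is borderline.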
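A paired t-test is a one-sample t-test on the differences. This sketch uses the manually typed vectors from the first 5-43 block and assumes the JournAir columns hold the same values (the mean difference of -5.485714 matches); d, fitPaired and fitOneSample are illustrative names.

```r
Finite_elements  <- c(14.58, 48.52, 97.22, 113.99, 174.73, 212.72, 277.38)
Equivalent_plate <- c(14.76, 49.1, 99.99, 117.53, 181.22, 220.14, 294.80)

d <- Finite_elements - Equivalent_plate   # per-car differences
mean(d)

fitPaired    <- t.test(Finite_elements, Equivalent_plate, paired = TRUE)
fitOneSample <- t.test(d, mu = 0)          # same statistic, df and p-value
c(paired = unname(fitPaired$statistic), oneSample = unname(fitOneSample$statistic))
```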
#5-45
library(readxl)
ImpSteel <- read_excel("D:/Education/Statistics/hw5/ImpSteel.xlsx")
View(ImpSteel)
qqnorm(ImpSteel$T1)
qqline(ImpSteel$T1)
qqnorm(ImpSteel$T2)
qqline(ImpSteel$T2)
t.test(ImpSteel$T1, ImpSteel$T2, paired = T, conf.level = 0.99)
Paired t-test
data: ImpSteel$T1 and ImpSteel$T2
t = -3.4805, df = 7, p-value = 0.01026
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
-0.426159965 0.001159965
sample estimates:
mean of the differences
-0.2125
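–The p-value (0.01026) is greater than the significance level alpha = 0.01, so we fail to reject the null hypothesis that the mean difference is zero. Consistent with this, the 99% confidence interval for the mean difference (-0.426, 0.0012) contains 0. The line "true difference in means is not equal to 0" in the output only states the alternative hypothesis; it is not a result. So at alpha = 0.01 there is not sufficient evidence that the two tests give different mean impurity levels, and the data are consistent with both tests giving the same mean impurity level (failing to reject does not prove the means are equal). Since the p-value is below 0.05, the conclusion would change at alpha = 0.05.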
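The agreement between "p > 0.01" and "0 inside the 99% interval" can be checked from the reported summary numbers alone, since the standard error is the mean difference divided by t. A sketch that uses only the printed output (mean -0.2125, t = -3.4805, df = 7), not the raw ImpSteel data; the variable names are illustrative.

```r
meanDiff <- -0.2125   # mean of the differences (reported above)
tStat    <- -3.4805   # paired t statistic (reported above)
n        <- 8         # df = 7, so 8 specimens
alpha    <- 0.01

se     <- meanDiff / tStat                    # standard error of the mean difference
tCrit  <- qt(1 - alpha / 2, df = n - 1)       # two-sided critical value
ci99   <- meanDiff + c(-1, 1) * tCrit * se    # 99% confidence interval
pValue <- 2 * pt(-abs(tStat), df = n - 1)     # two-sided p-value

ci99
pValue
ci99[1] < 0 && ci99[2] > 0   # TRUE: 0 lies inside the interval, matching p > alpha
```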