Consider our model :
\[ y_{ij} = \mu_i + \epsilon_{ij} \\ \text{For :} \\ \text{i = Group} \\ \text{j = Observation} \]
Here’s what we know :
\[ \Delta_{\bar{x}} = 2.35; \\ \text{df} = 18; \\ t_{o}=2.01; \\ P_{\text{value}} = 2.98\% \]
Recall :
\[ t_o = \frac{\Delta_{\bar{x}}}{SE(\Delta_{\bar{x}})} \\ \therefore \\ SE(\Delta_{\bar{x}}) = \frac{\Delta_{\bar{x}}}{t_o} \\ \approx 1.169 \]
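As a quick check of that back-calculation in R :
delta_xbar <- 2.35   # observed difference in sample means
t_o <- 2.01          # reported t statistic
delta_xbar / t_o     # SE of the difference, ~1.169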
Consider the following R-Snippet:
pt(2.01, df = 18, lower.tail = FALSE)
## [1] 0.02983103
pt(-2.01, df = 18, lower.tail = TRUE)
## [1] 0.02983103
In other words, \(P(T > t_o) \approx 3\%\) and \(P(T < -t_o) \approx 3\%\); therefore, \(P(|T| > t_o) \approx 6\%\).
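Doubling the one-tailed probability gives the corresponding two-sided P-value directly :
2 * pt(2.01, df = 18, lower.tail = FALSE)   # ~0.0597, i.e. about 6%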
Since the reported P-value of 2.98% matches the one-tailed probability rather than the two-tailed one, this t-test must be one-sided; so we are testing either :
\[ H_a : \mu_1 > \mu_2 \text{ or, } H_a : \mu_1 < \mu_2 \]
Reference ( pt() function explained )
\[ P_{\text{Value}} < \alpha \implies \text{Reject Null Hyp.} \]
Recall the general form :
\[ \text{Pt. Est.} \pm (\text{Crit-Val})(\text{SE}) \]
So,
\[ \Delta_{\bar{x}} \pm t_{\frac{\alpha}{2},df}SE(\Delta_{\bar{x}}) \\ \text{For: } \alpha = 10\% ; \text{df} = 18 \]
alpha <- .1
conf_coef <- 1-alpha/2
qt(conf_coef, df = 18)
## [1] 1.734064
Therefore the \(90\%-CI\) is :
\[ 2.35 \pm 1.734064*1.169 \\ \text{or, }\\ [0.322, 4.377] \]
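The same interval can be computed in R (a short sketch, reusing the back-calculated SE from above) :
se <- 2.35 / 2.01                          # SE of the difference, ~1.169
2.35 + c(-1, 1) * qt(0.95, df = 18) * se   # 90% CI, roughly (0.32, 4.38)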
According to the results at \(\alpha = 10\%\), we are 90% confident, based upon our data, that the true difference in means lies between about 0.3 and 4.4.
Reference ( Lab 7: pt() versus qt() in R )
\[ P_{\text{Value}} < \alpha \implies \text{Reject H}_o \]
In other words, based upon our random sample, there appears to be statistically significant evidence to conclude that the two population means are not the same.
Well in the computer-software output it states : “T-Test of Difference 0 (vs not =) : T-Value = -3.47”, indicating that this is a two-sided test.
\[ H_o : \mu_1 - \mu_2 = 2 \\ H_a : \mu_1 - \mu_2 \ne 2 \]
At \(\alpha = 5\%\), do we reject \(H_o\)?
\[ t_o = \frac{\Delta_{\bar{x}} - \Delta_{\mu}}{SE(\Delta_{\bar{x}})} \\ = \frac{\Delta_{\bar{x}} - 2}{SE(\Delta_{\bar{x}})} \\ \approx \frac{-2.33 - 2}{SE(\Delta_{\bar{x}})} \]
Consider that the calculation of \(SE(\Delta_{\bar{x}})\) differs depending on whether or not the variances are equal ( \(\sigma_1=\sigma_2\) ). In our computer output it states, “Both use Pooled Std. Dev”, therefore indicating we should use the calculation that assumes equal variances.
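For reference, the pooled standard deviation combines the two sample variances, weighted by their degrees of freedom :
\[ S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \]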
\[ SE(\Delta_{\bar{x}}) = S_p * \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \\ \approx 2.1277 * \sqrt{\frac{2}{20}} \\ \approx 0.6728 \]
Therefore,
\[ t_o \approx -6.440 \]
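As a quick R check of this statistic (a sketch, taking \(n_1 = n_2 = 20\), which is consistent with the \(\sqrt{2/20}\) term above and df = 38 used below) :
sp <- 2.1277                  # pooled standard deviation
se <- sp * sqrt(1/20 + 1/20)  # SE of the difference, ~0.673
(-2.33 - 2) / se              # t statistic, roughly -6.44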
With such a large \(|t_o|\), it’s clear that we reject \(H_o\), indicating that it is extremely unlikely that the true difference is 2, that is, that \(\mu_1\) is 2 larger than \(\mu_2\). We should have suspected this result, since the observed difference of -2.33 sits about two pooled standard deviations below the hypothesized center of 2, making it quite unlikely.
\[ H_o : \mu_1 - \mu_2 = 2 \\ H_a : \mu_1 - \mu_2 < 2 \]
At \(\alpha = 5\%\), do we reject \(H_o\)?
Okay, so there is no need to calculate anything here. We already know from earlier that \(H_o\) is unconvincing, and we also see from our sample that \(\Delta_{\bar{x}}\approx -2.33\) with \(S_d = 2.12\), which suggests that the true difference is below 2.
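If we did want a number, the corresponding one-sided P-value is a one-liner (using the \(t_o\) from above) :
pt(-6.440, df = 38)   # P(T < t_o), about 7e-08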
Recall :
\[ \Delta_{\bar{x}} + t_{\alpha,\text{df}}*SE(\Delta_{\bar{x}}) \]
Notice it’s \(\alpha\) and not \(\frac{\alpha}{2}\), and that only the upper bound is needed, as this is one-sided.
So, calc \(t_{\alpha,\text{df}}\) :
qt(1 - .05, df = 38)
## [1] 1.685954
-2.33 + 1.685954 * 0.6728
## [1] -1.19569
\[ (-\infty,\ -2.33 + 1.685954 * 0.6728\,] \\ \text{or, }\\ (-\infty,\ -1.19569\,] \]
So the one-sided 95% confidence bound extends up to about -1.2; in other words, we are 95% confident that the true difference is at most about -1.2, well below the hypothesized value of 2.
\[ H_o : \mu_1 - \mu_2 = 2 \\ H_a : \mu_1 - \mu_2 \ne 2 \]
2*pt(-6.4405, df = 38)
## [1] 1.418823e-07
So, as indicated before, such a result would be extremely unlikely under \(H_o\).
Okay, so what we would like to determine :
ARE BOTH MACHINES OUTPUTTING THE SAME QUANTITY?
To determine this we have the following hypotheses :
\[ H_o : \mu_1 = \mu_2 \\ H_a : \mu_1 \ne \mu_2 \]
Note that this problem is different in that we know the population distributions and variances.
\[ \text{Let } X = \text{Dist. of net bottle volume:} \\ \text{Machine1} \implies X \sim N(\mu_1 = \ ?,\ \sigma_1^2 = 0.020^2) \\ \text{Machine2} \implies X \sim N(\mu_2 = \ ?,\ \sigma_2^2 = 0.025^2) \]
All this to say, the \(\text{Z-Test}\) is most appropriate :
\[ Z_o = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)} \\ \text{For} : \\ SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
x1 <- mean(df$Machine1)          # sample mean, machine 1
x2 <- mean(df$Machine2)          # sample mean, machine 2
a <- (.015^2)/10                 # variance of the sample mean, machine 1
b <- (.018^2)/10                 # variance of the sample mean, machine 2
z_score <- (x1 - x2)/sqrt(a+b)   # z statistic for the difference in means
p_value <- 2 * pnorm(z_score, lower.tail = FALSE); p_value   # two-sided p-value
## [1] 0.1771356
Therefore since \(\alpha=5\%\) :
\[ \text{P-val} \approx 18\% > \alpha \]
We fail to reject the null hypothesis; in other words, we do not find statistically significant evidence of a difference in mean net volume between the machines.
95%-CI for \(\Delta_{\mu} = \mu_1 - \mu_2\)
\[ \Delta_{\bar{x}} \pm Z_{\frac{\alpha}{2}}*SE(\Delta_{\bar{x}}) \]
z_crit <- qnorm(0.975) # for 95% CI
me <- sqrt(a + b)
lower <- (x1 - x2) - z_crit*me
upper <- (x1 - x2) + z_crit*me
c(lower, upper)
## [1] -0.004522262 0.024522262
Notice that the interval includes 0, indicating that the data are consistent with no difference in means; as stated above, we lack evidence for the alternative hypothesis.
Based upon the QQ-plot, our data appear to be approximately normal; however, likely due to the small sample size, the fit doesn’t look great.
\[ H_o : \Delta_\mu = 0 \\ H_a : \Delta_\mu \ne 0 \]
The most appropriate test is clearly a paired t-test, since the observations come in pairs :
t.test(df$BirthOrder1, df$BirthOrder2, paired = TRUE, conf.level = 0.95)
##
## Paired t-test
##
## data: df$BirthOrder1 and df$BirthOrder2
## t = -0.36577, df = 9, p-value = 0.723
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.3664148 0.2644148
## sample estimates:
## mean difference
## -0.051
As indicated by the R-output :
-0.3664148 0.2644148
is our 95% CI for \(\Delta_{\mu}\). Furthermore,
p-value = 0.723
indicating clearly that our alternative hypothesis is statistically unconvincing, which makes sense, because it would be silly for birth order to matter.
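As a sanity check (a small sketch, assuming the same df columns used above), a paired t-test is equivalent to a one-sample t-test on the within-pair differences :
d <- df$BirthOrder1 - df$BirthOrder2   # within-pair differences
t.test(d, mu = 0, conf.level = 0.95)   # reproduces the paired-t output above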
Consider the QQ-Plot for Formulation1
Consider the QQ-Plot for Formulation2
If you notice, for both there appears to be a staircase pattern ( i.e., the points alternately over- and under-shoot the reference line ); furthermore, each distribution clearly has some outliers and the tails aren’t looking good. However, I do see that the two samples tend to over- and under-shoot at approximately the same locations and to approximately the same degree.
Therefore, the variances appear to be approximately equal; however, they are pushing it when it comes to normality.
Additionally, consider the samples’ numerical evidence :
sd(df$Formulation1)^2
## [1] 103.5455
sd(df$Formulation2)^2
## [1] 98.99242
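One quick numerical summary is the ratio of the two sample variances (values near 1 support the equal-variance assumption) :
sd(df$Formulation1)^2 / sd(df$Formulation2)^2   # ~1.05, quite close to 1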
Nonetheless, I wouldn’t consider this a good sample size; it looks like we may be capturing noise, so we should gather more data.
\[ \text{Let } \mu_1 = \text{Avg. of Formulation 1} \\ H_o : \mu_1 = \mu_2 \\ H_a : \mu_1 > \mu_2 \]
So this is a one-sided t-test, with assumed equal variances :
t.test(df$Formulation1, df$Formulation2, var.equal = TRUE, alternative = "greater")
##
## Two Sample t-test
##
## data: df$Formulation1 and df$Formulation2
## t = 0.34483, df = 22, p-value = 0.3667
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -5.637883 Inf
## sample estimates:
## mean of x mean of y
## 194.5000 193.0833
Based upon our P-value :
p-value = 0.3667
And our sample Estimates :
sample estimates:
mean of x mean of y
194.5000 193.0833
And as we know the sample variances from earlier :
[1] 103.5455
[1] 98.99242
It is clear that at the \(\alpha = 5\%\) level, we fail to reject the null hypothesis.
t.test(machine1, machine2, var.equal = FALSE, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: machine1 and machine2
## t = 0.79894, df = 17.493, p-value = 0.435
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.01635123 0.03635123
## sample estimates:
## mean of x mean of y
## 16.015 16.005
Consider that for this example, the conclusion doesn’t change whether the population variances are treated as known or unknown :
p-value = 0.435
Therefore, both indicate that we don’t find statistically significant evidence to reject the null hypothesis. In other words, we don’t find a convincing difference in the averages.