Statistical Hypothesis Testing

2023-01-23

What?

Hypothesis testing uses sample data to assess whether a claim about a population is reasonable. The test does not prove the claim; it measures how strong the evidence in the available data is against it. To test a hypothesis, statistical analysts measure and examine a random sample from the population being studied. The hypothesis testing steps are as follows:

Determine the null and alternative hypotheses.
Select a level of significance, α, based on the acceptable risk of making a Type I error.
Calculate the test statistic.
Find the critical value.
Compare the test statistic to the critical value.
State the conclusion.
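
The steps above can be sketched as a small R function for the two-sided, known-σ case. This is only an illustrative sketch: the function name `z_test_two_sided`, its arguments, and the numbers in the usage line are assumptions made here, not part of the original slides.

```r
# Hypothetical helper: two-sided one-sample z-test with known sigma.
z_test_two_sided <- function(x, mu0, sigma, alpha = 0.05) {
  n <- length(x)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))  # calculate the test statistic
  z_crit <- qnorm(1 - alpha / 2)            # find the critical value
  reject <- abs(z) > z_crit                 # compare statistic to critical value
  list(z = z, z_crit = z_crit, reject = reject)  # basis for the conclusion
}

# Usage with made-up measurements (H0: mu = 10, sigma assumed to be 0.5)
res <- z_test_two_sided(c(10.2, 9.8, 10.5, 10.1), mu0 = 10, sigma = 0.5)
res$reject
```

The hypotheses and the significance level (the first two steps) enter as the `mu0` and `alpha` arguments; the function covers the remaining steps.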

Clarification

A hypothesis is a claim about a characteristic of one or more populations. Here, we have the Null Hypothesis (H0) and the Alternative Hypothesis (Ha, also written HA or H1). The critical value is established using the level of significance. It marks the largest number of standard errors by which the sample mean can deviate from the hypothesized mean before the null hypothesis is rejected. The set of all test statistic values for which the null hypothesis is rejected is known as the critical (rejection) region.
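
To make the link between the level of significance and the critical region concrete, the sketch below lists two-sided critical values for a standard normal test statistic at a few common levels. The choice of α values is an assumption made for illustration.

```r
# Two-sided critical values for a z-test at several significance levels
alpha <- c(0.10, 0.05, 0.01)
z_crit <- qnorm(1 - alpha / 2)  # upper critical value; the lower one is -z_crit

# H0 is rejected when the test statistic falls below `lower` or above `upper`
data.frame(alpha = alpha, lower = -z_crit, upper = z_crit)
```

A smaller α pushes the critical values further from zero, so the rejection region shrinks and stronger evidence is needed to reject H0.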

Example Walkthrough

The heat evolved (CAL/Gram) by an adhesive mixture is approximately normally distributed. The manufacturer claims that the mean heat evolved is μ = 101 and the standard deviation is σ = 2. A sample of heat evolved (CAL/Gram) measurements is given on the next slide. Is the manufacturer’s claim reasonable?

Let’s say we are given the following data (EXAMPLE CONTINUED)

ManufacturerMEAN = 101 # Claimed population mean (given)
ManufacturerSTD = 2 # Claimed population standard deviation (given)
SAMPLE = list(heatevolved = c(100, 102, 101, 98, 98, 97, 99, 101))
HDF = data.frame(SAMPLE) # The sample data as a data frame
SampleM = mean(HDF$heatevolved) # Sample mean, used in the test statistic
SampleS = sd(HDF$heatevolved) # Sample standard deviation, used for the error bars
HDF # Display the sample data for the adhesive mixture

##   heatevolved
## 1         100
## 2         102
## 3         101
## 4          98
## 5          98
## 6          97
## 7          99
## 8         101

Continued

# Plot the sample data to see how the values are distributed
library(plotly)
fig <- plot_ly(x = c("M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"),
               y = HDF$heatevolved, name = "heat", type = "bar")
fig <- fig %>% layout(title = "Sample Adhesive Heat Evolving (CAL/Gram)",
                      xaxis = list(title = "Mixture #"),
                      yaxis = list(title = "Heat Evolving (CAL/Gram)"))
fig

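The values all lie between 97 and 102 with no extreme outliers, which is consistent with the stated assumption that the data are approximately normally distributed. So we can begin the hypothesis testing.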
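
A bar chart alone cannot establish normality, so a normal Q-Q plot and a Shapiro-Wilk test are a common supplementary check. The sketch below is an addition to the original slides and uses only base R; with just eight observations the test has little power, so a large p-value only means there is no clear evidence against normality.

```r
# Sample of heat evolved (CAL/Gram) from the example
heat <- c(100, 102, 101, 98, 98, 97, 99, 101)

# Points close to the reference line suggest approximate normality
qqnorm(heat)
qqline(heat)

# Shapiro-Wilk test: H0 is that the data come from a normal distribution
sw <- shapiro.test(heat)
sw$p.value
```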

Continued

We can also visualize the data in R with ggplot2, which gives clearer and more detailed graphics:

library(ggplot2)
# One row per mixture; sd repeats the sample standard deviation for the error bars
df <- data.frame(Heat.Evolved = c(100, 102, 101, 98, 98, 97, 99, 101),
                 sd = rep(SampleS, 8),
                 Mixture.Type = c("M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"),
                 Insert = c(0.0, 0.1, 0.3, 0.5, 1.0, 1.5, 1.8, 2.1))
ggplot(df, aes(x = Mixture.Type, y = Heat.Evolved, fill = Mixture.Type)) +
  geom_bar(stat = "identity",
           colour = 'orange') +
  geom_errorbar(aes(ymin = Heat.Evolved - sd, ymax = Heat.Evolved + sd), width = .2)

Heat Map Visualization

Each mixture type name also corresponds to the mixture's density in kg/m3. A heat map, shown below, gives another view of how the heat evolved varies across the mixtures. Like the bar charts, it is a visual aid rather than a formal check of normality.

d = data.frame(Type=c("M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"),
density=c("1", "2", "3", "4", "5", "6", "7", "8"),
Heat.Evolved =c(100, 102, 101, 98, 98, 97, 99, 101))
ggplot(d, aes(Type, density)) + geom_tile(aes(fill = Heat.Evolved))

Hypothesis Testing

1.) The parameter of interest is the mean heat evolved (CAL/Gram), $\mu$.
2.) The hypothesis statements, based on the manufacturer's claimed mean, are H0 : $\mu = 101$ & Ha : $\mu \neq 101$.
3.) The level of significance is $\alpha = 0.05$.
4.) The test statistic, computed from the given values to assess whether the manufacturer's claim is valid, is \[ z = {\bar{x} - \mu \over {\sigma \over \sqrt{n}}} = {99.5 - 101 \over {2 \over \sqrt{8}}} \approx -2.121 \]
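
The arithmetic for the test statistic can be checked in R. This is a minimal sketch using the sample from the example; the variable names `heat`, `mu0`, `sigma`, and `x_bar` are chosen here for illustration.

```r
# Sample and the manufacturer's claimed parameters
heat <- c(100, 102, 101, 98, 98, 97, 99, 101)
mu0 <- 101   # hypothesized mean under H0
sigma <- 2   # claimed (known) population standard deviation

x_bar <- mean(heat)                                 # sample mean, 99.5
z <- (x_bar - mu0) / (sigma / sqrt(length(heat)))   # z test statistic
round(z, 3)                                         # -2.121
```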

Hypothesis Testing (Continued)

5.) Since this is a two-sided test at a significance level of 0.05, the critical values are: \[ z = \pm 1.96 \]
6.) The critical values are $z = \pm 1.96$, and our test statistic of $z \approx -2.121$ is not within the interval (-1.96, 1.96). The test statistic therefore falls in the rejection region: a sample like this would be surprising if the null hypothesis were true, so we reject the null hypothesis.
For our conclusion statement we can state that
- “The evidence supports rejecting the null. The evidence supports that the mean heat evolved (CAL/Gram) of the adhesive mixture is different from 101.”

Further Evidence (P-VALUE & Confidence Interval)

We can also compute the p-value, which further supports our result: \[ p = 2 \cdot P(Z < -2.121) \approx 0.034 \] Since the p-value of about 0.034 is below $\alpha = 0.05$, the data offer little support for the null hypothesis, so we reject it.
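
The two-sided p-value for the test statistic z ≈ -2.121 computed earlier can be evaluated with `pnorm`. This is a minimal sketch; the variable names are chosen here for illustration.

```r
# Test statistic from the example: (x_bar - mu0) / (sigma / sqrt(n))
z <- (99.5 - 101) / (2 / sqrt(8))

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value <- 2 * pnorm(-abs(z))
round(p_value, 3)  # about 0.034, which is below alpha = 0.05
```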
We can also compute the 95% confidence interval to support our decision: \[ \bar{x} \pm z_{\alpha/2} \cdot {\sigma \over \sqrt{n}} = 99.5 \pm 1.96 \cdot {2 \over \sqrt{8}} \approx (98.114, 100.886) \]

We can clearly see that the claimed mean of 101 is not within the interval (98.114, 100.886), which confirms that the null hypothesis should be rejected!
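
The interval can be reproduced in R as a quick check. This is a minimal sketch assuming the known-σ (z) interval used above; the variable names are illustrative.

```r
# Values from the example
x_bar <- 99.5   # sample mean
sigma <- 2      # claimed population standard deviation
n <- 8          # sample size
alpha <- 0.05   # significance level, giving a 95% interval

margin <- qnorm(1 - alpha / 2) * sigma / sqrt(n)  # margin of error
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 3)    # 98.114 100.886

# The claimed mean lies outside the interval, matching the test decision
claimed_inside <- 101 >= ci["lower"] && 101 <= ci["upper"]
claimed_inside  # FALSE
```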