As a data scientist, you will often retain or reject a hypothesis based on measurements of observed samples. The decision is usually based on a statistical mechanism called hypothesis testing.
There are three types of hypothesis test, distinguished by where the critical region lies:
Left Tailed Test: When \(\bar{x}\) is significantly below the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. The left tailed test (lower tailed test) is used, since the critical region (denoting rejection of \(H_0\)) lies in the left tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).
Right Tailed Test: When \(\bar{x}\) is significantly above the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. The right tailed test (upper tailed test) is used, since the critical region (denoting rejection of \(H_0\)) lies in the right tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).
Two Tailed Test: When \(\bar{x}\) is significantly different from (significantly higher or lower than) the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. In this case the two tailed test applies, because there are two critical regions (denoting rejection of \(H_0\)), one in each tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).
The critical regions for Hypothesis Testing are shown as shaded portions in the following figure:
Figure: Hypothesis Testing
Comparing the observed value of the test statistic with the critical value tells us whether the observed value lies in the critical region (reject \(H_0\)) or in the acceptance region (do not reject \(H_0\)), and we decide accordingly. Writing \(z\) for the observed test statistic:
Left Tailed Test: If \(z < -1.645\), then reject \(H_0\) at the 5% level of significance (\(\alpha\) is taken as 5% in most analytic situations).
Right Tailed Test: If \(z > 1.645\), then reject \(H_0\) at the 5% level of significance.
Two Tailed Test: If \(z > 1.96\) or \(z < -1.96\), then reject \(H_0\) at the 5% level of significance.
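The critical values quoted in these rules are quantiles of the standard normal distribution. As a minimal sketch (the variable names are illustrative, not from the original), they can be recovered with `qnorm`:

```r
alpha <- 0.05                      # significance level

z_left  <- qnorm(alpha)            # left tailed critical value, about -1.645
z_right <- qnorm(1 - alpha)        # right tailed critical value, about 1.645
z_two   <- qnorm(1 - alpha / 2)    # two tailed critical value, about 1.960

round(c(left = z_left, right = z_right, two_tailed = z_two), 3)
```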
There is also an alternative approach to hypothesis testing, which is used by virtually all software packages. Here, you focus on the following rule:
If p-value \(< \alpha\): reject \(H_0\)
If p-value \(\ge \alpha\): fail to reject \(H_0\)
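The rule can be written as a small helper. This is only a sketch: `decide` is a hypothetical function name introduced here for illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): apply the p-value decision rule.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.01)   # p-value below alpha
decide(0.20)   # p-value above alpha
```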
Procedure for Finding P-Values:
Figure: P-values
3 Type of Error I & II
A Type I error is the mistake of rejecting the null hypothesis when it is true. The symbol \(\alpha\) (alpha) is used to represent the probability of a type I error.
A Type II error is the mistake of failing to reject the null hypothesis when it is false. The symbol \(\beta\) (beta) is used to represent the probability of a type II error.
Figure: Type of Error
4 Type I ~ One Tail Z-test
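The null hypothesis of the one-tail (left/right) test of the population mean \(\mu\) with known \(\sigma\) can be expressed as follows:
\[\mu \ge \mu_0 \ \text{(left tail)} \qquad \text{or} \qquad \mu \le \mu_0 \ \text{(right tail)}\]
where \(\mu_0\) is a hypothesized lower (left tail) or upper (right tail) bound of the true population mean \(\mu\).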
Let us define the test statistic \(z\) in terms of the sample mean, the sample size and the population standard deviation \(\sigma\):
\[z={\bar{x}-\mu_0 \over \sigma/\sqrt{n}}\]
Then the null hypothesis of the left tail test is to be rejected if \(z \le -z_\alpha\), and that of the right tail test if \(z \ge z_\alpha\), where \(z_\alpha\) is the \(100(1-\alpha)\) percentile of the standard normal distribution.
4.1 Example 1
Left Tail: Suppose the manufacturer claims that the mean speed of a motorcycle is more than 100 km/hour. In a sample of 30 motorcycles, it was found that they only reach 99 km/hour on average. Assume the population standard deviation is 1.2 km/hour. At .05 significance level, can we reject the claim by the manufacturer?
4.1.1 Z-test statistics
First, we calculate the z-test statistic from the information given in Example 1. We use the z-statistic here because the population standard deviation \(\sigma\) is known and the sample size satisfies \(n \ge 30\).
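A minimal sketch of this calculation in R, using the numbers from Example 1 (the variable names are illustrative):

```r
xbar  <- 99      # sample mean (km/hour)
mu0   <- 100     # hypothesized value
sigma <- 1.2     # population standard deviation
n     <- 30      # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic, about -4.5644
z_crit <- -qnorm(1 - 0.05)              # left tail critical value, about -1.6449

z
z_crit
z <= z_crit                             # TRUE means: reject the null hypothesis
```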
The test statistic -4.5644 is less than the critical value of -1.6449. Consequently, at .05 significance level, we reject the claim that the mean speed of a motorcycle is above 100 km/hour.
4.1.3 P-value
Alternative solution: Instead of using the critical value, we apply the pnorm function to compute the left tail p-value of the test statistic. As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \ge 100\).
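As a sketch, the left tail p-value for Example 1 is the area under the standard normal curve below the test statistic (variable names are illustrative):

```r
z <- (99 - 100) / (1.2 / sqrt(30))   # test statistic from Example 1

p_value <- pnorm(z)                  # lower (left) tail probability
p_value
p_value < 0.05                       # TRUE means: reject the null hypothesis
```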
4.2 Exercise 1
Right Tail: A food company claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. In a sample of 40 cookies, the mean amount of saturated fat per cookie is found to be 2.1 grams. Assume that the population standard deviation is 0.25 grams. At .05 significance level, can we reject the claim?
4.2.1 Z-test statistics
First, we calculate the z-test statistic from the information given in Exercise 1. We use the z-statistic here because the population standard deviation \(\sigma\) is known and the sample size satisfies \(n \ge 30\).
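A minimal sketch of the right tail calculation for Exercise 1 (variable names are illustrative):

```r
xbar  <- 2.1     # sample mean (grams)
mu0   <- 2       # hypothesized upper bound
sigma <- 0.25    # population standard deviation
n     <- 40      # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic, about 2.529822
z_crit <- qnorm(1 - 0.05)               # right tail critical value, about 1.644854

z
z_crit
z >= z_crit                             # TRUE means: reject the null hypothesis
```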
The test statistic 2.529822 is greater than the critical value of 1.644854. Consequently, at .05 significance level, we reject the claim that there is at most 2 grams of saturated fat in a cookie.
4.2.3 P-value
Alternative solution: Instead of using the critical value, we apply the pnorm function to compute the upper tail p-value of the test statistic. As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \leq 2\).
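For a right tail test the relevant p-value is the area above the test statistic, which `pnorm` returns when `lower.tail = FALSE`. A sketch with the Exercise 1 numbers (variable names are illustrative):

```r
z <- (2.1 - 2) / (0.25 / sqrt(40))        # test statistic from Exercise 1

p_value <- pnorm(z, lower.tail = FALSE)   # upper (right) tail probability
p_value
p_value < 0.05                            # TRUE means: reject the null hypothesis
```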
5 Type I ~ Two Tail Z-test
The null hypothesis of the two-tailed test of the population mean \(\mu\) with known \(\sigma\) can be expressed as follows:
\[\mu_0 = \mu\]
where \(\mu_0\) is a hypothesized value of the true population mean \(\mu\).
Let us define the test statistic \(z\) in terms of the sample mean, the sample size and the population standard deviation \(\sigma\):
\[z={\bar{x}-\mu_0 \over \sigma/\sqrt{n}}\]
Then the null hypothesis of the two-tailed test is to be rejected if \(z \le - z_{\alpha/2}\) or \(z \ge z_{\alpha/2}\) , where \(z_{\alpha/2}\) is the \(100(1-\alpha/2)\) percentile of the standard normal distribution.
5.1 Example 2
Suppose the mean weight of King Penguins found in an Antarctic colony last year was 15.4 kg. In a sample of 35 penguins taken at the same time this year in the same colony, the mean penguin weight is 14.6 kg. Assume the population standard deviation is 2.5 kg. At .05 significance level, can we reject the null hypothesis that the mean penguin weight does not differ from last year?
5.1.1 Z-test statistics
First, we calculate the z-test statistic from the information given in Example 2. We use the z-statistic here because the population standard deviation \(\sigma\) is known and the sample size satisfies \(n \ge 30\).
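A minimal sketch of the two-tailed calculation for Example 2 (variable names are illustrative):

```r
xbar  <- 14.6    # sample mean (kg)
mu0   <- 15.4    # hypothesized value
sigma <- 2.5     # population standard deviation
n     <- 35      # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))    # test statistic, about -1.8931
z_half <- qnorm(1 - 0.05 / 2)            # about 1.9600
z_crit <- c(-z_half, z_half)             # two-tailed critical values

z
z_crit
z <= z_crit[1] || z >= z_crit[2]         # FALSE means: do not reject
```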
The test statistic -1.8931 lies between the critical values -1.9600 and 1.9600. Hence, at .05 significance level, we do not reject the null hypothesis that the mean penguin weight does not differ from last year.
5.1.3 P-value
Alternative solution: Instead of using the critical value, we apply the 2*pnorm() function to compute the two tail p-value of the test statistic.
As it turns out to be greater than the .05 significance level, we do not reject the null hypothesis that \(\mu = 15.4\).
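A sketch of the two tail p-value for Example 2. Doubling the lower tail works here because the test statistic is negative; using `-abs(z)` makes the same line valid for either sign (variable names are illustrative):

```r
z <- (14.6 - 15.4) / (2.5 / sqrt(35))   # test statistic from Example 2

p_value <- 2 * pnorm(-abs(z))           # two tail p-value
p_value
p_value < 0.05                          # FALSE means: do not reject
```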
5.2 Exercise 2
We want to test the hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg. The standard deviation has a known value of 20 mmHg, and a data set of 55 patients is available.
First, we calculate the z-test statistic from the information given in Exercise 2. We use the z-statistic here because the population standard deviation \(\sigma\) is known and the sample size satisfies \(n \ge 30\).
The test statistic -3.708099 lies outside the interval between the critical values -1.959964 and 1.959964. Hence, at .05 significance level, we reject the null hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg.
5.2.3 P-value
Alternative solution: Instead of using the critical value, we apply the 2*pnorm() function to compute the two tail p-value of the test statistic.
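The sample mean for this exercise is not stated in the text, so the sketch below starts from the reported test statistic of -3.708099 rather than recomputing it (variable names are illustrative):

```r
z <- -3.708099                   # test statistic as reported for Exercise 2

p_value <- 2 * pnorm(-abs(z))    # two tail p-value
p_value
p_value < 0.05                   # TRUE means: reject the null hypothesis
```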
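6 Type I ~ One Tail T-test
The null hypothesis of the one-tail (left/right) test of the population mean \(\mu\) with unknown \(\sigma\) can be expressed as follows:
\[\mu \ge \mu_0 \ \text{(left tail)} \qquad \text{or} \qquad \mu \le \mu_0 \ \text{(right tail)}\]
where \(\mu_0\) is a hypothesized lower (left tail) or upper (right tail) bound of the true population mean \(\mu\).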
Let us define the test statistic \(t\) in terms of the sample mean, the sample size and the sample standard deviation \(s\):
\[t={\bar{x}-\mu_0 \over s/\sqrt{n}}\]
Then the null hypothesis of the lower tail test is to be rejected if \(t \le -t_\alpha\), and that of the upper tail test if \(t \ge t_\alpha\), where \(t_\alpha\) is the \(100(1 - \alpha)\) percentile of the Student \(t\) distribution with \(n - 1\) degrees of freedom.
6.1 Example 3
Suppose the manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hours. In a sample of 30 light bulbs, it was found that they only last 9,900 hours on average. Assume the sample standard deviation is 125 hours. At .05 significance level, can we reject the claim by the manufacturer?
6.1.1 T-test statistics
First, we calculate the t-test statistic from the information given in Example 3. We use the t-statistic here because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\); the sample size is \(n = 30\).
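A minimal sketch of the lower tail t-test for Example 3, with the critical value taken from the Student t distribution with \(n - 1 = 29\) degrees of freedom (variable names are illustrative):

```r
xbar <- 9900     # sample mean (hours)
mu0  <- 10000    # hypothesized lower bound
s    <- 125      # sample standard deviation
n    <- 30       # sample size

t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic, about -4.3818
t_crit <- -qt(1 - 0.05, df = n - 1)      # lower tail critical value, about -1.6991

t_stat
t_crit
t_stat <= t_crit                         # TRUE means: reject the null hypothesis
```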
The test statistic -4.3818 is less than the critical value of -1.6991. Hence, at .05 significance level, we can reject the claim that the mean lifetime of a light bulb is above 10,000 hours.
6.1.3 P-value
Alternative Solution: Instead of using the critical value, we apply the pt function to compute the lower tail p-value of the test statistic.
As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \ge 10000\).
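A sketch of the lower tail p-value for Example 3 using `pt` with 29 degrees of freedom (variable names are illustrative):

```r
n <- 30
t_stat <- (9900 - 10000) / (125 / sqrt(n))   # test statistic from Example 3

p_value <- pt(t_stat, df = n - 1)            # lower tail probability
p_value
p_value < 0.05                               # TRUE means: reject the null hypothesis
```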
6.2 Exercise 3
Right tail: Garuda-food Indonesia claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. In a sample of 40 cookies, the mean amount of saturated fat per cookie is found to be 2.1 grams. Assume that the sample standard deviation is 0.3 grams. At .05 significance level, can we reject the claim?
6.2.1 T-test statistics
First, we calculate the t-test statistic from the information given in Exercise 3. We use the t-statistic here because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\); the sample size is \(n = 40\).
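A minimal sketch of the upper tail t-test for Exercise 3, with \(n - 1 = 39\) degrees of freedom (variable names are illustrative):

```r
xbar <- 2.1      # sample mean (grams)
mu0  <- 2        # hypothesized upper bound
s    <- 0.3      # sample standard deviation
n    <- 40       # sample size

t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic, about 2.108185
t_crit <- qt(1 - 0.05, df = n - 1)       # upper tail critical value, about 1.684875

t_stat
t_crit
t_stat >= t_crit                         # TRUE means: reject the null hypothesis
```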
The test statistic 2.108185 is greater than the critical value of 1.684875. Hence, at .05 significance level, we can reject the claim by Garuda-food Indonesia that there is at most 2 grams of saturated fat in a single cookie.
6.2.3 P-value
Alternative Solution: Instead of using the critical value, we apply the pt function to compute the upper tail p-value of the test statistic.
As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \leq 2\), in agreement with the critical value approach.
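Because Exercise 3 is a right tail test, the p-value is the area above the test statistic, so `pt` must be called with `lower.tail = FALSE`; the lower tail probability would be its complement and would not measure evidence against \(\mu \leq 2\). A sketch (variable names are illustrative):

```r
n <- 40
t_stat <- (2.1 - 2) / (0.3 / sqrt(n))                  # test statistic from Exercise 3

p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # upper tail probability
p_value
p_value < 0.05                                         # TRUE means: reject the null hypothesis
```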
7 Type I ~ Two Tail T-test
The null hypothesis of the two-tailed test of the population mean \(\mu\) and unknown \(\sigma\) can be expressed as follows:
\[\mu_0 = \mu\]
where \(\mu_0\) is a hypothesized value of the true population mean \(\mu\).
Let us define the test statistic \(t\) in terms of the sample mean, the sample size and the sample standard deviation \(s\):
\[t={\bar{x}-\mu_0 \over s/\sqrt{n}}\]
Then the null hypothesis of the two-tailed test is to be rejected if \(t \le -t_{\alpha/2}\) or \(t \ge t_{\alpha/2}\), where \(t_{\alpha/2}\) is the \(100(1-\alpha/2)\) percentile of the Student \(t\) distribution with \(n-1\) degrees of freedom.
7.1 Example 4
Some journals concluded that the average weight of Hachiko Dogs around the world ten years ago was 15.4 kg. Researchers want to find out whether the average weight of this breed has changed since then. They therefore draw a random sample of 35 dogs of the same breed this year and find that the mean dog weight is 14.6 kg. Assume the sample standard deviation is 2.5 kg. At .05 significance level, can we reject the null hypothesis that the mean weight of Hachiko Dogs does not differ from ten years ago?
7.1.1 T-test statistics
First, we calculate the t-test statistic from the information given in Example 4. We use the t-statistic here because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\); the sample size is \(n = 35\).
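A minimal sketch of the two-tailed t-test for Example 4. With \(n = 35\) the critical values come from the Student t distribution with 34 degrees of freedom (variable names are illustrative):

```r
xbar <- 14.6     # sample mean (kg)
mu0  <- 15.4     # hypothesized value
s    <- 2.5      # sample standard deviation
n    <- 35       # sample size

t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic, about -1.8931
t_half <- qt(1 - 0.05 / 2, df = n - 1)   # about 2.0322
t_crit <- c(-t_half, t_half)             # two-tailed critical values

t_stat
t_crit
t_stat <= t_crit[1] || t_stat >= t_crit[2]   # FALSE means: do not reject
```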
The test statistic -1.8931 lies between the critical values -2.0322 and 2.0322 of the Student \(t\) distribution with \(n - 1 = 34\) degrees of freedom. Hence, at .05 significance level, we do not reject the null hypothesis that the mean weight of Hachiko Dogs does not differ from ten years ago.
7.1.3 P-value
Alternative solution: Instead of using the critical value, we apply the 2*pt() function to compute the two tail p-value of the test statistic.
Since it turns out to be greater than the .05 significance level, we do not reject the null hypothesis that \(\mu = 15.4\).
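A sketch of the two tail p-value for Example 4 using `pt` with 34 degrees of freedom (variable names are illustrative):

```r
n <- 35
t_stat <- (14.6 - 15.4) / (2.5 / sqrt(n))     # test statistic from Example 4

p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two tail p-value
p_value
p_value < 0.05                                # FALSE means: do not reject
```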
7.2 Exercise 4
We want to test the hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg. The data set at hand has measurements on 55 patients.
First, we calculate the t-test statistic from the information given in Exercise 4. We use the t-statistic here because the population standard deviation \(\sigma\) is unknown and is estimated from the sample; the sample size is \(n = 55\).
The test statistic -3.869272 lies outside the interval between the critical values -2.004879 and 2.004879. Hence, at .05 significance level, we reject the null hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg.
7.2.3 P-value
Alternative solution: Instead of using the critical value, we apply the 2*pt() function to compute the two tail p-value of the test statistic.
Since it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu = 140\).
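The sample mean and sample standard deviation for this exercise are not stated in the text, so the sketch below starts from the reported test statistic of -3.869272 and uses \(55 - 1 = 54\) degrees of freedom (variable names are illustrative):

```r
t_stat <- -3.869272                           # test statistic as reported for Exercise 4
n <- 55                                       # number of patients

p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two tail p-value
p_value
p_value < 0.05                                # TRUE means: reject the null hypothesis
```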
8 Type II ~ One Tail
In a left tail test of the population mean, the null hypothesis claims that the true population mean \(\mu\) is greater than or equal to a given hypothetical value \(\mu_0\); in a right tail test, it claims that \(\mu\) is less than or equal to \(\mu_0\).
A type II error occurs if the hypothesis test based on a random sample fails to reject the null hypothesis even when the true population mean \(\mu\) is in fact less than \(\mu_0\) (left tail) or greater than \(\mu_0\) (right tail).
8.1 Example 5
Suppose the manufacturer claims that the mean speed of a motorcycle is more than 100 km/hour. Assume the actual mean speed is 99.5 km/hour and the population standard deviation is 1.2 km/hour. At .05 significance level, what is the probability of having type II error for a sample size of 30 motorcycles?
8.1.1 Standard Error of the Mean
We begin by computing the standard error of the mean, sem.
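A sketch of this step for Example 5. The standard error is \(\sigma/\sqrt{n}\), and for a left tail test the rejection threshold is the 5th percentile of the sampling distribution of the mean under \(\mu_0 = 100\) (variable names are illustrative):

```r
n     <- 30      # sample size
sigma <- 1.2     # population standard deviation (km/hour)
mu0   <- 100     # hypothesized lower bound

sem <- sigma / sqrt(n)                    # standard error of the mean, about 0.2191
q   <- qnorm(0.05, mean = mu0, sd = sem)  # rejection threshold, about 99.64

sem
q
```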
Therefore, so long as the sample mean is greater than 99.64 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 99.50, we can compute the probability of the sample mean being above 99.64, and thus find the probability of type II error.
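A sketch of the type II error calculation for Example 5: the upper tail probability at the threshold, evaluated under the actual mean of 99.5 km/hour (variable names are illustrative):

```r
sem <- 1.2 / sqrt(30)                      # standard error of the mean
q   <- qnorm(0.05, mean = 100, sd = sem)   # rejection threshold under H0

mu_actual <- 99.5                          # assumed actual population mean
beta  <- pnorm(q, mean = mu_actual, sd = sem, lower.tail = FALSE)  # P(type II error)
power <- 1 - beta

beta    # about 0.262
power   # about 0.738
```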
If the motorcycle sample size is 30, the actual mean motorcycle speed is 99.5 km/hour and the population standard deviation is 1.2 km/hour, then the probability of type II error for testing the null hypothesis \(\mu \ge 100\) at .05 significance level is 26.2%, and the power of the hypothesis test is 73.8%.
8.2 Exercise 5
Right tail: Garuda-food Indonesia claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. Assume the actual mean amount of saturated fat per cookie is 2.075 grams and the sample standard deviation is 0.25 grams. At .05 significance level, what is the probability of having a type II error for a sample size of 35 cookies?
8.2.1 Standard Error of the Mean
We begin by computing the standard error of the mean, sem.
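A sketch of this step for Exercise 5, under the assumption that the standard deviation of 0.25 grams can be treated as known so that the normal distribution applies. Because this is a right tail test, the rejection threshold is the 95th percentile of the sampling distribution of the mean under \(\mu_0 = 2\) (variable names are illustrative):

```r
n     <- 35      # sample size
sigma <- 0.25    # standard deviation (grams), treated as known (assumption)
mu0   <- 2       # hypothesized upper bound

sem <- sigma / sqrt(n)                    # standard error of the mean, about 0.0423
q   <- qnorm(0.95, mean = mu0, sd = sem)  # upper rejection threshold, about 2.0695

sem
q
```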
Therefore, so long as the sample mean is less than 2.07 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 2.075, we can compute the probability of the sample mean being below 2.07, and thus find the probability of type II error.
If the Garuda Food Indonesia sample size is 35, the actual mean amount of saturated fat per cookie is 2.075 grams and the standard deviation is 0.25 grams (treated as known, so that the normal distribution applies), then the probability of type II error for testing the null hypothesis \(\mu \leq 2\) at .05 significance level is about 44.8%, and the power of the hypothesis test is about 55.2%.
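In a right tail test the null hypothesis \(\mu \leq 2\) is retained whenever the sample mean falls below the upper 5% cutoff, so the type II error probability is the lower tail probability at that cutoff under the actual mean of 2.075 grams. A sketch, again assuming the standard deviation is known (variable names are illustrative):

```r
sem <- 0.25 / sqrt(35)                   # standard error of the mean
q   <- qnorm(0.95, mean = 2, sd = sem)   # upper rejection threshold under H0

mu_actual <- 2.075                       # assumed actual population mean
beta  <- pnorm(q, mean = mu_actual, sd = sem)   # P(sample mean below threshold)
power <- 1 - beta

beta    # about 0.448
power   # about 0.552
```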
9 Type II ~ Two Tail
In a two-tailed test of the population mean, the null hypothesis claims that the true population mean \(\mu\) is equal to a given hypothetical value \(\mu_0\).
\[\mu_0 = \mu\]
A type II error occurs if the hypothesis test based on a random sample fails to reject the null hypothesis even when the true population mean \(\mu\) is in fact different from \(\mu_0\).
Assume that the population has a known standard deviation \(\sigma\). By the Central Limit Theorem, the population of all possible means of samples of sufficiently large size n approximately follows the normal distribution. Hence we can compute the range of sample means for which the null hypothesis will not be rejected, and then obtain an estimate of the probability of type II error.
9.1 Example 6
Some journals concluded that the average weight of Hachiko Dogs around the world ten years ago was 15.4 kg. Researchers want to find out whether the average weight of this breed has changed since then. Assume the actual mean population weight is 15.1 kg, and the population standard deviation is 2.5 kg. At .05 significance level, what is the probability of having type II error for a sample size of 35 Hachiko Dogs?
9.1.1 Standard Error of the Mean
We begin by computing the standard error of the mean, sem.
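A sketch of this step for Example 6. For a two-tailed test the non-rejection interval is bounded by the 2.5th and 97.5th percentiles of the sampling distribution of the mean under \(\mu_0 = 15.4\) (variable names are illustrative):

```r
n     <- 35      # sample size
sigma <- 2.5     # population standard deviation (kg)
mu0   <- 15.4    # hypothesized value

sem <- sigma / sqrt(n)                              # standard error, about 0.4226
q   <- qnorm(c(0.025, 0.975), mean = mu0, sd = sem) # about 14.572 and 16.228

sem
q
```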
Therefore, so long as the sample mean is between 14.572 and 16.228 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 15.1, we can compute the lower tail probabilities of both end points.
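A sketch of the lower tail probabilities of both end points under the actual mean of 15.1 kg; their difference is the probability that the sample mean lands inside the non-rejection interval (variable names are illustrative):

```r
sem <- 2.5 / sqrt(35)                                  # standard error of the mean
q   <- qnorm(c(0.025, 0.975), mean = 15.4, sd = sem)   # non-rejection interval

mu_actual <- 15.1                                      # assumed actual population mean
p     <- pnorm(q, mean = mu_actual, sd = sem)          # lower tail probability at each end point
beta  <- diff(p)                                       # P(type II error), about 0.891
power <- 1 - beta                                      # about 0.109

p
beta
power
```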
Finally, the probability of type II error is the probability between the two end points. If the sample size of Hachiko Dogs is 35, the actual mean population weight is 15.1 kg and the population standard deviation is 2.5 kg, then the probability of type II error for testing the null hypothesis \(\mu = 15.4\) at .05 significance level is 89.1%, and the power of the hypothesis test is 10.9%.
9.2 Exercise 6
Under the same assumptions as Example 6, if the actual mean population weight is 14.9 kg, what is the probability of type II error? What is the power of the hypothesis test?
9.2.1 Standard Error of the Mean
We begin by computing the standard error of the mean, sem.
Therefore, so long as the sample mean is between 14.572 and 16.228 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 14.9, we can compute the lower tail probabilities of both end points.
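The steps of this exercise can be sketched together as follows. Only the actual mean changes relative to Example 6; the non-rejection interval stays the same because it depends only on \(\mu_0\), \(\sigma\) and \(n\) (variable names are illustrative):

```r
n     <- 35      # sample size
sigma <- 2.5     # population standard deviation (kg)
mu0   <- 15.4    # hypothesized value
mu_actual <- 14.9                                   # assumed actual population mean

sem <- sigma / sqrt(n)                              # standard error of the mean
q   <- qnorm(c(0.025, 0.975), mean = mu0, sd = sem) # non-rejection interval
p   <- pnorm(q, mean = mu_actual, sd = sem)         # lower tail probability at each end point

beta  <- diff(p)     # P(type II error), about 0.78
power <- 1 - beta    # about 0.22

beta
power
```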
Finally, the probability of type II error is the probability between the two end points. If the sample size of Hachiko Dogs is 35, the actual mean population weight is 14.9 kg and the population standard deviation is 2.5 kg, then the probability of type II error for testing the null hypothesis \(\mu = 15.4\) at .05 significance level is 78%, and the power of the hypothesis test is 22%.