1 Introduction

As a data scientist, you will often retain or reject a hypothesis based on measurements from observed samples. The decision is usually based on a statistical procedure called hypothesis testing.


A hypothesis test on a population mean takes one of three forms:

  • Left Tailed Test: When the sample mean \(\bar{x}\) is significantly below the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. The test is called a left tailed test (lower tailed test) because the critical region (denoting rejection of \(H_0\)) lies in the left tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).

\[\text{Left Tailed Test} = \begin{cases} {H_0: \mu \ge \mu_0} \\ {H_1: \mu < \mu_0 } \end{cases}\]

  • Right Tailed Test: When the sample mean \(\bar{x}\) is significantly above the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. The test is called a right tailed test (upper tailed test) because the critical region (denoting rejection of \(H_0\)) lies in the right tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).

\[\text{Right Tailed Test} = \begin{cases} {H_0: \mu \le \mu_0} \\ {H_1: \mu > \mu_0 } \end{cases}\]

  • Two Tailed Test: When the sample mean \(\bar{x}\) is significantly different from (significantly higher or lower than) the hypothesized population mean \(\mu_0\), \(H_0\) is rejected. In this case the two tailed test applies, because there are two critical regions (denoting rejection of \(H_0\)), one in each tail of the normal curve (representing the sampling distribution of the sample statistic \(\bar{x}\)).

\[\text{Two Tailed Test} = \begin{cases} {H_0: \mu = \mu_0} \\ {H_1: \mu \neq \mu_0 } \end{cases}\]

2 Hypothesis Testing

The critical regions for Hypothesis Testing are shown as shaded portions in the following figure:

Figure: Hypothesis Testing

By comparing the observed value of the test statistic with the critical value, we can tell whether the observed value lies in the critical region (reject \(H_0\)) or in the acceptance region (do not reject \(H_0\)) and decide accordingly.

  • Left Tailed Test: If the observed test statistic \(z < -1.645\), then reject \(H_0\) at the 5% level of significance (\(\alpha\) is taken as 5% in most analytic situations).
  • Right Tailed Test: If \(z > 1.645\), then reject \(H_0\) at the 5% level of significance.
  • Two Tailed Test: If \(z > 1.96\) or \(z < -1.96\), then reject \(H_0\) at the 5% level of significance.
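
The cut-offs quoted above are quantiles of the standard normal distribution. As a minimal sketch in R (the variable names are illustrative), they can be obtained with `qnorm`:

```r
# Critical values of the standard normal distribution at alpha = 0.05
alpha <- 0.05

z_left  <- qnorm(alpha)                        # left-tailed cut-off, about -1.645
z_right <- qnorm(1 - alpha)                    # right-tailed cut-off, about 1.645
z_two   <- qnorm(c(alpha / 2, 1 - alpha / 2))  # two-tailed cut-offs, about -1.96 and 1.96

print(c(z_left, z_right))
print(z_two)
```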

There is also an alternative approach to hypothesis testing, which is the one reported by most software packages. Here, you focus on the following rule:

  • If p-value \(< \alpha\): reject \(H_0\)
  • If p-value \(\ge \alpha\): fail to reject \(H_0\)

Procedure for Finding P-Values:

Figure: P-values

3 Type of Error I & II

  • A Type I error is the mistake of rejecting the null hypothesis when it is true. The symbol \(\alpha\) (alpha) is used to represent the probability of a type I error.

  • A Type II error is the mistake of failing to reject the null hypothesis when it is false. The symbol \(\beta\) (beta) is used to represent the probability of a type II error.

Figure: Type of Error

4 Type I ~ One Tail Z-test

The null hypothesis of the one-tail (left/right) test of the population mean \(\mu\), with known population standard deviation \(\sigma\), can be expressed as follows:

\[\text{Hypothesis Testing $H_0$} = \begin{cases} {\mu \ge \mu_0} & \text{Left Tail} \\ {\mu \le \mu_0} & \text{Right Tail} \end{cases}\]

where \(\mu_0\) is a hypothesized lower (left tail) or upper (right tail) bound of the true population mean \(\mu\).

Let us define the test statistic \(z\) in terms of the sample mean, the sample size and the population standard deviation \(\sigma\):

\[z={\bar{x}-\mu_0 \over \sigma/\sqrt{n}}\]

Then the null hypothesis of the left tail test is to be rejected if \(z \le -z_\alpha\), and the null hypothesis of the right tail test is to be rejected if \(z \ge z_\alpha\), where \(z_\alpha\) is the \(100(1-\alpha)\) percentile of the standard normal distribution.

4.1 Example 1

Left Tail: Suppose the manufacturer claims that the mean speed of a motorcycle is more than 100 km/hour. In a sample of 30 motorcycles, the mean speed was found to be only 99 km/hour. Assume the population standard deviation is 1.2 km/hour. At the .05 significance level, can we reject the claim by the manufacturer?

4.1.1 Z-test statistics

First, we calculate the z-test statistic from the information given in Example 1. In this case, we use the z-statistic because the population standard deviation \(\sigma\) is known and the sample size is at least 30 (\(n \ge 30\)).

## [1] -4.564355

4.1.2 Critical value

Then, we calculate the left critical value.

## [1] -1.644854

Now, we can conclude that the test statistic -4.5644 is less than the critical value of -1.6449. Consequently, at the .05 significance level, we reject the claim that the mean speed of a motorcycle is above 100 km/hour.

4.1.3 P-value

Alternative solution: Instead of using the critical value, we apply the pnorm function to compute the left tail p-value of the test statistic. As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \ge 100\).

## [1] 2.505166e-06
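
The three steps of Example 1 can be sketched in R as follows. The numbers come from the example; the variable names are illustrative, and the block is only one possible way to reproduce the printed values.

```r
# Example 1: left-tailed z-test (inputs taken from the example text)
xbar  <- 99     # sample mean (km/hour)
mu0   <- 100    # hypothesized lower bound of the population mean
sigma <- 1.2    # population standard deviation
n     <- 30     # sample size
alpha <- 0.05   # significance level

z      <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
z_crit <- qnorm(alpha)                      # left critical value
p_val  <- pnorm(z)                          # lower-tail p-value

print(c(z = z, z_crit = z_crit, p_value = p_val))
reject <- z <= z_crit                       # TRUE means H0 is rejected
print(reject)
```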

4.2 Exercise 1

Right Tail: A food company claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. In a sample of 40 cookies, the mean amount of saturated fat per cookie is found to be 2.1 grams. Assume that the population standard deviation is 0.25 grams. At the .05 significance level, can we reject the claim?

4.2.1 Z-test statistics

First, we calculate the z-test statistic from the information given in Exercise 1. In this case, we use the z-statistic because the population standard deviation \(\sigma\) is known and the sample size is at least 30 (\(n \ge 30\)).

## [1] 2.529822

4.2.2 Critical value

Then, we calculate the right critical value.

## [1] 1.644854

Now, we can conclude that the test statistic 2.529822 is greater than the critical value of 1.644854. Consequently, at the .05 significance level, we reject the claim that there is at most 2 grams of saturated fat in a cookie.

4.2.3 P-value

Alternative solution: Instead of using the critical value, we apply the pnorm function. For a right-tailed test the p-value is the upper tail probability, which is one minus the lower tail probability printed below: \(1 - 0.994294 = 0.005706\). As this is less than the .05 significance level, we reject the null hypothesis that \(\mu \leq 2\).

## [1] 0.994294
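
A minimal sketch of Exercise 1 in R, assuming the right-tailed p-value is taken as the upper tail probability of the test statistic. The variable names are illustrative.

```r
# Exercise 1: right-tailed z-test (inputs taken from the exercise text)
xbar  <- 2.1    # sample mean (grams)
mu0   <- 2      # hypothesized upper bound of the population mean
sigma <- 0.25   # population standard deviation
n     <- 40     # sample size
alpha <- 0.05   # significance level

z       <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
z_crit  <- qnorm(1 - alpha)                  # right critical value
p_lower <- pnorm(z)                          # lower tail probability (not the p-value here)
p_val   <- pnorm(z, lower.tail = FALSE)      # upper tail probability = right-tailed p-value

print(c(z = z, z_crit = z_crit, p_lower = p_lower, p_value = p_val))
```

The lower tail probability and the p-value add up to one, so a lower tail value close to 1 corresponds to a small right-tailed p-value.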

5 Type I ~ Two Tail Z-test

The null hypothesis of the two-tailed test of the population mean \(\mu\), with known population standard deviation \(\sigma\), can be expressed as follows:

\[H_0: \mu = \mu_0\]

where \(\mu_0\) is a hypothesized value of the true population mean \(\mu\).

Let us define the test statistic \(z\) in terms of the sample mean, the sample size and the population standard deviation \(\sigma\):

\[z={\bar{x}-\mu_0 \over \sigma/\sqrt{n}}\]

Then the null hypothesis of the two-tailed test is to be rejected if \(z \le - z_{\alpha/2}\) or \(z \ge z_{\alpha/2}\) , where \(z_{\alpha/2}\) is the \(100(1-\alpha/2)\) percentile of the standard normal distribution.

5.1 Example 2

Suppose the mean weight of King Penguins found in an Antarctic colony last year was 15.4 kg. In a sample of 35 penguins taken at the same time this year in the same colony, the mean penguin weight is 14.6 kg. Assume the population standard deviation is 2.5 kg. At the .05 significance level, can we reject the null hypothesis that the mean penguin weight does not differ from last year?

5.1.1 Z-test statistics

First, we calculate the z-test statistic from the information given in Example 2. In this case, we use the z-statistic because the population standard deviation \(\sigma\) is known and the sample size is at least 30 (\(n \ge 30\)).

## [1] -1.893146

5.1.2 Critical value

Then, we calculate the left and right critical values.

## [1] -1.959964  1.959964

The test statistic -1.8931 lies between the critical values -1.9600 and 1.9600. Hence, at .05 significance level, we do not reject the null hypothesis that the mean penguin weight does not differ from last year.

5.1.3 P-value

Alternative solution: Instead of using the critical value, we apply the 2*pnorm() function to compute the two tail p-value of the test statistic.

## [1] 0.05833852

As it turns out to be greater than the .05 significance level, we do not reject the null hypothesis that \(\mu = 15.4\).
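
Example 2 can be sketched in R along the same lines. The inputs come from the example; the variable names are illustrative.

```r
# Example 2: two-tailed z-test (inputs taken from the example text)
xbar  <- 14.6   # sample mean (kg)
mu0   <- 15.4   # hypothesized population mean
sigma <- 2.5    # population standard deviation
n     <- 35     # sample size
alpha <- 0.05   # significance level

z      <- (xbar - mu0) / (sigma / sqrt(n))    # test statistic
z_crit <- qnorm(c(alpha / 2, 1 - alpha / 2))  # lower and upper critical values
p_val  <- 2 * pnorm(-abs(z))                  # two-tailed p-value

print(z)
print(z_crit)
print(p_val)
reject <- z <= z_crit[1] || z >= z_crit[2]    # FALSE means H0 is not rejected
```

Using `-abs(z)` makes the p-value formula work whether the statistic falls in the left or the right tail.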

5.2 Exercise 2

We want to test the hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg. The population standard deviation has a known value of 20 mmHg, and a data set of 55 patients is available.

##    no status mmhg
## 1   1      0  120
## 2   2      0  115
## 3   3      0   94
## 4   4      0  118
## 5   5      0  111
## 6   6      0  102
## 7   7      0  102
## 8   8      0  131
## 9   9      0  104
## 10 10      0  107
## 11 11      0  115
## 12 12      0  139
## 13 13      0  115
## 14 14      0  113
## 15 15      0  114
## 16 16      0  105
## 17 17      0  115
## 18 18      0  134
## 19 19      0  109
## 20 20      0  109
## 21 21      0   93
## 22 22      0  118
## 23 23      0  109
## 24 24      0  106
## 25 25      0  125
## 26 26      1  150
## 27 27      1  142
## 28 28      1  119
## 29 29      1  127
## 30 30      1  141
## 31 31      1  149
## 32 32      1  144
## 33 33      1  142
## 34 34      1  149
## 35 35      1  161
## 36 36      1  143
## 37 37      1  140
## 38 38      1  148
## 39 39      1  149
## 40 40      1  141
## 41 41      1  146
## 42 42      1  159
## 43 43      1  152
## 44 44      1  135
## 45 45      1  134
## 46 46      1  161
## 47 47      1  130
## 48 48      1  125
## 49 49      1  141
## 50 50      1  148
## 51 51      1  153
## 52 52      1  145
## 53 53      1  137
## 54 54      1  147
## 55 55      1  169

5.2.1 Z-test statistics

First, we calculate the z-test statistic from the information given in Exercise 2. In this case, we use the z-statistic because the population standard deviation \(\sigma\) is known and the sample size is at least 30 (\(n \ge 30\)).

## [1] -3.708099

5.2.2 Critical value

Then, we calculate the left and right critical values.

## [1] -1.959964  1.959964

The test statistic -3.708099 lies outside the interval between the critical values -1.959964 and 1.959964 (it is below the lower critical value). Hence, at the .05 significance level, we reject the null hypothesis that the mean systolic blood pressure in the population equals 140 mmHg.

5.2.3 P-value

Alternative solution: Instead of using the critical value, we apply the 2*pnorm() function to compute the two tail p-value of the test statistic.

## [1] 0.0002088208

As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu = 140\).
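
A sketch of Exercise 2 in R, with the `mmhg` column of the table above typed in as a vector. The vector name and the layout are illustrative; the original data frame is not reconstructed here.

```r
# Exercise 2: two-tailed z-test on the blood pressure data (mmhg column of the table)
mmhg <- c(120, 115,  94, 118, 111, 102, 102, 131, 104, 107,
          115, 139, 115, 113, 114, 105, 115, 134, 109, 109,
           93, 118, 109, 106, 125, 150, 142, 119, 127, 141,
          149, 144, 142, 149, 161, 143, 140, 148, 149, 141,
          146, 159, 152, 135, 134, 161, 130, 125, 141, 148,
          153, 145, 137, 147, 169)

mu0   <- 140            # hypothesized population mean (mmHg)
sigma <- 20             # known population standard deviation
n     <- length(mmhg)   # 55 patients
alpha <- 0.05

z      <- (mean(mmhg) - mu0) / (sigma / sqrt(n))  # test statistic
z_crit <- qnorm(c(alpha / 2, 1 - alpha / 2))      # critical values
p_val  <- 2 * pnorm(-abs(z))                      # two-tailed p-value

print(c(mean = mean(mmhg), z = z, p_value = p_val))
```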

6 Type I ~ One Tail t-test

The null hypothesis of the one-tail (left/right) test of the population mean \(\mu\), with unknown population standard deviation \(\sigma\), can be expressed as follows:

\[\text{Hypothesis Testing $H_0$} = \begin{cases} {\mu \ge \mu_0} & \text{Left Tail} \\ {\mu \le \mu_0} & \text{Right Tail} \end{cases}\]

where \(\mu_0\) is a hypothesized lower (left tail) or upper (right tail) bound of the true population mean \(\mu\).

Let us define the test statistic \(t\) in terms of the sample mean, the sample size and the sample standard deviation \(s\):

\[t={\bar{x}-\mu_0 \over s/\sqrt{n}}\]

Then the null hypothesis of the left tail test is to be rejected if \(t \le -t_\alpha\), and the null hypothesis of the right tail test is to be rejected if \(t \ge t_\alpha\), where \(t_\alpha\) is the \(100(1 - \alpha)\) percentile of the Student \(t\) distribution with \(n - 1\) degrees of freedom.

6.1 Example 3

Suppose the manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hours. In a sample of 30 light bulbs, it was found that they only last 9,900 hours on average. Assume the sample standard deviation is 125 hours. At .05 significance level, can we reject the claim by the manufacturer?

6.1.1 T-test statistics

First, we calculate the t-test statistic from the information given in Example 3. In this case, we use the t-statistic because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\).

## [1] -4.38178

6.1.2 Critical value

Then, we calculate the left critical value.

## [1] -1.699127

The test statistic -4.3818 is less than the critical value of -1.6991. Hence, at .05 significance level, we can reject the claim that mean lifetime of a light bulb is above 10,000 hours.

6.1.3 P-value

Alternative Solution: Instead of using the critical value, we apply the pt function to compute the lower tail p-value of the test statistic.

## [1] 7.035026e-05

As it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu \ge 10000\).
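
Example 3 can be sketched in R with `qt` and `pt`, the Student t counterparts of `qnorm` and `pnorm`. The inputs come from the example; the variable names are illustrative.

```r
# Example 3: left-tailed t-test (inputs taken from the example text)
xbar  <- 9900    # sample mean (hours)
mu0   <- 10000   # hypothesized lower bound of the population mean
s     <- 125     # sample standard deviation
n     <- 30      # sample size
alpha <- 0.05    # significance level

t_stat <- (xbar - mu0) / (s / sqrt(n))  # test statistic
t_crit <- qt(alpha, df = n - 1)         # left critical value, n - 1 degrees of freedom
p_val  <- pt(t_stat, df = n - 1)        # lower-tail p-value

print(c(t = t_stat, t_crit = t_crit, p_value = p_val))
```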

6.2 Exercise 3

Right tail: Garuda-food Indonesia claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. In a sample of 40 cookies, the mean amount of saturated fat per cookie is found to be 2.1 grams. Assume that the sample standard deviation is 0.3 grams. At the .05 significance level, can we reject the claim?

6.2.1 T-test statistics

First, we calculate the t-test statistic from the information given in Exercise 3. In this case, we use the t-statistic because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\).

## [1] 2.108185

6.2.2 Critical value

Then, we calculate the right critical value.

## [1] 1.684875

The test statistic 2.108185 is greater than the critical value of 1.684875. Hence, at the .05 significance level, we can reject the claim that each cookie in a bag of Garuda Food Indonesia’s product contains at most 2 grams of saturated fat.

6.2.3 P-value

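Alternative Solution: Instead of using the critical value, we apply the pt function. The value printed below is the lower tail probability of the test statistic; for a right-tailed test the p-value is the upper tail probability, \(1 - 0.979254 = 0.020746\).

## [1] 0.979254

As the p-value 0.020746 is less than the .05 significance level, we reject the null hypothesis that \(\mu \leq 2\), in agreement with the critical value approach.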
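
A minimal sketch of Exercise 3 in R, assuming the right-tailed p-value is the upper tail probability of the t statistic. The variable names are illustrative.

```r
# Exercise 3: right-tailed t-test (inputs taken from the exercise text)
xbar  <- 2.1    # sample mean (grams)
mu0   <- 2      # hypothesized upper bound of the population mean
s     <- 0.3    # sample standard deviation
n     <- 40     # sample size
alpha <- 0.05   # significance level

t_stat  <- (xbar - mu0) / (s / sqrt(n))             # test statistic
t_crit  <- qt(1 - alpha, df = n - 1)                # right critical value
p_lower <- pt(t_stat, df = n - 1)                   # lower tail probability
p_val   <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # right-tailed p-value

print(c(t = t_stat, t_crit = t_crit, p_lower = p_lower, p_value = p_val))
```

With this convention the critical value rule and the p-value rule lead to the same decision.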

7 Type I ~ Two Tail T-test

The null hypothesis of the two-tailed test of the population mean \(\mu\), with unknown population standard deviation \(\sigma\), can be expressed as follows:

\[H_0: \mu = \mu_0\]

where \(\mu_0\) is a hypothesized value of the true population mean \(\mu\).

Let us define the test statistic \(t\) in terms of the sample mean, the sample size and the sample standard deviation \(s\):

\[t={\bar{x}-\mu_0 \over s/\sqrt{n}}\]

Then the null hypothesis of the two-tailed test is to be rejected if \(t \le -t_{\alpha/2}\) or \(t \ge t_{\alpha/2}\), where \(t_{\alpha/2}\) is the \(100(1-\alpha/2)\) percentile of the Student \(t\) distribution with \(n-1\) degrees of freedom.

7.1 Example 4

Some journals concluded that the average weight of Hachiko Dogs around the world ten years ago was 15.4 kg. Researchers want to check whether the average weight of this breed has changed after ten years. They therefore draw a random sample of 35 dogs of the same breed this year and find that the mean dog weight is 14.6 kg. Assume the sample standard deviation is 2.5 kg. At the .05 significance level, can we reject the null hypothesis that the mean weight of Hachiko Dogs does not differ from ten years ago?

7.1.1 T-test statistics

First, we calculate the t-test statistic from the information given in Example 4. In this case, we use the t-statistic because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\).

## [1] -1.893146

7.1.2 Critical value

Then, we calculate the left and right critical values.

## [1] -2.032245  2.032245

The test statistic -1.8931 lies between the critical values -2.0322 and 2.0322. Hence, at the .05 significance level, we do not reject the null hypothesis that the mean dog weight does not differ from ten years ago.

7.1.3 P-value

Alternative solution: Instead of using the critical value, we apply the 2*pt() function to compute the two tail p-value of the test statistic.

## [1] 0.06687552

Since it turns out to be greater than the .05 significance level, we do not reject the null hypothesis that \(\mu = 15.4\).
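
Example 4 can be sketched in R as follows. The inputs come from the example; the variable names are illustrative.

```r
# Example 4: two-tailed t-test (inputs taken from the example text)
xbar  <- 14.6   # sample mean (kg)
mu0   <- 15.4   # hypothesized population mean
s     <- 2.5    # sample standard deviation
n     <- 35     # sample size
alpha <- 0.05   # significance level

t_stat <- (xbar - mu0) / (s / sqrt(n))                    # test statistic
t_crit <- qt(c(alpha / 2, 1 - alpha / 2), df = n - 1)     # critical values, 34 degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)                # two-tailed p-value

print(t_stat)
print(t_crit)
print(p_val)
```

The statistic equals the z value of Example 2 because the inputs are the same; only the reference distribution, and therefore the critical values and the p-value, change.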

7.2 Exercise 4

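We want to test the hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg. The dataset at hand has measurements on 55 patients (the same data as in Exercise 2), and the population standard deviation is now treated as unknown.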

7.2.1 T-test statistics

First, we calculate the t-test statistic from the information given in Exercise 4. In this case, we use the t-statistic because the population standard deviation \(\sigma\) is unknown and is estimated by the sample standard deviation \(s\).

## [1] -3.869272

7.2.2 Critical value

Then, we calculate the left and right critical values.

## [1] -2.004879  2.004879

The test statistic -3.869272 lies outside the interval between the critical values -2.004879 and 2.004879 (it is below the lower critical value). Hence, at the .05 significance level, we reject the null hypothesis that the mean systolic blood pressure in the population equals 140 mmHg.

7.2.3 P-value

Alternative solution: Instead of using the critical value, we apply the 2*pt() function to compute the two tail p-value of the test statistic.

## [1] 0.0002961114

Since it turns out to be less than the .05 significance level, we reject the null hypothesis that \(\mu = 140\).
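
A sketch of Exercise 4 in R, assuming the data are the `mmhg` values of the table in Exercise 2, typed in here as a vector. The manual calculation is compared with the built-in `t.test`; the variable names are illustrative.

```r
# Exercise 4: two-tailed t-test on the blood pressure data (mmhg column of the Exercise 2 table)
mmhg <- c(120, 115,  94, 118, 111, 102, 102, 131, 104, 107,
          115, 139, 115, 113, 114, 105, 115, 134, 109, 109,
           93, 118, 109, 106, 125, 150, 142, 119, 127, 141,
          149, 144, 142, 149, 161, 143, 140, 148, 149, 141,
          146, 159, 152, 135, 134, 161, 130, 125, 141, 148,
          153, 145, 137, 147, 169)

mu0   <- 140            # hypothesized population mean (mmHg)
n     <- length(mmhg)   # 55 patients
alpha <- 0.05

t_stat <- (mean(mmhg) - mu0) / (sd(mmhg) / sqrt(n))     # manual test statistic
t_crit <- qt(c(alpha / 2, 1 - alpha / 2), df = n - 1)   # critical values, 54 degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)              # two-tailed p-value

tt <- t.test(mmhg, mu = mu0)   # built-in one-sample t-test, two-sided by default
print(c(t = t_stat, p_value = p_val))
print(tt)
```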

8 Type II ~ One Tail

In a left/right tail test of the population mean, the null hypothesis claims that the true population mean \(\mu\) is at least (left tail) or at most (right tail) a given hypothetical value \(\mu_0\).

\[\text{Type II Error $H_0$} = \begin{cases} {\mu \ge \mu_0} & \text{Left Tail} \\ {\mu \le \mu_0} & \text{Right Tail} \end{cases}\]

A type II error occurs if the hypothesis test based on a random sample fails to reject the null hypothesis even when the true population mean \(\mu\) is in fact less than \(\mu_0\) (left tail) or greater than \(\mu_0\) (right tail).

8.1 Example 5

Suppose the manufacturer claims that the mean speed of a motorcycle is more than 100 km/hour. Assume the actual mean speed is 99.5 km/hour and the population standard deviation is 1.2 km/hour. At the .05 significance level, what is the probability of a type II error for a sample size of 30 motorcycles?

8.1.1 Standard Error of the Mean

We begin by computing the standard error of the mean, sem.

## [1] 0.219089

Next compute the lower bound of sample means for which the null hypothesis \(\mu \ge 100\) would not be rejected.

8.1.2 Sample Mean

## [1] 99.63963

Therefore, so long as the sample mean is greater than 99.64 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 99.50, we can compute the probability of the sample mean being above 99.64, and thus find the probability of type II error.

8.1.3 Probability of Error

## [1] 0.261957

If the motorcycle sample size is 30, the actual mean motorcycle speed is 99.5 km/hour and the population standard deviation is 1.2 km/hour, then the probability of type II error for testing the null hypothesis \(\mu \ge 100\) at the .05 significance level is 26.2%, and the power of the hypothesis test is 73.8%.
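
The three steps of Example 5 can be sketched in R as follows, assuming an actual mean speed of 99.5 km/hour as used in the calculation above. The variable names are illustrative.

```r
# Example 5: type II error probability of a left-tailed z-test
mu0     <- 100    # hypothesized lower bound of the population mean (km/hour)
mu_true <- 99.5   # assumed actual population mean
sigma   <- 1.2    # population standard deviation
n       <- 30     # sample size
alpha   <- 0.05   # significance level

sem <- sigma / sqrt(n)                    # standard error of the mean
q   <- qnorm(alpha, mean = mu0, sd = sem) # H0 is not rejected when the sample mean exceeds q

# Type II error: the sample mean lands above q although the true mean is mu_true
beta  <- pnorm(q, mean = mu_true, sd = sem, lower.tail = FALSE)
power <- 1 - beta

print(c(sem = sem, bound = q, beta = beta, power = power))
```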

8.2 Exercise 5

Right tail: Garuda-food Indonesia claims that each cookie in a bag of its product contains at most 2 grams of saturated fat. Assume the actual mean amount of saturated fat per cookie is 2.075 grams and the population standard deviation is 0.25 grams. At the .05 significance level, what is the probability of a type II error for a sample size of 35 cookies?

8.2.1 Standard Error of the Mean

We begin by computing the standard error of the mean, sem.

## [1] 0.04225771

Next compute the upper bound of sample means for which the null hypothesis \(\mu \leq 2\) would not be rejected. Because the test is right-tailed, this bound is \(\mu_0 + z_\alpha \cdot \text{sem} = 2 + 1.644854 \times 0.04225771\).

8.2.2 Sample Mean

## [1] 2.069508

Therefore, so long as the sample mean is less than 2.0695 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 2.075, we can compute the probability of the sample mean being below 2.0695, and thus find the probability of type II error.

8.2.3 Probability of Error

The probability is \(P(\bar{x} \le 2.0695 \mid \mu = 2.075) = \Phi\left((2.0695 - 2.075)/0.04225771\right) \approx \Phi(-0.13) \approx 0.448\), where \(\Phi\) is the standard normal cumulative distribution function.

If the sample size is 35 cookies, the actual mean amount of saturated fat per cookie is 2.075 grams and the population standard deviation is 0.25 grams, then the probability of type II error for testing the null hypothesis \(\mu \leq 2\) at the .05 significance level is about 44.8%, and the power of the hypothesis test is about 55.2%.
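
A sketch of the right-tailed calculation in R, under two stated assumptions: 0.25 grams is treated as the population standard deviation, and the null hypothesis is not rejected when the sample mean stays at or below the 95th percentile of its sampling distribution under \(\mu_0 = 2\). The variable names are illustrative.

```r
# Exercise 5: type II error probability of a right-tailed z-test
mu0     <- 2       # hypothesized upper bound of the population mean (grams)
mu_true <- 2.075   # assumed actual population mean
sigma   <- 0.25    # standard deviation, treated here as known
n       <- 35      # sample size
alpha   <- 0.05    # significance level

sem <- sigma / sqrt(n)                         # standard error of the mean
q   <- qnorm(1 - alpha, mean = mu0, sd = sem)  # H0 is not rejected when the sample mean is below q

# Type II error: the sample mean lands below q although the true mean is mu_true
beta  <- pnorm(q, mean = mu_true, sd = sem)
power <- 1 - beta

print(c(sem = sem, bound = q, beta = beta, power = power))
```

Compared with the left-tailed case, both the quantile (`1 - alpha` instead of `alpha`) and the tail of the final probability are mirrored.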

9 Type II ~ Two Tail

In a two-tailed test of the population mean, the null hypothesis claims that the true population mean \(\mu\) is equal to a given hypothetical value \(\mu_0\).

\[H_0: \mu = \mu_0\]

A type II error occurs if the hypothesis test based on a random sample fails to reject the null hypothesis even when the true population mean \(\mu\) is in fact different from \(\mu_0\).

Assume that the population has a known standard deviation \(\sigma\). By the Central Limit Theorem, the population of all possible means of samples of sufficiently large size n approximately follows the normal distribution. Hence we can compute the range of sample means for which the null hypothesis will not be rejected, and then obtain an estimate of the probability of type II error.

9.1 Example 6

Some journals concluded that the average weight of Hachiko Dogs around the world ten years ago was 15.4 kg. Researchers want to check whether the average weight of this breed has changed after ten years. Assume the actual mean population weight is 15.1 kg, and the population standard deviation is 2.5 kg. At the .05 significance level, what is the probability of a type II error for a sample size of 35 Hachiko Dogs?

9.1.1 Standard Error of the Mean

We begin by computing the standard error of the mean, sem.

## [1] 0.4225771

Next compute the left and right bounds of sample means for which the null hypothesis \(\mu = 15.4\) would not be rejected.

9.1.2 Sample Mean

## [1] 14.57176 16.22824

Therefore, so long as the sample mean is between 14.572 and 16.228 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 15.1, we can compute the lower tail probabilities of both end points.

9.1.3 Probability of Error

## [1] 0.1056435 0.9962062
## [1] 0.8905627

Finally, the probability of type II error is the probability between the two end points. If the sample size of Hachiko Dogs is 35, the actual mean population weight is 15.1 kg and the population standard deviation is 2.5 kg, then the probability of type II error for testing the null hypothesis \(\mu = 15.4\) at .05 significance level is 89.1%, and the power of the hypothesis test is 10.9%.
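
Example 6 can be sketched in R as follows. The inputs come from the example; the variable names are illustrative.

```r
# Example 6: type II error probability of a two-tailed z-test
mu0     <- 15.4   # hypothesized population mean (kg)
mu_true <- 15.1   # assumed actual population mean
sigma   <- 2.5    # population standard deviation
n       <- 35     # sample size
alpha   <- 0.05   # significance level

sem <- sigma / sqrt(n)                                # standard error of the mean
q   <- mu0 + c(-1, 1) * qnorm(1 - alpha / 2) * sem    # bounds of the non-rejection region

p     <- pnorm(q, mean = mu_true, sd = sem)  # lower tail probabilities of both end points
beta  <- diff(p)                             # probability between the end points = type II error
power <- 1 - beta

print(q)
print(p)
print(c(beta = beta, power = power))
```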

9.2 Exercise 6

Under the same assumptions as Example 6, if the actual mean population weight is 14.9 kg, what is the probability of a type II error? What is the power of the hypothesis test?

9.2.1 Standard Error of the Mean

We begin by computing the standard error of the mean, sem.

## [1] 0.4225771

Next compute the left and right bounds of sample means for which the null hypothesis \(\mu = 15.4\) would not be rejected.

9.2.2 Sample Mean

## [1] 14.57176 16.22824

Therefore, so long as the sample mean is between 14.572 and 16.228 in a hypothesis test, the null hypothesis will not be rejected. Since we assume that the actual population mean is 14.9, we can compute the lower tail probabilities of both end points.

9.2.3 Probability of Error

## [1] 0.2186537 0.9991644
## [1] 0.7805107

Finally, the probability of type II error is the probability between the two end points. If the sample size of Hachiko Dogs is 35, the actual mean population weight is 14.9 kg and the population standard deviation is 2.5 kg, then the probability of type II error for testing the null hypothesis \(\mu = 15.4\) at .05 significance level is 78%, and the power of the hypothesis test is 22%.
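
Since Example 6 and Exercise 6 differ only in the assumed actual mean, the calculation can be wrapped in a small helper. The function `type2_two_tailed` below is a hypothetical name introduced for illustration, not part of any package.

```r
# Hypothetical helper: type II error probability of a two-tailed z-test
type2_two_tailed <- function(mu0, mu_true, sigma, n, alpha = 0.05) {
  sem <- sigma / sqrt(n)                              # standard error of the mean
  q   <- mu0 + c(-1, 1) * qnorm(1 - alpha / 2) * sem  # non-rejection bounds under H0
  diff(pnorm(q, mean = mu_true, sd = sem))            # probability of staying inside the bounds
}

beta_example6  <- type2_two_tailed(mu0 = 15.4, mu_true = 15.1, sigma = 2.5, n = 35)
beta_exercise6 <- type2_two_tailed(mu0 = 15.4, mu_true = 14.9, sigma = 2.5, n = 35)

print(c(beta = beta_exercise6, power = 1 - beta_exercise6))
```

The further the actual mean lies from 15.4, the smaller the type II error probability and the larger the power.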

10 Mind Map (Hypothesis)

Figure: Mind Map Hypothesis Testing

