2020-08-30

Hypothesis

The word “Hypothesis” is a combination of two words:
\[Hypo + Thesis.\]

Hypo means “just below” and thesis means some kind of theory. A hypothesis therefore sits just below a theory: it is a claim that has not yet been proven and must first go through several tests using sample data and statistical tools and techniques.

The statistical hypothesis is a statement or presumption or claim about the unknown population parameters. (The characteristics of population like mean \(\mu\), standard deviation \(\sigma\), proportion \(P\) are known as parameters.)

In other words, a hypothesis is a generalization about the population parameters based on sample information.

  • The statistical hypotheses are mathematically precise.
  • They are less loosely worded than research hypotheses.
  • They do correspond to specific claims about the parameters of the population.
  • In short they are explicit statements about the population parameter.

Examples

  1. Suppose the operations manager of a light bulb factory claims that the light bulbs produced by the factory have an average life of 5000 hrs.

  2. A botanist wants to claim that the average petal length of an iris flower is 5 cm.

  3. The proportion of female students is higher than that of male students at the undergraduate level.

  4. The average income of auto rickshaw drivers in Butwal is higher than that of drivers in Bhairahawa.

Single (or one) sample tests and two-sample tests

A hypothesis test that uses a sample from a single population to test a claim about a parameter is called a single sample test.

Examples 1 and 2 are single sample tests.

A test that uses samples from two populations is known as a two-sample test.

Examples 3 and 4 are two-sample tests.

Single (One) sample tests

Test of significance of mean

Suppose we want to test whether the true population mean (\(\mu\)) is equal to some specified (or hypothesized) value, say \(\mu_0\). We can then carry out the test of hypothesis as below:

Steps in hypothesis testing

  1. Null hypothesis (H0): \(\mu = \mu_0\)
    Against
    Alternative hypothesis (H1): \(\mu\neq \mu_0\) or \(\mu < \mu_0\) or \(\mu > \mu_0\)

  2. Level of significance (\(\alpha\)) = 0.05 (unless stated otherwise)

  3. The test statistic:

    \({\displaystyle z = \frac{\overline{x} - \mu_0}{\sigma/ \sqrt{n}}}\)

  4. Decision rule: Accept \(H_0\) if the calculated \(|z| \leq\) tabulated z at the \(\alpha\) level of significance. Otherwise, reject \(H_0\).

  5. Conclusion: Conclusion can be drawn on the basis of decision.
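The five steps above can be sketched as a small R function. This is only an illustration under stated assumptions: the function name `z_test_mean`, its arguments and the numbers in the usage line are made up for this example and do not come from any package or from the text. It assumes the standard deviation passed in is either the known \(\sigma\) or a large-sample \(s\).

```r
# Hypothetical helper (not from any package): one-sample z test for a mean.
# alternative is one of "two.sided", "less" or "greater".
z_test_mean <- function(x_bar, mu0, sd, n, alpha = 0.05,
                        alternative = "two.sided") {
  # Step 3: test statistic
  z <- (x_bar - mu0) / (sd / sqrt(n))
  # Tabulated (critical) value: alpha is split over both tails for a two-sided H1
  z_tab <- if (alternative == "two.sided") qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  # Step 4: decision rule
  reject <- switch(alternative,
                   two.sided = abs(z) > z_tab,
                   less      = z < -z_tab,
                   greater   = z > z_tab)
  list(z = z, z_tab = z_tab, reject_H0 = reject)
}

# Made-up numbers, for illustration only
res <- z_test_mean(x_bar = 52, mu0 = 50, sd = 10, n = 100)
res$z          # 2
res$reject_H0  # TRUE, because 2 exceeds the two-tailed value of about 1.96
```

Returning the statistic, the tabulated value and the decision together keeps step 5 (the conclusion) a matter of reading off `reject_H0`.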

Selection of correct test statistic (z or t):

The t-test is applied when the sample is small (n \(\leq 30\)) and the population standard deviation (\(\sigma\)) is unknown.

Otherwise we use the z-test.
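The selection rule can be written as a one-line check. The sketch below is hypothetical (the name `choose_test` is invented here) and simply encodes the rule exactly as stated above, including the n \(\leq 30\) cut-off.

```r
# Hypothetical helper: pick the test statistic using the rule stated above.
# "t" only when the sample is small (n <= 30) AND sigma is unknown; "z" otherwise.
choose_test <- function(n, sigma_known) {
  if (!sigma_known && n <= 30) "t" else "z"
}

choose_test(n = 64, sigma_known = FALSE)  # "z": large sample
choose_test(n = 20, sigma_known = FALSE)  # "t": small sample, sigma unknown
choose_test(n = 20, sigma_known = TRUE)   # "z": sigma known
```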

Figure showing the selection of test statistic on the basis of \(\sigma\) and sample size(n)

Numericals:

A medical researcher wants to test whether the average systolic blood pressure of the people in the community is 120 mm of Hg. For this, he collected data on the systolic blood pressure of 64 people in the community and found the mean and standard deviation to be 125 mm of Hg and 6.5 mm of Hg respectively. With the help of these data, can we conclude that the average systolic blood pressure of the people in the community as a whole is 120 mm of Hg? Use the 5 % level of significance.

Solution

  1. Null hypothesis (H0): \(\mu = 120\) mm of Hg
    Against,
    Alternative hypothesis(H1) : \(\mu\neq 120\) mm of Hg

  2. Level of significance (\(\alpha\)) = 0.05

  3. The test statistic: \({\displaystyle z = \frac{\overline{x} - \mu_0}{s/ \sqrt{n}}}\)
    Where:

  • \(\overline x\) = sample mean = 125 mm of Hg

  • \(\mu_0\) = hypothesized mean = 120 mm of Hg

  • \(s\) = sample standard deviation = 6.5 mm of Hg, and

  • \(n\) = sample size = 64 people

After substitution and computation we get,

s.mean = 125
s.std.dev = 6.5
sample.size = 64
hypo.mean = 120
z = (s.mean - hypo.mean)/(s.std.dev/sqrt(sample.size))
z
[1] 6.153846

\(\therefore\) Cal |z| = 6.15
and the tabulated value of z at 5 % level of significance = 1.96

  4. Decision:

Since the cal |z| > tabulated z at 5 %,
we reject the null hypothesis (H0).
\(\therefore\) we accept the alternative hypothesis (H1).

  5. Conclusion

Hence, we conclude that the average systolic blood pressure of the people in the community is significantly different from 120 mm of Hg.
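As a cross-check of the solution above, the tabulated value need not be read from a table: base R's `qnorm()` returns it directly. The sketch below reuses the figures from the question; the variable names are chosen here for illustration.

```r
# Blood pressure example: two-tailed z test at the 5% level
x_bar <- 125   # sample mean (mm of Hg)
mu0   <- 120   # hypothesized mean (mm of Hg)
s     <- 6.5   # sample standard deviation (mm of Hg)
n     <- 64    # sample size
alpha <- 0.05

z     <- (x_bar - mu0) / (s / sqrt(n))  # about 6.15
z_tab <- qnorm(1 - alpha / 2)           # about 1.96 (alpha split over two tails)
abs(z) > z_tab                          # TRUE, so H0 is rejected
```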

Reference for tabulated value of Z.

Figure showing the tabulated values of z at different level of significance.
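The commonly used tabulated values can also be generated with `qnorm()`. The sketch below builds a small table for three levels of significance; the column names are chosen here for illustration.

```r
# Tabulated (critical) z values at common levels of significance
alpha <- c(0.10, 0.05, 0.01)
z_table <- data.frame(
  alpha      = alpha,
  two_tailed = round(qnorm(1 - alpha / 2), 3),  # alpha split over both tails
  one_tailed = round(qnorm(1 - alpha), 3)       # all of alpha in one tail
)
z_table
```

The two-tailed column comes out as 1.645, 1.960 and 2.576, and the one-tailed column as 1.282, 1.645 and 2.326.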

Homework

Refer to example 1. To test the claim made by the manager, a researcher sampled 100 light bulbs from a day’s production and put them in an experiment. At the end of the experiment the researcher found the average life of the light bulbs to be 4900 hrs, with a standard deviation of 120 hrs. On the basis of these data, can it be concluded that the claim made by the operations manager of the factory is legitimate? Use the 10 % level of significance.

Solution

In the usual notation, we are given:

  • Hypothesized mean life (Claim) of light bulbs (\(\mu_0\)) = 5000 hrs

  • Sample size (n) = 100 light bulbs

  • Sample Mean life of light bulbs (\(\overline x\)) = 4900 hrs

  • Sample standard deviation (\(s\)) = 120 hrs

Now, setting up the null and alternative hypotheses:

  • H0: \(\mu = 5000\) hrs [The true mean life of the light bulbs is equal to 5000 hrs]

  • H1: \(\mu \neq 5000\) hrs [The true mean life of the light bulbs is not equal to 5000 hrs]

  • Level of significance (\(\alpha\)) = 0.1

  • Test statistic: \({\displaystyle Z = \frac{\overline x - \mu_0}{s / \sqrt{n}}}\)

  • \({\displaystyle Z = \frac{4900 - 5000}{120 / \sqrt{100}} = -8.33}\)

  • \(\therefore Cal |Z| = 8.33\)

  • Tabulated Z at 10 % level of significance = 1.645

  • Decision: Since \(Cal |Z| > Tabulated\ Z\) at 10 %, we reject the H0. \(\therefore\) We accept H1.

  • Conclusion: Hence, we conclude that the claim made by the operations manager is not legitimate.
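The homework solution can be checked in R with the figures given in the question. The statistic itself is negative because the sample mean lies below the claimed mean; it is its absolute value that is compared with the tabulated value. Variable names are chosen here for illustration.

```r
# Light bulb example: two-tailed z test at the 10% level
x_bar <- 4900  # sample mean life (hrs)
mu0   <- 5000  # claimed mean life (hrs)
s     <- 120   # sample standard deviation (hrs)
n     <- 100   # sample size
alpha <- 0.10

z     <- (x_bar - mu0) / (s / sqrt(n))  # about -8.33
z_tab <- qnorm(1 - alpha / 2)           # about 1.645
abs(z) > z_tab                          # TRUE, so H0 is rejected
```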

Numericals:

  1. A manufacturer of ball pens claims that the ball pens manufactured by the company have a mean writing life of 400 pages with a standard deviation of 20 pages. A research scholar selects a sample of 45 ball pens and puts them to the test. The mean writing life for the sample was found to be 395 pages. Should the scholar report that the average life of the ball pens has decreased relative to the manufacturer’s claim, at the 1 % level of significance?

Solution

In the usual notation, we are given:

  • Hypothesized mean life of ball pen (\(\mu_0\)) = 400 pages

  • Population standard deviation (\(\sigma\)) = 20 pages

  • Sample size (n) = 45 ball pens

  • Sample Mean life of ball pen (\(\overline x\)) = 395 pages

Now, testing of hypothesis:

  • H0: \(\mu = 400\) pages [The true mean life of the ball pen is equal to 400 pages]

  • H1: \(\mu < 400\) pages [The true mean life of the ball pen is less than 400 pages, i.e. decreasing]

  • Level of significance (\(\alpha\)) = 0.01

  • Test statistic: \({\displaystyle Z = \frac{\overline x - \mu_0}{\sigma / \sqrt{n}}}\)

  • \({\displaystyle Z = \frac{395 - 400}{20 / \sqrt{45}} = -1.677}\)

  • \(\therefore Cal |Z| = 1.677\)

  • Tabulated Z at 1 % level of significance = 2.326 (Why? Because the H1 is one tailed.)

  • Decision: Since \(Cal |Z| < Tabulated\ Z\) at 1 %, we accept the H0.

  • Conclusion: Hence, the research scholar should not report a decrease. The sample does not give sufficient evidence at the 1 % level that the mean writing life of the ball pens is below 400 pages, so the manufacturer’s claim is still valid.
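The ball pen solution can be checked the same way. Because H1 is one tailed (left tail), the whole of \(\alpha\) sits in one tail, so the tabulated value is `qnorm(1 - alpha)`, about 2.33, rather than `qnorm(1 - alpha / 2)`. Variable names are chosen here for illustration.

```r
# Ball pen example: left-tailed z test at the 1% level, sigma known
x_bar <- 395   # sample mean writing life (pages)
mu0   <- 400   # claimed mean writing life (pages)
sigma <- 20    # population standard deviation (pages)
n     <- 45    # sample size
alpha <- 0.01

z     <- (x_bar - mu0) / (sigma / sqrt(n))  # about -1.677
z_tab <- qnorm(1 - alpha)                   # one-tailed value, about 2.326
z < -z_tab                                  # FALSE, so H0 is not rejected
```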

  2. The mean income of a random sample of 100 employees of an industrial concern was found to be Rs. 3000. If the standard deviation of the population was Rs. 250, find the standard error of the mean and also test whether the sample mean differs significantly from the population mean of Rs. 2850.

Hint

The standard error of mean \((\sigma_{\overline x}) = \frac{\sigma}{\sqrt{n}}\)
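A minimal sketch of the hint in R, using the figures from the question. The question does not state a level of significance, so the 5 % default from the steps above and a two-tailed alternative are assumed here.

```r
# Income example: standard error of the mean, then a two-tailed z test
sigma <- 250    # population standard deviation (Rs.)
n     <- 100    # sample size
x_bar <- 3000   # sample mean (Rs.)
mu0   <- 2850   # population mean under H0 (Rs.)
alpha <- 0.05   # assumed: not stated in the question

se <- sigma / sqrt(n)           # standard error of the mean = 25
z  <- (x_bar - mu0) / se        # 6
abs(z) > qnorm(1 - alpha / 2)   # TRUE at the assumed 5% level
```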

Computation of sample mean (\(\overline x\)) and sample standard deviation (\(s\))

The sample mean (\(\overline x\)) and sample standard deviation (\(s\)) are computed as below:

  1. Sample mean (\(\overline x\)) = \(\frac{\sum {x}}{n}\)

  2. Sample standard deviation (\(s\)) = \({\sqrt{\frac{1}{n-1}{\sum (x - \overline x)^2}}}\)

As a shortcut, \(s\) can be computed as:

\(s\ =\ \sqrt{\frac{1}{n-1}[\sum x^2 - n\overline x^2]}\)
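The two formulas for \(s\) can be compared numerically. The data vector below is made up purely for illustration; R's built-in `sd()` also uses the \(n-1\) divisor, so all three results should agree.

```r
# Made-up data, for illustration only
x <- c(4, 7, 5, 9, 10)
n <- length(x)

x_bar   <- sum(x) / n                                # sample mean = 7
s_def   <- sqrt(sum((x - x_bar)^2) / (n - 1))        # definition
s_short <- sqrt((sum(x^2) - n * x_bar^2) / (n - 1))  # shortcut formula

c(s_def, s_short, sd(x))   # all three are sqrt(6.5), about 2.55
```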

Now you can try question number 6 on page 371.

Remember

Population standard deviation (\(\sigma\)) = \({\sqrt{\frac{1}{n}{\sum (x - \overline x)^2}}}\)

Or, the shortcut formula for \(\sigma\) is

\(\sigma = \sqrt{\frac{1}{n}\sum x^2 - (\overline {x})^2}\)
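A short numerical check of the two formulas for \(\sigma\), again with made-up data. Note that R's `sd()` divides by \(n-1\), so it does not return this quantity directly; it has to be rescaled by \(\sqrt{(n-1)/n}\).

```r
# Made-up data, for illustration only
x <- c(4, 7, 5, 9, 10)
n <- length(x)

sigma_def   <- sqrt(sum((x - mean(x))^2) / n)  # definition with divisor n
sigma_short <- sqrt(sum(x^2) / n - mean(x)^2)  # shortcut formula

c(sigma_def, sigma_short, sd(x) * sqrt((n - 1) / n))  # all sqrt(5.2), about 2.28
```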