Hypothesis testing

Author

Sai Sravya

ABSTRACT

Hypothesis testing is a statistical tool for assessing whether a particular assumption about a whole population is supported by the data. It evaluates the validity of an inference by analyzing sample data drawn from that population.

Hypothesis testing rests on probability: it asks how likely the observed sample result would be if the primary (null) hypothesis were true, and uses that probability to decide whether the hypothesis should be rejected. It is widely applied in research, including biology, criminal trials, marketing, and manufacturing.

INTRODUCTION

Hypothesis testing uses sample data to evaluate a research claim. Researchers speculate on relationships between various factors, collect data to test those relationships, and draw conclusions based on the data. In statistics, it is important to account for randomness: an observed effect may have arisen by chance alone. Hypothesis testing does not eliminate this uncertainty, but it quantifies how plausible chance is as an explanation.

For every research experiment, there are two main competing explanations: the null hypothesis and the alternative hypothesis. It is often difficult to prove a theory directly; therefore, investigators test whether the data allow them to reject the null hypothesis. When the null hypothesis is rejected, the alternative hypothesis is accepted as the better-supported explanation.

For example, suppose we believe that the mean return of the NASDAQ stock index is not zero. The null hypothesis would then state: 'the mean return of the NASDAQ is zero.' Tests can be conducted at different levels of statistical significance.

Hypothesis tests are prone to two kinds of error: type 1 and type 2. A type 1 error occurs when the sample outcome leads to rejecting the null hypothesis even though it is true. A type 2 error occurs when the sample data fail to reject the null hypothesis even though it is false.
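
The two error types can be made concrete with a short R sketch. It simulates a lower-tail z-test many times with the null hypothesis true and reports how often the test rejects, which should be close to α. It then computes the type 2 error probability analytically for one alternative mean. The values of `mu0`, `mu1`, `sigma`, `n`, and `n_sim` are illustrative assumptions, not taken from the text.

```r
# Illustrative sketch: all numeric settings below are assumptions.
set.seed(1)                       # fixed seed so repeated runs agree
mu0   <- 0                        # mean under the null hypothesis
sigma <- 1                        # population standard deviation (assumed known)
n     <- 25                       # sample size
alpha <- 0.05                     # significance level
n_sim <- 10000                    # number of simulated samples

# Type 1 error: draw data with H0 true and count how often H0 is rejected.
rejected <- replicate(n_sim, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  z <= qnorm(alpha)               # lower-tail rejection rule
})
type1_rate <- mean(rejected)

# Type 2 error: probability of NOT rejecting H0 when the true mean is mu1.
mu1   <- -0.5                     # hypothetical true mean under the alternative
shift <- (mu1 - mu0) / (sigma / sqrt(n))
type2_prob <- 1 - pnorm(qnorm(alpha) - shift)

c(type1_rate = type1_rate, type2_prob = type2_prob)
```

Under these assumptions the simulated type 1 rate should land near 0.05, and the type 2 probability is roughly 0.20.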

HYPOTHESIS TESTING TYPES

Based on how the hypotheses are stated, hypothesis testing is further categorized into sub-types:

Simple: In a simple hypothesis, the population parameter is stated as a specific value, making the analysis easier.
Composite: In a composite hypothesis, the population parameter is stated as a range of values rather than a single value.
One-tailed: A test is one-tailed when the alternative hypothesis points in one direction only, that is, the population parameter is claimed to be either higher or lower than a given value. The critical region lies in one tail of the sampling distribution.
Two-tailed: A test is two-tailed when the alternative hypothesis claims only that the population parameter differs from a given value, in either direction. The critical region is split between both tails of the sampling distribution.

TESTING OF HYPOTHESIS
1. State the problem
2. Formulate the null and the appropriate alternative hypothesis
A statistical hypothesis is a guess or conjecture about the numerical value of some unknown population parameter. A null hypothesis is denoted by H₀, such as
(a) H₀: μ = 120
(b) H₀: p = 0.5
(c) H₀: μ ≤ 50
When a hypothesis expresses a single value for the unknown parameter, as in (a) and (b), the hypothesis is called a simple hypothesis. Otherwise, as in (c), it is called a composite hypothesis.
3. Specify the level of significance
This means choosing the probability of rejecting a true null hypothesis. It is denoted by α.
4. Determine the appropriate test statistic
5. Compute the actual value of the test statistic from the sample
6.1. Determine the critical values for the sampling distribution and appropriate level of significance

The set of all possible values of the test statistic is divided into two regions:
- rejection/critical region
- nonrejection region
How the range of possible values is divided into the rejection and nonrejection regions depends on the alternative hypothesis.
- If the computed test statistic is in the interval defining the critical region, the null hypothesis is rejected.
- If the computed test statistic is in the interval defining the nonrejection region, the null hypothesis is not rejected.
There are two types of statistical tests:
1. A one-sided test is a test where the critical region is in one direction only.
If μ₀ is some specific constant and the null hypothesis is of the form H₀: μ = μ₀, then the critical region for the one-sided alternative hypothesis H_a: μ > μ₀ is on the right tail of the distribution.

Similarly, for an alternative hypothesis H_a: μ < μ₀, the critical region is on the left tail of the distribution.
2. A two-sided test has an alternative hypothesis of the form H_a: μ ≠ μ₀. Its critical region is on the two tails of the probability distribution.
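
As a minimal sketch, the boundaries of these critical regions for a z-test can be obtained from the standard normal quantile function `qnorm`. The significance level 0.05 is an assumed example value.

```r
alpha <- 0.05                          # assumed significance level

crit_right <- qnorm(1 - alpha)         # Ha: mu > mu0, reject H0 if z >= crit_right
crit_left  <- qnorm(alpha)             # Ha: mu < mu0, reject H0 if z <= crit_left
crit_two   <- qnorm(1 - alpha / 2)     # Ha: mu != mu0, reject H0 if |z| >= crit_two

# Approximately 1.645, -1.645, and 1.960 respectively.
c(right = crit_right, left = crit_left, two_sided = crit_two)
```

The two-sided test splits α evenly between the tails, which is why its cutoff is farther from zero than the one-sided cutoffs.
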
A two-sided hypothesis can also be tested by constructing a (1−α)100% confidence interval for the parameter of interest. If the hypothesized parameter value is contained in the confidence interval, then H₀ is not rejected. Otherwise, H₀ is rejected.
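
The confidence interval approach might look like the following sketch for a mean with known population standard deviation. The summary statistics are hypothetical and chosen only to keep the arithmetic simple.

```r
# Hypothetical summary statistics (illustrative only).
xbar  <- 52      # sample mean
mu0   <- 50      # hypothesized population mean
sigma <- 6       # population standard deviation (assumed known)
n     <- 36      # sample size
alpha <- 0.05

se         <- sigma / sqrt(n)                 # standard error of the mean
half_width <- qnorm(1 - alpha / 2) * se       # margin of error
ci_lower   <- xbar - half_width
ci_upper   <- xbar + half_width

# Reject H0 only if mu0 lies outside the (1 - alpha) confidence interval.
reject_h0 <- mu0 < ci_lower || mu0 > ci_upper
c(lower = ci_lower, upper = ci_upper)
reject_h0
```

Here the interval is about (50.04, 53.96), so the hypothesized value 50 falls just outside it and H₀ is rejected at the 5% level.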

6.2. Determining the p-value of the test statistic

Alternatively, we can find the probability of obtaining the observed result, or one more extreme, if H₀ is true, and use this so-called p-value to choose between the two hypotheses. Decision making will be as follows:
- If the p-value is less than the significance level, p < α, the null hypothesis H₀ is rejected.
- If the p-value is greater than or equal to the significance level, p ≥ α, the null hypothesis H₀ is not rejected.
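
For a z statistic, the p-value can be sketched with the standard normal distribution function `pnorm`. Which tail is used depends on the alternative hypothesis. The observed value `z` below is a hypothetical number for illustration.

```r
z     <- -2.1    # hypothetical observed z statistic
alpha <- 0.05    # assumed significance level

p_left  <- pnorm(z)                      # Ha: mu < mu0 (lower tail)
p_right <- pnorm(z, lower.tail = FALSE)  # Ha: mu > mu0 (upper tail)
p_two   <- 2 * pnorm(-abs(z))            # Ha: mu != mu0 (both tails)

# Decision under each alternative: TRUE means H0 is rejected.
c(left = p_left < alpha, right = p_right < alpha, two_sided = p_two < alpha)
```

With this z, the lower-tail and two-sided tests reject H₀ (p is about 0.018 and 0.036), while the upper-tail test does not (p is about 0.98).
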
7. Make a statistical decision
- The null hypothesis is rejected if the computed value of the test statistic is within the critical region; otherwise H₀ is not rejected.
- Equivalently, the null hypothesis is rejected if the p-value obtained is less than the level of significance α, p < α.
If the null hypothesis is not rejected, it does not follow that H₀ is true. It may be true, but evidence compatible with the null hypothesis is never conclusive. An appropriate conclusion is to state that "there is not sufficient evidence to reject the null hypothesis" rather than concluding that the null hypothesis is true.
8. State the appropriate conclusion
  
  HYPOTHESIS TESTING FORMULA
  
  Researchers opt for different statistical tests like t-tests or z-tests. The z-test formula is as follows:
  
  Z = ( x̅ – μ₀ ) / (σ /√n)
  - Here, x̅ is the sample mean,
  - μ₀ is the hypothesized population mean,
  - σ is the population standard deviation,
  - n is the sample size.
  Based on the z-test result, the researcher decides between the two hypotheses: the null hypothesis or its alternative. For a two-tailed test, they are stated as follows:
  
  H₀: μ=μ₀
  
  H_a: μ≠μ₀
  
  Here,
  
  H₀ = null hypothesis
  
  H_a = alternate hypothesis
  
  If the sample mean is close enough to μ₀ that the test statistic falls in the nonrejection region, the null hypothesis is not rejected. Otherwise, the null hypothesis is rejected in favor of the alternate hypothesis.
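
  A worked instance of the z-test formula for the two-tailed hypotheses above might look like this. The summary statistics are hypothetical; they are chosen to show that a sample mean different from μ₀ does not by itself lead to rejection.

```r
# Hypothetical summary statistics (illustrative only).
xbar  <- 103     # sample mean
mu0   <- 100     # hypothesized population mean
sigma <- 15      # population standard deviation (assumed known)
n     <- 50      # sample size
alpha <- 0.05    # assumed significance level

z    <- (xbar - mu0) / (sigma / sqrt(n))   # z-test statistic
crit <- qnorm(1 - alpha / 2)               # two-tailed critical value

reject_h0 <- abs(z) >= crit
c(z = z, critical_value = crit)
reject_h0
```

  Here z is about 1.41, which is inside the nonrejection region bounded by ±1.96, so H₀ is not rejected even though the sample mean is 3 units above μ₀.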
  
  REAL-LIFE APPLICATIONS OF HYPOTHESIS TESTING
  
  BIOLOGY
  
  Hypothesis tests are often used in biology to determine whether some new treatment, fertilizer, pesticide, chemical, etc. causes increased growth, stamina, immunity, etc. in plants or animals.
  
  For example, suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than the 20 inches they normally grow. To test this, she applies the fertilizer to each of the plants in her laboratory for one month.
  
  She then performs a hypothesis test using the following hypotheses:
  - H₀: μ = 20 inches (the fertilizer will have no effect on the mean plant growth)
  - H_A: μ > 20 inches (the fertilizer will cause mean plant growth to increase)
  If the p-value of the test is less than some significance level (e.g. α = .05), then she can reject the null hypothesis and conclude that the fertilizer leads to increased plant growth.
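
  A sketch of this test in R is shown below. The growth measurements are made-up numbers for illustration only. Because the population standard deviation is not given in this example, the sketch swaps in a one-sample t-test (base R `t.test`) rather than the z-test.

```r
# Hypothetical one-month growth measurements in inches (illustrative only).
growth <- c(21.3, 22.1, 19.8, 23.4, 20.9, 22.7, 21.5, 20.2, 22.9, 21.8)
alpha  <- 0.05   # assumed significance level

# One-sample t-test of H0: mu = 20 against HA: mu > 20.
result <- t.test(growth, mu = 20, alternative = "greater")

t_stat    <- unname(result$statistic)   # t statistic
p_value   <- result$p.value             # upper-tail p-value
reject_h0 <- p_value < alpha
c(t = t_stat, p_value = p_value)
reject_h0
```

  With these made-up measurements the p-value is well below 0.05, so H₀ would be rejected.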
  
  ADVERTISING SPEND
  
  Hypothesis tests are often used in business to determine whether or not some new advertising campaign, marketing technique, etc. causes increased sales.
  
  For example, suppose a company believes that spending more money on digital advertising leads to increased sales. To test this, the company may increase money spent on digital advertising during a two-month period and collect data to see if overall sales have increased.
  
  They may perform a hypothesis test using the following hypotheses:
  - H₀: μ_after = μ_before (the mean sales is the same before and after spending more on advertising)
  - H_A: μ_after > μ_before (the mean sales increased after spending more on advertising)
  If the p-value of the test is less than some significance level (e.g. α = .05), then the company can reject the null hypothesis and conclude that increased digital advertising leads to increased sales.
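
  One way to sketch this comparison in R is a two-sample t-test on sales before and after the change. The weekly sales figures below are hypothetical. The Welch two-sample test (the default in `t.test`) is an assumed choice, since the text does not specify the test.

```r
# Hypothetical weekly sales, in thousands of dollars (illustrative only).
before <- c(102, 98, 110, 105, 99, 101, 107, 103)    # before the increase
after  <- c(111, 108, 115, 104, 112, 109, 117, 110)  # after the increase
alpha  <- 0.05   # assumed significance level

# Welch two-sample t-test of H0: mu_after = mu_before
# against HA: mu_after > mu_before.
result <- t.test(after, before, alternative = "greater")

t_stat    <- unname(result$statistic)
p_value   <- result$p.value
reject_h0 <- p_value < alpha
c(mean_before = mean(before), mean_after = mean(after), p_value = p_value)
reject_h0
```

  If the same stores or regions were measured in both periods, a paired test (`paired = TRUE`) would be the more natural choice.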
  
  PROBLEMS AND SOLUTIONS
  
  Problem
  
  Suppose a manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hours. In a sample of 30 light bulbs, the bulbs were found to last only 9,900 hours on average. Assume the population standard deviation is 120 hours. At the .05 significance level, can we reject the claim by the manufacturer?
  
  Solution
  
  The null hypothesis is that μ ≥ 10000.
  
  The alternative hypothesis is that μ < 10000.
  
  We begin with computing the test statistic.
```
xbar = 9900            # sample mean 
mu0 = 10000            # hypothesized value 
sigma = 120            # population standard deviation 
n = 30                 # sample size 
z =(xbar-mu0)/(sigma/sqrt(n)) 
z  
```
```
[1] -4.564355
```
  Then the null hypothesis of the lower tail test is to be rejected if z ≤ −z_α, where z_α is the 100(1−α) percentile of the standard normal distribution.
  
  We then compute the critical value at the .05 significance level.
```
alpha = .05 
z.alpha = qnorm(1-alpha) 
-z.alpha 
```
```
[1] -1.644854
```
  The test statistic -4.5644 is less than the critical value of -1.6449. Hence, at the .05 significance level, we reject the claim that the mean lifetime of a light bulb is above 10,000 hours.
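
  The same decision can be reached with the p-value approach described earlier. The sketch below recomputes the test statistic from the problem's numbers and takes the lower-tail probability with `pnorm`.

```r
xbar  <- 9900     # sample mean
mu0   <- 10000    # hypothesized value
sigma <- 120      # population standard deviation
n     <- 30       # sample size
alpha <- 0.05     # significance level

z       <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic
p_value <- pnorm(z)                           # lower-tail p-value

p_value
p_value < alpha   # TRUE means the null hypothesis is rejected
```

  The p-value is on the order of 10⁻⁶, far below .05, so the null hypothesis μ ≥ 10000 is rejected, in agreement with the critical value approach.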
  
