The Null Hypothesis

We focus on the null hypothesis and decide whether or not we are able to reject it and accept an alternative hypothesis instead.

The concept of the null hypothesis is not very intuitive. A better term might be “The claim of nothingness.” That probably isn’t a legitimate term, but it may be clearer than “null hypothesis.”

What are some typical null hypotheses?

The value of this parameter is zero.
The mean value of this population is no different from what we have always assumed.
There is no difference between the mean values of these two populations.
There is no difference between the probability of success and of failure. In other words, the probability of success is .5.
There is no difference between the effect of this drug and that of a placebo.

The Alternative Hypothesis

The alternative is what we will accept if we decide to reject the null hypothesis. There are two possibilities:

The parameter is simply not the value assumed in the null hypothesis, but we make no assumption about the direction of the difference. This is called a two-sided alternative.
The parameter differs from the value assumed in the null hypothesis in a specific direction. This is called a one-sided alternative.

The choice of the alternative must be made before the data are analyzed. This is hard to do in a classroom setting, where the results of the data analysis are presented in the problem statement.
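
To see how much the choice of alternative matters, the sketch below computes the p-value for the same test statistic under all three alternatives. The value of `z` is made up for illustration and does not come from any example in this module.

```r
# Hypothetical standardized test statistic (an assumed value for illustration)
z <- 1.8

# Two-sided alternative: the parameter differs from the null value in either direction
p.two.sided <- 2 * pnorm(-abs(z))

# One-sided alternative: the parameter is greater than the null value (upper tail)
p.greater <- pnorm(z, lower.tail = FALSE)

# One-sided alternative: the parameter is less than the null value (lower tail)
p.less <- pnorm(z)

c(two.sided = p.two.sided, greater = p.greater, less = p.less)
```

With a positive `z`, the "greater" p-value is half the two-sided one, while the "less" p-value is large. Choosing the tail after seeing the data would therefore overstate the evidence, which is why the alternative has to be fixed in advance.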

The Process

This is taken from the CMU course.

State the null and alternative hypotheses.
Collect relevant data from a random sample and summarize them (using a test statistic).
Find the p-value, the probability of observing data at least as extreme as those observed, assuming that \(H_{0}\) is true.
Based on the p-value, decide whether we have enough evidence to reject \(H_{0}\) (and accept \(H_{a}\)), and draw our conclusions in context.
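
The four steps can be sketched in R for a coin-fairness question. This is only an illustration: the counts `n.flips` and `n.heads` are hypothetical, the significance level of .05 is an assumed convention, and the exact binomial test from `binom.test()` is one of several reasonable ways to get the p-value.

```r
# Step 1: state the hypotheses.
#   H0: P(heads) = 0.5    Ha: P(heads) != 0.5  (two-sided)
p0 <- 0.5

# Step 2: collect and summarize the data.
# These counts are hypothetical, chosen only to illustrate the steps.
n.flips <- 100
n.heads <- 58

# Step 3: find the p-value (exact binomial test from the stats package).
result <- binom.test(n.heads, n.flips, p = p0, alternative = "two.sided")
result$p.value

# Step 4: decide, using a significance level chosen in advance (assumed .05).
alpha <- 0.05
if (result$p.value < alpha) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}
decision
```

With these made-up counts the p-value is above .05, so the sketch ends with "fail to reject H0".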

Examples

For each of these state the null hypothesis and the alternative hypothesis. Note that to mimic the real-world process we must ignore the results of any data analysis at this stage.

Example 1

A professional gambler is concerned about the possibility that a coin used in flipping rituals might not be fair. She really doesn’t have any particular suspicion. She’s just uncomfortable that it’s never been formally examined. She asks that the coin be tested by flipping it many times to see if it really shows heads 50% of the time.

Solution

The null hypothesis is that the probability of a head is .5.

The alternative is two-sided. Note “She really doesn’t have any particular suspicion.”

Example 2

Another professional gambler is concerned that a coin has been showing heads more frequently than it should. He asks to have it examined for fairness by flipping it many times to see if it is really fair.

Solution

The null hypothesis is that the probability of a head is .5.

The alternative is one-sided. Note “concerned that a coin has been showing heads more frequently than it should.”

Example 3

New York is known as “the city that never sleeps”. The normal amount of sleep for the US population is eight hours per night. A statistician takes a survey of random residents of New York and asks them how many hours they slept last night.

Solution

The null hypothesis is that the mean number of hours slept per night by New York residents is 8.

The alternative is one-sided since you expect the average to be less than 8.

Example 4

A university president is concerned that students are extending break periods by staying home extra days after holidays. On a normal Monday, 10% of students will be absent from class. He asks that professors take roll in class the Monday after Thanksgiving and report the results to him.

The General Case With a Known Standard Deviation

In this situation we claim to know the standard deviation, \(\sigma\), of the population. We have a sample of size \(n\) and a sample mean, \(\bar{x}\). We want to test the hypothesis that the true mean is a hypothesized value \(\mu\) against the two-sided alternative that the true mean is not \(\mu\). Under most conditions (discussed later), we can compute a z-score as a test statistic, which has a standard normal distribution when the null hypothesis is true.

\[z=\frac{\bar{x}-\mu}{\sigma_{\bar{x}}}\]

We can then obtain the p-value as we did above.

The value of \(\sigma_{\bar{x}}\) is computed as \(\frac{\sigma}{\sqrt{n}}\).

The following code snippet does the work.

# Replace the example values as necessary

xbar <- 135    # Sample mean
mu <- 134      # Hypothesized value of the mean
sigma <- 15    # Known population standard deviation
n <- 100       # sample size
sided <- 2     # Specification of the alternative type (2 = two-sided, 1 = one-sided)

# Now do the work
sd.xbar <- sigma/sqrt(n)
z <- (xbar - mu)/sd.xbar
p.value <- sided * pnorm(-abs(z))

# Display the p-value.

p.value

## [1] 0.5049851
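
The same calculation can be wrapped in a function so that the direction of a one-sided alternative is stated explicitly. This is a sketch, not part of the original snippet: the function name `z.test.known.sigma` and its `alternative` argument are hypothetical names introduced here.

```r
# Hypothetical helper: z-test for a mean with a known population
# standard deviation. The name and arguments are illustrative assumptions.
z.test.known.sigma <- function(xbar, mu, sigma, n,
                               alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu) / (sigma / sqrt(n))
  p.value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    less      = pnorm(z),                      # lower tail
    greater   = pnorm(z, lower.tail = FALSE)   # upper tail
  )
  list(z = z, p.value = p.value)
}

# Same inputs as the snippet above (two-sided by default)
res <- z.test.known.sigma(xbar = 135, mu = 134, sigma = 15, n = 100)
res$p.value
```

One difference from `sided * pnorm(-abs(z))` with `sided` set to 1: that expression always uses the tail on the side where the sample mean fell, so it is only right when the data land in the direction named by the alternative. The `less` and `greater` branches here return a large p-value when the sample mean falls on the opposite side.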

Exercises

Use this code to solve the following problems.

Exercise 1

A sample of size 200 yields a mean of 11.2. This is taken from a population with a known standard deviation of .4. Test the null hypothesis that the true mean value is 11 against the alternative that it is not 11.

Solution

# Replace the example values as necessary

xbar <- 11.2   # Sample mean
mu <- 11       # Hypothesized value of the mean
sigma <- .4    # Known population standard deviation
n <- 200       # sample size
sided <- 2     # Specification of the alternative type

# Now do the work
sd.xbar <- sigma/sqrt(n)
z <- (xbar - mu)/sd.xbar
p.value <- sided * pnorm(-abs(z))

# Display the p-value.

p.value

## [1] 1.53746e-12

Since the p-value is far below .05, we reject the null hypothesis.

Exercise 2

A sample of size 1000 yields a mean of 25.59. The known population standard deviation is 1.2. Test the null hypothesis that the true mean is 25.6 against the alternative that the true mean is less than 25.6.

Solution

# Replace the example values as necessary

xbar <- 25.59  # Sample mean
mu <- 25.6     # Hypothesized value of the mean
sigma <- 1.2   # Known population standard deviation
n <- 1000      # sample size
sided <- 1     # Specification of the alternative type

# Now do the work
sd.xbar <- sigma/sqrt(n)
z <- (xbar - mu)/sd.xbar
p.value <- sided * pnorm(-abs(z))

# Display the p-value.

p.value

## [1] 0.3960737

Since the p-value is greater than .05, we are unable to reject the null hypothesis.

The General Case with an Estimated Standard Deviation

In the case where we don’t know the population standard deviation, we have to estimate it from the sample. The computation is almost identical to the earlier case, but instead of a z-score, we call the result a t-statistic. Instead of a standard normal distribution, it has a t distribution. The t distribution is very similar to the standard normal, but it requires that we specify the “degrees of freedom.” In this case, we use the formula \(df = n - 1\). When the sample size, \(n\), is large, the difference between the t distribution and the standard normal distribution becomes negligible. Following the convention from earlier, when we have an estimated standard deviation, we refer to it as \(S\) rather than \(\sigma\).

The following code incorporates these changes.

# Replace the example values as necessary

xbar <- 135    # Sample mean
mu <- 134      # Hypothesized value of the mean
s <- 15        # Sample standard deviation (estimate of sigma)
n <- 100       # sample size
sided <- 2     # Specification of the alternative type

# Now do the work
sd.xbar <- s/sqrt(n)
t <- (xbar - mu)/sd.xbar
p.value <- sided * pt(-abs(t),df=n-1)

# Display the p-value.

p.value
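
The claim that the t distribution approaches the standard normal as \(n\) grows can be checked numerically. The sketch below holds a test statistic fixed and compares the two-sided p-values from `pt()` and `pnorm()` at several sample sizes. The statistic and the sample sizes are hypothetical values chosen for illustration; in a real problem the statistic itself would change with \(n\).

```r
# Compare the two-sided p-value from the t distribution with the one from
# the standard normal, for the same statistic and increasing sample sizes.
t.stat <- 2                           # hypothetical test statistic
n.values <- c(5, 10, 30, 100, 1000)   # hypothetical sample sizes

p.t <- 2 * pt(-abs(t.stat), df = n.values - 1)
p.normal <- 2 * pnorm(-abs(t.stat))

data.frame(n = n.values, p.t = p.t, p.normal = p.normal,
           difference = p.t - p.normal)
```

The t-based p-value is always the larger of the two, because the t distribution has heavier tails, and the gap shrinks steadily as the degrees of freedom increase.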

Hypothesis Testing for a Single Proportion

Here is the scenario. We have a sample of size \(n\) from a large population and we have estimated the proportion of cases in the sample that meet some criterion. This estimated proportion is denoted \(\hat{p}\). We wish to test the null hypothesis that the true population proportion is a specific value denoted \(p_{0}\). Under certain conditions, the quantity \(z\) has a standard normal distribution. \(z\) is computed as:

\[z=\frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\]

What are the “certain conditions” that allow us to assume that \(z\) will have a standard normal distribution? There are two:

\[n p_{0}\geq 10 \quad\text{and}\quad n(1-p_{0})\geq 10\]
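
Both conditions are easy to check before running the test. The helper below is a minimal sketch; the function name `check.conditions` and the input values are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: TRUE when both expected counts are at least 10.
check.conditions <- function(n, p0) {
  expected.successes <- n * p0
  expected.failures <- n * (1 - p0)
  expected.successes >= 10 && expected.failures >= 10
}

# Illustrative inputs (assumed values)
ok.large <- check.conditions(n = 100, p0 = 0.5)   # expected counts 50 and 50
ok.small <- check.conditions(n = 30, p0 = 0.1)    # expected counts 3 and 27

c(large = ok.large, small = ok.small)
```

In the second case the expected number of successes is only 3, so the normal approximation for \(z\) would not be trusted.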

The following code snippet constructs z and obtains the p-value.

# Here are the inputs which can be changed to reuse the snippet.
n <- 100      # Number of trials (sample size)
phat <- .6    # Proportion of sample cases meeting the definition
p0 <- .5      # The value of p under the null hypothesis
sided <- 2    # Specification of the alternative
 
# Construct z  
z <- (phat - p0)/sqrt( (p0*(1-p0) )/n )

# Compute and display the p-value
pvalue <- sided * pnorm(-abs(z))
pvalue

## [1] 0.04550026
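
As a cross-check, the hand-computed two-sided p-value can be compared with `prop.test()` from the stats package. This is a sketch under one assumption worth stating: `prop.test()` applies a continuity correction by default, so `correct = FALSE` is needed for it to agree with the plain z-test. It also takes the count of successes rather than the proportion, so `x` is recovered from `phat` and `n`.

```r
n <- 100
phat <- 0.6
p0 <- 0.5

# Hand-computed z-test, as in the snippet above
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
p.manual <- 2 * pnorm(-abs(z))

# Built-in test; correct = FALSE turns off the continuity correction so the
# result matches the plain z-test. x is the number of successes.
x <- round(phat * n)
p.builtin <- prop.test(x, n, p = p0, alternative = "two.sided",
                       correct = FALSE)$p.value

c(manual = p.manual, builtin = p.builtin)
```

The two values agree because the chi-squared statistic reported by `prop.test()` is the square of `z`.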
