In hypothesis testing, a probability from a relevant sampling distribution is used as evidence against some claim about the true value. When a probability is used in this way, it is referred to as a p-value.
The hypotheses are formal statements about the parameters.
The null hypothesis, denoted by \(H_0\), is interpreted as the baseline or no-change hypothesis and is the claim that is assumed to be true. It is usually stated as an equality (we use an \(=\) sign).
The alternative hypothesis, denoted by \(H_A\), is the conjecture that you’re testing for, against the null hypothesis. It can be a less-than or greater-than statement (we use the \(\lt\) or \(\gt\) signs), in which case it is called a one-sided hypothesis. Most of the time, however, we will work with two-sided hypotheses (we use the \(\ne\) sign).
The test statistic is the statistic that is compared to the appropriate standardized sampling distribution to yield the p-value.
A rejection region (RR) specifies the range of values the test statistic might assume that would lead to rejection of \(H_0\).
The p-value is the probability value that is used to quantify the amount of evidence, if any, against the null hypothesis.
Put simply, the more extreme the test statistic, the smaller the p-value. The smaller the p-value, the greater the amount of statistical evidence against the assumed truth of \(H_0\).
The significance level, denoted by \(\alpha\), is used to qualify the result of the test. It defines a cutoff point at which you decide whether there is sufficient evidence to view \(H_0\) as incorrect.
If the p-value is less than \(\alpha\), then the result of the test is statistically significant. This implies there is sufficient evidence against the null hypothesis, and therefore you reject \(H_0\).
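The link between the test statistic, the p-value, and the decision can be sketched in a few lines of Python. This is only an illustration under an assumption not stated above: that the test statistic follows a standard normal distribution when \(H_0\) is true. The helper name `p_value_from_z` and the value `z = 2.5` are made up for the example.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """P-value for a standard normal test statistic (hypothetical helper)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # H_A uses the "not equal" sign
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":     # H_A uses the ">" sign
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # H_A uses the "<" sign
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


alpha = 0.05   # significance level
z = 2.5        # made-up test statistic, for illustration only
p_two_sided = p_value_from_z(z)
p_greater = p_value_from_z(z, "greater")
reject = p_two_sided < alpha   # True means the result is statistically significant
print(f"two-sided p = {p_two_sided:.4f}, reject H0: {reject}")
```

A more extreme `z` gives a smaller p-value, and for a positive `z` the one-sided "greater" p-value is half of the two-sided one.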
People typically use small values for \(\alpha\) like 5% or 1%.
The level of significance is the risk we are willing to accept of rejecting a true null hypothesis purely because of sampling variability.
Remember that a level of significance \(\alpha\) corresponds to a \(100(1-\alpha)\%\) confidence level.
That means that if a \(100(1-\alpha)\%\) confidence interval does not include the hypothesized value, the result of a two-sided test at level \(\alpha\) is statistically significant.
When making a decision to either “Reject” or “Fail to Reject” a null hypothesis, there is always the possibility that we are making a mistake, since we are basing the decision on a sample and not the population.
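A Type I error is rejecting a null hypothesis that is actually true, and a Type II error is failing to reject a null hypothesis that is actually false. In reality, we never know whether we have made a Type I or Type II error, because we never know everything about the population that we are studying. All we can do is control the probability of making one of these errors.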
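A small simulation can make the idea of controlling an error probability concrete. The sketch below repeatedly draws samples from a population for which \(H_0\) is true and counts how often a two-sided test at \(\alpha = 0.05\) wrongly rejects it (a Type I error). The population values, sample size, seed, and number of repetitions are all made up, and the test used is a z test for a mean with known standard deviation.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(1)            # fixed seed so the run is repeatable
mu0, sigma, n = 100.0, 15.0, 30   # made-up population and sample size
alpha = 0.05
n_experiments = 5000

rejections = 0
for _ in range(n_experiments):
    # H0 is true here: the data really come from a population with mean mu0.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value < alpha:
        rejections += 1           # a Type I error: rejecting a true H0

type_one_rate = rejections / n_experiments
print(f"share of true nulls rejected: {type_one_rate:.3f}")
```

Under these assumptions the printed share should land close to \(\alpha\), which is the sense in which the significance level controls the Type I error probability.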
The test statistic Z in a hypothesis test for a single mean and known \(\sigma\) with respect to a null value of \(\mu_0\) is given as:
\[Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\]
The test statistic T in a hypothesis test for a single mean and unknown standard deviation with respect to a null value of \(\mu_0\) is given as:
\[T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\]
When we want to directly compare the means of two distinct groups of measurements, we test:
\[ H_0: \mu_1 = \mu_2\] Notice that this is equivalent to the following statement:
\[ H_0: \mu_1 - \mu_2 = 0\] so when we test for a difference in means, we are really testing whether the means of the two groups are equal to each other.
There are several versions of this test, but we will concentrate on two cases: independent samples and paired samples.
The most general case is where the two sets of measurements are based on two independent, separate groups (also referred to as unpaired samples).
The test statistic is given by:
\[T = \frac{(\bar{x}_1-\bar{x}_2) -\mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\] where \(\mu_0\) is the hypothesized difference between the two means (usually 0). A conservative choice for the degrees of freedom is the smaller of \(n_1-1\) and \(n_2-1\); alternatively, \(n_1+n_2-2\) can be used if the standard deviations and sample sizes are similar.
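As a rough sketch, the statistic and both candidate degrees of freedom can be computed from summary statistics alone. The function name `two_sample_t` and the numbers passed to it are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2, mu0=0.0):
    """T statistic for independent samples plus two candidate df values."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t_stat = ((mean1 - mean2) - mu0) / standard_error
    df_conservative = min(n1 - 1, n2 - 1)   # smaller of n1-1 and n2-1
    df_pooled = n1 + n2 - 2                 # for similar spreads and sizes
    return t_stat, df_conservative, df_pooled


# Made-up summary statistics, for illustration only.
t_stat, df_small, df_pooled = two_sample_t(10.0, 2.0, 16, 8.0, 3.0, 36)
print(f"T = {t_stat:.3f}, df = {df_small} (conservative) or {df_pooled}")
```

The resulting statistic would then be compared with a t distribution on the chosen degrees of freedom, for example through a t table.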
Paired data occur if the measurements forming the two sets of observations are recorded on the same individual or if they are related in some other important or obvious way. A classic example of this is “before” and “after” observations, such as two measurements made on each person before and after some kind of intervention treatment.
In this case we are concerned with the mean difference \(\bar{d}\), and the test statistic is given by:
\[T = \frac{\bar{d} -\mu_0}{\sqrt{\frac{s_d^2}{n} }}.\]
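A paired test reduces to a one-sample test on the differences, which the following sketch tries to show. The helper `paired_t` and the before/after values are invented for illustration; the degrees of freedom are taken as \(n-1\), where \(n\) is the number of pairs.

```python
import math
from statistics import fmean, stdev


def paired_t(before, after, mu0=0.0):
    """T statistic and degrees of freedom for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = fmean(diffs)        # mean difference
    s_d = stdev(diffs)          # sample standard deviation of the differences
    t_stat = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t_stat, n - 1


before = [10, 12, 9, 11]   # made-up measurements
after = [12, 15, 10, 13]
t_stat, df = paired_t(before, after)
print(f"T = {t_stat:.3f} on {df} degrees of freedom")
```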
In testing for the true value of some proportion of success, \(p\), let \(\hat{p}\) be the sample proportion over \(n\) trials, and let the null value be denoted by \(p_0\). Because the sampling distribution is evaluated assuming \(H_0\) is true, the standard error is computed from \(p_0\). You find the test statistic with the following:
\[Z =\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]
As with the difference between two means, you’re often testing whether the two proportions are the same and thus have a difference of zero.
The test statistic is given by:
\[Z =\frac{(\hat{p}_1-\hat{p}_2)-p_0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2} }},\] where \(p_0\) now denotes the hypothesized difference between the two proportions (usually 0).
_____
Example 1: Families of size four spent a population mean of $170 on a Spurs game with a standard deviation of $40 last year. In an exit poll from the AT&T Center at a recent game, 50 randomly chosen families of size four were interviewed. The average spent by these families is $182. Is this sufficient evidence at the 5% level of significance to indicate that the amount that families of size four spent this year is different from last year? Calculate the p-value for this test.
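One way to work Example 1 numerically is sketched below. It assumes that last year's standard deviation of $40 still applies as the known population value and that the alternative is two-sided, since the question asks whether spending is "different".

```python
import math
from statistics import NormalDist

mu0 = 170      # last year's population mean (null value)
sigma = 40     # population standard deviation, assumed unchanged
n = 50         # families interviewed
x_bar = 182    # sample mean
alpha = 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative
reject = p_value < alpha
print(f"Z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Under these assumptions \(Z\) is about 2.12 and the p-value about 0.034, which is below 0.05, so \(H_0\) would be rejected at the 5% level.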
Example 2: A brace on an AH-1J helicopter provides structural support to the airframe. The force exerted on the brace during action is 2000 psi. A test of six braces results in the following strengths. \[ 2134, 2409, 2213, 2154, 2387, 1942 \] Do these data provide sufficient evidence at the 1% level of significance to support the claim that the mean brace strength differs from the 2000-psi force?
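A possible numerical treatment of Example 2 follows. The population standard deviation is unknown, so the sketch uses the \(T\) statistic with \(n-1 = 5\) degrees of freedom and assumes the strengths are roughly normally distributed. The critical value 4.032 is the usual t-table entry for 5 degrees of freedom and a two-sided 1% test; it is typed in by hand here because the Python standard library has no t distribution.

```python
import math
from statistics import fmean, stdev

strengths = [2134, 2409, 2213, 2154, 2387, 1942]
mu0 = 2000                 # null value in psi
n = len(strengths)

x_bar = fmean(strengths)   # sample mean
s = stdev(strengths)       # sample standard deviation
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1

t_critical = 4.032         # t-table value: df = 5, two-sided alpha = 0.01
reject = abs(t_stat) > t_critical
print(f"mean = {x_bar:.1f}, T = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

Here \(T\) comes out near 2.90, which is inside the interval \((-4.032, 4.032)\), so under these assumptions the data would not be enough to reject \(H_0\) at the 1% level.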
Example 3: Sonic Boom has 40% of the CD market in a large city. SB conducted an extensive ad campaign to increase its share of the market. A post-campaign survey of 100 randomly chosen persons revealed that 47 said they bought from SB. Is this sufficient evidence at the 10% level of significance to indicate SB’s share of the market has increased? Would this change if we want to be 95% confident of our conclusion?
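Example 3 can be sketched as a one-sided test of a single proportion. The code below computes the standard error from the null value \(p_0 = 0.40\), on the reasoning that the sampling distribution is evaluated as if \(H_0\) were true, and it reads "95% confident" as a 5% level of significance.

```python
import math
from statistics import NormalDist

p0 = 0.40          # market share before the campaign (null value)
n = 100            # persons surveyed
p_hat = 47 / n     # sample proportion

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)   # one-sided: H_A is p > 0.40

reject_at_10 = p_value < 0.10
reject_at_05 = p_value < 0.05
print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
print(f"reject at 10%: {reject_at_10}, reject at 5%: {reject_at_05}")
```

With these inputs the p-value is roughly 0.077, so the conclusion depends on the level chosen: \(H_0\) would be rejected at 10% but not at 5%.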
Example 4: In a recent consumer confidence survey of 400 adults, 54 of 200 men and 36 of 200 women expressed agreement with the statement, “I would have trouble paying an unexpected bill of $1000 without borrowing from someone or selling something.” Do men and women differ in their answers to this question? Use a 0.05 level of significance. State your conclusion using the p-value.
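A sketch for Example 4 follows, using the two-proportion statistic exactly as written earlier in these notes, with separate sample proportions in the standard error and a hypothesized difference of 0. Many textbooks use a pooled proportion in the standard error instead; that variant gives a slightly different \(Z\), so treat the figures here as one reasonable choice, not the only one.

```python
import math
from statistics import NormalDist

x1, n1 = 54, 200   # men agreeing, men surveyed
x2, n2 = 36, 200   # women agreeing, women surveyed
p1_hat, p2_hat = x1 / n1, x2 / n2
alpha = 0.05

standard_error = math.sqrt(
    p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
)
z = ((p1_hat - p2_hat) - 0) / standard_error   # hypothesized difference is 0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative
reject = p_value < alpha
print(f"Z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value comes out near 0.03, below 0.05, which would point to a difference between men and women on this question.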
Example 5: Two independent random samples are drawn from two populations. For the first sample: \(n_1=36, \bar{x_1}=26, \sigma_1 = 3.25\); for the second sample: \(n_2=31, \bar{x_2}=23, \sigma_2=2.75.\) Calculate a 99% confidence interval for the difference in means and use it to test the hypothesis of equal means.
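Example 5 ties the confidence interval to the test, and a short sketch may help. Because the example gives \(\sigma_1\) and \(\sigma_2\), the code treats them as known population standard deviations and builds a normal-based interval; if they were sample standard deviations, a t-based interval would be the natural alternative.

```python
import math
from statistics import NormalDist

n1, x_bar1, sigma1 = 36, 26, 3.25
n2, x_bar2, sigma2 = 31, 23, 2.75
confidence = 0.99

diff = x_bar1 - x_bar2
standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576

lower = diff - z_star * standard_error
upper = diff + z_star * standard_error
zero_inside = lower <= 0 <= upper
print(f"99% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
print(f"interval contains 0: {zero_inside}")
```

The interval is roughly (1.11, 4.89) and does not contain 0, so a two-sided test of equal means at the 1% level would reject \(H_0\) under these assumptions.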
Example 6: The data below show the effect of two soporific drugs in 10 patients. The response measured is the increase in hours of sleep.
Is there significant evidence that one of the drugs is more effective than the other? Use a 5% level of significance.