Harold Nelson
2025-03-31
We can use the t test to test hypotheses about the mean value of a variable.
In this situation we claim to know the population standard deviation, \(\sigma\). We have a sample of size \(n\) and a sample mean, \(\bar{x}\). We want to test the null hypothesis that the true mean is a hypothesized value \(\mu\) against the two-sided alternative that the true mean is not \(\mu\). Under most conditions (discussed later), we can compute a z-score as the test statistic; if the null hypothesis is true, it has a standard normal distribution.
\[z=\frac{\bar{x}-\mu}{\sigma_{\bar{x}}}\]
Here \(\sigma_{\bar{x}}\), the standard deviation of the sample mean, is computed as \(\frac{\sigma}{\sqrt{n}}\).
From \(z\) we can then obtain the p-value.
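For the example values used in the snippet (\(\bar{x}=135\), \(\mu=134\), \(\sigma=15\), \(n=100\)), the arithmetic is:
\[\sigma_{\bar{x}}=\frac{15}{\sqrt{100}}=1.5,\qquad z=\frac{135-134}{1.5}\approx 0.667\]
and the two-sided p-value is \(2P(Z\leq -0.667)\approx 0.505\).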
The following code snippet does the work.
# Replace the example values as necessary
xbar <- 135 # Sample mean
mu <- 134 # Hypothesized value of the mean
sigma <- 15 # Known population standard deviation
n <- 100 # sample size
sided <- 2 # Specification of the alternative type (2 = two-sided, 1 = one-sided)
# Now do the work
sd.xbar <- sigma/sqrt(n)
z <- (xbar - mu)/sd.xbar
p.value <- sided * pnorm(-abs(z))
# Display the p-value.
p.value
## [1] 0.5049851
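The `sided` multiplier handles the two-sided case cleanly, but with `sided <- 1` the expression `pnorm(-abs(z))` always takes the tail on whichever side the sample mean fell. That matches a one-sided alternative only if the alternative points the same way. A minimal sketch of a reusable function that names the alternative explicitly, borrowing the `"two.sided"`, `"less"`, and `"greater"` labels that `t.test()` uses; the function name `z_test_mean` is hypothetical, not part of base R.

```r
# Hypothetical helper (not part of base R): one-sample z test for a mean
# when the population standard deviation sigma is known.
z_test_mean <- function(xbar, mu, sigma, n,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu) / (sigma / sqrt(n))  # standardize the sample mean
  p.value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),           # both tails
    less      = pnorm(z),                     # H1: true mean < mu
    greater   = pnorm(z, lower.tail = FALSE)  # H1: true mean > mu
  )
  list(z = z, p.value = p.value)
}

# Usage with the example values above
res <- z_test_mean(xbar = 135, mu = 134, sigma = 15, n = 100)
res$p.value
```

Because the direction is stated, `alternative = "less"` on this data would correctly give a large p-value (the sample mean is above \(\mu\)), whereas `sided <- 1` would not.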
Use this code to solve the following problem.
A sample of size 200 yields a mean of 11.2. The sample is drawn from a population with a known standard deviation of 0.4. Test the null hypothesis that the true mean value is 11 against the alternative that it is not 11.
# Replace the example values as necessary
xbar <- 11.2 # Sample mean
mu <- 11 # Hypothesized value of the mean
sigma <- .4 # Known population standard deviation
n <- 200 # sample size
sided <- 2 # Specification of the alternative type (2 = two-sided, 1 = one-sided)
# Now do the work
sd.xbar <- sigma/sqrt(n)
z <- (xbar - mu)/sd.xbar
p.value <- sided * pnorm(-abs(z))
# Display the p-value.
p.value
## [1] 1.53746e-12
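Working the exercise by hand shows why the p-value is so small: the sample mean lies about seven standard errors above the hypothesized value.
\[\sigma_{\bar{x}}=\frac{0.4}{\sqrt{200}}\approx 0.0283,\qquad z=\frac{11.2-11}{0.0283}\approx 7.07\]
A z-score that extreme leaves almost no probability in either tail, which is why the p-value is on the order of \(10^{-12}\).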
When we don’t know the population standard deviation, we have to estimate it from the sample. The computation is almost identical to the earlier case, but instead of a z-score we call the result a t-statistic, and instead of a standard normal distribution it follows a t distribution. The t distribution is symmetric and bell-shaped like the standard normal but has heavier tails, and it requires us to specify the “degrees of freedom.” Here we use \(df = n - 1\). When the sample size \(n\) is large, the difference between the t distribution and the standard normal distribution disappears. Following the convention from earlier, when we have an estimated standard deviation, we refer to it as \(S\) rather than \(\sigma\).
The following code incorporates these changes.
# Replace the example values as necessary
xbar <- 135 # Sample mean
mu <- 134 # Hypothesized value of the mean
s <- 15 # Estimated population standard deviation
n <- 100 # sample size
sided <- 2 # Specification of the alternative type (2 = two-sided, 1 = one-sided)
# Now do the work
sd.xbar <- s/sqrt(n)
t <- (xbar - mu)/sd.xbar
p.value <- sided * pt(-abs(t), df = n - 1)
# Display the p-value.
p.value
# Compare the code above with
# z <- (xbar - mu)/sd.xbar
# p.value <- sided * pnorm(-abs(z))
Try a few different values of n in this code and see how much the t p-value differs from the z p-value for the same inputs (0.5049851 when n = 100).
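A short loop can make that comparison for several sample sizes at once. This sketch keeps the example inputs \(\bar{x} = 135\), \(\mu = 134\), and \(s = 15\) fixed and plugs the same standard deviation into both formulas, so the only difference between the two p-value columns is the reference distribution (standard normal versus t with \(n - 1\) degrees of freedom). The sample sizes are arbitrary.

```r
# Example inputs from the snippet above; the sample sizes are arbitrary
xbar <- 135
mu <- 134
s <- 15
sided <- 2
n.values <- c(5, 10, 30, 100, 1000)

se <- s / sqrt(n.values)       # standard error for each n
stat <- (xbar - mu) / se       # same statistic, read as z or as t

comparison <- data.frame(
  n   = n.values,
  p.z = sided * pnorm(-abs(stat)),
  p.t = sided * pt(-abs(stat), df = n.values - 1)
)
comparison
```

Because the t distribution has heavier tails, its p-value is always a little larger than the z p-value, and for large \(n\) the two columns nearly agree.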
For reasonably large sample sizes, there is little difference between t and z.
If the sample is small, the difference can be substantial.
The conclusion: Don’t stop to think about it. Use t, not z.
If you should have used t, you’ll be right.
If you didn’t have to use it, you’ll get essentially the same p-value.
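The same convergence shows up in critical values. A sketch comparing two-sided 5% critical values from `qt()` and `qnorm()` (both in base R's stats package); the degrees of freedom correspond to the arbitrary sample sizes 5, 10, 30, 100, and 1000.

```r
# Degrees of freedom df = n - 1 for n = 5, 10, 30, 100, 1000
df.values <- c(4, 9, 29, 99, 999)

crit <- data.frame(
  df     = df.values,
  t.crit = qt(0.975, df = df.values),  # two-sided 5% critical value of t
  z.crit = qnorm(0.975)                # standard normal value, about 1.96
)
crit
```

With 4 degrees of freedom the t cutoff is roughly 2.78, well above 1.96, which is why small samples make the choice matter; by a few hundred degrees of freedom the two are nearly identical.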
Here is the scenario. We have a sample of size \(n\) from a large population, and we have estimated the proportion of cases in the sample that meet some criterion. This estimated proportion is denoted \(\hat{p}\). We wish to test the null hypothesis that the true population proportion is a specific value, denoted \(p_{0}\). If the null hypothesis is true, then under certain conditions the quantity \(z\) has an approximately standard normal distribution. \(z\) is computed as:
\[z=\frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\]
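For instance, with \(n=100\), \(\hat{p}=0.6\), and \(p_{0}=0.5\) (the inputs used in the code snippet for this scenario):
\[z=\frac{0.6-0.5}{\sqrt{\frac{0.5(1-0.5)}{100}}}=\frac{0.1}{0.05}=2\]
so the two-sided p-value is \(2P(Z\leq -2)\approx 0.0455\).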
What are the “certain conditions” that allow us to assume that \(z\) has an approximately standard normal distribution? There are two:
\[n p_{0}\geq 10 \quad\text{and}\quad n(1-p_{0})\geq 10\]
## Code
The following code snippet constructs z and obtains the p-value.
# Here are the inputs which can be changed to reuse the snippet.
n <- 100 # Number of trials (sample size)
phat <- .6 # Proportion of sample cases meeting the definition
p0 <- .5 # The value of p under the null hypothesis
sided <- 2 # Specification of the alternative
# Construct z
z <- (phat - p0)/sqrt( (p0*(1-p0) )/n )
# Compute and display the p-value
pvalue <- sided * pnorm(-abs(z))
pvalue
## [1] 0.04550026
Change the value of n and examine the impact it has on the p value.
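A sketch that makes this comparison for several sample sizes at once, holding \(\hat{p} = 0.6\) and \(p_{0} = 0.5\) fixed (the sample sizes are arbitrary). Because the denominator of \(z\) shrinks like \(1/\sqrt{n}\), the same 0.1 gap gives a larger \(z\) and a smaller p-value as \(n\) grows. The last column checks the two conditions \(np_{0}\geq 10\) and \(n(1-p_{0})\geq 10\); where it is `FALSE`, the normal approximation, and hence that row's p-value, should not be trusted.

```r
# Example inputs from the snippet above; the sample sizes are arbitrary
phat <- 0.6
p0 <- 0.5
sided <- 2
n.values <- c(10, 25, 50, 100, 400)

z <- (phat - p0) / sqrt(p0 * (1 - p0) / n.values)  # z for each sample size
by.n <- data.frame(
  n = n.values,
  z = z,
  p.value = sided * pnorm(-abs(z)),
  # Normal-approximation conditions: n*p0 >= 10 and n*(1 - p0) >= 10
  conditions.met = n.values * p0 >= 10 & n.values * (1 - p0) >= 10
)
by.n
```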
With real data, we use the t.test() and prop.test() functions. The snippets I provided let you solve toy “classroom” problems.
See the conversation with ChatGPT at https://chat.openai.com/share/a88fb015-b035-4c26-a1e1-97c12f94d3c3 for t.test().
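A sketch of `t.test()` on raw data, checked against the manual t computation from earlier. The data are simulated with `rnorm()` purely for illustration; the seed, mean, standard deviation, and sample size are arbitrary choices, not real data.

```r
# Simulated data for illustration only (seed, mean, sd, and n are arbitrary)
set.seed(1)
x <- rnorm(25, mean = 135, sd = 15)
mu <- 134  # Hypothesized value of the mean

# Built-in one-sample t test (two-sided by default)
result <- t.test(x, mu = mu)
result$p.value

# The same p-value from the snippet's formulas, with s estimated by sd(x)
n <- length(x)
t.stat <- (mean(x) - mu) / (sd(x) / sqrt(n))
p.manual <- 2 * pt(-abs(t.stat), df = n - 1)
p.manual
```

The two p-values agree; `t.test()` simply does the summary calculations (mean, standard deviation, degrees of freedom) from the data for you.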
See the conversation with ChatGPT at https://chat.openai.com/share/220cb321-1650-4c12-a522-c132951e30aa for prop.test().
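Note that `prop.test()` applies a continuity correction by default (`correct = TRUE`), so its p-value will not exactly match the proportion snippet. A sketch with `correct = FALSE`, using the snippet's inputs expressed as a count (60 of 100 cases, so \(\hat{p} = 0.6\)): the reported X-squared statistic equals \(z^2\), and the p-value matches the two-sided z p-value.

```r
n <- 100   # Number of trials (sample size)
x <- 60    # Number of cases meeting the definition (phat = 0.6)
p0 <- 0.5  # The value of p under the null hypothesis

# Built-in test without the continuity correction
result <- prop.test(x, n, p = p0, correct = FALSE)
result$statistic  # X-squared, equal to z^2
result$p.value

# Manual z from the earlier snippet
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
z^2
2 * pnorm(-abs(z))
```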