Confidence Intervals & Hypothesis Testing for a Single Population

In R, the t.test() function can be used to perform both confidence interval estimation and hypothesis testing for a single population mean (average).

Its usage, with the default argument values, is:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
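t.test() returns an object of class "htest": a list whose parts can be pulled out with $, as the examples below do with $conf.int. The minimal sketch below lists the most useful components. The vector v is hypothetical simulated data, not the Wage data.

```r
# Hypothetical sample: 30 simulated values (not from the Wage data)
set.seed(42)
v <- rnorm(30, mean = 10, sd = 2)

res <- t.test(v, conf.level = 0.90)

class(res)       # "htest"
res$statistic    # the t statistic
res$parameter    # degrees of freedom, n - 1
res$p.value      # p-value for H0: mu = 0 (the default mu)
res$conf.int     # the interval, carrying a "conf.level" attribute
res$estimate     # the sample mean
```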

In R, use the prop.test() function to conduct hypothesis tests and estimate confidence intervals for population proportions.

Its usage, with the default argument values, is:

prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)
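Like t.test(), prop.test() returns an "htest" object. One detail to notice before the examples: the statistic it reports is a chi-squared value labelled X-squared, not a \(z\) value. The sketch below uses hypothetical counts (12 successes out of 40 trials).

```r
# Hypothetical counts: 12 successes in 40 trials
res_p <- prop.test(x = 12, n = 40)

res_p$estimate    # sample proportion x / n, named "p"
res_p$statistic   # chi-squared statistic, named "X-squared"
res_p$parameter   # its degrees of freedom (1 for a single proportion)
res_p$conf.int    # two-sided 95% interval by default
```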

Note:

By default (correct = TRUE), prop.test() applies the Yates continuity correction. The correction matters most in small samples, for example when the expected number of successes or failures is below 5. To turn it off, set correct = FALSE; this makes the test equivalent to the uncorrected \(z\) test for a proportion.
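To see what the correction changes, the sketch below compares prop.test()'s statistic with the hand formulas for a single proportion: \(X^2 = (x - np_0)^2 / [np_0(1-p_0)]\) without the correction, and \(X^2 = (|x - np_0| - 0.5)^2 / [np_0(1-p_0)]\) with it (R caps the 0.5 at \(|x - np_0|\)). The counts (12 successes in 40 trials, \(p_0 = 0.5\)) are hypothetical.

```r
# Hypothetical data: 12 successes in 40 trials, testing H0: p = 0.5
x0 <- 12; n0 <- 40; p0 <- 0.5

uncorrected <- prop.test(x0, n0, p = p0, correct = FALSE)$statistic
corrected   <- prop.test(x0, n0, p = p0, correct = TRUE)$statistic

# Hand formulas for the one-sample case
z_sq_plain <- (x0 - n0 * p0)^2 / (n0 * p0 * (1 - p0))              # equals z^2
z_sq_yates <- (abs(x0 - n0 * p0) - 0.5)^2 / (n0 * p0 * (1 - p0))    # Yates-corrected

c(uncorrected = unname(uncorrected), by_hand = z_sq_plain)
c(corrected = unname(corrected), by_hand = z_sq_yates)
```

The corrected statistic is smaller, so the corrected test is slightly more conservative.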

To obtain a confidence interval for a population proportion, first use the table() function to count the observations in each category. From these counts, get \(n\), the number of trials, and \(x\), the number of successes in the \(n\) trials.
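Here is a small sketch of this step on a hypothetical factor that uses the same level labels as health_ins. Indexing the table by level name (tab_toy[["2. No"]]) rather than by position is a design choice. It keeps working if the level order changes, and [[ ]] drops the name attribute.

```r
# Hypothetical factor with the same level labels as Wage$health_ins
ins <- factor(c("1. Yes", "2. No", "1. Yes", "1. Yes", "2. No", "1. Yes"),
              levels = c("1. Yes", "2. No"))

tab_toy <- table(ins)
n_toy <- sum(tab_toy)          # number of trials
x_toy <- tab_toy[["2. No"]]    # number of "successes": workers without insurance

c(n = n_toy, x = x_toy)
```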

We use the Mid-Atlantic Wage Data (Wage) from the ISLR2 package for this tutorial. The data set contains wage and other information for a group of 3000 male workers in the Mid-Atlantic region.

From the Mid-Atlantic Wage Data, we extract the age and health_ins variables. We apply the t.test() function to the age variable and the prop.test() function to the health_ins variable.

In the Mid-Atlantic Wage Data, age is a numeric variable giving the worker's age. health_ins is a categorical variable (factor) with levels 1. Yes and 2. No, indicating whether the worker has health insurance.

Run the following lines of code to load the Mid-Atlantic Wage Data and extract the age and health_ins variables.

library(ISLR2)                    # provides the Wage data set
data(Wage)
Age <- Wage$age                   # numeric: age of each worker
Health_Insurance <- Wage$health_ins   # factor: "1. Yes" / "2. No"

Confidence Intervals

Confidence Interval for a Population Mean (\(\sigma\) Unknown)

Example 1:

Construct 90%, 95%, and 99% confidence intervals for the true (population) mean age.

  • 90% Confidence Interval for the true population mean
CI_90 <- t.test(Age, conf.level = 0.90)
CI_90$conf.int
## [1] 42.06793 42.76140
## attr(,"conf.level")
## [1] 0.9
  • 95% Confidence Interval for the true population mean
CI_95 <- t.test(Age)
CI_95$conf.int
## [1] 42.00147 42.82787
## attr(,"conf.level")
## [1] 0.95
  • 99% Confidence Interval for the true population mean
CI_99 <- t.test(Age, conf.level = 0.99)
CI_99$conf.int
## [1] 41.87150 42.95783
## attr(,"conf.level")
## [1] 0.99
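As a check on what t.test() computes, the interval is \(\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}\). The sketch below rebuilds it by hand. The helper t_interval() is hypothetical (not part of base R), and the data are simulated rather than the Wage ages, so the numbers will not match the output above.

```r
# Simulated stand-in for Age (hypothetical data)
set.seed(1)
ages <- rnorm(200, mean = 42, sd = 11)

# Hypothetical helper: two-sided t interval for a mean, sigma unknown
t_interval <- function(v, conf.level = 0.95) {
  n <- length(v)
  se <- sd(v) / sqrt(n)                              # estimated standard error
  t_crit <- qt(1 - (1 - conf.level) / 2, df = n - 1)  # two-sided critical value
  mean(v) + c(-1, 1) * t_crit * se
}

t_interval(ages, 0.90)
as.numeric(t.test(ages, conf.level = 0.90)$conf.int)
```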

Confidence Interval for a Population Proportion

Example 2:

Construct 90%, 95%, and 99% confidence intervals for the true (population) proportion of those with no health insurance.

# Table
tab <- table(Health_Insurance)
# n = Number of trials
n <- sum(tab)
# x = Number of successes in n trials (level "2. No", i.e. no health insurance)
x <- tab[2]
  • 90% Confidence Interval for the true population proportion
CI_90 <- prop.test(x, n, conf.level = 0.90)
CI_90$conf.int
## [1] 0.2918476 0.3198401
## attr(,"conf.level")
## [1] 0.9
  • 95% Confidence Interval for the true population proportion
CI_95 <- prop.test(x, n)
CI_95$conf.int
## [1] 0.2892747 0.3225607
## attr(,"conf.level")
## [1] 0.95
  • 99% Confidence Interval for the true population proportion
CI_99 <- prop.test(x, n, conf.level = 0.99)
CI_99$conf.int
## [1] 0.2842869 0.3279107
## attr(,"conf.level")
## [1] 0.99
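prop.test() does not report the textbook Wald interval \(\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\). Instead it inverts the score test (the Wilson interval), and applies a continuity correction when correct = TRUE, as in the intervals above. The sketch below reproduces the uncorrected version and compares it with the Wald interval, using the hypothetical helpers wilson_interval() and wald_interval(). It assumes \(x = 917\) and \(n = 3000\), the counts implied by the sample estimate 0.3056667 shown in Example 4.

```r
# Assumed counts: p-hat = 0.3056667 with n = 3000 implies x = 917
x_no  <- 917
n_all <- 3000

# Hypothetical helper: Wilson score interval (no continuity correction)
wilson_interval <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)
  denom  <- 1 + z^2 / n
  center <- (p_hat + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  c(center - half, center + half)
}

# Hypothetical helper: textbook Wald interval, for comparison
wald_interval <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)
  p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
}

wilson_interval(x_no, n_all, 0.95)
as.numeric(prop.test(x_no, n_all, correct = FALSE)$conf.int)
wald_interval(x_no, n_all, 0.95)
```

With \(n = 3000\) the two formulas nearly agree. The score interval earns its keep in small samples or when \(\hat{p}\) is close to 0 or 1.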

Hypothesis Testing

\(t\) Test for a Mean (\(\sigma\) Unknown)

Use the function t.test() to perform a hypothesis test for a single population mean.

Example 3:

Assume the population mean age of male workers in the Mid-Atlantic region is 40 years. Based on the wage and other data for a group of 3000 male workers in the Mid-Atlantic region (sample data), can it be concluded that the population mean age has increased? Take \(\alpha = 0.05\).

# For a two-tailed test, set alternative = "two.sided"
# The two-tailed alternative is the default
# For a right-tailed test, set alternative = "greater"
# For a left-tailed test, set alternative = "less"
HT_0.05 <- t.test(Age, mu = 40, alternative = "greater")
HT_0.05
## 
##  One Sample t-test
## 
## data:  Age
## t = 11.458, df = 2999, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 40
## 95 percent confidence interval:
##  42.06793      Inf
## sample estimates:
## mean of x 
##  42.41467
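The statistic above is \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) with \(n - 1\) degrees of freedom, and the right-tailed \(p\)-value is \(P(T > t)\). Here is a minimal sketch that recomputes both and checks them against t.test(). The data are simulated stand-ins for Age, not the Wage sample.

```r
# Simulated stand-in for Age (hypothetical data)
set.seed(2)
ages <- rnorm(300, mean = 42, sd = 12)
mu0 <- 40

n_a     <- length(ages)
t_stat  <- (mean(ages) - mu0) / (sd(ages) / sqrt(n_a))   # t = (xbar - mu0) / (s / sqrt(n))
p_right <- pt(t_stat, df = n_a - 1, lower.tail = FALSE)   # right-tailed p-value, H1: mu > mu0

res_t <- t.test(ages, mu = mu0, alternative = "greater")
c(by_hand = t_stat, t.test = unname(res_t$statistic))
c(by_hand = p_right, t.test = res_t$p.value)
```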



Step 1: State the null and alternative hypotheses:

\[H_0: \mu = 40\] \[H_1: \mu > 40 \text{ (Claim)}\]

Step 2: Compute the test value (statistic):

  • \(t = 11.458\)

Step 3: Find the \(p\)-value:

  • \(p\)-value \(< 2.2 \times 10^{-16} \approx 0\)

Step 4: Make a decision:

  • Reject \(H_0\) since \(p\)-value \(< \alpha = 0.05\)

Step 5: Summarize the results:

  • Since \(H_0\) is rejected, there is enough evidence in the sample data to conclude that the population mean age of male workers in the Mid-Atlantic region has increased.
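
To see the threshold 2.2e-16 from the output written in standard (non-scientific) form, use format():

# Changing from scientific notation to standard form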
format(2.2e-16, scientific=FALSE)
## [1] "0.00000000000000022"
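R prints < 2.2e-16 whenever the \(p\)-value falls below the machine epsilon .Machine$double.eps (about 2.22e-16). The printed value is therefore a floor, not the \(p\)-value itself; the unrounded value is stored in HT_0.05$p.value. The sketch below recomputes the right-tailed \(p\)-value from the rounded statistic printed above (t = 11.458, df = 2999), so its result is only approximate.

```r
# The display floor used when printing test results
.Machine$double.eps

# Right-tailed p-value from the rounded Example 3 statistic
p_exact <- pt(11.458, df = 2999, lower.tail = FALSE)
p_exact
p_exact < .Machine$double.eps   # why the output shows "< 2.2e-16"
```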

\(z\) Test for a Proportion

Use the function prop.test() to perform a hypothesis test for a single population proportion.

Example 4:

Assume the population proportion of male workers in the Mid-Atlantic region with no health insurance is 0.48. Based on the wage and other data for a group of 3000 male workers in the Mid-Atlantic region (sample data), can it be concluded that the true population proportion has decreased? Take \(\alpha = 0.01\).

# For a two-tailed test, set alternative = "two.sided"
# The two-tailed alternative is the default
# For a right-tailed test, set alternative = "greater"
# For a left-tailed test, set alternative = "less"
# Refer to Example 2 for the values of x and n
HT_0.01 <- prop.test(x, n, p = 0.48, alternative = "less", correct = FALSE)
HT_0.01
## 
##  1-sample proportions test without continuity correction
## 
## data:  x out of n, null probability 0.48
## X-squared = 365.29, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is less than 0.48
## 95 percent confidence interval:
##  0.0000000 0.3196715
## sample estimates:
##         p 
## 0.3056667
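prop.test() reports a chi-squared statistic. For one proportion with correct = FALSE, that statistic equals \(z^2\), where \(z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). For a one-sided alternative, R takes the sign of \(\hat{p} - p_0\) and computes the \(p\)-value from the normal distribution. The sketch below checks this, assuming \(x = 917\) and \(n = 3000\) (the counts implied by the sample estimate 0.3056667).

```r
# Assumed counts: p-hat = 0.3056667 with n = 3000 implies x = 917
x_no  <- 917
n_all <- 3000
p0    <- 0.48

res_z <- prop.test(x_no, n_all, p = p0, alternative = "less", correct = FALSE)

# Signed z recovered from the chi-squared statistic (X-squared = z^2)
z_from_chisq <- unname(sign(res_z$estimate - p0) * sqrt(res_z$statistic))

# Direct formula: z = (p-hat - p0) / sqrt(p0 * (1 - p0) / n)
p_hat  <- x_no / n_all
z_hand <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n_all)

c(from_chisq = z_from_chisq, by_hand = z_hand)
pnorm(z_hand)    # left-tailed p-value; matches res_z$p.value
```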



Step 1: State the null and alternative hypotheses:

\[H_0: p = 0.48\] \[H_1: p < 0.48 \text{ (Claim)}\]

Step 2: Compute the test value (statistic):

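  • \(z = -\sqrt{365.29} \approx -19.1126\). prop.test() reports \(X^2 = z^2\), and \(z\) is negative because \(\hat{p} = 0.3057 < 0.48\).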

Step 3: Find the \(p\)-value:

  • \(p\)-value \(< 2.2 \times 10^{-16} \approx 0\)

Step 4: Make a decision:

  • Reject \(H_0\) since \(p\)-value \(< \alpha = 0.01\)

Step 5: Summarize the results:

  • Since \(H_0\) is rejected, there is enough evidence in the sample data to conclude that the true population proportion of male workers in the Mid-Atlantic region without health insurance has decreased.
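Step 4 compares \(p\)-values with \(\alpha\). An equivalent route compares each test value with its critical value. The sketch below shows this as an alternative to the method used above, with the statistics typed in from the outputs of Examples 3 and 4.

```r
# Example 4: left-tailed z test at alpha = 0.01
z_crit <- qnorm(0.01)          # reject H0 if z < z_crit (about -2.326)
z_obs  <- -19.1126             # test value from Step 2
z_obs < z_crit                 # TRUE: reject H0

# Example 3: right-tailed t test at alpha = 0.05 with df = 2999
t_crit <- qt(0.95, df = 2999)  # reject H0 if t > t_crit (about 1.645)
t_obs  <- 11.458
t_obs > t_crit                 # TRUE: reject H0
```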

  1. Southeast Missouri State University