In R, the t.test() function can be used to perform both
confidence interval estimation and
hypothesis testing for a single population mean
(average).
Here is its usage in R, with the default argument values:
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
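t.test() returns an object of class "htest", which is a list. This tutorial extracts components from it (for example `$conf.int`). Below is a minimal sketch that lists those components. The vector `toy` holds hypothetical ages and is not taken from the Wage data.

```r
# Hypothetical toy ages, used only to illustrate the returned object
toy <- c(25, 31, 38, 42, 47, 53, 58, 61)

res <- t.test(toy, mu = 40)

class(res)       # "htest"
names(res)       # the list components that can be extracted with $
res$statistic    # the t statistic
res$parameter    # degrees of freedom, n - 1
res$conf.int     # confidence interval (95% by default)
res$p.value      # p-value stored at full precision
```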
In R, use the prop.test() function to conduct
hypothesis tests and
estimate confidence intervals for population
proportions.
Here is its usage in R, with the default argument values:
prop.test(x, n, p = NULL,
alternative = c("two.sided", "less", "greater"),
conf.level = 0.95, correct = TRUE)
Note:
By default, prop.test() applies the
Yates continuity correction. The correction matters most when
the expected number of successes or failures is small (for example, < 5).
If you don't want the correction, set the argument
correct = FALSE in prop.test(). The
default value is TRUE. You must set this option to
FALSE to make the test equivalent to the uncorrected z-test
of a proportion.
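The sketch below shows what correct = FALSE changes by recomputing the X-squared statistic by hand. It uses hypothetical counts (40 successes in 100 trials, null value 0.5). Without the correction, the statistic equals the square of the usual z statistic. With the correction, 0.5 is subtracted from \(|x - n p_0|\) before squaring. R caps that 0.5 at \(|x - n p_0|\) when the difference is smaller.

```r
# Hypothetical counts, named so they do not overwrite the tutorial's x and n
x_toy  <- 40    # successes
n_toy  <- 100   # trials
p0_toy <- 0.5   # null proportion

# Uncorrected z statistic for a single proportion
z_toy <- (x_toy / n_toy - p0_toy) / sqrt(p0_toy * (1 - p0_toy) / n_toy)

# Yates-corrected statistic: |x - n*p0| is reduced by 0.5 before squaring
chisq_yates <- (abs(x_toy - n_toy * p0_toy) - 0.5)^2 /
  (n_toy * p0_toy * (1 - p0_toy))

uncorrected <- prop.test(x_toy, n_toy, p = p0_toy, correct = FALSE)
corrected   <- prop.test(x_toy, n_toy, p = p0_toy, correct = TRUE)

c(z_squared = z_toy^2, X_squared = unname(uncorrected$statistic))  # both 4
c(by_hand = chisq_yates, X_squared = unname(corrected$statistic))  # both 3.61
```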
To obtain the confidence interval for a population proportion, first
use the table() function to count each category. From the table, get \(n\), the number of trials (the sum of the counts), and \(x\), the number of successes in \(n\) trials.
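Here is a minimal sketch of that step on a small hypothetical factor whose level labels mimic those of health_ins. Selecting the count by level name, rather than by position, makes it explicit which category counts as a "success".

```r
# Hypothetical responses using the same level labels as health_ins
ins_demo <- factor(c("1. Yes", "2. No", "1. Yes", "1. Yes", "2. No"),
                   levels = c("1. Yes", "2. No"))

tab_demo <- table(ins_demo)
tab_demo                            # counts per level: 3 and 2

n_demo <- sum(tab_demo)             # number of trials
x_demo <- tab_demo[["2. No"]]       # number of successes, selected by name
x_demo / n_demo                     # sample proportion
```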
We are going to use the Mid-Atlantic Wage Data from the
ISLR2 package for this tutorial. The data consist of wage
and other data for a group of 3000 male workers in the Mid-Atlantic
region.
From the Mid-Atlantic Wage Data, we extract the
age and health_ins variables. We apply the
t.test() function to the age variable and the
prop.test() function to the
health_ins variable.
In the Mid-Atlantic Wage Data, age is a
continuous variable giving the worker's age, and
health_ins is a categorical variable with levels
"1. Yes" and "2. No" indicating whether the worker has health
insurance.
Run the following lines of code to load the
Mid-Atlantic Wage Data and extract the age and
health_ins variables.
library(ISLR2)
data(Wage)
Age <- Wage$age
Health_Insurance <- Wage$health_ins
Construct 90%, 95%, and 99% confidence intervals for the true (population) mean age.
CI_90 <- t.test(Age, conf.level = 0.90)
CI_90$conf.int
## [1] 42.06793 42.76140
## attr(,"conf.level")
## [1] 0.9
CI_95 <- t.test(Age)
CI_95$conf.int
## [1] 42.00147 42.82787
## attr(,"conf.level")
## [1] 0.95
CI_99 <- t.test(Age, conf.level = 0.99)
CI_99$conf.int
## [1] 41.87150 42.95783
## attr(,"conf.level")
## [1] 0.99
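The intervals above come from the one-sample t formula \(\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}\). The sketch below recomputes that formula by hand with a hypothetical helper `t_ci()`, which is not part of base R. It checks the result against t.test() on hypothetical toy data.

```r
# Hypothetical helper: two-sided one-sample t confidence interval
t_ci <- function(x, conf.level = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf.level
  se    <- sd(x) / sqrt(n)                 # standard error of the mean
  tcrit <- qt(1 - alpha / 2, df = n - 1)   # critical value t_{alpha/2, n-1}
  mean(x) + c(-1, 1) * tcrit * se
}

toy <- c(25, 31, 38, 42, 47, 53, 58, 61)   # hypothetical ages

t_ci(toy, conf.level = 0.90)
as.numeric(t.test(toy, conf.level = 0.90)$conf.int)
```

With the Wage data loaded, `t_ci(Age, 0.99)` should reproduce `CI_99$conf.int`. A higher confidence level uses a larger critical value, which is why the 99% interval is the widest of the three.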
Construct 90%, 95%, and 99% confidence intervals for the true (population) proportion of those with no health insurance.
# Table
tab <- table(Health_Insurance)
# n = Number of trials
n <- sum(tab)
# x = Number of successes in n trials (workers with no health insurance, level "2. No")
x <- tab[2]
CI_90 <- prop.test(x, n, conf.level = 0.90)
CI_90$conf.int
## [1] 0.2918476 0.3198401
## attr(,"conf.level")
## [1] 0.9
CI_95 <- prop.test(x, n)
CI_95$conf.int
## [1] 0.2892747 0.3225607
## attr(,"conf.level")
## [1] 0.95
CI_99 <- prop.test(x, n, conf.level = 0.99)
CI_99$conf.int
## [1] 0.2842869 0.3279107
## attr(,"conf.level")
## [1] 0.99
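prop.test() does not use the simple Wald interval \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\). Its interval is the Wilson score interval, and by default (correct = TRUE, as in the calls above) it also applies a continuity correction. Without the correction, the interval is

\[\frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}\]

The sketch below implements this formula with a hypothetical helper `wilson_ci()`. It checks the result against prop.test(..., correct = FALSE) on hypothetical counts.

```r
# Hypothetical helper: Wilson score interval without continuity correction
wilson_ci <- function(x, n, conf.level = 0.95) {
  p_hat  <- x / n
  z      <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  center <- p_hat + z^2 / (2 * n)
  half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  (center + c(-1, 1) * half) / (1 + z^2 / n)
}

# Hypothetical counts: 30 successes in 100 trials
wilson_ci(30, 100, conf.level = 0.90)
as.numeric(prop.test(30, 100, conf.level = 0.90, correct = FALSE)$conf.int)
```

The continuity correction shifts \(\hat{p}\) by \(0.5/n\) toward each end, so the corrected intervals reported above are slightly wider than `wilson_ci(x, n)` would give for the same counts.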
Use the function t.test() to perform a hypothesis test
for a single population mean.
Assume the population mean age of male workers in the Mid-Atlantic region is 40 years. Based on the wage and other data for a group of 3000 male workers in the Mid-Atlantic region (sample data), can it be concluded that the population mean age has increased? Take \(\alpha = 0.05\).
## For a two-tailed test, set alternative = "two.sided"
## The two-tailed alternative is the default
## For a right-tailed test, set alternative = "greater"
## For a left-tailed test, set alternative = "less"
HT_0.05 <- t.test(Age, mu = 40, alternative = "greater")
HT_0.05
##
## One Sample t-test
##
## data: Age
## t = 11.458, df = 2999, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 40
## 95 percent confidence interval:
## 42.06793 Inf
## sample estimates:
## mean of x
## 42.41467
Step 1: State null and alternative hypothesis:
\[H_0: \mu = 40\] \[H_1: \mu > 40 \text{ (Claim)}\]
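Step 2: Compute the test value (statistic): \(t = 11.458\), with \(df = 2999\).
Step 3: Find the \(p\)-value: \(p\)-value \(< 2.2 \times 10^{-16}\) (less than 0.00000000000000022 in standard form).
Step 4: Make a decision: Since the \(p\)-value \(< \alpha = 0.05\), reject \(H_0\).
Step 5: Summarize the results: There is enough evidence to support the claim that the population mean age of male workers in the Mid-Atlantic region has increased (is greater than 40 years).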
# Changing from scientific notation to standard form
format(2.2e-16, scientific=FALSE)
## [1] "0.00000000000000022"
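Steps 2 and 3 can be reproduced by hand. The test statistic is \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) with \(n-1\) degrees of freedom, and for a right-tailed test the \(p\)-value is \(P(T > t)\). The printed "< 2.2e-16" is only a display floor equal to `.Machine$double.eps`. The exact value is stored in `$p.value`. The following is a sketch on hypothetical data.

```r
toy <- c(25, 31, 38, 42, 47, 53, 58, 61)   # hypothetical ages
mu0 <- 40                                  # hypothesized mean

n_obs  <- length(toy)
t_stat <- (mean(toy) - mu0) / (sd(toy) / sqrt(n_obs))
p_val  <- pt(t_stat, df = n_obs - 1, lower.tail = FALSE)  # right tail: P(T > t)

res <- t.test(toy, mu = mu0, alternative = "greater")
c(by_hand = t_stat, t.test = unname(res$statistic))
c(by_hand = p_val,  t.test = res$p.value)

.Machine$double.eps   # the threshold behind "< 2.2e-16" in printed output
```

With the Wage data loaded, `HT_0.05$p.value` should return the stored p-value behind the printed bound.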
Use the function prop.test() to perform a hypothesis
test for a single population proportion.
Assume the population proportion of male workers in the Mid-Atlantic region with no health insurance is 0.48. Based on the wage and other data for a group of 3000 male workers in the Mid-Atlantic region (sample data), can it be concluded that the true population proportion has decreased? Take \(\alpha = 0.01\).
## For a two-tailed test, set alternative = "two.sided"
## The two-tailed alternative is the default
## For a right-tailed test, set alternative = "greater"
## For a left-tailed test, set alternative = "less"
## Refer to Example 2 for the values of x and n
HT_0.01 <- prop.test(x, n, p = 0.48, alternative = "less", correct = FALSE)
HT_0.01
##
## 1-sample proportions test without continuity correction
##
## data: x out of n, null probability 0.48
## X-squared = 365.29, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is less than 0.48
## 95 percent confidence interval:
## 0.0000000 0.3196715
## sample estimates:
## p
## 0.3056667
Step 1: State null and alternative hypothesis:
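\[H_0: p = 0.48\] \[H_1: p<0.48 \text{ (Claim)}\]
Step 2: Compute the test value (statistic): \(X^2 = 365.29\) with \(df = 1\), which corresponds to \(z = -\sqrt{365.29} \approx -19.11\) because \(\hat{p} = 0.3057 < 0.48\).
Step 3: Find the \(p\)-value: \(p\)-value \(< 2.2 \times 10^{-16}\).
Step 4: Make a decision: Since the \(p\)-value \(< \alpha = 0.01\), reject \(H_0\).
Step 5: Summarize the results: There is enough evidence to support the claim that the population proportion of male workers in the Mid-Atlantic region with no health insurance has decreased (is less than 0.48).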
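When correct = FALSE, prop.test() reports \(X^2 = z^2\), where \(z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). For alternative = "less", the \(p\)-value is \(P(Z < z)\). The following is a sketch with hypothetical counts.

```r
x_obs <- 30; n_obs <- 100; p0 <- 0.48   # hypothetical counts and null value

p_hat <- x_obs / n_obs
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n_obs)   # uncorrected z statistic
p_val <- pnorm(z)                                     # left tail: P(Z < z)

res <- prop.test(x_obs, n_obs, p = p0, alternative = "less", correct = FALSE)
c(z_squared = z^2, X_squared = unname(res$statistic))
c(by_hand = p_val, prop.test = res$p.value)
```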
Southeast Missouri State University, ethompson@semo.edu