This document demonstrates various parametric tests and confidence intervals using R for both one-sample and two-sample scenarios, including ANOVA.
The test statistic for the one-sample Z-test is calculated as:
\[ Z = \frac{{\bar{X} - \mu_0}}{{\sigma / \sqrt{n}}} \]
Where:
\(\bar{X}\) is the sample mean.
\(\mu_0\) is the hypothesized population mean.
\(\sigma\) is the population standard deviation.
\(n\) is the sample size.
The Z-test is used to test a population mean when the population standard deviation is known. The null and alternative hypotheses are:
\[ H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0 \]
To perform a one-sample Z-test in R, you can use the following code:
# Load the library
library("BSDA")
# Sample data and population mean
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
mu_0 <- 30.0
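# One-sample Z-test. sigma.x is the population standard deviation (here taken to be 1/sqrt(5));
# z.test() divides it by sqrt(n) internally to get the standard error
z_test_result <- z.test(data, mu = mu_0, sigma.x = 1/sqrt(5))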
z_test_result
##
## One-sample z-Test
##
## data: data
## z = -1.2, p-value = 0.2301
## alternative hypothesis: true mean is not equal to 30
## 95 percent confidence interval:
## 29.36801 30.15199
## sample estimates:
## mean of x
## 29.76
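As a cross-check, the statistic reported above can be reproduced in base R directly from the formula. This is a minimal sketch: the names sigma, se, z and p_value are introduced here for illustration, and sigma is assumed to equal the value passed as sigma.x in the call above.

```r
# Reproduce the one-sample Z statistic by hand (base R only)
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
mu_0 <- 30.0
sigma <- 1 / sqrt(5)               # assumed population SD (the sigma.x value used above)
se <- sigma / sqrt(length(data))   # standard error of the sample mean
z <- (mean(data) - mu_0) / se      # test statistic
p_value <- 2 * pnorm(-abs(z))      # two-sided p-value
c(z = z, p_value = p_value)
```

Both values should agree with the z and p-value lines printed by z.test().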
The confidence interval for the one-sample Z-test is calculated as:
\[ \left(\bar{X} - Z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{X} + Z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\right) \]
Where:
\(\bar{X}\) is the sample mean.
\(Z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution, where \(1-\alpha\) is the desired confidence level (about 1.96 for a 95% interval).
\(\sigma\) is the population standard deviation.
\(n\) is the sample size.
The confidence interval for the given dataset is:
# Confidence interval for one-sample Z-test
z_test_result$conf.int
## [1] 29.36801 30.15199
## attr(,"conf.level")
## [1] 0.95
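The interval can also be computed directly from the sample mean, the known standard deviation and a normal quantile. The sketch below is illustrative only; conf_level, z_crit, margin and ci are names chosen here, and sigma is again assumed to be the sigma.x value from the z.test() call.

```r
# 95% confidence interval for the mean with known sigma (base R only)
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
sigma <- 1 / sqrt(5)                  # assumed population SD, as in the z.test() call
conf_level <- 0.95
alpha <- 1 - conf_level               # significance level
z_crit <- qnorm(1 - alpha / 2)        # upper alpha/2 quantile of N(0, 1)
margin <- z_crit * sigma / sqrt(length(data))
ci <- c(lower = mean(data) - margin, upper = mean(data) + margin)
ci
```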
To plot the critical region for the Z-test:
# Plot the critical region for the sample mean under H0
se <- (1 / sqrt(5)) / sqrt(length(data))  # sigma / sqrt(n), with sigma as passed to z.test()
x <- seq(mu_0 - 4 * se, mu_0 + 4 * se, length.out = 401)
y <- dnorm(x, mean = mu_0, sd = se)
cr_left <- subset(data.frame(x, y), x < qnorm(0.025, mean = mu_0, sd = se))
cr_right <- subset(data.frame(x, y), x > qnorm(0.975, mean = mu_0, sd = se))
plot(x, y, type = "l", lwd = 2, col = "blue", xlab = "Sample Mean", ylab = "Density", main = "Critical Region for Z-Test")
abline(h = 0)
# Shade each tail, closing the polygon on the x-axis without hard-coded row indices
polygon(c(cr_left$x[1], cr_left$x, tail(cr_left$x, 1)), c(0, cr_left$y, 0), col = "red")
polygon(c(cr_right$x[1], cr_right$x, tail(cr_right$x, 1)), c(0, cr_right$y, 0), col = "red")
The test statistic for the one-sample t-test is calculated as:
\[ t = \frac{{\bar{X} - \mu_0}}{{s / \sqrt{n}}} \]
Where:
\(\bar{X}\) is the sample mean.
\(\mu_0\) is the hypothesized population mean.
\(s\) is the sample standard deviation.
\(n\) is the sample size.
The t-test is used when the population standard deviation is unknown. The null and alternative hypotheses are the same as for the Z-test. To perform a one-sample t-test in R:
# Sample data
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
# One-sample t-test
t_test_result <- t.test(data, mu = mu_0)
t_test_result
##
## One Sample t-test
##
## data: data
## t = -0.5711, df = 4, p-value = 0.5985
## alternative hypothesis: true mean is not equal to 30
## 95 percent confidence interval:
## 28.59323 30.92677
## sample estimates:
## mean of x
## 29.76
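The same statistic can be obtained by hand from the formula, which makes the role of the sample standard deviation explicit. This is a sketch in base R; n, se, t_stat and p_value are illustrative names.

```r
# Reproduce the one-sample t statistic by hand (base R only)
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
mu_0 <- 30.0
n <- length(data)
se <- sd(data) / sqrt(n)                      # estimated standard error of the mean
t_stat <- (mean(data) - mu_0) / se            # test statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value with n - 1 df
c(t = t_stat, df = n - 1, p_value = p_value)
```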
The confidence interval for the one-sample t-test is calculated as:
\[ \left(\bar{X} - t_{1-\alpha/2, n-1} \frac{s}{\sqrt{n}}, \bar{X} + t_{1-\alpha/2, n-1} \frac{s}{\sqrt{n}}\right) \]
Where:
\(\bar{X}\) is the sample mean.
\(t_{1-\alpha/2, n-1}\) is the \(1-\alpha/2\) quantile of the t-distribution with \(n-1\) degrees of freedom, where \(1-\alpha\) is the desired confidence level.
\(s\) is the sample standard deviation.
\(n\) is the sample size.
The confidence interval for the given dataset is:
# Confidence interval for one-sample t-test
t_test_result$conf.int
## [1] 28.59323 30.92677
## attr(,"conf.level")
## [1] 0.95
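For comparison, the interval can be rebuilt from a t quantile and the estimated standard error. The sketch below uses only base R; t_crit, margin and ci are names introduced for this example.

```r
# 95% confidence interval for the mean with unknown sigma (base R only)
data <- c(28.5, 29.8, 30.2, 31.0, 29.3)
n <- length(data)
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 1)   # upper alpha/2 quantile of t with n - 1 df
margin <- t_crit * sd(data) / sqrt(n)
ci <- c(lower = mean(data) - margin, upper = mean(data) + margin)
ci
```

Because the t quantile is larger than the corresponding normal quantile for small samples, this interval is wider than the Z-based one for the same data.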
To plot the critical region for the t-test:
# Plot the critical region
x <- seq(-6, 6, 0.1)
y <- dt(x, df = 4)
cr_left <- subset(data.frame(x, y), x < qt(0.025, df = 4))
cr_right <- subset(data.frame(x, y), x > qt(0.975, df = 4))
plot(x, y, type = "l", lwd = 2, col = "blue", xlab = "t Statistic", ylab = "Density", main = "Critical Region for T-Test")
abline(h = 0)
# Shade each tail, closing the polygon on the x-axis without hard-coded row indices
polygon(c(cr_left$x[1], cr_left$x, tail(cr_left$x, 1)), c(0, cr_left$y, 0), col = "red")
polygon(c(cr_right$x[1], cr_right$x, tail(cr_right$x, 1)), c(0, cr_right$y, 0), col = "red")
The test statistic for the chi-square test is calculated as:
\[ \chi^2 = \sum \frac{{(O - E)^2}}{{E}} \]
Where:
\(O\) is the observed frequency.
\(E\) is the expected frequency.
The chi-square test is used to test the independence of two categorical variables. The null and alternative hypotheses are:
\[ H_0: \text{The variables are independent} \] \[ H_1: \text{The variables are not independent} \]
To perform a chi-square test in R:
# Contingency table
observed <- matrix(c(10, 20, 30, 40), nrow = 2)
# Chi-square test
chi_square_result <- chisq.test(observed)
chi_square_result
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: observed
## X-squared = 0.44643, df = 1, p-value = 0.504
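The reported statistic is not the plain Pearson sum, because chisq.test() applies Yates' continuity correction to 2 x 2 tables by default. The sketch below, with illustrative names such as expected, chi_sq_plain and chi_sq_yates, computes the expected counts under independence and both versions of the statistic.

```r
# Chi-square test of independence by hand for a 2 x 2 table (base R only)
observed <- matrix(c(10, 20, 30, 40), nrow = 2)
# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
# Plain Pearson statistic (what chisq.test(observed, correct = FALSE) reports)
chi_sq_plain <- sum((observed - expected)^2 / expected)
# Yates' correction shrinks each |O - E| by 0.5 (every |O - E| exceeds 0.5 here)
chi_sq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)
df_chi <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_yates <- pchisq(chi_sq_yates, df = df_chi, lower.tail = FALSE)
c(plain = chi_sq_plain, yates = chi_sq_yates, df = df_chi, p_value = p_yates)
```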
To plot the critical region for the chi-square test:
# Define the chi-square critical value
alpha <- 0.05
df_chi <- 1  # (rows - 1) * (columns - 1) for the 2 x 2 table, matching the test output
cv <- qchisq(1 - alpha, df = df_chi)
# Plot the critical region (start just above 0, where the df = 1 density is unbounded)
x <- seq(0.05, 12.5, 0.05)
y <- dchisq(x, df = df_chi)
cr <- subset(data.frame(x, y), x > cv)
plot(x, y, type = "l", lwd = 2, col = "blue", xlab = "Chi-Square Statistic", ylab = "Density", main = "Critical Region for Chi-Square Test")
abline(h = 0)
# Close the shaded tail on the x-axis without a hard-coded row index
polygon(c(cr$x[1], cr$x, tail(cr$x, 1)), c(0, cr$y, 0), col = "red")
The test statistic for the two-sample Z-test is calculated as:
\[ Z = \frac{{\bar{X}_1 - \bar{X}_2}}{{\sqrt{\frac{{\sigma_1^2}}{{n_1}} + \frac{{\sigma_2^2}}{{n_2}}}}} \]
Where:
\(\bar{X}_1\) and \(\bar{X}_2\) are the sample means of the two groups.
\(\sigma_1^2\) and \(\sigma_2^2\) are the population variances of the two groups.
\(n_1\) and \(n_2\) are the sample sizes of the two groups.
The two-sample Z-test is used to compare two population means when the population standard deviations are known. The null and alternative hypotheses are:
\[ H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \neq \mu_2 \]
To perform a two-sample Z-test in R:
# Sample data
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
# Two-sample Z-test
z_test_2sample_result <- z.test(group1, group2, sigma.x = 0.75, sigma.y = 0.75)
z_test_2sample_result
##
## Two-sample z-Test
##
## data: group1 and group2
## z = -0.96977, p-value = 0.3322
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3896925 0.4696925
## sample estimates:
## mean of x mean of y
## 29.76 30.22
The confidence interval for the two-sample Z-test is calculated as:
\[ \left(\bar{X}_1 - \bar{X}_2 - Z_{1-\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \bar{X}_1 - \bar{X}_2 + Z_{1-\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \]
Where:
\(\bar{X}_1\) and \(\bar{X}_2\) are the sample means of the two groups.
\(Z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution, where \(1-\alpha\) is the desired confidence level.
\(\sigma_1^2\) and \(\sigma_2^2\) are the population variances of the two groups.
\(n_1\) and \(n_2\) are the sample sizes of the two groups.
The confidence interval for the given dataset is:
# Confidence interval for two-sample Z-test
z_test_2sample_result$conf.int
## [1] -1.3896925 0.4696925
## attr(,"conf.level")
## [1] 0.95
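The statistic and interval for the difference in means can be reproduced by hand in the same way. This is a minimal sketch assuming both population standard deviations equal 0.75, as in the z.test() call above; sigma1, sigma2, se_diff, diff_means and ci are illustrative names.

```r
# Two-sample Z-test and confidence interval by hand (base R only)
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
sigma1 <- 0.75   # assumed known population SD of group 1
sigma2 <- 0.75   # assumed known population SD of group 2
se_diff <- sqrt(sigma1^2 / length(group1) + sigma2^2 / length(group2))
diff_means <- mean(group1) - mean(group2)
z <- diff_means / se_diff                              # test statistic
p_value <- 2 * pnorm(-abs(z))                          # two-sided p-value
ci <- diff_means + c(-1, 1) * qnorm(0.975) * se_diff   # 95% interval for mu_1 - mu_2
list(z = z, p_value = p_value, ci = ci)
```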
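The plot below overlays a normal density for each group, built from that group's sample mean and sample standard deviation, and shades the outer 2.5% tails of each curve. It is a descriptive view of the two samples; the test itself rejects \(H_0\) when \(|Z|\) exceeds the standard normal critical value (about 1.96 at \(\alpha = 0.05\)):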
# Plot the critical region
x <- seq(26.5, 33.5, 0.1)
y1 <- dnorm(x, mean = mean(group1), sd = sd(group1))
y2 <- dnorm(x, mean = mean(group2), sd = sd(group2))
cr_left1 <- subset(data.frame(x, y1), x < qnorm(0.025, mean = mean(group1), sd = sd(group1)))
cr_right1 <- subset(data.frame(x, y1), x > qnorm(0.975, mean = mean(group1), sd = sd(group1)))
cr_left2 <- subset(data.frame(x, y2), x < qnorm(0.025, mean = mean(group2), sd = sd(group2)))
cr_right2 <- subset(data.frame(x, y2), x > qnorm(0.975, mean = mean(group2), sd = sd(group2)))
plot(x, y1, type = "l", lwd = 2, col = "steelblue", xlab = "Sample Mean", ylab = "Density", main = "Critical Region for Two-Sample Z-Test")
lines(x, y2, lwd = 2, col = "maroon")
abline(h=0)
polygon(c(cr_left1$x[1], cr_left1$x[1], cr_left1$x, cr_left1$x[15], cr_left1$x[15], cr_left1$x[1]), c(0, cr_left1$y[1], cr_left1$y, cr_left1$y[15], 0, 0), col = rgb(0, 0, 1, alpha = 0.4), border = "navy")
polygon(c(cr_right1$x[1], cr_right1$x[1], cr_right1$x, cr_right1$x[19], cr_right1$x[19], cr_right1$x[1]), c(0, cr_right1$y[1], cr_right1$y, cr_right1$y[19], 0, 0), col = rgb(0, 0, 1, alpha = 0.4), border = "navy")
polygon(c(cr_left2$x[1], cr_left2$x[1], cr_left2$x, cr_left2$x[17], cr_left2$x[17], cr_left2$x[1]), c(0, cr_left2$y[1], cr_left2$y, cr_left2$y[17], 0, 0), col = rgb(1, 0, 0, alpha = 0.4), border = "darkred")
polygon(c(cr_right2$x[1], cr_right2$x[1], cr_right2$x, cr_right2$x[13], cr_right2$x[13], cr_right2$x[1]), c(0, cr_right2$y[1], cr_right2$y, cr_right2$y[13], 0, 0), col = rgb(1, 0, 0, alpha = 0.4), border = "darkred")
legend("topright", legend = c("Group1", "Group2"), col = c("steelblue", "maroon"), lty = 1, lwd = 2)
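The rejection region of the test itself is defined on the Z statistic rather than on the raw measurements. The sketch below is an addition, not part of the original analysis: it plots the standard normal density under \(H_0\), shades \(|Z| > Z_{0.975}\), and marks the observed statistic. The names z_obs, z_crit, z_grid and shade_tail are hypothetical, and the population standard deviations of 0.75 are those assumed in the z.test() call.

```r
# Critical region of the two-sample Z-test on the scale of the Z statistic
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
sigma <- 0.75   # assumed common known population SD
z_obs <- (mean(group1) - mean(group2)) /
  sqrt(sigma^2 / length(group1) + sigma^2 / length(group2))
z_crit <- qnorm(0.975)                 # two-sided critical value at alpha = 0.05
z_grid <- seq(-4, 4, by = 0.01)
dens <- dnorm(z_grid)

# Helper (hypothetical): shade the area under the curve over the given x values
shade_tail <- function(xs, ys) {
  polygon(c(xs[1], xs, tail(xs, 1)), c(0, ys, 0), col = "red", border = NA)
}

plot(z_grid, dens, type = "l", lwd = 2, col = "blue",
     xlab = "Z Statistic", ylab = "Density",
     main = "Rejection Region for Two-Sample Z-Test")
abline(h = 0)
left <- z_grid <= -z_crit
right <- z_grid >= z_crit
shade_tail(z_grid[left], dens[left])
shade_tail(z_grid[right], dens[right])
abline(v = z_obs, lty = 2, col = "darkgreen")   # observed statistic

reject <- abs(z_obs) > z_crit   # TRUE when H0 is rejected at the 5% level
reject
```

The observed statistic falls between the two shaded tails, which is consistent with the p-value of 0.3322 reported above.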
The test statistic for the two-sample t-test is calculated as:
\[ t = \frac{{\bar{X}_1 - \bar{X}_2}}{{\sqrt{\frac{{s_1^2}}{{n_1}} + \frac{{s_2^2}}{{n_2}}}}} \]
Where:
\(\bar{X}_1\) and \(\bar{X}_2\) are the sample means of the two groups.
\(s_1\) and \(s_2\) are the sample standard deviations of the two groups.
\(n_1\) and \(n_2\) are the sample sizes of the two groups.
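The two-sample t-test is used when the population standard deviations are unknown. The null and alternative hypotheses are the same as for the two-sample Z-test. By default, t.test() runs Welch's version of the test, which does not assume equal variances and uses the statistic shown above. To perform a two-sample t-test in R: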
# Sample data
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
# Two-sample t-test
t_test_2sample_result <- t.test(group1, group2)
t_test_2sample_result
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = -0.73099, df = 7.9076, p-value = 0.4859
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.9140904 0.9940904
## sample estimates:
## mean of x mean of y
## 29.76 30.22
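The fractional degrees of freedom in the output come from the Welch-Satterthwaite approximation, which is not written out in this document. The sketch below reproduces both the statistic and the degrees of freedom in base R; v1, v2, t_stat and df_welch are names chosen for illustration.

```r
# Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom (base R only)
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
n1 <- length(group1)
n2 <- length(group2)
v1 <- var(group1) / n1   # squared standard error contributed by group 1
v2 <- var(group2) / n2   # squared standard error contributed by group 2
t_stat <- (mean(group1) - mean(group2)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
c(t = t_stat, df = df_welch, p_value = p_value)
```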
The confidence interval for the two-sample t-test is calculated as:
\[ \left(\bar{X}_1 - \bar{X}_2 - t_{1-\alpha/2, \text{df}} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \bar{X}_1 - \bar{X}_2 + t_{1-\alpha/2, \text{df}} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) \]
Where:
\(\bar{X}_1\) and \(\bar{X}_2\) are the sample means of the two groups.
\(t_{1-\alpha/2, \text{df}}\) is the \(1-\alpha/2\) quantile of the t-distribution with \(\text{df}\) degrees of freedom (the Welch value reported by t.test(), 7.9076 here), where \(1-\alpha\) is the desired confidence level.
\(s_1\) and \(s_2\) are the sample standard deviations of the two groups.
\(n_1\) and \(n_2\) are the sample sizes of the two groups.
The confidence interval for the given dataset is:
# Confidence interval for two-sample t-test
t_test_2sample_result$conf.int
## [1] -1.9140904 0.9940904
## attr(,"conf.level")
## [1] 0.95
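The interval can be rebuilt from the Welch degrees of freedom and a t quantile. This is a sketch in base R; se_diff, df_welch, t_crit and ci are illustrative names, and the degrees of freedom are computed with the Welch-Satterthwaite approximation that t.test() uses by default.

```r
# 95% confidence interval for mu_1 - mu_2 under the Welch test (base R only)
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
v1 <- var(group1) / length(group1)
v2 <- var(group2) / length(group2)
se_diff <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(group1) - 1) + v2^2 / (length(group2) - 1))
t_crit <- qt(0.975, df = df_welch)   # two-sided critical value at alpha = 0.05
ci <- (mean(group1) - mean(group2)) + c(-1, 1) * t_crit * se_diff
ci
```

The interval contains 0, in line with the non-significant p-value of the test.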
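To plot the critical region for the two-sample t-test. Note that the code below draws a t-distribution with 4 degrees of freedom (\(n - 1\) for a single group) for each group, whereas the Welch test above uses about 7.91 degrees of freedom, for which the 5% two-sided critical values are about ±2.31 rather than ±2.78: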
# Plot the critical region
x <- seq(-6, 6, 0.1)
y1 <- dt(x, df = 4)
y2 <- dt(x, df = 4)
cr_left1 <- subset(data.frame(x, y1), x < qt(0.025, df = 4))
cr_right1 <- subset(data.frame(x, y1), x > qt(0.975, df = 4))
cr_left2 <- subset(data.frame(x, y2), x < qt(0.025, df = 4))
cr_right2 <- subset(data.frame(x, y2), x > qt(0.975, df = 4))
plot(x, y1, type = "l", lwd = 1.5, col = "steelblue", xlab = "t Statistic", ylab = "Density", main = "Critical Region for Two-Sample T-Test")
lines(x, y2, lwd = 1.5, col = "maroon")
abline(h=0)
polygon(c(cr_left1$x[1], cr_left1$x[1], cr_left1$x, cr_left1$x[33], cr_left1$x[33], cr_left1$x[1]), c(0, cr_left1$y[1], cr_left1$y, cr_left1$y[33], 0, 0), col = rgb(0, 0, 1, alpha = 0.4), border = "navy")
polygon(c(cr_right1$x[1], cr_right1$x[1], cr_right1$x, cr_right1$x[33], cr_right1$x[33], cr_right1$x[1]), c(0, cr_right1$y[1], cr_right1$y, cr_right1$y[33], 0, 0), col = rgb(0, 0, 1, alpha = 0.4), border = "navy")
polygon(c(cr_left2$x[1], cr_left2$x[1], cr_left2$x, cr_left2$x[33], cr_left2$x[33], cr_left2$x[1]), c(0, cr_left2$y[1], cr_left2$y, cr_left2$y[33], 0, 0), col = rgb(1, 0, 0, alpha = 0.4), border = "darkred")
polygon(c(cr_right2$x[1], cr_right2$x[1], cr_right2$x, cr_right2$x[33], cr_right2$x[33], cr_right2$x[1]), c(0, cr_right2$y[1], cr_right2$y, cr_right2$y[33], 0, 0), col = rgb(1, 0, 0, alpha = 0.4), border = "darkred")
legend("topright", legend = c("Group1", "Group2"), col = c("steelblue", "maroon"), lty = 1, lwd = 2)
Note that the plot may look as if only y2 was drawn, but it also contains y1: the two curves take identical values, so one hides the other. To confirm that the code is working correctly, set the lwd parameter of one of the curves to a different value.
The test statistic for the F-test comparing two variances is calculated as:
\[ F = \frac{{s_1^2}}{{s_2^2}} \]
Where:
\(s_1^2\) is the sample variance of the first group.
\(s_2^2\) is the sample variance of the second group.
The F-test is used to compare the variances of two populations. The null and alternative hypotheses are:
\[ H_0: \sigma_1^2 = \sigma_2^2 \quad \text{vs.} \quad H_1: \sigma_1^2 \neq \sigma_2^2 \]
To perform an F-test for two variances in R:
# Sample data
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
# F-test
f_test_result <- var.test(group1, group2)
f_test_result
##
## F test to compare two variances
##
## data: group1 and group2
## F = 0.80492, num df = 4, denom df = 4, p-value = 0.8385
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.08380655 7.73090236
## sample estimates:
## ratio of variances
## 0.8049225
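The F statistic, its two-sided p-value and the confidence interval for the variance ratio can all be derived from the F distribution with \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom. The sketch below uses base R only; f_stat, p_lower and ci are names introduced for this example.

```r
# F-test for two variances by hand (base R only)
group1 <- c(28.5, 29.8, 30.2, 31.0, 29.3)
group2 <- c(30.8, 29.9, 28.7, 31.5, 30.2)
df1 <- length(group1) - 1
df2 <- length(group2) - 1
f_stat <- var(group1) / var(group2)          # ratio of sample variances
p_lower <- pf(f_stat, df1, df2)              # P(F <= f_stat) under H0
p_value <- 2 * min(p_lower, 1 - p_lower)     # two-sided p-value
# 95% confidence interval for the true variance ratio
ci <- c(f_stat / qf(0.975, df1, df2), f_stat / qf(0.025, df1, df2))
list(F = f_stat, p_value = p_value, ci = ci)
```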
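To plot the critical region for the F-test. The code below shades the upper 5% tail, which is the critical region for the one-sided alternative \(\sigma_1^2 > \sigma_2^2\); the two-sided test run above instead places 2.5% in each tail: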
# Define the F critical values
cv <- qf(0.95, df1 = 4, df2 = 4)
# Plot the critical region
x <- seq(0, 10, 0.05)
y <- df(x, df1 = 4, df2 = 4)
cr <- subset(data.frame(x, y), x > cv)
plot(x, y, type = "l", lwd = 2, col = "blue", xlab = "F Statistic", ylab = "Density", main = "Critical Region for F-Test (Two-Sample)")
abline(h=0)
polygon(c(cr$x[1], cr$x[1], cr$x, cr$x[73], cr$x[73], cr$x[1]), c(0, cr$y[1], cr$y, cr$y[73], 0, 0), col = "red")
abline(v = cv, lty = 2, col = "green")
legend("topright", legend = c("Critical Region", "F Critical Value"), col = c("red", "green"), lty = c(1, 2), lwd = c(2, 1))
Analysis of Variance (ANOVA) is used to test the equality of means of more than two groups. The null and alternative hypotheses are:
\[ H_0: \text{All group means are equal} \] \[ H_1: \text{At least one group mean is different} \]
To perform an ANOVA test in R:
# Sample data for ANOVA
g1 <- c(25, 30, 35, 40, 45)
g2 <- c(20, 22, 26, 28, 30)
g3 <- c(15, 18, 21, 24, 17)
# ANOVA test
anova_result <- aov(c(g1, g2, g3) ~ rep(c("g1", "g2", "g3"), each = 5))
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## rep(c("g1", "g2", "g3"), each = 5) 2 650.8 325.4 10.59 0.00224 **
## Residuals 12 368.8 30.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
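The entries of the ANOVA table can be reproduced from the between-group and within-group sums of squares. This is a minimal sketch in base R; values, group, ss_between, ss_within and f_stat are illustrative names rather than objects created by aov().

```r
# One-way ANOVA by hand (base R only)
g1 <- c(25, 30, 35, 40, 45)
g2 <- c(20, 22, 26, 28, 30)
g3 <- c(15, 18, 21, 24, 17)
values <- c(g1, g2, g3)
group <- factor(rep(c("g1", "g2", "g3"), each = 5))

grand_mean <- mean(values)
group_means <- tapply(values, group, mean)
group_sizes <- tapply(values, group, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((values - ave(values, group))^2)   # ave() returns each value's group mean

df_between <- nlevels(group) - 1
df_within <- length(values) - nlevels(group)
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
c(SSB = ss_between, SSW = ss_within, F = f_stat, p_value = p_value)
```

ANOVA rejects \(H_0\) only for large values of F, so the p-value is taken from the upper tail.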
To plot the critical region for ANOVA:
# Define the F critical value
alpha <- 0.05
cv <- qf(1 - alpha, df1 = 2, df2 = 12)
# Plot the critical region
x <- seq(0, 5, length.out = 100)
y <- df(x, df1 = 2, df2 = 12)
cr<- subset(data.frame(x, y), x > cv)
plot(x, y, type = "l", lwd = 2, col = "blue", xlab = "F Statistic", ylab = "Density", main = "Critical Region for ANOVA")
abline(h=0)
polygon(c(cr$x[1], cr$x[1], cr$x, cr$x[23], cr$x[23], cr$x[1]), c(0, cr$y[1], cr$y, cr$y[23], 0, 0), col = "red")
abline(v = cv, lwd = 1, lty = 2, col = "green")
legend("topright", legend = c("Critical Region", "F-Critical Values"), col = c("red", "green"), lty = c(1, 2), lwd = c(2, 1))
This RMD file provides examples of hypothesis testing and confidence intervals for various parametric tests in R, including one-sample and two-sample scenarios, and ANOVA. It also includes plots of the critical regions for each test.