## [1] 49.52896
## [1] 9.810307
Hypothesis:
\(H_0: \mu = \mu_0\)
\(H_1: \mu \neq \mu_0\)
# mu_0 = 55
SE <- sd(sample_data)/sqrt(length(sample_data))
t_cal <- (mean(sample_data) - 55)/SE
print(t_cal)
## [1] -3.054553
## [1] -2.04523
## [1] TRUE
## [1] 0.0047971
Calculating confidence interval:
## [1] 45.86573 53.19219
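The manual steps above (test statistic, critical value, decision, p-value, confidence interval) can be reproduced from the printed summary statistics alone. The following is a minimal sketch, not the author's code: it assumes n = 30 (inferred from df = 29 in the `t.test` output) and plugs in the printed mean and standard deviation instead of the original `sample_data`. All variable names other than `SE` and `t_cal` are illustrative.

```r
# Summary statistics copied from the printed output; n = 30 is an assumption
x_bar <- 49.52896
s     <- 9.810307
n     <- 30
mu_0  <- 55
alpha <- 0.05

SE     <- s / sqrt(n)                       # standard error of the mean
t_cal  <- (x_bar - mu_0) / SE               # calculated t statistic
t_crit <- qt(alpha / 2, df = n - 1)         # lower-tail critical value
reject <- abs(t_cal) > abs(t_crit)          # two-tailed decision rule
p_val  <- 2 * pt(-abs(t_cal), df = n - 1)   # two-tailed p-value
ci     <- x_bar + c(-1, 1) * abs(t_crit) * SE  # 95% confidence interval

print(c(t_cal = t_cal, t_crit = t_crit, p_value = p_val))
print(reject)
print(ci)
```

Because the inputs are rounded, the results should agree with the printed values only up to rounding error.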
Generate sample data:
Testing normality using the Shapiro-Wilk test:
##
## Shapiro-Wilk normality test
##
## data: sample_data
## W = 0.97894, p-value = 0.7966
\(H_0:\) The data follow a normal distribution.
\(H_1:\) The data do not follow a normal distribution.
Since the p-value (0.7966) is greater than the level of significance (\(\alpha\) = 0.05), we do not have enough statistical evidence to reject the null hypothesis, so treating the data as normally distributed is reasonable.
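The data-generation and test calls are not shown in this section, so here is a hedged sketch of how the normality check might look. The seed and the `rnorm()` parameters are assumptions chosen for illustration and are not taken from the source; only `shapiro.test()` and the 0.05 threshold come from the text.

```r
set.seed(123)                                  # assumed seed, not from the source
sample_data <- rnorm(30, mean = 50, sd = 10)   # assumed sample size and parameters

sw    <- shapiro.test(sample_data)             # Shapiro-Wilk normality test
alpha <- 0.05

print(sw)
if (sw$p.value > alpha) {
  message("Fail to reject H0: no evidence against normality")
} else {
  message("Reject H0: the data deviate from normality")
}
```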
One-sample t-test:
Hypothesis:
\(H_0: \mu = 53\)
\(H_1: \mu \neq 53\)
Using the `t.test()` function, perform the two-tailed t-test:
##
## One Sample t-test
##
## data: sample_data
## t = -1.9379, df = 29, p-value = 0.06242
## alternative hypothesis: true mean is not equal to 53
## 95 percent confidence interval:
## 45.86573 53.19219
## sample estimates:
## mean of x
## 49.52896
Reject the null hypothesis if the p-value < 0.05. Here the p-value (0.06242) exceeds 0.05, so the null hypothesis is not rejected.
##
## One Sample t-test
##
## data: sample_data
## t = -1.9379, df = 29, p-value = 0.06242
## alternative hypothesis: true mean is not equal to 53
## 99 percent confidence interval:
## 44.59198 54.46595
## sample estimates:
## mean of x
## 49.52896
Reject the null hypothesis if the p-value < 0.01. Here the p-value (0.06242) exceeds 0.01, so the null hypothesis is not rejected.
##
## One Sample t-test
##
## data: sample_data
## t = -1.9379, df = 29, p-value = 0.06242
## alternative hypothesis: true mean is not equal to 53
## 90 percent confidence interval:
## 46.48564 52.57228
## sample estimates:
## mean of x
## 49.52896
Reject the null hypothesis if the p-value < 0.10. Here the p-value (0.06242) is below 0.10, so the null hypothesis is rejected.
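The three decisions can be checked side by side. This sketch works from the printed summary statistics (with n = 30 assumed from df = 29) and uses the hypothesized mean of 53 reported in the `t.test` output. It also shows that the null hypothesis is rejected exactly when 53 falls outside the corresponding confidence interval. The names `conf_levels` and `results` are illustrative.

```r
# Summary statistics copied from the printed output; n = 30 is an assumption
x_bar <- 49.52896
s     <- 9.810307
n     <- 30
mu_0  <- 53

SE      <- s / sqrt(n)
p_value <- 2 * pt(-abs((x_bar - mu_0) / SE), df = n - 1)

conf_levels <- c(0.95, 0.99, 0.90)
results <- do.call(rbind, lapply(conf_levels, function(level) {
  alpha <- 1 - level
  half  <- qt(1 - alpha / 2, df = n - 1) * SE   # half-width of the interval
  data.frame(conf_level = level,
             lower      = x_bar - half,
             upper      = x_bar + half,
             reject_H0  = p_value < alpha)
}))

print(p_value)
print(results)
```

With the raw data available, the equivalent call would be `t.test(sample_data, mu = 53, conf.level = 0.99)`, changing `conf.level` as needed.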
Hypothesis:
\(H_0: \mu \leq 40\)
\(H_1: \mu > 40\)
##
## One Sample t-test
##
## data: sample_data
## t = 5.3201, df = 29, p-value = 5.21e-06
## alternative hypothesis: true mean is greater than 40
## 95 percent confidence interval:
## 46.48564 Inf
## sample estimates:
## mean of x
## 49.52896
Hypothesis:
\(H_0: \mu \geq 58\)
\(H_1: \mu < 58\)
##
## One Sample t-test
##
## data: sample_data
## t = -4.7295, df = 29, p-value = 2.689e-05
## alternative hypothesis: true mean is less than 58
## 95 percent confidence interval:
## -Inf 52.57228
## sample estimates:
## mean of x
## 49.52896
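The two one-tailed tests differ only in which tail of the t distribution supplies the p-value and which side of the confidence interval is bounded. A minimal sketch from the printed summary statistics (n = 30 assumed from df = 29; variable names are illustrative):

```r
# Summary statistics copied from the printed output; n = 30 is an assumption
x_bar <- 49.52896
s     <- 9.810307
n     <- 30
SE    <- s / sqrt(n)
dof   <- n - 1

# Right-tailed test, H1: mu > 40
t_greater <- (x_bar - 40) / SE
p_greater <- pt(t_greater, dof, lower.tail = FALSE)  # upper-tail probability
lower_95  <- x_bar - qt(0.95, dof) * SE              # interval is (lower_95, Inf)

# Left-tailed test, H1: mu < 58
t_less   <- (x_bar - 58) / SE
p_less   <- pt(t_less, dof)                          # lower-tail probability
upper_95 <- x_bar + qt(0.95, dof) * SE               # interval is (-Inf, upper_95)

print(c(t_greater = t_greater, p_greater = p_greater, lower_95 = lower_95))
print(c(t_less = t_less, p_less = p_less, upper_95 = upper_95))
```

With the raw data, the same tests would be requested through the `alternative` argument, for example `t.test(sample_data, mu = 40, alternative = "greater")` and `t.test(sample_data, mu = 58, alternative = "less")`.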
ggplot(data.frame(Value = sample_data), aes(x = Value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15, fill = "blue", alpha = 0.5) +
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Sample Data Distribution", x = "Value", y = "Density") +
  theme_minimal()
Use the built-in dataset mtcars (comparing mpg for automatic vs. manual cars):
Split into two groups based on transmission type:
auto_mpg <- mtcars$mpg[mtcars$am == 0] # Automatic
manual_mpg <- mtcars$mpg[mtcars$am == 1] # Manual
\(H_0\): Automatic cars and manual cars have equal average mpg.
\(H_1\): Automatic cars and manual cars have unequal average mpg.
## [1] 17.14737
## [1] 24.39231
## [1] 14.6993
## [1] 38.02577
Normality test for both groups:
##
## Shapiro-Wilk normality test
##
## data: auto_mpg
## W = 0.97677, p-value = 0.8987
##
## Shapiro-Wilk normality test
##
## data: manual_mpg
## W = 0.9458, p-value = 0.5363
Check variance homogeneity (Levene’s test):
## Levene's Test for Homogeneity of Variance (center = "mean")
##       Df F value  Pr(>F)
## group  1   5.921 0.02113 *
##       30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
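The call that produced this output is not shown; it looks like `leveneTest()` from the `car` package, but that is an assumption. With `center = "mean"`, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group mean, so it can be sketched in base R without extra packages:

```r
# Absolute deviation of each car's mpg from the mean of its transmission group
abs_dev <- abs(mtcars$mpg - ave(mtcars$mpg, mtcars$am))

# One-way ANOVA on the absolute deviations (Levene's test, mean-centered)
levene_fit <- anova(lm(abs_dev ~ factor(mtcars$am)))
F_value    <- levene_fit[["F value"]][1]
p_value    <- levene_fit[["Pr(>F)"]][1]

print(levene_fit)
```

A p-value below 0.05 here indicates that the two groups do not share a common variance.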
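Levene's test rejects the equality of variances (p = 0.02113 < 0.05), so perform the two-sample t-test in its Welch form, which does not assume equal variances: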
##
## Welch Two Sample t-test
##
## data: auto_mpg and manual_mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
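To see where the Welch statistic and its fractional degrees of freedom come from, the quantities can be computed by hand and compared with `t.test()`, which uses `var.equal = FALSE` by default. This is an illustrative sketch; the names `v1`, `v2`, `t_welch`, `df_welch`, and `p_welch` are not from the source.

```r
auto_mpg   <- mtcars$mpg[mtcars$am == 0]  # Automatic
manual_mpg <- mtcars$mpg[mtcars$am == 1]  # Manual

# Per-group variance of the sample mean
v1 <- var(auto_mpg) / length(auto_mpg)
v2 <- var(manual_mpg) / length(manual_mpg)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(auto_mpg) - mean(manual_mpg)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(auto_mpg) - 1) + v2^2 / (length(manual_mpg) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

res <- t.test(auto_mpg, manual_mpg)       # Welch test by default
print(c(t = t_welch, df = df_welch, p = p_welch))
print(res)
```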
# Visualize the data
ggplot(mtcars, aes(x = factor(am), y = mpg, fill = factor(am))) +
  geom_boxplot(alpha = 0.6) +
  geom_jitter(width = 0.2, alpha = 0.7) +
  labs(title = "MPG Comparison: Automatic vs Manual",
       x = "Transmission (0 = Auto, 1 = Manual)",
       y = "Miles Per Gallon") +
  scale_fill_manual(values = c("blue", "red"),
                    labels = c("Automatic", "Manual")) +
  theme_minimal()
Generate 30 observations:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 280.0 293.2 299.5 299.6 305.0 318.0
# after a course
after <- before + round(rnorm(30, mean = 5, sd = 5), 0) # Simulating an increase
summary(after)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 283.0 298.2 305.0 305.5 312.2 325.0
## [1] 5.933333
\(H_0:\) The true average GRE score of the students is the same before and after the course.
\(H_1:\) The true average GRE score of the students is not the same before and after the course.
Perform the paired t-test:
##
## Paired t-test
##
## data: after and before
## t = 7.7811, df = 29, p-value = 1.398e-08
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 4.373779 7.492888
## sample estimates:
## mean difference
## 5.933333
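A paired t-test is equivalent to a one-sample t-test on the within-pair differences with a hypothesized mean of 0. The sketch below illustrates this on simulated scores; the seed and the parameters used to generate `before` are hypothetical, since that code is not shown in this section, so the numbers will not match the output above.

```r
set.seed(1)                                        # hypothetical seed
before <- round(rnorm(30, mean = 300, sd = 10))    # hypothetical baseline scores
after  <- before + round(rnorm(30, mean = 5, sd = 5))

paired_res <- t.test(after, before, paired = TRUE)  # paired t-test
diff_res   <- t.test(after - before, mu = 0)        # one-sample test on differences

print(paired_res$statistic)
print(diff_res$statistic)
print(paired_res$conf.int)
```

Both calls return the same t statistic, degrees of freedom, p-value, and confidence interval, which is why the output reports a single "mean difference" estimate.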
# Visualize the differences
# melt() is assumed to come from reshape2 and ggpaired() from ggpubr
df <- data.frame(
  ID = 1:30,
  Before = before,
  After = after
)
df_long <- melt(df, id.vars = "ID")
ggplot(df_long, aes(x = variable, y = value, group = ID)) +
  geom_point(aes(color = variable), size = 3) +
  geom_line() +
  labs(title = "Paired Samples (Before vs. After)",
       y = "Values", x = "Condition") +
  theme_minimal()
ggpaired(df_long,
         x = "variable",
         y = "value",
         color = "variable",
         line.color = "gray",
         line.size = 0.4,
         palette = "jco") +
  stat_compare_means(paired = TRUE, method = "t.test")