data(swiss)
str(swiss)
## 'data.frame': 47 obs. of 6 variables:
## $ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
## $ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
## $ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
## $ Education : int 12 9 5 7 15 7 7 8 7 13 ...
## $ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
## $ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
head(swiss)
## Fertility Agriculture Examination Education Catholic
## Courtelary 80.2 17.0 15 12 9.96
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Infant.Mortality
## Courtelary 22.2
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
The unit of observation is the province; the sample consists of 47 French-speaking Swiss provinces, observed around 1888.
# Check for missing values
print(colSums(is.na(swiss)))
## Fertility Agriculture Examination Education
## 0 0 0 0
## Catholic Infant.Mortality
## 0 0
# Descriptive statistics
summary(swiss)
## Fertility Agriculture Examination Education
## Min. :35.00 Min. : 1.20 Min. : 3.00 Min. : 1.00
## 1st Qu.:64.70 1st Qu.:35.90 1st Qu.:12.00 1st Qu.: 6.00
## Median :70.40 Median :54.10 Median :16.00 Median : 8.00
## Mean :70.14 Mean :50.66 Mean :16.49 Mean :10.98
## 3rd Qu.:78.45 3rd Qu.:67.65 3rd Qu.:22.00 3rd Qu.:12.00
## Max. :92.50 Max. :89.70 Max. :37.00 Max. :53.00
## Catholic Infant.Mortality
## Min. : 2.150 Min. :10.80
## 1st Qu.: 5.195 1st Qu.:18.15
## Median : 15.140 Median :20.00
## Mean : 41.144 Mean :19.94
## 3rd Qu.: 93.125 3rd Qu.:21.70
## Max. :100.000 Max. :26.60
# Calculate additional statistics
library(psych)
describe(swiss)
## vars n mean sd median trimmed mad min max range
## Fertility 1 47 70.14 12.49 70.40 70.66 10.23 35.00 92.5 57.50
## Agriculture 2 47 50.66 22.71 54.10 51.16 23.87 1.20 89.7 88.50
## Examination 3 47 16.49 7.98 16.00 16.08 7.41 3.00 37.0 34.00
## Education 4 47 10.98 9.62 8.00 9.38 5.93 1.00 53.0 52.00
## Catholic 5 47 41.14 41.70 15.14 39.12 18.65 2.15 100.0 97.85
## Infant.Mortality 6 47 19.94 2.91 20.00 19.98 2.82 10.80 26.6 15.80
## skew kurtosis se
## Fertility -0.46 0.26 1.82
## Agriculture -0.32 -0.89 3.31
## Examination 0.45 -0.14 1.16
## Education 2.27 6.14 1.40
## Catholic 0.48 -1.67 6.08
## Infant.Mortality -0.33 0.78 0.42
# Visualization
par(mfrow = c(2, 3), bg = "ivory1")
colour <- c("lightblue", "forestgreen", "yellow", "palevioletred", "plum1", "orange", "khaki")
# One histogram per variable, each filled with its own colour
for (i in 1:6) {
  hist(swiss[, i],
       main = names(swiss)[i],
       xlab = names(swiss)[i],
       col = colour[i])
}
No variable has missing values.
Fertility:
Range: 35.00 to 92.50
Median: 70.40
Mean: 70.14
The middle 50% of values fall between 64.70 (1st quartile) and 78.45 (3rd quartile); mean and median are nearly equal and the distribution is roughly symmetric (skew -0.46)
Agriculture:
Range: 1.20% to 89.70%
Median: 54.10%
Mean: 50.66%
The middle 50% of values fall between 35.90% and 67.65%; the distribution is broad and fairly flat (sd 22.71, kurtosis -0.89)
Examination:
Range: 3.00% to 37.00%
Median: 16.00%
Mean: 16.49%
The middle 50% of values fall between 12.00% and 22.00%
Education:
Range: 1.00% to 53.00%
Median: 8.00%
Mean: 10.98%
The middle 50% of values fall between 6.00% and 12.00%
The maximum (53.00%) is far above the median (8.00%), and the skew of 2.27 indicates a strongly right-skewed distribution with high outliers
Catholic:
Range: 2.15% to 100.00%
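Median: 15.14%
Mean: 41.14%
The middle 50% of values span 5.195% to 93.125%; the large gap between mean and median and the kurtosis of -1.67 are consistent with a bimodal distribution, with most provinces having either a very low or a very high Catholic share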
Infant.Mortality:
Range: 10.80 to 26.60
Median: 20.00
Mean: 19.94
The middle 50% of values fall between 18.15 and 21.70
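The outlier remark for Education can be checked numerically. The following is a minimal sketch, assuming the common 1.5 × IQR boxplot rule (a convention chosen here for illustration, not something the analysis above specifies). The names `edu`, `q`, `upper_fence` and `edu_out` are illustrative. With the quartiles reported by summary() (6 and 12), the upper fence works out to 12 + 1.5 × 6 = 21.

```r
# Flag high values of Education with the 1.5 * IQR rule (assumed convention).
edu <- datasets::swiss$Education

q <- quantile(edu, c(0.25, 0.75))        # same quartiles as summary()
iqr <- unname(q[2] - q[1])
upper_fence <- unname(q[2]) + 1.5 * iqr

# Keep the values above the upper fence, labelled by province name
is_high <- edu > upper_fence
edu_out <- edu[is_high]
names(edu_out) <- rownames(datasets::swiss)[is_high]

print(upper_fence)
print(sort(edu_out, decreasing = TRUE))
```

Any province listed by the last line lies above the fence and would be drawn as an individual point in a boxplot of Education.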
RQ: Is there a significant difference in fertility rates between provinces that are predominantly Catholic and those that are predominantly Protestant?
H0: Mean fertility is the same in predominantly Catholic and predominantly Protestant provinces. μCatholic = μProtestant
HA: Mean fertility differs between predominantly Catholic and predominantly Protestant provinces. μCatholic ≠ μProtestant
# Create groups based on Catholic percentage
swiss$religious_group <- ifelse(swiss$Catholic > 50, "Predominantly Catholic", "Predominantly Protestant")
swiss$religious_group <- factor(swiss$religious_group)
# Explore the groups
table(swiss$religious_group)
##
## Predominantly Catholic Predominantly Protestant
## 18 29
by(swiss$Fertility, swiss$religious_group, summary)
## swiss$religious_group: Predominantly Catholic
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 42.80 71.75 79.35 76.46 83.62 92.50
## ------------------------------------------------------------
## swiss$religious_group: Predominantly Protestant
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35.00 61.70 65.70 66.22 72.00 85.80
# Visualize
par(bg="ivory1")
boxplot(Fertility ~ religious_group, data = swiss,
        main = "Fertility by Religious Majority",
        ylab = "Fertility",
        xlab = "Religious Group",
        col = colour[1:2])
# Check assumptions for the t-test
# 1. Normality within groups
shapiro.test(swiss$Fertility[swiss$religious_group == "Predominantly Catholic"])
##
## Shapiro-Wilk normality test
##
## data: swiss$Fertility[swiss$religious_group == "Predominantly Catholic"]
## W = 0.8576, p-value = 0.01118
shapiro.test(swiss$Fertility[swiss$religious_group == "Predominantly Protestant"])
##
## Shapiro-Wilk normality test
##
## data: swiss$Fertility[swiss$religious_group == "Predominantly Protestant"]
## W = 0.94021, p-value = 0.1015
# 2. Homogeneity of variances
var.test(Fertility ~ religious_group, data = swiss)
##
## F test to compare two variances
##
## data: Fertility by religious_group
## F = 2.18, num df = 17, denom df = 28, p-value = 0.06538
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.9509789 5.4908530
## sample estimates:
## ratio of variances
## 2.179993
# Welch two-sample t-test (the t.test() default, which does not assume equal variances)
t_test_result <- t.test(Fertility ~ religious_group, data = swiss)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: Fertility by religious_group
## t = 2.7004, df = 26.742, p-value = 0.01186
## alternative hypothesis: true difference in means between group Predominantly Catholic and group Predominantly Protestant is not equal to 0
## 95 percent confidence interval:
## 2.455904 18.024939
## sample estimates:
## mean in group Predominantly Catholic mean in group Predominantly Protestant
## 76.46111 66.22069
# Non-parametric test (Wilcoxon rank-sum test / Mann-Whitney U test)
wilcox_test_result <- wilcox.test(Fertility ~ religious_group, data = swiss)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
print(wilcox_test_result)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Fertility by religious_group
## W = 409.5, p-value = 0.0012
## alternative hypothesis: true location shift is not equal to 0
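Variance Homogeneity (F-test):
F = 2.18, p-value = 0.06538
The p-value is just above the conventional 0.05 significance level, so the hypothesis of equal variances is not rejected
The 95% confidence interval for the variance ratio (0.95 to 5.49) includes 1 but extends well above it, so unequal variances cannot be ruled out; the Welch t-test used here does not assume equal variances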
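The F statistic reported by var.test() is the ratio of the two sample variances, and the F test is known to be sensitive to non-normality. As a hedged cross-check, the sketch below recomputes the ratio by hand and also runs the rank-based Fligner-Killeen test (`fligner.test()` from the stats package), which is not part of the original analysis. `d` is a local copy of the data so the `swiss` object in the session is left untouched; `group_var`, `var_ratio` and `fk` are illustrative names.

```r
# Local copy with the same grouping rule as above (more than 50% Catholic)
d <- datasets::swiss
d$religious_group <- factor(ifelse(d$Catholic > 50,
                                   "Predominantly Catholic",
                                   "Predominantly Protestant"))

# Sample variance of Fertility within each group
group_var <- tapply(d$Fertility, d$religious_group, var)

# var.test() divides the first factor level by the second
var_ratio <- unname(group_var["Predominantly Catholic"] /
                    group_var["Predominantly Protestant"])
print(var_ratio)

# Rank-based check that is less sensitive to departures from normality
# (an added alternative, not used in the analysis above)
fk <- fligner.test(Fertility ~ religious_group, data = d)
print(fk)
```

If the two tests disagree, the rank-based result is usually the more trustworthy one when normality is in doubt.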
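Normality:
Shapiro-Wilk rejects normality for the predominantly Catholic group (W = 0.8576, p-value = 0.01118) but not for the predominantly Protestant group (W = 0.94021, p-value = 0.1015)
The boxplot shows low outliers in both groups, which is consistent with departures from normality
The predominantly Catholic group has a notable low outlier in the low-to-mid 40s (group minimum 42.80)
The predominantly Protestant group has one at 35.00 (the group minimum)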
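The Shapiro-Wilk tests run earlier can be collected per group and paired with normal Q-Q plots, which show where any departure from normality occurs. This is a minimal sketch rather than part of the original analysis; `d`, `sw_p` and `old_par` are illustrative names, and `d` is a local copy of the data.

```r
# Local copy with the same grouping rule as above (more than 50% Catholic)
d <- datasets::swiss
d$religious_group <- factor(ifelse(d$Catholic > 50,
                                   "Predominantly Catholic",
                                   "Predominantly Protestant"))

# Shapiro-Wilk p-value for Fertility within each group
sw_p <- tapply(d$Fertility, d$religious_group,
               function(x) shapiro.test(x)$p.value)
print(round(sw_p, 4))

# One normal Q-Q plot per group; points far from the line mark departures
old_par <- par(mfrow = c(1, 2))
for (g in levels(d$religious_group)) {
  x <- d$Fertility[d$religious_group == g]
  qqnorm(x, main = g)
  qqline(x)
}
par(old_par)  # restore the previous plotting layout
```

Low outliers such as those seen in the boxplot would appear as points falling below the reference line at the left end of each plot.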
Parametric Test Results (Welch two-sample t-test)
t = 2.7004, df = 26.742, p-value = 0.01186
The p-value is below 0.05, so the difference in means is statistically significant at the 5% level
Mean fertility in predominantly Catholic provinces (76.46) is higher than in predominantly Protestant provinces (66.22); the 95% confidence interval for the difference is 2.46 to 18.02
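To make the reported t and df less of a black box, the Welch statistic and the Welch-Satterthwaite degrees of freedom can be recomputed from the group means, variances and sizes. The sketch below does this by hand. The variable names (`x`, `y`, `se2_x`, `se2_y`, `t_manual`, `df_manual`, `p_manual`) are illustrative, and `d` is a local copy of the data.

```r
# Local copy with the same grouping rule as above (more than 50% Catholic)
d <- datasets::swiss
d$religious_group <- factor(ifelse(d$Catholic > 50,
                                   "Predominantly Catholic",
                                   "Predominantly Protestant"))

x <- d$Fertility[d$religious_group == "Predominantly Catholic"]
y <- d$Fertility[d$religious_group == "Predominantly Protestant"]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

print(c(t = t_manual, df = df_manual, p = p_manual))
```

The three values should agree with the t.test() output above up to rounding. The non-integer df is a consequence of the Welch correction for unequal variances.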
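Non-parametric Test (Wilcoxon rank sum test)
W = 409.5, p-value = 0.0012
The p-value is considerably smaller than that of the t-test and well below 0.05; because of ties it comes from the normal approximation with continuity correction rather than an exact calculation
This supports the alternative hypothesis of a location shift between the two groups, and, unlike the t-test, the result does not rely on normality within groups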
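The test above reports only W and a p-value. As a hedged extension, `wilcox.test()` can also return an estimate of the location shift with a confidence interval via `conf.int = TRUE`, and setting `exact = FALSE` requests the normal approximation explicitly, which avoids the ties warning. This is a sketch rather than part of the original analysis; `d` and `wt` are illustrative names.

```r
# Local copy with the same grouping rule as above (more than 50% Catholic)
d <- datasets::swiss
d$religious_group <- factor(ifelse(d$Catholic > 50,
                                   "Predominantly Catholic",
                                   "Predominantly Protestant"))

# exact = FALSE: use the normal approximation (ties rule out an exact p-value)
# conf.int = TRUE: also estimate the location shift and its confidence interval
wt <- wilcox.test(Fertility ~ religious_group, data = d,
                  exact = FALSE, conf.int = TRUE)

print(wt$statistic)  # same W as reported above
print(wt$p.value)
print(wt$estimate)   # estimated shift, Catholic minus Protestant
print(wt$conf.int)
```

The estimated shift gives the size of the difference on the fertility scale, which complements the difference in means reported by the t-test.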
Based on the statistical analysis, the research question, “Is there a significant difference in fertility rates between predominantly Catholic and predominantly Protestant provinces in Switzerland?”, can be answered as follows:
Yes. Both the Welch t-test (p-value = 0.01186) and the Wilcoxon rank sum test (p-value = 0.0012) reject H0 at the 5% level. Mean fertility is about 10.24 units higher in predominantly Catholic provinces (76.46) than in predominantly Protestant provinces (66.22).