Loading Libraries
library(readxl)
library(ggplot2)
library(rcompanion)
RQ: Is there an association between student type (domestic or
international) and pet ownership?
Import dataset
DatasetB2 <- read_excel("DatasetB2.xlsx")
Reviewing the data and dataset structure
head(DatasetB2)
## # A tibble: 6 × 3
## StudentID StudentType PetOwnership
## <dbl> <chr> <chr>
## 1 1 Domestic No
## 2 2 Domestic No
## 3 3 Domestic No
## 4 4 International Yes
## 5 5 Domestic No
## 6 6 International No
str(DatasetB2)
## tibble [100 × 3] (S3: tbl_df/tbl/data.frame)
## $ StudentID : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
## $ StudentType : chr [1:100] "Domestic" "Domestic" "Domestic" "International" ...
## $ PetOwnership: chr [1:100] "No" "No" "No" "Yes" ...
The dataset contains 100 observations and 3 variables:
StudentID: Numerical identifier for each student
StudentType: Categorical variable (Domestic or International)
PetOwnership: Categorical variable (Yes or No)
Create contingency table
contingency_table <- table(DatasetB2$StudentType, DatasetB2$PetOwnership)
print("Contingency Table:")
## [1] "Contingency Table:"
print(contingency_table)
##
## No Yes
## Domestic 27 25
## International 23 25
Add clear labels
colnames(contingency_table) <- c("No Pet", "Owns Pet")
rownames(contingency_table) <- c("Domestic", "International")
Adding clear labels makes the table easier to read and
interpret
Calculate percentages
row_percentages <- prop.table(contingency_table, 1) * 100
print("Row Percentages (by Student Type):")
## [1] "Row Percentages (by Student Type):"
print(round(row_percentages, 1))
##
## No Pet Owns Pet
## Domestic 51.9 48.1
## International 47.9 52.1
Domestic students: 51.9% do not own pets, 48.1% own pets
International students: 47.9% do not own pets, 52.1% own pets
This shows international students have a slightly higher pet
ownership rate (52.1% vs 48.1%)
Create grouped bar chart
ggplot(DatasetB2, aes(x = StudentType, fill = PetOwnership)) +
geom_bar(position = "dodge") +
labs(
x = "Student Type",
y = "Number of Students",
title = "Pet Ownership by Student Type",
fill = "Pet Ownership"
) +
scale_fill_manual(values = c("steelblue", "coral"),
labels = c("No Pet", "Owns Pet")) +
theme_minimal() +
theme(
text = element_text(size = 14),
axis.title = element_text(size = 14),
axis.text = element_text(size = 14),
plot.title = element_text(size = 14, face = "bold")
) +
geom_text(stat = 'count', aes(label = after_stat(count), group = PetOwnership),
position = position_dodge(width = 0.9))

Conduct Chi-Square Test of Independence
chi_result_b2 <- chisq.test(contingency_table)
print("Chi-Square Test of Independence Results:")
## [1] "Chi-Square Test of Independence Results:"
print(chi_result_b2)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 0.040064, df = 1, p-value = 0.8414
X-squared = 0.04 (with Yates' continuity correction)
df = 1
p-value = 0.8414
Statistical Significance: p > .05 → The result is NOT
statistically significant
This means we fail to reject the null hypothesis
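R applies Yates' continuity correction to 2 × 2 tables by default, which is why the reported statistic is so small. As a minimal sketch, the counts from the contingency table can be re-entered by hand to compare the corrected and uncorrected tests. The object names observed, with_yates and without_yates are illustrative and not part of the original script.

```r
# Observed counts copied from the contingency table above
observed <- matrix(c(27, 25,
                     23, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Domestic", "International"),
                                   c("No Pet", "Owns Pet")))

# correct = TRUE is the default for 2 x 2 tables (Yates' correction)
with_yates <- chisq.test(observed)
# correct = FALSE gives the plain Pearson chi-square statistic
without_yates <- chisq.test(observed, correct = FALSE)

# Side-by-side comparison of statistic and p-value
comparison <- data.frame(
  test = c("Yates corrected", "Uncorrected"),
  statistic = c(unname(with_yates$statistic), unname(without_yates$statistic)),
  p_value = c(with_yates$p.value, without_yates$p.value)
)
print(comparison)
```

Both versions give a p-value well above .05, so the decision to fail to reject the null hypothesis does not depend on the correction.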
Check expected frequencies
print("Expected Frequencies:")
## [1] "Expected Frequencies:"
print(chi_result_b2$expected)
##
## No Pet Owns Pet
## Domestic 26 26
## International 24 24
Expected frequencies (if no association between variables):
Domestic No Pet: 26
Domestic Owns Pet: 26
International No Pet: 24
International Owns Pet: 24
All expected frequencies are > 5, which satisfies the assumption
for chi-square test validity
Our observed frequencies are very close to these expected
values
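Each expected frequency is (row total × column total) / grand total. The sketch below reproduces that calculation in base R from the observed counts. The names observed, row_totals, col_totals and expected are illustrative and not part of the original script.

```r
# Observed counts copied from the contingency table above
observed <- matrix(c(27, 25,
                     23, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Domestic", "International"),
                                   c("No Pet", "Owns Pet")))

row_totals <- rowSums(observed)   # 52 domestic, 48 international
col_totals <- colSums(observed)   # 50 without pets, 50 with pets
n <- sum(observed)                # 100 students in total

# Expected count for each cell under independence
expected <- outer(row_totals, col_totals) / n
print(expected)

# Assumption check: every expected count should exceed 5
all_above_five <- all(expected > 5)
print(all_above_five)
```

For example, the Domestic / No Pet cell is 52 × 50 / 100 = 26, which matches the value returned by chi_result_b2$expected.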
Calculate effect size (Cramer’s V) if significant
Since p = 0.8414 > 0.05, we do NOT calculate or report effect
size
Effect size is only calculated when results are statistically
significant
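For reference, a minimal sketch of how Cramer's V could be computed if the result had been significant. It rests on several assumptions: the uncorrected chi-square statistic is used, the cut-offs 0.10 / 0.30 / 0.50 are the conventional benchmarks for a 2 × 2 table rather than anything stated in this analysis, and the names effect_size_b2 and effect_interpretation are illustrative. The cramerV() function from the already loaded rcompanion package is an alternative to the manual formula.

```r
# Observed counts copied from the contingency table above
observed <- matrix(c(27, 25,
                     23, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Domestic", "International"),
                                   c("No Pet", "Owns Pet")))

# Assumption: Cramer's V is based on the uncorrected chi-square statistic
chi_uncorrected <- unname(chisq.test(observed, correct = FALSE)$statistic)
n <- sum(observed)
k <- min(dim(observed)) - 1   # smaller of (rows - 1) and (columns - 1)

# V = sqrt(chi-square / (n * k))
effect_size_b2 <- sqrt(chi_uncorrected / (n * k))

# Assumed benchmark labels (conventional cut-offs, not from the original script)
effect_interpretation <- if (effect_size_b2 < 0.10) {
  "negligible"
} else if (effect_size_b2 < 0.30) {
  "small"
} else if (effect_size_b2 < 0.50) {
  "medium"
} else {
  "large"
}

cat("Cramer's V = ", round(effect_size_b2, 2),
    " (", effect_interpretation, ")\n", sep = "")
```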
Create summary table
summary_table <- as.data.frame.matrix(contingency_table)
summary_table$Total <- rowSums(summary_table)
summary_table$`% Own Pets` <- round((summary_table$`Owns Pet` / summary_table$Total) * 100, 1)
print("Summary by Student Type:")
## [1] "Summary by Student Type:"
print(summary_table)
## No Pet Owns Pet Total % Own Pets
## Domestic 27 25 52 48.1
## International 23 25 48 52.1
Domestic: 52 total students, 25 own pets (48.1%)
International: 48 total students, 25 own pets (52.1%)
The 4 percentage point difference in pet ownership between groups is not
statistically significant
This small difference could be due to random chance
FINAL INTERPRETATION
print("FINDINGS FOR SCENARIO B2: A chi-square test of independence was conducted to examine the association between student type (domestic vs. international) and pet ownership.")
## [1] "FINDINGS FOR SCENARIO B2: A chi-square test of independence was conducted to examine the association between student type (domestic vs. international) and pet ownership."
if(chi_result_b2$p.value < 0.05) {
print("The results indicated that there was a significant association between the two variables")
} else {
print("The results indicated that there was NOT a significant association between the two variables")
}
## [1] "The results indicated that there was NOT a significant association between the two variables"
cat(", χ²(", chi_result_b2$parameter, ") = ", round(chi_result_b2$statistic, 2),
", p = ", round(chi_result_b2$p.value, 3), ". ", sep="")
## , χ²(1) = 0.04, p = 0.841.
if(chi_result_b2$p.value < 0.05) {
# Effect size is computed here, only when the test is significant
cat("Cramer's V = ", round(cramerV(contingency_table), 2), ".", sep="")
}
Null Hypothesis: There is no association between student type and
pet ownership
Alternative Hypothesis: There is an association between student type
and pet ownership
Test Used: Chi-Square Test of Independence
Results: χ²(1) = 0.04, p = .841
Decision: Fail to reject the null hypothesis
Conclusion: There is no significant association between student type
and pet ownership.
Domestic and international students show similar patterns of pet
ownership, with approximately half of each group owning pets. The slight
difference (48.1% of domestic students own pets vs. 52.1% of international
students) is not statistically significant and could be due to random
sampling variation.