Loading Libraries

library(readxl)
library(ggplot2)
library(rcompanion)

RQ: Is there an association between student type (domestic or international) and pet ownership?

Import dataset

DatasetB2 <- read_excel("DatasetB2.xlsx")

Reviewing the data and dataset structure

head(DatasetB2)
## # A tibble: 6 × 3
##   StudentID StudentType   PetOwnership
##       <dbl> <chr>         <chr>       
## 1         1 Domestic      No          
## 2         2 Domestic      No          
## 3         3 Domestic      No          
## 4         4 International Yes         
## 5         5 Domestic      No          
## 6         6 International No
str(DatasetB2)
## tibble [100 × 3] (S3: tbl_df/tbl/data.frame)
##  $ StudentID   : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
##  $ StudentType : chr [1:100] "Domestic" "Domestic" "Domestic" "International" ...
##  $ PetOwnership: chr [1:100] "No" "No" "No" "Yes" ...

The dataset contains 100 observations and 3 variables:

StudentID: Numerical identifier for each student

StudentType: Categorical variable (Domestic or International)

PetOwnership: Categorical variable (Yes or No)
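Because both categorical variables are stored as plain character strings, a stray spelling such as "yes" or "Intl" would silently become an extra category in the contingency table. The sketch below converts them to factors with explicitly declared levels, so any unexpected value turns into NA and can be counted. It assumes the only valid values are the ones `str()` shows ("Domestic"/"International", "No"/"Yes"). The data frame is a hypothetical stand-in rebuilt from the counts in the contingency table so the snippet runs without `DatasetB2.xlsx`; its row order is not the original.

```r
# Hypothetical stand-in for DatasetB2, rebuilt from the counts in the
# contingency table (27, 25, 23, 25). Row order is NOT the original order.
DatasetB2 <- data.frame(
  StudentID    = 1:100,
  StudentType  = rep(c("Domestic", "International"), times = c(52, 48)),
  PetOwnership = c(rep(c("No", "Yes"), times = c(27, 25)),
                   rep(c("No", "Yes"), times = c(23, 25)))
)

# Declare the expected categories explicitly. Any value outside these levels
# becomes NA instead of a new category.
DatasetB2$StudentType  <- factor(DatasetB2$StudentType,
                                 levels = c("Domestic", "International"))
DatasetB2$PetOwnership <- factor(DatasetB2$PetOwnership,
                                 levels = c("No", "Yes"))

# Number of values that are NA after conversion (0 means every value matched)
n_unmatched <- sum(is.na(DatasetB2$StudentType)) +
  sum(is.na(DatasetB2$PetOwnership))
print(n_unmatched)

# Counts per level, in the declared order
print(table(DatasetB2$StudentType))
print(table(DatasetB2$PetOwnership))
```

Declaring the levels also fixes the column order of any later `table()` call as No, Yes, which is the order the relabelling step below relies on.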

Create contingency table

contingency_table <- table(DatasetB2$StudentType, DatasetB2$PetOwnership)
print("Contingency Table:")
## [1] "Contingency Table:"
print(contingency_table)
##                
##                 No Yes
##   Domestic      27  25
##   International 23  25

Add clear labels

colnames(contingency_table) <- c("No Pet", "Owns Pet")
rownames(contingency_table) <- c("Domestic", "International")

Adding clear labels makes the table easier to read and interpret
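The table does not show the group totals that the percentages and expected frequencies later depend on. A minimal base-R sketch using `addmargins()`, which appends a "Sum" row and column; the table is rebuilt here from the printed counts so the snippet runs on its own.

```r
# Labelled contingency table, rebuilt from the printed counts
contingency_table <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(c("Domestic", "International"), c("No Pet", "Owns Pet"))
))

# addmargins() adds row and column totals, labelled "Sum"
with_totals <- addmargins(contingency_table)
print(with_totals)
```

The margins show 52 domestic and 48 international students, 50 in each pet-ownership group, and 100 students overall.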

Calculate percentages

row_percentages <- prop.table(contingency_table, 1) * 100
print("Row Percentages (by Student Type):")
## [1] "Row Percentages (by Student Type):"
print(round(row_percentages, 1))
##                
##                 No Pet Owns Pet
##   Domestic        51.9     48.1
##   International   47.9     52.1

Domestic students: 51.9% do not own pets, 48.1% own pets

International students: 47.9% do not own pets, 52.1% own pets

International students therefore have a slightly higher pet ownership rate (52.1% vs 48.1%, a difference of about 4 percentage points)
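Row percentages divide each cell by its own row total, so the two groups are compared on the same scale even though they differ in size (for example, 25 / 52 = 48.1% and 25 / 48 = 52.1%). The sketch below does that division explicitly with `sweep()`, which is what `prop.table(x, 1)` computes, and then takes the gap in percentage points. The table is rebuilt from the printed counts.

```r
contingency_table <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(c("Domestic", "International"), c("No Pet", "Owns Pet"))
))

# Divide every cell by its row total (52 and 48), then scale to percent
row_totals <- rowSums(contingency_table)
row_pct <- sweep(contingency_table, 1, row_totals, "/") * 100
print(round(row_pct, 1))

# Gap in ownership rates, in percentage points (International minus Domestic)
diff_pp <- row_pct["International", "Owns Pet"] - row_pct["Domestic", "Owns Pet"]
print(round(diff_pp, 1))
```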

Create grouped bar chart

ggplot(DatasetB2, aes(x = StudentType, fill = PetOwnership)) +
  geom_bar(position = "dodge") +
  labs(
    x = "Student Type",
    y = "Number of Students",
    title = "Pet Ownership by Student Type",
    fill = "Pet Ownership"
  ) +
  scale_fill_manual(values = c("steelblue", "coral"),
                    labels = c("No Pet", "Owns Pet")) +
  theme_minimal() +
  theme(
    text = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    plot.title = element_text(size = 14, face = "bold")
  ) +
  # Print each bar's count just above the bar (negative vjust lifts the label)
  geom_text(stat = 'count', aes(label = after_stat(count), group = PetOwnership), 
            position = position_dodge(width = 0.9), vjust = -0.3)

Conduct Chi-Square Test of Independence

chi_result_b2 <- chisq.test(contingency_table)
print("Chi-Square Test of Independence Results:")
## [1] "Chi-Square Test of Independence Results:"
print(chi_result_b2)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contingency_table
## X-squared = 0.040064, df = 1, p-value = 0.8414

X-squared = 0.04 (with Yates' continuity correction, which chisq.test() applies by default to 2x2 tables)

df = 1

p-value = 0.8414

Statistical Significance: p > .05 → The result is NOT statistically significant

This means we fail to reject the null hypothesis
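To see where the printed statistic comes from, the sketch below recomputes it from the observed and expected counts. It applies Yates' continuity correction the way `chisq.test()` does for a 2x2 table and should reproduce the printed X-squared = 0.040064 and p-value = 0.8414; the uncorrected statistic is shown alongside for comparison. The table is rebuilt from the printed counts.

```r
contingency_table <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(c("Domestic", "International"), c("No Pet", "Owns Pet"))
))

observed <- unclass(contingency_table)
expected <- chisq.test(contingency_table)$expected

# Yates' correction shrinks each |O - E| by 0.5 before squaring.
# chisq.test() caps the shrinkage at min(0.5, |O - E|); here every |O - E| = 1.
yates    <- min(0.5, abs(observed - expected))
x2_yates <- sum((abs(observed - expected) - yates)^2 / expected)
# df = (rows - 1) * (columns - 1) = 1 for a 2x2 table
p_yates  <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

# The same statistic without the correction, for comparison
x2_plain <- sum((observed - expected)^2 / expected)
p_plain  <- pchisq(x2_plain, df = 1, lower.tail = FALSE)

print(round(c(x2_yates = x2_yates, p_yates = p_yates,
              x2_plain = x2_plain, p_plain = p_plain), 4))
```

Without the correction the statistic is larger and the p-value smaller, but it remains well above .05, so the decision to retain the null hypothesis does not depend on this choice.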

Check expected frequencies

print("Expected Frequencies:")
## [1] "Expected Frequencies:"
print(chi_result_b2$expected)
##                
##                 No Pet Owns Pet
##   Domestic          26       26
##   International     24       24

Expected frequencies (if there were no association between the variables):

Domestic No Pet: 26

Domestic Owns Pet: 26

International No Pet: 24

International Owns Pet: 24

All expected frequencies are > 5, which satisfies the assumption for chi-square test validity

Each observed count differs from its expected count by only 1 (for example, 27 observed vs 26 expected for domestic students without pets)
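Each expected frequency is the count a cell would hold if pet ownership were independent of student type: its row total times its column total, divided by the sample size (for domestic students without pets, 52 × 50 / 100 = 26). A minimal sketch with `outer()` that should match the `expected` component printed above, followed by the rule-of-thumb check that every expected count is at least 5. The table is rebuilt from the printed counts.

```r
contingency_table <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(c("Domestic", "International"), c("No Pet", "Owns Pet"))
))

n <- sum(contingency_table)   # 100 students

# E_ij = (row i total) * (column j total) / n
expected <- outer(rowSums(contingency_table), colSums(contingency_table)) / n
print(expected)

# Rule of thumb for the chi-square approximation: all expected counts >= 5
print(all(expected >= 5))

# Observed minus expected: how far each cell departs from independence
print(contingency_table - expected)
```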

Calculate effect size (Cramer’s V) if significant

Since p = 0.8414 > 0.05, we do NOT calculate or report effect size

Effect size is only calculated when results are statistically significant
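For reference only, had the test been significant, Cramér's V would rescale the chi-square statistic by the sample size n and by the smaller of (rows - 1) and (columns - 1), giving a value between 0 and 1. This is the standard definition; whether the continuity-corrected or uncorrected statistic is plugged in is worth checking in whichever function is used.

```latex
V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\; c - 1)}}
\qquad \text{so for a } 2 \times 2 \text{ table:} \qquad
V = \sqrt{\frac{\chi^2}{n}}
```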

Create summary table

summary_table <- as.data.frame.matrix(contingency_table)
summary_table$Total <- rowSums(summary_table)
summary_table$`% Own Pets` <- round((summary_table$`Owns Pet` / summary_table$Total) * 100, 1)
print("Summary by Student Type:")
## [1] "Summary by Student Type:"
print(summary_table)
##               No Pet Owns Pet Total % Own Pets
## Domestic          27       25    52       48.1
## International     23       25    48       52.1

Domestic: 52 total students, 25 own pets (48.1%)

International: 48 total students, 25 own pets (52.1%)

The 4-percentage-point difference in pet ownership between the groups (52.1% vs 48.1%) is not statistically significant

This small difference could be due to random chance
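One way to put a range on that 4-point difference is a two-sample test of proportions. This is a swapped-in technique, not part of the original analysis: for two groups, base R's `prop.test()` is equivalent to the Yates-corrected chi-square test above (same X-squared and p-value), but it also reports a 95% confidence interval for the difference in ownership rates. The counts below come from the summary table.

```r
# Pet owners and group sizes, taken from the summary table
owners <- c(Domestic = 25, International = 25)
totals <- c(Domestic = 52, International = 48)

# Yates' continuity correction is on by default, as in chisq.test()
pt <- prop.test(x = owners, n = totals)

print(pt$estimate)    # ownership proportion in each group
print(pt$conf.int)    # 95% CI for Domestic minus International
print(c(statistic = unname(pt$statistic), p.value = pt$p.value))
```

If the interval includes zero, the data are consistent with no difference in ownership rates, which matches the non-significant chi-square result.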

FINAL INTERPRETATION

print("FINDINGS FOR SCENARIO B2: A chi-square test of independence was conducted to examine the association between student type (domestic vs. international) and pet ownership.")
## [1] "FINDINGS FOR SCENARIO B2: A chi-square test of independence was conducted to examine the association between student type (domestic vs. international) and pet ownership."
if(chi_result_b2$p.value < 0.05) {
  print("The results indicated that there was a significant association between the two variables")
} else {
  print("The results indicated that there was NOT a significant association between the two variables")
}
## [1] "The results indicated that there was NOT a significant association between the two variables"
cat(", χ²(", chi_result_b2$parameter, ") = ", round(chi_result_b2$statistic, 2), 
    ", p = ", round(chi_result_b2$p.value, 3), ". ", sep="")
## , χ²(1) = 0.04, p = 0.841.
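if(chi_result_b2$p.value < 0.05) {
  # Cramer's V from rcompanion (loaded above); only computed when significant
  effect_size_b2 <- cramerV(contingency_table)
  # Commonly used benchmarks for a 2x2 table: 0.1 small, 0.3 medium, 0.5 large
  effect_interpretation <- ifelse(effect_size_b2 < 0.1, "negligible",
                           ifelse(effect_size_b2 < 0.3, "small",
                           ifelse(effect_size_b2 < 0.5, "medium", "large")))
  cat("The association was ", effect_interpretation, " (Cramer's V = ", round(effect_size_b2, 2), ").", sep="")
}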
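The `cat()` call above prints the p-value with a leading zero (p = 0.841), while the write-up below uses the APA form (p = .841), which drops the zero because a p-value cannot exceed 1 and reports very small values as p < .001. A sketch of a hypothetical helper, `format_chisq_apa()`, that builds that string directly from a `chisq.test()` result; the function name and the two-decimal rounding of the statistic are assumptions, not part of the original script.

```r
# Hypothetical helper: format a chisq.test() result as an APA-style string
format_chisq_apa <- function(test) {
  p <- test$p.value
  # Drop the leading zero; report tiny p-values as "< .001"
  p_text <- if (p < 0.001) "p < .001" else
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  # \u03c7\u00b2 is the chi-squared symbol
  sprintf("\u03c7\u00b2(%d) = %.2f, %s",
          as.integer(test$parameter), unname(test$statistic), p_text)
}

chi_result_b2 <- chisq.test(matrix(c(27, 23, 25, 25), nrow = 2))
print(format_chisq_apa(chi_result_b2))
```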

Null Hypothesis: There is no association between student type and pet ownership

Alternative Hypothesis: There is an association between student type and pet ownership

Test Used: Chi-Square Test of Independence

Results: χ²(1) = 0.04, p = .841

Decision: Fail to reject the null hypothesis

Conclusion: There is no significant association between student type and pet ownership.

Domestic and international students show similar patterns of pet ownership, with approximately half of each group owning pets. The slight difference (48.1% of domestic students own pets vs. 52.1% of international students) is not statistically significant and could be due to random sampling variation.