Assignment 05

Loading Libraries

library(readxl)
library(ggplot2)
library(rcompanion)

RQ: Do students prefer tea, coffee, soda, and water equally?

Import dataset

DatasetA2 <- read_excel("DatasetA2.xlsx")

Reviewing the data and dataset structure

head(DatasetA2)

## # A tibble: 6 × 2
##   StudentID FavoriteDrink
##       <dbl> <chr>        
## 1         1 Soda         
## 2         2 Soda         
## 3         3 Soda         
## 4         4 Coffee       
## 5         5 Soda         
## 6         6 Coffee

str(DatasetA2)

## tibble [100 × 2] (S3: tbl_df/tbl/data.frame)
##  $ StudentID    : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
##  $ FavoriteDrink: chr [1:100] "Soda" "Soda" "Soda" "Coffee" ...

The dataset contains 100 observations and 2 variables:

StudentID: Numerical identifier for each student

FavoriteDrink: Categorical variable with four options (Coffee, Soda, Tea, Water)
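Before tabulating, it can help to confirm that the categorical column is clean. The following is a minimal sketch, not part of the original analysis: because DatasetA2.xlsx is not available here, it uses a hypothetical stand-in data frame (`check_data`) rebuilt from the counts reported in this document, so row order will not match the real file.

```r
# Hypothetical stand-in for DatasetA2, rebuilt from the reported counts
# (26 Coffee, 29 Soda, 28 Tea, 17 Water); row order is NOT the real order
drinks <- rep(c("Coffee", "Soda", "Tea", "Water"), times = c(26, 29, 28, 17))
check_data <- data.frame(StudentID = seq_along(drinks), FavoriteDrink = drinks)

# Count missing values in the categorical column
n_missing <- sum(is.na(check_data$FavoriteDrink))

# List any labels outside the four expected categories (e.g. "soda" or "Tea ")
expected_levels <- c("Coffee", "Soda", "Tea", "Water")
unexpected <- setdiff(unique(check_data$FavoriteDrink), expected_levels)

# Confirm each student appears only once
n_duplicates <- sum(duplicated(check_data$StudentID))

print(c(missing = n_missing, unexpected = length(unexpected), duplicates = n_duplicates))
```

If any of these counts were non-zero in the real data, the frequency table would need cleaning before the test is run.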

Creating a frequency table

beverage_table <- table(DatasetA2$FavoriteDrink)
print("Frequency Table for Beverage Preferences : ")

## [1] "Frequency Table for Beverage Preferences : "

print(beverage_table)

## 
## Coffee   Soda    Tea  Water 
##     26     29     28     17

The frequency table shows the count of students who prefer each beverage:

Coffee: 26 students, Soda: 29 students, Tea: 28 students, Water: 17 students

Total: 100 students

Calculate percentages for better understanding

beverage_percentages <- prop.table(beverage_table) * 100
print("Percentage Distribution:")

## [1] "Percentage Distribution:"

print(round(beverage_percentages, 1))

## 
## Coffee   Soda    Tea  Water 
##     26     29     28     17

Percentage breakdown: This shows Soda is the most preferred (29%) and Water is the least preferred (17%)

Creating Bar Graph

ggplot(DatasetA2, aes(x = FavoriteDrink, fill = FavoriteDrink)) +
  geom_bar() +
  labs(
    x = "Beverage Type",
    y = "Number of Students",
    title = "Distribution of Beverage Preferences Among Students"
  ) +
  theme_minimal() +
  theme(
    text = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "none"
  ) +
  # Place each count label just above its bar instead of on the bar's top edge
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)

Conduct Chi-Square Goodness of Fit Test

# Observed frequencies from our data
observed <- as.vector(beverage_table)

# Expected proportions (equal preference = 25% each)
expected_proportions <- c(0.25, 0.25, 0.25, 0.25)

# Run the chi-square test
chi_result_a2 <- chisq.test(x = observed, p = expected_proportions)
print("Chi-Square Test Results:")

## [1] "Chi-Square Test Results:"

print(chi_result_a2)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 3.6, df = 3, p-value = 0.308

X-squared = 3.6

df = 3

p-value = 0.308

Statistical Significance: p > .05 → The result is NOT statistically significant

This means we fail to reject the null hypothesis
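The reported statistic and p-value can be reproduced by hand from the observed counts. This is a sketch for checking the arithmetic; the variable names (`obs`, `exp_equal`, `deg_free`) are illustrative and not from the original script.

```r
# Observed counts as reported in the frequency table
obs <- c(Coffee = 26, Soda = 29, Tea = 28, Water = 17)

# Expected counts under equal preference (25% of the total each)
exp_equal <- sum(obs) * rep(0.25, 4)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((obs - exp_equal)^2 / exp_equal)

# Degrees of freedom: number of categories minus one
deg_free <- length(obs) - 1

# Upper-tail p-value, and the critical value at alpha = .05
p_val <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
crit <- qchisq(0.95, df = deg_free)

print(round(c(chi_sq = chi_sq, df = deg_free, p = p_val, critical = crit), 3))
```

The statistic works out to (1 + 16 + 9 + 64) / 25 = 3.6, which is below the critical value for 3 degrees of freedom, so both routes lead to the same decision.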

Calculate expected counts for reference

total_n <- sum(observed)
expected_counts <- total_n * expected_proportions
print("Expected counts (if preferences were equal):")

## [1] "Expected counts (if preferences were equal):"

print(expected_counts)

## [1] 25 25 25 25

If preferences were perfectly equal, we would expect 25 students for each beverage

Expected counts: Coffee=25, Soda=25, Tea=25, Water=25

Our observed counts differ slightly from these expected values
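The chi-square approximation is usually considered reliable only when every expected count is at least 5. A minimal sketch of that check, using the counts reported above and the `expected` component that `chisq.test()` returns (the names `obs_counts` and `fit` are illustrative):

```r
# Observed counts as reported in the frequency table
obs_counts <- c(Coffee = 26, Soda = 29, Tea = 28, Water = 17)

# Fit the goodness-of-fit test under equal proportions
fit <- chisq.test(obs_counts, p = rep(0.25, 4))

# Expected counts computed by chisq.test()
print(fit$expected)

# TRUE when every expected count meets the common "at least 5" rule of thumb
all_large_enough <- all(fit$expected >= 5)
print(all_large_enough)
```

With 25 expected per category, the rule of thumb is comfortably met here.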

Create comparison table

comparison <- data.frame(
  Beverage = names(beverage_table),
  Observed = observed,
  Expected = expected_counts,
  Difference = observed - expected_counts
)
print("Observed vs Expected Comparison:")

## [1] "Observed vs Expected Comparison:"

print(comparison)

##   Beverage Observed Expected Difference
## 1   Coffee       26       25          1
## 2     Soda       29       25          4
## 3      Tea       28       25          3
## 4    Water       17       25         -8

Water shows the largest deviation from what was expected
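To see how much each beverage contributes to the overall statistic, the Pearson residuals from `chisq.test()` can be inspected. This is an illustrative sketch using the reported counts; `gof`, `pearson_res`, and `contribution` are names introduced here.

```r
# Observed counts as reported in the frequency table
obs_counts <- c(Coffee = 26, Soda = 29, Tea = 28, Water = 17)
gof <- chisq.test(obs_counts, p = rep(0.25, 4))

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res <- gof$residuals

# Squared residuals are each category's share of the chi-square statistic
contribution <- pearson_res^2

print(round(pearson_res, 2))
print(round(contribution, 2))
```

Water accounts for most of the statistic (2.56 of 3.6), but its residual of -1.6 is still within the range commonly treated as unremarkable (an absolute value below about 2).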

Since p > 0.05, we do not calculate an effect size

An effect size is calculated only when p < 0.05

With p = 0.308, the differences between observed and expected frequencies are not statistically significant
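For reference only, here is how an effect size could be computed had the test been significant. This sketch assumes Cohen's w as the measure (the original does not say which effect size it would use) and computes it in base R as the square root of the chi-square statistic divided by the sample size.

```r
# Values reported by the chi-square test in this document
chi_sq <- 3.6
n_total <- 100

# Cohen's w for a goodness-of-fit test: sqrt(chi-square / N)
cohen_w <- sqrt(chi_sq / n_total)

print(round(cohen_w, 2))
```

By Cohen's conventional benchmarks (0.1 small, 0.3 medium, 0.5 large), this value would fall between small and medium, though it is not interpreted here because the test was not significant.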

Final interpretation and report

print("FINDINGS FOR SCENARIO A2: A chi-square goodness-of-fit test was conducted to determine whether students preferred tea, coffee, soda, and water equally.")

## [1] "FINDINGS FOR SCENARIO A2: A chi-square goodness-of-fit test was conducted to determine whether students preferred tea, coffee, soda, and water equally."

if(chi_result_a2$p.value < 0.05) {
  cat("The results indicated that the observed frequencies were significantly different from the expected frequencies")
} else {
  cat("The results indicated that the observed frequencies were NOT significantly different from the expected frequencies")
}

## The results indicated that the observed frequencies were NOT significantly different from the expected frequencies

cat(", χ²(", chi_result_a2$parameter, ") = ", round(chi_result_a2$statistic, 2), 
    ", p = ", round(chi_result_a2$p.value, 3), ". ", sep="")

## , χ²(3) = 3.6, p = 0.308.

if(chi_result_a2$p.value < 0.05) {
  cat("This suggests that students do not prefer all beverages equally.")
} else {
  cat("This does not provide sufficient evidence that students' preferences differ across the four beverages.")
}

## This does not provide sufficient evidence that students' preferences differ across the four beverages.

cat("\n\nBased on the observed frequencies: Coffee (", observed[1], "), Soda (", observed[2],
    "), Tea (", observed[3], "), and Water (", observed[4], "), the deviations from the expected 25 per beverage are consistent with random sampling variation.", sep = "")

## 
## 
## Based on the observed frequencies: Coffee (26), Soda (29), Tea (28), and Water (17), the deviations from the expected 25 per beverage are consistent with random sampling variation.

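Null Hypothesis: Students prefer the four beverages equally, so the observed frequencies match the expected frequencies (25% per beverage)

Alternative Hypothesis: At least one beverage is preferred at a rate other than 25%, so the observed frequencies differ from the expected frequencies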

Test Used: Chi-Square Goodness of Fit

Results: χ²(3) = 3.6, p = .308

Decision: Fail to reject the null hypothesis

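Conclusion: There is not sufficient evidence that students' preferences differ across the four beverages. The observed variations (Soda slightly higher, Water lower) are not statistically significant and could be due to random sampling variation. Failing to reject the null hypothesis does not prove that the preferences are exactly equal.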

Assignment 05 - RQ 1

Mian Afzaal Zahoor

2026-02-16
