In your markdown, answer the following problems. Include the following:
# Observed data
observed <- c(244, 192)
# Null values
theoritical_prop <- rep(1/2, 2)
Hypothesis:
\(H_0\): \(p_1\) = \(p_2\) = \(1/2\)
\(H_a\): at least one \(p_i\) \(\neq\) \(1/2\)
# Check the expected values to make sure that you can perform the chi-square test
# Expected values
expected_values <- theoritical_prop*sum(observed)
expected_values
## [1] 218 218
P-value:
chisq.test(observed)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
p-value = 0.01276
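The statistic reported by `chisq.test` can be reproduced by hand from the observed and expected counts. The following is a minimal sketch, assuming the same counts as above; the names `expected`, `chi_sq_by_hand`, and `p_by_hand` are illustrative and not part of the original analysis.

```r
# Observed allele counts and expected counts under H0 (equal proportions)
observed <- c(244, 192)
expected <- rep(1/2, 2) * sum(observed)   # 218 218

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq_by_hand <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = number of categories - 1
p_by_hand <- pchisq(chi_sq_by_hand, df = length(observed) - 1, lower.tail = FALSE)

chi_sq_by_hand   # about 6.2018
p_by_hand        # about 0.01276
```

Both values should agree with the `chisq.test(observed)` output up to rounding.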
Conclusion:
With a p-value of 0.01276, which is below the significance level (assuming it is 0.05), we reject the null hypothesis. There is significant evidence that the R and X alleles are not equally likely among the athletes.
library(tidyverse)
setwd("C:/Users/Joanne G/OneDrive/Data101(Fall 2025)/Datasets")
nutrition_df <- read.csv("NutritionStudy.csv")
summary(nutrition_df)
## ID Age Smoke Quetelet
## Min. : 1.0 Min. :19.00 Length:315 Min. :16.33
## 1st Qu.: 79.5 1st Qu.:39.00 Class :character 1st Qu.:21.80
## Median :158.0 Median :48.00 Mode :character Median :24.74
## Mean :158.0 Mean :50.15 Mean :26.16
## 3rd Qu.:236.5 3rd Qu.:62.50 3rd Qu.:28.85
## Max. :315.0 Max. :83.00 Max. :50.40
## Vitamin Calories Fat Fiber
## Min. :1.000 Min. : 445.2 Min. : 14.40 Min. : 3.10
## 1st Qu.:1.000 1st Qu.:1338.0 1st Qu.: 53.95 1st Qu.: 9.15
## Median :2.000 Median :1666.8 Median : 72.90 Median :12.10
## Mean :1.965 Mean :1796.7 Mean : 77.03 Mean :12.79
## 3rd Qu.:3.000 3rd Qu.:2100.4 3rd Qu.: 95.25 3rd Qu.:15.60
## Max. :3.000 Max. :6662.2 Max. :235.90 Max. :36.80
## Alcohol Cholesterol BetaDiet RetinolDiet
## Min. : 0.000 Min. : 37.7 Min. : 214 Min. : 30.0
## 1st Qu.: 0.000 1st Qu.:155.0 1st Qu.:1116 1st Qu.: 480.0
## Median : 0.300 Median :206.3 Median :1802 Median : 707.0
## Mean : 3.279 Mean :242.5 Mean :2186 Mean : 832.7
## 3rd Qu.: 3.200 3rd Qu.:308.9 3rd Qu.:2836 3rd Qu.:1037.0
## Max. :203.000 Max. :900.7 Max. :9642 Max. :6901.0
## BetaPlasma RetinolPlasma Sex VitaminUse
## Min. : 0.0 Min. : 179.0 Length:315 Length:315
## 1st Qu.: 90.0 1st Qu.: 466.0 Class :character Class :character
## Median : 140.0 Median : 566.0 Mode :character Mode :character
## Mean : 189.9 Mean : 602.8
## 3rd Qu.: 230.0 3rd Qu.: 716.0
## Max. :1415.0 Max. :1727.0
## PriorSmoke
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :1.638
## 3rd Qu.:2.000
## Max. :3.000
# Two-way contingency table built from the variables "VitaminUse" and "Sex"
chi_sqr_nutrition_study <- table(nutrition_df$VitaminUse, nutrition_df$Sex)
chi_sqr_nutrition_study
##
## Female Male
## No 87 24
## Occasional 77 5
## Regular 109 13
Hypothesis:
\(H_0\): There is no association between VitaminUse and Sex
\(H_a\): There is an association between VitaminUse and Sex
P-value:
chisq.test(chi_sqr_nutrition_study)
##
## Pearson's Chi-squared test
##
## data: chi_sqr_nutrition_study
## X-squared = 11.071, df = 2, p-value = 0.003944
P-value = 0.003944
Conclusion:
Since the p-value (about 0.0039) is lower than the significance level (assuming it is 0.05), we reject the null hypothesis: there is significant evidence of an association between VitaminUse and Sex. The table also shows that females in this sample report occasional or regular vitamin use more often than males (186 of 273 females versus 18 of 42 males).
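The chi-square approximation for a two-way table relies on the expected counts being reasonably large (a common rule of thumb is at least 5 per cell). The sketch below checks this by rebuilding the table from the counts printed above, so it runs without the CSV file; the names `vitamin_by_sex`, `test_result`, and `expected_counts` are illustrative assumptions.

```r
# Rebuild the VitaminUse-by-Sex table from the printed counts
vitamin_by_sex <- matrix(
  c( 87, 24,
     77,  5,
    109, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# chisq.test() stores the expected counts (row total * column total / n)
test_result <- chisq.test(vitamin_by_sex)
expected_counts <- test_result$expected

expected_counts             # expected count for each cell under H0
all(expected_counts >= 5)   # TRUE if every cell meets the rule of thumb
```

The smallest expected count is in the Occasional/Male cell (82 * 42 / 315, about 10.9), so the rule of thumb is met even though the observed count in that cell is 5.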
library(tidyverse)
setwd("C:/Users/Joanne G/OneDrive/Data101(Fall 2025)/Datasets")
fish_gill_df <- read.csv("FishGills3.csv")
summary(fish_gill_df)
## Calcium GillRate
## Length:90 Min. :33.00
## Class :character 1st Qu.:48.00
## Mode :character Median :62.50
## Mean :61.78
## 3rd Qu.:72.00
## Max. :98.00
Hypothesis:
\(H_0\): \(\mu_L\) = \(\mu_M\) = \(\mu_H\)
\(H_a\): not all \(\mu_i\) are equal
fish_gill_anova <- aov(GillRate ~ Calcium, data = fish_gill_df)
fish_gill_anova
## Call:
## aov(formula = GillRate ~ Calcium, data = fish_gill_df)
##
## Terms:
## Calcium Residuals
## Sum of Squares 2037.222 19064.333
## Deg. of Freedom 2 87
##
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
P-value:
summary(fish_gill_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## Calcium 2 2037 1018.6 4.648 0.0121 *
## Residuals 87 19064 219.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P-value = 0.0121
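The F value and p-value in the ANOVA table follow directly from the sums of squares and degrees of freedom printed by `aov`. A minimal sketch of that arithmetic, using the printed values rather than the data file; the variable names here are illustrative.

```r
# Sums of squares and degrees of freedom from the aov() output above
ss_between <- 2037.222    # Calcium
df_between <- 2
ss_within  <- 19064.333   # Residuals
df_within  <- 87

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

f_stat    # about 4.648
p_value   # about 0.0121
```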
TukeyHSD(fish_gill_anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = GillRate ~ Calcium, data = fish_gill_df)
##
## $Calcium
## diff lwr upr p adj
## Low-High 10.333333 1.219540 19.4471264 0.0222533
## Medium-High 0.500000 -8.613793 9.6137931 0.9906108
## Medium-Low -9.833333 -18.947126 -0.7195402 0.0313247
Conclusion:
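Since the p-value of 0.0121 is below the significance level (assuming it is 0.05), there is enough evidence to reject the null hypothesis. This means that the mean gill rate differs across the calcium levels of the water. The Tukey comparisons show where: the Low group has a higher mean gill rate than both the High group (p adj = 0.0223) and the Medium group (p adj = 0.0313), while the Medium and High groups do not differ significantly (p adj = 0.9906).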