knitr::opts_chunk$set(echo = TRUE)

Problem 1

\(H_0\): \(p_1 = p_2 = 0.5\)

\(H_a\): At least one \(p_i \neq 0.5\), where

\(p_1\) = proportion of people with the R allele

\(p_2\) = proportion of people with the X allele

# Observed counts
observed <- c(244, 192)

# Proportions under the null hypothesis
theoretical_prop <- rep(0.5, 2)

# Expected counts under the null hypothesis
expected_values <- theoretical_prop * sum(observed)
expected_values
## [1] 218 218
chisq.test(observed, p = theoretical_prop)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

P-value = 0.01276

Conclusion: With a p-value of 0.01276, which is less than the typical significance level of 0.05, there is sufficient evidence to reject the null hypothesis. The R and X alleles are not equally likely in the population.
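
As a check on the output above, the test statistic can be reproduced by hand from the observed and expected counts. This is a minimal sketch that uses only the counts reported above; the names `expected`, `chi_sq`, and `p_value` are introduced here for illustration.

```r
# Observed allele counts and the expected counts under H0 (equal proportions)
observed <- c(244, 192)
expected <- rep(0.5, 2) * sum(observed)

# Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail probability with df = (number of categories) - 1
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

round(chi_sq, 4)
round(p_value, 5)
```

Both values should agree with the X-squared = 6.2018 and p-value = 0.01276 reported by `chisq.test()`.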

Problem 2

\(H_0\): There is no association between VitaminUse and Gender.

\(H_a\): There is an association between VitaminUse and Gender.

df <- read.csv("C:/Users/dspd2/OneDrive/Data 101/week 8/NutritionStudy.csv")
summary(df)
##        ID             Age           Smoke              Quetelet    
##  Min.   :  1.0   Min.   :19.00   Length:315         Min.   :16.33  
##  1st Qu.: 79.5   1st Qu.:39.00   Class :character   1st Qu.:21.80  
##  Median :158.0   Median :48.00   Mode  :character   Median :24.74  
##  Mean   :158.0   Mean   :50.15                      Mean   :26.16  
##  3rd Qu.:236.5   3rd Qu.:62.50                      3rd Qu.:28.85  
##  Max.   :315.0   Max.   :83.00                      Max.   :50.40  
##     Vitamin         Calories           Fat             Fiber      
##  Min.   :1.000   Min.   : 445.2   Min.   : 14.40   Min.   : 3.10  
##  1st Qu.:1.000   1st Qu.:1338.0   1st Qu.: 53.95   1st Qu.: 9.15  
##  Median :2.000   Median :1666.8   Median : 72.90   Median :12.10  
##  Mean   :1.965   Mean   :1796.7   Mean   : 77.03   Mean   :12.79  
##  3rd Qu.:3.000   3rd Qu.:2100.4   3rd Qu.: 95.25   3rd Qu.:15.60  
##  Max.   :3.000   Max.   :6662.2   Max.   :235.90   Max.   :36.80  
##     Alcohol         Cholesterol       BetaDiet     RetinolDiet    
##  Min.   :  0.000   Min.   : 37.7   Min.   : 214   Min.   :  30.0  
##  1st Qu.:  0.000   1st Qu.:155.0   1st Qu.:1116   1st Qu.: 480.0  
##  Median :  0.300   Median :206.3   Median :1802   Median : 707.0  
##  Mean   :  3.279   Mean   :242.5   Mean   :2186   Mean   : 832.7  
##  3rd Qu.:  3.200   3rd Qu.:308.9   3rd Qu.:2836   3rd Qu.:1037.0  
##  Max.   :203.000   Max.   :900.7   Max.   :9642   Max.   :6901.0  
##    BetaPlasma     RetinolPlasma        Sex             VitaminUse       
##  Min.   :   0.0   Min.   : 179.0   Length:315         Length:315        
##  1st Qu.:  90.0   1st Qu.: 466.0   Class :character   Class :character  
##  Median : 140.0   Median : 566.0   Mode  :character   Mode  :character  
##  Mean   : 189.9   Mean   : 602.8                                        
##  3rd Qu.: 230.0   3rd Qu.: 716.0                                        
##  Max.   :1415.0   Max.   :1727.0                                        
##    PriorSmoke   
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :2.000  
##  Mean   :1.638  
##  3rd Qu.:2.000  
##  Max.   :3.000
observed_dataset <- table(df$VitaminUse, df$Sex)
observed_dataset
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
chisq.test(observed_dataset)
## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 11.071, df = 2, p-value = 0.003944

Conclusion: With a p-value of 0.003944, which is less than the typical significance level of 0.05, there is sufficient evidence to reject the null hypothesis. Therefore, we conclude that there is a significant association between vitamin use and gender.
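
The chi-squared approximation is usually considered reliable when every expected cell count is at least 5. A minimal sketch of that check follows. It rebuilds the table from the counts printed above rather than from the CSV file, and the object names `tab` and `test` are illustrative.

```r
# Rebuild the VitaminUse-by-Sex table from the counts shown above
tab <- matrix(
  c(87, 77, 109,   # Female column
    24,  5,  13),  # Male column
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

test <- chisq.test(tab)

# Expected counts under independence: (row total * column total) / grand total
test$expected

# TRUE if every expected count is at least 5
all(test$expected >= 5)
```

The smallest expected count is about 10.9 (occasional users who are male), so the condition appears to hold for this table.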

Problem 3

\(H_0\): \(\mu_A = \mu_B = \mu_C\)

\(H_a\): Not all \(\mu_i\) are equal, where

\(\mu_A\) = mean gill rate in low calcium

\(\mu_B\) = mean gill rate in medium calcium

\(\mu_C\) = mean gill rate in high calcium

Gr <- read.csv("C:/Users/dspd2/OneDrive/Data 101/week 8/FishGills3.csv")
summary(Gr)
##    Calcium             GillRate    
##  Length:90          Min.   :33.00  
##  Class :character   1st Qu.:48.00  
##  Mode  :character   Median :62.50  
##                     Mean   :61.78  
##                     3rd Qu.:72.00  
##                     Max.   :98.00
anova_result <- aov(GillRate ~ Calcium, data = Gr)

anova_result
## Call:
##    aov(formula = GillRate ~ Calcium, data = Gr)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: With a p-value of 0.0121, which is less than the typical significance level of 0.05, there is sufficient evidence to reject the null hypothesis. We conclude that the mean gill rate differs for at least one of the calcium levels of the water.
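
The F statistic and p-value in the ANOVA table can be reproduced from the sums of squares and degrees of freedom printed by `aov()`. A minimal sketch using those printed values (the variable names are illustrative):

```r
# Sums of squares and degrees of freedom from the aov() output above
ss_calcium   <- 2037.222
ss_residuals <- 19064.333
df_calcium   <- 2
df_residuals <- 87

# Mean squares are sums of squares divided by their degrees of freedom
ms_calcium   <- ss_calcium / df_calcium
ms_residuals <- ss_residuals / df_residuals

# F statistic and its upper-tail probability
f_value <- ms_calcium / ms_residuals
p_value <- pf(f_value, df_calcium, df_residuals, lower.tail = FALSE)

round(f_value, 3)
round(p_value, 4)
```

These should match the F value of 4.648 and Pr(>F) of 0.0121 in the `summary(anova_result)` table.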