# Reading CSV files

nutrition_data <- read.csv("NutritionStudy.csv")

fish_data <- read.csv("FishGills3.csv")

str(nutrition_data)
## 'data.frame':    315 obs. of  17 variables:
##  $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age          : int  64 76 38 40 72 40 65 58 35 55 ...
##  $ Smoke        : chr  "No" "No" "No" "No" ...
##  $ Quetelet     : num  21.5 23.9 20 25.1 21 ...
##  $ Vitamin      : int  1 1 2 3 1 3 2 1 3 3 ...
##  $ Calories     : num  1299 1032 2372 2450 1952 ...
##  $ Fat          : num  57 50.1 83.6 97.5 82.6 56 52 63.4 57.8 39.6 ...
##  $ Fiber        : num  6.3 15.8 19.1 26.5 16.2 9.6 28.7 10.9 20.3 15.5 ...
##  $ Alcohol      : num  0 0 14.1 0.5 0 1.3 0 0 0.6 0 ...
##  $ Cholesterol  : num  170.3 75.8 257.9 332.6 170.8 ...
##  $ BetaDiet     : int  1945 2653 6321 1061 2863 1729 5371 823 2895 3307 ...
##  $ RetinolDiet  : int  890 451 660 864 1209 1439 802 2571 944 493 ...
##  $ BetaPlasma   : int  200 124 328 153 92 148 258 64 218 81 ...
##  $ RetinolPlasma: int  915 727 721 615 799 654 834 825 517 562 ...
##  $ Sex          : chr  "Female" "Female" "Female" "Female" ...
##  $ VitaminUse   : chr  "Regular" "Regular" "Occasional" "No" ...
##  $ PriorSmoke   : int  2 1 2 2 1 2 1 1 1 2 ...
head(nutrition_data)
##   ID Age Smoke Quetelet Vitamin Calories  Fat Fiber Alcohol Cholesterol
## 1  1  64    No  21.4838       1   1298.8 57.0   6.3     0.0       170.3
## 2  2  76    No  23.8763       1   1032.5 50.1  15.8     0.0        75.8
## 3  3  38    No  20.0108       2   2372.3 83.6  19.1    14.1       257.9
## 4  4  40    No  25.1406       3   2449.5 97.5  26.5     0.5       332.6
## 5  5  72    No  20.9850       1   1952.1 82.6  16.2     0.0       170.8
## 6  6  40    No  27.5214       3   1366.9 56.0   9.6     1.3       154.6
##   BetaDiet RetinolDiet BetaPlasma RetinolPlasma    Sex VitaminUse PriorSmoke
## 1     1945         890        200           915 Female    Regular          2
## 2     2653         451        124           727 Female    Regular          1
## 3     6321         660        328           721 Female Occasional          2
## 4     1061         864        153           615 Female         No          2
## 5     2863        1209         92           799 Female    Regular          1
## 6     1729        1439        148           654 Female         No          2
tail(nutrition_data)
##      ID Age Smoke Quetelet Vitamin Calories   Fat Fiber Alcohol Cholesterol
## 310 310  48    No  24.6147       2   2021.1  72.2  16.6     9.0       299.1
## 311 311  46    No  25.8967       3   2263.6  98.2  19.4     2.6       306.5
## 312 312  45    No  23.8270       1   1841.1  84.2  14.1     2.2       257.7
## 313 313  49    No  24.2613       1   1125.6  44.8  11.9     4.0       150.5
## 314 314  31    No  23.4525       1   2729.6 144.4  13.2     2.2       381.8
## 315 315  45    No  26.5081       1   1627.0  77.4   9.9     0.2       195.6
##     BetaDiet RetinolDiet BetaPlasma RetinolPlasma    Sex VitaminUse PriorSmoke
## 310     1392        1027        144           752 Female Occasional          2
## 311     2572        1261        164           216 Female         No          2
## 312     1665         465         80           328 Female    Regular          1
## 313     6943         520        300           502 Female    Regular          1
## 314      741         644        121           684 Female    Regular          2
## 315     1242         554        233           826 Female    Regular          1
str(fish_data)
## 'data.frame':    90 obs. of  2 variables:
##  $ Calcium : chr  "Low" "Low" "Low" "Low" ...
##  $ GillRate: int  55 63 78 85 65 98 68 84 44 87 ...
head(fish_data)
##   Calcium GillRate
## 1     Low       55
## 2     Low       63
## 3     Low       78
## 4     Low       85
## 5     Low       65
## 6     Low       98
tail(fish_data)
##    Calcium GillRate
## 85    High       52
## 86    High       37
## 87    High       57
## 88    High       62
## 89    High       40
## 90    High       42

Hypotheses for Problem 1:

\(H_0\): \(p_R = p_X\) (the two alleles are equally frequent in the population)

\(H_a\): \(p_R \neq p_X\) (the two alleles are not equally frequent in the population)

observed <- c(244, 192)

p_null <- c(0.5, 0.5)

expected_values <- p_null * sum(observed)
expected_values
## [1] 218 218
chisq.test(observed, p = p_null)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
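
As a check, the statistic reported by `chisq.test()` can be reproduced by hand from the observed and expected counts. This is only a sketch of the arithmetic; the names `chi_sq_stat` and `p_value_manual` are introduced here for illustration and are not part of the original analysis.

```r
# Observed allele counts and null proportions, as in the analysis above
observed <- c(244, 192)
p_null <- c(0.5, 0.5)
expected_values <- p_null * sum(observed)

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq_stat <- sum((observed - expected_values)^2 / expected_values)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_value_manual <- pchisq(chi_sq_stat,
                         df = length(observed) - 1,
                         lower.tail = FALSE)

chi_sq_stat      # agrees with X-squared = 6.2018 from chisq.test()
p_value_manual   # agrees with p-value = 0.01276 from chisq.test()
```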

Conclusion for Problem 1:

With a p-value of 0.01276, which is below the 5% significance level, we reject the null hypothesis; there is sufficient evidence to conclude that the two alleles are not equally frequent in the population.

Hypotheses for Problem 2:

\(H_0\): Vitamin use and gender are independent (not associated)

\(H_a\): Vitamin use and gender are associated

vitamin_gender_table <- table(nutrition_data$VitaminUse, nutrition_data$Sex)

vitamin_gender_table
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
chisq.test(vitamin_gender_table)
## 
##  Pearson's Chi-squared test
## 
## data:  vitamin_gender_table
## X-squared = 11.071, df = 2, p-value = 0.003944
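
The chi-square approximation is usually considered reliable when every expected cell count is at least 5. The sketch below checks that rule of thumb by rebuilding the table from the counts printed above, so it runs without the CSV file. The names `vitamin_counts` and `test_result` are illustrative, not part of the original analysis.

```r
# Counts copied from the printed table (rows: vitamin use, columns: sex)
vitamin_counts <- matrix(
  c(87, 77, 109,   # Female column
    24,  5,  13),  # Male column
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

test_result <- chisq.test(vitamin_counts)

# Expected counts under independence: (row total * column total) / grand total
test_result$expected

# Rule-of-thumb check: is every expected count at least 5?
all(test_result$expected >= 5)
```

The smallest observed count is 5 (Male, Occasional), but the condition concerns expected counts rather than observed ones.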

Conclusion for Problem 2:

With a p-value of 0.003944, which is below the 5% significance level, we reject the null hypothesis; there is sufficient evidence to conclude that gender and vitamin usage habits are associated. The test indicates that an association exists, not how strong it is.

Hypotheses for Problem 3:

\(H_0\): \(\mu_L = \mu_M = \mu_H\) (the mean gill rate is the same at every calcium level of the water)

\(H_a\): at least one of \(\mu_L\), \(\mu_M\), \(\mu_H\) differs from the others (the mean gill rate depends on the calcium level of the water)

fish_data$Calcium <- as.factor(fish_data$Calcium)

anova_test <- aov(GillRate ~ Calcium, data = fish_data)

anova_test
## Call:
##    aov(formula = GillRate ~ Calcium, data = fish_data)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
summary(anova_test)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
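
The F value in the summary table follows directly from the sums of squares and degrees of freedom printed by `aov()`. The sketch below redoes that arithmetic using the numbers copied from the output above; the variable names are illustrative and are not part of the original analysis.

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_between <- 2037.222   # Calcium
ss_within  <- 19064.333  # Residuals
df_between <- 2
df_within  <- 87

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value
f_stat <- ms_between / ms_within
p_value_f <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

f_stat      # agrees with F value = 4.648 in the summary table
p_value_f   # agrees with Pr(>F) = 0.0121 in the summary table
```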

Conclusion for Problem 3:

With a p-value of 0.0121, which is below the 5% significance level, we reject the null hypothesis; there is sufficient evidence to conclude that the mean gill beat rate differs for at least one calcium level.