R Markdown

Research question: Is there a significant difference in average BMI between genders (males and females), and does genotype add information about this relationship? The dataset, “famuss”, comes from OpenIntro and records a study of the association of demographic, physiological, and genetic characteristics with muscle strength. It includes many variables, both quantitative and categorical. For this project I will use two categorical variables (gender and genotype) and one quantitative variable (BMI). The hypothesis test will examine the difference between male and female mean BMI. Dataset from OpenIntro: https://www.openintro.org/data/index.php?data=famuss

In this data analysis I will start by cleaning the dataset, then look for the relationship between BMI, gender, and genotype. I will include summary statistics to describe the dataset, filter and organize the data, and then show graphs that visualize the relationship between BMI, gender, and genotype.

Dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Documents/School/Data 101/Data project/Project 2")
famuss <- read.csv("famuss.csv")

Cleaning data and checking for missing values

##     ndrm.ch      drm.ch         sex         age        race      height 
##           0           0           0           0           0           0 
##      weight actn3.r577x         bmi 
##           0           0           0
##   ndrm_ch drm_ch    sex age      race height weight actn3_r577x    bmi
## 1      40     40 Female  27 Caucasian   65.0    199          CC 33.112
## 2      25      0   Male  36 Caucasian   71.7    189          CT 25.845
## 3      40      0 Female  24 Caucasian   65.0    134          CT 22.296
## 4     125      0 Female  40 Caucasian   68.0    171          CT 25.998
## 5      40     20 Female  32 Caucasian   61.0    118          CC 22.293
## 6      75      0 Female  24  Hispanic   62.2    120          CT 21.805
##     ndrm_ch           drm_ch           sex                 age      
##  Min.   :  0.00   Min.   :-33.30   Length:595         Min.   :17.0  
##  1st Qu.: 30.00   1st Qu.:  0.00   Class :character   1st Qu.:20.0  
##  Median : 45.50   Median :  8.30   Mode  :character   Median :22.0  
##  Mean   : 53.29   Mean   : 10.35                      Mean   :24.4  
##  3rd Qu.: 66.70   3rd Qu.: 20.00                      3rd Qu.:27.0  
##  Max.   :250.00   Max.   :100.00                      Max.   :40.0  
##      race               height          weight      actn3_r577x       
##  Length:595         Min.   :57.00   Min.   : 82.0   Length:595        
##  Class :character   1st Qu.:64.25   1st Qu.:132.0   Class :character  
##  Mode  :character   Median :67.00   Median :150.0   Mode  :character  
##                     Mean   :66.83   Mean   :155.6                     
##                     3rd Qu.:69.00   3rd Qu.:174.0                     
##                     Max.   :77.00   Max.   :317.0                     
##       bmi       
##  Min.   :15.50  
##  1st Qu.:21.30  
##  Median :23.35  
##  Mean   :24.40  
##  3rd Qu.:26.62  
##  Max.   :43.76
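
The code for this step is hidden in the rendered output above. A minimal sketch of what it may look like, assuming the columns were renamed by swapping dots for underscores (the `toy` data frame and the `gsub()` renaming are illustrative assumptions, not the original code):

```r
# Toy stand-in for the FAMuSS data, using three rows from the output above.
toy <- data.frame(
  ndrm.ch     = c(40, 25, 40),
  sex         = c("Female", "Male", "Female"),
  actn3.r577x = c("CC", "CT", "CT"),
  bmi         = c(33.112, 25.845, 22.296)
)

# Count missing values in each column; all zeros means no rows need dropping.
missing_counts <- colSums(is.na(toy))
missing_counts

# Assumed cleaning step: replace dots in column names with underscores.
names(toy) <- gsub(".", "_", names(toy), fixed = TRUE)

head(toy)     # first rows, now with the cleaned names
summary(toy)  # per-column summaries
```

With the real data, the same calls would be applied to `famuss` after `read.csv()`.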

Summary of bmi

summary_stats <- famuss |>
  summarise(
    mean_bmi = mean(bmi, na.rm=TRUE),
    median_bmi = median(bmi, na.rm = TRUE),
    sd_bmi = sd(bmi, na.rm = TRUE),
    min_bmi = min(bmi, na.rm = TRUE),
    max_bmi = max(bmi, na.rm = TRUE))
summary_stats
##   mean_bmi median_bmi  sd_bmi min_bmi max_bmi
## 1 24.40108      23.35 4.57662  15.504   43.76

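The table above reports the mean, median, standard deviation, minimum, and maximum of BMI. The mean (24.40) is higher than the median (23.35), which suggests the distribution is skewed to the right.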
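
Because the research question compares males and females, the same summary can also be split by group. A small sketch, assuming a `sex` column and a `bmi` column as in the dataset; the `toy` values are made up for illustration and are not results from the study:

```r
library(dplyr)

# Made-up BMI values, only to show the shape of the output.
toy <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  bmi = c(22.3, 26.0, 33.1, 24.1, 25.8, 27.3)
)

# One row per group with its size, mean, and standard deviation.
by_sex <- toy |>
  group_by(sex) |>
  summarise(
    n        = n(),
    mean_bmi = mean(bmi, na.rm = TRUE),
    sd_bmi   = sd(bmi, na.rm = TRUE)
  )
by_sex
```

Replacing `toy` with `famuss` would give the group sizes and spreads that the t-test later relies on.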

Histogram and boxplot of BMI

library(ggplot2)

ggplot(famuss, aes(x = bmi)) +
  geom_histogram(binwidth = 5, fill = "#FF7256", color = "yellow") +
  labs(title = "Histogram of BMI", x = "BMI", y = "Count") +
  theme_minimal()

ggplot(famuss, aes(x = sex, y = bmi)) +
  geom_boxplot(aes(fill = sex)) +
  scale_fill_manual(values = c("#FFAEB9", "#9BCD9B")) +
  labs(title = "Boxplot of BMI by Gender", x = "Gender", y = "BMI") +
  theme_minimal()

The histogram shows the overall distribution of BMI for all participants and whether it looks approximately normal. The boxplot shows the difference in BMI between males and females.

Hypotheses

m = males, f = females

\(H_0: \mu_m = \mu_f\)

\(H_a: \mu_m \neq \mu_f\)

\(\mu_m\) represents the mean BMI of males and \(\mu_f\) represents the mean BMI of females. The t-test reports the degrees of freedom, the p-value, a confidence interval for the mean difference, and the estimated group means.
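
For reference, R's `t.test()` runs a Welch two-sample test by default, which does not assume equal variances. Its test statistic and approximate degrees of freedom can be written as below, where \(\bar{x}\), \(s\), and \(n\) are each group's sample mean, standard deviation, and size (this notation is introduced here for illustration). R subtracts the groups in alphabetical order, so the difference is Female minus Male.

\[
t = \frac{\bar{x}_f - \bar{x}_m}{\sqrt{\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}\right)^2}{\dfrac{\left(s_f^2/n_f\right)^2}{n_f - 1} + \dfrac{\left(s_m^2/n_m\right)^2}{n_m - 1}}
\]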

T-test

t.test(bmi ~ sex, data = famuss)
## 
##  Welch Two Sample t-test
## 
## data:  bmi by sex
## t = -2.7518, df = 495.32, p-value = 0.006144
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -1.8134086 -0.3025932
## sample estimates:
## mean in group Female   mean in group Male 
##             23.97077             25.02877
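
The printed result can also be stored and its pieces pulled out by name, which helps when quoting numbers in the write-up. A minimal sketch on simulated data (the `toy` data frame, the seed, and the 0.05 cutoff are assumptions for illustration, not part of the original analysis):

```r
set.seed(1)

# Simulated BMI values for two groups; not the FAMuSS data.
toy <- data.frame(
  sex = rep(c("Female", "Male"), each = 30),
  bmi = c(rnorm(30, mean = 24, sd = 4), rnorm(30, mean = 25, sd = 4))
)

res <- t.test(bmi ~ sex, data = toy)  # Welch test by default

res$statistic  # t value
res$parameter  # Welch degrees of freedom
res$p.value    # two-sided p-value
res$conf.int   # 95% CI for mean(Female) - mean(Male)
res$estimate   # the two group means

# Decision at an assumed 5% significance level.
alpha <- 0.05
reject_null <- res$p.value < alpha
reject_null
```

The same fields exist on the result of `t.test(bmi ~ sex, data = famuss)`.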


Statistical Analysis

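The t-test gives a p-value of 0.006144, which is below the usual 0.05 significance level, so the difference is statistically significant. Mean BMI is 25.02877 for males and 23.97077 for females, and the 95% confidence interval for the difference (female minus male) runs from about -1.81 to -0.30, which does not include 0.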
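
As a consistency check, the confidence interval and p-value can be rebuilt from the other numbers in the t-test output. This is a sketch that uses only the rounded values printed above, so small rounding differences are expected; the variable names are my own:

```r
# Values copied from the t-test output.
t_stat   <- -2.7518
df_welch <- 495.32
mean_f   <- 23.97077
mean_m   <- 25.02877

diff_fm <- mean_f - mean_m        # Female minus Male
se      <- diff_fm / t_stat       # standard error implied by t = diff / se
t_crit  <- qt(0.975, df_welch)    # critical value for a 95% interval

ci      <- diff_fm + c(-1, 1) * t_crit * se
p_value <- 2 * pt(-abs(t_stat), df_welch)  # two-sided p-value

ci
p_value
```

The rebuilt interval and p-value should agree with the reported ones to about three decimal places.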

Conclusion and Future Directions:

Overall, the results show a significant difference in mean BMI between males and females. With a p-value of 0.006144, we reject the null hypothesis and conclude that males have a higher BMI on average than females. This implies that gender is related to differences in BMI. Future analysis could explore how other variables, such as height or genotype, relate to BMI.

References:

Cleaning dataset and missing values -> Past assignments, “Loading datasets.Rmd” and “Dataset_Csoto.Rmd”.

Summaries of BMI -> Past assignment “Descriptive Statistics.Rmd”

Histograms -> Past assignment “Descriptive Statistics.Rmd”

Hypotheses and t-test format -> Past assignment “HT and CI in R-2.Rmd”, the YouTube video “Doing a t-test using R programming (in 4 minutes)” https://youtu.be/x1RFWHV2VUU?si=HapDhTrKWnH7FTRR at 1:35 (Line 14), and the “Hypothsis Testing and CI.pptx” slides from week 7.

Paragraphs about the dataset -> Based on the description on the dataset’s website https://www.openintro.org/data/index.php?data=famuss