Research question: Is there a significant difference in average BMI between genders (males and females), and does genotype provide more information about this relationship? The dataset, “famuss”, comes from OpenIntro and records a study of how demographic, physiological, and genetic characteristics are associated with muscle strength. It includes many variables, both quantitative and categorical. For this project I will use two categorical variables (gender and genotype) and one quantitative variable (BMI). The hypothesis test will be on the difference between male and female mean BMI. Dataset from OpenIntro: https://www.openintro.org/data/index.php?data=famuss
In this data analysis I will start by cleaning the dataset and then look for the relationship between BMI, gender, and genotype. I will include summary statistics to describe the dataset, filter and organize the data, and then show graphs that display the relationship between BMI, gender, and genotype.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Documents/School/Data 101/Data project/Project 2")
famuss <- read.csv("famuss.csv")
##     ndrm.ch      drm.ch         sex         age        race      height 
##           0           0           0           0           0           0 
##      weight actn3.r577x         bmi 
##           0           0           0 
##   ndrm_ch drm_ch    sex age      race height weight actn3_r577x    bmi
## 1      40     40 Female  27 Caucasian   65.0    199          CC 33.112
## 2      25      0   Male  36 Caucasian   71.7    189          CT 25.845
## 3      40      0 Female  24 Caucasian   65.0    134          CT 22.296
## 4     125      0 Female  40 Caucasian   68.0    171          CT 25.998
## 5      40     20 Female  32 Caucasian   61.0    118          CC 22.293
## 6      75      0 Female  24  Hispanic   62.2    120          CT 21.805
##     ndrm_ch           drm_ch           sex                 age      
##  Min.   :  0.00   Min.   :-33.30   Length:595         Min.   :17.0  
##  1st Qu.: 30.00   1st Qu.:  0.00   Class :character   1st Qu.:20.0  
##  Median : 45.50   Median :  8.30   Mode  :character   Median :22.0  
##  Mean   : 53.29   Mean   : 10.35                      Mean   :24.4  
##  3rd Qu.: 66.70   3rd Qu.: 20.00                      3rd Qu.:27.0  
##  Max.   :250.00   Max.   :100.00                      Max.   :40.0  
##      race               height          weight      actn3_r577x       
##  Length:595         Min.   :57.00   Min.   : 82.0   Length:595        
##  Class :character   1st Qu.:64.25   1st Qu.:132.0   Class :character  
##  Mode  :character   Median :67.00   Median :150.0   Mode  :character  
##                     Mean   :66.83   Mean   :155.6                     
##                     3rd Qu.:69.00   3rd Qu.:174.0                     
##                     Max.   :77.00   Max.   :317.0                     
##       bmi       
##  Min.   :15.50  
##  1st Qu.:21.30  
##  Median :23.35  
##  Mean   :24.40  
##  3rd Qu.:26.62  
##  Max.   :43.76  
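The code behind the missing-value counts, the renamed columns, and the two previews above is not shown in this report. The following is only a sketch of steps that would give output of that shape. It assumes the cleaning was done roughly this way, and it uses a small stand-in data frame (the name toy is hypothetical; its values are copied from the first rows printed above) instead of the real file.

```r
# Stand-in for the first three rows of famuss.csv (subset of columns)
toy <- data.frame(
  ndrm.ch     = c(40, 25, 40),
  drm.ch      = c(40, 0, 0),
  sex         = c("Female", "Male", "Female"),
  actn3.r577x = c("CC", "CT", "CT"),
  bmi         = c(33.112, 25.845, 22.296)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Replace the dots in the column names with underscores
names(toy) <- gsub(".", "_", names(toy), fixed = TRUE)

# Preview the cleaned data and summarise every column
head(toy)
summary(toy)
```

On the real data the same calls would be applied to famuss after read.csv().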
summary_stats <- famuss |>
  summarise(
    mean_bmi = mean(bmi, na.rm = TRUE),
    median_bmi = median(bmi, na.rm = TRUE),
    sd_bmi = sd(bmi, na.rm = TRUE),
    min_bmi = min(bmi, na.rm = TRUE),
    # max_bmi must be computed from bmi, not age
    max_bmi = max(bmi, na.rm = TRUE))
summary_stats
##   mean_bmi median_bmi  sd_bmi min_bmi max_bmi
## 1 24.40108      23.35 4.57662  15.504   43.76
The mean, median, standard deviation, minimum, and maximum of BMI. The maximum is shown as 43.76, the rounded value reported by summary() above.
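Since the research question compares males and females, the same kind of summary can also be computed per group. This is a minimal sketch in base R, assuming a data frame with sex and bmi columns. The name toy is hypothetical and holds only the six rows printed by head() earlier, so its group means are not the sample means used in the test.

```r
# Six rows copied from the head() output; not representative of all 595 rows
toy <- data.frame(
  sex = c("Female", "Male", "Female", "Female", "Female", "Female"),
  bmi = c(33.112, 25.845, 22.296, 25.998, 22.293, 21.805)
)

# Mean BMI within each sex
by_sex <- aggregate(bmi ~ sex, data = toy, FUN = mean)

# Number of rows within each sex
n_by_sex <- table(toy$sex)

by_sex
n_by_sex
```

With the full dataset, famuss would take the place of toy.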
library(ggplot2)
# Histogram of BMI for the whole sample
ggplot(famuss, aes(x = bmi)) +
  geom_histogram(binwidth = 5, fill = "#FF7256", color = "yellow") +
  labs(title = "Histogram of BMI", x = "BMI", y = "Count") +
  theme_minimal()
# Side-by-side boxplots of BMI for each sex
ggplot(famuss, aes(x = sex, y = bmi)) +
  geom_boxplot(aes(fill = sex)) +
  scale_fill_manual(values = c("#FFAEB9", "#9BCD9B")) +
  labs(title = "Boxplot of BMI by Gender", x = "Gender", y = "BMI") +
  theme_minimal()
The histogram shows how BMI is distributed across the whole sample and whether the data look roughly normal. The boxplot shows the difference in BMI between males and females.
m = males, f = females
Hypotheses
\(H_0\): \(\mu_m = \mu_f\)
\(H_a\): \(\mu_m \neq \mu_f\)
\(\mu_m\) is the mean BMI of males and \(\mu_f\) is the mean BMI of females. The t-test reports the test statistic, the degrees of freedom, the p-value, a confidence interval for the mean difference, and the estimated group means.
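For reference, t.test() with its default setting var.equal = FALSE runs Welch's two-sample test, which does not assume equal variances in the two groups. The sample symbols \(\bar{x}\), \(s\), and \(n\) are introduced here for illustration and do not appear elsewhere in this report. With them, the test statistic and its approximate degrees of freedom are:

\[
t = \frac{\bar{x}_f - \bar{x}_m}{\sqrt{\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}\right)^2}{\dfrac{\left(s_f^2 / n_f\right)^2}{n_f - 1} + \dfrac{\left(s_m^2 / n_m\right)^2}{n_m - 1}}
\]

The difference is written as female minus male because R orders the groups alphabetically, so a negative \(t\) means the female sample mean is lower.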
t.test(bmi ~ sex, data = famuss)
##
## Welch Two Sample t-test
##
## data: bmi by sex
## t = -2.7518, df = 495.32, p-value = 0.006144
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -1.8134086 -0.3025932
## sample estimates:
## mean in group Female mean in group Male
## 23.97077 25.02877
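As a check on how the pieces of this output fit together, the confidence interval and p-value can be recomputed from the printed means, t statistic, and degrees of freedom. This sketch only reuses numbers shown above; the variable names are made up for illustration, and small differences in the last digits come from the rounded t value.

```r
# Values copied from the t.test() output
mean_f <- 23.97077
mean_m <- 25.02877
t_stat <- -2.7518
df     <- 495.32

# Estimated difference (female minus male) and its standard error
diff_means <- mean_f - mean_m
se         <- diff_means / t_stat

# 95 percent confidence interval from the t distribution
crit <- qt(0.975, df)
ci   <- diff_means + c(-1, 1) * crit * se

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

ci
p_value
```

The interval lies entirely below zero, which agrees with the p-value being under 0.05.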
From the results, the p-value is 0.006144, which is below the usual 0.05 significance level. The difference in mean BMI between males (25.02877) and females (23.97077) is therefore statistically significant.
Overall, the results show a significant difference in mean BMI between males and females. With a p-value of 0.006144, we reject the null hypothesis and conclude that males have a higher BMI on average than females; the 95 percent confidence interval for the difference (female minus male) runs from about -1.81 to -0.30. This implies that gender is related to differences in BMI. Future analysis could explore how other variables, such as height or genotype, affect BMI.
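One way the genotype part of the research question could be approached is to compare a model with sex alone against a model with sex and genotype. The sketch below uses simulated data, not the famuss values, so its numbers carry no meaning. The data frame sim, the genotype level TT, and the effect sizes are all assumptions made for illustration.

```r
set.seed(1)
n <- 60

# Simulated stand-in data: two sexes, three assumed genotype levels
sim <- data.frame(
  sex         = rep(c("Female", "Male"), each = n / 2),
  actn3_r577x = sample(c("CC", "CT", "TT"), n, replace = TRUE)
)
# BMI depends on sex only in this simulation, plus random noise
sim$bmi <- 24 + ifelse(sim$sex == "Male", 1, 0) + rnorm(n, sd = 4)

# Model with sex only, and model with sex plus genotype
fit_sex  <- lm(bmi ~ sex, data = sim)
fit_both <- lm(bmi ~ sex + actn3_r577x, data = sim)

# F-test for whether genotype improves on the sex-only model
cmp <- anova(fit_sex, fit_both)
cmp
```

On the real data, famuss would replace sim, and a small p-value in the second row of the table would suggest that genotype explains BMI variation beyond sex.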
Cleaning dataset and missing values -> Past assignments, “Loading datasets.Rmd” and “Dataset_Csoto.Rmd”.
Summaries of BMI -> Past assignment “Descriptive Statistics.Rmd”.
Histograms -> Past assignment “Descriptive Statistics.Rmd”.
Hypotheses and t-test format -> Past assignment “HT and CI in R-2.Rmd”, the YouTube video “Doing a t-test using R programming (in 4 minutes)” https://youtu.be/x1RFWHV2VUU?si=HapDhTrKWnH7FTRR at 1:35 (Line 14), and more information from “Hypothsis Testing and CI.pptx” from week 7.
Paragraphs about the dataset -> Information gathered from the description on the dataset's website, https://www.openintro.org/data/index.php?data=famuss.