## Load Dataset
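# Packages used below: ggformula (gf_* plots), ggplot2, dplyr (pipes and summaries),
# effectsize (cohens_d), and car (leveneTest)
library(ggformula)
library(ggplot2)
library(dplyr)
library(effectsize)
library(car)
hbody <- read.csv("~/Desktop/Desktop - Morgan’s MacBook Pro (2)/R data sets/hbody.csv")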
colnames(hbody)[2] <- 'GENDER'
hbody$GENDER <- factor(hbody$GENDER, levels = c(1, 0), labels = c('M', 'F'))
levels(hbody$GENDER)
## [1] "M" "F"
BMI_CAT <- cut(hbody$BMI, breaks = c(15, 30, 45, 60), labels = c('low', 'med', 'high'))
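As a quick illustration of how `cut()` assigns these categories, the sketch below applies the same breaks and labels to the first ten BMI values listed in the `str(hbody)` output. By default the intervals are closed on the right, so the bins are (15, 30], (30, 45], and (45, 60]. The names `bmi_example` and `bmi_cat_example` are introduced here only for illustration.

```r
# First ten BMI values, copied from the str(hbody) output
bmi_example <- c(33.3, 28, 45.4, 28.4, 25.9, 31.1, 20.1, 32.7, 25.8, 36.5)

# Same breaks and labels as BMI_CAT
bmi_cat_example <- cut(bmi_example, breaks = c(15, 30, 45, 60),
                       labels = c('low', 'med', 'high'))
as.character(bmi_cat_example)

# Boundary behaviour: a value equal to an upper break stays in the lower bin,
# and a value equal to the lowest break (15) is not assigned to any bin (NA)
boundary_example <- cut(c(15, 30, 45), breaks = c(15, 30, 45, 60),
                        labels = c('low', 'med', 'high'))
as.character(boundary_example)
```

Because the observed BMI values range from 15.9 to 59.0, every row falls inside one of the three bins.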
str(hbody)
## 'data.frame': 300 obs. of 15 variables:
## $ AGE : int 43 57 38 80 34 77 29 69 44 35 ...
## $ GENDER : Factor w/ 2 levels "M","F": 2 1 2 1 1 1 1 2 2 1 ...
## $ PULSE : int 80 84 94 74 50 60 52 58 66 62 ...
## $ SYSTOLIC : int 100 112 134 126 114 134 118 138 114 124 ...
## $ DIASTOLIC: int 70 70 94 64 68 60 64 80 66 70 ...
## $ HDL : int 73 35 36 37 50 55 53 40 45 62 ...
## $ LDL : int 68 116 223 83 104 75 128 140 136 110 ...
## $ WHITE : num 8.7 4.9 6.9 7.5 6.1 5.7 4.1 8.1 8 5.6 ...
## $ RED : num 4.8 4.73 4.47 4.32 4.95 3.95 4.68 4.6 4.09 5.47 ...
## $ PLATE : int 319 187 297 170 140 192 191 286 263 193 ...
## $ WEIGHT : num 98.6 96.9 108.2 73.1 83.1 ...
## $ HEIGHT : num 172 186 154 160 179 ...
## $ WAIST : num 120.4 107.8 120.3 97.2 95.1 ...
## $ ARM.CIRC : num 40.7 37 44.3 30.3 34 31.4 27.4 34.2 32.5 40 ...
## $ BMI : num 33.3 28 45.4 28.4 25.9 31.1 20.1 32.7 25.8 36.5 ...
summary(hbody)
## AGE GENDER PULSE SYSTOLIC DIASTOLIC
## Min. :18.00 M:153 Min. : 36.00 Min. : 88 Min. : 40.00
## 1st Qu.:31.00 F:147 1st Qu.: 64.00 1st Qu.:112 1st Qu.: 64.00
## Median :46.00 Median : 72.00 Median :121 Median : 70.00
## Mean :47.04 Mean : 71.77 Mean :123 Mean : 70.75
## 3rd Qu.:62.00 3rd Qu.: 80.00 3rd Qu.:132 3rd Qu.: 78.00
## Max. :80.00 Max. :104.00 Max. :186 Max. :102.00
## HDL LDL WHITE RED
## Min. : 26.00 Min. : 39.0 Min. : 2.700 Min. :3.390
## 1st Qu.: 43.00 1st Qu.: 85.0 1st Qu.: 5.200 1st Qu.:4.197
## Median : 52.00 Median :113.0 Median : 6.200 Median :4.490
## Mean : 53.66 Mean :113.7 Mean : 6.542 Mean :4.538
## 3rd Qu.: 62.00 3rd Qu.:137.2 3rd Qu.: 7.825 3rd Qu.:4.883
## Max. :138.00 Max. :251.0 Max. :14.300 Max. :6.340
## PLATE WEIGHT HEIGHT WAIST
## Min. : 75.0 Min. : 39.00 Min. :134.5 Min. : 64.40
## 1st Qu.:198.0 1st Qu.: 67.08 1st Qu.:161.6 1st Qu.: 87.88
## Median :232.0 Median : 80.50 Median :168.3 Median : 96.95
## Mean :239.4 Mean : 81.66 Mean :168.0 Mean : 99.18
## 3rd Qu.:263.5 3rd Qu.: 92.80 3rd Qu.:174.6 3rd Qu.:109.10
## Max. :646.0 Max. :150.40 Max. :193.3 Max. :170.50
## ARM.CIRC BMI
## Min. :20.50 Min. :15.90
## 1st Qu.:29.48 1st Qu.:24.50
## Median :33.05 Median :28.00
## Mean :33.08 Mean :28.91
## 3rd Qu.:36.33 3rd Qu.:31.98
## Max. :46.60 Max. :59.00
# gf_dhistogram() draws the histogram on the density scale so the density curve overlays it
gf_dhistogram(~ PLATE, data = hbody, bins = 30, color = "black", fill = "lightsteelblue3", alpha = 0.7) %>%
  gf_density(~ PLATE, data = hbody, color = "red", linewidth = 1) %>%
  gf_labs(
    title = "Distribution of Platelet Counts",
    x = "Platelet Counts",
    y = "Density"
  )
ggplot(hbody, aes(x = PLATE)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, color = "black", fill = "lightsteelblue3") +
  geom_density(color = "red", linewidth = 1) +
  ggtitle("Distribution of Platelet Counts") +
  theme_minimal()
gf_qq(~ PLATE, data = hbody, color = "black", size = 2) %>%
  gf_qqline(~ PLATE, data = hbody, color = "red", linewidth = 1, linetype = "solid") %>%
  gf_labs(
    title = "Human Body Platelet Data",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles of Platelets"
  )
shapiro_test <- shapiro.test(hbody$PLATE)
shapiro_test
##
## Shapiro-Wilk normality test
##
## data: hbody$PLATE
## W = 0.89367, p-value = 1.2e-13
platelet_summary <- hbody %>%
  group_by(GENDER) %>%
  summarise(mean_platelets = mean(PLATE, na.rm = TRUE),
            sd_platelets = sd(PLATE, na.rm = TRUE))
platelet_summary
## # A tibble: 2 × 3
## GENDER mean_platelets sd_platelets
## <fct> <dbl> <dbl>
## 1 M 224. 59.5
## 2 F 255. 65.4
ggplot(hbody, aes(x = GENDER, y = PLATE, fill = GENDER)) +
  geom_boxplot() +
  labs(title = "Boxplot of Platelet Counts by Gender",
       x = "Gender",
       y = "Platelet Count") +
  theme_minimal() +
  scale_fill_manual(values = c("lightskyblue", "hotpink"))
# Separate male and female platelets
male_platelets <- hbody %>% filter(GENDER == "M") %>% pull(PLATE)
female_platelets <- hbody %>% filter(GENDER == "F") %>% pull(PLATE)
# Conduct one-tailed t-test
t_test_result <- t.test(male_platelets, female_platelets, alternative = "less", var.equal = TRUE)
t_test_result
##
## Two Sample t-test
##
## data: male_platelets and female_platelets
## t = -4.2722, df = 298, p-value = 1.303e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -18.91313
## sample estimates:
## mean of x mean of y
## 224.2745 255.0884
# Calculate Cohen's d
cohens_d_result <- cohens_d(male_platelets, female_platelets)
cohens_d_result
## Cohen's d | 95% CI
## --------------------------
## -0.49 | [-0.72, -0.26]
##
## - Estimated using pooled SD.
ggplot(hbody, aes(x = RED, y = PLATE, color = BMI_CAT)) +
  geom_point(alpha = 0.7, size = 3) +
  scale_color_manual(values = c("grey60", "firebrick1", "dodgerblue2")) +
  labs(title = "Scatter Plot of Platelets vs. Red Blood Cells by BMI Category",
       x = "Red Blood Cells",
       y = "Platelet Count") +
  theme_minimal()
# Run ANOVA
anova_result <- aov(PLATE ~ BMI_CAT, data = hbody)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## BMI_CAT 2 27105 13553 3.337 0.0369 *
## Residuals 297 1206321 4062
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check for homogeneity of variances using Levene's test
levene_test <- leveneTest(PLATE ~ BMI_CAT, data = hbody)
levene_test
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.7021 0.4964
## 297
# Post-hoc analysis with Tukey HSD if ANOVA is significant
tukey_result <- TukeyHSD(anova_result)
tukey_result
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = PLATE ~ BMI_CAT, data = hbody)
##
## $BMI_CAT
## diff lwr upr p adj
## med-low 2.394995 -16.025465 20.81545 0.9496203
## high-low 53.429947 4.704796 102.15510 0.0276543
## high-med 51.034951 1.311474 100.75843 0.0427441
The platelet counts are not normally distributed; the distribution is right-skewed (positively skewed). The histograms and the Q-Q plot support this, and the Shapiro-Wilk test confirms it: with W = 0.894 and a p-value of 1.2e-13, far below 0.05, we reject the null hypothesis that the data are normally distributed.
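The right skew can also be read directly off the `summary(hbody)` output for PLATE. The sketch below is only a rough check, not a formal test: it compares the mean with the median and the length of the upper tail with the lower tail, using the values printed above. The names `plate_summary`, `upper_tail`, `lower_tail`, and `tail_ratio` are introduced here for illustration.

```r
# PLATE values copied from the summary(hbody) output
plate_summary <- c(min = 75, q1 = 198, median = 232, mean = 239.4, q3 = 263.5, max = 646)

# In a right-skewed sample the mean is pulled above the median
mean_minus_median <- plate_summary[["mean"]] - plate_summary[["median"]]

# Distance from the quartiles to the extremes on each side
upper_tail <- plate_summary[["max"]] - plate_summary[["q3"]]   # 382.5
lower_tail <- plate_summary[["q1"]] - plate_summary[["min"]]   # 123

# A ratio well above 1 means the upper tail is much longer than the lower tail
tail_ratio <- upper_tail / lower_tail
round(c(mean_minus_median = mean_minus_median, tail_ratio = tail_ratio), 2)
```

The mean sits about 7.4 above the median and the upper tail is roughly three times as long as the lower tail, which matches the long right tail described above.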
Grouping platelet counts by gender shows that men have a lower mean platelet count than women (224 vs. 255), with standard deviations of 59.5 and 65.4 respectively, and the box plot shows a lower median for men as well. This is consistent with the claim that men tend to have lower platelet counts than women.
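A one-tailed two-sample t-test (equal variances assumed) supports this claim: with t = -4.27, df = 298, and p = 1.3e-05, we reject the null hypothesis in favor of the alternative that the mean platelet count is lower for males than for females. Cohen's d is -0.49 (95% CI [-0.72, -0.26]), a difference of about half a pooled standard deviation.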
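To make the link between the group summaries, Cohen's d, and the t statistic explicit, the following sketch recomputes both from the reported means, standard deviations, and group sizes (153 males and 147 females, from the `summary(hbody)` output). Because the standard deviations are taken as rounded in the summary table, the results agree with the reported values only approximately. All variable names are illustrative.

```r
# Group sizes, means, and (rounded) standard deviations copied from the output above
n_m <- 153
n_f <- 147
mean_m <- 224.2745
mean_f <- 255.0884
sd_m <- 59.5
sd_f <- 65.4

# Pooled standard deviation, as used by the equal-variance t-test and by Cohen's d
pooled_sd <- sqrt(((n_m - 1) * sd_m^2 + (n_f - 1) * sd_f^2) / (n_m + n_f - 2))

# Cohen's d: mean difference in units of the pooled standard deviation
d <- (mean_m - mean_f) / pooled_sd

# Equal-variance t statistic and its one-sided (lower-tail) p-value
t_stat <- d / sqrt(1 / n_m + 1 / n_f)
p_one_sided <- pt(t_stat, df = n_m + n_f - 2)

round(c(pooled_sd = pooled_sd, d = d, t = t_stat), 2)
```

The recomputed d is about -0.49 and the t statistic about -4.27, in line with the `cohens_d()` and `t.test()` output.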
The ANOVA gives F(2, 297) = 3.337 with a p-value of 0.0369, so at the 0.05 level we reject the null hypothesis that the mean platelet count is the same across all BMI categories. BMI category is therefore significantly associated with platelet count. Levene's test (p = 0.4964) gives no evidence against equal variances, which is consistent with the equal-variance assumption of the ANOVA. The Tukey HSD analysis shows that the high BMI category differs significantly from both the low category (difference of 53.4, adjusted p = 0.028) and the medium category (difference of 51.0, adjusted p = 0.043), while the low and medium categories do not differ (adjusted p = 0.950). In this sample, individuals in the high BMI category have higher mean platelet counts than those in the low and medium categories.
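The F statistic and p-value can be reproduced from the sums of squares in the ANOVA table, which also gives a rough sense of effect size. The sketch below uses only the printed table values; eta squared (the share of total variation in platelet count accounted for by BMI category) is not part of the original analysis and is added here only as an illustrative calculation. Variable names are hypothetical.

```r
# Values copied from the summary(anova_result) table
ss_between <- 27105
ss_within  <- 1206321
df_between <- 2
df_within  <- 297

# Mean squares and F statistic
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat <- ms_between / ms_within

# Upper-tail p-value of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta squared: between-group sum of squares as a share of the total
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_stat, p = p_value, eta_sq = eta_sq), 4)
```

The recomputed F and p-value match the table (about 3.34 and 0.037), and eta squared is about 0.022, so BMI category accounts for roughly 2% of the variation in platelet count even though the overall test is significant.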