Hot Flashes Project

Author

Lydia Baick

Statistics Course Project

Intro

Text for your intro with your background research

Load the libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
✔ broom        1.0.6     ✔ rsample      1.2.1
✔ dials        1.3.0     ✔ tune         1.2.1
✔ infer        1.0.7     ✔ workflows    1.1.4
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.2.1     ✔ yardstick    1.3.1
✔ recipes      1.1.0     
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Use tidymodels_prefer() to resolve common conflicts.

Load the data

getwd()
[1] "C:/Users/lydia/Downloads/Stats 217/stats final project"
setwd("C:/Users/lydia/Downloads/Stats 217/stats final project")
hot_flash <- read_csv("hflash.csv")
Rows: 375 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (14): pt, ageg, aagrp, edu, d1, f1a, pcs12, hotflash, bmi30, estra, fsh,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(hot_flash)
# A tibble: 6 × 14
     pt  ageg aagrp   edu    d1   f1a pcs12 hotflash bmi30 estra   fsh    lh
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1     3     2     0     1     0     0  56.8        0     0 107.   3.00  2.98
2     6     3     0     1     0     0  59.2        0     0  31.2 11.2   5.76
3     7     1     0     1     0     1  57.7        0     0  13.4 14.5   5.60
4     8     1     0     1     1     0  55.8        1     0  10.6  5.53  2.26
5     9     2     0     1     0     1  55.9        0     0  24.1  9.78  2.6 
6    10     3     0     1     0     0  NA          1     0  37.3 10.3   3.40
# ℹ 2 more variables: testo <dbl>, dheas <dbl>
hot_flash1 <- hot_flash |>
  mutate(race = ifelse(aagrp == 0, "caucasian", "african american"))
hot_flash3 <- hot_flash1 |>
  count(race) |>
  mutate(prop = n / sum(n))
head(hot_flash3)
# A tibble: 2 × 3
  race                 n  prop
  <chr>            <int> <dbl>
1 african american   182 0.485
2 caucasian          193 0.515
ggplot(hot_flash3, aes(x = race, y = prop, fill = race)) +
  geom_col()

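This bar graph compares the proportion of study participants in each racial group. Caucasian participants make up a slightly larger share of the sample (51.5%, n = 193) than African American participants (48.5%, n = 182). Because the counts are not split by the hotflash variable, the graph describes the makeup of the sample and does not show which group reports hot flashes more often.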
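To compare how often each group reports hot flashes, the proportion has to be computed within each race rather than across the whole sample. The following is a minimal sketch, not part of the original analysis. The helper name `hotflash_by_race` is hypothetical, and the code assumes that `hotflash` is coded 1 for participants who reported hot flashes and 0 otherwise, which should be checked against the data dictionary.

```r
library(dplyr)

# Hypothetical helper: share of participants reporting hot flashes per race.
# Assumption: hotflash is coded 1 = reported hot flashes, 0 = did not.
hotflash_by_race <- function(df) {
  df |>
    filter(!is.na(hotflash)) |>            # drop rows with no hot flash report
    group_by(race) |>
    summarise(
      n = n(),                             # participants in the group
      n_hotflash = sum(hotflash == 1),     # participants reporting hot flashes
      prop_hotflash = mean(hotflash == 1), # within-group proportion
      .groups = "drop"
    )
}

# Usage with the data frame built above:
# hotflash_by_race(hot_flash1)
```

Plotting `prop_hotflash` instead of `prop` would give a bar graph that compares hot flash reporting between the two groups.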

hot_flash2 <- hot_flash |>
  mutate(age_type = case_when(ageg == 1 ~ "35-39 years",
                              ageg == 2 ~ "40-44 years",
                              ageg == 3 ~ "45-48 years"))
ggplot(hot_flash2, aes(age_type, estra, fill = age_type)) +
  geom_boxplot()

These box plots show the distribution of baseline estradiol (pg/ml) across three age groups: 35-39 years, 40-44 years, and 45-48 years. Estradiol is a type of estrogen, a hormone that plays a vital role in the female reproductive system. I am comparing estradiol levels across the age groups to see at which ages levels tend to be higher or lower. The median estradiol level is lowest in the 35-39 age group and highest in the 45-48 age group, which suggests that estradiol levels increase with age in this sample.
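The medians read off the box plots can be checked numerically. The following is a minimal sketch, not part of the original analysis. The helper name `estradiol_by_age` is hypothetical, and missing estradiol values are assumed to be safe to drop for a descriptive summary.

```r
library(dplyr)

# Hypothetical helper: median and quartiles of estradiol per age group.
# Assumption: rows with missing estra can be ignored for this summary.
estradiol_by_age <- function(df) {
  df |>
    group_by(age_type) |>
    summarise(
      n = sum(!is.na(estra)),                                   # non-missing values
      median_estra = median(estra, na.rm = TRUE),               # box plot center line
      q1 = quantile(estra, 0.25, na.rm = TRUE, names = FALSE),  # lower box edge
      q3 = quantile(estra, 0.75, na.rm = TRUE, names = FALSE),  # upper box edge
      .groups = "drop"
    )
}

# Usage with the data frame built above:
# estradiol_by_age(hot_flash2)
```

If the medians rise from one age group to the next, the table supports the trend described above. A summary like this is descriptive only and does not test whether the differences are statistically significant.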

ggplot(hot_flash1, aes(x=estra, y=lh))+
  geom_point() +
  theme_bw()+
  labs(x="Baseline Estradiol (pg/ml)", 
       y="Baseline Luteinizing Hormone (mIU/ml)",
       title = "Scatterplot of Baseline Estradiol vs. Baseline Luteinizing Hormone",
       caption = "Source: Bigelow, July 11, 2023. Teaching of Statistics in the Health Science")

This scatter plot shows the relationship between baseline estradiol and baseline luteinizing hormone levels. Estradiol is a type of estrogen that plays a vital role in the female reproductive system by developing the reproductive organs. Luteinizing hormone also plays a key role in the reproductive system by stimulating the ovaries and testes to produce hormones and cells. I am exploring the relationship between the two because both act on the reproductive system, and I want to find out whether their levels are connected. The plot shows no clear linear relationship between the two variables.
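A correlation coefficient can put a number on what the scatter plot suggests. The following is a minimal sketch, not part of the original analysis. The helper name `estra_lh_correlation` is hypothetical, and rows with a missing value in either hormone are assumed to be safe to drop.

```r
# Hypothetical helper: Pearson and Spearman correlation between
# baseline estradiol (estra) and baseline luteinizing hormone (lh).
# Assumption: rows with a missing value in either column are dropped.
estra_lh_correlation <- function(df) {
  c(
    pearson  = cor(df$estra, df$lh, use = "complete.obs", method = "pearson"),
    spearman = cor(df$estra, df$lh, use = "complete.obs", method = "spearman")
  )
}

# Usage with the data frame built above:
# estra_lh_correlation(hot_flash1)
```

A Pearson value near 0 would be consistent with no clear linear relationship. The Spearman value is included because it is rank-based and less affected by extreme hormone values.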