Using the babies data set in openintro, use a summary table to
investigate whether first pregnancy status correlates with gestation
length. Use the pipe operator in your code. Submit your code and the
result.
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
gestation_summary <- babies %>%
  # Drop rows where either variable is missing
  filter(!is.na(parity), !is.na(gestation)) %>%
  # parity == 0 is treated as a first pregnancy
  mutate(
    pregnancy_status = if_else(parity == 0, "First Pregnancy", "Subsequent Pregnancy")
  ) %>%
  group_by(pregnancy_status) %>%
  # One row per group: count, center, and spread of gestation (in days)
  summarize(
    total_cases = n(),
    avg_gestation_days = mean(gestation),
    median_gestation_days = median(gestation),
    gestation_spread_sd = sd(gestation)
  )
gestation_summary
## # A tibble: 2 × 5
##   pregnancy_status     total_cases avg_gestation_days median_gestation_days
##   <chr>                      <int>              <dbl>                 <dbl>
## 1 First Pregnancy              910               279.                   279
## 2 Subsequent Pregnancy         313               281.                   282
## # ℹ 1 more variable: gestation_spread_sd <dbl>
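Based on the summary table, first pregnancy status does not appear to
have a strong or meaningful association with gestation length. Mean
gestation is about 279 days for first pregnancies and about 281 days for
subsequent pregnancies, and the medians (279 and 282 days) differ by
only 3 days.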
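As an optional follow-up, the conclusion can be checked with a one-row summary that puts a number on the association. The sketch below is only an illustration and is not part of the required answer. It assumes the same coding as the code above (parity == 0 is a first pregnancy). The names `correlation_check`, `first_pregnancy`, `mean_diff_days`, and `overall_sd_days` are made up for this example. It also prints with `width = Inf` so that no column is hidden the way `gestation_spread_sd` was in the output above.

```r
library(openintro)  # provides the babies data set
library(dplyr)

correlation_check <- babies %>%
  filter(!is.na(parity), !is.na(gestation)) %>%
  # Assumed coding: 1 = first pregnancy (parity == 0), 0 = subsequent pregnancy
  mutate(first_pregnancy = as.numeric(parity == 0)) %>%
  summarize(
    total_cases = n(),
    # Mean gestation of first pregnancies minus mean of subsequent pregnancies
    mean_diff_days = mean(gestation[first_pregnancy == 1]) -
      mean(gestation[first_pregnancy == 0]),
    # Overall spread, to judge the size of the difference against
    overall_sd_days = sd(gestation),
    # Pearson correlation between the 0/1 indicator and gestation length
    correlation = cor(first_pregnancy, gestation)
  )

# width = Inf prints every column instead of truncating the tibble
print(correlation_check, width = Inf)
```

A correlation close to 0, together with a mean difference that is small compared with the overall standard deviation, would be consistent with the conclusion above. A correlation well away from 0 would call for a second look.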