Is there a correlation between age at first childbirth and self-reported mental and physical wellbeing in adult women? This analysis explores whether the timing of parenthood, early or late, is associated with overall health outcomes and quality of life, focusing specifically on adult women. To assess whether this life-course timing has a measurable relationship with wellbeing indicators, I use a subset of data from “A Sample of 500 Adults from NHANES” (nhanes.samp.adult.500), provided by the OpenIntro Data Repository. This sample of 500 adults aged 21–80 is drawn from the National Health and Nutrition Examination Survey (NHANES), a nationally representative study conducted by the Centers for Disease Control and Prevention (CDC) that collects demographic, health, and lifestyle information on adults and children in the United States.
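For this project, the subset of data contains only adult female respondents with children. Within that subset, the following columns are most relevant to the research question:
- Age1stBaby: age in years at the birth of the respondent's first child
- DaysPhysHlthBad: self-reported number of days, out of the past 30, on which physical health was not good
- DaysMentHlthBad: self-reported number of days, out of the past 30, on which mental health was not good
- Age: respondent's age in years at the time of the survey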
These columns allow for a focused exploration of the correlation between age at first childbirth and self-reported health outcomes in adult females, providing insight into how family timing might relate to overall wellbeing among adult women in the U.S.
Data cleaning and analysis were conducted in R using the tidyverse packages (chiefly dplyr for data manipulation and ggplot2 for visualization). After importing the nhanes.samp.adult.500 dataset, I applied filter() to keep only female respondents aged 21–64, limiting the influence of age-related health decline among older adults. The select() function then retained only the variables relevant to the research question, and rows with a missing value in any of them were dropped; because Age1stBaby is missing for women who have not given birth, this step also restricts the sample to mothers. The rename() function improved readability by updating variable names: Age1stBaby → fch_age (age at first childbirth), DaysPhysHlthBad → days_bph (days of poor physical health), DaysMentHlthBad → days_bmh (days of poor mental health), and Age → age_at_survey.
Two new variables were then created with mutate() to simplify group comparisons and make the visualization easier to interpret. The first, bad_days, is the sum of days_bph and days_bmh and serves as a composite indicator of health-related quality of life, ranging from 0 (best) to 60 (worst). The second, fch_age_group, places fch_age into one of three life stages: early (<25 years), mid (25–34 years), or late (35+ years). The analysis that follows uses summary statistics and a boxplot to examine the distribution of bad_days across fch_age_group and to assess whether age at first childbirth is associated with differences in overall health-related quality of life.
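The fch_age_group boundaries can be checked in isolation before running the full pipeline. The sketch below wraps the same case_when() rules used in the cleaning step in a small helper; the name bin_fch_age() is hypothetical and not part of the original analysis, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): applies the same
# case_when() rules as the cleaning step to a vector of ages at first birth.
bin_fch_age <- function(fch_age) {
  group <- case_when(
    fch_age < 25 ~ "early (<25)",
    fch_age >= 25 & fch_age <= 34 ~ "mid (25-34)",
    fch_age >= 35 ~ "late (35+)",
    TRUE ~ NA_character_  # missing ages, and any value strictly between 34 and 35
  )
  factor(group, levels = c("early (<25)", "mid (25-34)", "late (35+)"))
}

# Boundary check: 24 -> early, 25 and 34 -> mid, 35 -> late, NA -> NA
bin_fch_age(c(24, 25, 34, 35, NA))
```

Because the rules test fch_age <= 34 and fch_age >= 35, a fractional value such as 34.5 would fall through to NA. Age1stBaby appears to be recorded in whole years in this dataset, so the report's groups are unaffected, but writing the mid condition as fch_age < 35 would close that gap.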
# Perform exploratory data analysis
head(nhanes_df)
## # A tibble: 6 × 76
## ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education
## <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 63106 2011_12 male 50 50-59 NA White White 9 - 11th Grade
## 2 67820 2011_12 female 47 40-49 NA Black Black College Grad
## 3 57178 2009_10 male 46 40-49 561 White <NA> Some College
## 4 68693 2011_12 male 28 20-29 NA White White Some College
## 5 69465 2011_12 female 50 50-59 NA White White College Grad
## 6 61505 2009_10 male 39 30-39 471 Black <NA> Some College
## # ℹ 67 more variables: MaritalStatus <chr>, HHIncome <chr>, HHIncomeMid <dbl>,
## # Poverty <dbl>, HomeRooms <dbl>, HomeOwn <chr>, Work <chr>, Weight <dbl>,
## # Length <lgl>, HeadCirc <lgl>, Height <dbl>, BMI <dbl>,
## # BMICatUnder20yrs <lgl>, BMI_WHO <chr>, Pulse <dbl>, BPSysAve <dbl>,
## # BPDiaAve <dbl>, BPSys1 <dbl>, BPDia1 <dbl>, BPSys2 <dbl>, BPDia2 <dbl>,
## # BPSys3 <dbl>, BPDia3 <dbl>, Testosterone <dbl>, DirectChol <dbl>,
## # TotChol <dbl>, UrineVol1 <dbl>, UrineFlow1 <dbl>, UrineVol2 <dbl>, …
summary(nhanes_df)
## ID SurveyYr Gender Age
## Min. :51780 Length:500 Length:500 Min. :21.00
## 1st Qu.:57239 Class :character Class :character 1st Qu.:37.00
## Median :62137 Mode :character Mode :character Median :50.00
## Mean :61885 Mean :49.92
## 3rd Qu.:66968 3rd Qu.:63.00
## Max. :71868 Max. :80.00
##
## AgeDecade AgeMonths Race1 Race3
## Length:500 Min. :252.0 Length:500 Length:500
## Class :character 1st Qu.:449.0 Class :character Class :character
## Mode :character Median :585.0 Mode :character Mode :character
## Mean :594.9
## 3rd Qu.:756.0
## Max. :959.0
## NA's :263
## Education MaritalStatus HHIncome HHIncomeMid
## Length:500 Length:500 Length:500 Min. : 2500
## Class :character Class :character Class :character 1st Qu.: 30000
## Mode :character Mode :character Mode :character Median : 50000
## Mean : 57849
## 3rd Qu.: 87500
## Max. :100000
## NA's :42
## Poverty HomeRooms HomeOwn Work
## Min. :0.000 Min. : 1.000 Length:500 Length:500
## 1st Qu.:1.520 1st Qu.: 5.000 Class :character Class :character
## Median :3.070 Median : 6.000 Mode :character Mode :character
## Mean :3.031 Mean : 6.083
## 3rd Qu.:4.987 3rd Qu.: 7.000
## Max. :5.000 Max. :13.000
## NA's :38 NA's :7
## Weight Length HeadCirc Height BMI
## Min. : 40.80 Mode:logical Mode:logical Min. :143.9 Min. :17.80
## 1st Qu.: 68.50 NA's:500 NA's:500 1st Qu.:161.1 1st Qu.:24.67
## Median : 82.20 Median :168.9 Median :28.20
## Mean : 83.83 Mean :168.8 Mean :29.39
## 3rd Qu.: 97.00 3rd Qu.:175.7 3rd Qu.:32.93
## Max. :223.00 Max. :195.8 Max. :68.63
## NA's :3 NA's :2 NA's :3
## BMICatUnder20yrs BMI_WHO Pulse BPSysAve
## Mode:logical Length:500 Min. : 42.00 Min. : 85.0
## NA's:500 Class :character 1st Qu.: 64.00 1st Qu.:111.0
## Mode :character Median : 72.00 Median :120.0
## Mean : 71.74 Mean :121.8
## 3rd Qu.: 78.00 3rd Qu.:130.5
## Max. :112.00 Max. :183.0
## NA's :19 NA's :21
## BPDiaAve BPSys1 BPDia1 BPSys2
## Min. : 24.00 Min. : 86.0 Min. : 32.00 Min. : 86.0
## 1st Qu.: 65.00 1st Qu.:112.0 1st Qu.: 64.00 1st Qu.:110.0
## Median : 71.00 Median :122.0 Median : 72.00 Median :120.0
## Mean : 70.43 Mean :123.1 Mean : 71.11 Mean :122.2
## 3rd Qu.: 78.00 3rd Qu.:132.0 3rd Qu.: 78.00 3rd Qu.:132.0
## Max. :110.00 Max. :184.0 Max. :110.00 Max. :184.0
## NA's :21 NA's :42 NA's :42 NA's :32
## BPDia2 BPSys3 BPDia3 Testosterone
## Min. : 0.00 Min. : 84.0 Min. : 0.0 Min. : 3.22
## 1st Qu.: 64.00 1st Qu.:110.0 1st Qu.: 64.0 1st Qu.: 19.50
## Median : 70.00 Median :120.0 Median : 70.0 Median : 73.44
## Mean : 70.47 Mean :121.4 Mean : 70.3 Mean : 219.08
## 3rd Qu.: 78.00 3rd Qu.:130.0 3rd Qu.: 78.0 3rd Qu.: 384.63
## Max. :108.00 Max. :182.0 Max. :110.0 Max. :1113.58
## NA's :32 NA's :29 NA's :29 NA's :270
## DirectChol TotChol UrineVol1 UrineFlow1
## Min. :0.520 Min. :2.40 Min. : 0.0 Min. :0.0550
## 1st Qu.:1.090 1st Qu.:4.37 1st Qu.: 51.0 1st Qu.:0.4130
## Median :1.320 Median :5.02 Median : 96.0 Median :0.7095
## Mean :1.386 Mean :5.14 Mean :122.5 Mean :1.0104
## 3rd Qu.:1.600 3rd Qu.:5.77 3rd Qu.:166.0 3rd Qu.:1.2270
## Max. :2.690 Max. :9.31 Max. :446.0 Max. :7.2110
## NA's :27 NA's :27 NA's :11 NA's :40
## UrineVol2 UrineFlow2 Diabetes DiabetesAge
## Min. : 11.0 Min. :0.0700 Length:500 Min. :24.00
## 1st Qu.: 52.0 1st Qu.:0.5018 Class :character 1st Qu.:42.00
## Median :102.0 Median :0.8005 Mode :character Median :55.00
## Mean :129.3 Mean :1.1340 Mean :52.88
## 3rd Qu.:183.0 3rd Qu.:1.2185 3rd Qu.:60.00
## Max. :400.0 Max. :5.4740 Max. :80.00
## NA's :425 NA's :426 NA's :451
## HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest
## Length:500 Min. : 0.00 Min. : 0.000 Length:500
## Class :character 1st Qu.: 0.00 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.00 Median : 0.000 Mode :character
## Mean : 3.18 Mean : 4.229
## 3rd Qu.: 2.00 3rd Qu.: 4.000
## Max. :30.00 Max. :30.000
## NA's :50 NA's :50
## Depressed nPregnancies nBabies Age1stBaby
## Length:500 Min. :1.000 Min. :0.000 Min. :14.00
## Class :character 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:19.00
## Mode :character Median :3.000 Median :2.000 Median :21.00
## Mean :2.911 Mean :2.425 Mean :22.21
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:25.50
## Max. :9.000 Max. :7.000 Max. :36.00
## NA's :309 NA's :321 NA's :365
## SleepHrsNight SleepTrouble PhysActive PhysActiveDays
## Min. : 3.000 Length:500 Length:500 Min. :1.000
## 1st Qu.: 6.000 Class :character Class :character 1st Qu.:3.000
## Median : 7.000 Mode :character Mode :character Median :4.000
## Mean : 6.819 Mean :4.004
## 3rd Qu.: 8.000 3rd Qu.:5.000
## Max. :12.000 Max. :7.000
## NA's :2 NA's :256
## TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild
## Length:500 Length:500 Mode:logical Mode:logical
## Class :character Class :character NA's:500 NA's:500
## Mode :character Mode :character
##
##
##
##
## Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow
## Length:500 Min. : 1.000 Min. : 0.00 Length:500
## Class :character 1st Qu.: 1.000 1st Qu.: 3.00 Class :character
## Mode :character Median : 2.000 Median : 24.00 Mode :character
## Mean : 2.821 Mean : 77.36
## 3rd Qu.: 3.000 3rd Qu.:104.00
## Max. :36.000 Max. :364.00
## NA's :159 NA's :90
## Smoke100 Smoke100n SmokeAge Marijuana
## Length:500 Length:500 Min. : 8.00 Length:500
## Class :character Class :character 1st Qu.:15.00 Class :character
## Mode :character Mode :character Median :17.00 Mode :character
## Mean :18.01
## 3rd Qu.:19.00
## Max. :52.00
## NA's :279
## AgeFirstMarij RegularMarij AgeRegMarij HardDrugs
## Min. : 6.00 Length:500 Min. :10.00 Length:500
## 1st Qu.:15.00 Class :character 1st Qu.:16.00 Class :character
## Median :17.00 Mode :character Median :18.00 Mode :character
## Mean :17.01 Mean :17.95
## 3rd Qu.:19.00 3rd Qu.:20.00
## Max. :43.00 Max. :35.00
## NA's :319 NA's :420
## SexEver SexAge SexNumPartnLife SexNumPartYear
## Length:500 Min. : 9.00 Min. : 0.00 Min. : 0.000
## Class :character 1st Qu.:15.25 1st Qu.: 3.00 1st Qu.: 1.000
## Mode :character Median :17.00 Median : 6.00 Median : 1.000
## Mean :17.80 Mean : 21.87 Mean : 1.158
## 3rd Qu.:19.00 3rd Qu.: 14.00 3rd Qu.: 1.000
## Max. :50.00 Max. :999.00 Max. :17.000
## NA's :142 NA's :137 NA's :196
## SameSex SexOrientation PregnantNow
## Length:500 Length:500 Length:500
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
str(nhanes_df)
## spc_tbl_ [500 × 76] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ID : num [1:500] 63106 67820 57178 68693 69465 ...
## $ SurveyYr : chr [1:500] "2011_12" "2011_12" "2009_10" "2011_12" ...
## $ Gender : chr [1:500] "male" "female" "male" "male" ...
## $ Age : num [1:500] 50 47 46 28 50 39 74 31 21 80 ...
## $ AgeDecade : chr [1:500] "50-59" "40-49" "40-49" "20-29" ...
## $ AgeMonths : num [1:500] NA NA 561 NA NA 471 NA NA 260 NA ...
## $ Race1 : chr [1:500] "White" "Black" "White" "White" ...
## $ Race3 : chr [1:500] "White" "Black" NA "White" ...
## $ Education : chr [1:500] "9 - 11th Grade" "College Grad" "Some College" "Some College" ...
## $ MaritalStatus : chr [1:500] "Divorced" "Separated" "Married" "NeverMarried" ...
## $ HHIncome : chr [1:500] "10000-14999" "35000-44999" "more 99999" "more 99999" ...
## $ HHIncomeMid : num [1:500] 12500 40000 100000 100000 40000 NA 50000 100000 50000 40000 ...
## $ Poverty : num [1:500] 0.95 1.74 4.99 4.14 2.16 NA 3.3 5 0.92 3.51 ...
## $ HomeRooms : num [1:500] 7 6 6 8 10 4 4 4 4 6 ...
## $ HomeOwn : chr [1:500] "Own" "Rent" "Own" "Own" ...
## $ Work : chr [1:500] "NotWorking" "Working" "Working" "Working" ...
## $ Weight : num [1:500] 82.8 79.9 73.7 80.9 70.5 ...
## $ Length : logi [1:500] NA NA NA NA NA NA ...
## $ HeadCirc : logi [1:500] NA NA NA NA NA NA ...
## $ Height : num [1:500] 172 165 170 177 162 ...
## $ BMI : num [1:500] 27.9 29.4 25.4 25.8 26.9 ...
## $ BMICatUnder20yrs: logi [1:500] NA NA NA NA NA NA ...
## $ BMI_WHO : chr [1:500] "25.0_to_29.9" "25.0_to_29.9" "25.0_to_29.9" "25.0_to_29.9" ...
## $ Pulse : num [1:500] 58 70 74 58 76 58 68 60 84 82 ...
## $ BPSysAve : num [1:500] 125 121 120 132 152 148 122 114 108 168 ...
## $ BPDiaAve : num [1:500] 86 68 74 74 103 88 65 76 46 52 ...
## $ BPSys1 : num [1:500] 122 124 120 134 144 150 120 114 108 170 ...
## $ BPDia1 : num [1:500] 88 66 70 72 104 94 68 82 44 76 ...
## $ BPSys2 : num [1:500] 124 120 118 130 150 148 118 114 108 168 ...
## $ BPDia2 : num [1:500] 86 66 74 72 106 92 62 76 46 52 ...
## $ BPSys3 : num [1:500] 126 122 122 134 154 148 126 114 108 168 ...
## $ BPDia3 : num [1:500] 86 70 74 76 100 84 68 76 46 52 ...
## $ Testosterone : num [1:500] 525.37 5.98 NA 653.19 8.17 ...
## $ DirectChol : num [1:500] 1.29 1.22 1.4 1.84 2.43 NA 1.14 1.29 1.34 1.6 ...
## $ TotChol : num [1:500] 5.07 3.7 6.03 4.55 5.92 NA 3.21 5.02 6.15 5.02 ...
## $ UrineVol1 : num [1:500] 244 65 105 51 30 NA 39 120 81 47 ...
## $ UrineFlow1 : num [1:500] 1.683 0.442 0.682 0.464 1.304 ...
## $ UrineVol2 : num [1:500] NA NA NA NA 114 NA NA NA NA NA ...
## $ UrineFlow2 : num [1:500] NA NA NA NA 1.12 ...
## $ Diabetes : chr [1:500] "No" "No" "No" "No" ...
## $ DiabetesAge : num [1:500] NA NA NA NA NA NA NA NA NA NA ...
## $ HealthGen : chr [1:500] "Fair" "Vgood" "Vgood" "Excellent" ...
## $ DaysPhysHlthBad : num [1:500] 5 2 0 0 0 NA 30 0 0 0 ...
## $ DaysMentHlthBad : num [1:500] 30 5 0 0 0 NA 0 10 30 0 ...
## $ LittleInterest : chr [1:500] "Most" "None" "None" "None" ...
## $ Depressed : chr [1:500] "Several" "Several" "None" "None" ...
## $ nPregnancies : num [1:500] NA 2 NA NA 3 NA NA NA NA 1 ...
## $ nBabies : num [1:500] NA 2 NA NA 3 NA NA NA NA 1 ...
## $ Age1stBaby : num [1:500] NA 21 NA NA 27 NA NA NA NA NA ...
## $ SleepHrsNight : num [1:500] 4 6 5 7 6 6 8 7 5 7 ...
## $ SleepTrouble : chr [1:500] "Yes" "No" "No" "No" ...
## $ PhysActive : chr [1:500] "No" "Yes" "Yes" "Yes" ...
## $ PhysActiveDays : num [1:500] 3 NA 3 NA NA 4 3 NA NA NA ...
## $ TVHrsDay : chr [1:500] "2_hr" "3_hr" NA "0_to_1_hr" ...
## $ CompHrsDay : chr [1:500] "0_hrs" "2_hr" NA "4_hr" ...
## $ TVHrsDayChild : logi [1:500] NA NA NA NA NA NA ...
## $ CompHrsDayChild : logi [1:500] NA NA NA NA NA NA ...
## $ Alcohol12PlusYr : chr [1:500] "Yes" "Yes" "Yes" "No" ...
## $ AlcoholDay : num [1:500] 1 2 NA NA 1 NA 1 2 2 1 ...
## $ AlcoholYear : num [1:500] 24 3 0 0 364 NA 12 104 2 1 ...
## $ SmokeNow : chr [1:500] "No" NA NA NA ...
## $ Smoke100 : chr [1:500] "Yes" "No" "No" "No" ...
## $ Smoke100n : chr [1:500] "Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker" ...
## $ SmokeAge : num [1:500] 18 NA NA NA NA NA NA NA NA 21 ...
## $ Marijuana : chr [1:500] "No" "Yes" "Yes" "No" ...
## $ AgeFirstMarij : num [1:500] NA 19 14 NA NA NA NA 20 19 NA ...
## $ RegularMarij : chr [1:500] "No" "Yes" "Yes" "No" ...
## $ AgeRegMarij : num [1:500] NA 20 16 NA NA NA NA NA NA NA ...
## $ HardDrugs : chr [1:500] "No" "No" "Yes" "No" ...
## $ SexEver : chr [1:500] "Yes" "Yes" "Yes" "No" ...
## $ SexAge : num [1:500] 16 17 14 NA 17 NA NA 19 13 NA ...
## $ SexNumPartnLife : num [1:500] 26 10 50 0 4 NA NA 3 15 NA ...
## $ SexNumPartYear : num [1:500] 2 2 1 0 1 NA NA 1 2 NA ...
## $ SameSex : chr [1:500] "No" "No" "No" "No" ...
## $ SexOrientation : chr [1:500] "Heterosexual" "Heterosexual" "Heterosexual" "Heterosexual" ...
## $ PregnantNow : chr [1:500] NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. ID = col_double(),
## .. SurveyYr = col_character(),
## .. Gender = col_character(),
## .. Age = col_double(),
## .. AgeDecade = col_character(),
## .. AgeMonths = col_double(),
## .. Race1 = col_character(),
## .. Race3 = col_character(),
## .. Education = col_character(),
## .. MaritalStatus = col_character(),
## .. HHIncome = col_character(),
## .. HHIncomeMid = col_double(),
## .. Poverty = col_double(),
## .. HomeRooms = col_double(),
## .. HomeOwn = col_character(),
## .. Work = col_character(),
## .. Weight = col_double(),
## .. Length = col_logical(),
## .. HeadCirc = col_logical(),
## .. Height = col_double(),
## .. BMI = col_double(),
## .. BMICatUnder20yrs = col_logical(),
## .. BMI_WHO = col_character(),
## .. Pulse = col_double(),
## .. BPSysAve = col_double(),
## .. BPDiaAve = col_double(),
## .. BPSys1 = col_double(),
## .. BPDia1 = col_double(),
## .. BPSys2 = col_double(),
## .. BPDia2 = col_double(),
## .. BPSys3 = col_double(),
## .. BPDia3 = col_double(),
## .. Testosterone = col_double(),
## .. DirectChol = col_double(),
## .. TotChol = col_double(),
## .. UrineVol1 = col_double(),
## .. UrineFlow1 = col_double(),
## .. UrineVol2 = col_double(),
## .. UrineFlow2 = col_double(),
## .. Diabetes = col_character(),
## .. DiabetesAge = col_double(),
## .. HealthGen = col_character(),
## .. DaysPhysHlthBad = col_double(),
## .. DaysMentHlthBad = col_double(),
## .. LittleInterest = col_character(),
## .. Depressed = col_character(),
## .. nPregnancies = col_double(),
## .. nBabies = col_double(),
## .. Age1stBaby = col_double(),
## .. SleepHrsNight = col_double(),
## .. SleepTrouble = col_character(),
## .. PhysActive = col_character(),
## .. PhysActiveDays = col_double(),
## .. TVHrsDay = col_character(),
## .. CompHrsDay = col_character(),
## .. TVHrsDayChild = col_logical(),
## .. CompHrsDayChild = col_logical(),
## .. Alcohol12PlusYr = col_character(),
## .. AlcoholDay = col_double(),
## .. AlcoholYear = col_double(),
## .. SmokeNow = col_character(),
## .. Smoke100 = col_character(),
## .. Smoke100n = col_character(),
## .. SmokeAge = col_double(),
## .. Marijuana = col_character(),
## .. AgeFirstMarij = col_double(),
## .. RegularMarij = col_character(),
## .. AgeRegMarij = col_double(),
## .. HardDrugs = col_character(),
## .. SexEver = col_character(),
## .. SexAge = col_double(),
## .. SexNumPartnLife = col_double(),
## .. SexNumPartYear = col_double(),
## .. SameSex = col_character(),
## .. SexOrientation = col_character(),
## .. PregnantNow = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
colSums(is.na(nhanes_df))
## ID SurveyYr Gender Age
## 0 0 0 0
## AgeDecade AgeMonths Race1 Race3
## 25 263 0 251
## Education MaritalStatus HHIncome HHIncomeMid
## 1 0 42 42
## Poverty HomeRooms HomeOwn Work
## 38 7 7 0
## Weight Length HeadCirc Height
## 3 500 500 2
## BMI BMICatUnder20yrs BMI_WHO Pulse
## 3 500 3 19
## BPSysAve BPDiaAve BPSys1 BPDia1
## 21 21 42 42
## BPSys2 BPDia2 BPSys3 BPDia3
## 32 32 29 29
## Testosterone DirectChol TotChol UrineVol1
## 270 27 27 11
## UrineFlow1 UrineVol2 UrineFlow2 Diabetes
## 40 425 426 0
## DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad
## 451 50 50 50
## LittleInterest Depressed nPregnancies nBabies
## 53 52 309 321
## Age1stBaby SleepHrsNight SleepTrouble PhysActive
## 365 2 0 0
## PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild
## 256 251 251 500
## CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear
## 500 50 159 90
## SmokeNow Smoke100 Smoke100n SmokeAge
## 270 0 0 279
## Marijuana AgeFirstMarij RegularMarij AgeRegMarij
## 191 319 191 420
## HardDrugs SexEver SexAge SexNumPartnLife
## 135 133 142 137
## SexNumPartYear SameSex SexOrientation PregnantNow
## 196 132 201 388
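# Clean and create dataset
library(tidyverse)  # attaches dplyr (filter, select, mutate, ...) and ggplot2
cleaned_nhanes_df <- nhanes_df %>%
  # Keep women aged 21-64
  filter(Gender == "female", between(Age, 21, 64)) %>%
  # Keep only the variables needed for the research question
  select(Age1stBaby, DaysPhysHlthBad, DaysMentHlthBad, Age) %>%
  # Drop incomplete rows; women with no recorded first birth drop out here
  filter(if_all(everything(), ~ !is.na(.))) %>%
  rename(
    fch_age = Age1stBaby,
    days_bph = DaysPhysHlthBad,
    days_bmh = DaysMentHlthBad,
    age_at_survey = Age
  ) %>%
  mutate(
    # Composite score from 0 (best) to 60 (worst); no NAs remain after the filter above
    bad_days = days_bph + days_bmh,
    # Bin age at first childbirth into three life stages
    fch_age_group = case_when(
      fch_age < 25 ~ "early (<25)",
      fch_age >= 25 & fch_age <= 34 ~ "mid (25-34)",
      fch_age >= 35 ~ "late (35+)",
      TRUE ~ NA_character_
    ),
    # Fix the display order of the groups in tables and plots
    fch_age_group = factor(fch_age_group,
                           levels = c("early (<25)", "mid (25-34)", "late (35+)"))
  )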
nrow(cleaned_nhanes_df)
## [1] 92
cleaned_nhanes_df %>% count(fch_age_group)
## # A tibble: 3 × 2
## fch_age_group n
## <fct> <int>
## 1 early (<25) 60
## 2 mid (25-34) 31
## 3 late (35+) 1
# Summary table
library(knitr)  # provides kable()
summary_table <- cleaned_nhanes_df %>%
  group_by(fch_age_group) %>%
  summarise(
    n = n(),
    mean_fch_age = round(mean(fch_age, na.rm = TRUE), 1),
    mean_bad_days = round(mean(bad_days, na.rm = TRUE), 1),
    .groups = "drop"
  )
kable(summary_table, caption = "Average Age at First Child and Total Bad Days by Group (Female Respondents)")
| fch_age_group | n | mean_fch_age | mean_bad_days |
|---|---|---|---|
| early (<25) | 60 | 19.6 | 10.3 |
| mid (25-34) | 31 | 28.1 | 3.2 |
| late (35+) | 1 | 36.0 | 0.0 |
# Boxplot visualization
ggplot(cleaned_nhanes_df, aes(x = fch_age_group, y = bad_days)) +
  geom_boxplot(fill = "lightgray") +
  # Blue point marks each group's mean
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "blue") +
  labs(
    title = "Distribution of Total Bad Days by Age at First Childbirth",
    x = "Age at First Childbirth Group",
    y = "Total Bad Days (Physical + Mental)"
  ) +
  theme_minimal()
This analysis examined the relationship between age at first childbirth and self-reported health among female respondents aged 21–64 in the nhanes.samp.adult.500 dataset. After cleaning and filtering, the analytic sample included 92 women: 60 in the early (<25) group, 31 in the mid (25–34) group, and 1 in the late (35+) group. The early group reported a higher mean number of bad days (10.3) than the mid group (3.2), suggesting that earlier childbirth may be associated with poorer overall physical and mental health. The boxplot reinforced this pattern, showing greater variability and more bad days among early mothers. The unequal group sizes (the early group is about twice as large as the mid group) do not bias the averages, because each group's mean is calculated only from that group's own observations. Group size does affect precision, however: the larger early group yields a more stable estimate, while the smaller mid group's average is less precise. The late group's single respondent supports no meaningful inference, since one observation cannot be generalized.
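The precision point can be made concrete with the standard error of a mean, s / √n, which shrinks as group size grows: for equal spread, a group of 60 has a standard error about √(31/60) ≈ 0.72 times that of a group of 31. The helper below, group_precision() (a hypothetical name, not part of the original analysis), reports n, mean, and standard error of bad_days for each fch_age_group using base R only. The toy data frame is invented purely to exercise the function and is not NHANES data.

```r
# Hypothetical helper: per-group n, mean, and standard error of bad_days.
# Expects columns named fch_age_group and bad_days, as in cleaned_nhanes_df.
group_precision <- function(df) {
  # split() on a factor with drop = TRUE skips empty levels and keeps level order
  stats <- lapply(split(df$bad_days, df$fch_age_group, drop = TRUE), function(x) {
    n <- length(x)
    # sd() of a single value is NA, so a one-person group gets an NA standard error
    c(n = n, mean = mean(x), se = sd(x) / sqrt(n))
  })
  out <- do.call(rbind, stats)  # one row per group, named by group
  data.frame(group = rownames(out), out, row.names = NULL)
}

# Toy data for illustration only (NOT the NHANES sample)
toy <- data.frame(
  fch_age_group = factor(c("early", "early", "early", "early", "mid", "mid", "late"),
                         levels = c("early", "mid", "late")),
  bad_days = c(0, 10, 20, 30, 0, 4, 0)
)
print(group_precision(toy))
```

Applied to the report's data, group_precision(cleaned_nhanes_df) would give a standard error alongside each mean in the summary table, and the late group's NA standard error would make explicit why a single respondent cannot support an estimate. Base R keeps the helper free of extra dependencies.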
Future research should use a larger sample, particularly of women who had their first child at older ages, to allow balanced group comparisons. Longitudinal data would also make it possible to fit fixed-effects models that compare individuals with themselves over time, accounting for stable individual characteristics, such as demographic and socioeconomic background, that may shape wellbeing. Combining this with a cross-sectional approach like the one used here would support broader generalization and help clarify whether the relationship between age at first childbirth and wellbeing holds across a larger and more diverse sample of women. Together, these methods would move closer to isolating the causal effect of childbirth timing on wellbeing in the U.S. population. Overall, the results suggest a potential link between earlier age at first childbirth and lower self-reported wellbeing, but further study with larger and more representative samples is needed to confirm and refine these findings.
Roger D. Peng, Exploratory Data Analysis with R (Leanpub, 2020).
Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto, and David Lusseau, An Introduction to R (April 9, 2024), https://intro2r.com/.
Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly Media, 2017), https://r4ds.had.co.nz/.
OpenIntro, nhanes.samp.adult.500: A random sample of 500 participants age 21 or older from the full NHANES dataset, https://www.openintro.org/data/index.php?data=nhanes.samp.adult.500.
Thomas E. Love, 431 Course Notes: NHANES Data, https://thomaselove.github.io/431-notes/04-nhanes.html.
OpenIntroStat, openintro R package: nhanes.samp.adult.500 documentation, https://rdrr.io/github/OpenIntroStat/openintro/man/nhanes.samp.adult.500.html.
I used Google AI to identify and understand research techniques for future directions, such as fixed-effects models. I used ChatGPT 5 to review and revise the written portions of my report for grammar and flow.
I used Google AI to format reference citations appropriately in RStudio.