Introduction

Is there a correlation between age at first childbirth and self-reported mental and physical wellbeing in adult women? This analysis explores whether the timing of first childbirth, early or late, is related to overall health outcomes and quality of life among adult women. To determine whether this life-course timing has a measurable relationship with wellbeing indicators, I use a subset of “A Sample of 500 Adults from NHANES” (nhanes.samp.adult.500), provided by the OpenIntro Data Repository. This sample of 500 adults aged 21 to 80 is drawn from the National Health and Nutrition Examination Survey (NHANES). NHANES is a nationally representative study conducted by the Centers for Disease Control and Prevention (CDC) that collects demographic, health, and lifestyle information from adults and children in the United States.

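For this project, the analytic subset contains only adult female respondents with children, meaning those who reported an age at first birth. Within that subset, the following columns are most relevant to the research question:

- Age1stBaby: age at first childbirth, in years
- DaysPhysHlthBad: number of days in the past 30 on which physical health was not good
- DaysMentHlthBad: number of days in the past 30 on which mental health was not good
- Age: age at the time of the survey, in years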

These columns allow for a focused exploration of the correlation between age at first childbirth and self-reported health outcomes in adult females, providing insight into how family timing might relate to overall wellbeing among adult women in the U.S.

Data Analysis

Data cleaning and analysis were carried out with the tidyverse packages (primarily dplyr) to prepare the dataset and ensure data accuracy. After the nhanes.samp.adult.500 dataset was imported, filter() kept only female respondents aged 21–64, which limits bias from age-related changes in health.

Next, select() kept only the variables relevant to the research question, and rows with a missing value in any of these variables were dropped. rename() then gave the variables clearer names:
- Age1stBaby to fch_age (age at first childbirth)
- DaysPhysHlthBad to days_bph (days of poor physical health)
- DaysMentHlthBad to days_bmh (days of poor mental health)
- Age to age_at_survey

Two new variables were then created with mutate() to simplify group comparisons and make the visualization easier to interpret. The first, bad_days, is the sum of days_bph and days_bmh. It serves as a composite indicator of overall quality of life, ranging from 0 (best) to 60 (worst). The second, fch_age_group, places fch_age into three groups to enable comparisons across life stages: early (<25 years), mid (25–34 years), and late (35+ years).

The analysis then uses summary statistics and a boxplot to examine the distribution of bad_days across fch_age_group. The goal is to assess whether age at first childbirth is associated with differences in overall health-related quality of life.
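The cut-points can be checked in isolation with a small base-R sketch. The helper name derive_groups() and the toy input values are hypothetical and are not part of the original analysis. The sketch uses cut() with right = FALSE as an alternative to the case_when() rule in the cleaning code. It assumes fch_age is recorded in whole years.

```r
# Hypothetical helper (not from the original report): base-R version of the
# two derived variables, so the group boundaries can be checked directly
derive_groups <- function(fch_age, days_bph, days_bmh) {
  data.frame(
    fch_age = fch_age,
    # Composite quality-of-life score: 0 (best) to 60 (worst)
    bad_days = days_bph + days_bmh,
    # Half-open intervals [-Inf, 25), [25, 35), [35, Inf) reproduce
    # early (<25), mid (25-34), and late (35+) for whole-year ages
    fch_age_group = cut(fch_age,
                        breaks = c(-Inf, 25, 35, Inf),
                        right = FALSE,
                        labels = c("early (<25)", "mid (25-34)", "late (35+)"))
  )
}

# Toy boundary values, made up for illustration (not NHANES records)
check <- derive_groups(fch_age  = c(24, 25, 34, 35),
                       days_bph = c(0, 30, 5, 0),
                       days_bmh = c(0, 30, 10, 0))
print(check)
```

cut() uses half-open intervals, so every finite age falls into exactly one group. The case_when() rule, by contrast, would leave a fractional age such as 34.5 unassigned (NA).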

# Perform exploratory data analysis

head(nhanes_df)
## # A tibble: 6 × 76
##      ID SurveyYr Gender   Age AgeDecade AgeMonths Race1 Race3 Education     
##   <dbl> <chr>    <chr>  <dbl> <chr>         <dbl> <chr> <chr> <chr>         
## 1 63106 2011_12  male      50 50-59            NA White White 9 - 11th Grade
## 2 67820 2011_12  female    47 40-49            NA Black Black College Grad  
## 3 57178 2009_10  male      46 40-49           561 White <NA>  Some College  
## 4 68693 2011_12  male      28 20-29            NA White White Some College  
## 5 69465 2011_12  female    50 50-59            NA White White College Grad  
## 6 61505 2009_10  male      39 30-39           471 Black <NA>  Some College  
## # ℹ 67 more variables: MaritalStatus <chr>, HHIncome <chr>, HHIncomeMid <dbl>,
## #   Poverty <dbl>, HomeRooms <dbl>, HomeOwn <chr>, Work <chr>, Weight <dbl>,
## #   Length <lgl>, HeadCirc <lgl>, Height <dbl>, BMI <dbl>,
## #   BMICatUnder20yrs <lgl>, BMI_WHO <chr>, Pulse <dbl>, BPSysAve <dbl>,
## #   BPDiaAve <dbl>, BPSys1 <dbl>, BPDia1 <dbl>, BPSys2 <dbl>, BPDia2 <dbl>,
## #   BPSys3 <dbl>, BPDia3 <dbl>, Testosterone <dbl>, DirectChol <dbl>,
## #   TotChol <dbl>, UrineVol1 <dbl>, UrineFlow1 <dbl>, UrineVol2 <dbl>, …
summary(nhanes_df)
##        ID          SurveyYr            Gender               Age       
##  Min.   :51780   Length:500         Length:500         Min.   :21.00  
##  1st Qu.:57239   Class :character   Class :character   1st Qu.:37.00  
##  Median :62137   Mode  :character   Mode  :character   Median :50.00  
##  Mean   :61885                                         Mean   :49.92  
##  3rd Qu.:66968                                         3rd Qu.:63.00  
##  Max.   :71868                                         Max.   :80.00  
##                                                                       
##   AgeDecade           AgeMonths        Race1              Race3          
##  Length:500         Min.   :252.0   Length:500         Length:500        
##  Class :character   1st Qu.:449.0   Class :character   Class :character  
##  Mode  :character   Median :585.0   Mode  :character   Mode  :character  
##                     Mean   :594.9                                        
##                     3rd Qu.:756.0                                        
##                     Max.   :959.0                                        
##                     NA's   :263                                          
##   Education         MaritalStatus        HHIncome          HHIncomeMid    
##  Length:500         Length:500         Length:500         Min.   :  2500  
##  Class :character   Class :character   Class :character   1st Qu.: 30000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 50000  
##                                                           Mean   : 57849  
##                                                           3rd Qu.: 87500  
##                                                           Max.   :100000  
##                                                           NA's   :42      
##     Poverty        HomeRooms        HomeOwn              Work          
##  Min.   :0.000   Min.   : 1.000   Length:500         Length:500        
##  1st Qu.:1.520   1st Qu.: 5.000   Class :character   Class :character  
##  Median :3.070   Median : 6.000   Mode  :character   Mode  :character  
##  Mean   :3.031   Mean   : 6.083                                        
##  3rd Qu.:4.987   3rd Qu.: 7.000                                        
##  Max.   :5.000   Max.   :13.000                                        
##  NA's   :38      NA's   :7                                             
##      Weight        Length        HeadCirc           Height           BMI       
##  Min.   : 40.80   Mode:logical   Mode:logical   Min.   :143.9   Min.   :17.80  
##  1st Qu.: 68.50   NA's:500       NA's:500       1st Qu.:161.1   1st Qu.:24.67  
##  Median : 82.20                                 Median :168.9   Median :28.20  
##  Mean   : 83.83                                 Mean   :168.8   Mean   :29.39  
##  3rd Qu.: 97.00                                 3rd Qu.:175.7   3rd Qu.:32.93  
##  Max.   :223.00                                 Max.   :195.8   Max.   :68.63  
##  NA's   :3                                      NA's   :2       NA's   :3      
##  BMICatUnder20yrs   BMI_WHO              Pulse           BPSysAve    
##  Mode:logical     Length:500         Min.   : 42.00   Min.   : 85.0  
##  NA's:500         Class :character   1st Qu.: 64.00   1st Qu.:111.0  
##                   Mode  :character   Median : 72.00   Median :120.0  
##                                      Mean   : 71.74   Mean   :121.8  
##                                      3rd Qu.: 78.00   3rd Qu.:130.5  
##                                      Max.   :112.00   Max.   :183.0  
##                                      NA's   :19       NA's   :21     
##     BPDiaAve          BPSys1          BPDia1           BPSys2     
##  Min.   : 24.00   Min.   : 86.0   Min.   : 32.00   Min.   : 86.0  
##  1st Qu.: 65.00   1st Qu.:112.0   1st Qu.: 64.00   1st Qu.:110.0  
##  Median : 71.00   Median :122.0   Median : 72.00   Median :120.0  
##  Mean   : 70.43   Mean   :123.1   Mean   : 71.11   Mean   :122.2  
##  3rd Qu.: 78.00   3rd Qu.:132.0   3rd Qu.: 78.00   3rd Qu.:132.0  
##  Max.   :110.00   Max.   :184.0   Max.   :110.00   Max.   :184.0  
##  NA's   :21       NA's   :42      NA's   :42       NA's   :32     
##      BPDia2           BPSys3          BPDia3       Testosterone    
##  Min.   :  0.00   Min.   : 84.0   Min.   :  0.0   Min.   :   3.22  
##  1st Qu.: 64.00   1st Qu.:110.0   1st Qu.: 64.0   1st Qu.:  19.50  
##  Median : 70.00   Median :120.0   Median : 70.0   Median :  73.44  
##  Mean   : 70.47   Mean   :121.4   Mean   : 70.3   Mean   : 219.08  
##  3rd Qu.: 78.00   3rd Qu.:130.0   3rd Qu.: 78.0   3rd Qu.: 384.63  
##  Max.   :108.00   Max.   :182.0   Max.   :110.0   Max.   :1113.58  
##  NA's   :32       NA's   :29      NA's   :29      NA's   :270      
##    DirectChol       TotChol       UrineVol1       UrineFlow1    
##  Min.   :0.520   Min.   :2.40   Min.   :  0.0   Min.   :0.0550  
##  1st Qu.:1.090   1st Qu.:4.37   1st Qu.: 51.0   1st Qu.:0.4130  
##  Median :1.320   Median :5.02   Median : 96.0   Median :0.7095  
##  Mean   :1.386   Mean   :5.14   Mean   :122.5   Mean   :1.0104  
##  3rd Qu.:1.600   3rd Qu.:5.77   3rd Qu.:166.0   3rd Qu.:1.2270  
##  Max.   :2.690   Max.   :9.31   Max.   :446.0   Max.   :7.2110  
##  NA's   :27      NA's   :27     NA's   :11      NA's   :40      
##    UrineVol2       UrineFlow2       Diabetes          DiabetesAge   
##  Min.   : 11.0   Min.   :0.0700   Length:500         Min.   :24.00  
##  1st Qu.: 52.0   1st Qu.:0.5018   Class :character   1st Qu.:42.00  
##  Median :102.0   Median :0.8005   Mode  :character   Median :55.00  
##  Mean   :129.3   Mean   :1.1340                      Mean   :52.88  
##  3rd Qu.:183.0   3rd Qu.:1.2185                      3rd Qu.:60.00  
##  Max.   :400.0   Max.   :5.4740                      Max.   :80.00  
##  NA's   :425     NA's   :426                         NA's   :451    
##   HealthGen         DaysPhysHlthBad DaysMentHlthBad  LittleInterest    
##  Length:500         Min.   : 0.00   Min.   : 0.000   Length:500        
##  Class :character   1st Qu.: 0.00   1st Qu.: 0.000   Class :character  
##  Mode  :character   Median : 0.00   Median : 0.000   Mode  :character  
##                     Mean   : 3.18   Mean   : 4.229                     
##                     3rd Qu.: 2.00   3rd Qu.: 4.000                     
##                     Max.   :30.00   Max.   :30.000                     
##                     NA's   :50      NA's   :50                         
##   Depressed          nPregnancies      nBabies        Age1stBaby   
##  Length:500         Min.   :1.000   Min.   :0.000   Min.   :14.00  
##  Class :character   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:19.00  
##  Mode  :character   Median :3.000   Median :2.000   Median :21.00  
##                     Mean   :2.911   Mean   :2.425   Mean   :22.21  
##                     3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:25.50  
##                     Max.   :9.000   Max.   :7.000   Max.   :36.00  
##                     NA's   :309     NA's   :321     NA's   :365    
##  SleepHrsNight    SleepTrouble        PhysActive        PhysActiveDays 
##  Min.   : 3.000   Length:500         Length:500         Min.   :1.000  
##  1st Qu.: 6.000   Class :character   Class :character   1st Qu.:3.000  
##  Median : 7.000   Mode  :character   Mode  :character   Median :4.000  
##  Mean   : 6.819                                         Mean   :4.004  
##  3rd Qu.: 8.000                                         3rd Qu.:5.000  
##  Max.   :12.000                                         Max.   :7.000  
##  NA's   :2                                              NA's   :256    
##    TVHrsDay          CompHrsDay        TVHrsDayChild  CompHrsDayChild
##  Length:500         Length:500         Mode:logical   Mode:logical   
##  Class :character   Class :character   NA's:500       NA's:500       
##  Mode  :character   Mode  :character                                 
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##  Alcohol12PlusYr      AlcoholDay      AlcoholYear       SmokeNow        
##  Length:500         Min.   : 1.000   Min.   :  0.00   Length:500        
##  Class :character   1st Qu.: 1.000   1st Qu.:  3.00   Class :character  
##  Mode  :character   Median : 2.000   Median : 24.00   Mode  :character  
##                     Mean   : 2.821   Mean   : 77.36                     
##                     3rd Qu.: 3.000   3rd Qu.:104.00                     
##                     Max.   :36.000   Max.   :364.00                     
##                     NA's   :159      NA's   :90                         
##    Smoke100          Smoke100n            SmokeAge      Marijuana        
##  Length:500         Length:500         Min.   : 8.00   Length:500        
##  Class :character   Class :character   1st Qu.:15.00   Class :character  
##  Mode  :character   Mode  :character   Median :17.00   Mode  :character  
##                                        Mean   :18.01                     
##                                        3rd Qu.:19.00                     
##                                        Max.   :52.00                     
##                                        NA's   :279                       
##  AgeFirstMarij   RegularMarij        AgeRegMarij     HardDrugs        
##  Min.   : 6.00   Length:500         Min.   :10.00   Length:500        
##  1st Qu.:15.00   Class :character   1st Qu.:16.00   Class :character  
##  Median :17.00   Mode  :character   Median :18.00   Mode  :character  
##  Mean   :17.01                      Mean   :17.95                     
##  3rd Qu.:19.00                      3rd Qu.:20.00                     
##  Max.   :43.00                      Max.   :35.00                     
##  NA's   :319                        NA's   :420                       
##    SexEver              SexAge      SexNumPartnLife  SexNumPartYear  
##  Length:500         Min.   : 9.00   Min.   :  0.00   Min.   : 0.000  
##  Class :character   1st Qu.:15.25   1st Qu.:  3.00   1st Qu.: 1.000  
##  Mode  :character   Median :17.00   Median :  6.00   Median : 1.000  
##                     Mean   :17.80   Mean   : 21.87   Mean   : 1.158  
##                     3rd Qu.:19.00   3rd Qu.: 14.00   3rd Qu.: 1.000  
##                     Max.   :50.00   Max.   :999.00   Max.   :17.000  
##                     NA's   :142     NA's   :137      NA's   :196     
##    SameSex          SexOrientation     PregnantNow       
##  Length:500         Length:500         Length:500        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
## 
str(nhanes_df)
## spc_tbl_ [500 × 76] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ID              : num [1:500] 63106 67820 57178 68693 69465 ...
##  $ SurveyYr        : chr [1:500] "2011_12" "2011_12" "2009_10" "2011_12" ...
##  $ Gender          : chr [1:500] "male" "female" "male" "male" ...
##  $ Age             : num [1:500] 50 47 46 28 50 39 74 31 21 80 ...
##  $ AgeDecade       : chr [1:500] "50-59" "40-49" "40-49" "20-29" ...
##  $ AgeMonths       : num [1:500] NA NA 561 NA NA 471 NA NA 260 NA ...
##  $ Race1           : chr [1:500] "White" "Black" "White" "White" ...
##  $ Race3           : chr [1:500] "White" "Black" NA "White" ...
##  $ Education       : chr [1:500] "9 - 11th Grade" "College Grad" "Some College" "Some College" ...
##  $ MaritalStatus   : chr [1:500] "Divorced" "Separated" "Married" "NeverMarried" ...
##  $ HHIncome        : chr [1:500] "10000-14999" "35000-44999" "more 99999" "more 99999" ...
##  $ HHIncomeMid     : num [1:500] 12500 40000 100000 100000 40000 NA 50000 100000 50000 40000 ...
##  $ Poverty         : num [1:500] 0.95 1.74 4.99 4.14 2.16 NA 3.3 5 0.92 3.51 ...
##  $ HomeRooms       : num [1:500] 7 6 6 8 10 4 4 4 4 6 ...
##  $ HomeOwn         : chr [1:500] "Own" "Rent" "Own" "Own" ...
##  $ Work            : chr [1:500] "NotWorking" "Working" "Working" "Working" ...
##  $ Weight          : num [1:500] 82.8 79.9 73.7 80.9 70.5 ...
##  $ Length          : logi [1:500] NA NA NA NA NA NA ...
##  $ HeadCirc        : logi [1:500] NA NA NA NA NA NA ...
##  $ Height          : num [1:500] 172 165 170 177 162 ...
##  $ BMI             : num [1:500] 27.9 29.4 25.4 25.8 26.9 ...
##  $ BMICatUnder20yrs: logi [1:500] NA NA NA NA NA NA ...
##  $ BMI_WHO         : chr [1:500] "25.0_to_29.9" "25.0_to_29.9" "25.0_to_29.9" "25.0_to_29.9" ...
##  $ Pulse           : num [1:500] 58 70 74 58 76 58 68 60 84 82 ...
##  $ BPSysAve        : num [1:500] 125 121 120 132 152 148 122 114 108 168 ...
##  $ BPDiaAve        : num [1:500] 86 68 74 74 103 88 65 76 46 52 ...
##  $ BPSys1          : num [1:500] 122 124 120 134 144 150 120 114 108 170 ...
##  $ BPDia1          : num [1:500] 88 66 70 72 104 94 68 82 44 76 ...
##  $ BPSys2          : num [1:500] 124 120 118 130 150 148 118 114 108 168 ...
##  $ BPDia2          : num [1:500] 86 66 74 72 106 92 62 76 46 52 ...
##  $ BPSys3          : num [1:500] 126 122 122 134 154 148 126 114 108 168 ...
##  $ BPDia3          : num [1:500] 86 70 74 76 100 84 68 76 46 52 ...
##  $ Testosterone    : num [1:500] 525.37 5.98 NA 653.19 8.17 ...
##  $ DirectChol      : num [1:500] 1.29 1.22 1.4 1.84 2.43 NA 1.14 1.29 1.34 1.6 ...
##  $ TotChol         : num [1:500] 5.07 3.7 6.03 4.55 5.92 NA 3.21 5.02 6.15 5.02 ...
##  $ UrineVol1       : num [1:500] 244 65 105 51 30 NA 39 120 81 47 ...
##  $ UrineFlow1      : num [1:500] 1.683 0.442 0.682 0.464 1.304 ...
##  $ UrineVol2       : num [1:500] NA NA NA NA 114 NA NA NA NA NA ...
##  $ UrineFlow2      : num [1:500] NA NA NA NA 1.12 ...
##  $ Diabetes        : chr [1:500] "No" "No" "No" "No" ...
##  $ DiabetesAge     : num [1:500] NA NA NA NA NA NA NA NA NA NA ...
##  $ HealthGen       : chr [1:500] "Fair" "Vgood" "Vgood" "Excellent" ...
##  $ DaysPhysHlthBad : num [1:500] 5 2 0 0 0 NA 30 0 0 0 ...
##  $ DaysMentHlthBad : num [1:500] 30 5 0 0 0 NA 0 10 30 0 ...
##  $ LittleInterest  : chr [1:500] "Most" "None" "None" "None" ...
##  $ Depressed       : chr [1:500] "Several" "Several" "None" "None" ...
##  $ nPregnancies    : num [1:500] NA 2 NA NA 3 NA NA NA NA 1 ...
##  $ nBabies         : num [1:500] NA 2 NA NA 3 NA NA NA NA 1 ...
##  $ Age1stBaby      : num [1:500] NA 21 NA NA 27 NA NA NA NA NA ...
##  $ SleepHrsNight   : num [1:500] 4 6 5 7 6 6 8 7 5 7 ...
##  $ SleepTrouble    : chr [1:500] "Yes" "No" "No" "No" ...
##  $ PhysActive      : chr [1:500] "No" "Yes" "Yes" "Yes" ...
##  $ PhysActiveDays  : num [1:500] 3 NA 3 NA NA 4 3 NA NA NA ...
##  $ TVHrsDay        : chr [1:500] "2_hr" "3_hr" NA "0_to_1_hr" ...
##  $ CompHrsDay      : chr [1:500] "0_hrs" "2_hr" NA "4_hr" ...
##  $ TVHrsDayChild   : logi [1:500] NA NA NA NA NA NA ...
##  $ CompHrsDayChild : logi [1:500] NA NA NA NA NA NA ...
##  $ Alcohol12PlusYr : chr [1:500] "Yes" "Yes" "Yes" "No" ...
##  $ AlcoholDay      : num [1:500] 1 2 NA NA 1 NA 1 2 2 1 ...
##  $ AlcoholYear     : num [1:500] 24 3 0 0 364 NA 12 104 2 1 ...
##  $ SmokeNow        : chr [1:500] "No" NA NA NA ...
##  $ Smoke100        : chr [1:500] "Yes" "No" "No" "No" ...
##  $ Smoke100n       : chr [1:500] "Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker" ...
##  $ SmokeAge        : num [1:500] 18 NA NA NA NA NA NA NA NA 21 ...
##  $ Marijuana       : chr [1:500] "No" "Yes" "Yes" "No" ...
##  $ AgeFirstMarij   : num [1:500] NA 19 14 NA NA NA NA 20 19 NA ...
##  $ RegularMarij    : chr [1:500] "No" "Yes" "Yes" "No" ...
##  $ AgeRegMarij     : num [1:500] NA 20 16 NA NA NA NA NA NA NA ...
##  $ HardDrugs       : chr [1:500] "No" "No" "Yes" "No" ...
##  $ SexEver         : chr [1:500] "Yes" "Yes" "Yes" "No" ...
##  $ SexAge          : num [1:500] 16 17 14 NA 17 NA NA 19 13 NA ...
##  $ SexNumPartnLife : num [1:500] 26 10 50 0 4 NA NA 3 15 NA ...
##  $ SexNumPartYear  : num [1:500] 2 2 1 0 1 NA NA 1 2 NA ...
##  $ SameSex         : chr [1:500] "No" "No" "No" "No" ...
##  $ SexOrientation  : chr [1:500] "Heterosexual" "Heterosexual" "Heterosexual" "Heterosexual" ...
##  $ PregnantNow     : chr [1:500] NA NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ID = col_double(),
##   ..   SurveyYr = col_character(),
##   ..   Gender = col_character(),
##   ..   Age = col_double(),
##   ..   AgeDecade = col_character(),
##   ..   AgeMonths = col_double(),
##   ..   Race1 = col_character(),
##   ..   Race3 = col_character(),
##   ..   Education = col_character(),
##   ..   MaritalStatus = col_character(),
##   ..   HHIncome = col_character(),
##   ..   HHIncomeMid = col_double(),
##   ..   Poverty = col_double(),
##   ..   HomeRooms = col_double(),
##   ..   HomeOwn = col_character(),
##   ..   Work = col_character(),
##   ..   Weight = col_double(),
##   ..   Length = col_logical(),
##   ..   HeadCirc = col_logical(),
##   ..   Height = col_double(),
##   ..   BMI = col_double(),
##   ..   BMICatUnder20yrs = col_logical(),
##   ..   BMI_WHO = col_character(),
##   ..   Pulse = col_double(),
##   ..   BPSysAve = col_double(),
##   ..   BPDiaAve = col_double(),
##   ..   BPSys1 = col_double(),
##   ..   BPDia1 = col_double(),
##   ..   BPSys2 = col_double(),
##   ..   BPDia2 = col_double(),
##   ..   BPSys3 = col_double(),
##   ..   BPDia3 = col_double(),
##   ..   Testosterone = col_double(),
##   ..   DirectChol = col_double(),
##   ..   TotChol = col_double(),
##   ..   UrineVol1 = col_double(),
##   ..   UrineFlow1 = col_double(),
##   ..   UrineVol2 = col_double(),
##   ..   UrineFlow2 = col_double(),
##   ..   Diabetes = col_character(),
##   ..   DiabetesAge = col_double(),
##   ..   HealthGen = col_character(),
##   ..   DaysPhysHlthBad = col_double(),
##   ..   DaysMentHlthBad = col_double(),
##   ..   LittleInterest = col_character(),
##   ..   Depressed = col_character(),
##   ..   nPregnancies = col_double(),
##   ..   nBabies = col_double(),
##   ..   Age1stBaby = col_double(),
##   ..   SleepHrsNight = col_double(),
##   ..   SleepTrouble = col_character(),
##   ..   PhysActive = col_character(),
##   ..   PhysActiveDays = col_double(),
##   ..   TVHrsDay = col_character(),
##   ..   CompHrsDay = col_character(),
##   ..   TVHrsDayChild = col_logical(),
##   ..   CompHrsDayChild = col_logical(),
##   ..   Alcohol12PlusYr = col_character(),
##   ..   AlcoholDay = col_double(),
##   ..   AlcoholYear = col_double(),
##   ..   SmokeNow = col_character(),
##   ..   Smoke100 = col_character(),
##   ..   Smoke100n = col_character(),
##   ..   SmokeAge = col_double(),
##   ..   Marijuana = col_character(),
##   ..   AgeFirstMarij = col_double(),
##   ..   RegularMarij = col_character(),
##   ..   AgeRegMarij = col_double(),
##   ..   HardDrugs = col_character(),
##   ..   SexEver = col_character(),
##   ..   SexAge = col_double(),
##   ..   SexNumPartnLife = col_double(),
##   ..   SexNumPartYear = col_double(),
##   ..   SameSex = col_character(),
##   ..   SexOrientation = col_character(),
##   ..   PregnantNow = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
colSums(is.na(nhanes_df))
##               ID         SurveyYr           Gender              Age 
##                0                0                0                0 
##        AgeDecade        AgeMonths            Race1            Race3 
##               25              263                0              251 
##        Education    MaritalStatus         HHIncome      HHIncomeMid 
##                1                0               42               42 
##          Poverty        HomeRooms          HomeOwn             Work 
##               38                7                7                0 
##           Weight           Length         HeadCirc           Height 
##                3              500              500                2 
##              BMI BMICatUnder20yrs          BMI_WHO            Pulse 
##                3              500                3               19 
##         BPSysAve         BPDiaAve           BPSys1           BPDia1 
##               21               21               42               42 
##           BPSys2           BPDia2           BPSys3           BPDia3 
##               32               32               29               29 
##     Testosterone       DirectChol          TotChol        UrineVol1 
##              270               27               27               11 
##       UrineFlow1        UrineVol2       UrineFlow2         Diabetes 
##               40              425              426                0 
##      DiabetesAge        HealthGen  DaysPhysHlthBad  DaysMentHlthBad 
##              451               50               50               50 
##   LittleInterest        Depressed     nPregnancies          nBabies 
##               53               52              309              321 
##       Age1stBaby    SleepHrsNight     SleepTrouble       PhysActive 
##              365                2                0                0 
##   PhysActiveDays         TVHrsDay       CompHrsDay    TVHrsDayChild 
##              256              251              251              500 
##  CompHrsDayChild  Alcohol12PlusYr       AlcoholDay      AlcoholYear 
##              500               50              159               90 
##         SmokeNow         Smoke100        Smoke100n         SmokeAge 
##              270                0                0              279 
##        Marijuana    AgeFirstMarij     RegularMarij      AgeRegMarij 
##              191              319              191              420 
##        HardDrugs          SexEver           SexAge  SexNumPartnLife 
##              135              133              142              137 
##   SexNumPartYear          SameSex   SexOrientation      PregnantNow 
##              196              132              201              388
# Clean and create dataset

library(tidyverse)  # dplyr verbs, tidyr::drop_na(), and the %>% pipe

cleaned_nhanes_df <- nhanes_df %>%
  # Women aged 21-64 only, to limit age-related health decline
  filter(Gender == "female", between(Age, 21, 64)) %>%
  select(Age1stBaby, DaysPhysHlthBad, DaysMentHlthBad, Age) %>%
  # Keep complete cases; a non-missing Age1stBaby also restricts the sample to mothers
  drop_na() %>%
  rename(
    fch_age = Age1stBaby,
    days_bph = DaysPhysHlthBad,
    days_bmh = DaysMentHlthBad,
    age_at_survey = Age
  ) %>%
  mutate(
    # Composite score from 0 (best) to 60 (worst); no NAs remain after drop_na()
    bad_days = days_bph + days_bmh,
    fch_age_group = case_when(
      fch_age < 25 ~ "early (<25)",
      fch_age >= 25 & fch_age <= 34 ~ "mid (25-34)",
      fch_age >= 35 ~ "late (35+)",
      TRUE ~ NA_character_
    ),
    # Order the groups chronologically for tables and plots
    fch_age_group = factor(fch_age_group,
                           levels = c("early (<25)", "mid (25-34)", "late (35+)"))
  )



nrow(cleaned_nhanes_df)
## [1] 92
cleaned_nhanes_df %>% count(fch_age_group)
## # A tibble: 3 × 2
##   fch_age_group     n
##   <fct>         <int>
## 1 early (<25)      60
## 2 mid (25-34)      31
## 3 late (35+)        1
# Summary Table

summary_table <- cleaned_nhanes_df %>%
  group_by(fch_age_group) %>%
  summarise(
    n = n(),
    mean_fch_age = round(mean(fch_age, na.rm = TRUE), 1),
    mean_bad_days = round(mean(bad_days, na.rm = TRUE), 1),
    .groups = "drop"
  )

knitr::kable(summary_table, caption = "Average Age at First Child and Total Bad Days by Group (Female Respondents)")
Average Age at First Child and Total Bad Days by Group (Female Respondents)

fch_age_group   n  mean_fch_age  mean_bad_days
early (<25)    60          19.6           10.3
mid (25-34)    31          28.1            3.2
late (35+)      1          36.0            0.0
# Boxplot visualization

ggplot(cleaned_nhanes_df, aes(x = fch_age_group, y = bad_days)) +
  geom_boxplot(fill = "lightgray") +
  # Overlay each group's mean (blue point); the box's centre line is the median
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "blue") +
  labs(
    title = "Distribution of Total Bad Days by Age at First Childbirth",
    x = "Age at First Childbirth Group",
    y = "Total Bad Days (Physical + Mental)"
  ) +
  theme_minimal()

Conclusion and Future Directions

This analysis examined the relationship between age at first childbirth and self-reported health among female respondents aged 21–64 in the nhanes.samp.adult.500 dataset. After cleaning and filtering, the analytic sample included 92 women: 60 in the early (<25) group, 31 in the mid (25–34) group, and 1 in the late (35+) group.

The early group reported a higher mean number of bad days (10.3) than the mid group (3.2). This suggests that earlier childbirth may be associated with poorer overall physical and mental health. Because no formal significance test was performed, however, the difference is descriptive rather than confirmatory. The boxplot visualization reinforced this pattern, showing greater variability and more bad days among early mothers.

The early group is roughly twice the size of the mid group, but this does not bias the averages, because each group mean is computed only from that group's own observations. Group size instead affects precision. The standard error of a mean shrinks in proportion to 1/√n, so the early group's mean (n = 60) is a more stable estimate than the mid group's (n = 31). The late group contains a single respondent, so its mean (0.0 bad days) cannot be generalized, and its variability cannot be estimated at all.
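The precision argument can be made concrete by computing the standard error (SE = SD / √n) for each group. The sketch below is a hypothetical base-R helper, group_precision(), not code from the original analysis. Its toy values are made up solely to show how the SE changes with group size; they are not NHANES results.

```r
# Hypothetical helper (not from the original report): per-group sample size,
# mean, standard deviation, and standard error of the mean
group_precision <- function(values, groups) {
  groups <- factor(groups)  # keeps the existing level order if already a factor
  out <- data.frame(
    group = levels(groups),
    n     = as.vector(tapply(values, groups, length)),
    mean  = as.vector(tapply(values, groups, mean)),
    sd    = as.vector(tapply(values, groups, sd))
  )
  # SE = SD / sqrt(n); sd() of a single value is NA, so SE is NA when n = 1
  out$se <- out$sd / sqrt(out$n)
  out
}

# Toy data: the two larger groups share the same 0/10 pattern but differ in size
toy_groups <- factor(c(rep("large", 60), rep("small", 30), "single"),
                     levels = c("large", "small", "single"))
toy_values <- c(rep(c(0, 10), 30), rep(c(0, 10), 15), 0)

res <- group_precision(toy_values, toy_groups)
print(res)
```

The same function could be applied to the cleaned data with group_precision(cleaned_nhanes_df$bad_days, cleaned_nhanes_df$fch_age_group). No output from that call is shown here.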

Future research should expand the sample size, particularly for women who had their first child at older ages, to allow balanced group comparisons. Longitudinal data would also make it possible to fit fixed-effects models. These models compare each woman with herself over time (for example, her wellbeing before and after her first birth), which controls for stable characteristics such as background demographic and socioeconomic factors.

Combining this with a cross-sectional approach like the one used here would support broader generalization. It would also help clarify whether the relationship between age at first childbirth and wellbeing holds across a larger and more diverse sample of women. Together, these methods would bring the analysis closer to estimating the causal effect of childbirth timing on wellbeing in the U.S. population.

Overall, the results suggest a potential link between earlier age at first childbirth and lower self-reported wellbeing. Further study with larger and more representative samples is needed to confirm and refine these findings.
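The following sketch illustrates what an individual fixed-effects model does. It simulates a small hypothetical panel (not NHANES data) in which a stable, unobserved trait raises both an exposure x and a wellbeing outcome y. A pooled regression of y on x is biased by that trait. Adding one intercept per person, or equivalently demeaning each variable within person, should recover the simulated slope of 2.

One caveat applies: the person-specific intercepts absorb anything that never changes within a woman. A time-invariant exposure such as age at first birth would therefore need a time-varying contrast, such as wellbeing before versus after the first birth, to be estimated this way.

```r
set.seed(42)

# Simulated panel (hypothetical, not NHANES): 50 people observed at 4 waves
n_people <- 50
n_waves  <- 4
id <- rep(seq_len(n_people), each = n_waves)

# Stable, unobserved trait that affects both the exposure and the outcome
trait <- rep(rnorm(n_people), each = n_waves)
x <- trait + rnorm(n_people * n_waves)              # time-varying exposure
y <- 2 * x + 3 * trait + rnorm(n_people * n_waves)  # true effect of x is 2

panel <- data.frame(id = factor(id), x = x, y = y)

# Pooled cross-sectional regression: confounded by the stable trait
pooled_fit <- lm(y ~ x, data = panel)

# Individual fixed effects: one intercept per person, so only
# within-person changes in x identify the slope
fe_fit <- lm(y ~ x + id, data = panel)

# Same slope via within-person demeaning (Frisch-Waugh-Lovell)
panel$x_dm <- panel$x - ave(panel$x, panel$id)
panel$y_dm <- panel$y - ave(panel$y, panel$id)
demeaned_fit <- lm(y_dm ~ x_dm - 1, data = panel)

round(c(pooled        = coef(pooled_fit)[["x"]],
        fixed_effects = coef(fe_fit)[["x"]],
        demeaned      = coef(demeaned_fit)[["x_dm"]]), 2)
```

By construction, the pooled slope should land well above 2, while the two fixed-effects estimates should be identical and close to 2.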

References

Textbooks: 1, 2, 3

NHANES Data Resources: 4, 5, 6

AI

I used AI in the following ways:
- to generate ideas or structure suggestions, to assist with understanding core concepts, or for other substantial foundational and preparatory activity for the assessment (see note 7)
- to generate some other aspect of the submitted assessment (see note 8)


  1. Roger D. Peng, Exploratory Data Analysis with R (Leanpub, 2020).

  2. Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto, and David Lusseau, An Introduction to R (April 9, 2024), https://intro2r.com/.

  3. Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly Media, 2017), https://r4ds.had.co.nz/.

  4. OpenIntro Stat, nhanes.samp.adult.500: A random sample of 500 participants age 21 or older from the full NHANES dataset, https://www.openintro.org/data/index.php?data=nhanes.samp.adult.500.

  5. Thomas E. Love, 431 Course Notes: NHANES Data, https://thomaselove.github.io/431-notes/04-nhanes.html.

  6. OpenIntroStat, openintro R package: nhanes.samp.adult.500 documentation, https://rdrr.io/github/OpenIntroStat/openintro/man/nhanes.samp.adult.500.html.

  7. I used Google AI to identify, and develop an understanding of, research techniques for future directions, such as fixed-effects models. I used ChatGPT 5 to review and revise the written portions of my report for grammar and flow.

  8. I used Google AI to format reference citations appropriately in RStudio.