1 - Data

This data set was collected by myself and my husband over the first year of my twin daughters’ lives. The data collection was primarily a survival need: we needed to communicate which baby had last eaten/pooped/etc when we were too sleep deprived to form complete sentences. The hospital sent us home with a paper tracking log, but that was too last millennium for us; we found an app called Baby Tracker that made data entry easy and synced seamlessly between multiple devices. We found the tracking so immediately useful and actionable that we kept doing it beyond the hectic newborn days. We decided to stop tracking on their first birthday, which coincided with the transition from formula to cow’s milk. We actually still use the app to track their weight gain and medications like Tylenol to avoid double-dosing.

Baby Tracker screenshot

Baby Tracker screenshot

I am using the data as exported from the app, except with the kids’ names replaced with their prenatal designations “A” and “B”.

Note that the twins are fraternal; they share as much genetic material as any normal pair of siblings.

1.1 - Load Libraries

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'stringr' was built under R version 3.6.3
library(lubridate) #managing lots of dates
library(unglue) #Regex without learning regex
## Warning: package 'unglue' was built under R version 3.6.3
library(highcharter) #for interactive charts
## Warning: package 'highcharter' was built under R version 3.6.3
library(ggthemes) #themes - I will be using the Economist theme for finished plots
## Warning: package 'ggthemes' was built under R version 3.6.3
library(childsds) #growth charts
## Warning: package 'childsds' was built under R version 3.6.3

1.2 - Load Data

A_formula <- read_csv("A_formula.csv") 
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Amount = col_character(),
##   Note = col_character()
## )
A_pumped <- read_csv("A_pumped.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Amount = col_character(),
##   Note = col_logical()
## )
B_formula <- read_csv("B_formula.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Amount = col_character(),
##   Note = col_character()
## )
B_pumped <- read_csv("B_pumped.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Amount = col_character(),
##   Note = col_logical()
## )
A_growth <- read_csv("A_growth.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Weight = col_character()
## )
B_growth <- read_csv("B_growth.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Weight = col_character()
## )
A_diaper <- read_csv("A_diaper.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Status = col_character(),
##   Note = col_character()
## )
B_diaper <- read_csv("B_diaper.csv")
## Parsed with column specification:
## cols(
##   Baby = col_character(),
##   Time = col_character(),
##   Status = col_character(),
##   Note = col_character()
## )

2 - Cleaning and Tidying Data

The source data comes from csv files exported by Baby Tracker. This analysis consists of eight tables, four per child. Diapers (including Baby, Time-stamp, Wet/Dirty/Mixed/Dry, and Notes), Formula (Baby, Time-stamp, Amount, Notes), Pumped Milk (Baby, Time-stamp, Amount, Notes), and Weight (Baby, Time-stamp, Weight). I merged the two tables for each baby on each topic into one using rbind() and included pumped milk and formula together. The first three days of the babies’ lives are excluded from the feedings table for lack of data; our breastfeeding experience could be charitably described as a dumpster fire.

feedings <- rbind(A_formula,A_pumped,B_formula,B_pumped)

weights <- rbind(A_growth,B_growth)

diapers <- rbind(A_diaper,B_diaper)

As the tables are all formatted the same way, this step was pretty easy.

2.1 - Convert the datetime strings to actual date data in R

The data includes time-stamps in the format of “9/11/17, 5:36 PM”. These strings need to be converted to dates and times that R understands, for which I used lubridate.

feedings$Timestamp <- as.POSIXct(strptime(feedings$Time,format="%m/%d/%y, %I:%M %p"))
diapers$Timestamp <- as.POSIXct(strptime(diapers$Time,format="%m/%d/%y, %I:%M %p"))
weights$Timestamp <- as.POSIXct(strptime(weights$Time,format="%m/%d/%y, %I:%M %p"))

#Split dates and times into seperate fields to see if it would help with the Month encoding in Chunk 6. Spoiler alert: It didn't. Also dplyr is really unhappy unless dates are in POSIXct so that wrapper was needed.

feedings$Date <- as.Date(feedings$Timestamp) 
feedings$Time <- format(as.POSIXct(feedings$Timestamp) ,format = "%H:%M:%S") 

diapers$Date <- as.Date(diapers$Timestamp) 
diapers$Time <- format(as.POSIXct(diapers$Timestamp) ,format = "%H:%M:%S") 

weights$Date <- as.Date(weights$Timestamp) 
weights$Time <- format(as.POSIXct(weights$Timestamp) ,format = "%H:%M:%S") 

2.2 - Convert weight strings to numbers, and lbs to kg

Weight is noted in the format of “X lbs. Y oz.” If oz are 0, that part is omitted and format is “X lbs.”

I needed to strip out the values from the string, convert them to a decimal notation (lbs + (oz/16)), and then convert that value to kilograms (lbs / 2.2)

I found the unglue package, because I was not making any progress using grep() and regular expressions. I know what regular expressions do and what they are good for, but I could not work out how to actually use them.

Finally, in order to match the growth charts, I will need their age in years with a decimal, e.g. 1.5 years old. Using lubridate::year() would produce whole numbers, so another solution is needed.

pattern <- "{lbs} lbs. {oz} oz."

weights <- unglue_unnest(weights,Weight,pattern,remove=F,na = NA_character_)

#couldn't manage to make it handle the cases where there were no ounces, so I worked around it. Knowing Regex would have helped but that's a lot more than I'm able to handle in the time alloted. 

pattern2 <- "{lbs} lbs."
weights <- unglue_unnest(weights,Weight,pattern2,remove=F)

#now we have two columns, lbs and lbs.1, where lbs.1 has all the values where there are no oz. So, we have two columns with complementary distributions of NAs. Enter the coalesce() function, which is made of magic and rainbows 

weights$lbs <- coalesce(weights$lbs,weights$lbs.1) 

weights$lbs <- as.numeric(weights$lbs)
weights$oz <- weights$oz %>% 
  replace_na(0) %>%
  as.numeric()

#add lbs and oz together. 16 ounces in a pound. And then convert to kilograms
weights$lbs_decimal <- (weights$lbs + (weights$oz/16)) 
weights$kg <- weights$lbs_decimal / 2.2

#drop lbs.1 column as unneeded
weights <- weights %>%
  select(-lbs.1)

#age in years w decimal (for the growth charts)
dob <- as.Date("2017-08-12")

weights$Age <- as.numeric(weights$Date - dob) / 365.24 #days between measurement and date of birth/divided by number of days in a year

Now that I have spent an awful long time making this chunk work, I realize could have gone back to Baby Tracker, changed the units to kilograms, and re-exported the data in Sanity Units. Then I would just need to drop the " kg.“, convert to numeric, and it would have been done.

2.3 - Convert fluid ounce string to numbers

Feedings are noted in fluid ounces, in the format of “X.XX oz.”

As a note, these were collected in mL for the first two months, when every drop counted, and then we switched to fluid ounces once they were eating larger amounts and “4 ounces” is easier to read off the side of a bottle than “120 mL.” Baby Tracker did the unit conversions when I switched the units setting, which is why the values for the first two month are more precise than the rest of the period.

pattern_oz <- "{oz} oz."

feedings <- unglue_unnest(feedings,Amount,pattern_oz,remove=F)

feedings$oz <- as.numeric(feedings$oz)

2.4 - Create a relative “month” category

To make this data easily to reference for people whose children weren’t born on the same day, I will use relative months instead of actual months. For example, August 12-September 11 are “Month 1.”

#use lubridate to create 1-month intervals
intervals <- as.data.frame(
  interval(
    start=(seq.Date(from=as.Date("2017-08-12"),
                    to=as.Date("2018-07-12"),
                    by="month")),
    end=(seq.Date(from=as.Date("2017-09-11"),
                  to=as.Date("2018-08-11"),
                  by="month"))))

intervals$MonthIndex <- c(1:12) #i ended up not using this

colnames(intervals) <- c("Intervals","MonthIndex")

#now to bring the index in

feedings <- feedings %>%
  mutate(Month = case_when(
    Date %within% intervals$Intervals[1] ~ "1",
    Date %within% intervals$Intervals[2] ~ "2",
    Date %within% intervals$Intervals[3] ~ "3",
    Date %within% intervals$Intervals[4] ~ "4",
    Date %within% intervals$Intervals[5] ~ "5",
    Date %within% intervals$Intervals[6] ~ "6",
    Date %within% intervals$Intervals[7] ~ "7",
    Date %within% intervals$Intervals[8] ~ "8",
    Date %within% intervals$Intervals[9] ~ "9",
    Date %within% intervals$Intervals[10] ~ "10",
    Date %within% intervals$Intervals[11] ~ "11",
    Date %within% intervals$Intervals[12] ~ "12",
    T ~ "NA"
  ))

diapers <- diapers %>%
  mutate(Month = case_when(
    Date %within% intervals$Intervals[1] ~ "1",
    Date %within% intervals$Intervals[2] ~ "2",
    Date %within% intervals$Intervals[3] ~ "3",
    Date %within% intervals$Intervals[4] ~ "4",
    Date %within% intervals$Intervals[5] ~ "5",
    Date %within% intervals$Intervals[6] ~ "6",
    Date %within% intervals$Intervals[7] ~ "7",
    Date %within% intervals$Intervals[8] ~ "8",
    Date %within% intervals$Intervals[9] ~ "9",
    Date %within% intervals$Intervals[10] ~ "10",
    Date %within% intervals$Intervals[11] ~ "11",
    Date %within% intervals$Intervals[12] ~ "12",
    T ~ "NA"
  ))


#there are a handful NA's in diapers from dates after Year 1, so let's get rid of those before they become a problem

diapers <- diapers %>%
  filter(Month != "NA")

This chunk also took a long, long time. I struggled to get code that worked with a straight date to work with a column of dates, so I kind of brute forced it. Not as elegant as I would like but it got the job done. Also T ~ NA didn’t work so I made it use a character string T ~ "NA"; I removed them anyways so it doesn’t really matter.

which(as.Date("2018-05-06") %within% intervals$Intervals)

This WORKED on its own; it returned the index of the matching interval, but I couldn’t get it to work with weights$Date.

3 - Statistical Analysis

feedings_daily_totals <- feedings %>%
  group_by(Baby,Month,Date) %>%
  filter(!Date=="2017-08-15") %>% #exclude Aug 15 - this day only has partial data
  summarize(sum(oz),mean(oz))

summary(feedings_daily_totals)
##      Baby              Month                Date           
##  Length:720         Length:720         Min.   :2017-08-16  
##  Class :character   Class :character   1st Qu.:2017-11-13  
##  Mode  :character   Mode  :character   Median :2018-02-11  
##                                        Mean   :2018-02-11  
##                                        3rd Qu.:2018-05-12  
##                                        Max.   :2018-08-11  
##     sum(oz)         mean(oz)     
##  Min.   : 3.00   Min.   :0.4288  
##  1st Qu.:21.99   1st Qu.:2.3333  
##  Median :26.00   Median :3.2929  
##  Mean   :24.95   Mean   :3.3443  
##  3rd Qu.:29.39   3rd Qu.:4.3333  
##  Max.   :43.50   Max.   :7.5000

5 Number Summaries for the Total Daily Intake (sum(oz)) and the Average Feeding Amount (mean(oz)). The Average Feeding Amount isn’t a surprise; the largest bottle we used was 8 ounces, and feeding any more than that in one sitting is a terrible, terrible idea. Through much of their first year, they were offered 4 ounces at a time, and we would get a second 4 ounce bottle if they finished the first.

4 - Plots

4.1 - Parent Sleep

I used the number of diaper changes between 10:00pm and 5:00am, as a proxy measure for parent sleep. I chose diapers for this instead of feedings or diapers+feedings because most wake ups involved a diaper change as an attempt to calm the baby, but not all wake ups involved a feeding.

Additionally, I wanted to use the diapers data set, but I didn’t want to talk at length about dirty diapers.

nighttime_diapers <- diapers %>%
  filter(Time > "22:00:00" | Time < "05:00:00") %>%
  group_by(Date,Baby) %>%
  summarize(count=(n()))

nighttime_diapers_chart <- ggplot(nighttime_diapers, aes(x=Date,y=count,group=1,fill=Baby)) +
  geom_bar(stat="identity") +
  labs(x="Date",y="Number of Diaper Changes",title="Total Nighttime (10pm-5am) Diaper Changes,\n as a Proxy Measure for Parent Sleep")  + theme_economist() +
  scale_fill_economist()
  
nighttime_diapers_chart

As shown here, the girls really didn’t sleep through the night - and neither did we - until they were about 6 months old. From there, there were sporadic wake ups, followed by two notable spikes in summer 2018. Those spikes were associated with erupting molars.

There is a “common wisdom” that babies undergo “sleep regressions”, a period of relatively poorer sleep lasting two to four weeks, at certain points in their first few years. While I heard about sleep regressions extensively in conversations with my peers, I do not recall experiencing them myself. I also do not see any spikes that fit the bill in my data, nor could I find discussion of the phenomenon in the academic literature, and there is reason to doubt that they exist. My sample size is admittedly so small (n=2) that it represents anecdotes, not data, but it certainly does not support the existence of sleep regressions.

If I had time, I would dig deeper into this and try to find patterns in this data compared to my own logs of the time. I could scrape my own post history to my online mothers’ group to see if I was complaining about teething, a sick baby, or unexplained fussiness keeping me up the night before.

4.2 - Daily Formula Intake by Month

level_order <- c("1","2","3","4","5","6","7","8","9","10","11","12")
#forcing the order because it wants to make them alphabetical, e.g. 1,10,11...

feedings_chart <- ggplot(feedings_daily_totals,
          aes(x=factor(Month, levels = level_order),y=`sum(oz)`,fill=Baby)) +
  geom_bar(stat="identity") +
  labs(x="Month",y="Formula Intake in Fluid Ounces",title="Formula intake by month") +
  theme_economist() +
  scale_fill_economist()

feedings_chart

Interesting, but the totals per month aren’t terribly informative, except to cause me to reflect back on exactly how expensive formula was.

Total daily intake is more important than each bottle. Baby A was - and still is - a big drinker, and would drink an entire bottle without stopping to breathe. Baby B preferred an ounce here, and ounce there, and so had more frequent feedings of smaller amounts. Both feeding patterns are normal; it is much more helpful to focus on Daily Total Intake. I’ll make it a box plot to see the range of daily totals for each baby.

feedings_chart_box <- ggplot(feedings_daily_totals,
          aes(x=factor(Month, levels = level_order),y=`sum(oz)`,color=Baby)) + #color instead of fill is more readable
  geom_boxplot() +
  labs(x="Month",y="Daily Intake in Fluid Ounces",title="Daily intake by month") +
  theme_economist() +
  scale_color_economist()

feedings_chart_box

Now to add some interactivity. Recreate the same chart in highcharter with data points visible on mouse over.

hcboxplot(x = feedings_daily_totals$`sum(oz)`, var = factor(feedings_daily_totals$Month, levels = level_order), var2 = feedings_daily_totals$Baby,outliers=T) %>%
  hc_chart(type = "column") %>%
  hc_add_theme(hc_theme_economist()) %>%
  hc_xAxis(title=list(text="Month")) %>%
  hc_yAxis(title=list(text="Daily Intake in Fluid Ounces"),floor=0) %>%
  hc_title(text="Daily intake by Month",align="center")

Highcharter and GGplot disagree on the outliers, but I don’t have the brainpower to dig into that now.

We always knew Baby A ate more, and that is reflected in the plot. There is a clear tapering off from 10-12. At 9 months we started serving them solid foods (not purees) three times a day, and at 10 months we offered bottles after meals, not before. Over the next few months they weaned themselves off of formula, deriving more and more of their nutrition from food.

There’s a clear dip in how much Baby B was eating in months 3-5, but as seen on the growth charts below, there was no associated slowing of weight gain. As long as she was gaining weight at an appropriate rate, and taking in an average of 24 ounces a day (one day might be less, another more), the pediatrician was unconcerned.

4.3 - Growth compared to CDC Growth Charts

I am going to chart the twins’ weights against the CDC growth charts, found in the childsds package.

I am extracting the CDC Growth Charts Infant Charts for Weight from this package. This data is packaged as an S4 object, which I honestly have no idea how to handle. I had to copy sample code from someone else extracting data from this package using make_percentile_tab() and modify the variables I needed. From there I had a data frame I could work with.

#read in the ref data. 
growth_chart <- make_percentile_tab(ref = cdc.ref,
          item = "weight",
                  perc = c(5,25,50,75,95),
                  age = seq(0,3,by=0.1),
                  stack = T)
## Loading required namespace: reshape2
#only interested in the female chart
growth_chart <- growth_chart %>%
  filter(sex=="female")

#create the baseline chart
growth_plot <- ggplot(growth_chart, aes(x = age, y = value,group=paste(variable))) +
    geom_line() +
  labs(x="Age (in years)",y="Weight (in kg)",title="CDC Growth Chart, Female, 0-3 years, Weight for Age")

growth_plot

#growth_plot +
#    geom_line(data=weights,aes(x=Age, y=kg,color=Baby))
#this plot works on its own but not layered on top of the growth charts; I'll try it in highcharter instead of spending time trying to fix the ggplot

#add my data
growth_plot_hc <- hchart(growth_chart,"line",hcaes(x=age,y=value,group=variable)) %>%
  hc_add_series(weights,"line",hcaes(x=Age,y=kg,group=Baby)) 
## Warning: `parse_quosure()` is deprecated as of rlang 0.2.0.
## Please use `parse_quo()` instead.
## This warning is displayed once per session.
growth_plot_hc

It worked! Now just need to pretty it up. Add labels, remove the messy marker icons, and fade the percentile benchmark lines.

growth_plot_hc <- hchart(growth_chart,"line",hcaes(x=age,y=value,group=variable),color="#a9a9a9",enableMouseTracking = F) %>%
  hc_plotOptions(line = list(marker = list(enabled = F))) %>%
  hc_legend(enabled = T) %>%
  hc_add_series(weights,"line",hcaes(x=Age,y=kg,group=Baby)) %>%
  hc_add_theme(hc_theme_economist()) %>%
  hc_xAxis(title=list(text="Age")) %>%
  hc_yAxis(title=list(text="Weight (in kg)"),floor=0) %>%
  hc_title(text="Growth of Baby A & B compared to CDC Growth Charts",align="center")

growth_plot_hc

I originally used marker = list(enabled = T)) to show the markers for the two series showing Baby A and B, but they obscured the lines itself, so I removed them. I also attempted to change the benchmark lines to dashed lines, but I could not figure it out. All of the documentation for Highcharter is for javascript, not for R, so the language translation step presents an extra challenge.

Regarding the data, the dip in weight at the beginning is normal; almost all children lose 7-10% of their birthweight in the first week of life. Once we established a feeding routine they quickly bounced up on the growth charts. However, it looks like they took until a bit after their first birthday to really find their places on the growth charts, with A settling in around the 75 percentile, and Baby B around the 25th. Baby A has been taller and heavier than Baby B since birth, and is now a clear two inches taller; we expect this difference in size to be lifelong.

The growth lines are smoother after 1 year. This likely has more to do with less frequent measurements, rather than an actual smoothing in their grwoth trend.

Their pediatrician said that in her experience, Baby A (firstborn) of a twin set tends to be larger than Baby B. My data supports this, and I was able to find additional support for that assertation in the academic literature.

4.4 - Formula Intake Compared to Growth

For this I am going to try and merge the Daily Intake Chart and the Growth Charts. I’ll need to trim the Growth data to just the first year and adjust the Relative Months column to be Age - matching the X axis on the Growth Chart.

For readability’s sake, I will use just the Mean Daily Formula Intake and not the box plot.

feedings_monthly_average <- feedings_daily_totals %>%
  group_by(Month,Baby) %>%
  summarise(Intake=mean(`sum(oz)`)) %>%
  mutate(Age=((as.numeric(Month)/12))) %>%
  arrange(Age) %>% #arranging by age ended up being critical for highcharter
  mutate(Age=round(Age, digits = 2)) %>%
  mutate(Intake=round(Intake, digits = 2))
  
weights_1st_year <- weights %>%
  filter(Age <= 1) %>%
  mutate(Age=round(Age, digits = 3)) %>% #rounding to 2 digits merged some early weights; need more precision
  mutate(kg=round(kg, digits = 2))

growth_chart_1st_year <- growth_chart %>%
  filter(age <= 1)

growth_plot_hc <- highchart()  %>%
  hc_yAxis_multiples(
    list(
      title = list(text = "Weight in kg"),
      opposite = T
      ),
    list(
      title = list(text = "Average Daily Intake in fl oz"),
      opposite = F
      )
  ) %>%
  hc_plotOptions(line = list(marker = list(enabled = F))) %>%
  hc_legend(enabled = T) %>%
  hc_add_series(growth_chart_1st_year,"line",hcaes(x=age,y=value,group=variable),color="#a9a9a9",enableMouseTracking = F,yAxis = 0) %>%
  hc_add_series(
    weights_1st_year,"line",
    hcaes(x=Age,y=kg,group=Baby),
    yAxis = 0,
    tooltip = list(pointFormat = "Weight: Baby {point.Baby} - {point.kg} kg")
    ) %>%
  hc_add_series(
    feedings_monthly_average,"line",
    hcaes(x=Age,y=Intake,group=Baby),
    yAxis = 1,
    tooltip = list(pointFormat = "Intake: Baby {point.Baby} - {point.Intake} fl oz")
    ) %>%
  hc_add_theme(hc_theme_economist()) %>%
  hc_xAxis(title=list(text="Age")) %>%
  hc_title(text="Growth of Baby A & B compared to Formula Intake",align="center")

growth_plot_hc

This is where I ran out of time. The two charts are not quite aligned correctly on the x axis, as a result of how I transformed the ages, so I’d like to fix that. I’d like to figure out how to change the line styles to dotted, and perhaps change the y axis colors to clearly differentiate which line belongs to which y axis. I’d like to also adjust this chart to use months instead of fractional years, for ease of reference.

I’m also not a fan of having fluid ounces on one side and kilograms on the other, so if I were to revisit this I’d pick metric or imperial units and stick to it the whole way through. Pounds and ounces are what the consumers of this data typically use; they have built an intuitive understanding of those measures and would be more able to readily make use of the data in the charts. Metric would be a lot easier, though.

As for the data presented in the chart, we see a dip in formula intake for both babies between ages 0.2 and 0.4 years, but can see that it’s not associated with a slowdown in weight gain for either child.