Markdown Document for the purposes of analysing BaYaka child data from GGIR, with additional comparisons with MCS and NHANES.

This document was last edited on 2023-11-17 using R version 4.3.1

1 Workspace Prep

Here we are reading in data from data cleaning, reducing the data frame size to core variables, introducing packages, and specify styling prompts for ggplot and gt

Packages used in this markdown are:

c("quantreg","gamlss","tidyr",
       "gtsummary","broom","broom.mixed",
       "ggplot2","dplyr","forcats",
       "foreign","tableone","kableExtra","psych",
       "trackdown","sjPlot","broom.mixed",
       "geomtextpath","ggmap","ggplot2","tmap",
       "leaflet","ggalt","grid","ggpp","gridExtra",
       "cowplot","haven","data.table","stringr",
       "readxl","xlsx","gtools","naniar","sjmisc",
       "sjstats","sjlabelled","knitr","gt","gtExtras",
       "lme4","emojifont","huxtable","jtools","multcomp"
       )

Styling features for the later figures

default_colour_pal <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#009E73", "#CC79A7", "#CC79A7")

alternative_colour_pal <- c("#009E73","#CC79A7")

ggtheme <- theme_minimal()+

1.1 Fieldsite

The BaYaka’s physical activity data was collected during fieldwork in 2018 and 2022 in the Likouala region of Congo’s Ndoki swamp forest. The 2018 fieldwork took place in July and August, when the main subsistence activities involved caterpillar and honey collecting, hunting, collecting wild yams and plants and occasional fishing. The 2022 fieldwork took place during fishing season in February when BaYaka move to deeper parts of the forest (leaving their camps by mud roads) and establish camps close to river streams to engage in fishing.

1.2 Data Cleaning

1.2.1 BaYaka

Accelerometer recordings were processed using the package ‘GGIR’. The maximum recording length was set to 7 days from the start of recording. While no restriction on excluding the first or last day of study was stipulated, this was included de facto by excluding all incomplete days. Data was calculated in 24-hour windows from midnight to midnight, based on West Africa time (UTC +1 hour).

1.2.1.1 GGIR

GGIR is run twice, once for day level summaries, a second for hour level summaries.

The GGIR command for day level output is as follows:

datadir= "G:.."
outputdir= "G:..."
mode= 1:5
studyname="BaYaka Child PA"
f0 = 1 
f1 = X #How files in the folder specified above
metadatadir= "G:..."

g.shell.GGIR(#-------------------------------
             # General parameters
             #-------------------------------
             mode=mode,
             datadir=datadir,
             outputdir=outputdir,
             studyname=studyname,
             metadatadir = metadatadir,
             f0=f0,
             f1=f1,
             overwrite = TRUE,
               do.imp=TRUE,
             idloc=2,#Is the ID stored in the ID columns? 1 if true,2 if is filename
             print.filename=FALSE,
             storefolderstructure = FALSE,
             #-------------------------------
             # Part 1 parameters:
             #-------------------------------
             windowsizes = c(5,900,3600),
             do.cal=TRUE, # Calibrate?
             do.enmo = TRUE, #Use ENMO as metric?
             do.anglez=TRUE,
             chunksize=1,
             printsummary=TRUE,
             do.parallel=TRUE,
             daylimit=FALSE,
             #-------------------------------
             # Part 2 parameters:
             #-------------------------------
             strategy = 1, #use the values in hrs.del (1 is default)
             ndayswindow=7, #Maximum study length
             hrs.del.start = 0, #how many hours to ignore at the start
             hrs.del.end = 0, #how many hours to ignore at the end
             maxdur = 7, # How many days after start did the study definitely end
             includedaycrit = 16, #Only include days that have atleast 16hrs of recording (16 is default)
             #includnenightcrit = 1,
             L5M5window = c(0,24), #Period of time to calculate when the least active hours are (as default)
             M5L5res = c(1,5), #Same but most active hours (as default)
             winhr = 5, #Size of window to be used in L5m5 analyses (5 is default)
             qlevels = c(c(1380/1440),c(1410/1440)),
             qwindow=c(0,24), #Window over which all values are calculated 0,24 means over a full day
             ilevels = c(seq(0,400,by=50),8000), # Thresholds at which acceleration is calculated
             mvpathreshold =c(100), #Cutoffs for MVPA calculation
             #-------------------------------
             # Part 3 parameters:
             #-------------------------------
             timethreshold= c(10), #Time Thresholds at which 'sustained inactivity' are calculated
             anglethreshold=5, #Amount of movement in degrees allowed to still be considered inactive
             ignorenonwear = FALSE, #Ignore periods of non-wear?
             ##-------------------------------
             ## Part 4 parameters:
             ##-------------------------------
             loglocation= c(), #If you have a sleep log include the location - if not leave blank like this
             excludefirstlast = FALSE, #Exclude first and last night?
             includenightcrit = 16, #Maximum number of hours that can be considered sleep in one bout
             def.noc.sleep = c(), # time window in which sustained sleep will be considered. If blank GGIR will use the least active hours to centre
             relyonsleeplog = FALSE, #Use a sleep log to define start of sleep? False means GGIR will calculate instead
             ##-------------------------------
             ## Part 5 parameters:
             ##-------------------------------
             ## Key functions: Merging physical activity with sleep analyses
             threshold.lig = c(50), #Threshold for Light activity
             threshold.mod = c(100), #Threshold for moderate activity
             threshold.vig = c(400), #Threshold for vigorous activity
             boutcriter = 0.8, #what percentage of MVPA bout needs to be above threshold. 0.8 is standard
             boutcriter.in = 0.9, #Same but for inactive
             boutcriter.lig = 0.8, #same for light
             boutcriter.mvpa = 0.8, #for MVPA
             boutdur.in = c(10,30), #Length of bouts of inactivity
             boutdur.lig = c(1,10), #lenth of bouts for light
             boutdur.mvpa = c(1,5,10), #lenth of bouts for MVPA
             timewindow = "MM", #Time window for averaging? WW = wake to wake. MM = midnight to midnight (or both)
             excludefirstlast.part5 = FALSE,
             do.sibreport = TRUE,
             #dayborder = 10,
             #-----------------------------------
             # Report generation
             #-------------------------------
             do.report=c(2,4,5)) #What sections to build reports for? (1 & 3 can't report)
## End

To obtain hour level summaries, the qwindow command is edited to the following:

qwindow=seq(0,24,1)

1.2.1.2 Merge Files

This is namely an issue for the reports generated in sections 2 and 4 of GGIR

file2$filename <- paste(file2$filename,".RData",sep = "")
file2day$filename <- paste(file2day$filename,".RData",sep = "")
file2hour$filename <- paste(file2hour$filename,".RData",sep = "")
file2hourday$filename <- paste(file2hourday$filename,".RData",sep = "")

file4$filename <- paste(file4$filename,".RData",sep = "")

1.2.1.3 Merge files together by filename - Adds new variables horizontally

mergedfile <- merge(file2,file4,by= "filename",all = TRUE)
mergedfile <- merge(mergedfile,file5,by= "filename",all = TRUE)

mergedfileday <- merge(file2day,file4day,by=c("filename","weekday"),all = TRUE)
mergedfileday <- merge(mergedfileday,file5day,by=c("filename","weekday"),all = TRUE)

1.2.1.4 Bind these files together, so each row is 1 individual

Don’t need to do this as there is only 1 file location

accelerometery <- mergedfile
accelerometery_by_day <- mergedfileday
accelerometery_by_hour <- file2hour
accelerometery_by_hour_by_day <- file2hourday

Data frame is subset to reduce working memory

bayakachild  <-  subset(bayakachild, select = c("filename",
                                               "device_sn",
                                               "name.x",
                                               "gender",
                                               "age",
                                               ... )

1.2.1.5 Subset Demographic File to necessary variables

names(demographic) <- tolower(names(demographic))
demographic$gender <- ifelse(demographic$sex=="m","boy",
                             ifelse(demographic$sex=="f","girl",NA))

demographic <- subset(demographic,!is.na(demographic$`watch id`))
demographic$device_sn <- demographic$`watch id`
demographic <- subset(demographic,select=c("name",
 "age",
 "gender",
 "device_sn"))

demographic$age <- as.numeric(demographic$age)

1.2.1.6 Extract device ID from filename

accelerometery_by_day <- separate(accelerometery_by_day,filename,c("1","2","device_sn","4"),sep = "_")
accelerometery_by_hour$device_sn <- paste("0",accelerometery_by_hour$device_sn,sep = "")
accelerometery_by_hour_by_day <- separate(accelerometery_by_hour_by_day,filename,c("1","2","device_sn","4"),sep = "_")
accelerometery$device_sn <- paste("0",accelerometery$device_sn,sep = "")

1.2.1.7 merge demogrpahic data with accelerometery data

bayakachild <- merge(demographic,accelerometery,by= "device_sn",all = TRUE)
bayakachild_day <- merge(demographic,accelerometery_by_day,by= "device_sn",all = TRUE)
bayakachild_hour <- merge(demographic,accelerometery_by_hour,by= "device_sn",all = TRUE)
bayakachild_hour_day <- merge(demographic,accelerometery_by_hour_by_day,by= "device_sn",all = TRUE)

1.2.1.8 Set all variables names to lowercase

names(bayakachild) <- tolower(names(bayakachild))
names(bayakachild_day) <- tolower(names(bayakachild_day))
names(bayakachild_hour) <- tolower(names(bayakachild_hour))
names(bayakachild_hour_day) <- tolower(names(bayakachild_hour_day))

1.2.1.9 Set any NAs to `NA`

bayakachild %>%
replace_with_na_all(condition = ~.x %in% c("NA"))

bayakachild_day %>%
replace_with_na_all(condition = ~.x %in% c("NA"))

bayakachild_hour %>%
replace_with_na_all(condition = ~.x %in% c("NA"))

bayakachild_hour_day %>%
replace_with_na_all(condition = ~.x %in% c("NA"))

1.2.1.10 Replace Character Variables with factor

i <- sapply(bayakachild, is.character)
bayakachild[i] <- lapply(bayakachild[i], as.factor)
remove(i)

i <- sapply(bayakachild_day, is.character)
bayakachild_day[i] <- lapply(bayakachild_day[i], as.factor)
remove(i)

i <- sapply(bayakachild_hour, is.character)
bayakachild_hour[i] <- lapply(bayakachild_hour[i], as.factor)
remove(i)

i <- sapply(bayakachild_hour_day, is.character)
bayakachild_hour_day[i] <- lapply(bayakachild_hour_day[i], as.factor)
remove(i)

bayakachild_hour_day$sedentary <- bayakachild_hour_day$x.0.50._enmo_mg
bayakachild_hour_day$light <- bayakachild_hour_day$x.50.100._enmo_mg
bayakachild_hour_day$mvpa <- bayakachild_hour_day$x.100.150._enmo_mg +
  bayakachild_hour_day$x.150.200._enmo_mg +
  bayakachild_hour_day$x.200.250._enmo_mg+
  bayakachild_hour_day$x.250.300._enmo_mg+
  bayakachild_hour_day$x.300.350._enmo_mg+
  bayakachild_hour_day$x.350.400._enmo_mg+
  bayakachild_hour_day$x.400.8e.03._enmo_mg

bayakachild_hour_day <- subset(bayakachild_hour_day,select = c("device_sn","name","age","gender","id","n_valid_hours","n_hours","n_valid_hours_in_window", "n_hours_in_window","weekday","measurementday","qwindow_timestamps","qwindow_name",                 "mean_enmo_mg", "sedentary","light","mvpa"))

1.2.1.11 Sedentary

bayakachild$sedentaryp2 <- bayakachild$ad_.0.50._enmo_mg_0.24hr
bayakachild$sedentaryp5 <- bayakachild$dur_spt_wake_in_min_pla+bayakachild$dur_day_total_in_min_pla+bayakachild$dur_spt_sleep_min_pla
bayakachild$sedentary <- bayakachild$dur_spt_wake_in_min_pla+bayakachild$dur_day_total_in_min_pla+bayakachild$dur_spt_sleep_min_pla
bayakachild$sedentary_wakeful <- bayakachild$dur_spt_wake_in_min_pla+bayakachild$dur_day_total_in_min_pla

1.2.1.12 Of which, how much is sleep

bayakachild$sleep <- bayakachild$sleepdurationinspt_ad_t10a5_mn+bayakachild$sleepdurationinspt_ad_t10a5_mn
bayakachild$sleep_night <- bayakachild$sleepdurationinspt_ad_t10a5_mn
bayakachild$sleep_day <- bayakachild$sleepdurationinspt_ad_t10a5_mn
bayakachild$sedentary_wakeful <- bayakachild$sedentary - bayakachild$sleep

1.2.1.13 light

bayakachild$lightp2 <- bayakachild$ad_.50.100._enmo_mg_0.24hr
bayakachild$lightp5 <- bayakachild$dur_spt_wake_lig_min_pla+bayakachild$dur_day_total_lig_min_pla
plot(bayakachild$lightp2~bayakachild$lightp5)
bayakachild$light <- bayakachild$dur_spt_wake_lig_min_pla+bayakachild$dur_day_total_lig_min_pla

1.2.1.14 moderate

bayakachild$moderatep2 <- bayakachild$ad_.100.150._enmo_mg_0.24hr+
bayakachild$ad_.150.200._enmo_mg_0.24hr+
bayakachild$ad_.200.250._enmo_mg_0.24hr+
bayakachild$ad_.250.300._enmo_mg_0.24hr+
bayakachild$ad_.300.350._enmo_mg_0.24hr+
bayakachild$ad_.350.400._enmo_mg_0.24hr
bayakachild$moderatep5 <- bayakachild$dur_spt_wake_mod_min_pla+bayakachild$dur_day_total_mod_min_pla
plot(bayakachild$moderatep2~bayakachild$moderatep5)

bayakachild$moderate <- bayakachild$dur_spt_wake_mod_min_pla+bayakachild$dur_day_total_mod_min_pla

1.2.1.15 vigorous

bayakachild$vigorousp2 <- bayakachild$ad_.400.8e.03._enmo_mg_0.24hr
bayakachild$vigorousp5 <- bayakachild$dur_spt_wake_vig_min_pla+bayakachild$dur_day_total_vig_min_pla
plot(bayakachild$vigorousp2~bayakachild$vigorousp5)

bayakachild$vigorous <- bayakachild$dur_spt_wake_vig_min_pla+bayakachild$dur_day_total_vig_min_pla

1.2.1.16 mvpa

bayakachild$mvpa <- bayakachild$moderate+bayakachild$vigorous

1.2.1.17 mean acceleration

bayakachild$mean_acc <- bayakachild$ad_mean_enmo_mg_0.24hr

1.2.1.18 check total day length from the p2 output and the p5 output

bayakachild$p2total <- bayakachild$vigorousp2+bayakachild$moderatep2+bayakachild$lightp2+bayakachild$sedentaryp2
bayakachild$p5total <- bayakachild$vigorousp5+bayakachild$moderatep5+bayakachild$lightp5+bayakachild$sedentaryp5
plot(bayakachild$p2total ~ bayakachild$p5total)

1.2.1.19 remove incomplete cases for physical activity

bayakachild <- bayakachild[complete.cases(bayakachild[,c("sleep","sedentary","light","moderate","vigorous")]),]

1.2.1.20 create long form

columns_to_gather <- c("sedentary_wakeful","light","moderate","vigorous")
bayakachild_long <- gather(bayakachild,intensity,time,columns_to_gather)
remove(columns_to_gather)

1.2.1.21 Create long form of hourly

Creating a time variable mainly for use in plotting. Renaming the time window from the two adjoining hours, to the half hour mark between the two of them

columns_to_gather<-c("ad_mean_enmo_mg_0.1hr",
                     "ad_mean_enmo_mg_1.2hr",
                     ...)
bayakachild_hour_long <- gather(bayakachild_hour,time,mean_enmo,columns_to_gather)
remove(columns_to_gather)

bayakachild_hour_long$time2 <- ifelse(
bayakachild_hour_long$time=="ad_mean_enmo_mg_0.1hr","0:30",
ifelse(...))

bayakachild_hour_long$time2 <- as.POSIXct(bayakachild_hour_long$time2,format="%H:%M")

1.2.1.22 And for MVPA

columns_to_gather<-c("ad_mvpa_e5s_t100_enmo_0.1hr",
                     "ad_mvpa_e5s_t100_enmo_1.2hr",
                     ...)
bayakachild_hour_long_mvpa <- gather(bayakachild_hour,time,mvpa,columns_to_gather)
remove(columns_to_gather)

bayakachild_hour_long_mvpa$time2 <- ifelse(
bayakachild_hour_long_mvpa$time=="ad_mvpa_e5s_t100_enmo_0.1hr","0:30",
ifelse(...))

bayakachild_hour_long_mvpa$time2 <- as.POSIXct(bayakachild_hour_long_mvpa$time2,format="%R")

bayakachild_hour_day$time <- ifelse(
  bayakachild_hour_day$qwindow_timestamps=="00:00-01:00","00:30:00",ifelse(
    ...,NA
  ))


bayakachild_hour_day$time <- as.POSIXct(bayakachild_hour_day$time,format="%R")

1.2.1.23 Script to turn by day file into 1 row per person

While analysis is mixed effect (1 row per person per day) for plots we need 1 row per person. The below script was used to calculate mean for each child for each variable of interest.

bayakachild_day_collapsed <- subset(bayakachild_day,bayakachild_day$n.valid.hours == 24)

bayakachild_day_collapsed<-
  bayakachild_day_collapsed%>%
  group_by(device_sn)%>%
  summarise( name=first(name),
             sleeponset.x=mean(sleeponset.x,na.rm=T),
             wakeup.x=mean(wakeup.x,na.rm=T),
             sptduration=mean(sptduration,na.rm=T),
             fraction_night_invalid=mean(fraction_night_invalid,na.rm=T),
             sleepdurationinspt=mean(sleepdurationinspt,na.rm=T),
             duration_sib_wakinghours=mean(duration_sib_wakinghours,na.rm=T),
             number_sib_wakinghours=mean(number_sib_wakinghours,na.rm=T),
             duration_sib_wakinghours_atleast15min=mean(duration_sib_wakinghours_atleast15min,na.rm=T),
             dur_spt_sleep_min=mean(dur_spt_sleep_min,na.rm=T),
             dur_day_spt_min=mean(dur_day_spt_min,na.rm=T),
             sleep_efficiency=mean(sleep_efficiency,na.rm=T))

bayakachild <- merge(bayakachild,bayakachild_day_collapsed,by= "device_sn",all = TRUE)

Introduce Cleaned MCS dataset

mcs <- read.csv("mcs_cleaned.csv")

Collapse MCS to 1 row per person

mcs_merge = mcs %>% 
  group_by(id) %>%
  summarise(id = first(id)...)

1.2.2 NHANES

1.2.2.1 Cut down variables to make more managable

names(pa_day) <- tolower(names(pa_day))
names(pa_hour) <- tolower(names(pa_hour))
names(demographics) <- tolower(names(demographics))

pa_day <- subset(pa_day,select = c(seqn, paxdaywd, paxvmd, paxmtsd))
pa_hour <- subset(pa_hour, select = c(seqn, paxtmh, paxmtsh, paxdaywh,paxdayh,paxssnhp))
demographics <- subset(demographics, select = c(seqn, riagendr,ridageyr,ridreth1,ridexagm))

1.2.2.2 Subset PA to complete entries

pa_day <- subset(pa_day,pa_day$paxvmd == 1440)
pa_day <- subset(pa_day,select = -c(paxvmd))

pa_by_day <- pa_day
pa_by_day$day <- ifelse(pa_by_day$paxdaywd==1,"sunday",
                        ifelse(pa_by_day$paxdaywd==2,"monday",
                               ifelse(pa_by_day$paxdaywd==3,"tuesday",
                                      ifelse(pa_by_day$paxdaywd==4,"wednesday",
                                             ifelse(pa_by_day$paxdaywd==5,"thursday",
                                                    ifelse(pa_by_day$paxdaywd==6,"friday",
                                                           ifelse(pa_by_day$paxdaywd==7,"saturday",NA
                                                                  )))))))

pa_hour_2 <- pa_hour 

pa_hour_2$day <- ifelse(pa_hour_2$paxdaywh==1,"sunday",
                        ifelse(pa_hour_2$paxdaywh==2,"monday",
                               ifelse(pa_hour_2$paxdaywh==3,"tuesday",
                                      ifelse(pa_hour_2$paxdaywh==4,"wednesday",
                                             ifelse(pa_hour_2$paxdaywh==5,"thursday",
                                                    ifelse(pa_hour_2$paxdaywh==6,"friday",
                                                           ifelse(pa_hour_2$paxdaywh==7,"saturday",NA
                                                                  )))))))

pa_by_day$mims_min <- pa_by_day$paxmtsd/1440

pa_day <- spread(pa_day, key = "paxdaywd", value = "paxmtsd")
pa_day$mean_mims <- rowMeans(pa_day[ , c(2:8)], na.rm=TRUE)
pa_day$mims_min <- pa_day$mean_mims/1440
pa_day <- subset(pa_day, select=c(seqn,mean_mims,mims_min))

pa_hour <- subset(pa_hour,pa_hour$paxtmh==60)
pa_hour <- subset(pa_hour,select = -c(paxtmh))
pa_hour <- subset(pa_hour, paxdayh!=1 )
pa_hour <- subset(pa_hour, paxdayh!=9 )
pa_hour_wd <- subset(pa_hour,paxdaywh !=1 & paxdaywh !=7)


pa_hour <- pa_hour %>% 
  group_by(seqn) %>% 
  mutate(rank = order(order(paxssnhp,paxdayh, decreasing=FALSE))) %>% 
  mutate(hour = rep(seq_len(24), each = 1, length.out = n()))

pa_hour <- pa_hour %>% 
  group_by(seqn,hour) %>%
  summarize(mean_mims = mean(paxmtsh))

pa_hour_wd <- pa_hour_wd %>% 
  group_by(seqn) %>% 
  mutate(rank = order(order(paxssnhp,paxdayh, decreasing=FALSE))) %>% 
  mutate(hour = rep(seq_len(24), each = 1, length.out = n()))

pa_hour_wd <- pa_hour_wd %>% 
  group_by(seqn,hour) %>%
  summarize(mean_mims = mean(paxmtsh))

pa_hour_2 <- subset(pa_hour_2, paxdayh!=1 )
pa_hour_2 <- subset(pa_hour_2, paxdayh!=9 )

pa_hour_2 <- pa_hour_2 %>% 
  group_by(seqn) %>% 
  mutate(rank = order(order(paxssnhp,paxdayh, decreasing=FALSE))) %>% 
  mutate(hour = rep(seq_len(24), each = 1, length.out = n()))

pa_hour_2 <- subset(pa_hour_2,pa_hour_2$paxtmh==60)
pa_hour_2 <- subset(pa_hour_2,!is.na(pa_hour_2$paxmtsh))

1.2.2.3 Merge files with demogrpahics

nhanes <- merge(demographics,pa_day,by= "seqn",all = TRUE)

nhanes_day <- merge(demographics,pa_by_day,by= "seqn",all = TRUE)

nhanes_hour <- merge(demographics,pa_hour,by= "seqn",all = TRUE)

nhanes_hour_wd <- merge(demographics,pa_hour_wd,by= "seqn",all = TRUE)

nhanes_hour_2 <- merge(demographics,pa_hour_2,by= "seqn",all = TRUE)

nhanes_day <- nhanes_day %>% 
  group_by(seqn) %>%
  fill(seqn, .direction = "updown") %>% 
  fill(riagendr, .direction = "updown") %>% 
  fill(ridageyr, .direction = "updown") %>% 
  fill(ridreth1, .direction = "updown") %>% 
  fill(ridexagm, .direction = "updown")

1.2.2.4 Drop Adults and those with incomplete PA measures

nhanes <- subset(nhanes, ridageyr <=18 & ridageyr >=3)
nhanes_day <- subset(nhanes_day, ridageyr <=18 & ridageyr >=3)
nhanes_hour <- subset(nhanes_hour, ridageyr <=18 & ridageyr >=3)
nhanes_hour_wd <- subset(nhanes_hour_wd, ridageyr <=18 & ridageyr >=3)
nhanes_hour_2 <- subset(nhanes_hour_2, ridageyr <=18 & ridageyr >=5)

1.2.2.5 Make hour time format for `nhanes_hour`

nhanes_hour2 <- nhanes_hour
nhanes_hour$hour <- as.character(nhanes_hour$hour)
nhanes_hour$hour <- paste(nhanes_hour$hour,":00",sep="")
nhanes_hour$hour <- as.POSIXct(nhanes_hour$hour,format="%H:%M",origin = "2013-01-01")

nhanes_hour_wd$hour <- as.character(nhanes_hour_wd$hour)
nhanes_hour_wd$hour <- paste(nhanes_hour_wd$hour,":00",sep="")
nhanes_hour_wd$hour <- as.POSIXct(nhanes_hour_wd$hour,format="%H:%M",origin = "2013-01-01")

nhanes_hour_2$hour <- as.character(nhanes_hour_2$hour)
nhanes_hour_2$hour <- paste(nhanes_hour_2$hour,":00",sep="")
nhanes_hour_2$timestamp <- case_when(nhanes_hour_2$day == "tuesday" ~ as.POSIXct(nhanes_hour_2$hour,format="%H:%M",origin = "2013-01-01"),
                             nhanes_hour_2$day == "wednesday" ~ as.POSIXct(nhanes_hour_2$hour,format="%H:%M",origin = "2013-01-02"),
                                   ...))

nhanes_hour_2_grouped <-  nhanes_hour_2 %>% 
  group_by(seqn,day) %>% 
  summarise(
    age = ridexagm/12,
    gender = ifelse(riagendr==1,"boy",
                    ifelse(riagendr==2,"girl",NA)),
    ethnicity = ridreth1,
    timestamp = timestamp,
    paxmtsh = paxmtsh,
    `0-3am` = mean(paxmtsh[hour=="1:00"|hour=="2:00"|hour=="3:00"],na.rm = T),
     `4-6am` = mean(paxmtsh[hour=="4:00"|hour=="5:00"|hour=="6:00"],na.rm = T),
    ...
  )

1.2.2.6 Convert hour counts to minute counts

nhanes_hour$mims_min <- nhanes_hour$mean_mims/60
nhanes_hour_2$mims_min <- nhanes_hour_2$paxmtsh/60
nhanes_hour_wd$mims_min <- nhanes_hour_wd$mean_mims/60

1.2.3 MCS

1.2.3.1 Read in files

interview <- read_dta("mcs6_cm_interview.dta")
accelerometer <- read_dta("mcs6_cm_accelerometer_derived.dta")
derived <- read_dta("mcs6_cm_derived.dta")

Set names to lower case

names(interview) <- tolower(names(interview))
names(derived) <- tolower(names(derived))
names(accelerometer) <- tolower(names(accelerometer))

1.2.3.2 Subset to core variables to reduce processing time

interview <- subset(interview,select = c("mcsid",
                                         "fcnum00",
                                         "fchtcm00",
                                         "fcwtcm00",
                                         "fcbfpc00",
                                         "fcintrob"))

derived<-subset(derived,select=c("mcsid",
                                 "fcnum00",
                                 "fccsex00",
                                 "fccdbm00",
                                 "fccdby00",
                                 "fccage00",
                                 "fdce0600",
                                 "fcmcs6ag",
                                 "fcbmin6"))

accelerometer <- subset(accelerometer,select = c( "mcsid",
                                                  "fcnum00",
                                                  "fcaccmonth",
                                                  "fcaccweekday",
                                                  "fcacc_n_valid_hrs",
                                                  "fcacc_mean_acc_24h",
                                                  "fcacc_mean_acc_5am_9am",
                                                  "fcacc_m5_hour_start",
                                                  "fcacc_m5_mean_acc_mg_24h",
                                                  "fcacc_l5_hour_start",
                                                  "fcacc_l5_mean_acc_mg_24h",
                                                  "fcacc_mvpa_mean_acc_e5sec_100mg",
                                                  "fcacc_mvpa_mean_acc_e1min_100mg",
                                                  "fcacc_mvpa_e5s_b10m80_t100_enmo"))

1.2.3.3 Create ID from Household and Person ID

MCS uses two identifiers, $MCSID identifies the household, with $FCNUM00 identifying the individual within that household. Where recruitment included twins, there are 2 individuals for a particular household. In this sample there are twins. As such, household number and individual ID are merged to make a unique identifier for all subsequent analysis

interview$id <- paste(interview$mcsid,interview$fcnum00,sep="")
derived$id <- paste(derived$mcsid,derived$fcnum00,sep="")
accelerometer$id <- paste(accelerometer$mcsid,accelerometer$fcnum00,sep="")

1.2.3.4 Drop extra `$mcsid`’s and `$fcnum00`’s so that they don’t become `.x` , `.y` …

derived<-subset(derived,select=c("id",
                                 "fccsex00",
                                 "fccdbm00",
                                 "fccdby00",
                                 "fccage00",
                                 "fdce0600",
                                 "fcmcs6ag",
                                 "fcbmin6"))

accelerometer <- subset(accelerometer,select = c( "id",
                                                  "fcaccmonth",
                                                  "fcaccweekday",
                                                  "fcacc_n_valid_hrs",
                                                  "fcacc_mean_acc_24h",
                                                  "fcacc_mean_acc_5am_9am",
                                                  "fcacc_m5_hour_start",
                                                  "fcacc_m5_mean_acc_mg_24h",
                                                  "fcacc_l5_hour_start",
                                                  "fcacc_l5_mean_acc_mg_24h",
                                                  "fcacc_mvpa_mean_acc_e5sec_100mg",
                                                  "fcacc_mvpa_mean_acc_e1min_100mg",
                                                  "fcacc_mvpa_e5s_b10m80_t100_enmo"))

1.2.3.5 Merge Files into one

Accelerometer contains one row per day per person, so these values are averaged to give a mean value per person over all contributing days

mcs <- merge(interview,derived,by= "id",all = TRUE)

accelerometer <- subset(accelerometer,fcacc_n_valid_hrs > 16)

accel <- accelerometer %>% 
  group_by(id) %>% 
  summarise(fcaccmonth = mean(fcaccmonth,na.rm=T),
            fcaccweekday = mean(fcaccweekday,na.rm=T),
            fcacc_n_valid_hrs = mean(fcacc_n_valid_hrs,na.rm=T),
            fcacc_mean_acc_24h = mean(fcacc_mean_acc_24h,na.rm=T),
            fcacc_mean_acc_5am_9am = mean(fcacc_mean_acc_5am_9am,na.rm=T),
            fcacc_m5_hour_start = mean(fcacc_m5_hour_start,na.rm=T),
            fcacc_m5_mean_acc_mg_24h = mean(fcacc_m5_mean_acc_mg_24h,na.rm=T),
            fcacc_l5_hour_start = mean(fcacc_l5_hour_start,na.rm=T),
            fcacc_l5_mean_acc_mg_24h = mean(fcacc_l5_mean_acc_mg_24h,na.rm=T),
            fcacc_mvpa_mean_acc_e5sec_100 = mean(fcacc_mvpa_mean_acc_e5sec_100mg,na.rm=T),
            fcacc_mvpa_mean_acc_e1min_100 = mean(fcacc_mvpa_mean_acc_e1min_100mg,na.rm=T)
            )

mcs <- merge(mcs,accelerometer,by= "id",all = TRUE)

mcs <- mcs %>%
  group_by(id) %>%             
  fill(mcsid     , .direction = "updown") %>%                      
  fill(...)

mcs <- subset(mcs,!is.na(fcacc_n_valid_hrs))
rm(accel,accelerometer,derived,interview)

1.2.3.6 Recode variables

1.2.3.6.1 Recode Gender to Boy/Girl

mcs$gender <- ifelse(mcs$fccsex00==1,"boy",
                       ifelse(mcs$fccsex00==2,"girl",
                              ifelse(mcs$fccsex00==-1,NA,NA)))
mcs <- subset(mcs,select = -c(fccsex00))

1.2.3.6.2 Recoding height to remove `NAs`

NA is -1

1.2.3.6.3 Recoding Weight

Collapse to one row per person

mcs_merge = mcs %>% 
  group_by(id) %>%
  summarise(id = first(id)...)

2 Analysis

2.0.0.1 Summary Tables

Summary statistics for

2.0.0.1.1 BaYaka children

Table One summary of complete dataframe by demographic data, stratified by gender
	Overall	boy	girl
n	51	29	22
age (mean (SD))	10.90 (4.35)	10.20 (4.60)	11.81 (3.93)
year = 2022 (%)	23 (45.1)	12 (41.4)	11 (50.0)
mean_acc (mean (SD))	46.60 (13.23)	48.12 (15.45)	44.61 (9.55)
mvpa (mean (SD))	199.95 (70.94)	206.24 (75.14)	191.66 (65.79)
sedentary (mean (SD))	1066.63 (89.69)	1071.05 (99.89)	1060.79 (76.06)
wakeful_sed (mean (SD))	585.87 (129.17)	583.07 (133.14)	589.55 (126.74)
light (mean (SD))	182.74 (38.38)	174.24 (38.69)	193.94 (35.78)
moderate (mean (SD))	179.90 (60.45)	183.45 (64.96)	175.21 (55.06)
vigorous (mean (SD))	19.36 (14.82)	21.64 (17.10)	16.36 (10.79)

2.0.0.1.2 NHANES

Table One summary of complete dataframe by demographic data, stratified by gender
	Overall	boy	girl
n	3308	1699	1609
age (mean (SD))	10.65 (4.55)	10.55 (4.55)	10.76 (4.56)
ridreth1 (%)
1	729 (22.0)	350 (20.6)	379 (23.6)
2	333 (10.1)	176 (10.4)	157 ( 9.8)
3	854 (25.8)	462 (27.2)	392 (24.4)
4	844 (25.5)	441 (26.0)	403 (25.0)
5	548 (16.6)	270 (15.9)	278 (17.3)
mean_mims (mean (SD))	14711.23 (6051.93)	14563.69 (6220.36)	14860.63 (5874.98)

*NHANES data and documentation is available here

2.0.0.1.3 MCS

Table One summary of complete dataframe by demographic data, stratified by gender
	Overall	boy	girl
n	4509	2169	2340
age (mean (SD))	14.25 (0.34)	14.26 (0.34)	14.24 (0.33)
height (mean (SD))	163.86 (8.07)	166.68 (8.70)	161.23 (6.41)
weight (mean (SD))	57.55 (12.80)	57.90 (13.40)	57.22 (12.18)
bmi (mean (SD))	21.34 (4.07)	20.70 (3.84)	21.95 (4.19)
bodyfat (mean (SD))	21.91 (9.13)	16.46 (7.89)	27.09 (6.97)
season (%)
autumn	961 (22.0)	472 (22.6)	489 (21.5)
spring	1410 (32.3)	671 (32.1)	739 (32.5)
summer	1385 (31.7)	651 (31.1)	734 (32.3)
winter	608 (13.9)	297 (14.2)	311 (13.7)
ethnicity (%)
white	3714 (83.1)	1785 (83.0)	1929 (83.1)
black or black british	116 ( 2.6)	53 ( 2.5)	63 ( 2.7)
indian	112 ( 2.5)	60 ( 2.8)	52 ( 2.2)
pakistani or bangladeshi	251 ( 5.6)	109 ( 5.1)	142 ( 6.1)
mixed	181 ( 4.0)	97 ( 4.5)	84 ( 3.6)
other	97 ( 2.2)	46 ( 2.1)	51 ( 2.2)
ad_mean_enmo_mg_0.24hr (mean (SD))	33.95 (13.25)	36.75 (15.32)	31.36 (10.33)
mvpa (mean (SD))	128.23 (54.62)	133.70 (59.58)	123.16 (49.06)

*MCS data available via UK data service

2.1 Variation Between Individuals

2.1.1 By Each Measure

2.1.1.1 Mean Enmo

Mean ENMO appears to increase by approximately 1mg ENMO for each additional year of age. Sample size is small, but implications are an increase with age. s put forward in Rowlands et al., 2020 , 1mg ENMO should be the minimum clinically important difference between inactive individuals, with each 1mg equivalent to an additional 500 steps per day (in adults), or a 5 minute brisk walk.

	mean_acc	mean_acc
Predictors	Estimates	Estimates
(Intercept)	37.30 ^*** (27.41 – 47.19)	37.69 ^*** (25.91 – 49.48)
age	0.87 ^* (0.10 – 1.64)	0.83 (-0.14 – 1.80)
gender [girl]	-4.51 (-11.14 – 2.12)	-5.65 (-25.42 – 14.12)
year [2018]	4.58 (-1.97 – 11.13)	4.55 (-2.10 – 11.20)
age × gender [girl]		0.10 (-1.53 – 1.73)
Random Effects
σ²	82.13	82.03
τ₀₀	95.31 _{device_sn}	98.64 _{device_sn}
ICC	0.54	0.55
N	51 _{device_sn}	51 _{device_sn}
Observations	150	150
Marginal R² / Conditional R²	0.099 / 0.583	0.097 / 0.590
p<0.05 p<0.01 * p<0.001

2.1.1.2 MVPA

MVPA appears to increase by a volume of 11 minutes a day per additional year of age.

	mvpa	mvpa
Predictors	Estimates	Estimates
(Intercept)	119.54 ^*** (72.18 – 166.90)	125.31 ^*** (69.11 – 181.50)
age	7.94 ^*** (4.22 – 11.65)	7.41 ^** (2.78 – 12.04)
gender [girl]	-24.06 (-56.07 – 7.96)	-41.25 (-136.70 – 54.20)
year [2018]	9.53 (-22.96 – 42.02)	8.94 (-23.83 – 41.70)
age × gender [girl]		1.50 (-6.34 – 9.34)
Random Effects
σ²	3231.56	3247.39
τ₀₀	1685.27 _{device_sn}	1704.61 _{device_sn}
ICC	0.34	0.34
N	51 _{device_sn}	51 _{device_sn}
Observations	147	147
Marginal R² / Conditional R²	0.193 / 0.469	0.192 / 0.470
p<0.05 p<0.01 * p<0.001

2.1.1.3 Sedentary

The volume of sedentary activity appears to fall with increasing age

	sedentary	sedentary
Predictors	Estimates	Estimates
(Intercept)	1167.92 ^*** (1104.93 – 1230.91)	1163.49 ^*** (1088.61 – 1238.37)
age	-9.94 ^*** (-14.85 – -5.03)	-9.53 ^** (-15.67 – -3.38)
gender [girl]	7.59 (-34.76 – 49.93)	20.45 (-105.94 – 146.84)
year [2018]	-6.66 (-48.68 – 35.36)	-6.27 (-48.84 – 36.30)
age × gender [girl]		-1.14 (-11.56 – 9.28)
Random Effects
σ²	4090.25	4094.81
τ₀₀	3577.49 _{device_sn}	3677.25 _{device_sn}
ICC	0.47	0.47
N	51 _{device_sn}	51 _{device_sn}
Observations	148	148
Marginal R² / Conditional R²	0.186 / 0.566	0.183 / 0.570
p<0.05 p<0.01 * p<0.001

2.1.1.3.1 Of which: wakeful

	wakeful_sed	wakeful_sed
Predictors	Estimates	Estimates
(Intercept)	647.75 ^*** (546.18 – 749.31)	601.90 ^*** (486.38 – 717.42)
age	-5.37 (-13.39 – 2.65)	-1.23 (-10.77 – 8.32)
gender [girl]	30.00 (-39.00 – 98.99)	176.04 (-23.32 – 375.41)
year [2018]	-27.00 (-99.55 – 45.54)	-21.78 (-92.85 – 49.30)
age × gender [girl]		-12.63 (-28.98 – 3.72)
Random Effects
σ²	21763.38	22115.31
τ₀₀	5575.15 _{device_sn}	4579.81 _{device_sn}
ICC	0.20	0.17
N	51 _{device_sn}	51 _{device_sn}
Observations	145	145
Marginal R² / Conditional R²	0.027 / 0.225	0.050 / 0.213
p<0.05 p<0.01 * p<0.001

2.1.1.4 Light Activity

Marginal changes in light activity. As such, transition in mean ENMO with age seems to be a result of increased time in MVPA exchanged with time spent sedentary.

	light	light
Predictors	Estimates	Estimates
(Intercept)	146.20 ^*** (116.78 – 175.62)	148.29 ^*** (113.26 – 183.32)
age	2.52 ^* (0.23 – 4.81)	2.32 (-0.55 – 5.20)
gender [girl]	14.28 (-5.49 – 34.05)	8.07 (-50.56 – 66.70)
year [2018]	5.47 (-14.10 – 25.03)	5.29 (-14.55 – 25.14)
age × gender [girl]		0.55 (-4.29 – 5.39)
Random Effects
σ²	699.95	699.82
τ₀₀	847.31 _{device_sn}	872.91 _{device_sn}
ICC	0.55	0.56
N	51 _{device_sn}	51 _{device_sn}
Observations	145	145
Marginal R² / Conditional R²	0.108 / 0.597	0.107 / 0.603
p<0.05 p<0.01 * p<0.001

2.1.2 Overall

2.1.2.1 Table

	mean_acc	mean_acc	mvpa	mvpa	light	light	sedentary	sedentary	wakeful_sed	wakeful_sed
Predictors	Estimates	Estimates	Estimates	Estimates	Estimates	Estimates	Estimates	Estimates	Estimates	Estimates
(Intercept)	37.3 ^*** (27.4 – 47.2)	37.7 ^*** (25.9 – 49.5)	119.5 ^*** (72.2 – 166.9)	125.3 ^*** (69.1 – 181.5)	146.2 ^*** (116.8 – 175.6)	148.3 ^*** (113.3 – 183.3)	1167.9 ^*** (1104.9 – 1230.9)	1163.5 ^*** (1088.6 – 1238.4)	647.7 ^*** (546.2 – 749.3)	601.9 ^*** (486.4 – 717.4)
age	0.9 ^* (0.1 – 1.6)	0.8 (-0.1 – 1.8)	7.9 ^*** (4.2 – 11.7)	7.4 ^** (2.8 – 12.0)	2.5 ^* (0.2 – 4.8)	2.3 (-0.6 – 5.2)	-9.9 ^*** (-14.8 – -5.0)	-9.5 ^** (-15.7 – -3.4)	-5.4 (-13.4 – 2.7)	-1.2 (-10.8 – 8.3)
gender [girl]	-4.5 (-11.1 – 2.1)	-5.6 (-25.4 – 14.1)	-24.1 (-56.1 – 8.0)	-41.2 (-136.7 – 54.2)	14.3 (-5.5 – 34.1)	8.1 (-50.6 – 66.7)	7.6 (-34.8 – 49.9)	20.5 (-105.9 – 146.8)	30.0 (-39.0 – 99.0)	176.0 (-23.3 – 375.4)
year [2018]	4.6 (-2.0 – 11.1)	4.5 (-2.1 – 11.2)	9.5 (-23.0 – 42.0)	8.9 (-23.8 – 41.7)	5.5 (-14.1 – 25.0)	5.3 (-14.6 – 25.1)	-6.7 (-48.7 – 35.4)	-6.3 (-48.8 – 36.3)	-27.0 (-99.5 – 45.5)	-21.8 (-92.9 – 49.3)
age × gender [girl]		0.1 (-1.5 – 1.7)		1.5 (-6.3 – 9.3)		0.6 (-4.3 – 5.4)		-1.1 (-11.6 – 9.3)		-12.6 (-29.0 – 3.7)
Random Effects
σ²	82.13	82.03	3231.56	3247.39	699.95	699.82	4090.25	4094.81	21763.38	22115.31
τ₀₀	95.31 _{device_sn}	98.64 _{device_sn}	1685.27 _{device_sn}	1704.61 _{device_sn}	847.31 _{device_sn}	872.91 _{device_sn}	3577.49 _{device_sn}	3677.25 _{device_sn}	5575.15 _{device_sn}	4579.81 _{device_sn}
ICC	0.54	0.55	0.34	0.34	0.55	0.56	0.47	0.47	0.20	0.17
N	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}	51 _{device_sn}
Observations	150	150	147	147	145	145	148	148	145	145
Marginal R² / Conditional R²	0.099 / 0.583	0.097 / 0.590	0.193 / 0.469	0.192 / 0.470	0.108 / 0.597	0.107 / 0.603	0.186 / 0.566	0.183 / 0.570	0.027 / 0.225	0.050 / 0.213
p<0.05 p<0.01 * p<0.001

2.1.2.2 Figure

2.1.3 Mean Acceleration in NHANES

Trend in ages appear to be diverging, with BaYaka level/increasing with age compared to a decrease in NHANES. Scales are how differing, so going to z-score and re-plot to examine differences between the populations.

Using Z-Scores, For the BaYaka Pa is lowest through late childhood (ages 5-11) and increases into adulthood. Conversely, in the US/NHANES sample, PA is highest during early childhood , peaking at ages 6-7, before decreasing through to adulthood, at which point it starts to plateau.

Age is associated with an increasing PA score but only when the sample is from the BaYaka, with a low intercept that has a positive function with age. NHANES starts higher at younger (marked by a positive estimate for NHANES), but decreases from there across childhood, marked by a negative estimate for age:populationNHANES

	pa_z
Predictors	Estimates
(Intercept)	0.07 (-0.17 – 0.31)
gender [girl]	0.05 (-0.01 – 0.11)
age z	0.22 (-0.01 – 0.46)
population NHANES	-0.06 (-0.30 – 0.18)
age z × population NHANES	-0.51 ^*** (-0.75 – -0.28)
Random Effects
σ²	0.35
τ₀₀ _id	0.57
ICC	0.62
N _id	2771
Observations	14129
Marginal R² / Conditional R²	0.086 / 0.649
p<0.05 p<0.01 * p<0.001

2.2 Variation within days

2.2.1 Overall

To analyse variation within days, only thoes with from the 2022 field visit who had multiple days of recording were included.

For each individual in the 2022 data, their mean acceleration at each hour averaged across the 5 days of recording is presented as a single point, with colour corresponding to gender. The mean acceleration was calculated for each hour interval and presented at the half hour (i.e., the mean acceleration between 09:00 and 10:00 is presented at 09:30). Presented light phases were taken from the nearest town on the first day of recording. The dashed horizontal lines represent the mean time of wake and sleep onset across the sample.

Sunrise in Pokola on 13/2/22 was at 06:06, with morning twilight beginning at 05:45. Mean wake-up time in this sample was 05:49 with the average onset of sleep at 21:46 as detected by the device

2.2.2 Analysing Using mixed effects

Mixed effect models were also used to analyse differences in activity across the day. Mean accelerations were averaged over 3-hour windows, for each day of recording for each child. The effect of time of day (in 3-hour windows) was modelled against the mean acceleration in that window, with the fixed covariates of age and gender included. To account for the nested clustering of results within days and within individuals, random effects of day and individual ID were included to reflect this nesting of groups.

Plot here is similar to previous, but here each 3hour block is presented separately to display differences between groups of times. As before, mean ENMO rises through the morning, with the highest values observed before 12, then remains high through the afternoon before decreasing into the afternoon

2.2.3 Variation in sleep timing

Device estimated (a) wake up and (b) sleep onset times across individuals. Dashed red line represents the estimate from linear model. The algorithm defines the longest sustained inactive bout within each day as sleep, allowing for the sleep bout to be disrupted for up to one hour before. Sleep is classified by GGIR as any block of 10 minutes, in which the average movement of the device is less than 5o in any axis over each 5 second epoch. From this the estimated onset and end of sleep windows can be calculated as the final and first timepoint in each day in which an individual was not in a device defined bout of sleep Y-axes are matched in length to 9 hours each to aid interpretation.

The standard deviation in wakeup time was NA compared to NA

2.2.4 Variation between days

To assess the variation between days in the BaYaka and NHANES sample, intercept only mixed effect models are employed.

Violin plot of variation within Individuals

2.2.4.1 Between Populations

In comparing ICCs between studies, values can vary between 0 and 1, with lower values indicating greater variation within an individual. a score of 1 would indicate that all observations for an individual were equal to their means.

Using intercept only mixed effect models on standardised mean accelerations amongst BaYaka children with five days of recording (N=23) to match the length of recording in NHANES, a lower intra-class correlation (ICC) (0.51) (Left Hand Column) was observed for BaYaka children than US children (0.65).

	pa_z	pa_z
Predictors	Estimates	Estimates
(Intercept)	-0.00 (-0.32 – 0.32)	0.04 ^* (0.01 – 0.07)
Random Effects
σ²	0.51	0.35
τ₀₀	0.51 _{device_sn}	0.65 _id
ICC	0.50	0.65
N	23 _{device_sn}	2720 _id
Observations	115	13979
Marginal R² / Conditional R²	0.000 / 0.497	0.000 / 0.648
p<0.05 p<0.01 * p<0.001

BaYaka Child Data Analysis

Luke Kretschmer

2023-11-17