Introduction

Indoor air pollution in the form of particulate matter (PM2.5) is found at hazardous levels in 86% of Nairobi slum households (Muindi et al., 2017). As a result of poverty and poor availability of public goods, such as natural gas and electricity, cheap but polluting biomass fuels are used for cooking, lighting and heating (Muindi et al., 2017; Nlom & Karimov, 2015). Studies have shown an association between indoor air pollution and respiratory symptoms and illnesses among children aged under five years in the slums of Nairobi (Egondi et al., 2018). It is thought that the burden of exposure to indoor air pollution, and the resulting health consequences, falls disproportionately on women and girls who use the cooking stove (Ezzati & Kammen, 2002).

This study looks at symptoms or conditions potentially aggravated by air pollution, and their relationship to gender, among school-age children in the Kibera informal settlement (‘slum’).

Method

School-age children (primary to secondary level) were seen in a medical screening clinic in Nairobi, Kenya, from 3rd January 2020 to 10th January 2020. The clinic was conducted by the Spur Afrika child development program. Children were drawn from those supported by the Spur Afrika program and also from entire year-levels of primary and secondary schools in the Kibera area. Kibera is an area of informal dwellings (slums) with a population of approximately 900,000 in an area of 2.5 square kilometres, five kilometres from the Nairobi city centre. Kibera has had low levels of urban planning, low levels of public goods and high levels of poverty (Mutisya & Yarime, 2011).

Children seen from the Kibera schools were not screened or selected prior to being seen by the physicians. Five physicians saw children from the schools and the Spur Afrika program. Physicians recorded their open-ended consultations with the children, including symptoms, diagnoses and management plans. As the consultations were open-ended and the backgrounds of the physicians varied widely, variation between the recordings of different physicians is expected. At the time of writing (13th May 2020), data entry for one physician (Dr. David Fong) is complete. Only data derived from the recordings by Dr. David Fong is used in this interim analysis.

The study looks at symptoms or conditions potentially attributable to air pollution (respiratory, respiratory tract and eye symptoms and conditions) and the relationship with gender. Other potential effects are also examined in the supplemental analysis. The potential ‘random effects’ are age and school. The other potential ‘fixed effect’ is being supported by the Spur Afrika child development program.

Data analysis

# required libraries

library(airtabler) # database access
library(dplyr)     # data manipulation
library(tidyr)     # 'tidy' data
library(magrittr)  # piping
library(lubridate) # time functions
library(finalfit)  # summary tables
library(lme4)      # regressions
library(jtools)    # pretty regression tables
library(summarytools) # data frame summaries (dfSummary)
library(kableExtra)  # pretty tables
library(highcharter) # charts
library(viridisLite) # colour palettes
library(sjPlot)      # regression table display, requires recent version
# read database

airtable <- airtabler::airtable(base_key, "Children")
rawdata <- airtable$Children$select_all()
  • Gender Male, Female and NA. ‘NA’ is ‘not available’
  • School School attended (if recorded)
  • Clinician The clinician/‘surveyor’ who saw the child.
  • Sponsored Whether the child is supported by the Spur Afrika child development program.
# choose the required data

data <- rawdata %>%
  select(c("id", "Gender", "Date of birth", "Date seen",
    "School", "Diagnosis", "Clinician", "Sponsored")) %>%
  replace_na(list(Clinician = "Other")) %>%
  mutate(
    Gender = factor(Gender, c("Male", "Female")),
    School = as.factor(School),
    Clinician = as.factor(Clinician),
    Sponsored = replace_na(Sponsored, FALSE)
  )

# Gender, School and Clinician are converted to factors;
# missing Sponsored values are treated as FALSE
  • AgeDays Age in days
  • AgeYears Age in years
  • AgeGroup Grouped into ‘two years’ groups
# calculate ages

data <- data %>%
  mutate(
    `Date seen` = as.Date(`Date seen`),
    `Date of birth` = as.Date(`Date of birth`)
  ) %>%
  mutate(AgeDays = `Date seen` - `Date of birth`) %>%
  mutate(AgeYears = floor(time_length(AgeDays, "years"))) %>%
  mutate(AgeGroup = as.factor(
    paste0(
      as.character(floor((AgeYears+1)/2)*2-1),
      "-",
      as.character(floor((AgeYears+1)/2)*2)
    ) # two year age groups
  ))
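
The two-year grouping arithmetic can be checked in isolation. The sketch below uses base R only; the helper name age_group is hypothetical and simply repeats the floor((AgeYears+1)/2)*2 calculation from the mutate() call above on the ages that appear in this analysis.

```r
# Hypothetical helper: same arithmetic as the AgeGroup mutate() above
age_group <- function(age_years) {
  upper <- floor((age_years + 1) / 2) * 2  # upper bound of the two-year band
  paste0(upper - 1, "-", upper)            # e.g. "7-8"
}

ages <- 7:16
groups <- age_group(ages)

# One row per age, showing the band it falls into
print(data.frame(AgeYears = ages, AgeGroup = groups))
```

Ages 7 and 8 share the band "7-8", ages 9 and 10 share "9-10", and so on, so the ten ages from 7 to 16 collapse into five bands.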

Diagnoses/conditions e.g. asthma, bronchitis.

Findings e.g. cough, itchy eyes.

In the database, both diagnoses and findings information are found in the Diagnosis column.

Created summary columns are PollutionDiagnosis, PollutionFinding and PollutionDiagnosisOrFinding. These variables are set to TRUE if the child has a condition or finding which could be aggravated or caused by air pollution.

# make the data 'wide', a column for each finding, diagnosis/condition

wide_data <- data %>%
  unnest(Diagnosis) %>% # each finding/condition/diagnosis has its own row
  mutate(yesno = TRUE) %>% # 'dummy' column
  distinct() %>%           # get rid of duplicates (there shouldn't be any)
  spread(Diagnosis, yesno, fill = FALSE) %>% # go 'wide'
  # each finding/diagnosis will now have its own column
  # if the patient had the 'Diagnosis' in their list, then 
  #  the column entry will have 'yesno' = TRUE
  #  otherwise will equal the 'fill' = FALSE
  mutate(
    PollutionDiagnosis = Adenitis | `Allergic Bronchitis` |
      `Allergic conjunctivitis` | `Allergic rhinitis` | Asthma | Bronchitis |
      `Nasal polyps` | `Otitis media` |
      `Respiratory tract infection` | Rhinorrhoea | Tonsilitis |
      `Viral pharyngitis` | `viral illness` | `Viral upper respiratory tract infection`,
    PollutionFinding = Cough | `Cough - cold weather` | `Cough - exertional` |
      `Cough - nocturnal` | `Dry eyes` | Dyspnoea | `Dyspnoea - exercise` |
      `Dyspnoea - nocturnal` |
      `Environmental irritant` | `Eye inflammation` | `Eye irritation` |
      `Eye pain` | `Itchy eyes` | `Lacrimation` | `Nasal congestion` |
      `Smoke irritation` | `Sore throat` |
      `Watery eyes` | Wheezing
  ) %>%
  mutate(PollutionDiagnosisOrFinding = PollutionDiagnosis | PollutionFinding)
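
As an illustration of the reshaping step, the sketch below applies the same unnest/spread sequence to a small invented table. The three children, their ids and their diagnoses are made up for the example and are not study data; it assumes that Diagnosis arrives as a list column, as the unnest() call above implies.

```r
library(dplyr)
library(tidyr)

# Invented toy data: one row per child, Diagnosis is a list column
toy <- tibble(
  id = c("a", "b", "c"),
  Diagnosis = list(c("Asthma", "Cough"), "Cough", "Headache")
)

toy_wide <- toy %>%
  unnest(Diagnosis) %>%                      # one row per child per diagnosis
  mutate(yesno = TRUE) %>%                   # 'dummy' column
  distinct() %>%                             # drop any duplicate rows
  spread(Diagnosis, yesno, fill = FALSE) %>% # one logical column per diagnosis
  mutate(PollutionDiagnosisOrFinding = Asthma | Cough)

print(toy_wide)
```

Each child ends up on a single row with one TRUE/FALSE column per diagnosis, so the summary flag is a plain logical OR across the relevant columns.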

Data filtering

Only children up to the age of 16 years are included in the analysis.

Only children seen by the clinician ‘David Fong’ are included in the analysis.

# filter by age and clinician

filtered_wide_data <- wide_data %>%
  filter(Clinician == "David Fong") %>% # just the children seen by this clinician
  filter(AgeYears <= 16) # children only
# there are some respondents who are in university etc. and 20+ years old

Summary statistics

x <- filtered_wide_data %>%
    select(AgeYears, Gender, PollutionDiagnosisOrFinding, Sponsored)
dfSummary(
  x,
  plain.ascii = FALSE,
  headings = FALSE,
  graph.col = FALSE,
  graph.magnif = 0.75,
  style = "grid",
  na.col = FALSE,
  tmp.img.dir = "img"
  )

No  Variable                      Stats / Values              Freqs (% of Valid)  Valid
1   AgeYears                      Mean (sd) : 12.3 (1.8)       7 :  1 ( 0.6%)     156 (100%)
    [numeric]                     min < med < max:             8 :  1 ( 0.6%)
                                  7 < 12 < 16                  9 :  9 ( 5.8%)
                                  IQR (CV) : 2 (0.1)          10 : 14 ( 9.0%)
                                                              11 : 25 (16.0%)
                                                              12 : 32 (20.5%)
                                                              13 : 36 (23.1%)
                                                              14 : 21 (13.5%)
                                                              15 : 11 ( 7.0%)
                                                              16 :  6 ( 3.9%)
2   Gender                        1. Male                     85 (54.8%)          155 (99.36%)
    [factor]                      2. Female                   70 (45.2%)
3   PollutionDiagnosisOrFinding   1. FALSE                    96 (61.5%)          156 (100%)
    [logical]                     2. TRUE                     60 (38.5%)
4   Sponsored                     1. FALSE                    125 (80.1%)         156 (100%)
    [logical]                     2. TRUE                     31 (19.9%)

A total of 156 children aged from seven (7) to sixteen (16) years inclusive are included in the analysis.

Eighty-five (85) are recorded as being Male and seventy (70) as being Female. Gender was not recorded for one (1) child.

Sixty (60) reported, or were found to have, a symptom or diagnosis potentially aggravated by air pollution, a prevalence of 38.5%.

Age distribution and Gender

filtered_wide_data %>%
  mutate(Gender = as.character(Gender)) %>%
  replace_na(list(Gender = "Unknown")) %>%
  count(AgeYears, Gender) %>%
  hchart(
    'areaspline',
    hcaes(x = "AgeYears", y = "n", group = "Gender")
  ) %>%
  hc_colors(viridis(3, alpha = 0.5))

A simple logistic model

A simple model predicts the presence of potential air pollution related conditions (e.g. asthma) or symptoms (e.g. cough, eye irritation) based on gender.

The results of this model are very close to those of more complicated models which include the factors Age, School and Sponsorship (Supplement A).

model1 <- glm(
  PollutionDiagnosisOrFinding ~ Gender,
  data = filtered_wide_data,
  family = binomial(link = "logit")
)
summ(
  model1,
  confint = TRUE, # include confidence interval
  exp = TRUE,     # logistic model: exponentiate coefficients to report odds ratios
  digits = 4
)

Observations               155 (1 missing obs. deleted)
Dependent variable         PollutionDiagnosisOrFinding
Type                       Generalized linear model
Family                     binomial
Link                       logit

χ²(1)                      5.9885
Pseudo-R² (Cragg-Uhler)    0.0515
Pseudo-R² (McFadden)       0.0291
AIC                        203.9690
BIC                        210.0559

               exp(Est.)   2.5%     97.5%    z val.    p
(Intercept)    0.4167      0.2613   0.6644   -3.6777   0.0002
GenderFemale   2.2667      1.1700   4.3914    2.4252   0.0153

Standard errors: MLE
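
As a rough consistency check, the interval in the GenderFemale row can be reproduced from the printed odds ratio and z value alone. The sketch below assumes the interval is a Wald interval (estimate plus or minus about 1.96 standard errors on the log-odds scale); the variable names are illustrative and the two input numbers are copied from the output above.

```r
# Values copied from the GenderFemale row of the summ() output
odds_ratio <- 2.2667
z_value    <- 2.4252

log_or  <- log(odds_ratio)   # estimate on the log-odds scale
std_err <- log_or / z_value  # because z = estimate / standard error

# Wald 95% interval, exponentiated back to the odds-ratio scale
wald_ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * std_err)

print(round(std_err, 4))
print(round(wald_ci, 2))
```

The resulting bounds should come out close to the printed 1.17 and 4.39, which suggests the reported interval is of the Wald type.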


Findings

Female children had an estimated odds ratio of 2.27 (95% confidence interval 1.17 to 4.39) of having potentially air-pollution related conditions or symptoms (such as cough, asthma or eye irritation) compared to male children.
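
To make the odds ratio more concrete, the exponentiated coefficients printed for the simple model can be converted into approximate fitted probabilities. This is a worked sketch using only those two printed values; the symbols (odds and p with subscripts M and F, and the coefficient names) are introduced here for illustration.

```latex
\[
\begin{aligned}
\text{odds}_{M} &= e^{\hat\beta_0} = 0.4167, &
p_{M} &= \frac{0.4167}{1 + 0.4167} \approx 0.29,\\
\text{odds}_{F} &= e^{\hat\beta_0}\, e^{\hat\beta_1} = 0.4167 \times 2.2667 \approx 0.9445, &
p_{F} &= \frac{0.9445}{1 + 0.9445} \approx 0.49.
\end{aligned}
\]
```

On these estimates, roughly 29% of the boys and 49% of the girls seen had a potentially pollution-related condition or symptom.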

Limitations and Further study

Although this study is consistent with the suspected relationship between air pollution symptoms/conditions and female gender, cause and effect is not established.

When asked what their favourite activity was, some girls responded ‘cooking’ (and no boys said the same). However, this study did not systematically tally the children's favourite activities, nor did it record the time spent in the vicinity of a cooking stove or the nature of cooking fuels in the dwellings.

Supplement A - comparison to more complex models

Adding Sponsored status to Gender as a predictor does not substantially improve predictive power (approximately the same Akaike Information Criterion ‘AIC’) and does not substantially change the estimated coefficient for ‘GenderFemale’, or the p-value for the ‘GenderFemale’ estimate.

The p-value for the effect of being sponsored is greater than \(0.10\). It would be interesting to see if the ‘negative’ association (an odds ratio below one) is sustained when more data is available.

model2 <- glm(
  PollutionDiagnosisOrFinding ~ Gender + Sponsored,
  data = filtered_wide_data,
  family = binomial(link = "logit")
)

Conditions or symptoms aggravated by air pollution could change with age, e.g. involvement with cooking could change with age. Respiratory symptoms and conditions, even in the absence of cooking pollution exposure, also change during childhood.

Exposure to air pollution, and other factors which might influence respiratory or eye conditions and symptoms, might be different in different geographical locations. School location might be a proxy for geographical location. However, all the surveyed children are based in the Kibera area, and most of the schools they attend are within it. Kibera is only 2.5 square kilometres in area.

Adding School and AgeGroup as random effects to Gender does not substantially change the estimated coefficient for Gender, or the p-value for the estimate.

The reported variance of the random effects School and AgeGroup is low (close to zero), suggesting that both School and AgeGroup have little predictive power in this analysis.

model3 <- glmer(
  PollutionDiagnosisOrFinding ~ Gender + (1 | School) + (1 | AgeGroup),
  data = filtered_wide_data,
  family = binomial(link = "logit")
)
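
Written out, the random-intercept model fitted above has roughly the following form. The notation (the coefficients, the random intercepts u and v, and the index functions s(i) and a(i)) is introduced here for illustration and does not appear in the original analysis.

```latex
\[
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1\,\mathrm{Female}_i + u_{s(i)} + v_{a(i)},
\qquad
u_s \sim \mathcal{N}\!\left(0,\ \tau_{\mathrm{School}}\right),\quad
v_a \sim \mathcal{N}\!\left(0,\ \tau_{\mathrm{AgeGroup}}\right)
\]
```

Here Y_i is PollutionDiagnosisOrFinding for child i, s(i) and a(i) are that child's school and age group, and exp(β1) is the odds ratio for girls relative to boys. The two τ terms are the random-intercept variances, which is what a variance close to zero for School and AgeGroup refers to.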

Three models, each restricted to children seen by Dr. David Fong, are compared below (from left to right):

  1. The ‘simple’ model, Gender as sole predictor
  2. Gender and Sponsored as predictors
  3. Gender as predictor, School and AgeGroup as random effects
tab_model(model1, model2, model3, show.aic = TRUE)

Dependent variable (all models): PollutionDiagnosisOrFinding. OR = odds ratio.

                    model1                      model2                      model3
Predictors          OR     CI            p      OR     CI            p      OR     CI            p
(Intercept)         0.42   0.26 – 0.66   <0.001 0.47   0.29 – 0.76   0.003  0.43   0.27 – 0.69   0.001
Gender [Female]     2.27   1.18 – 4.43   0.015  2.30   1.19 – 4.53   0.015  2.26   1.15 – 4.46   0.018
SponsoredTRUE                                   0.48   0.19 – 1.15   0.113

Random Effects
σ2                                                                          3.29
τ00 School                                                                  0.00
τ00 AgeGroup                                                                0.00
N School                                                                    22
N AgeGroup                                                                  5

Observations        155                         155                         147
R2 Tjur             0.039                       0.055                       0.048 / NA
AIC                 203.969                     203.291                     198.605


As of 13th May 2020, survey data from surveyors/clinicians other than Dr. David Fong is incomplete. Survey data was restricted to Dr. David Fong in the preliminary analysis both for ease of analysis and also to have more consistent prevalence figures.

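The finding of a significant difference between girls and boys in potentially pollution-aggravated symptoms/findings continues to hold at the \(p<0.05\) level when findings from all clinicians are included, but only if the ‘surveyor’ effect is included as a fixed effect. In the models tabulated below, where Clinician is instead treated as a random effect, the Gender estimate does not reach that level (p = 0.196 and p = 0.220).

The ‘surveyor’ effect is considerable: the variance of the Clinician random effect is large (τ00 = 0.46, ICC = 0.12 in the model with Clinician as the only random effect).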

filtered_age_wide_data <- wide_data %>%
  filter(AgeYears <= 16)

model4 <- glm(
  PollutionDiagnosisOrFinding ~ Gender,
  data = filtered_age_wide_data,
  family = binomial(link = "logit")
)

model5 <- glmer(
  PollutionDiagnosisOrFinding ~ Gender + (1 | Clinician),
  # 'Clinician' is a random effect
  data = filtered_age_wide_data,
  family = binomial(link = "logit")
)

model6 <- glmer(
  PollutionDiagnosisOrFinding ~ Gender + Sponsored +
    (1 | Clinician) + (1 | School) + (1 | AgeGroup),
  # 'Clinician' is a random effect
  data = filtered_age_wide_data,
  family = binomial(link = "logit")
)

Four models are compared below (from left to right):

  1. The ‘simple’ model, Gender as sole predictor, only children seen by David Fong.
  2. Gender as sole predictor, all children seen, not just those seen by David Fong (a larger number of observations than the first model).
  3. All children seen, Gender as predictor, with Clinician as random effect.
  4. All children seen. Gender and Sponsored as predictors. Clinician, School and AgeGroup as random effects.
tab_model(model1, model4, model5, model6, show.aic = TRUE)

Dependent variable (all models): PollutionDiagnosisOrFinding. OR = odds ratio.

                    model1                      model4                      model5                      model6
Predictors          OR     CI            p      OR     CI            p      OR     CI            p      OR     CI            p
(Intercept)         0.42   0.26 – 0.66   <0.001 0.31   0.23 – 0.41   <0.001 0.27   0.13 – 0.56   <0.001 0.29   0.14 – 0.63   0.001
Gender [Female]     2.27   1.18 – 4.43   0.015  1.18   0.80 – 1.77   0.406  1.32   0.87 – 2.00   0.196  1.30   0.85 – 2.00   0.220
SponsoredTRUE                                                                                           0.41   0.18 – 0.94   0.034

Random Effects
σ2                                                                          3.29                        3.29
τ00 Clinician                                                               0.46                        0.45
τ00 School                                                                                              0.01
τ00 AgeGroup                                                                                            0.02
ICC                                                                         0.12                        0.13
N Clinician                                                                 4                           4
N School                                                                                                37
N AgeGroup                                                                                              7

Observations        155                         515                         515                         506
R2 Tjur             0.039                       0.001                       0.005 / 0.127               0.027 / 0.152
AIC                 203.969                     585.242                     556.290                     546.994
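
The ICC values in the table can be checked by hand from the variance components. The worked sketch below assumes the usual latent-variable convention for a logit model, in which the residual variance is fixed at π²/3 ≈ 3.29; this matches the σ2 shown in the table, but the formula itself is not stated in the original output.

```latex
\[
\mathrm{ICC} = \frac{\sum \tau_{00}}{\sum \tau_{00} + \sigma^2},
\qquad \sigma^2 = \frac{\pi^2}{3} \approx 3.29
\]
\[
\text{Clinician only: } \frac{0.46}{0.46 + 3.29} \approx 0.12,
\qquad
\text{Clinician, School, AgeGroup: } \frac{0.45 + 0.01 + 0.02}{0.48 + 3.29} \approx 0.13
\]
```

Under this reading, roughly 12% to 13% of the latent-scale variance is attributable to the grouping factors, almost all of it to Clinician.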

References

Egondi, T., Ettarh, R., Kyobutungi, C., Ng, N., & Rocklöv, J. (2018). Exposure to Outdoor Particles (PM2.5) and Associated Child Morbidity and Mortality in Socially Deprived Neighborhoods of Nairobi, Kenya. Atmosphere, 9(9), 351. https://doi.org/10.3390/atmos9090351

Ezzati, M., & Kammen, D. M. (2002). The health impacts of exposure to indoor air pollution from solid fuels in developing countries: Knowledge, gaps, and data needs. Environmental Health Perspectives, 110(11), 1057–1068.

Muindi, K., Ng, N., Rocklöv, J., Kimani-Murage, E., & Thynell, M. (2017). Air pollution in Nairobi slums: sources, levels and lay perceptions. Umeå University.

Mutisya, E., & Yarime, M. (2011). Understanding the Grassroots Dynamics of Slums in Nairobi: The Dilemma of Kibera Informal Settlements. International Transaction Journal of Engineering, Management, & Applied Sciences & Technologies, 2, 197–213.

Nlom, J. H., & Karimov, A. A. (2015). Modeling Fuel Choice among Households in Northern Cameroon. Sustainability, 7(8), 9989–9999. https://doi.org/10.3390/su7089989