```r
# Processing the raw data
library(dplyr)
library(stringr)
library(lubridate)
library(janitor)  # provides clean_names()

df_clean <- df %>%
  clean_names() %>%
  mutate(
    # 1. Convert Excel serial dates and fix typos
    date_onset = case_when(
      # Five-digit strings are Excel serial day counts
      str_detect(date_of_onset_of_fever, "^[0-9]{5}$") ~
        as.Date(as.numeric(date_of_onset_of_fever), origin = "1899-12-30"),
      # Year mistyped as 1012 instead of 2012
      str_detect(date_of_onset_of_fever, "-1012$") ~
        dmy(str_replace(date_of_onset_of_fever, "-1012$", "-2012")),
      # Otherwise fall back to the interview date, dropping the " UTC" suffix
      TRUE ~ ymd(str_remove(date_of_interview, " UTC"))
    ),
    # 2. Standardize age
    age_years = as.numeric(age),
    # right = FALSE gives left-closed bands, so age 0 falls in "<5"
    # and age 5 falls in "5-17", matching the labels
    age_group = cut(age_years,
                    breaks = c(0, 5, 18, 50, 65, Inf),
                    labels = c("<5", "5-17", "18-49", "50-64", "65+"),
                    right = FALSE),
    # 3. Outcome variable
    severity = factor(type_of_case, levels = c("ILI", "SARI")),
    # 4. Comorbidity logic (expanded to include pregnancy, asthma, cancer)
    # Numeric 0/1 versions are created for the sum; anything other than
    # "Yes" (including missing values) counts as 0
    across(c(diabetes, hiv_aids, heart_disease, asthma, cancer, pregnancy),
           ~case_when(. == "Yes" ~ 1, TRUE ~ 0), .names = "{.col}_num"),
    # 5. Create comorbidity count (summing the numeric versions)
    comorbid_count = diabetes_num + hiv_aids_num + heart_disease_num +
      asthma_num + cancer_num + pregnancy_num,
    comorbid_factor = factor(case_when(
      comorbid_count == 0 ~ "0",
      comorbid_count == 1 ~ "1",
      comorbid_count >= 2 ~ "2+",
      TRUE ~ "0"
    ), levels = c("0", "1", "2+"))
  ) %>%
  # Filter for influenza positives
  filter(influenza_type %in% c("Flu A", "Flu B", "Flu A&B"))

# 6. Standardizing subtypes
df_clean <- df_clean %>%
  mutate(
    # Upper-case and trim first so spelling variants collapse
    influenza_sub_type = str_trim(str_to_upper(influenza_sub_type)),
    influenza_sub_type = case_match(
      influenza_sub_type,
      c("B VICTORIA", "B/VICTORIA") ~ "B/Victoria",
      c("NOT SUBTYPED", "NOT_SUBTYPED") ~ "Not Subtyped",
      c("A/H1", "2009 A/H1N1", "A/H1N1") ~ "A(H1N1)pdm09",
      "A/H3" ~ "A/H3N2",
      "ALL NEGATIVE" ~ "Negative",
      .default = influenza_sub_type
    ),
    influenza_sub_type = factor(influenza_sub_type)
  )
```
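To illustrate the date repairs in step 1, here is a minimal base-R sketch on made-up input strings. The values in `raw` and the helper `parse_onset()` are hypothetical, not taken from the dataset or the pipeline. It uses `as.Date()` and `sub()` in place of the lubridate and stringr helpers, and for simplicity applies the fallback rule to the same vector, whereas the pipeline above reads `date_of_interview` in that branch.

```r
# Hypothetical raw values: an Excel serial, a mistyped year, a clean timestamp
raw <- c("41000", "15-03-1012", "2012-06-20 UTC")

parse_onset <- function(x) {
  out <- rep(as.Date(NA), length(x))
  is_serial <- grepl("^[0-9]{5}$", x)   # Excel serial day count
  is_typo   <- grepl("-1012$", x)       # year typed as 1012 instead of 2012
  is_other  <- !is_serial & !is_typo

  # Excel serials count days from 1899-12-30
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Repair the year, then parse as day-month-year
  out[is_typo] <- as.Date(sub("-1012$", "-2012", x[is_typo]),
                          format = "%d-%m-%Y")
  # Drop the " UTC" suffix and parse as year-month-day
  out[is_other] <- as.Date(sub(" UTC$", "", x[is_other]),
                           format = "%Y-%m-%d")
  out
}

parsed <- parse_onset(raw)
format(parsed)
# "2012-04-01" "2012-03-15" "2012-06-20"
```
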
| Patient Characteristic | Overall N = 432¹ | ILI N = 362¹ | SARI N = 70¹ | p-value² |
|---|---|---|---|---|
| Age Group (Years) | | | | 0.7 |
| <5 | 254 (59%) | 211 (59%) | 43 (61%) | |
| 5-17 | 72 (17%) | 59 (16%) | 13 (19%) | |
| 18-49 | 79 (18%) | 70 (19%) | 9 (13%) | |
| 50-64 | 16 (3.7%) | 13 (3.6%) | 3 (4.3%) | |
| 65+ | 9 (2.1%) | 7 (1.9%) | 2 (2.9%) | |
| Sex | | | | >0.9 |
| Female | 223 (52%) | 187 (52%) | 36 (51%) | |
| Male | 208 (48%) | 174 (48%) | 34 (49%) | |
| Missing | 1 (0.2%) | 1 (0.3%) | 0 (0%) | |
| Diabetes Mellitus | | | | 0.4 |
| Missing | 13 (3.0%) | 10 (2.8%) | 3 (4.3%) | |
| No | 415 (96%) | 349 (96%) | 66 (94%) | |
| Yes | 4 (0.9%) | 3 (0.8%) | 1 (1.4%) | |
| HIV/AIDS Positive | | | | >0.9 |
| Missing | 3 (0.7%) | 3 (0.8%) | 0 (0%) | |
| No | 419 (97%) | 350 (97%) | 69 (99%) | |
| Yes | 10 (2.3%) | 9 (2.5%) | 1 (1.4%) | |
| Chronic Heart Disease | 2 (0.5%) | 2 (0.6%) | 0 (0%) | >0.9 |
| Asthma | 5 (1.2%) | 3 (0.8%) | 2 (2.9%) | 0.2 |
| Cancer/Malignancy | 1 (0.2%) | 1 (0.3%) | 0 (0%) | >0.9 |
| Pregnancy Status | | | | 0.7 |
| Missing | 12 (2.8%) | 11 (3.0%) | 1 (1.4%) | |
| No | 420 (97%) | 351 (97%) | 69 (99%) | |
| Number of Comorbidities | | | | 0.8 |
| 0 | 410 (95%) | 344 (95%) | 66 (94%) | |
| 1 | 22 (5.1%) | 18 (5.0%) | 4 (5.7%) | |
| 2+ | 0 (0%) | 0 (0%) | 0 (0%) | |

¹ n (%)
² Fisher’s exact test

A total of 432 laboratory-confirmed influenza cases were included in the analysis, comprising 362 (83.8%) patients with Influenza-Like Illness (ILI) and 70 (16.2%) with Severe Acute Respiratory Infection (SARI). The demographic and clinical profiles of these patients are summarized in Table 1.
The study population was predominantly pediatric, with children under five years of age accounting for 58.8% (n=254) of the total cohort. The proportion of children under five was slightly higher among SARI cases (61.4%) compared to ILI cases (58.3%), though this difference was not statistically significant (p=0.7). Sex distribution was nearly equal, with females representing 51.6% (n=223) of all cases. No significant association was found between sex and disease severity (p>0.9).
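The age bands reported above depend on how the band edges are handled. The sketch below uses made-up ages (not study data) and assumes left-closed intervals, which is what the labels "<5" and "5-17" imply. It also shows what base R's `cut()` does by default with the same breaks, since the default right-closed intervals would place a five-year-old in "<5" and leave age 0 unclassified.

```r
# Hypothetical ages chosen to sit on the band edges
ages   <- c(0, 4, 5, 17, 18, 49, 50, 64, 65, 90)
breaks <- c(0, 5, 18, 50, 65, Inf)
labels <- c("<5", "5-17", "18-49", "50-64", "65+")

# Left-closed bands: [0,5), [5,18), [18,50), [50,65), [65,Inf)
age_group <- cut(ages, breaks = breaks, labels = labels, right = FALSE)
table(age_group)

# Default (right-closed) bands for comparison: (0,5], (5,18], ...
default_group <- cut(c(0, 5), breaks = breaks, labels = labels)
default_group
```

With left-closed bands every test age lands in the band its label suggests; with the default, age 0 becomes `NA` and age 5 is labelled "<5".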
The overall prevalence of recorded chronic comorbidities was low in this cohort, with 94.9% (n=410) of patients having no documented underlying conditions. Among the specific conditions investigated:
- HIV/AIDS was the most frequent comorbidity, present in 2.3% (n=10) of the population, followed by Asthma (1.2%, n=5) and Diabetes Mellitus (0.9%, n=4).
- Asthma was more frequent among SARI cases (2.9%) than ILI cases (0.8%), but the difference was not statistically significant (p=0.2).
- Diabetes Mellitus was recorded in 1.4% of SARI cases and 0.8% of ILI cases (p=0.4).
- Chronic Heart Disease (0.5%, n=2) and Cancer/Malignancy (0.2%, n=1) were rare, and no pregnancies were recorded (Table 1 lists only "No" and "Missing" for Pregnancy Status).
Fisher’s exact test showed that neither any single comorbidity nor the cumulative number of comorbidities (p=0.8) was significantly associated with SARI in this sample.
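As a rough check on one of these comparisons, the asthma row of Table 1 can be re-entered as a 2x2 table and passed to base R's `fisher.test()`. This is a sketch under one assumption: the "No" cells are obtained by subtracting the "Yes" counts from the group totals, so any missing asthma values are treated as "No".

```r
# Asthma by case type, rebuilt from Table 1
# (non-"Yes" counts are group total minus "Yes": an assumption)
asthma <- matrix(
  c(2, 70 - 2,     # SARI: yes, not yes
    3, 362 - 3),   # ILI:  yes, not yes
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("SARI", "ILI"), asthma = c("Yes", "No"))
)

asthma_test <- fisher.test(asthma)
asthma_test$p.value            # two-sided exact p-value
round(asthma_test$p.value, 1)  # rounded the way the table reports it
```

With only five asthma cases in total, the test has very little power, so a non-significant result says little about whether an association exists.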
| Symptom Reported | ILI N = 362¹ | SARI N = 70¹ | p-value² |
|---|---|---|---|
| Fever | | | 0.029 |
| Missing | 46 (13%) | 15 (21%) | |
| No | 141 (39%) | 17 (24%) | |
| Yes | 175 (48%) | 38 (54%) | |
| Cough | | | 0.2 |
| Missing | 0 (0%) | 1 (1.4%) | |
| No | 10 (2.8%) | 1 (1.4%) | |
| Yes | 352 (97%) | 68 (97%) | |
| Sore Throat | | | 0.2 |
| Missing | 3 (0.8%) | 1 (1.4%) | |
| No | 288 (80%) | 49 (70%) | |
| Yes | 71 (20%) | 20 (29%) | |
| Shortness of Breath | | | <0.001 |
| Missing | 3 (0.8%) | 0 (0%) | |
| No | 326 (90%) | 37 (53%) | |
| Yes | 33 (9.1%) | 33 (47%) | |

¹ n (%)
² Fisher’s exact test

The clinical presentation of influenza-positive patients varied significantly by case definition (Table 2). While certain core symptoms were near-universal across both groups, others served as strong indicators of severe disease.
Respiratory Distress: The most significant clinical differentiator was Shortness of Breath. Only 9.1% (n=33) of ILI patients reported this symptom, compared with 47% (n=33) of SARI cases (p < 0.001). In this cohort, respiratory distress was therefore the symptom that most clearly separated severe cases (SARI) from mild presentations (ILI).
Fever: The distribution of fever reporting differed significantly between the two groups (p = 0.029). Fever was confirmed in 54% of SARI patients and 48% of ILI patients. The SARI group also had a higher proportion of missing fever data (21% vs 13%), which may reflect limitations in temperature monitoring during emergency admissions. Because Table 2 lists "Missing" as its own category, the p-value may partly reflect this difference in missingness rather than a difference in fever alone.
Cough: This was the most common symptom overall, present in 97% of both ILI and SARI cases. Due to its near-universal presence in laboratory-confirmed influenza, it did not serve as a predictor of severity (p = 0.2).
Sore Throat: Although a higher proportion of SARI cases reported a sore throat (29% vs 20% in ILI), this difference did not reach statistical significance (p = 0.2).
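The symptom comparisons above can be re-run from the counts in Table 2 with base R's `fisher.test()`. This is a sketch under stated assumptions: the shortness-of-breath table drops the three missing ILI records, and the fever comparison is run twice, once with "Missing" kept as a category (assumed to be how the table's p-value was produced) and once on known values only.

```r
# Shortness of breath, complete cases only (3 missing ILI records dropped)
sob <- matrix(
  c(33, 37,     # SARI: yes, no
    33, 326),   # ILI:  yes, no
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("SARI", "ILI"), sob = c("Yes", "No"))
)
sob_test <- fisher.test(sob)
sob_test$p.value

# Sample odds ratio: odds of shortness of breath in SARI relative to ILI
sob_or <- (sob["SARI", "Yes"] / sob["SARI", "No"]) /
  (sob["ILI", "Yes"] / sob["ILI", "No"])
sob_or

# Fever, with "Missing" kept as its own category
fever <- matrix(
  c(15, 17, 38,      # SARI: missing, no, yes
    46, 141, 175),   # ILI:  missing, no, yes
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("SARI", "ILI"),
                  fever = c("Missing", "No", "Yes"))
)
fever_all   <- fisher.test(fever)                    # 2x3 table
fever_known <- fisher.test(fever[, c("No", "Yes")])  # known values only
c(with_missing = fever_all$p.value, known_only = fever_known$p.value)
```

Comparing the two fever p-values shows how much of the reported difference depends on the missing category.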
| Subtype | N = 432¹ |
|---|---|
| Influenza Subtype | |
| A(H1N1)pdm09 | 70 (16%) |
| A/H3N2 | 13 (3.0%) |
| B/Victoria | 33 (7.6%) |
| Negative | 1 (0.2%) |
| Not Subtyped | 315 (73%) |

¹ n (%)

The distribution of standardized influenza subtypes among the study population (N=432) is presented in Table 3.
Key Findings
Dominant Strains: Among the samples that were successfully subtyped, Influenza A(H1N1)pdm09 was the most prevalent strain, accounting for 16% (n=70) of the total positive cases. This was followed by Influenza B/Victoria at 7.6% (n=33) and Influenza A/H3N2 at 3.0% (n=13).
Subtyping Gap: Most laboratory-confirmed influenza cases, 73% (n=315), were recorded as “Not Subtyped.” This indicates that while these samples were confirmed positive for Influenza A or B via initial screening, specific lineage or subtype identification was not completed or available in the surveillance record.
Negative Subtype Result: One sample (0.2%) carried the subtype label "ALL NEGATIVE", which the standardization step recoded to "Negative". This may represent a sample that screened positive for influenza but was negative on all subtype assays.
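Because most samples were not subtyped, the percentages in Table 3 understate each strain's share among samples with a subtype result. The sketch below recomputes the shares from the Table 3 counts using both denominators. Restricting to subtyped samples is an illustrative choice, not part of the original analysis, and it assumes the subtyped samples are representative of the rest.

```r
# Counts from Table 3
counts <- c("A(H1N1)pdm09" = 70, "A/H3N2" = 13, "B/Victoria" = 33,
            "Negative" = 1, "Not Subtyped" = 315)

# Share of all influenza-positive cases (denominator 432)
share_total <- round(100 * counts / sum(counts), 1)

# Share among samples with a subtype result (denominator 116)
subtyped <- counts[c("A(H1N1)pdm09", "A/H3N2", "B/Victoria")]
share_subtyped <- round(100 * subtyped / sum(subtyped), 1)

share_total
share_subtyped
```
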