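# Processing the raw data
# Packages: clean_names() is from janitor, dmy()/ymd() from lubridate,
# str_*() from stringr; case_match() (step 6) needs dplyr >= 1.1.0
library(dplyr)
library(janitor)
library(lubridate)
library(stringr)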
df_clean <- df %>%
  clean_names() %>%
  mutate(
    # 1. Convert Excel Serial Dates and fix typos
    date_onset = case_when(
      str_detect(date_of_onset_of_fever, "^[0-9]{5}$") ~ 
        as.Date(as.numeric(date_of_onset_of_fever), origin = "1899-12-30"),
      str_detect(date_of_onset_of_fever, "-1012$") ~ 
        dmy(str_replace(date_of_onset_of_fever, "-1012$", "-2012")),
      # All remaining rows take the interview date (an imputation; the onset field is not parsed here)
      TRUE ~ ymd(str_remove(date_of_interview, " UTC"))
    ),
    
    # 2. Standardize Age
    age_years = as.numeric(age),
    age_group = cut(age_years,
                    breaks = c(0, 5, 18, 50, 65, Inf),
                    labels = c("<5", "5-17", "18-49", "50-64", "65+"),
                    right = FALSE),  # bins [0,5), [5,18), ...: match the labels and keep age 0
    
    # 3. Outcome Variable
    severity = factor(type_of_case, levels = c("ILI", "SARI")),
    
    # 4. Comorbidity Logic (Expanded to include Pregnancy, Asthma, Cancer)
    # Create 0/1 numeric copies (suffix _num) for the sum: "Yes" = 1, anything else (incl. NA) = 0
    across(c(diabetes, hiv_aids, heart_disease, asthma, cancer, pregnancy), 
           ~case_when(. == "Yes" ~ 1, TRUE ~ 0), .names = "{.col}_num"),
    
    # 5. Create Comorbidity Count (Summing the numeric versions)
    comorbid_count = diabetes_num + hiv_aids_num + heart_disease_num + 
                     asthma_num + cancer_num + pregnancy_num,
    
    comorbid_factor = factor(case_when(
      comorbid_count == 0 ~ "0",
      comorbid_count == 1 ~ "1",
      comorbid_count >= 2 ~ "2+",
      TRUE ~ "0"
    ), levels = c("0", "1", "2+"))
  ) %>%
  # Filter for Influenza Positives
  filter(influenza_type %in% c("Flu A", "Flu B", "Flu A&B"))
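Two rules in this pipeline can be checked on their own: the repair of onset dates and the age banding. The base-R sketch below reproduces both without the tidyverse. `parse_onset()` and `age_band()` are hypothetical helper names, and the sketch leaves out the fallback to the interview date. It also assumes the "-1012" typo dates use a day-month-year layout; that assumption comes from the pipeline's use of `dmy()`.

```r
# Hypothetical base-R helpers mirroring two recoding rules (no packages needed)

# Onset dates: 5-digit strings are Excel serial dates; "-1012" is a year typo for "-2012".
# Origin 1899-12-30 compensates for Excel's non-existent 29 Feb 1900.
parse_onset <- function(x) {
  out <- rep(as.Date(NA), length(x))            # start with all-NA Dates
  is_serial <- grepl("^[0-9]{5}$", x)
  is_typo   <- grepl("-1012$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Assumed day-month-year layout, e.g. "15-03-1012"
  out[is_typo] <- as.Date(sub("-1012$", "-2012", x[is_typo]), format = "%d-%m-%Y")
  out                                           # anything else stays NA
}

# Age bands: right = FALSE gives [0,5), [5,18), [18,50), [50,65), [65,Inf),
# which is what the labels "<5", "5-17", ... describe
age_band <- function(age) {
  cut(age,
      breaks = c(0, 5, 18, 50, 65, Inf),
      labels = c("<5", "5-17", "18-49", "50-64", "65+"),
      right  = FALSE)
}

parse_onset(c("40909", "15-03-1012", "not a date"))
# returns 2012-01-01, 2012-03-15, NA
as.character(age_band(c(0, 4.9, 5, 17, 18, 65)))
# returns "<5" "<5" "5-17" "5-17" "18-49" "65+"
```

With `cut()`'s default `right = TRUE`, the same breaks give `NA` for age 0 and put a 5-year-old in "<5".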

# 6. Standardizing Subtypes
df_clean <- df_clean %>%
  mutate(
    influenza_sub_type = str_trim(str_to_upper(influenza_sub_type)),
    influenza_sub_type = case_match(
      influenza_sub_type,
      c("B VICTORIA", "B/VICTORIA")         ~ "B/Victoria",
      c("NOT SUBTYPED", "NOT_SUBTYPED")     ~ "Not Subtyped",
      c("A/H1", "2009 A/H1N1", "A/H1N1")    ~ "A(H1N1)pdm09",
      "A/H3"                                ~ "A/H3N2",
      "ALL NEGATIVE"                        ~ "Negative",
      .default = influenza_sub_type
    ),
    influenza_sub_type = factor(influenza_sub_type)
  )
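The subtype recode is a lookup from cleaned raw strings to standard labels. A named character vector gives an equivalent base-R version that can be tested without dplyr. `subtype_map` and `standardise_subtype()` are hypothetical names. As with `.default = influenza_sub_type`, values the map does not cover pass through in their trimmed, upper-cased form.

```r
# Hypothetical lookup table: cleaned raw value -> standard label
subtype_map <- c(
  "B VICTORIA"   = "B/Victoria",   "B/VICTORIA"   = "B/Victoria",
  "NOT SUBTYPED" = "Not Subtyped", "NOT_SUBTYPED" = "Not Subtyped",
  "A/H1"         = "A(H1N1)pdm09", "2009 A/H1N1"  = "A(H1N1)pdm09",
  "A/H1N1"       = "A(H1N1)pdm09",
  "A/H3"         = "A/H3N2",
  "ALL NEGATIVE" = "Negative"
)

standardise_subtype <- function(x) {
  key <- trimws(toupper(x))           # same cleaning as str_trim(str_to_upper())
  out <- unname(subtype_map[key])     # NA where the key is not in the map
  ifelse(is.na(out), key, out)        # fall back to the cleaned value
}

raw <- c(" b/victoria", "2009 A/H1N1", "A/H3", "Not Subtyped", "flu x")
standardise_subtype(raw)
# returns "B/Victoria" "A(H1N1)pdm09" "A/H3N2" "Not Subtyped" "FLU X"

# Values not covered by the map, worth reviewing before factor()
setdiff(unique(trimws(toupper(raw))), names(subtype_map))
```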

Patient Characteristics and Clinical Comorbidities

Table 1: Demographic and Clinical Characteristics

| Patient Characteristic | Overall, N = 432¹ | ILI, N = 362¹ | SARI, N = 70¹ | p-value² |
|---|---|---|---|---|
| Age Group (Years) | | | | 0.7 |
| <5 | 254 (59%) | 211 (59%) | 43 (61%) | |
| 5-17 | 72 (17%) | 59 (16%) | 13 (19%) | |
| 18-49 | 79 (18%) | 70 (19%) | 9 (13%) | |
| 50-64 | 16 (3.7%) | 13 (3.6%) | 3 (4.3%) | |
| 65+ | 9 (2.1%) | 7 (1.9%) | 2 (2.9%) | |
| Sex | | | | >0.9 |
| Female | 223 (52%) | 187 (52%) | 36 (51%) | |
| Male | 208 (48%) | 174 (48%) | 34 (49%) | |
| Missing | 1 (0.2%) | 1 (0.3%) | 0 (0%) | |
| Diabetes Mellitus | | | | 0.4 |
| Missing | 13 (3.0%) | 10 (2.8%) | 3 (4.3%) | |
| No | 415 (96%) | 349 (96%) | 66 (94%) | |
| Yes | 4 (0.9%) | 3 (0.8%) | 1 (1.4%) | |
| HIV/AIDS Positive | | | | >0.9 |
| Missing | 3 (0.7%) | 3 (0.8%) | 0 (0%) | |
| No | 419 (97%) | 350 (97%) | 69 (99%) | |
| Yes | 10 (2.3%) | 9 (2.5%) | 1 (1.4%) | |
| Chronic Heart Disease | 2 (0.5%) | 2 (0.6%) | 0 (0%) | >0.9 |
| Asthma | 5 (1.2%) | 3 (0.8%) | 2 (2.9%) | 0.2 |
| Cancer/Malignancy | 1 (0.2%) | 1 (0.3%) | 0 (0%) | >0.9 |
| Pregnancy Status | | | | 0.7 |
| Missing | 12 (2.8%) | 11 (3.0%) | 1 (1.4%) | |
| No | 420 (97%) | 351 (97%) | 69 (99%) | |
| Number of Comorbidities | | | | 0.8 |
| 0 | 410 (95%) | 344 (95%) | 66 (94%) | |
| 1 | 22 (5.1%) | 18 (5.0%) | 4 (5.7%) | |
| 2+ | 0 (0%) | 0 (0%) | 0 (0%) | |

¹ n (%)
² Fisher’s exact test

A total of 432 laboratory-confirmed influenza cases were included in the analysis, comprising 362 (83.8%) patients with Influenza-Like Illness (ILI) and 70 (16.2%) with Severe Acute Respiratory Infection (SARI). The demographic and clinical profiles of these patients are summarized in Table 1.

Demographic Profile

The study population was predominantly pediatric. Children under five years of age made up 59% (254/430) of patients with a recorded age group, or 58.8% of the full cohort of 432. The proportion of children under five was slightly higher among SARI cases (61.4%, 43/70) than among ILI cases (58.6%, 211/360), but the age-group distribution did not differ significantly by severity (p=0.7). The sexes were almost equally represented, with females making up 51.6% (n=223) of all cases, and sex was not associated with disease severity (p>0.9).
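The age-group percentages in Table 1 appear to use patients with a recorded age group as the denominator. The category counts sum to 430 overall and 360 for ILI, which implies that 2 ILI patients have no age group. The arithmetic below uses only the published counts and shows how the choice of denominator changes the <5 percentage. The variable names are illustrative.

```r
# Age-group counts from Table 1
ili  <- c("<5" = 211, "5-17" = 59, "18-49" = 70, "50-64" = 13, "65+" = 7)
sari <- c("<5" = 43,  "5-17" = 13, "18-49" = 9,  "50-64" = 3,  "65+" = 2)

n_ili_age  <- sum(ili)    # 360 of the 362 ILI patients have an age group
n_sari_age <- sum(sari)   # 70 of the 70 SARI patients

# Share of patients under five, by denominator
pct_ili_recorded <- round(100 * ili[["<5"]] / n_ili_age, 1)    # 58.6, shown as 59% in Table 1
pct_ili_all      <- round(100 * ili[["<5"]] / 362, 1)          # 58.3
pct_sari         <- round(100 * sari[["<5"]] / n_sari_age, 1)  # 61.4
```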

Prevalence of Comorbidities

The overall prevalence of recorded chronic comorbidities was low in this cohort, with 94.9% (n=410) of patients having no documented underlying conditions. Among the specific conditions investigated:

  • HIV/AIDS was the most frequent comorbidity, present in 2.3% (n=10) of the population, followed by Asthma (1.2%, n=5) and Diabetes Mellitus (0.9%, n=4).

  • Asthma showed the most notable trend toward severity, being present in 2.9% of SARI cases compared to 0.8% of ILI cases (p=0.2).

  • Diabetes Mellitus was recorded in 1.4% of SARI cases and 0.8% of ILI cases (p=0.4).

  • Chronic Heart Disease (0.5%, n=2) and Cancer/Malignancy (0.2%, n=1) were rare. No pregnancies were recorded, and pregnancy status was missing for 12 patients.

Fisher’s exact tests found no significant association in this sample between SARI and either any single comorbidity or the number of comorbidities (p=0.8). However, only 22 patients (5.1%) had any recorded comorbidity, so the analysis had little power to detect such associations.
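The Fisher’s exact p-value for asthma can be reproduced from the Table 1 counts alone. The sketch assumes asthma status was recorded for all 432 patients, since no Missing row is shown, which leaves 359 ILI and 68 SARI patients without asthma. The confidence interval for the odds ratio shows how little information five asthma cases carry.

```r
# Asthma by case definition, from Table 1 (assumes no missing asthma records)
asthma_tab <- matrix(c(3, 359,    # ILI:  Yes, No
                       2, 68),    # SARI: Yes, No
                     nrow = 2,
                     dimnames = list(asthma = c("Yes", "No"),
                                     case   = c("ILI", "SARI")))

ft <- fisher.test(asthma_tab)   # two-sided by default
round(ft$p.value, 2)            # 0.19, shown as 0.2 in Table 1
ft$conf.int                     # 95% CI for the conditional odds ratio
```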

Clinical Presentation of ILI vs. SARI Cases

Table 2: Frequency of Clinical Symptoms by Case Definition

| Symptom Reported | ILI, N = 362¹ | SARI, N = 70¹ | p-value² |
|---|---|---|---|
| Fever | | | 0.029 |
| Missing | 46 (13%) | 15 (21%) | |
| No | 141 (39%) | 17 (24%) | |
| Yes | 175 (48%) | 38 (54%) | |
| Cough | | | 0.2 |
| Missing | 0 (0%) | 1 (1.4%) | |
| No | 10 (2.8%) | 1 (1.4%) | |
| Yes | 352 (97%) | 68 (97%) | |
| Sore Throat | | | 0.2 |
| Missing | 3 (0.8%) | 1 (1.4%) | |
| No | 288 (80%) | 49 (70%) | |
| Yes | 71 (20%) | 20 (29%) | |
| Shortness of Breath | | | <0.001 |
| Missing | 3 (0.8%) | 0 (0%) | |
| No | 326 (90%) | 37 (53%) | |
| Yes | 33 (9.1%) | 33 (47%) | |

¹ n (%)
² Fisher’s exact test

Clinical presentation among influenza-positive patients differed by case definition for some symptoms but not for others (Table 2). Cough was near-universal in both groups. Shortness of breath, and to a lesser extent fever, was more frequent among SARI cases.

  • Respiratory Distress: Shortness of breath was the clearest clinical differentiator. It was reported by only 9.1% (n=33) of ILI patients but by 47% (n=33) of SARI patients (p < 0.001). This suggests that respiratory distress is the main feature separating severe (SARI) from mild (ILI) presentations in this cohort, although more than half of SARI patients did not report it.

  • Fever: The distribution of fever responses differed significantly between the two groups (p = 0.029). SARI patients reported fever more often than ILI patients (54% vs 48%). They also had more missing fever data (21% vs 13%), which may reflect limited temperature monitoring during emergency admissions. Because “Missing” was included as a category in the test, the p-value reflects differences in missingness as well as in reported fever.

  • Cough: This was the most common symptom overall, present in 97% of both ILI and SARI cases. Due to its near-universal presence in laboratory-confirmed influenza, it did not serve as a predictor of severity (p = 0.2).

  • Sore Throat: Although a higher proportion of SARI cases reported a sore throat (29% vs 20% in ILI), this difference did not reach statistical significance (p = 0.2).
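“Missing” is a level of the fever variable, so the Fisher’s exact test in Table 2 compares all three categories. The sketch below recomputes that test from the published counts. For comparison, it also runs the test on complete cases only. Which comparison is more appropriate is an analytic choice that the source does not settle.

```r
# Fever by case definition, counts from Table 2
fever_tab <- matrix(c(46, 141, 175,   # ILI:  Missing, No, Yes
                      15,  17,  38),  # SARI: Missing, No, Yes
                    nrow = 3,
                    dimnames = list(fever = c("Missing", "No", "Yes"),
                                    case  = c("ILI", "SARI")))

# 3 x 2 test with "Missing" as its own category, as in Table 2
p_all <- fisher.test(fever_tab)$p.value

# 2 x 2 test on patients with a recorded fever status only
p_complete <- fisher.test(fever_tab[c("No", "Yes"), ])$p.value

round(c(all_categories = p_all, complete_cases = p_complete), 3)
```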



Virological Profile and Subtype Distribution

Table 3: Distribution of Standardized Influenza Subtypes

| Influenza Subtype | N = 432¹ |
|---|---|
| A(H1N1)pdm09 | 70 (16%) |
| A/H3N2 | 13 (3.0%) |
| B/Victoria | 33 (7.6%) |
| Negative | 1 (0.2%) |
| Not Subtyped | 315 (73%) |

¹ n (%)

The distribution of standardized influenza subtypes among the study population (N=432) is presented in Table 3.

Key Findings

  • Dominant Strains: Of the 116 samples with a subtype or lineage result, Influenza A(H1N1)pdm09 was the most prevalent (n=70; 60.3% of subtyped samples and 16% of all cases). It was followed by Influenza B/Victoria (n=33; 28.4% and 7.6%) and Influenza A/H3N2 (n=13; 11.2% and 3.0%).

  • Subtyping Gap: A large proportion of the laboratory-confirmed influenza cases, 73% (n=315), were recorded as “Not Subtyped.” These samples tested positive for Influenza A or B on initial screening, but subtype or lineage identification was either not performed or not recorded in the surveillance data.

  • Negative Subtype Result: One sample (0.2%) was classified as influenza-positive but had its subtype recorded as “All negative,” which was standardized to “Negative.” This may be a sample that screened weakly positive but did not yield a subtyping result.
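The percentages above rest on two different denominators: all 432 cases, or only the 116 samples with a subtype or lineage result. The arithmetic below uses the Table 3 counts and makes both explicit. The variable names are illustrative.

```r
# Subtype counts from Table 3
subtype <- c("A(H1N1)pdm09" = 70, "A/H3N2" = 13, "B/Victoria" = 33,
             "Negative" = 1, "Not Subtyped" = 315)

# Samples with an actual subtype or lineage result
subtyped <- subtype[c("A(H1N1)pdm09", "A/H3N2", "B/Victoria")]

round(100 * subtype / sum(subtype), 1)    # share of all 432 cases
round(100 * subtyped / sum(subtyped), 1)  # share of the 116 subtyped samples
```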



Influenza Severity by Viral Subtype

The relationship between influenza subtype and clinical severity (ILI vs. SARI) is illustrated in Figure 3. The figure shows the proportion of cases meeting the SARI definition within each subtype, so severity can be compared across the strains identified in the cohort.

Key Findings

  • Consistency in Severity Risk: Among the subtyped samples, A(H1N1)pdm09 demonstrated the highest proportion of severe cases, with 17.1% (n=12) of cases meeting the SARI criteria. This was closely mirrored by the “Not Subtyped” group, which also showed a SARI proportion of 17.1% (n=54).

  • Lineage Differences: Influenza B/Victoria appeared to have a lower clinical severity profile in this cohort, with only 12.1% (n=4) of cases resulting in SARI, compared to 87.9% (n=29) presenting as ILI.

  • Mild Presentations: All identified cases of A/H3N2 (n=13) and the single standardized Negative case presented exclusively as ILI (100%). However, the small sample size for A/H3N2 limits the ability to conclude that this strain is less severe in the broader population.

  • Volume vs. Risk: The “Not Subtyped” group accounts for the largest absolute number of SARI cases (n=54). However, its SARI proportion (17.1%) is the same as that of the dominant A(H1N1)pdm09 strain and similar to that of all subtyped samples combined (13.8%, 16/116). This suggests that clinical severity did not strongly influence which samples were subtyped, although this was not formally tested.
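The within-subtype SARI proportions can be recomputed from the counts quoted above. A Fisher’s exact test can then make the comparison of subtyped versus non-subtyped samples explicit. This test is an illustrative addition and not part of the original analysis. The Negative sample is excluded, and the variable names are hypothetical.

```r
# SARI counts and group totals quoted in the text / Table 3
sari  <- c("A(H1N1)pdm09" = 12, "A/H3N2" = 0, "B/Victoria" = 4,
           "Negative" = 0, "Not Subtyped" = 54)
total <- c("A(H1N1)pdm09" = 70, "A/H3N2" = 13, "B/Victoria" = 33,
           "Negative" = 1, "Not Subtyped" = 315)

round(100 * sari / total, 1)   # SARI proportion within each group

# Subtyped (H1N1, H3N2, B/Victoria) vs Not Subtyped, excluding the Negative sample
typed <- c("A(H1N1)pdm09", "A/H3N2", "B/Victoria")
tab <- rbind(
  subtyped     = c(SARI = sum(sari[typed]),
                   ILI  = sum(total[typed] - sari[typed])),
  not_subtyped = c(SARI = sari[["Not Subtyped"]],
                   ILI  = total[["Not Subtyped"]] - sari[["Not Subtyped"]])
)
fisher.test(tab)$p.value
```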



This chart quantifies how the accumulation of health conditions correlates with SARI.