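# Processing the raw data
library(dplyr)      # mutate(), across(), case_when(), case_match(), filter()
library(stringr)    # str_detect(), str_replace(), str_remove(), str_trim(), str_to_upper()
library(lubridate)  # dmy(), ymd()
library(janitor)    # clean_names()

df_clean <- df %>%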
  clean_names() %>%
  mutate(
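    # 1. Convert Excel serial dates and fix typos
    date_onset = case_when(
      # Five-digit strings are Excel serial day counts (origin 1899-12-30)
      str_detect(date_of_onset_of_fever, "^[0-9]{5}$") ~
        as.Date(as.numeric(date_of_onset_of_fever), origin = "1899-12-30"),
      # Year mistyped as 1012: correct to 2012, then parse as day-month-year
      str_detect(date_of_onset_of_fever, "-1012$") ~
        dmy(str_replace(date_of_onset_of_fever, "-1012$", "-2012")),
      # NOTE: the fallback reads date_of_interview, not the onset column
      TRUE ~ ymd(str_remove(date_of_interview, " UTC"))
    ),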
    
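    # 2. Standardize Age
    age_years = as.numeric(age),
    # right = FALSE makes the intervals left-closed, [0,5), [5,18), ...,
    # so that age 5 falls in "5-17" and age 0 in "<5", matching the labels
    age_group = cut(age_years,
                    breaks = c(0, 5, 18, 50, 65, Inf),
                    labels = c("<5", "5-17", "18-49", "50-64", "65+"),
                    right = FALSE),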
    
    # 3. Outcome Variable
    severity = factor(type_of_case, levels = c("ILI", "SARI")),
    
    # 4. Comorbidity Logic (Expanded to include Pregnancy, Asthma, Cancer)
    # Create 0/1 indicator columns (<name>_num) for the sum below.
    # Any value other than "Yes", including missing, is coded as 0.
    across(c(diabetes, hiv_aids, heart_disease, asthma, cancer, pregnancy),
           ~case_when(. == "Yes" ~ 1, TRUE ~ 0), .names = "{.col}_num"),
    
    # 5. Create Comorbidity Count (Summing the numeric versions)
    comorbid_count = diabetes_num + hiv_aids_num + heart_disease_num + 
                     asthma_num + cancer_num + pregnancy_num,
    
    comorbid_factor = factor(case_when(
      comorbid_count == 0 ~ "0",
      comorbid_count == 1 ~ "1",
      comorbid_count >= 2 ~ "2+",
      TRUE ~ "0"
    ), levels = c("0", "1", "2+"))
  ) %>%
  # Filter for Influenza Positives
  filter(influenza_type %in% c("Flu A", "Flu B", "Flu A&B"))

# 6. Standardizing Subtypes
df_clean <- df_clean %>%
  mutate(
    influenza_sub_type = str_trim(str_to_upper(influenza_sub_type)),
    influenza_sub_type = case_match(
      influenza_sub_type,
      c("B VICTORIA", "B/VICTORIA")         ~ "B/Victoria",
      c("NOT SUBTYPED", "NOT_SUBTYPED")     ~ "Not Subtyped",
      c("A/H1", "2009 A/H1N1", "A/H1N1")    ~ "A(H1N1)pdm09",
      "A/H3"                                ~ "A/H3N2",
      "ALL NEGATIVE"                        ~ "Negative",
      .default = influenza_sub_type
    ),
    influenza_sub_type = factor(influenza_sub_type)
  )
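
Two of the cleaning steps above are easy to get subtly wrong, so a quick check on toy values can help. The sketch below uses base R only, and its input values are invented for illustration (they are not taken from the dataset). It shows what the Excel origin "1899-12-30" does to a serial number, and how the `right` argument of `cut()` decides which group a boundary age such as 5 falls into.

```r
# Toy values, invented for illustration (not from the study data)
serials <- c("40909", "41000")
onset <- as.Date(as.numeric(serials), origin = "1899-12-30")
print(onset)

ages   <- c(0, 4, 5, 17, 18, 65)
breaks <- c(0, 5, 18, 50, 65, Inf)
labels <- c("<5", "5-17", "18-49", "50-64", "65+")

# Default right = TRUE gives (0,5], (5,18], ...:
# age 0 becomes NA and age 5 lands in "<5"
right_closed <- cut(ages, breaks = breaks, labels = labels)

# right = FALSE gives [0,5), [5,18), ..., which matches the label text
left_closed <- cut(ages, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(ages, right_closed, left_closed))
```

Comparing the two columns of the printed data frame makes the boundary behaviour visible for ages 0, 5, 18 and 65.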

Table 1: Demographic and Clinical Characteristics

Patient Characteristic      Overall, N = 432¹   ILI, N = 362¹   SARI, N = 70¹   p-value²
Age Group (Years)                                                               0.7
    <5                      254 (59%)           211 (59%)       43 (61%)
    5-17                    72 (17%)            59 (16%)        13 (19%)
    18-49                   79 (18%)            70 (19%)        9 (13%)
    50-64                   16 (3.7%)           13 (3.6%)       3 (4.3%)
    65+                     9 (2.1%)            7 (1.9%)        2 (2.9%)
Sex                                                                             >0.9
    Female                  223 (52%)           187 (52%)       36 (51%)
    Male                    208 (48%)           174 (48%)       34 (49%)
    Missing                 1 (0.2%)            1 (0.3%)        0 (0%)
Diabetes Mellitus                                                               0.4
    Missing                 13 (3.0%)           10 (2.8%)       3 (4.3%)
    No                      415 (96%)           349 (96%)       66 (94%)
    Yes                     4 (0.9%)            3 (0.8%)        1 (1.4%)
HIV/AIDS Positive                                                               >0.9
    Missing                 3 (0.7%)            3 (0.8%)        0 (0%)
    No                      419 (97%)           350 (97%)       69 (99%)
    Yes                     10 (2.3%)           9 (2.5%)        1 (1.4%)
Chronic Heart Disease       2 (0.5%)            2 (0.6%)        0 (0%)          >0.9
Asthma                      5 (1.2%)            3 (0.8%)        2 (2.9%)        0.2
Cancer/Malignancy           1 (0.2%)            1 (0.3%)        0 (0%)          >0.9
Pregnancy Status                                                                0.7
    Missing                 12 (2.8%)           11 (3.0%)       1 (1.4%)
    No                      420 (97%)           351 (97%)       69 (99%)
Number of Comorbidities                                                         0.8
    0                       410 (95%)           344 (95%)       66 (94%)
    1                       22 (5.1%)           18 (5.0%)       4 (5.7%)
    2+                      0 (0%)              0 (0%)          0 (0%)

¹ n (%)
² Fisher’s exact test

Table 1: Patient Characteristics and Clinical Comorbidities

A total of 432 laboratory-confirmed influenza cases were included in the analysis, comprising 362 (83.8%) patients with Influenza-Like Illness (ILI) and 70 (16.2%) with Severe Acute Respiratory Infection (SARI). The demographic and clinical profiles of these patients are summarized in Table 1.

Demographic Profile

The study population was predominantly pediatric, with children under five years of age accounting for 58.8% (n=254) of the total cohort. The proportion of children under five was slightly higher among SARI cases (61.4%) compared to ILI cases (58.3%), though this difference was not statistically significant (p=0.7). Sex distribution was nearly equal, with females representing 51.6% (n=223) of all cases. No significant association was found between sex and disease severity (p>0.9).

Prevalence of Comorbidities

The overall prevalence of recorded chronic comorbidities was low in this cohort, with 94.9% (n=410) of patients having no documented underlying conditions. Among the specific conditions investigated:
- HIV/AIDS was the most frequent comorbidity, present in 2.3% (n=10) of the population, followed by Asthma (1.2%, n=5) and Diabetes Mellitus (0.9%, n=4).
- Asthma showed the largest difference between groups, being present in 2.9% of SARI cases compared to 0.8% of ILI cases, though this difference was not statistically significant (p=0.2).
- Diabetes Mellitus was recorded in 1.4% of SARI cases and 0.8% of ILI cases (p=0.4).
- Chronic Heart Disease (0.5%, n=2) and Cancer/Malignancy (0.2%, n=1) were rare, and no pregnancies were recorded (pregnancy status was missing for 12 patients).

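Fisher’s exact test showed that neither any single comorbidity nor the cumulative number of comorbidities (p=0.8) was significantly associated with SARI in this sample. Because no patient had two or more recorded comorbidities, the comparison by comorbidity count is effectively between patients with none and patients with one.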
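
As an illustration of how a Table 1 p-value can be checked, the asthma comparison can be rebuilt from the reported counts (3 of 362 ILI patients and 2 of 70 SARI patients with asthma). This is a base-R sketch under two assumptions: that the table used a standard two-sided Fisher's exact test on the 2x2 table, and that no asthma values were missing (the table lists no Missing row for asthma). The object names are illustrative.

```r
# 2x2 table rebuilt from the Asthma row of Table 1
asthma <- matrix(
  c(3, 362 - 3,   # ILI:  asthma yes, asthma no
    2, 70 - 2),   # SARI: asthma yes, asthma no
  nrow = 2, byrow = TRUE,
  dimnames = list(severity = c("ILI", "SARI"), asthma = c("Yes", "No"))
)

# Two-sided Fisher's exact test (the default alternative)
res <- fisher.test(asthma)

print(asthma)
print(round(res$p.value, 2))
```

The resulting p-value is about 0.19, which is consistent with the 0.2 shown in Table 1 after rounding.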

Table 2: Frequency of Clinical Symptoms by Case Definition

Symptom Reported        ILI, N = 362¹   SARI, N = 70¹   p-value²
Fever                                                   0.029
    Missing             46 (13%)        15 (21%)
    No                  141 (39%)       17 (24%)
    Yes                 175 (48%)       38 (54%)
Cough                                                   0.2
    Missing             0 (0%)          1 (1.4%)
    No                  10 (2.8%)       1 (1.4%)
    Yes                 352 (97%)       68 (97%)
Sore Throat                                             0.2
    Missing             3 (0.8%)        1 (1.4%)
    No                  288 (80%)       49 (70%)
    Yes                 71 (20%)        20 (29%)
Shortness of Breath                                     <0.001
    Missing             3 (0.8%)        0 (0%)
    No                  326 (90%)       37 (53%)
    Yes                 33 (9.1%)       33 (47%)

¹ n (%)
² Fisher’s exact test



Table 3: Distribution of Standardized Influenza Subtypes

Subtype                 N = 432¹
Influenza Subtype
    A(H1N1)pdm09        70 (16%)
    A/H3N2              13 (3.0%)
    B/Victoria          33 (7.6%)
    Negative            1 (0.2%)
    Not Subtyped        315 (73%)

¹ n (%)



This chart quantifies how the accumulation of health conditions correlates with SARI.
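
Since the chart itself is not reproduced here, the quantities it would plausibly display can be recomputed from the Number of Comorbidities rows of Table 1. The following base-R sketch rests on an assumption about the chart's content (the percentage of SARI cases within each comorbidity-count group); it is not the original plotting code, and the object names are illustrative.

```r
# Counts copied from Table 1 (Number of Comorbidities by case definition)
counts <- matrix(
  c(344, 66,
     18,  4,
      0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(comorbidities = c("0", "1", "2+"),
                  severity = c("ILI", "SARI"))
)

# Percentage of SARI within each group; the empty "2+" group gives NaN (0/0)
sari_pct <- 100 * counts[, "SARI"] / rowSums(counts)
print(round(sari_pct, 1))

# Simple bar chart of the non-empty groups
keep <- rowSums(counts) > 0
barplot(sari_pct[keep],
        xlab = "Number of comorbidities",
        ylab = "SARI cases (%)",
        ylim = c(0, 100))
```

With these counts, SARI accounts for about 16.1% of patients with no recorded comorbidity and 18.2% of patients with one, a small difference in line with the non-significant p-value of 0.8 in Table 1.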