HW #9 Epi

Part 1 - Measures of Disease Frequency

Influenza data were collected in a suburban city among adults aged 18–70. Participants were monitored from October 1, 2019 to March 1, 2020 (covering the typical U.S. flu season; ~120 days). At enrollment, participants were tested by PCR to determine whether they had a current influenza infection (recorded as the “disease” variable). Throughout the flu season, participants reported symptoms weekly through an online portal and were tested if symptomatic; those who tested positive became new cases. Each participant contributed follow-up time until they developed the flu, dropped out, or the flu season ended.

influenza_data <- read.csv("~/Downloads/epi_influenza_dataset_days_discrete.csv")
gi_data <- read.csv("~/Downloads/epi_gi_water_dataset.csv")
library(dplyr)

  1. Calculate prevalence of the “disease” variable

    This means that 2.4% of participants had influenza (a positive PCR test) at enrollment.
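Prevalence is the number of existing cases at one point in time divided by the number of people observed at that time. A minimal sketch on a small made-up vector (hypothetical values, not the study data), assuming “disease” is coded 1 for a positive PCR result at enrollment and 0 otherwise:

```r
# Hypothetical enrollment PCR results: 1 = current influenza, 0 = negative
disease <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0)

# Point prevalence = existing cases / everyone tested at enrollment
prevalence <- sum(disease == 1) / length(disease)
prevalence  # 0.1, i.e. 10% in this toy example
```

With the study data and no missing values, influenza_data |> summarize(prevalence = mean(disease == 1)) should give the same kind of result as the 2.4% reported above.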

  2. Calculate cumulative incidence of new cases

::: {.cell}

```{.r .cell-code}
influenza_data |> summarize(
  ci_risk = sum(new_case == 1) / n()   # new cases / all enrolled participants
)
```

::: {.cell-output .cell-output-stdout}

```
  ci_risk
1    0.11
```


:::
:::



The cumulative incidence (risk) of influenza over the study period is 11%.
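Cumulative incidence is usually defined as new cases during follow-up divided by the number of people at risk at the start, that is, free of the disease at baseline. The calculation above divides by all enrolled participants. The sketch below uses hypothetical toy data to show the at-risk version, assuming prevalent cases (disease = 1) cannot become new cases; whether this changes the 11% for the study data depends on how those prevalent cases are coded in the dataset.

```r
# Hypothetical data: baseline status and whether a new case occurred later
toy <- data.frame(
  disease  = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # PCR-positive at enrollment
  new_case = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0)   # became a new case during the season
)

# Only people free of influenza at baseline are at risk of becoming new cases
at_risk <- toy[toy$disease == 0, ]

# Cumulative incidence (risk) = new cases / population at risk at baseline
ci_risk <- sum(at_risk$new_case == 1) / nrow(at_risk)
ci_risk  # 2/9 here, versus 2/10 if the prevalent case stayed in the denominator
```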
  3. Calculate the incidence rate using person-time

    influenza_data |> summarize(
      incidence_rate = sum(new_case) / sum(pt_days)   # new cases per person-day
    )
      incidence_rate
    1    0.001216168
    0.001216168 * 365   # convert to cases per person-year
    [1] 0.4439013
    0.4439013 * 10      # convert to cases per 10 person-years
    [1] 4.439013

The incidence rate of influenza is 0.0012 cases per person-day, or about 4.44 cases per 10 person-years.
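The incidence rate divides new cases by the total person-time at risk, so participants who drop out or are diagnosed early contribute only the days they were actually followed. A minimal sketch with hypothetical follow-up times (not the study data) that also repeats the per-day to per-10-person-years conversion used above, assuming 365 days per year:

```r
# Hypothetical follow-up: days each person contributed and whether they became a case
pt_days  <- c(120, 45, 120, 80, 120)
new_case <- c(0,   1,  0,   1,  0)

# Incidence rate per person-day = new cases / total person-days at risk
rate_per_day <- sum(new_case) / sum(pt_days)

# Rescale: x 365 days per person-year, then x 10 for a per-10-person-years rate
rate_per_10py <- rate_per_day * 365 * 10
rate_per_10py
```

Unlike cumulative incidence, the denominator here is time, not people, so two participants followed for 45 and 120 days add 165 person-days rather than 2 persons.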

Part 2 - Measures of Association

Data were collected to examine the relationship between drinking-water source and GI illness in the community. After multiple reports of acute gastroenteritis, the local health department launched an investigation to determine whether drinking untreated water increased the risk of GI illness.
Design: Short-term prospective cohort
Sample size: n=600 residents
Follow-up: ~3–5 weeks during a suspected contamination episode

Residents were enrolled and completed a baseline survey:

Source of most drinking water in the last month:
  - Untreated (private well, spring, surface water, or unfiltered tap) → exposed_untreated = 1
  - Treated (municipal treated water, filtered, or bottled) → exposed_untreated = 0

Participants were then followed for 20–40 days, during which they reported:
  - Onset of acute GI symptoms
  - Duration of follow-up (censored if they moved, were hospitalized, or stopped responding)

Each participant was coded according to whether they met the GI case definition during follow-up:
  - outcome_gi = 1 (incident GI illness)
  - outcome_gi = 0 (no GI symptoms during their follow-up time)

Create a 2×2 table of exposure by outcome.

library(epitools)

gi_table <- gi_data |> select(exposed_untreated, outcome_gi) |> table()

gi_table
                 outcome_gi
exposed_untreated   0   1
                0 376  29
                1 154  41
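Because exposure is in the rows, the risk of GI illness in each exposure group is the row proportion with outcome_gi = 1. A small sketch that rebuilds the table from the counts printed above with matrix() (so it runs without the dataset) and takes row proportions with prop.table():

```r
# Rebuild the 2x2 table from the printed counts (rows = exposure, columns = outcome)
gi_counts <- matrix(
  c(376, 29,
    154, 41),
  nrow = 2, byrow = TRUE,
  dimnames = list(exposed_untreated = c("0", "1"), outcome_gi = c("0", "1"))
)

# Row proportions: column "1" is the risk of GI illness in each exposure group
risks <- prop.table(gi_counts, margin = 1)
round(risks, 4)
```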

Risk Ratio

riskratio(gi_table)
$data
                 outcome_gi
exposed_untreated   0  1 Total
            0     376 29   405
            1     154 41   195
            Total 530 70   600

$measure
                 risk ratio with 95% C.I.
exposed_untreated estimate    lower    upper
                0  1.00000       NA       NA
                1  2.93634 1.883905 4.576711

$p.value
                 two-sided
exposed_untreated   midp.exact fisher.exact   chi.square
                0           NA           NA           NA
                1 1.977105e-06 2.244548e-06 7.226956e-07

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

Residents who drank mostly untreated water had 2.94 times the risk of GI illness of those who drank treated water (95% CI: 1.88–4.58; p < 0.001).
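The riskratio() output labels its interval as a Wald (normal approximation) CI; for the risk ratio this is usually computed on the log scale with SE = sqrt(1/a - 1/n1 + 1/c - 1/n0), where a and c are the cases and n1 and n0 the totals in the exposed and unexposed groups. A minimal hand check using the counts from the table above:

```r
# Counts from the 2x2 table (exposed = mostly untreated water)
cases_exp   <- 41;  n_exp   <- 195
cases_unexp <- 29;  n_unexp <- 405

# Risk in each group and their ratio
risk_exp   <- cases_exp / n_exp
risk_unexp <- cases_unexp / n_unexp
rr <- risk_exp / risk_unexp

# Wald 95% CI built on log(RR), then exponentiated back
se_log_rr <- sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
rr_ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)
round(c(estimate = rr, lower = rr_ci[1], upper = rr_ci[2]), 4)
# estimate 2.9363, lower 1.8839, upper 4.5767
```

These values agree with the $measure row for exposed_untreated = 1 above.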

Risk Difference

risk_table <- gi_data |>
  group_by(exposed_untreated) |>
  summarise(
    risk = mean(outcome_gi),   # proportion with incident GI illness
    n = n(),
    .groups = "drop"
  ) |>
  arrange(exposed_untreated) # 0 = unexposed, 1 = exposed
risk_table
# A tibble: 2 × 3
  exposed_untreated   risk     n
              <int>  <dbl> <int>
1                 0 0.0716   405
2                 1 0.210    195
risk_table$risk[2] - risk_table$risk[1]   # exposed risk minus unexposed risk
[1] 0.1386515
[1] 0.1386515

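The risk difference is 0.139 (21.0% vs. 7.2%): drinking untreated water was associated with an absolute increase of about 13.9 percentage points in the risk of GI illness, or roughly 14 excess cases per 100 exposed residents.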
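The risk difference can also be computed directly from the 2x2 counts. The sketch below adds a Wald 95% CI using SE = sqrt(p1(1 - p1)/n1 + p0(1 - p0)/n0); this interval is not part of the original analysis and only illustrates the standard large-sample formula.

```r
# Counts from the 2x2 table (exposed = mostly untreated water)
cases_exp   <- 41;  n_exp   <- 195
cases_unexp <- 29;  n_unexp <- 405

# Risk in each group and the absolute difference
p1 <- cases_exp / n_exp
p0 <- cases_unexp / n_unexp
rd <- p1 - p0

# Wald 95% CI for a difference of two independent proportions
se_rd <- sqrt(p1 * (1 - p1) / n_exp + p0 * (1 - p0) / n_unexp)
rd_ci <- rd + c(-1, 1) * qnorm(0.975) * se_rd
round(c(estimate = rd, lower = rd_ci[1], upper = rd_ci[2]), 4)
```

The estimate matches the 0.1386515 printed above; the interval is an addition, not output from the original analysis.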