1 Introduction.
2 Load and clean COVID-19 data.
- 2.1 Extracting daily incidence of lab confirmed cases
- 2.2 Creating a list of states
3 Filter state of choice for further processing.
4 Epidemic curve using ggplot2
- 4.1 Epidemic curve for total duration.
- 4.2 Epidemic curve for a selected timeline.
5 Estimating parameters using incidence package.

1 Introduction.

The present document aims to provide understanding on use of IDM concepts to undertake the situational assessment of an epidemic/ outbreak/ pandemic by plotting epidemic curves and calculating effective reproduction numbers \(R_{eff}\), growth rates and doubling time. We shall be using incidence and tidyverse packages for the same. Additional packages recommended for learning are provided at Free and open training resources to respond to outbreaks, health emergencies and humanitarian crises website https://www.reconlearn.org/ . We shall not be discussing other packages but interested students are encouraged to go through the website and discuss on moodle platform/ individual sessions over the next week.

In an outbreak situation, in addition to on ground prevention and control (as discussed in outbreak investigation part of this module), a public health specialist is expected to:-

Generate and interpret epidemic curves.
Calculate Effective reproduction numbers.
Calculate growth rates.
Calculate doubling/ halving time,.

2 Load and clean COVID-19 data.

We will fetch COVID-19 data as available from covid19india.org website for India.

# Fetch and load data
dat <- data.table::fread("https://api.covid19india.org/csv/latest/state_wise_daily.csv")
# Convert to tidy format
dat <- dat %>% pivot_longer(
  cols = 4:42,
  names_to = "States",
  values_to = "Cases"
)
# EDA
glimpse(dat)

## Rows: 29,367
## Columns: 5
## $ Date     <chr> "14-Mar-20", "14-Mar-20", "14-Mar-20", "14-Mar-20", "14-Ma...
## $ Date_YMD <date> 2020-03-14, 2020-03-14, 2020-03-14, 2020-03-14, 2020-03-1...
## $ Status   <chr> "Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confi...
## $ States   <chr> "TT", "AN", "AP", "AR", "AS", "BR", "CH", "CT", "DN", "DD"...
## $ Cases    <int> 81, 0, 1, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 14, 0, 2, 0, 6, 19...

unique(dat$Status)

## [1] "Confirmed" "Recovered" "Deceased"

2.1 Extracting daily incidence of lab confirmed cases

# filter only confirmed cases
dat <- dat %>% filter(
  dat$Status == "Confirmed"
)
# Data Cleaning
dat$Cases <- if_else(dat$Cases>0, 
                         dat$Cases,
                         abs(dat$Cases))
dat$Date_YMD <- lubridate::ymd(dat$Date_YMD)

2.2 Creating a list of states

listofstates <- unique(dat$States)
listofstates

##  [1] "TT" "AN" "AP" "AR" "AS" "BR" "CH" "CT" "DN" "DD" "DL" "GA" "GJ" "HR" "HP"
## [16] "JK" "JH" "KA" "KL" "LA" "LD" "MP" "MH" "MN" "ML" "MZ" "NL" "OR" "PY" "PB"
## [31] "RJ" "SK" "TN" "TG" "TR" "UP" "UT" "WB" "UN"

3 Filter state of choice for further processing.

We can develop iterations for all the states, but avoiding same to avoid complexity. Hence, we have to manually choose the state of choice for demo purposes.

n= 19 # Change to state of your choice
states = listofstates[[n]]
states

## [1] "KL"

df <- dat %>% filter(dat$States == states)
tail(df)

4 Epidemic curve using ggplot2

4.1 Epidemic curve for total duration.

ggplot(data = df, aes(x = Date_YMD, y = Cases)) +
  geom_line(color = "red")+
  scale_color_discrete()+
  labs(y = "Daily COVID-19 cases",
       x = "Timeline")+
  theme_classic() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")+
  theme(legend.position="bottom")

4.2 Epidemic curve for a selected timeline.

df %>% filter(df$Date_YMD > dmy("01092020")) %>% ggplot(aes(x = Date_YMD, y = Cases)) + 
  geom_line(color = "red")+
  scale_color_discrete()+
  labs(y = "Daily COVID-19 cases",
       x = "Timeline")+
  theme_classic() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")+
  theme(legend.position="bottom")

5 Estimating parameters using incidence package.

To work with ìncidence`package, an incidence object is required to be created. The same takes atleast two arguments viz. the timeline and the number of cases on each day. There are alternate methods also available which can create incidence objects from a line list also.

5.1 Creating incidence object

i <- as.incidence(df$Cases, df$Date_YMD) 
i

## <incidence object>
## [545642 cases from days 2020-03-14 to 2020-11-19]
## 
## $counts: matrix with 251 rows and 1 columns
## $n: 545642 cases in total
## $dates: 251 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 251 days
## $cumulative: FALSE

5.2 Epidemic curve for the entire duration using incidence package

plot(i)

5.3 Epidemic curve for last 30 days using incidence package

plot(i[-30:-1])

5.4 Estimating parameters

f <- fit(i[200:250])
f

## <incidence_fit object>
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
## [1] -0.009662796
## 
##   $r.conf (confidence interval):
##            2.5 %       97.5 %
## [1,] -0.01420663 -0.005118962
## 
##   $halving (halving time in days):
## [1] 71.73361
## 
##   $halving.conf (confidence interval):
##        2.5 %   97.5 %
## [1,] 48.7904 135.4078
## 
##   $pred: data.frame of incidence predictions (51 rows, 5 columns)

5.5 Visualisation of predicted and observed incidence during a given time period.

plot(i[200:250], fit = f)

5.6 Splitting the datatset on basis of peak of epidemic identified.

best.fit <- fit_optim_split(i)
best.fit

## $df
##         dates   mean.R2
## 1  2020-09-09 0.4806508
## 2  2020-09-10 0.4732215
## 3  2020-09-11 0.4655507
## 4  2020-09-12 0.4565719
## 5  2020-09-13 0.4478886
## 6  2020-09-14 0.4409851
## 7  2020-09-15 0.4343875
## 8  2020-09-16 0.4323145
## 9  2020-09-17 0.4331681
## 10 2020-09-18 0.4355704
## 11 2020-09-19 0.4403081
## 12 2020-09-20 0.4458940
## 13 2020-09-21 0.4528206
## 14 2020-09-22 0.4803731
## 15 2020-09-23 0.5052529
## 16 2020-09-24 0.5209989
## 17 2020-09-25 0.5296152
## 18 2020-09-26 0.5376711
## 19 2020-09-27 0.5420031
## 20 2020-09-28 0.5430704
## 21 2020-09-29 0.5797107
## 22 2020-09-30 0.5835311
## 23 2020-10-01 0.5771983
## 24 2020-10-02 0.5746098
## 25 2020-10-03 0.5651661
## 26 2020-10-04 0.5635809
## 27 2020-10-05 0.5566840
## 28 2020-10-06 0.5880725
## 29 2020-10-07 0.5866284
## 30 2020-10-08 0.5691645
## 31 2020-10-09 0.5961152
## 32 2020-10-10 0.5839718
## 33 2020-10-11 0.5605408
## 34 2020-10-12 0.5443080
## 35 2020-10-13 0.5594027
## 36 2020-10-14 0.5453853
## 37 2020-10-15 0.5556621
## 38 2020-10-16 0.5483301
## 39 2020-10-17 0.5454343
## 40 2020-10-18 0.5273345
## 41 2020-10-19 0.5184269
## 42 2020-10-20 0.5477682
## 43 2020-10-21 0.5524239
## 44 2020-10-22 0.5361109
## 45 2020-10-23 0.5272639
## 46 2020-10-24 0.5075949
## 47 2020-10-25 0.4889441
## 48 2020-10-26 0.4822917
## 49 2020-10-27 0.5261437
## 50 2020-10-28 0.5516430
## 51 2020-10-29 0.5219627
## 52 2020-10-30 0.5117550
## 53 2020-10-31 0.5066843
## 54 2020-11-01 0.4806629
## 55 2020-11-02 0.4647556
## 56 2020-11-03 0.5242427
## 57 2020-11-04 0.5109508
## 58 2020-11-05 0.4661591
## 59 2020-11-06 0.4450809
## 60 2020-11-07 0.4219229
## 61 2020-11-08 0.4069926
## 62 2020-11-09 0.4012606
## 63 2020-11-10 0.4242461
## 
## $split
## [1] "2020-10-09"
## 
## $fit
## <list of incidence_fit objects>
## 
## attr(x, 'locations'): list of vectors with the locations of each incidence_fit object
## 
## 'before'
## 'after'
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
##      before       after 
##  0.03987393 -0.01240162 
## 
##   $r.conf (confidence interval):
##              2.5 %       97.5 %
## before  0.03812660  0.041621255
## after  -0.01844847 -0.006354774
## 
##   $doubling (doubling time in days):
##   before 
## 17.38347 
## 
##   $doubling.conf (confidence interval):
##           2.5 %   97.5 %
## before 16.65368 18.18014
## 
##   $halving (halving time in days):
##    after 
## 55.89165 
## 
##   $halving.conf (confidence interval):
##          2.5 %  97.5 %
## after 37.57207 109.075
## 
##   $pred: data.frame of incidence predictions (245 rows, 6 columns)
## 
## $plot

plot(i, fit = best.fit$fit)

Information from models to understand outbreak situation: An illustrative example using COVID-19 data

Dr Gurpreet Singh, Dr Biju Soman

2020-11-20