We want to know the contents of the summary sheet presented at each patient encounter.
There are a few challenges. First, facilities do not track whether a summary sheet is present at a specific encounter for a specific patient. Our team of DAs is conducting spot checks to generate facility-level numbers, but we are also interested in knowing (i) if a summary sheet was present at a specific encounter and, if yes, (ii) the content of that summary sheet (e.g., which reminders, how many reminders).
Since it is not possible to collect summary sheets after encounters for retrospective evaluation, (i) cannot be known for every patient. The question at hand is whether we can come up with an approximation for (ii).
To do this, we asked Win to create a script that logs when summary sheets are generated or viewed (note: neither is equivalent to “printed”). Win wrote the script, but we are left with another challenge: (iii) summary sheets are generated every time there is a change in the AMRS, so there is not a 1-to-1 match with patient encounters.
So how can we make an educated guess about the content of the summary sheet for a specific patient that, if actually printed/delivered/viewed, would have been available to a clinician during a specific encounter?
Use the patient ID and date of encounter to match to the patient's most recent summary sheet. For instance, if patient A has an encounter on Feb 1, and if we know summary sheets were generated/viewed on Jan 5, Jan 20, and Feb 5, then we match the Feb 1 encounter to the most recent log entry on or before Feb 1: Jan 20.
This rule makes no further assumptions. We could add specifications: if (a) the site gets summary sheets delivered and (b) the time between the patient encounter and the most recent summary sheet generation is less than X days, then it would not be possible for this summary sheet to have been present at the encounter. In that case, match the encounter to the previous summary sheet instead.
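A minimal sketch of the matching rule on the example dates above. The helper `match_log` and its `min_lag` argument are hypothetical names introduced here for illustration, and the 14-day lag is an arbitrary stand-in for X.

```r
# Toy data from the example: one encounter and three log dates for patient A.
enc_date  <- as.Date("2014-02-01")
log_dates <- as.Date(c("2014-01-05", "2014-01-20", "2014-02-05"))

# Hypothetical helper: latest log date at least `min_lag` days before the
# encounter. min_lag = 0 is the no-assumptions rule (on or before the encounter).
match_log <- function(enc_date, log_dates, min_lag = 0) {
  eligible <- log_dates[log_dates <= enc_date - min_lag]
  if (length(eligible) == 0) return(as.Date(NA))  # nothing to match
  max(eligible)
}

simple <- match_log(enc_date, log_dates)                # Jan 20
lagged <- match_log(enc_date, log_dates, min_lag = 14)  # Jan 20 is too recent, so Jan 5
```

With a 14-day lag, the Jan 20 entry is only 12 days before the encounter, so the rule falls back to the previous entry on Jan 5.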
Keny queried all return encounters from April 11, 2014 (study launch) to June 16, 2014. He also grabbed all log data, generated and viewed, since the study launched. So in the first step we import all datasets and merge the different log files. File structures vary, so we have to do a bit of munging.
library(lubridate)  # dmy() and mdy() are used below
# import logs
logs1v <- read.csv("logs/log1v.csv", stringsAsFactors=FALSE, header=F)
logs1g <- read.csv("logs/log1g.csv", stringsAsFactors=FALSE, header=F)
logs2v <- read.csv("logs/log2v.csv", stringsAsFactors=FALSE, header=F)
logs2g <- read.csv("logs/log2g.csv", stringsAsFactors=FALSE, header=F)
logs3v <- read.csv("logs/log3v.csv", stringsAsFactors=FALSE, header=F)
logs3g <- read.csv("logs/log3g.csv", stringsAsFactors=FALSE, header=F)
# rename and format date
names(logs1v) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "ieLoc", "ieDate", "TBreminder1", "TBreminder2")
names(logs1g) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "ieLoc", "ieID", "TBreminder1", "TBreminder2",
"TBreminder3")
logs1v$logDate <- dmy(substr(logs1v$logDate, 1, 12))
logs1g$logDate <- dmy(substr(logs1g$logDate, 1, 12))
logs1v <- logs1v[,c(-1, -4)]
logs1g <- logs1g[,c(-1, -4)]
names(logs2v) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "TBreminder1", "TBreminder2",
"TBreminder3")
names(logs2g) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "TBreminder1", "TBreminder2",
"TBreminder3")
logs2v$temp1 <- substr(logs2v$logDate, 1, 12)
foo <- data.frame(do.call('rbind',
strsplit(as.character(logs2v$temp1),
'-',
fixed=TRUE)))
foo$X4 <- ifelse(nchar(as.character(foo$X3))==2,
paste0("20", as.character(foo$X3)), as.character(foo$X3))
logs2v$logDate <- paste(foo$X1, foo$X2, foo$X4, sep="-")
logs2v$logDate <- dmy(logs2v$logDate)
## Warning: 1 failed to parse.
remove(foo)
logs2v$temp1 <- NULL
logs2g$temp1 <- substr(logs2g$logDate, 1, 12)
foo <- data.frame(do.call('rbind',
strsplit(as.character(logs2g$temp1),
'-',
fixed=TRUE)))
## Warning: number of columns of result is not a multiple of vector length
## (arg 2427)
foo$X4 <- ifelse(nchar(as.character(foo$X3))==2,
paste0("20", as.character(foo$X3)), as.character(foo$X3))
logs2g$logDate <- paste(foo$X1, foo$X2, foo$X4, sep="-")
logs2g$logDate <- dmy(logs2g$logDate)
## Warning: 1 failed to parse.
remove(foo)
logs2g$temp1 <- NULL
logs2v <- logs2v[,c(-1, -4)]
logs2g <- logs2g[,c(-1, -4)]
names(logs3v) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "ieLoc", "ieID", "ieDate", "TBreminder1",
"TBreminder2", "TBreminder3")
names(logs3g) <- c("drop", "patientId", "logDate", "drop2", "requestedby",
"nonTB", "ieLoc", "ieID", "ieDate", "TBreminder1",
"TBreminder2", "TBreminder3")
logs3v$logDate <- dmy(substr(logs3v$logDate, 1, 12))
logs3g$logDate <- dmy(substr(logs3g$logDate, 1, 12))
logs3v <- logs3v[,c(-1, -4)]
logs3g <- logs3g[,c(-1, -4)]
# row bind
library(plyr)
##
## Attaching package: 'plyr'
##
## The following object is masked from 'package:lubridate':
##
## here
logs1 <- rbind.fill(logs1g, logs2g, logs3g)
logs2 <- rbind.fill(logs1v, logs2v, logs3v)
# add a column to each file to indicate generated or viewed
logs1$type <- "g"
logs2$type <- "v"
# combine log files
logs <- rbind.fill(logs1, logs2)
# encounters
encs <- read.csv("logs/enc.csv", stringsAsFactors=FALSE)
names(encs) <- c("patientId", "encLocID", "encDate", "encLocname")
encs$encDate <- mdy(encs$encDate)
#encs <- subset(encs, encs$encDate >= mdy("01-01-2014"))
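The two-digit year repair is written out twice above (for logs2v and logs2g), and binding the split pieces row by row can recycle values when an entry does not split into three parts. As a sketch only, the repair could be factored into one helper. `fix_year` is a hypothetical name, the input values are made up, and the day-month-year layout with numeric months is an assumption about the log files.

```r
# Hypothetical helper: expand a two-digit year in "day-month-year" strings.
# Entries that do not split into exactly three parts become NA instead of
# being recycled into the wrong columns.
fix_year <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3 || anyNA(p)) return(NA_character_)
    p <- trimws(p)
    if (nchar(p[3]) == 2) p[3] <- paste0("20", p[3])
    paste(p, collapse = "-")
  }, character(1))
}

raw    <- c("02-03-14", "21-12-2013", "garbled")  # made-up values
fixed  <- fix_year(raw)
parsed <- as.Date(fixed, format = "%d-%m-%Y")     # malformed entries stay NA
```

The repaired strings could equally be passed to dmy() as in the code above. Either way, malformed entries surface as explicit NA values that can be counted and inspected.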
We can sort the data in ascending order by patient and date. Keny says that it is possible to have duplicate encounters (patient + date), so we check for them and remove them.
# remove duplicates
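# pass both key columns as a data frame: duplicated()'s second positional
# argument is `incomparables`, not a second key
encs <- encs[ !duplicated(encs[, c("patientId", "encDate")]), ]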
# sort
logs <- logs[order(logs$patientId, logs$logDate),]
encs <- encs[order(encs$patientId, encs$encDate),]
This leaves us with 32710 return encounters and 980451 log entries (going back to 2013). Now we can merge the two datasets by patientId and keep all combinations.
dat <- merge(encs, logs, by="patientId", all=T)
dat <- subset(dat, !is.na(dat$patientId))
The dataframe is now arranged in such a way that patients can have multiple summary sheet dates for each encounter date. Let's create a new column that calculates the difference in days.
dat$diff <- difftime(dat$encDate, dat$logDate, units="days")
Under the simple assumptions, we want to keep one row per encounter, so we drop duplicates according to patientId and encDate. When we use the !duplicated() function below, R keeps only the first instance of each duplicate (patientId + encDate). So we sort the dataframe so that the first instance of each duplicate is the combination with the smallest non-negative value for diff. We only want non-negative differences because a log entry has to be generated on or before the encounter date to have any chance of the summary being viewed at the encounter.
# remove negative values
dat <- dat[dat$diff>=0,]
# if we wanted, we could say if encLoc=="a" | encLoc=="b", then >=7
# sort ascending again, this time with diff
dat <- dat[order(dat$patientId, dat$encDate, dat$diff),]
# remove duplicates leaving first match (smallest diff value)
# (key columns passed together as a data frame so both are used)
dat2 <- dat[ !duplicated(dat[, c("patientId", "encDate")]), ]
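To see why the sort order matters for the de-duplication step, here is a toy sketch of the same merge, filter, sort, and de-duplicate sequence. The data frames `encs_toy` and `logs_toy` and all their values are made up for illustration.

```r
# Made-up encounters and logs for two patients.
encs_toy <- data.frame(
  patientId = c(1, 1, 2),
  encDate   = as.Date(c("2014-04-15", "2014-05-20", "2014-04-28"))
)
logs_toy <- data.frame(
  patientId = c(1, 1, 1, 2, 2),
  logDate   = as.Date(c("2014-04-01", "2014-05-10", "2014-06-01",
                        "2013-12-21", "2014-03-02"))
)

# All encounter x log combinations per patient, then the gap in days.
m <- merge(encs_toy, logs_toy, by = "patientId")
m$diff <- as.numeric(difftime(m$encDate, m$logDate, units = "days"))

# Keep logs on or before the encounter; which() also drops NA gaps.
m <- m[which(m$diff >= 0), ]

# Smallest gap first within each (patientId, encDate) pair.
m <- m[order(m$patientId, m$encDate, m$diff), ]

# De-duplicate on the pair of keys. The keys go in as a data frame because
# duplicated()'s second positional argument is `incomparables`, not a second key.
matched <- m[!duplicated(m[, c("patientId", "encDate")]), ]
matched
```

Patient 1 has two encounters and each keeps its own closest earlier log entry. De-duplicating on patientId alone would keep only the first encounter per patient.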
Now every return encounter is matched to the most recent log entry (generated or viewed) on or before the encounter. Let's tally the number of return encounters by location and see if all of these encounters have data in the logs. We're not being picky in this demonstration: any log entry on or before the encounter date counts. We'll limit ourselves to study mother sites.
tbl <- data.frame(table(dat2$encLocname))
names(tbl) <- c("encLocname", "matched")
encnum <- aggregate(patientId ~ encLocname, data = encs, FUN = length)
names(encnum) <- c("encLocname", "total.enc")
dat3 <- merge(encnum, tbl, by = "encLocname", all.x = TRUE)
dat3$matched[is.na(dat3$matched)] <- 0
dat3$per <- dat3$matched/dat3$total.enc
# limit to study mother sites
sites <- c("Khuyangu", "Mukhobola", "Turbo", "Ziwa", "Kitale", "Uasin Gishu District Hospital",
"Webuye", "Port Victoria", "Bumala A", "Mt. Elgon", "Iten", "Mois Bridge",
"Teso", "Mosoriot", "Busia", "ANGURAI", "Huruma SDH", "Chulaimbo", "Bumala B",
"Burnt Forest")
dat3 <- dat3[dat3$encLocname %in% sites, ]
dat3
## encLocname total.enc matched per
## 5 ANGURAI 335 334 0.9970
## 7 Bumala A 861 860 0.9988
## 8 Bumala B 349 347 0.9943
## 9 Burnt Forest 825 814 0.9867
## 10 Busia 123 123 1.0000
## 17 Chulaimbo 26 26 1.0000
## 21 Huruma SDH 167 164 0.9820
## 22 Iten 450 444 0.9867
## 28 Khuyangu 1310 1298 0.9908
## 29 Kitale 2991 2963 0.9906
## 39 Mois Bridge 414 411 0.9928
## 40 Mosoriot 1424 1413 0.9923
## 41 Mt. Elgon 143 142 0.9930
## 46 Mukhobola 697 685 0.9828
## 51 Port Victoria 1569 1556 0.9917
## 60 Teso 673 670 0.9955
## 62 Turbo 1398 1380 0.9871
## 63 Uasin Gishu District Hospital 681 674 0.9897
## 64 Webuye 2144 2140 0.9981
## 65 Ziwa 281 275 0.9786
So we see that we were able to match plausible log entries for most patients across all facilities. Success!
Here's an example to show how the matching works. Take a look at patient 656. She had an encounter on April 28, 2014, and we matched this encounter to every log entry we found for her that occurred on or before this date.
subset(dat, dat$patientId == 656)
## patientId encLocID encDate encLocname
## 602328 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602329 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602321 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602323 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602326 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602324 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602325 656 70 2014-04-28 Pioneer Sub-District Hospital
## 602322 656 70 2014-04-28 Pioneer Sub-District Hospital
## logDate requestedby nonTB ieLoc ieID TBreminder1 TBreminder2
## 602328 2014-03-02 rkkisia 1 NA NA
## 602329 2014-03-02 daemon 1 NA NA
## 602321 2013-12-21 daemon 1 NA NA
## 602323 2013-12-21 rjkeitany 1 NA NA
## 602326 2013-12-21 rjkeitany 1 NA NA
## 602324 2013-10-14 rkkisia 0 NA NA
## 602325 2013-10-14 daemon 0 NA NA
## 602322 2013-10-10 daemon 0 NA NA
## TBreminder3 ieDate type diff
## 602328 <NA> v 57 days
## 602329 <NA> g 57 days
## 602321 <NA> g 128 days
## 602323 <NA> v 128 days
## 602326 <NA> v 128 days
## 602324 <NA> v 196 days
## 602325 <NA> g 196 days
## 602322 <NA> g 200 days
In the end, we matched her encounter to the most recent log entry on March 2. [Sadly, in this example, the most recent summary sheet possible was 57 days old. But this is not our concern here.]
subset(dat2, dat2$patientId == 656)
## patientId encLocID encDate encLocname
## 602328 656 70 2014-04-28 Pioneer Sub-District Hospital
## logDate requestedby nonTB ieLoc ieID TBreminder1 TBreminder2
## 602328 2014-03-02 rkkisia 1 NA NA
## TBreminder3 ieDate type diff
## 602328 <NA> v 57 days
Does this mean that when patient 656 arrived for her appointment on April 28 that her clinician viewed her summary sheet? No. We have no way to know this.
Does it mean that a summary sheet was printed and in her file when she met with the clinician? Again, no.
Does it mean that, if a summary sheet was printed, we have a reasonable guess about the content? Here, I think the answer is yes. It's probable that, if printed and placed in her file, patient 656's provider might have been exposed to 1 non-TB reminder.
Before and after we turned “on” the new TB reminders for facilities randomized to treatment, Keny verified that only treatment sites were getting reminders. Let's check to make sure that is still the case.
We count the number of patients who have received TB reminders since the study launched. Then we aggregate by site and merge with the site's treatment assignment.
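A minimal sketch of how such a count could be built, on made-up data. The names `toy`, `rand_toy`, and `hasTB`, and the rule that a non-missing, non-empty TBreminder1 means a TB reminder was present, are assumptions for illustration and not the code used for the results in this report.

```r
# Made-up matched data: one row per encounter with its matched log entry.
toy <- data.frame(
  patientId   = c(1, 1, 2, 3, 4),
  encLocID    = c(10, 10, 10, 20, 30),
  TBreminder1 = c("reminder text", "reminder text", "", NA, "reminder text"),
  stringsAsFactors = FALSE
)
# Made-up treatment assignment (1 = treatment, 0 = control).
rand_toy <- data.frame(site.id = c(10, 20, 30), trt = c(1, 0, 0))

# Assumption: a non-missing, non-empty TBreminder1 means a TB reminder was present.
toy$hasTB <- !is.na(toy$TBreminder1) & nzchar(toy$TBreminder1)

# Number of distinct patients with a TB reminder, by encounter site.
cttb_toy <- aggregate(patientId ~ encLocID, data = toy[toy$hasTB, ],
                      FUN = function(x) length(unique(x)))
names(cttb_toy) <- c("encLocID", "ctTBreminders")

# Attach treatment assignment; sites with no reminders get a count of 0.
out <- merge(rand_toy, cttb_toy, by.x = "site.id", by.y = "encLocID", all.x = TRUE)
out$ctTBreminders[is.na(out$ctTBreminders)] <- 0
out
```

Counting distinct patients rather than rows keeps a patient with several encounters from being counted more than once at a site.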
The (potentially very) bad news is that it looks like patients with encounters at CONTROL sites are getting TB reminders. I've coded “contamination==1” if the site is control and if any patients are receiving TB reminders.
dat6 <- merge(cttb, rand, by.y = "site.id", by.x = "encLocID")
dat6$contamination <- ifelse(dat6$trt == 0 & dat6$ctTBreminders > 0, 1, 0)
dat6 <- dat6[, -c(1, 3)]
dat6 <- dat6[order(-dat6$contamination), ]
dat6
## ctTBreminders trt contamination
## 4 12 0 1
## 6 5 0 1
## 7 119 0 1
## 9 13 0 1
## 10 24 0 1
## 12 199 0 1
## 16 108 0 1
## 22 18 0 1
## 27 72 0 1
## 29 1 0 1
## 30 55 0 1
## 32 3 0 1
## 35 7 0 1
## 37 63 0 1
## 38 1 0 1
## 40 1 0 1
## 1 89 1 0
## 2 69 1 0
## 3 43 1 0
## 5 50 1 0
## 8 59 1 0
## 11 129 1 0
## 13 7 1 0
## 14 50 1 0
## 15 10 1 0
## 17 0 0 0
## 18 0 1 0
## 19 0 1 0
## 20 23 1 0
## 21 20 1 0
## 23 7 1 0
## 24 0 1 0
## 25 1 1 0
## 26 8 1 0
## 28 0 0 0
## 31 0 0 0
## 33 10 1 0
## 34 10 1 0
## 36 1 1 0
## 39 0 0 0
## 41 0 1 0
## 42 0 0 0
## 43 0 1 0