Regular cervical cancer screening reduces the risk of cancer incidence by approximately a factor of 3 to 4 compared to no screening (for women aged 35 to 79 years), and reduces mortality by a factor of 4 to 10. Regular cervical cancer screening reduces the risk of cervical cancer incidence and death by up to a factor of two compared to irregular cervical cancer screening (Landy et al., 2016).
According to Practice Incentive program data (May 2018), 61.14% of eligible patients at coHealth Kensington were screened appropriately for cervical cancer.
Using SQL searches in the Best Practice database, as of 1st March 2019 there were 857 eligible patients who were active in seeing the clinic. Of those, 290 patients (34%) had either no recording of a cervical screening, or no cervical screening recording for the previous forty-five months. At the time of writing, cervical screening done forty-five (45) or more months previously would be the ‘Pap’ smear, which needs to be repeated in twenty-four (24) months.
Note : For the purpose of this document, ‘eligible’ patients are females aged between 25 and 75 years, who are not otherwise marked as not requiring future cervical screening. ‘Active’ patients refers to patients who have had three or more contacts with the clinic in the previous two years, including at least once in the previous six months. If the information is available (as it was in March 2019, but not in preceding years), the ‘previous six months’ contact is a ‘billed visit’.
The number of active and eligible patients who either had no history of screening or were very overdue (more than 45 months old screen, when screening is due every 24 months) as of 1st March 2018 was 270 patients.
Over the next three months, fourteen of those patients (5.2%) had a recorded cervical screening.
In the next three months after that, an additional ten patients (3.7%) had a recorded cervical screening.
SQL search code used to count historical screening found at the end of this document. Note that no billing information is available prior to June 2018, so the definition of an ‘active’ patient is slightly different.
During visits for other purposes in a primary care clinic, patients are, where possible, recommended to have cervical screening. This can be aided by automated ‘real-time’ tools such as Doctor’s Control Panel, and the availability of a womens’ health nurse who is available to do cervical screening.
In addition, the clinic runs a reminder system which invites patients to repeat the cervical screening when the next screening is due. This is done with a letter, in English. The government also uses letters to remind people when the next cervical screening is due. Until very recenty, the ‘reminder’ interval has been two years, as cervical screening was done with the ‘Pap’ test until 2018.
Not available from latter half of 2019.
Current eligible practice population : 754 (according to May 2018 PIP data)
Annual outcome payment : $3 per eligible patient = $2262. Never yet received
Potential additional Cervical SIP revenue, if target Cervical SIP claim rate achieved, assuming that half of additionally screened women can have a cervical SIP item claimed = (70-61.14)/100 * 754 * $35 * 0.5 = $1169.1
Additional revenue if Cervical Outcome Payment achieved = $2262 (annual outcome payment) + $1169.1 (additional cervical SIP revenue) = $3431.1
Theory of Change
| Objectives Hierarchy | Indicator | Verification | Assumptions/Threats |
|---|---|---|---|
| (Input) Invitation to cervical screening by telephone | Telephone invitation | Call log | Telephone answered |
| Common language used | |||
| Social desirability bias? e.g. type of practitioner ringing, or previous contact with practitioner who rings | |||
| Diqualifying criteria for cervical screening e.g. hysterectomy | |||
| (Input) Invitation to cervical screening by mail | Letter invitations | Previous letter in file | Letter read |
| English literacy | |||
| Letter persuasive | |||
| (Output) Appointment for cervical screening | Appointments made | Appointment log | Appropriate cervical screening already done elsewhere |
| Disqualifying criteria for cervical screening e.g. hysterectomy | |||
| Appointments attended | Appointment book | Appointment is remembered | |
| Cervical screening (CST) at clinic | Cervical screening results (can be searched using clinical database). | Some cervical screening will be the result of the telephone invitation, others might screen for other reasons. | |
| Cervical screening (CST) might be done elsewhere | |||
| (Outcome) Improved cervical screening coverage | Improved cervical screening coverage | Government Practice Incentive Program reports | |
| Increase detection of HPV or cervical changes | Patient record audit of HPV/ pLSIL/LSIL detected (Low-grade squamous intraepithelial lesion, possible or confirmed) | ||
| HSIL detected (High-grade squamous intraepithelial lesion) | |||
| (Impact) Reduce cervical cancer | Reduced cervical cancer rates | Difficult! | Vulnerable population screened |
Summary
## DOB Age SeenBuddini RefugeeOrAsylum
## Min. :1944-04-04 Min. :25.00 Mode :logical Mode :logical
## 1st Qu.:1954-05-04 1st Qu.:36.25 FALSE:200 FALSE:197
## Median :1969-06-20 Median :49.00 TRUE :90 TRUE :93
## Mean :1968-11-20 Mean :49.87
## 3rd Qu.:1982-03-08 3rd Qu.:64.00
## Max. :1994-02-27 Max. :74.00
##
## Subgroup AgeGroup5
## Buddini : 65 25 :40
## Refugee : 68 70 :38
## Refugee+Buddini: 25 45 :36
## Standard :132 65 :34
## 30 :26
## 60 :26
## (Other):90
Number of patients at each age group, and whether they have a recorded refugee or asylum seeker background or seen by Dr Buddini in the past two years.
Population overview
Total population : 290
Previous proportion of under-screened patients who were screened in the three months after March 2018 was 5.2%. With a populaton of 290 patients, split between two groups, power of 0.8 and significance level of 0.05, the
##
## Two-sample comparison of proportions power calculation
##
## n = 145
## p1 = 0.05185
## p2 = 0.1507374
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
## [1] "Minimum detectable effect : 0.0989"
For further power calculation workings, including minimum detectable effect size at 6 months, see the Appendix - Power Calculation Details.
Complete randomization within sub-groups defined by age (5 years groups), of known refugee or asylum seeker background, and whether seen by Dr Buddini in the past two years.
Two surveyors were initially planned. One of whom was Dr Buddini. Each of the phase 1 and phase 2 were into two sub-groups for allocation to the surveyors.
Telephone-based recall for cervical screening commenced 1st April 2019. Table @ref(tab:results-table) shows interim results as of 14th November 2019. ‘n’ represents the number of patients in each sub-group. ‘CST done’ indicates that, as of 14th November 2019, a cervical screening test had been recorded within the previous two years. ‘CST overdue’ indicates that no cervical screening had been recorded during the previous two years.
‘Refugee’ is the subset of patients who have a recorded entry of ‘refugee’ or ‘asylum-seeker’ in the patient file. ‘Other’ is the remainder, who have no recorded entry of ‘refugee’ or ‘asylum-seeker’ in the patient file.
| n | % | n | % | |
|---|---|---|---|---|
| All | ||||
| n | 69 | 83.1% | 14 | 16.9% |
| Phase1 | 26 | 72.2% | 10 | 27.8% |
| Phase2 | 43 | 91.5% | 4 | 8.5% |
| Refugee | ||||
| n | 34 | 85.0% | 6 | 15.0% |
| Phase1 | 12 | 70.6% | 5 | 29.4% |
| Phase2 | 22 | 95.7% | 1 | 4.3% |
| Other | ||||
| n | 35 | 81.4% | 8 | 18.6% |
| Phase1 | 14 | 73.7% | 5 | 26.3% |
| Phase2 | 21 | 87.5% | 3 | 12.5% |
As of 14th November 19, patients aged 25 to 39 years had been contacted by telephone. A total of thirty-six (36) patients are in the sub-groups (in ‘Phase 1’) for which telephone contact had been attempted. Of these ‘Phase 1’ patients, ten (27.8%) had a recorded cervical screening test within the previous two years as of 14th November 2019.
These ‘Phase 1’ groups are compared with patients in the same age range (25 to 39 years) who are in the ‘Phase 2’ group which have not yet been contacted by telephone for cervical screening. A total of forty-seven (47) patients are in this group. Of these ‘Phase 2’ patients, four (8.5%) had a recorded screening test within the previous two years as of 14th November 2019.
Telephone-based recall for cervical screening has a positive effect on cervical screening rates for patients who are overdue for cervical screening.
The table above suggests that, although patients with a background of refugee/asylumseeker status have a very low rate of cervical screening compared to non-refugee patients (‘Refugee Phase 2’ vs. ‘Other’ Phase 2), the rate of cervical screening is similar if telephone-based recall for cervical screening is done (‘Refugee Phase 1’ vs. ‘Other’ Phase 1).
This is shown visually in the interaction plot Figure @ref(fig:results-interaction).
Recall + Refugee/Asylum-seeker interaction
Table @ref(tab:aic) compares a logistic regression model which includes a background of refugee/asylum-seeker status and interaction effects with telephone-based recall (‘PhaseTRUE’ in the regression table below) with a simpler logistic regression model (‘Simple model’) which just includes telephone-based recall as the predictor.
| Refugee model | Simple model | |
| (Intercept) | -1.946 ** | -2.375 *** |
| (0.617) | (0.523) | |
| refugeeTRUE | -1.145 | |
| (1.194) | ||
| PhaseTRUE | 0.916 | 1.419 * |
| (0.808) | (0.642) | |
| refugeeTRUE:PhaseTRUE | 1.299 | |
| (1.407) | ||
| N | 83 | 83 |
| logLik | -34.405 | -34.950 |
| AIC | 76.810 | 73.901 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | ||
The ‘refugee model’ is not superior to the ‘simple model’ according to step-wise model selection by Akaike Information Criterion (AIC). Step-wise model selection by AIC on the ‘refugee model’ yields the ‘simple model’. The ‘simple model’, with just the telephone-based recall intervention (‘Phase’, indicating ‘Phase 1’), shows that there is a significant increase in cervical screening (\(p = 0.027\)) when telephone-based recall was used.
Landy, R., Pesola, F., Castañón, A., & Sasieni, P. (2016). Impact of cervical screening on cervical cancer mortality: Estimation using stage-specific results from a nested caseControl study. British Journal of Cancer, 115(9), 1140–1146. https://doi.org/10.1038/bjc.2016.290
Find eligible patients (female, age criteria), who are active patients (defined by three or more contacts in the past two years, including one billed visit in the past six months), who have not had cervical screening detected found in the PapSmear table or the Investigations table.
Note that there is another table ‘ObGyn’ which should contain the most recent cervical screening result, but unfortunately, ‘over-chose’ 5 patients (found 295 patients, instead of ‘290’ as found using this search.)
SELECT *
FROM BPS_Patients
WHERE StatusText = 'Active'
AND Sex = 'Female'
AND DOB BETWEEN DateAdd(Year,-75,'20190301') AND DateAdd(Year,-25,'20190301')
AND InternalID IN (SELECT InternalID
FROM Visits v
INNER JOIN (VALUES('%bhagwat%'),('%fong%'),('%ekanayake%'),('%shoesmith%'),
('%plastow%'),('%samarawickrama%'),('%obeyesekere%'),('%chaves%'),
('%ryan%'),('%mikhail%'),('%haynes%'),('%buckwell%'),('%maxwell%'),
('%grace ho%'))
AS ProviderName(Name)
ON v.DrName LIKE ProviderName.Name
WHERE VisitDate BETWEEN DateAdd(Year,-2,'20190301') AND '20190301'
AND RecordStatus = 1
GROUP BY internalid
HAVING count(internalid) >= 3)
AND InternalID IN (SELECT InternalID
FROM Invoices WHERE InvoiceID IN (SELECT InvoiceID
FROM Services
WHERE Recordstatus = 1
AND Servicedate > DateAdd(Month, -6, '20190301')))
AND InternalID NOT IN (SELECT InternalID
FROM PapSmears
WHERE PapDate > DATEADD(Month, -45, '20190301')
-- CST not extremely overdue (>45 months old)
)
AND InternalID NOT IN (SELECT InternalID
FROM Investigations
WHERE (
TestName LIKE '%CERVICAL SCREENING%'
OR TestName LIKE '%PAP SMEAR%')
AND ReportDate > DATEADD(Month, -45, '20190301')
-- alternative search for Cervical Screening in Investigations (does not always appear in PAP table)
)
AND InternalID NOT IN (SELECT InternalID
FROM ObsGynDetail
WHERE NoPap=1
-- for some reason, Pap/CST marked as no longer required
)
ORDER BY surname, firstname
AND (
InternalID IN
(SELECT InternalID FROM PastHistory WHERE ItemCode = 13155 AND RecordStatus = 1)
OR
InternalID IN
(SELECT InternalID FROM PastHistory WHERE ItemCode = 13154 AND RecordStatus = 1)
)
AND InternalID IN (SELECT InternalID
FROM Visits v
INNER JOIN (VALUES('%ekanayake%'))
AS ProviderName(Name)
ON v.DrName LIKE ProviderName.Name
WHERE VisitDate BETWEEN DateAdd(Year,-2,'20190301') AND '20190301'
AND RecordStatus = 1
GROUP BY internalid
HAVING count(internalid) >= 1)
Code finds ‘active’ patients (5 contacts in two years, include one contact within the previous six months) preceding 1st March 2018. Excludes patients who are marked for ‘no cervical screening’.
The ‘recent’ visit definition is different to seeking patients eligible for the study, because no billing data is available prior to June 2018.
Further restriction to patients who had a cervical screening test in the next three months.
SELECT *
FROM BPS_Patients
WHERE StatusText = 'Active'
AND Sex = 'Female'
AND DOB BETWEEN DateAdd(Year,-75,'20180301') AND DateAdd(Year,-25,'20180301')
AND InternalID IN (SELECT InternalID
FROM Visits v
INNER JOIN (VALUES('%bhagwat%'),('%fong%'),('%ekanayake%'),('%shoesmith%'),
('%plastow%'),('%samarawickrama%'),('%obeyesekere%'),('%chaves%'),
('%ryan%'),('%mikhail%'),('%haynes%'),('%buckwell%'),('%maxwell%'),
('%grace ho%'))
AS ProviderName(Name)
ON v.DrName LIKE ProviderName.Name
WHERE VisitDate BETWEEN DateAdd(Year,-2,'20180301') AND '20180301'
AND RecordStatus = 1
GROUP BY internalid
HAVING count(internalid) >= 3)
AND InternalID IN (SELECT InternalID
FROM Visits v
INNER JOIN (VALUES('%bhagwat%'),('%fong%'),('%ekanayake%'),('%shoesmith%'),
('%plastow%'),('%samarawickrama%'),('%obeyesekere%'),('%chaves%'),
('%ryan%'),('%mikhail%'),('%haynes%'),('%buckwell%'),('%maxwell%'),
('%grace ho%'))
AS ProviderName(Name)
ON v.DrName LIKE ProviderName.Name
WHERE VisitDate BETWEEN DateAdd(Month,-6,'20180301') AND '20180301'
AND RecordStatus = 1
GROUP BY internalid
HAVING count(internalid) >= 1)
AND InternalID NOT IN (SELECT InternalID
FROM PapSmears
WHERE PapDate > DATEADD(Month, -45, '20180301')
AND PapDate < '20180301'
-- CST not extremely overdue (>45 months old)
)
AND InternalID NOT IN (SELECT InternalID
FROM Investigations
WHERE (
TestName LIKE '%CERVICAL SCREENING%'
OR TestName LIKE '%PAP SMEAR%')
AND ReportDate > DATEADD(Month, -45, '20180301')
AND ReportDate < '20180301'
-- alternative search for Cervical Screening in Investigations (does not always appear in PAP table)
)
AND (InternalID IN (SELECT InternalID
FROM PapSmears
WHERE PapDate >= '20180301'
AND PapDate < DATEADD(Month, 3, '20180301')
-- CST done in next three months
)
OR InternalID IN (SELECT InternalID
FROM Investigations
WHERE (
TestName LIKE '%CERVICAL SCREENING%'
OR TestName LIKE '%PAP SMEAR%')
AND ReportDate > '20180301'
AND ReportDate < DATEADD(Month, 3, '20180301')
-- alternative search for Cervical Screening in Investigations (does not always appear in PAP table)
))
AND InternalID NOT IN (SELECT InternalID
FROM ObsGynDetail
WHERE NoPap=1
-- for some reason, Pap/CST marked as no longer required
)
ORDER BY surname, firstname
library(MASS) # has its own 'select' function!
library(tidyverse)
library(dagitty)
library(lubridate)
library(openxlsx)
library(formattable)
library(kableExtra)
library(huxtable)
dag <- downloadGraph("dagitty.net/mannE14")
# data for directed acyclic graph model of intervention
# graph create with dagitty at www.dagitty.net
population <- as_tibble(read.csv("20190301CSTpopulation.csv"))
seen_Buddini <- as_tibble(read.csv("20190301CSTBuddini.csv"))
refugeeasylum <- as_tibble(read.csv("20190301CSTRefugeeAsylum.csv"))
# Calculate age
#
# By default, calculates the typical "age in years", with a
# \code{floor} applied so that you are, e.g., 5 years old from
# 5th birthday through the day before your 6th birthday. Set
# \code{floor = FALSE} to return decimal ages, and change \code{units}
# for units other than years.
# @param dob date-of-birth, the day to start calculating age.
# @param age.day the date on which age is to be calculated.
# @param units unit to measure age in. Defaults to \code{"years"}. Passed to \link{\code{duration}}.
# @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
# @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
# @examples
# my.dob <- as.Date('1983-10-20')
# age(my.dob)
# age(my.dob, units = "minutes")
# age(my.dob, floor = FALSE)
# code by 'Gregor'
# https://stackoverflow.com/questions/27096485/change-a-column-from-birth-date-to-age-in-r
# requires library 'lubridate'
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
calc.age = interval(dob, age.day) / duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
population <- population %>%
# add columns for whether they have seen the doctor who is making the telephone calls
# or are of known refugee or asylum seeker background
# (?proxy for low English language literacy)
mutate(SeenBuddini = INTERNALID %in% seen_Buddini$INTERNALID) %>%
mutate(RefugeeOrAsylum = INTERNALID %in% refugeeasylum$INTERNALID) %>%
mutate(Subgroup = case_when(
RefugeeOrAsylum & SeenBuddini ~ "Refugee+Buddini",
SeenBuddini ~ "Buddini",
RefugeeOrAsylum ~ "Refugee",
TRUE ~ "Standard"
)) %>%
mutate(Subgroup = as.factor(Subgroup)) %>%
mutate(DOB = dmy(DOB)) %>% # change into standard R date
mutate(Age = age(DOB, age.day = as.Date("2019/03/01"))) %>%
# note that age is on 1st March 2019
mutate(AgeGroup5 = as.factor((Age %/% 5)*5)) # 5-year age groups, labelled with minimum age
population %>%
dplyr::select("DOB", "Age", "SeenBuddini", "RefugeeOrAsylum",
"Subgroup", "AgeGroup5") %>%
summary()
ggplot(population, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 25)
Previous proportion of under-screened patients who were screened in the three months after March 2018 was 5.2%. With a populaton of 290 patients, split between two groups, power of 0.8 and significance level of 0.05, the
power_3month <- power.prop.test(n = nrow(population)/2,
p1 = 0.05185,
# the measured proportion with recorded cervical screen
# under 'control' conditions
power=0.8, sig.level=0.05,
alternative = "two.sided")
power_3month
paste("Minimum detectable effect : ", round(power_3month$p2 - power_3month$p1, 4))
This is similar to estimated minimum detectable effect size using the equation:
\(EffectSize = (t_{1-\kappa}+t_\alpha)*\sqrt{\frac{1}{P(1-P)}}*\sqrt{\frac{\sigma^2}{N}}\)
where \(t_{1-\kappa}\) is the power, \(t_\alpha\) is the significance level, \(P\) is the proportion in treatment, \(\sigma^2\) is the variance and \(N\) is the sample size.
p1 <- 0.05185 # our previously observed proportion of women
# who had cervical screening (CST) over three months
p2 <- 0.1507374 # our 'guess' of what the 'treatment' group who had CST over
# three months. this guess was actually informed by
# above power calculation, however!
p_avg <- (p1+p2)/2 # the average proportion of women who might
# have cervical screening at the end of the study
# variance of binomial is p(1-p)
abs(qnorm(0.8)+qnorm(0.975))*sqrt(1/((0.5)*(1-0.5)))*sqrt((p_avg)*(1-p_avg)/290)
## [1] 0.09927387
Previous proportion of under-screened patients who were screened in the six months after March 2018 was 8.9%. With a populaton of 290 patients, split between two groups, power of 0.8 and significance level of 0.05:
power_6month <- power.prop.test(n = nrow(population)/2,
p1 = 0.08889,
# the measured proportion with recorded cervical screen
# under 'control' conditions
power=0.8, sig.level=0.05,
alternative = "two.sided")
power_6month
##
## Two-sample comparison of proportions power calculation
##
## n = 145
## p1 = 0.08889
## p2 = 0.2049093
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
paste("Minimum detectable effect : ", power_6month$p2 - power_6month$p1)
## [1] "Minimum detectable effect : 0.116019322103659"
ggplot(treatment, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
ggplot(control, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
ggplot(treatment, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
ggplot(control, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
set.seed(1603104944)
treatment1 <- NULL
treatment2 <- NULL
for (i in levels(treatment$AgeGroup5)) {
coinflip <- runif(1)>.5
for (j in levels(treatment$Subgroup)) {
subsection <- treatment[treatment$AgeGroup5 == i & treatment$Subgroup == j,]
subsection$rank <- runif(nrow(subsection))
if ((nrow(subsection) %% 2) == 1)
{coinflip <- 1 - coinflip} # toggle from favouring one group or another
new_treatment1 <- top_n(subsection, as.integer(nrow(subsection)/2 + coinflip*.5), rank)
new_treatment2 <- anti_join(subsection, new_treatment1, by = "INTERNALID")
treatment1 <- rbind(treatment1, new_treatment1)
treatment2 <- rbind(treatment2, new_treatment2)
}
}
control1 <- NULL
control2 <- NULL
for (i in levels(control$AgeGroup5)) {
coinflip <- runif(1)>.5
for (j in levels(control$Subgroup)) {
subsection <- control[control$AgeGroup5 == i & control$Subgroup == j,]
subsection$rank <- runif(nrow(subsection))
if ((nrow(subsection) %% 2) == 1)
{coinflip <- 1 - coinflip} # toggle from favouring one group or another
new_control1 <- top_n(subsection, as.integer(nrow(subsection)/2 + coinflip*.5), rank)
new_control2 <- anti_join(subsection, new_control1, by = "INTERNALID")
control1 <- rbind(control1, new_control1)
control2 <- rbind(control2, new_control2)
}
}
Phase 1, group 1
ggplot(treatment1, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
Phase 1, group 2
ggplot(treatment2, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
Phase 2, group 1
ggplot(control1, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
Phase 2, group 2
ggplot(control2, aes(x = Age, fill=Subgroup)) +
geom_histogram(binwidth = 5, boundary = 24.95)
Re-attach patient names and demographic details to sub-groups, and export to Excel file (Phase 1 = Treat, Phase 2 = Control) for use by surveyors.
# set {r eval = FALSE when 'knitting' to form a HTML file}
# as identifying data should not be in a public space!
patient_details <- read.csv("20190301CSTpopulation_details.csv")
# file with patient details
patient_details <- patient_details %>%
select(c("INTERNALID", "SURNAME", "FIRSTNAME", "MIDDLENAME", "PREFERREDNAME", "TITLE", "DOB", "AGE", "RECORDNO"))
# select columns required for export
treatment1_details <- treatment1 %>%
select(c("INTERNALID", "AgeGroup5")) %>%
# will store AgeGroup5 groups in separate sheets
left_join(patient_details, by = "INTERNALID")
treatment2_details <- treatment2 %>%
select(c("INTERNALID", "AgeGroup5")) %>%
# will store AgeGroup5 groups treatment in separate sheets
left_join(patient_details, by = "INTERNALID")
control1_details <- control1 %>%
select(c("INTERNALID", "AgeGroup5")) %>%
# will store AgeGroup5 groups in separate sheets
left_join(patient_details, by = "INTERNALID")
control2_details <- control2 %>%
select(c("INTERNALID", "AgeGroup5")) %>%
# will store AgeGroup5 groups in separate sheets
left_join(patient_details, by = "INTERNALID")
wb_treat <- createWorkbook() # create blank workbook
for (i in levels(treatment1_details$AgeGroup5)) {
subsection1 <- treatment1_details[treatment1_details$AgeGroup5 == i,]
sheetname1 <- paste("Phase 1 Group 1 - ", i)
addWorksheet(wb_treat, sheetname1)
writeData(wb_treat, sheetname1, subsection1)
subsection2 <- treatment2_details[treatment2_details$AgeGroup5 == i,]
sheetname2 <- paste("Phase 1 Group 2 - ", i)
addWorksheet(wb_treat, sheetname2)
writeData(wb_treat, sheetname2, subsection2)
}
saveWorkbook(wb_treat, file = "20190301CSTPhase1Groups.xlsx", overwrite = TRUE)
wb_control <- createWorkbook() # create blank workbook
for (i in levels(control1_details$AgeGroup5)) {
subsection1 <- control1_details[control1_details$AgeGroup5 == i,]
sheetname1 <- paste("Phase 2 Group 1 - ", i)
addWorksheet(wb_control, sheetname1)
writeData(wb_control, sheetname1, subsection1)
subsection2 <- control2_details[control2_details$AgeGroup5 == i,]
sheetname2 <- paste("Phase 2 Group 2 - ", i)
addWorksheet(wb_control, sheetname2)
writeData(wb_control, sheetname2, subsection2)
}
saveWorkbook(wb_control, file = "20190301CSTPhase2Groups.xlsx", overwrite = TRUE)
# reads results and processes
xl1_name <- '20190401CSTPhase1Groups_results_20191114.xlsx' # results held in these XLSX spreadsheets
xl2_name <- '20190401CSTPhase2Groups_results_20191114.xlsx'
sheet_names <- getSheetNames(xl1_name) # not actually used
# the phase 1 groups who have been contacted as of 14th November 2019
id1_1_25 <- read.xlsx(xl1_name, sheet = "Phase 1 Group 1 - 25") %>%
pull(INTERNALID)
id1_2_25 <- read.xlsx(xl1_name, sheet = "Phase 1 Group 2 - 25") %>%
pull(INTERNALID)
id1_2_30 <- read.xlsx(xl1_name, sheet = "Phase 1 Group 2 - 30") %>%
pull(INTERNALID)
id1_2_35 <- read.xlsx(xl1_name, sheet = "Phase 1 Group 2 - 35") %>%
pull(INTERNALID)
id1b <- c(id1_2_25, id1_2_30, id1_2_35) # 'phase1' group contact by BE
id1 <- c(id1_1_25, id1b) # total phase 1 group contacted
# the comparison phase 2 groups (aged 25-40), not contacted by telephone
id2_1_25 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 1 - 25") %>%
pull(INTERNALID)
id2_1_30 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 1 - 30") %>%
pull(INTERNALID)
id2_1_35 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 1 - 35") %>%
pull(INTERNALID)
id2_2_25 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 2 - 25") %>%
pull(INTERNALID)
id2_2_30 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 2 - 30") %>%
pull(INTERNALID)
id2_2_35 <- read.xlsx(xl2_name, sheet = "Phase 2 Group 2 - 35") %>%
pull(INTERNALID)
# equivalent phase 2 group
id2 <- c(id2_1_25, id2_2_25, id2_1_30, id2_2_30, id2_1_35, id2_2_35)
# all the IDs in the phase 1 (contacted) and phase 2 (not contacted)
id <- c(id1, id2)
df <- data.frame(InternalID = id)
df <- df %>%
mutate(Phase = InternalID %in% id1) # Phase is 'TRUE' if Phase 1
# as of 14th November 2019, the CSV list of those with no CST
# in the past two years (2 years ago, the CST was 'Pap', which
# is due in two years)
noCST1 <- read.csv("TelephoneCST_Phase1_NoCST.csv")
noCST2 <- read.csv("TelephoneCST_Phase2_NoCST.csv")
noCST1_id <- noCST1 %>% pull(INTERNALID) # just get the IDs
noCST2_id <- noCST2 %>% pull(INTERNALID)
noCST_ID <- c(noCST1_id, noCST2_id) # combine the IDs
df <- df %>%
mutate(CST = !(InternalID %in% noCST_ID))
# CST is TRUE if 'not' in the 'no CST' list
seenby_Buddini <- read.csv("20190401CSTBuddini.csv")
seenby_Buddini_id <- seenby_Buddini %>% pull(INTERNALID)
df <- df %>%
mutate(Group2 = InternalID %in% id1b,
Buddini = InternalID %in% seenby_Buddini_id)
# currently 'Group2' are those contacted by BE
# currently 'Buddini' are those who have previously been seen by BE
refugee_asylum <- read.csv("20190401CSTRefugeeAsylum.csv")
refugee_asylum_id <- refugee_asylum %>% pull(INTERNALID)
df <- df %>%
mutate(refugee = InternalID %in% refugee_asylum_id)
# those listed as asylum or refugee status
df_tab <- data.frame(CSTfalse = numeric(), CSTtrue = numeric())
a <- by(df$CST, df$CST, length) # both refugee and non-refugee groups, in both phases
df_tab <- rbind(df_tab, n = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, df$Phase), subset(df$CST, df$Phase), length) # phase 1
df_tab <- rbind(df_tab, Phase1 = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, !df$Phase), subset(df$CST, !df$Phase), length) #phase 2
df_tab <- rbind(df_tab, Phase2 = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, df$refugee), subset(df$CST, df$refugee), length) # refugee subset
df_tab <- rbind(df_tab, `n ` = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
# extra space avoids duplicate rowname problem
a <- by(subset(df$CST, df$refugee & df$Phase), subset(df$CST, df$refugee & df$Phase), length)
df_tab <- rbind(df_tab, `Phase1 ` = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, df$refugee & !df$Phase), subset(df$CST, df$refugee & !df$Phase), length)
df_tab <- rbind(df_tab, `Phase2 `= data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, !df$refugee), subset(df$CST, !df$refugee), length) # non-refugee subset
df_tab <- rbind(df_tab, `n ` = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, !df$refugee & df$Phase), subset(df$CST, !df$refugee & df$Phase), length)
df_tab <- rbind(df_tab, `Phase1 ` = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
a <- by(subset(df$CST, !df$refugee & !df$Phase), subset(df$CST, !df$refugee & !df$Phase), length)
df_tab <- rbind(df_tab, `Phase2 ` = data.frame(CSTfalse = a[1], CSTtrue = a[2]))
df_tab$CSTfalseProp <- df_tab$CSTfalse/(df_tab$CSTfalse+df_tab$CSTtrue)
df_tab$CSTtrueProp <- df_tab$CSTtrue/(df_tab$CSTfalse+df_tab$CSTtrue)
df_tab <- df_tab[, c("CSTfalse", "CSTfalseProp", "CSTtrue", "CSTtrueProp")]
df_tab %>%
mutate(Group = row.names(.),
CSTfalseProp = color_tile("#DeF7E9", "#71CA97")(percent(CSTfalseProp, 1)),
CSTtrueProp= color_tile("#DeF7E9", "#71CA97")(percent(CSTtrueProp, 1))) %>%
dplyr::select(Group, everything()) %>%
kable("html", escape = F,
col.names = c(" ", "n", "%", "n", "%"),
booktabs = TRUE,
caption = 'Cervical screening recall results') %>%
kable_styling("striped", full_width = F) %>%
column_spec(1, width = "10em") %>%
row_spec(0, align = "c") %>%
add_header_above(c(" " = 1, "CST overdue" = 2, "CST done" = 2)) %>%
pack_rows("All", 1, 3) %>%
pack_rows("Refugee", 4, 6) %>%
pack_rows("Other", 7, 9)
interaction.plot(df$Phase, df$refugee, df$CST,
trace.label = "Refugee",
xlab = "Telephone recall",
ylab = "Cervical Screening up-to-date (proportion)")
model1 <- glm(CST ~ refugee * Phase, family = binomial(link = "logit"), data = df)
model2 <- stepAIC(model1, trace = FALSE)
# the 'best' model, combining explanatory power and simplicity, accord ing to the
# stepwise Akaiki Information Criterion (AIC) selection
huxreg("Refugee model" = model1, "Simple model" = model2) %>%
huxtable::set_caption("Refugee/Asylum-seeker status as a predictor of response to telephone-based recall") %>%
huxtable::set_label("tab:aic")