Data cleaning, management, and analysis of COVID-19 commmunity assessment with key informant interviews

This provides steps and code for data cleaning, management, and analysis of COVID-19 commuunity key informant assessment survey. See here for the questionnare. (Standard module version as of 2/25/2022)

This code:
1. Imports and cleans COVID-19 community key informant (aka “community”) dataset from Lime Survey (i.e., green and blue tabs in the chartbook),
2. Creates field check tables for data quality monitoring, and
3. Creates indicator estimate data for dashboards and chartbook (purple tab in the chartbook).

REQUIRED country-specific changes, based on the minmum country-specific adaptation.
1. Directories and local macro in A. SETTING per survey implementation information
2. Local macro in E.1. Country speciic code local per section 1 in the questionnaire

Depending on the extent of additional country-specific adaptation, further revise E.2. Construct analysis variables.

See Section E.2-Addendum (added on August 17, 2021)

(Last updated: 2022-05-19 12:58:47)


A. SETTING

#working directory where this markdown file and subfolders are located.
setwd("~/Dropbox/0 iSquared/iSquared_WHO/ACTA/3.AnalysisPlan/")

#chartbookdir<-("C:/Users/YoonJoung Choi/World Health Organization/BANICA, Sorin - HSA unit/2 Global goods & tools/2 HFAs/1 HFAs for COVID-19/4. Implementation support materials/4. Analysis and dashboards/")
chartbookdir<-("~/Dropbox/0 iSquared/iSquared_WHO/ACTA/3.AnalysisPlan/")

limesurveydir<-("~/Dropbox/0 iSquared/iSquared_WHO/ACTA/3.AnalysisPlan/ExportedCSV_FromLimeSurvey/")

# Define local macro for the survey 
country<-"COUNTRYNAME" #country name 
round<-1                #round      
year<-2020              #year of the mid point in data collection   
month<-12               #month of the mid point in data collection  

surveyid<-777777        #LimeSurvey survey ID

# local macro for analysis: no change needed  

B. Import and drop duplicate cases

B.1. Import raw data from LimeSurvey
#Import directly from LimeSurvey - using the correct link, according to the country overview page (edit 6/23/2021)
dtaraw<-read.csv(paste0("https://who.my-survey.host/index.php/plugins/direct?plugin=CountryOverview&docType=1&sid=",surveyid,"&language=en&function=createExport"))

obsraw<-nrow(dtaraw)
cols<-ncol(dtaraw)

Note: For the URL, we need to use part of the country overview page for the data server. For example, suppose the overview page link looks like this for a country named YYY:
https://extranet.who.int/dataformv3/index.php/plugins/direct?plugin=CountryOverview&country=YYY&password=XXXXXXXXX.

Replace part of the link before plugins with the part in the country-specific link. So, in this example, the code should be:

dtaraw<-read.csv(paste0("https://extranet.who.int/dataformv3/index.php/plugins/direct?plugin=CountryOverview&docType=1&sid=",surveyid,"&language=en&function=createExport"))

As opposed to the above:

dtaraw<-read.csv(paste0("https://who.my-survey.host/index.php/plugins/direct?plugin=CountryOverview&docType=1&sid=",surveyid,"&language=en&function=createExport"))
#Use mock data for practice 
dtaraw<-read.csv(paste0(limesurveydir,"LimeSurvey_Community_Example_R1.csv"))

obsraw<-nrow(dtaraw)
cols<-ncol(dtaraw)
dim(dtaraw)

As of 2022-05-19 12:58:47, the downloaded raw data has 45 observations and 194 variables.


B.2. Export/save the data daily in CSV form with date
write.csv(dtaraw, paste0(limesurveydir,"LimeSurvey_Community_", country, "_R", round, "_", date, ".csv"))

B.3. Export the data for chartbook

Then, export the raw data into the chartbook (green tab: “Facility-level raw data”) - as is.


B.4. Drop duplicate cases

Assess duplicate rows based on facility code: Q101. Among the duplicate rows, keep the latest row, based on the submission date/time.

Data managers must check the submitdate string values. Then modify format when you convert the string value to numeric, using as.POSIXct. See an example below. (Revision on 5/19/2022)

# CHECK submitdate 
str(dtaraw$submitdate)

# Identify duplicates 
dta<-dtaraw %>% 
    mutate(
        submitdate_string=as.character(submitdate),
        
        #use this line if submitedate does not have seconds 
        submitdate=as.POSIXct(as.character(submitdate),format= "%m/%d/%Y %H:%M")
        #use this line if submitedate dos seconds 
        #submitdate=as.POSIXct(as.character(submitdate),format= "%m/%d/%Y %H:%M:%S")
        )%>%
    group_by(Q105)%>% 
    mutate(
        dupe = n()>1, 
        latest = submitdate ==max(submitdate, na.rm = TRUE))%>%
    ungroup()

# Assess duplicates 
    table(dta$dupe, dta$latest) #cross tab
    dtaduplicate<-dta%>%filter(dupe==1)
    obsduplicate<-dtaduplicate%>%nrow() #number of duplicate rows
    obsduplicateunique<-length(unique(dtaduplicate$Q105)) #number of unique facilities among duplicate rows

#keep only unique/latest rows per facility     
dta<-dta%>%
    filter(latest==1)%>%
    select(-dupe, -latest)

#keep only if ID is NOT missing 
dta<-dta%>%
    filter(is.na(Q105)==FALSE)

obs<-nrow(dta)
obsunique<-length(unique(dta$Q105))

A total of 2 duplicate rows from 1 unique facilities were identified based on Q101. Now keeping only one unique row per facility, the dataset has 44 observations.


C. Data cleaning - variables

C.1. Change variable names to lowercase
names(dta)<-tolower(names(dta))

C.1.a Assess time stamp data
#REVISED 4/20/2021
#   interviewtime is availabl in dataset only when directly downloaded from the server, 
#   not via export plug-in used in this code
#   thus C.1.a is revised - see markdownfile for OLD and NEW code differences 
dta<-dta%>%
    select(-ends_with("time"))%>%
    select(-starts_with("grouptime"))

C.2. Change variable names to drop odd elements “y” “sq” - because of Lime survey’s naming convention
#Assess odd names to change
colnames(dta)

dtanew<-dta%>%
    #/*replace sq with _*/
    rename_all(.funs = funs(sub("sq", "_", .)))    
# Assess new names 
colnames(dtanew)

dta<-dtanew

C.3. Find non-numeric variables and desting

Check variables section by scion, drop prefix “A” in categorical/character variables, and convert to numeric.

#change all factor to numeric
dta<-dta%>%
    mutate_if(is.factor, as.character) 

#####* Section 1
    varlist<-dta%>%select(q108, q109, q111, q113, q114)%>%colnames()
    str(dta[varlist])
    
    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist]) #check variables ready for further processing

#####* Section 2
    varlist<-dta%>%select(starts_with("q201_"))%>%colnames()
    str(dta[varlist]) 

    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist]) 
    
#####* Section 3
    varlist<-dta%>%select(q302)%>%colnames()
    str(dta[varlist])
    
    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist]) 

#####* Section 4
#####* Section 4 - option 2 in the October 29 version Q 
    #* Few countries would need to choose option 1 
    #* If option 1 is truly needed, refer to the older analysis code.     
    varlist<-dta%>%select(q401, q402, q403, q405)%>%colnames()
    str(dta[varlist])
    
    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist]) 

#####* Section 5
#####* Section 5 - deleted in the October 29 version Q
    
#####* Section 6
    varlist<-dta%>%
        select(q601a, q602, q604, q605, starts_with("q607_"), q608, q609)%>%
        colnames()
    str(dta[varlist])
    
    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist])  

#####* Section 7
    varlist<-dta%>%select(q701, q704)%>%colnames()
    str(dta[varlist])
    
    dta<-dta%>%
        mutate_at(vars(varlist), list(~ sub("^A", "\\1", .)))%>%
        mutate_at(vars(varlist), funs(as.numeric))
    
    str(dta[varlist]) 

C.4. Recode yes/no & yes/no/NA
table(dta$q601)

varlist<-dta%>%
    select(starts_with("q301_"), 
           starts_with("q303_"), 
           starts_with("q304_"),
           q403, 
           starts_with("q404_"),
           q405, 
           starts_with("q406_"),
           starts_with("q601"),
           starts_with("q603_"),
           starts_with("q606_"), 
           q608
           )%>%
    colnames()

dta[varlist][dta[varlist] == 2 ] <- 0 #no
table(dta$q601)

C.5. Label values

The following is value labels.

    #delimit;   
    
    lab define yesno 1"1. yes" 0"0. no";    
    foreach var of varlist 
        q301_*  q303_* q304_* 
        q403 q404_* q405 q406_*
        q601* q603_* q606_* q608
        {;  
    labe values `var' yesno; 
    };      

    lab define sex
        1"1.Male" 
        2"2.Female"
        3"3.Not responded";  
    lab values q111 sex; 

    lab define occupation
        1"1.Commnity leader" 
        2"2.Community health officer"
        3"3.Commuunity health volunteer"
        4"4.Civil society/NGOs";  
    lab values q113 occupation; 
            
    lab define area
        1"1.Urban"
        2"2.Rural";
    lab values q114 area;             
    
    lab define people3
        1"1.Most people"
        2"2.Some peoople"
        3"3.Few people"; 
    foreach var of varlist q201_*{; 
        lab values `var' people3;
        };
    
    lab define q302
        1"1.Remained stable"
        2"2.Moderately affected"
        3"3.Strongly affected"; 
    lab values q302 q302; 
    
    lab define people4
        1"1.Most peoople"
        2"2.Some people - more than half"
        3"3.Some people - less than half" 
        4"4.Few people" ;
    foreach var of varlist q401 q402 q403{; 
        lab values `var' people4;
        };

    lab define q602
        1"1.No risk"
        2"2.Slight"
        3"3.Moderate"
        4"4.High"
        5"5.Very high";  
    lab values q602 q602; 
    
    lab define q604
        1"1.Never"
        2"2.Sometimes"
        3"3.Often";  
    lab values q604 q604;   
                
    lab define q605
        1"1.Most support"
        2"2.Some support"
        3"3.Little support";  
    lab values q605 q605;   
    
    lab define q607
        1"1.Slightly reduced"
        2"2.Substantially reduced or suspended"
        3"3.Increased"
        4"3.No change";     
    foreach var of varlist q607_* {;    
        lab values `var' q607;
        };

    #delimit cr

D. Create field check tables

As of 2022-05-19, the following are “field check tables.” In Stata program, xls file is created. In R, the results are directly presented in this markdown file.

#REVISED 4/20/2021
#   since interviewlength is not created (see C.1.a) 
#   do not calculate "time" variables 
dtacheck<-dta%>%
    mutate(
        updatedate= as.character(as.Date(Sys.time(  ), format='%d%b%Y')), 
        date=as.POSIXct(as.character(submitdate_string),format= "%m/%d/%Y"), 
        xresult=q704==1, 
        responserate= xresult==1 #label define responselist 0 "Not complete" 1 "Complete"    
        )

Assess interview characteristics among all interviews.

# Date of field check table update and the total number of interviews
print(date)

[1] “2022-05-19”

# Date of interviews (submission date, final)
table(dtacheck$date)

2020-12-17 2020-12-18 4 40

# Interview response rate (%)
print(paste(as.character(round(mean(dtacheck$responserate, na.rm=TRUE)*100, 1)), "%"))

[1] “100 %”

#Average interview length (minutes), among completed interviews
#print(paste(as.character(round(mean(dtacheck$time_complete, na.rm=TRUE))), "minutes"))

#Average interview length (minutes), among partly completed interviews
#print(paste(as.character(round(mean(dtacheck$time_incomplete, na.rm=TRUE))), "minutes"))
#####/*the following calcualtes % missing in select questions among completed interviews*/  

#1. Missing unmet need responses in one or more of the tracer items (among completed interviews) 
varlist<-dtacheck%>%select(starts_with("q201_"))%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing1 = missing_num, 
        missing1 = ifelse(missing_num>=1, 1, missing1) )

#2. Missing reason for unmet need during the pandemic in one or more of the tracer items (among completed interviews) 
varlist<-dtacheck%>%select(starts_with("q303_"))%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing2 = missing_num, 
        missing2 = ifelse(missing_num>=1, 1, missing2) )

#3. Missing marginalized population (among completed interviews) - N/A since the October 29, 2021 version

#4. Missing vaccine demand among adults OR children, when applicable (among completed interviews)
varlist<-dtacheck%>%select(q402)%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing4 = missing_num, 
        missing4 = ifelse(missing_num>=1, 1, missing4) )

#5. Missing reasons for no vaccine demand in one or more of the tracer items (among completed interviews)  
varlist<-dtacheck%>%select(starts_with("q404_"))%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing5 = missing_num, 
        missing5 = ifelse(missing_num>=1, 1, missing5) )

#6. Missing increased social initiatives (among completed interviews) -  N/A since the October 29, 2021 version

#7. Missing increased health initiatives (among completed interviews) -  N/A since the October 29, 2021 version

#8. Missing reasons for risks (among completed interviews)
varlist<-dtacheck%>%select(starts_with("q603_"))%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing8 = missing_num, 
        missing8 = ifelse(missing_num>=1, 1, missing8) )

#9. Missing CHW stigma (among completed interviews)
varlist<-dtacheck%>%select(q604)%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing9 = missing_num, 
        missing9 = ifelse(missing_num>=1, 1, missing9) )

#10. Missing support needed (among completed interviews)    
varlist<-dtacheck%>%select(starts_with("q606_"))%>%colnames()
str(dtacheck[varlist])
dtacheck[varlist] <- lapply(dtacheck[varlist],
                             function(x){ifelse(is.na(x)==0,0,
                                                ifelse(is.na(x)==1,1,x))})

dtacheck<-dtacheck%>%
    mutate(
        missing_num = rowSums(dtacheck[varlist] ), 
        missing10 = missing_num, 
        missing10 = ifelse(missing_num>=1, 1, missing10) )

Assess level of missing responses among COMPLETED interviews.

dtacheck<-dtacheck%>%
    filter(xresult==1)

#1. Missing unmet need responses in one or more of the tracer items (among completed interviews) 
print(paste(as.character(round(mean(dtacheck$missing1, na.rm=TRUE)*100, 1)), "%"))

[1] “9.1 %”

#2. Missing reason for unmet need during the pandemic in one or more of the tracer items (among completed interviews) 
print(paste(as.character(round(mean(dtacheck$missing2, na.rm=TRUE)*100, 1)), "%"))

[1] “9.1 %”

#3. Missing marginalized population (among completed interviews) - N/A since the October 29, 2021 version

#4. Missing vaccine demand among adults OR children, when applicable (among completed interviews)
print(paste(as.character(round(mean(dtacheck$missing4, na.rm=TRUE)*100, 1)), "%"))

[1] “0 %”

#5. Missing reasons for no vaccine demand in one or more of the tracer items (among completed interviews)
print(paste(as.character(round(mean(dtacheck$missing5, na.rm=TRUE)*100, 1)), "%"))

[1] “9.1 %”

#6. Missing increased social initiatives (among completed interviews) - N/A since the October 29, 2021 version

#7. Missing increased health initiatives (among completed interviews) - N/A since the October 29, 2021 version

#8. Missing reasons for risks (among completed interviews)
print(paste(as.character(round(mean(dtacheck$missing8, na.rm=TRUE)*100, 1)), "%"))

[1] “18.2 %”

#9. Missing CHW stigma (among completed interviews)
print(paste(as.character(round(mean(dtacheck$missing9, na.rm=TRUE)*100, 1)), "%"))

[1] “0 %”

#10. Missing support needed (among completed interviews)    
print(paste(as.character(round(mean(dtacheck$missing10, na.rm=TRUE)*100, 1)), "%"))

[1] “15.9 %”


E. Create analytical variables

E.1. Country speciic code local

Update code here based on the questionnaire in the country

urbanmin<-1
urbanmax<-1

chwmin  <-2 #/*lowest code for CHWs in Q113*/
chwmax  <-3 #/*highest code for CHWs in Q113*/

E.2. Construct analysis variables

Created analysis variables section by section.
* give prefix z for background characteristics, which can be used as analysis strata
* give prefix x for binary variables, which will be used to calculate percentage
* give prefix y for integer/continuous variables, which will be used to calculate total number

#####* Section 1 
dta<-dta%>%
    mutate(
        country = country, 
        round = round,      
        respondentcode = q105, 
        month = month, 
        year = year, 
        
        zchw    = q113>=chwmin & q113<=chwmax,
        zurban  = q114>=urbanmin & q114<=urbanmax
    )%>%
    mutate_if(is.logical, as.numeric)%>%
    mutate_at(vars(starts_with("z")), 
              ~replace(., is.na(.)==TRUE, 0))

    #lab define zurban 0"Rural" 1"Urban"

    #lab var id "ID generated from Lime Survey"
    #lab var respondentcode "respondent ID from sample list"
#####* Section 2: need and use 

        temp1<-dta%>%select(starts_with("q201_"))%>%
            rename_all(.funs = funs(sub("q201_", "xunmetsomemost__", .)))%>%
            mutate_all( funs(ifelse(. ==2 |. ==3 , 1, 
                                  ifelse(. ==1, 0,  .))) )%>%
            mutate_all(~replace(., is.na(.), 0)) 
        
        temp2<-dta%>%select(starts_with("q201_"))%>%
            rename_all(.funs = funs(sub("q201_", "xunmetmost__", .)))%>%
            mutate_all( funs(ifelse(. ==3 , 1, 
                                  ifelse(. ==1 |. ==2, 0,  .))) )%>%
            mutate_all(~replace(., is.na(.), 0)) 

dta<-cbind(dta, temp1, temp2)

    varlistsomemost<-dta%>%select(starts_with("xunmetsomemost"))%>%colnames()
    varlistmost<-dta%>%select(starts_with("xunmetmost"))%>%colnames()
    
dta<-dta%>%
    mutate( 
        xunmetsomemost_num = rowSums(dta[ , varlistsomemost], na.rm=TRUE),
        xunmetsomemost = xunmetsomemost_num>=1, 
        xunmetmost_num = rowSums(dta[ , varlistmost], na.rm=TRUE),
        xunmetmost = xunmetmost_num>=1) 

    varlistsomemost<-dta%>%select(xunmetsomemost__001)%>%colnames()
    varlistmost<-dta%>%select(xunmetmost__001)%>%colnames()
    
dta<-dta%>%
    mutate( 
        #xunmetsomemost_num = rowSums(dta[ , varlistsomemost], na.rm=TRUE),
        xunmetsomemost_urgent = xunmetsomemost__001>=1, 
        #xunmetmost_num = rowSums(dta[ , varlistmost], na.rm=TRUE),
        xunmetmost_urgent = xunmetmost__001>=1)

    varlistsomemost<-dta%>%select(xunmetsomemost__003, xunmetsomemost__004)%>%colnames()
    varlistmost<-dta%>%select(xunmetmost__003, xunmetmost__004)%>%colnames()
    
dta<-dta%>%
    mutate( 
        xunmetsomemost_num = rowSums(dta[ , varlistsomemost], na.rm=TRUE),
        xunmetsomemost_chronic = xunmetsomemost_num>=1, 
        xunmetmost_num = rowSums(dta[ , varlistmost], na.rm=TRUE),
        xunmetmost_chronic = xunmetmost_num>=1)

    varlistsomemost<-dta%>%select(xunmetsomemost__006, xunmetsomemost__007, 
                                  xunmetsomemost__008, xunmetsomemost__009)%>%colnames()
    varlistmost<-dta%>%select(xunmetmost__006, xunmetmost__007, 
                              xunmetmost__008, xunmetmost__009)%>%colnames()
    
dta<-dta%>%
    mutate( 
        xunmetsomemost_num = rowSums(dta[ , varlistsomemost], na.rm=TRUE),
        xunmetsomemost_rmch = xunmetsomemost_num>=1, 
        xunmetmost_num = rowSums(dta[ , varlistmost], na.rm=TRUE),
        xunmetmost_rmch = xunmetmost_num>=1)

dta<-dta%>%
    mutate(
        xunmetsome = xunmetsomemost - xunmetmost, 
        xunmetsome_urgent = xunmetsomemost_urgent - xunmetmost_urgent, 
        xunmetsome_chronic = xunmetsomemost_chronic - xunmetmost_chronic, 
        xunmetsome_rmch = xunmetsomemost_rmch - xunmetmost_rmch 
    )
#####* Section 3: barriers  
dta<-dta%>%
    mutate( 
        #***** Pre-COVID barriers
        xbar_pre_demand =0,
        xbar_pre_info =0,
        xbar_pre_phacc =0,
        xbar_pre_fin =0,
        xbar_pre_qinput =0,
        xbar_pre_qexp =0,
        xbar_pre_admin =0,

        xbar_pre_demand = ifelse(q301_002==1 | q301_003==1, 1, xbar_pre_demand),  
        xbar_pre_info = ifelse(q301_001==1, 1, xbar_pre_info), 
        xbar_pre_phacc = ifelse(q301_004==1 | q301_005==1 | q301_006==1, 1, xbar_pre_phacc),  
        xbar_pre_fin = ifelse(q301_007==1 | q301_008==1, 1, xbar_pre_fin),  
        xbar_pre_qinput = ifelse(q301_009==1 | q301_010==1 | q301_011==1, 1, xbar_pre_qinput),  
        xbar_pre_qexp = ifelse(q301_012==1 | q301_013==1 | q301_014==1 | q301_015==1, 1, xbar_pre_qexp),  
        xbar_pre_admin = ifelse(q301_016==1 | q301_017==1 | q301_018==1, 1, xbar_pre_admin), 
        
        #***** Overall barriers during COVID "affected/deteriorated" 
        xbar_covid = q302>=2 & is.na(q302)==FALSE,  
    
        #***** During COVID barriers  
        xbar_covid_fear =0,
        xbar_covid_rec =0,
        xbar_covid_info =0,
        xbar_covid_phacc =0,
        xbar_covid_fin =0,
        xbar_covid_admin =0,
        xbar_covid_qinput =0,
        xbar_covid_qexp =0,  

        xbar_covid_fear = ifelse(q303_001==1 | q303_002==1 | q303_003==1 | q303_017==1, 1, xbar_covid_fear),  
        xbar_covid_rec = ifelse(q303_004==1 | q303_005==1, 1, xbar_covid_rec), 
        xbar_covid_info = ifelse(q303_006==1, 1, xbar_covid_info),  
        xbar_covid_phacc = ifelse(q303_007==1 | q303_008==1, 1, xbar_covid_phacc),  
        xbar_covid_fin = ifelse(q303_009==1 | q303_010==1 | q303_011==1, 1, xbar_covid_fin),    
        xbar_covid_admin = ifelse(q303_012==1 | q303_013==1 | q303_014==1, 1, xbar_covid_admin), 
        xbar_covid_qinput = ifelse(q303_015==1, 1, xbar_covid_qinput),  
        xbar_covid_qexp = ifelse(q303_016==1, 1, xbar_covid_qexp)
    )%>%
    mutate_at(vars(starts_with("xbar_covid_")), 
              ~replace(., is.na(.)==TRUE, 0))
    
        #***** Source of care /*COUNTRY-SPECIFIC MUST BE ADAPTED*/
        varlist<-dta%>%select(q304_001, q304_002, q304_003, q304_004, 
                              q304_005, q304_006, q304_007, q304_008)%>%colnames()

dta<-dta%>%
    mutate( 
        temp = rowSums(dta[ , varlist], na.rm=TRUE),
        xsource_trained = temp>=1
    )%>%
    rename_all(.funs = funs(sub("q304_", "xsource__", .)))
dta<-dta%>%
    mutate( 
        xconcern_most       = q401<=1,
        xconcern_mostsome   = q401<=2,
        
        xvac_adult_most     = q402<=1,
        xvac_adult_mostsome= q402<=2,
        #no more xvac_child_* and, thus, xvac_most* in Oct 29 version
    
        xvac_noaccess   = q403==1     
    )%>%
    rename_all(.funs = funs(sub("q404_", "xvac_noaccess_reason__", .)))%>%
    mutate(
        xvac_noaccess_reason_eligible = xvac_noaccess_reason__001==1,
        xvac_noaccess_reason_distance   = xvac_noaccess_reason__002==1 ,
        xvac_noaccess_reason_wait   = xvac_noaccess_reason__003==1 | xvac_noaccess_reason__004==1 ,
        xvac_noaccess_reason_app        = xvac_noaccess_reason__005==1 ,
        xvac_noaccess_reason_cost       = xvac_noaccess_reason__006==1  
    )%>%
    mutate_at(vars(starts_with("xvac_noaccess_reason")), 
              funs(ifelse(is.na(.)==TRUE, 0,.))
    )%>%
    mutate_at(vars(starts_with("xvac_noaccess_reason")), 
              funs(ifelse(xvac_noaccess!=1, NA,.))
    )%>%
    mutate( xvac_nowant     = q405==1)%>%     

    rename_all(.funs = funs(sub("q406_", "xvac_reason__", .)))%>%
    mutate(
        xvac_reason_noconcern   = xvac_reason__001==1,
        xvac_reason_exposure    = xvac_reason__004==1, 
        xvac_reason_anticovac   = xvac_reason__002==1 | xvac_reason__003==1,
        xvac_reason_antivac     = xvac_reason__005==1, 
        xvac_reason_time        = xvac_reason__006==1, 
        xvac_reason_cost        = xvac_reason__007==1 
    )%>%
    mutate_at(vars(starts_with("xvac_reason")), 
              funs(ifelse(xvac_nowant!=1, NA,.)) 
    )%>%
    mutate_if(is.logical, as.numeric)

table(dta$xvac_most)
summary(dta$xvac_reason__001)
summary(dta$xvac_reason_noconcern)
#####* Section 5: Community assets and vulnerabilities 
#####* Section 5 - deleted in the October 29 version Q
#####* Section 6: CHW service provision 
dta<-dta%>%
    mutate( xtraining = q601a==1)%>% 
    rename_all(.funs = funs(sub("q601b_", "xtraining__", .)))%>%
    mutate_at(vars(starts_with("xtraining__")), 
              funs(ifelse(is.na(.)==TRUE, 0,.))
    )%>%      
    
    mutate(     
        xrisk_modhigh   =q602>=3 & is.na(q602)==FALSE,
        xrisk_high  =q602>=4 & is.na(q602)==FALSE,
        
        xstigma =q604>=2 & is.na(q604)==FALSE,
        
        xsupport_most       =q605<=1,
        xsupport_somemost   =q605<=2
    )

        temp1<-dta%>%select(starts_with("q603_"))%>%
            rename_all(.funs = funs(sub("q603_", "xrisk_reason__", .)))%>%
            mutate_all( funs(ifelse(. ==1 , 1, 
                                  ifelse(. !=1, 0,  .))) )%>%
            mutate_all(~replace(., is.na(.), 0)) 
        
        temp2<-dta%>%select(starts_with("q606_"))%>%
            rename_all(.funs = funs(sub("q606_", "xsupportneed__", .)))%>%
            mutate_all( funs(ifelse(. ==1 , 1, 
                                  ifelse(. !=1, 0,  .))) )%>%
            mutate_all(~replace(., is.na(.), 0)) 

        temp3<-dta%>%select(starts_with("q607_"))%>%
            rename_all(.funs = funs(sub("q607_", "xsrvc_reduced__", .)))%>%
            mutate_all( funs(ifelse(. ==5, NA,
                                    ifelse(. <=2 , 1, 
                                        ifelse(. >=3 &  . <=4, 0, .)))) )
        
        temp4<-dta%>%select(starts_with("q607_"))%>%
            rename_all(.funs = funs(sub("q607_", "xsrvc_increased__", .)))%>%
            mutate_all( funs(ifelse(. ==5, NA,
                                    ifelse(. ==3 , 1, 
                                        ifelse(. <=2 | .==4, 0, .)))) )
        
        temp5<-dta%>%select(starts_with("q607_"))%>%
            rename_all(.funs = funs(sub("q607_", "xsrvc_nochange__", .)))%>%
            mutate_all( funs(ifelse(. ==5, NA,
                                    ifelse(. ==4 , 1, 
                                        ifelse(. <=3 , 0, .)))) )
        
dta<-cbind(dta, temp1, temp2, temp3, temp4, temp5)%>%
    #Revision 2021/10/29. xrisk_reason__*: replace with missing if not applicable 
    mutate_at(vars(starts_with("xrisk_reason__")), 
              funs(ifelse(xrisk_modhigh==0, NA,.))
    )
    #ENF OF REVISION

dta<-dta%>%
    mutate(
        xself_covax_any =q608==1 ,
        xself_covax_full    =q608==1 & q609<=2
    )%>%
    mutate_at(vars(starts_with("xself_covax")), 
              funs(ifelse(is.na(.)==TRUE, 0,.))
    )

E.2-Addendum. Rename indicators for easier data use

Rename indicators ending with sub-question numbers with more friendly names. These names are used in the dash board. Thus, it is important to ensure the indicator names are correct, if questionnaire is adapted beyond minimum requirements. (Addendum on August 17, 2021)

#dta%>%select(grep("__", names(dta)))%>%colnames() 
dta<-dta%>%
    rename(  
        xunmet__urgent   =  xunmetsomemost__001  , 
        xunmet__electsurg    =  xunmetsomemost__002  , 
        xunmet__chronicmeds  =  xunmetsomemost__003  , 
        xunmet__testing  =  xunmetsomemost__004  , 
        xunmet__mental   =  xunmetsomemost__005  , 
        xunmet__fp   =  xunmetsomemost__006  , 
        xunmet__anc  =  xunmetsomemost__007  , 
        xunmet__sba  =  xunmetsomemost__008  , 
        xunmet__immun    =  xunmetsomemost__009  , 
        xunmet__homebased    =  xunmetsomemost__010  , 
                    
        xunmetmost__urgent   =  xunmetmost__001  , 
        xunmetmost__electsurg    =  xunmetmost__002  , 
        xunmetmost__chronicmeds  =  xunmetmost__003  , 
        xunmetmost__testing  =  xunmetmost__004  , 
        xunmetmost__mental   =  xunmetmost__005  , 
        xunmetmost__fp   =  xunmetmost__006  , 
        xunmetmost__anc  =  xunmetmost__007  , 
        xunmetmost__sba  =  xunmetmost__008  , 
        xunmetmost__immun    =  xunmetmost__009  , 
        xunmetmost__homebased    =  xunmetmost__010  , 
                    
        xsource__chw     =  xsource__001     , 
        xsource__healthpost  =  xsource__002     , 
        xsource__hospital    =  xsource__003     , 
        xsource__pharm   =  xsource__004     , 
        xsource__c19testcentre   =  xsource__005     , 
        xsource__c19phone    =  xsource__006     , 
        xsource__othertrained    =  xsource__007     , 
        xsource__traditional     =  xsource__008     , 
        xsource__internet    =  xsource__009     , 
        xsource__other   =  xsource__010     , 
        xsource__none    =  xsource__011     , 

        xvac_noaccess_reason__eligible   =  xvac_noaccess_reason__001    ,
        xvac_noaccess_reason__distance   =  xvac_noaccess_reason__002    ,
        xvac_noaccess_reason__waitcrowd  =  xvac_noaccess_reason__003    ,
        xvac_noaccess_reason__waitstaff  =  xvac_noaccess_reason__004    ,
        xvac_noaccess_reason__app    =  xvac_noaccess_reason__005    ,
        xvac_noaccess_reason__cost   =  xvac_noaccess_reason__006    ,
        xvac_noaccess_reason__other          =  xvac_noaccess_reason__007    ,                  
        xvac_reason__notconcerned    =  xvac_reason__001     , 
        xvac_reason__uncertain   =  xvac_reason__002     , 
        xvac_reason__sideeffects     =  xvac_reason__003     , 
        xvac_reason__avoidfacilities     =  xvac_reason__004     , 
        xvac_reason__mistrust    =  xvac_reason__005     , 
        xvac_reason__toobusy     =  xvac_reason__006     , 
        xvac_reason__cost    =  xvac_reason__007     , 
        xvac_reason__other   =  xvac_reason__008     , 

        xtraining__spread    =  xtraining__001   ,
        xtraining__mask  =  xtraining__002   ,
        xtraining__covax     =  xtraining__003   ,
        
        xrisk_reason__manypeople     =  xrisk_reason__001    , 
        xrisk_reason__ppelack    =  xrisk_reason__002    , 
        xrisk_reason__novax  =  xrisk_reason__003    ,
        xrisk_reason__age    =  xrisk_reason__004    ,
        xrisk_reason__hours  =  xrisk_reason__005    ,
        xrisk_reason__transport  =  xrisk_reason__006    ,
        xrisk_reason__public     =  xrisk_reason__007    ,
                    
        xsupportneed__monetary   =  xsupportneed__001    , 
        xsupportneed__ppe    =  xsupportneed__002    , 
        xsupportneed__supp   =  xsupportneed__003    , 
        #xsupportneed__traincovid    =  xsupportneed__004    , 
        xsupportneed__tc_protection  =  xsupportneed__004    ,
        xsupportneed__tc_prevention  =  xsupportneed__005    ,
        xsupportneed__tc_vax     =  xsupportneed__006    ,
        xsupportneed__tc_management  =  xsupportneed__007    ,
        xsupportneed__tc_othercovid  =  xsupportneed__008    ,
             
        xsupportneed__trainother     =  xsupportneed__009    ,
        xsupportneed__trans  =  xsupportneed__010    ,
        xsupportneed__insurance  =  xsupportneed__011    ,
        xsupportneed__other  =  xsupportneed__012    ,
                    
        xsrv_reduced__immune     =  xsrvc_reduced__001   , 
        xsrv_reduced__malaria    =  xsrvc_reduced__002   , 
        xsrv_reduced__ntd    =  xsrvc_reduced__003   , 
        xsrv_reduced__tb     =  xsrvc_reduced__004   , 
        xsrv_reduced__home   =  xsrvc_reduced__005   , 
                    
        xsrv_increased__immune   =  xsrvc_increased__001     , 
        xsrv_increased__malaria  =  xsrvc_increased__002     , 
        xsrv_increased__ntd  =  xsrvc_increased__003     , 
        xsrv_increased__tb   =  xsrvc_increased__004     , 
        xsrv_increased__home     =  xsrvc_increased__005     , 
                    
        xsrv_nochange__immune    =  xsrvc_nochange__001  , 
        xsrv_nochange__malaria   =  xsrvc_nochange__002  , 
        xsrv_nochange__ntd   =  xsrvc_nochange__003  , 
        xsrv_nochange__tb    =  xsrvc_nochange__004  , 
        xsrv_nochange__home  =  xsrvc_nochange__005  
        
    )   

#dta%>%select(grep("__", names(dta)))%>%colnames() 

E.3. Export clean Respondent-level data for chart book
write.csv(dta, paste0("Community_", country, "_R", round, ".csv"))

F. Create and export indicator estimate data

F.1. Calculate estimates
dtatemp<-dta%>%
    mutate(
        obs=1,  
        
        obs_chw=NA, 
        obs_vacreason=NA, 
    
        obs_chw=ifelse(zchw==1, 1, obs_chw),    
        obs_vacreason=ifelse(xvac_adult_most!=1, 1, obs_vacreason), 
        #/*2/25/2022 no more xvac_most, since we ask about adults only*/
        
        #REVISION 2021/10/25: CREATE ADDITIONAL "OBS" VARIABLES TO HAVE CORRECT DENOMINATORS IN THE CHARTBOOK 
        obs_riskreason=NA, 
        obs_riskreason=ifelse(xrisk_modhigh==1, 1, obs_riskreason)
        #END OF REVISION
        
    )%>%
    select(-ends_with("_num"))

dtatempx<-dtatemp%>%select(country, round, month, year, starts_with("z"),
                           starts_with("x"))

dtatempy<-dtatemp%>%select(country, round, month, year, starts_with("z"),
                           starts_with("obs"))
# Among all facilities 
dtasummaryx<-dtatempx%>%
    group_by(country, round, month, year)%>%
    summarize_all(funs(mean(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(group="All", grouplabel="")

dtasummaryy<-dtatempy%>%
    group_by(country, round, month, year)%>%
    summarize_all(funs(sum(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(group="All", grouplabel="")


dtasummaryall<-left_join(dtasummaryx, dtasummaryy, 
                      by = c("country", "round", "month", "year", 
                             "group", "grouplabel"))%>%
    select(-starts_with("z"))


# By residential area
dtasummaryx<-dtatempx%>%
    group_by(country, round, month, year, zurban)%>%
    summarize_all(funs(mean(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(
        group="Location", 
        grouplabel="",
        grouplabel= ifelse(zurban==0, "1.1 Rural" , grouplabel),
        grouplabel= ifelse(zurban==1, "1.2 Urban" , grouplabel) )

dtasummaryy<-dtatempy%>%
    group_by(country, round, month, year, zurban)%>%
    summarize_all(funs(sum(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(
        group="Location", 
        grouplabel="",
        grouplabel= ifelse(zurban==0, "1.1 Rural" , grouplabel),
        grouplabel= ifelse(zurban==1, "1.2 Urban" , grouplabel) )

dtasummarylocation<-left_join(dtasummaryx, dtasummaryy, 
                      by = c("country", "round", "month", "year", 
                             "group", "grouplabel"))%>%
    select(-starts_with("z"))

# By respondent type
dtasummaryx<-dtatempx%>%
    group_by(country, round, month, year, zchw)%>%
    summarize_all(funs(mean(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(
        group="Level", 
        grouplabel="",
        grouplabel= ifelse(zchw==0, "2.1 non CHWs" , grouplabel),
        grouplabel= ifelse(zchw==1, "2.2 CHWs" , grouplabel) )

dtasummaryy<-dtatempy%>%
    group_by(country, round, month, year, zchw)%>%
    summarize_all(funs(sum(., na.rm = TRUE)))%>%
    ungroup()%>%
    mutate(
        group="Level", 
        grouplabel="",
        grouplabel= ifelse(zchw==0, "2.1 non CHWs" , grouplabel),
        grouplabel= ifelse(zchw==1, "2.2 CHWs" , grouplabel) )

dtasummarychw<-left_join(dtasummaryx, dtasummaryy, 
                      by = c("country", "round", "month", "year", 
                             "group", "grouplabel"))%>%
    select(-starts_with("z"))

# Append all 
dim(dtasummaryall)
dim(dtasummarylocation)
dim(dtasummarychw)

setcolorder(dtasummaryall, 
            c("country", "round", "month", "year", "group", "grouplabel", "obs"))
setcolorder(dtasummarylocation, 
            c("country", "round", "month", "year", "group", "grouplabel", "obs"))
setcolorder(dtasummarychw, 
            c("country", "round", "month", "year", "group", "grouplabel", "obs"))

dtasummary<-rbind(dtasummaryall, dtasummarylocation, dtasummarychw)
dim(dtasummary)

dtasummary<-dtasummary%>%
    select(-starts_with("z"))%>%
    mutate_at(vars(starts_with("x")), 
              funs(round((.*100), 0)))%>%
    arrange(country, round, group, grouplabel)

#move up identification variables 
setcolorder(dtasummary, 
            c("country", "round", "month", "year", "group", "grouplabel", "obs"))
colnames(dtasummary[ , 1:10])

#deal with NaN
dtasummary[sapply(dtasummary, is.nan)] <- NA

F.2. Export indicator estimate data for chart book
write.csv(dtasummary, paste0("summary_Community_", country, "_R", round, ".csv"))
dtasummary<-dtasummary%>%
    mutate(
        updatedate = as.Date(Sys.time(  ), format='%d%b%Y'), 
        updatetime = Sys.time()
    )

# for cross check against Stata results 
write.csv(dtasummary, paste0("summary_Community_", country, "_R", round, "_R.csv"))

#write.xlsx(dta, 
#           paste0("WHO_Community_Chartbook.xlsx"), 
#           sheet = "Indicator estimate Data", 
#           append = TRUE)

END OF MARKDOWN FILE