Data Issues

The data collection for the access and inclusion study was conducted by Ipsos and MCI. While this approach significantly reduced the time required for data collection, it also introduced challenges with data merging. Despite efforts to harmonize the data formats between the two, complete alignment wasn’t always possible. Some common issues are listed below. Although the solutions themselves were straightforward, identifying the differences between the two datasets was time-consuming.

Mismatched column names

The column names in the MCI and IPSOS data don’t always match. The discrepancy occurs mostly in select-multiple questions, and it follows a pattern. To align the two, the MCI columns were renamed to match the IPSOS data. The following table shows the change log for this renaming.
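As a rough illustration of the renaming step, the sketch below applies a rename map to MCI rows. Both column names in the map are hypothetical and chosen only to show the idea; the `c1`/`c2` suffix mimics the style visible in IPSOS names such as `hh_IDPc1`, while the real pairs are the ones in the change log.

```python
# Hypothetical rename map: MCI-style name -> IPSOS-style name.
# These pairs are invented for illustration; the real ones are in the change log.
RENAME_MAP = {
    "example_question/1": "example_questionc1",
    "example_question/2": "example_questionc2",
}


def rename_columns(rows, rename_map):
    """Return new rows with keys renamed; keys not in the map are kept as they are."""
    return [
        {rename_map.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


mci_rows = [{"uuid": "a1", "example_question/1": 1, "example_question/2": 0}]
renamed = rename_columns(mci_rows, RENAME_MAP)
print(renamed)
```

Keeping the map in one place means it can double as the change log for the renaming.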

Different codes in the data
  1. In the IPSOS data, the values for select-multiple questions look like -

whereas MCI data looks like -

Additionally, IPSOS didn’t follow a consistent pattern for coding “don’t know” responses. They were supposed to use 99/98, but they didn’t.
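One possible way to harmonize inconsistent “don’t know” codes is to map every known variant onto a single code. The sketch below is only an assumption about how that could look: the variant list is hypothetical, and using 99 as the target is a guess, since the text only says “99/98”.

```python
# Assumed target code for "don't know"; the report only mentions "99/98".
DONT_KNOW_CODE = 99

# Hypothetical spellings that might appear in the raw data.
DONT_KNOW_VARIANTS = {"dk", "dont_know", "don't know", "99"}


def normalise_dont_know(value):
    """Map any known "don't know" variant to DONT_KNOW_CODE; leave other values unchanged."""
    text = str(value).strip().lower().replace("’", "'")
    return DONT_KNOW_CODE if text in DONT_KNOW_VARIANTS else value


raw_values = ["DK", " Dont_Know ", "Don’t know", 3, "yes"]
cleaned = [normalise_dont_know(v) for v in raw_values]
print(cleaned)
```

Values that are not in the variant list pass through untouched, so anything unexpected stays visible for manual review.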

Different levels of data

Some indicators, such as RDD_phones_01, were collected at the household (HH) level by IPSOS but at the individual level by MCI. These variables were summarized at the HH level to align the two datasets.
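Summarizing an individual-level indicator to household level could look roughly like the sketch below. The household key `hh_id` is a hypothetical column name, and taking the maximum across members (so a 0/1 indicator becomes 1 if any member has 1) is an assumed rule, not necessarily the one used in the study.

```python
from collections import defaultdict


def summarise_to_household(rows, hh_key, indicator):
    """Collapse individual-level rows to one value per household.

    Assumed rule: the household value is the maximum across members;
    households with no non-missing values get None.
    """
    grouped = defaultdict(list)
    for row in rows:
        values = grouped[row[hh_key]]  # creates the household entry if missing
        if row[indicator] is not None:
            values.append(row[indicator])
    return {hh: (max(vals) if vals else None) for hh, vals in grouped.items()}


individual_rows = [
    {"hh_id": "h1", "RDD_phones_01": 0},
    {"hh_id": "h1", "RDD_phones_01": 1},
    {"hh_id": "h2", "RDD_phones_01": 0},
    {"hh_id": "h2", "RDD_phones_01": None},
    {"hh_id": "h3", "RDD_phones_01": None},
]
household_values = summarise_to_household(individual_rows, "hh_id", "RDD_phones_01")
print(household_values)
```

A different rule (sum, first non-missing value, and so on) would only change the final dictionary comprehension.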

Missing columns

For some SP indicators, IPSOS asked the questions for each SP program individually, whereas MCI asked them once for all programs together. As a result, not all indicators are present in both datasets.

  1. The following variables exist in the MCI data but not in the IPSOS data -
## [1] "uuid"                        "invidual_id"                
## [3] "IDP_application_1_other"     "disabililty_status_claim_01"
## [5] "idp_claim_date"              "rejection_reasonoth"
  2. The following variables exist in the IPSOS data but not in the MCI data -
##   [1] "sys_RespNum"                        "sys_SequentialRespNum"             
##   [3] "sys_StartTime"                      "sys_EndTime"                       
##   [5] "sys_ElapsedTime"                    "sys_SumPageTimes"                  
##   [7] "sys_StartTimeStamp"                 "sys_EndTimeStamp"                  
##   [9] "sys_DataSource"                     "sys_RespStatus"                    
##  [11] "sys_RespRemoved"                    "sys_DispositionCode"               
##  [13] "sys_LastQuestion"                   "sys_UserJavaScript"                
##  [15] "sys_UserAgent"                      "sys_OperatingSystem"               
##  [17] "sys_Browser"                        "sys_IPAddress"                     
##  [19] "sys_ScreenWidth"                    "sys_CAPIDeviceID"                  
##  [21] "isTest"                             "MFxPROVIDER"                       
##  [23] "MFxTYPE"                            "MFxCOUNTRY"                        
##  [25] "MFxBATCH"                           "MFxTARGETCODE"                     
##  [27] "MFxTARGET"                          "MFxBASES"                          
##  [29] "contractor"                         "QuotaTotal"                        
##  [31] "QuotaTotalContractor"               "callbackOtherTelephone"            
##  [33] "LASTQUESTION"                       "RBACK"                             
##  [35] "ATTEMPTS"                           "targetPeople"                      
##  [37] "Consent_11_other"                   "newHoh"                            
##  [39] "callback_appointment"               "callback_telephone"                
##  [41] "callbackTelephone_2_other"          "hh_IDPc1"                          
##  [43] "hh_IDPc2"                           "hh_IDPc3"                          
##  [45] "hh_IDPc4"                           "hh_IDPc5"                          
##  [47] "hh_IDPc99"                          "IDP_application_2_other_1"         
##  [49] "IDP_application_3_other_1"          "IDP_application_5_other_1"         
##  [51] "hh_Member_Disability_1"             "idisabililty_status_claim_1_1_year"
##  [53] "disabililty_status_claim_2_1_month" "disabililty_status_claim_3_1_1"    
##  [55] "disabililty_status_claim_3_2_1"     "disabililty_status_claim_3_3_1"    
##  [57] "time_Disabled_Benefits_1_other"     "idp_claim_date_1_year"             
##  [59] "idp_claim_date_2_month"             "idp_claim_date_3"                  
##  [61] "idp_claim_date_4"                   "idp_time_2_other"                  
##  [63] "gmi_time_2_other"                   "hh_utilities_expense_1_other"      
##  [65] "hus_time_2_other"                   "not_received_assistance_2c1"       
##  [67] "not_received_assistance_2c2"        "not_received_assistance_2c3"       
##  [69] "not_received_assistance_2c4"        "not_received_assistance_2c5"       
##  [71] "not_received_assistance_2c6"        "not_received_assistance_2c7"       
##  [73] "not_received_assistance_2c8"        "not_received_assistance_2c9"       
##  [75] "not_received_assistance_2c10"       "not_received_assistance_2c11"      
##  [77] "not_received_assistance_2c12"       "not_received_assistance_2c13"      
##  [79] "not_received_assistance_2c14"       "not_received_assistance_2c15"      
##  [81] "not_received_assistance_2c16"       "not_received_assistance_2c17"      
##  [83] "not_received_assistance_2c99"       "not_received_assistance_15_other_2"
##  [85] "rejection_reason_2c1"               "rejection_reason_2c2"              
##  [87] "rejection_reason_2c3"               "rejection_reason_2c4"              
##  [89] "rejection_reason_2c5"               "rejection_reason_2c6"              
##  [91] "rejection_reason_2c7"               "rejection_reason_2c99"             
##  [93] "not_received_assistance_3c1"        "not_received_assistance_3c2"       
##  [95] "not_received_assistance_3c3"        "not_received_assistance_3c4"       
##  [97] "not_received_assistance_3c5"        "not_received_assistance_3c6"       
##  [99] "not_received_assistance_3c7"        "not_received_assistance_3c8"       
## [101] "not_received_assistance_3c9"        "not_received_assistance_3c10"      
## [103] "not_received_assistance_3c11"       "not_received_assistance_3c12"      
## [105] "not_received_assistance_3c13"       "not_received_assistance_3c14"      
## [107] "not_received_assistance_3c15"       "not_received_assistance_3c16"      
## [109] "not_received_assistance_3c17"       "not_received_assistance_3c99"      
## [111] "not_received_assistance_15_other_3" "rejection_reason_3c1"              
## [113] "rejection_reason_3c2"               "rejection_reason_3c3"              
## [115] "rejection_reason_3c4"               "rejection_reason_3c5"              
## [117] "rejection_reason_3c6"               "rejection_reason_3c7"              
## [119] "rejection_reason_3c99"              "not_received_assistance_4c1"       
## [121] "not_received_assistance_4c2"        "not_received_assistance_4c3"       
## [123] "not_received_assistance_4c4"        "not_received_assistance_4c5"       
## [125] "not_received_assistance_4c6"        "not_received_assistance_4c7"       
## [127] "not_received_assistance_4c8"        "not_received_assistance_4c9"       
## [129] "not_received_assistance_4c10"       "not_received_assistance_4c11"      
## [131] "not_received_assistance_4c12"       "not_received_assistance_4c13"      
## [133] "not_received_assistance_4c14"       "not_received_assistance_4c15"      
## [135] "not_received_assistance_4c16"       "not_received_assistance_4c17"      
## [137] "not_received_assistance_4c99"       "not_received_assistance_15_other_4"
## [139] "rejection_reason_4c1"               "rejection_reason_4c2"              
## [141] "rejection_reason_4c3"               "rejection_reason_4c4"              
## [143] "rejection_reason_4c5"               "rejection_reason_4c6"              
## [145] "rejection_reason_4c7"               "rejection_reason_4c99"             
## [147] "RDD_phones_2"                       "RDD_phones_3"                      
## [149] "RDD_phones_4"                       "RDD_phones_5"                      
## [151] "RDD_phones_6"                       "Recontact_telephone"               
## [153] "gmi_hh_income_threshold_new"        "pos"                               
## [155] "nvidual_idi"                        "relocation_year"                   
## [157] "relocation_month"
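Lists like the two above can be produced by comparing the column names of the two datasets in both directions. The sketch below shows one way to do that; the column lists are a toy subset picked from names mentioned in this report, not the full headers.

```python
def column_differences(left_cols, right_cols):
    """Return (only_in_left, only_in_right), preserving the original column order."""
    left, right = set(left_cols), set(right_cols)
    only_left = [c for c in left_cols if c not in right]
    only_right = [c for c in right_cols if c not in left]
    return only_left, only_right


# Toy subsets for illustration only.
mci_cols = ["uuid", "idp_claim_date", "gmi_time"]
ipsos_cols = ["sys_RespNum", "gmi_time", "idp_claim_date_1_year"]

only_mci, only_ipsos = column_differences(mci_cols, ipsos_cols)
print("MCI only:", only_mci)
print("IPSOS only:", only_ipsos)
```

Re-running the comparison after each renaming or cleaning step gives a quick check that the two datasets are converging.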
Some data cleaning:
  1. In the Ipsos data, gmi_time and gmi_time_2_other, hus_time and hus_time_2_other, and idp_time and idp_time_2_other were combined to align them with the MCI data.
  2. Similarly, the disability status claim columns disabililty_status_claim_1_1, disabililty_status_claim_2_1, disabililty_status_claim_3_1_1, disabililty_status_claim_3_2_1, and disabililty_status_claim_3_3_1 were combined.
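The combining steps above could be sketched as follows for one column pair. The exact rule used in the cleaning is not stated, so this assumes a simple fallback: keep the main answer when it is filled in, otherwise use the `_other` value, then drop the `_other` column.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def combine_columns(row, main, other):
    """Merge `other` into `main` and drop `other` (assumed fallback rule)."""
    combined = dict(row)
    other_value = combined.pop(other, None)
    if is_blank(combined.get(main)):
        combined[main] = None if is_blank(other_value) else other_value
    return combined


ipsos_rows = [
    {"gmi_time": "", "gmi_time_2_other": "6 months"},
    {"gmi_time": 2, "gmi_time_2_other": None},
    {"gmi_time": None, "gmi_time_2_other": " "},
]
merged_rows = [combine_columns(r, "gmi_time", "gmi_time_2_other") for r in ipsos_rows]
print(merged_rows)
```

The same function would apply to the hus_time and idp_time pairs by changing the two column names.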
Potential errors:

Because IPSOS and MCI didn’t always follow the same coding, you might see some discrepancies, such as multiple percentages for Don’t Know responses to the same question (one for dont_know and another for DK). I have tried to address such issues, but you may still encounter some cases. If you find any, please make a list and send it to me.
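A quick way to look for such leftovers is to flag every question whose response categories contain more than one “don’t know” spelling. The sketch below assumes the categories are available as a mapping from question name to labels; the label set and the example data are hypothetical.

```python
# Hypothetical spellings treated as "don't know".
DK_LABELS = {"dk", "dont_know", "don't know"}


def questions_with_split_dont_know(response_labels):
    """Return {question: sorted DK-like labels} for questions with more than one such label."""
    flagged = {}
    for question, labels in response_labels.items():
        hits = sorted({label for label in labels if label.strip().lower() in DK_LABELS})
        if len(hits) > 1:
            flagged[question] = hits
    return flagged


# Example categories per question (invented for illustration).
labels_by_question = {
    "gmi_time": ["yes", "no", "dont_know", "DK"],
    "hus_time": ["yes", "no", "DK"],
}
flagged = questions_with_split_dont_know(labels_by_question)
print(flagged)
```

The output of a check like this can serve directly as the list of cases to report.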