CATI Rapid study with MCA

Load libraries

Configuration

Load data

Prepare data

Let’s now create the dataset that we’ll use for modeling by filtering on some of the variables and converting some of them to factors. Age still has many NA values, which we are going to impute.
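The original code chunk for this step is not shown, so the following is only a minimal base R sketch of the three operations the paragraph names: filtering, conversion to factors, and imputation. The toy data frame, the `age` column name, the filter on missing province, and the choice of median imputation are all assumptions made for illustration, not the study's actual method.

```r
# Hypothetical raw extract; column names and values are illustrative only.
raw <- data.frame(
  provincia     = c("Sofala", "Sofala", "Tete", NA, "Tete"),
  caso.suspeito = c("Sim", "Não", "Não", "Sim", "Não"),
  age           = c(34, NA, 51, 27, NA),
  stringsAsFactors = FALSE
)

# Filter: keep only rows with a known province (assumed filter rule).
model_df <- raw[!is.na(raw$provincia), ]

# Convert every character column to a factor.
is_chr <- vapply(model_df, is.character, logical(1))
model_df[is_chr] <- lapply(model_df[is_chr], factor)

# Impute missing ages with the median of the observed ages (assumed method).
model_df$age[is.na(model_df$age)] <- median(model_df$age, na.rm = TRUE)

str(model_df)
```

Median imputation is used here only because it is simple and robust to outliers; any other imputation method would slot into the same place.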

Check number of categories

Let’s select the categorical variables

##                                                       provincia 
##                                                               5 
##                                                   caso.suspeito 
##                                                               2 
## qual.e.a.principal.fonte.de.agua.para.beber.e.preparar.a.comida 
##                                                               5 
##                     como.e.que.a.familia.trata.a.agua.que.bebe. 
##                                                               5 
##                                tem.sistema.de.lavagem.das.maos. 
##                                                               2 
##                                              como.lavam.as.maos 
##                                                               2 
##                a.familia.come.alimentos.preparados.fora.da.casa 
##                                                               2 
##                                   se.teve.um.evento.particular. 
##                                                               5 
##                  sabe.como.reduzir.o.risco.de.morte.por.colera. 
##                                                               2 
##                                                      sanitation 
##                                                               4
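A count like the one printed above can be produced by selecting the character or factor columns and counting distinct non-missing values in each. The sketch below uses a small made-up data frame, not the CATI data, and the helper names are hypothetical.

```r
# Toy data with two categorical columns and one numeric column.
toy <- data.frame(
  provincia     = c("Sofala", "Tete", "Sofala", "Manica"),
  caso.suspeito = c("Sim", "Não", "Não", "Sim"),
  n             = c(3, 7, 5, 4),
  stringsAsFactors = FALSE
)

# Select the categorical (character or factor) columns.
is_cat   <- vapply(toy, function(x) is.character(x) || is.factor(x), logical(1))
cat_vars <- toy[is_cat]

# Count distinct non-missing categories per variable.
n_categories <- vapply(cat_vars,
                       function(x) length(unique(x[!is.na(x)])),
                       integer(1))
n_categories
```

Checking these counts before running MCA is useful because variables with very many categories (such as the market or community names in this survey) would dominate the number of dimensions.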

One hot encode variables

Explore data

Data summary
Name	cati
Number of rows	1551
Number of columns	18
_______________________
Column type frequency:
character	17
numeric	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
provincia	0	1.00	4	12	5
distrito	0	1.00	3	16	28
posto.administrativo	0	1.00	1	17	141
localidade	0	1.00	3	21	185
comunidade.bairro	0	1.00	2	26	377
caso.suspeito	0	1.00	3	3	2
qual.e.a.principal.fonte.de.agua.para.beber.e.preparar.a.comida	0	1.00	3	15	5
como.e.que.a.familia.trata.a.agua.que.bebe.	0	1.00	5	9	5
tem.sistema.de.lavagem.das.maos.	0	1.00	3	3	2
como.lavam.as.maos	0	1.00	4	44	2
tem.latrina.	0	1.00	3	13	3
se.nao.	1122	0.28	21	32	2
a.familia.come.alimentos.preparados.fora.da.casa	0	1.00	3	3	2
qual.o.mercado.principal.onde.se.procura.alimentos	0	1.00	3	32	346
como.a.agua.e.armazenada.pedir.para.ver.a.balde	0	1.00	5	40	105
se.teve.um.evento.particular.	0	1.00	6	25	5
sabe.como.reduzir.o.risco.de.morte.por.colera.	0	1.00	3	3	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
n.	0	1	5.66	4.09	-1	4	5	7	102	▇▁▁▁▁

Explore dataset features

## Rows: 1,551
## Columns: 18
## $ provincia                                                       <chr> "Sofal…
## $ distrito                                                        <chr> "Beira…
## $ posto.administrativo                                            <chr> "Chive…
## $ localidade                                                      <chr> "Beira…
## $ comunidade.bairro                                               <chr> "Espan…
## $ n.                                                              <dbl> 3, 7, …
## $ caso.suspeito                                                   <chr> "Não",…
## $ qual.e.a.principal.fonte.de.agua.para.beber.e.preparar.a.comida <chr> "Água …
## $ como.e.que.a.familia.trata.a.agua.que.bebe.                     <chr> "Nao t…
## $ tem.sistema.de.lavagem.das.maos.                                <chr> "Não",…
## $ como.lavam.as.maos                                              <chr> "Agua"…
## $ tem.latrina.                                                    <chr> "SIM c…
## $ se.nao.                                                         <chr> NA, NA…
## $ a.familia.come.alimentos.preparados.fora.da.casa                <chr> "Sim",…
## $ qual.o.mercado.principal.onde.se.procura.alimentos              <chr> "Merca…
## $ como.a.agua.e.armazenada.pedir.para.ver.a.balde                 <chr> "Balde…
## $ se.teve.um.evento.particular.                                   <chr> "Event…
## $ sabe.como.reduzir.o.risco.de.morte.por.colera.                  <chr> "Sim",…

How are surveyed households distributed?

Let’s see the distribution of surveyed households by province:

What water sources are used in each province?

What handwashing methods are used in each province?

What sanitation systems are used in each province?

What water treatments are used in each province?

How are reported suspected cases distributed by province?

Performing MCA

Extract eigenvalues and Scree Plot

##         eigenvalue variance.percent cumulative.variance.percent
## Dim.1  0.449648259       18.7353441                    18.73534
## Dim.2  0.249999578       10.4166491                    29.15199
## Dim.3  0.226854673        9.4522780                    38.60427
## Dim.4  0.170360756        7.0983648                    45.70264
## Dim.5  0.132724011        5.5301671                    51.23280
## Dim.6  0.121051102        5.0437959                    56.27660
## Dim.7  0.108054890        4.5022871                    60.77889
## Dim.8  0.101494085        4.2289202                    65.00781
## Dim.9  0.094919921        3.9549967                    68.96280
## Dim.10 0.090303823        3.7626593                    72.72546
## Dim.11 0.083670652        3.4862772                    76.21174
## Dim.12 0.080124365        3.3385152                    79.55025
## Dim.13 0.076032967        3.1680403                    82.71830
## Dim.14 0.071865453        2.9943939                    85.71269
## Dim.15 0.066485833        2.7702430                    88.48293
## Dim.16 0.056604458        2.3585191                    90.84145
## Dim.17 0.049051063        2.0437943                    92.88525
## Dim.18 0.043627412        1.8178088                    94.70305
## Dim.19 0.032376142        1.3490059                    96.05206
## Dim.20 0.029997648        1.2499020                    97.30196
## Dim.21 0.023306764        0.9711152                    98.27308
## Dim.22 0.018724171        0.7801738                    99.05325
## Dim.23 0.014568382        0.6070159                    99.66027
## Dim.24 0.008153591        0.3397330                   100.00000
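The percentages in the table above can be reproduced from the eigenvalues alone. For an MCA on the indicator matrix, total inertia equals (number of categories / number of variables) - 1, which for the 10 variables and 34 categories counted earlier gives 2.4 and 24 dimensions. The sketch below recomputes the table columns in base R. The 1 / (number of variables) cut-off is a common rule of thumb that is assumed here, not something the original analysis states.

```r
# Eigenvalues copied from the table above (Dim.1 to Dim.24).
eig <- c(0.449648259, 0.249999578, 0.226854673, 0.170360756, 0.132724011,
         0.121051102, 0.108054890, 0.101494085, 0.094919921, 0.090303823,
         0.083670652, 0.080124365, 0.076032967, 0.071865453, 0.066485833,
         0.056604458, 0.049051063, 0.043627412, 0.032376142, 0.029997648,
         0.023306764, 0.018724171, 0.014568382, 0.008153591)

n_vars       <- 10  # categorical variables listed earlier
n_categories <- 34  # 5 + 2 + 5 + 5 + 2 + 2 + 2 + 5 + 2 + 4

# Total inertia: should match n_categories / n_vars - 1.
total_inertia <- sum(eig)

# Share of inertia per dimension and running total, as in the table.
variance_pct <- 100 * eig / total_inertia
cumulative   <- cumsum(variance_pct)

# Rule of thumb (assumption): keep dimensions with eigenvalue above 1 / n_vars.
n_keep <- sum(eig > 1 / n_vars)

round(head(cbind(eig, variance_pct, cumulative), 3), 4)
```

On these numbers the first two dimensions carry about 29% of the inertia, so the two-dimensional maps that follow show only part of the structure in the data.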

MCA plot of variables

MCA plot of categories

Using FactoMineR directly

Using ggplot to colour categories by the variable they belong to

MCA density plot of categories and individuals

Quality of representation of variable categories

The quality of the representation is called the squared cosine (cos2), which measures the degree of association between variable categories and a particular axis. The cos2 of variable categories can be extracted as follows:
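To make the definition concrete, the sketch below computes cos2 by hand in base R on a small made-up dataset. It is not the original chunk, which presumably read the values from the fitted MCA object (in factoextra, `get_mca_var()` returns a `cos2` element). The toy variables and all object names are assumptions. The MCA itself is done as a correspondence analysis of the indicator matrix via SVD.

```r
# Toy data: three categorical variables (hypothetical, not the CATI survey).
toy <- data.frame(
  water    = c("tap", "well", "tap", "river", "well", "tap", "river", "well"),
  latrine  = c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
  handwash = c("soap", "water", "soap", "water", "water", "soap", "soap", "water"),
  stringsAsFactors = TRUE
)

# Indicator (one-hot) matrix: one column per category of every variable.
Z <- do.call(cbind, lapply(names(toy), function(v) {
  f <- toy[[v]]
  m <- sapply(levels(f), function(l) as.numeric(f == l))
  colnames(m) <- paste(v, levels(f), sep = "_")
  m
}))

# Correspondence analysis of Z: standardized residuals, then SVD.
P     <- Z / sum(Z)
r     <- rowSums(P)            # row masses
cmass <- colSums(P)            # column (category) masses
S     <- (P - r %o% cmass) / sqrt(r %o% cmass)
sv    <- svd(S)

# Drop the trivial dimensions (singular values that are numerically zero).
keep <- sv$d > 1e-8
eig  <- sv$d[keep]^2

# Principal coordinates of the categories on each retained dimension.
coord <- sweep(sv$v[, keep, drop = FALSE], 2, sv$d[keep], "*") / sqrt(cmass)
rownames(coord) <- colnames(Z)

# cos2: share of each category's squared distance to the origin
# that is captured by each dimension; rows sum to 1 over all dimensions.
cos2 <- coord^2 / rowSums(coord^2)
round(cos2[, 1:2], 3)
```

A category whose cos2 values on the first two dimensions add up to nearly 1 is well represented on the two-dimensional map; a low total means its position on the map should be interpreted with caution.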

It’s also possible to create a bar plot of variable cos2 using the function fviz_cos2().

The correlation with the dimensions can also be inspected.

Contribution of variable categories to the dimensions

The most important (that is, most contributing) variable categories can be highlighted on the scatter plot as follows:

Color individuals by groups

Province groups

Sanitation groups

Handwash groups

Water source groups

Water treatment groups

Events group

Suspected case groups

Take-away food groups

Factor map 1

