Note: This is an R Markdown document covering the data processing step of the manuscript titled “Associated Factors of ADHD diagnosis and psychostimulant use: a nationwide representative study”, by Arruda, Arruda, Portugal, Guidetti, Landeira-Fernandez, and Anunciação. Data and code are available at https://osf.io/ubpve/.
Feel free to contact me at
Last update: 03 January 2022. Thank you.

1 Get Excel file

Done on June 3, 2020.

2 Packages

3 Data handling

3.1 Clean dataset
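
The cleaning code itself is not shown in this output. The snake_case column names that appear later (for example `sexo_fem_1_masc_2`) suggest that the spreadsheet headers were normalized at some point. The following is a minimal base R sketch of that kind of normalization. The raw headers and the helper `clean_names_base()` are hypothetical and are not taken from the original script.

```r
# Hypothetical raw column headers; the real spreadsheet headers are not shown here.
raw_names <- c("Sexo (fem=1, masc=2)", "Classe Economica", "Idade")

# Hypothetical helper: lowercase the header, replace every run of
# non-alphanumeric characters with "_", then trim leading/trailing underscores.
clean_names_base <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

cleaned <- clean_names_base(raw_names)
print(cleaned)
```

This simple version only handles ASCII headers; accented characters would need transliteration first.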

3.2 Get target variables

Use target variables only. In the Excel file, these variables are highlighted.

3.3 Adjust and check variables

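Adjust variables that exceed the upper allowed value. In the output below, each original (Portuguese) variable is tabulated first, followed by its recoded English counterpart; codes meaning "not informed" appear as NA after recoding.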
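
As an illustration of this kind of recoding, the sketch below uses toy data and base R. It assumes the coding scheme embedded in the original column name `cor_1_branca_2_nao_branca_3_nao_informou` (1 = white, 2 = non-white, 3 = not informed) and the level order seen in the `race_white` output. It is not the original code.

```r
# Toy codes: 1 = white, 2 = non-white, 3 = not informed, NA = already missing.
cor <- c(1, 2, 3, 1, 1, 2, NA)

# Any code not listed in `levels` (here 3) becomes NA automatically,
# which is one way to handle values above the allowed upper limit.
race_white <- factor(cor, levels = c(2, 1), labels = c("other", "white"))

# useNA = "ifany" adds a row for the missing values.
print(table(race_white, useNA = "ifany"))
```

Listing only the valid codes in `levels` keeps the recoding and the out-of-range handling in a single step.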

## # A tibble: 2 x 2
##   escola_publica_1_part_2     n
##                     <dbl> <int>
## 1                       1  5993
## 2                       2  1121
## # A tibble: 2 x 2
##   public_School     n
##   <fct>         <int>
## 1 public         5993
## 2 private        1121
## # A tibble: 3 x 2
##   densidade_1_100_mil_2_100_500_mil_3_500_mil     n
##                                         <dbl> <int>
## 1                                           1  2600
## 2                                           2  3242
## 3                                           3  1272
## # A tibble: 3 x 2
##   city_size     n
##   <fct>     <int>
## 1 small      2600
## 2 medium     3242
## 3 big        1272
## # A tibble: 5 x 2
##   regiao     n
##   <chr>  <int>
## 1 CO       628
## 2 NE       872
## 3 NO       250
## 4 SE      2345
## 5 SU      3019
## # A tibble: 5 x 2
##   region     n
##   <fct>  <int>
## 1 CO       628
## 2 NE       872
## 3 NO       250
## 4 SE      2345
## 5 SU      3019
## # A tibble: 14 x 2
##    idade     n
##    <dbl> <int>
##  1     5    99
##  2     6   737
##  3     7  1166
##  4     8  1127
##  5     9  1110
##  6    10  1041
##  7    11   574
##  8    12   399
##  9    13   285
## 10    14   250
## 11    15   141
## 12    16   108
## 13    17    62
## 14    18    15
## # A tibble: 14 x 2
##      age     n
##    <dbl> <int>
##  1     5    99
##  2     6   737
##  3     7  1166
##  4     8  1127
##  5     9  1110
##  6    10  1041
##  7    11   574
##  8    12   399
##  9    13   285
## 10    14   250
## 11    15   141
## 12    16   108
## 13    17    62
## 14    18    15
## # A tibble: 2 x 2
##   sexo_fem_1_masc_2     n
##               <dbl> <int>
## 1                 1  3562
## 2                 2  3552
## # A tibble: 2 x 2
##   sex_male     n
##   <fct>    <int>
## 1 female    3562
## 2 male      3552
## # A tibble: 3 x 2
##   cor_1_branca_2_nao_branca_3_nao_informou     n
##                                      <dbl> <int>
## 1                                        1  4609
## 2                                        2  2227
## 3                                        3   278
## # A tibble: 3 x 2
##   race_white     n
##   <fct>      <int>
## 1 other       2227
## 2 white       4609
## 3 <NA>         278
## # A tibble: 4 x 2
##   conjugal_pais_1_casados_amaziados_2_separados_3_nao_informado     n
##                                                           <dbl> <int>
## 1                                                             1  4918
## 2                                                             2  1886
## 3                                                             3    29
## 4                                                            NA   281
## # A tibble: 3 x 2
##   married      n
##   <fct>    <int>
## 1 divorced  1886
## 2 married   4918
## 3 <NA>       310
## # A tibble: 4 x 2
##   grau_instrucao_chefe_1_analfabeto_2_alfabetizado_ate_em_incompleto_3_em~     n
##                                                                      <dbl> <int>
## 1                                                                        1   238
## 2                                                                        2  3699
## 3                                                                        3  2896
## 4                                                                        4   281
## # A tibble: 4 x 2
##   schooling         n
##   <fct>         <int>
## 1 illiteracy      238
## 2 primary        3699
## 3 high_or_above  2896
## 4 <NA>            281
## # A tibble: 3 x 2
##   classe_economica     n
##   <chr>            <int>
## 1 AB                2635
## 2 C                 3513
## 3 DE                 966
## # A tibble: 3 x 2
##   economic_status     n
##   <fct>           <int>
## 1 AB               2635
## 2 C                3513
## 3 DE                966
## # A tibble: 3 x 2
##   tabagismo_ativo_0_nao_1_sim_2_nao_informado     n
##                                         <dbl> <int>
## 1                                           0  6091
## 2                                           1   881
## 3                                           2   142
## # A tibble: 3 x 2
##   smoking     n
##   <fct>   <int>
## 1 no       6091
## 2 yes       881
## 3 <NA>      142
## # A tibble: 4 x 2
##   alcool_0_nao_1_sim_2_nao_informado     n
##                                <dbl> <int>
## 1                                  0  6441
## 2                                  1   463
## 3                                  2    16
## 4                                 NA   194
## # A tibble: 3 x 2
##   alcohol     n
##   <fct>   <int>
## 1 no       6441
## 2 yes       463
## 3 <NA>      210
## # A tibble: 4 x 2
##   desempenho_escolar_prof_1_acima_da_media_2_media_3_abaixo_da_media_4_na~     n
##                                                                      <dbl> <int>
## 1                                                                        1  2104
## 2                                                                        2  2733
## 3                                                                        3  1921
## 4                                                                        4   356
## # A tibble: 4 x 2
##   scholar_achievement     n
##   <fct>               <int>
## 1 average              2733
## 2 above                2104
## 3 below                1921
## 4 <NA>                  356
## # A tibble: 2 x 2
##   snap_pais_prof     n
##            <dbl> <int>
## 1              0  6439
## 2              1   675
## # A tibble: 2 x 2
##   snap_parents_only     n
##   <fct>             <int>
## 1 no                 6439
## 2 yes                 675
## # A tibble: 2 x 2
##   snap_pais_prof_2     n
##              <dbl> <int>
## 1                0  6245
## 2                1   869
## # A tibble: 2 x 2
##   snap_teachers_only     n
##   <fct>              <int>
## 1 no                  6245
## 2 yes                  869
## # A tibble: 3 x 2
##   age_group     n
##   <fct>     <int>
## 1 1          4239
## 2 2          2299
## 3 3           576
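
The cut points behind `age_group` are not stated, but the reported counts are consistent with the groups 5 to 9, 10 to 13, and 14 to 18 years. The base R sketch below rebuilds the age vector from the `age` table in this section and applies those assumed cut points. The breaks are an inference from the counts, not something confirmed by the original script.

```r
# Age counts copied from the `age` table in this section.
age_counts <- c(`5` = 99, `6` = 737, `7` = 1166, `8` = 1127, `9` = 1110,
                `10` = 1041, `11` = 574, `12` = 399, `13` = 285,
                `14` = 250, `15` = 141, `16` = 108, `17` = 62, `18` = 15)

# Expand the counts back into one age value per participant.
age <- rep(as.numeric(names(age_counts)), times = age_counts)

# Assumed cut points: (4, 9], (9, 13], (13, 18].
age_group <- cut(age, breaks = c(4, 9, 13, 18), labels = c("1", "2", "3"))

print(table(age_group))
```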

3.4 Distribution of SNAP

## # A tibble: 2 x 2
##   parent_reported_adhd     n
##                  <dbl> <int>
## 1                    0  6609
## 2                    1   505
## # A tibble: 2 x 2
##   adhd_parent     n
##   <fct>       <int>
## 1 no           6609
## 2 yes           505
## # A tibble: 2 x 2
##   adhd_risk_dsm_5_0_nao_1_sim     n
##                         <dbl> <int>
## 1                           0  6837
## 2                           1   277
## # A tibble: 2 x 2
##   adhd_risk     n
##   <fct>     <int>
## 1 no         6837
## 2 yes         277
## # A tibble: 3 x 2
##   uso_atual_de_psicoestimulante_0_nao_1_sim_2_nao_informado     n
##                                                       <dbl> <int>
## 1                                                         0  6962
## 2                                                         1   135
## 3                                                         2    17
## # A tibble: 3 x 2
##   psychostimulant     n
##   <fct>           <int>
## 1 no               6962
## 2 yes               135
## 3 <NA>               17

Use only the new (English) variables.

##  [1] "public_School"       "city_size"           "region"             
##  [4] "age"                 "sex_male"            "race_white"         
##  [7] "married"             "schooling"           "economic_status"    
## [10] "smoking"             "alcohol"             "scholar_achievement"
## [13] "snapp_desatencao"    "snapp_hiper"         "snapp_total"        
## [16] "snapprof_desatencao" "snapprof_hiper"      "snapprof_total"     
## [19] "snap_parents_only"   "age_group"           "snap_teachers_only" 
## [22] "adhd_parent"         "adhd_risk"           "psychostimulant"
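
A minimal sketch of this selection step in base R, using a toy data frame. The object names `ds` and `ds_selected` follow the names used in this document; the toy columns and the vector `english_vars` are illustrative assumptions.

```r
# Toy data frame mixing original (Portuguese) and recoded (English) columns.
ds <- data.frame(
  idade  = c(7, 10),
  age    = c(7, 10),
  regiao = c("SE", "SU"),
  region = factor(c("SE", "SU")),
  stringsAsFactors = FALSE
)

# A subset of the English names listed above, for illustration only.
english_vars <- c("region", "age")

# Fail early if an expected column is missing.
stopifnot(all(english_vars %in% names(ds)))

# drop = FALSE keeps a data frame even if only one column is selected.
ds_selected <- ds[, english_vars, drop = FALSE]

print(names(ds_selected))
```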

Insert a unique identification per participant
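
One simple way to do this is a sequential row number, sketched below on toy data. The column name `id` is an assumption; the name used in the original script is not shown.

```r
# Toy stand-in for the selected dataset.
ds_selected <- data.frame(age = c(7, 10, 12), region = c("SE", "SU", "NE"))

# Sequential identifier, placed as the first column.
ds_selected <- cbind(id = seq_len(nrow(ds_selected)), ds_selected)

# Confirm that no identifier is repeated (anyDuplicated() returns 0 if none).
stopifnot(anyDuplicated(ds_selected$id) == 0)

print(ds_selected)
```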

Check missing
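
A per-variable count and percentage of missing values could be obtained along these lines. This is a base R sketch on toy data; the names `n_missing`, `pct_missing`, and `missing_summary` are illustrative.

```r
# Toy stand-in for the selected dataset, with some missing values.
ds_selected <- data.frame(
  smoking = factor(c("no", "yes", NA, "no")),
  alcohol = factor(c("no", NA, NA, "yes")),
  age     = c(7, 10, 12, 9)
)

# Number and percentage of missing values in each column.
n_missing   <- colSums(is.na(ds_selected))
pct_missing <- round(100 * n_missing / nrow(ds_selected), 1)

missing_summary <- data.frame(
  variable    = names(n_missing),
  n_missing   = unname(n_missing),
  pct_missing = unname(pct_missing)
)

print(missing_summary)
```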

Export this dataset as a csv file for reproduction.
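
The export can be sketched as follows, writing to a temporary directory and reading the file back as a quick check. The file name `ds_selected.csv` and the choice to write missing values as empty cells are assumptions.

```r
# Toy stand-in for the selected dataset.
ds_selected <- data.frame(
  id       = 1:3,
  sex_male = factor(c("female", "male", NA)),
  age      = c(7, 10, 12)
)

# Hypothetical output path inside the session's temporary directory.
out_file <- file.path(tempdir(), "ds_selected.csv")

# row.names = FALSE avoids an extra unnamed index column;
# na = "" writes missing values as empty cells.
write.csv(ds_selected, out_file, row.names = FALSE, na = "")

# Read the file back to confirm that the dimensions survived the round trip.
reloaded <- read.csv(out_file, na.strings = "", stringsAsFactors = TRUE)
stopifnot(identical(dim(reloaded), dim(ds_selected)))
```

Note that factor level order is not stored in a CSV file, so anyone reloading the data may need to reapply the factor levels.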

We now have two nearly identical datasets. ds contains the variables under their original Portuguese names, with the original values. ds_selected contains the same variables under English names, with values corresponding to those found in ds.

Done!

If you use this material, please cite it. Thanks, Luis Anunciação, 2020 (updated in January 2022).