Note: This is an R Markdown document that covers the data processing step of the manuscript titled “Associated Factors of ADHD diagnosis and psychostimulant use: a nationwide representative study”, by Arruda, Arruda, Portugal, Guidetti, Landeira-Fernandez, and Anunciação. Data and code are available at https://osf.io/ubpve/.
Feel free to contact me at luisfca@puc-rio.br
Last update: 03 January 2022. Thank you.
Done on June 3, 2020.
Use the target variables only. In the Excel file, these variables are highlighted.
Adjust variables that exceed the maximum allowed value.
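The adjustment rule itself is not shown in this rendered output. The following is only a minimal base R sketch, assuming that out-of-range values are capped at the maximum allowed score (setting them to NA would be an equally plausible choice). The column name `score` and the limit of 27 are hypothetical.

```r
# Hypothetical toy data: `score` and `upper_limit` are illustrative only.
toy <- data.frame(score = c(3, 12, 27, 31, NA))
upper_limit <- 27

# pmin() caps each value at the limit; missing values stay missing.
toy$score_capped <- pmin(toy$score, upper_limit)

toy$score_capped
```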
## # A tibble: 2 x 2
## escola_publica_1_part_2 n
## <dbl> <int>
## 1 1 5993
## 2 2 1121
## # A tibble: 2 x 2
## public_School n
## <fct> <int>
## 1 public 5993
## 2 private 1121
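The pair of tables above shows a numeric code being turned into a labelled factor. A minimal sketch of that kind of recode follows. The tibble output suggests the original used dplyr, but base R is used here so the example runs without extra packages. The input vector is rebuilt from the printed counts, not read from the real file.

```r
# Rebuild the original coding from the counts printed above.
escola <- rep(c(1, 2), times = c(5993, 1121))

# Map code 1 to "public" and code 2 to "private".
public_School <- factor(escola, levels = c(1, 2), labels = c("public", "private"))

# Cross-check: each numeric code should fall under exactly one label.
table(escola, public_School)
```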
## # A tibble: 3 x 2
## densidade_1_100_mil_2_100_500_mil_3_500_mil n
## <dbl> <int>
## 1 1 2600
## 2 2 3242
## 3 3 1272
## # A tibble: 3 x 2
## city_size n
## <fct> <int>
## 1 small 2600
## 2 medium 3242
## 3 big 1272
## # A tibble: 5 x 2
## regiao n
## <chr> <int>
## 1 CO 628
## 2 NE 872
## 3 NO 250
## 4 SE 2345
## 5 SU 3019
## # A tibble: 5 x 2
## region n
## <fct> <int>
## 1 CO 628
## 2 NE 872
## 3 NO 250
## 4 SE 2345
## 5 SU 3019
## # A tibble: 14 x 2
## idade n
## <dbl> <int>
## 1 5 99
## 2 6 737
## 3 7 1166
## 4 8 1127
## 5 9 1110
## 6 10 1041
## 7 11 574
## 8 12 399
## 9 13 285
## 10 14 250
## 11 15 141
## 12 16 108
## 13 17 62
## 14 18 15
## # A tibble: 14 x 2
## age n
## <dbl> <int>
## 1 5 99
## 2 6 737
## 3 7 1166
## 4 8 1127
## 5 9 1110
## 6 10 1041
## 7 11 574
## 8 12 399
## 9 13 285
## 10 14 250
## 11 15 141
## 12 16 108
## 13 17 62
## 14 18 15
## # A tibble: 2 x 2
## sexo_fem_1_masc_2 n
## <dbl> <int>
## 1 1 3562
## 2 2 3552
## # A tibble: 2 x 2
## sex_male n
## <fct> <int>
## 1 female 3562
## 2 male 3552
## # A tibble: 3 x 2
## cor_1_branca_2_nao_branca_3_nao_informou n
## <dbl> <int>
## 1 1 4609
## 2 2 2227
## 3 3 278
## # A tibble: 3 x 2
## race_white n
## <fct> <int>
## 1 other 2227
## 2 white 4609
## 3 <NA> 278
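Here code 3 ("not informed") ends up as a missing value, and "other" comes first among the levels. One way to get both effects in base R is to list only the valid codes in `levels`, in the desired order. This is a sketch under that assumption, not necessarily the original code. The input vector is rebuilt from the printed counts.

```r
# Rebuild the original coding from the counts printed above.
cor <- rep(c(1, 2, 3), times = c(4609, 2227, 278))

# Codes that are absent from `levels` (here 3) become NA.
# Listing 2 before 1 makes "other" the first (reference) level.
race_white <- factor(cor, levels = c(2, 1), labels = c("other", "white"))

# useNA = "ifany" keeps the missing category visible in the count.
table(race_white, useNA = "ifany")
```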
## # A tibble: 4 x 2
## conjugal_pais_1_casados_amaziados_2_separados_3_nao_informado n
## <dbl> <int>
## 1 1 4918
## 2 2 1886
## 3 3 29
## 4 NA 281
## # A tibble: 3 x 2
## married n
## <fct> <int>
## 1 divorced 1886
## 2 married 4918
## 3 <NA> 310
## # A tibble: 4 x 2
## grau_instrucao_chefe_1_analfabeto_2_alfabetizado_ate_em_incompleto_3_em~ n
## <dbl> <int>
## 1 1 238
## 2 2 3699
## 3 3 2896
## 4 4 281
## # A tibble: 4 x 2
## schooling n
## <fct> <int>
## 1 illiteracy 238
## 2 primary 3699
## 3 high_or_above 2896
## 4 <NA> 281
## # A tibble: 3 x 2
## classe_economica n
## <chr> <int>
## 1 AB 2635
## 2 C 3513
## 3 DE 966
## # A tibble: 3 x 2
## economic_status n
## <fct> <int>
## 1 AB 2635
## 2 C 3513
## 3 DE 966
## # A tibble: 3 x 2
## tabagismo_ativo_0_nao_1_sim_2_nao_informado n
## <dbl> <int>
## 1 0 6091
## 2 1 881
## 3 2 142
## # A tibble: 3 x 2
## smoking n
## <fct> <int>
## 1 no 6091
## 2 yes 881
## 3 <NA> 142
## # A tibble: 4 x 2
## alcool_0_nao_1_sim_2_nao_informado n
## <dbl> <int>
## 1 0 6441
## 2 1 463
## 3 2 16
## 4 NA 194
## # A tibble: 3 x 2
## alcohol n
## <fct> <int>
## 1 no 6441
## 2 yes 463
## 3 <NA> 210
## # A tibble: 4 x 2
## desempenho_escolar_prof_1_acima_da_media_2_media_3_abaixo_da_media_4_na~ n
## <dbl> <int>
## 1 1 2104
## 2 2 2733
## 3 3 1921
## 4 4 356
## # A tibble: 4 x 2
## scholar_achievement n
## <fct> <int>
## 1 average 2733
## 2 above 2104
## 3 below 1921
## 4 <NA> 356
## # A tibble: 2 x 2
## snap_pais_prof n
## <dbl> <int>
## 1 0 6439
## 2 1 675
## # A tibble: 2 x 2
## snap_parents_only n
## <fct> <int>
## 1 no 6439
## 2 yes 675
## # A tibble: 2 x 2
## snap_pais_prof_2 n
## <dbl> <int>
## 1 0 6245
## 2 1 869
## # A tibble: 2 x 2
## snap_teachers_only n
## <fct> <int>
## 1 no 6245
## 2 yes 869
## # A tibble: 3 x 2
## age_group n
## <fct> <int>
## 1 1 4239
## 2 2 2299
## 3 3 576
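The cut points behind age_group are not stated in this output. Summing the age table above, the three counts match the ranges 5 to 9, 10 to 13, and 14 to 18 years, so the sketch below assumes those boundaries. The age vector is rebuilt from the printed counts.

```r
# Counts for ages 5 to 18, copied from the age table above.
age_counts <- c(99, 737, 1166, 1127, 1110, 1041, 574, 399, 285, 250, 141, 108, 62, 15)
age <- rep(5:18, times = age_counts)

# Assumed boundaries: (4, 9], (9, 13], (13, 18].
age_group <- cut(age, breaks = c(4, 9, 13, 18), labels = c("1", "2", "3"))

table(age_group)
```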
## # A tibble: 2 x 2
## parent_reported_adhd n
## <dbl> <int>
## 1 0 6609
## 2 1 505
## # A tibble: 2 x 2
## adhd_parent n
## <fct> <int>
## 1 no 6609
## 2 yes 505
## # A tibble: 2 x 2
## adhd_risk_dsm_5_0_nao_1_sim n
## <dbl> <int>
## 1 0 6837
## 2 1 277
## # A tibble: 2 x 2
## adhd_risk n
## <fct> <int>
## 1 no 6837
## 2 yes 277
## # A tibble: 3 x 2
## uso_atual_de_psicoestimulante_0_nao_1_sim_2_nao_informado n
## <dbl> <int>
## 1 0 6962
## 2 1 135
## 3 2 17
## # A tibble: 3 x 2
## psychostimulant n
## <fct> <int>
## 1 no 6962
## 2 yes 135
## 3 <NA> 17
Use only the new (English) variables.
## [1] "public_School" "city_size" "region"
## [4] "age" "sex_male" "race_white"
## [7] "married" "schooling" "economic_status"
## [10] "smoking" "alcohol" "scholar_achievement"
## [13] "snapp_desatencao" "snapp_hiper" "snapp_total"
## [16] "snapprof_desatencao" "snapprof_hiper" "snapprof_total"
## [19] "snap_parents_only" "age_group" "snap_teachers_only"
## [22] "adhd_parent" "adhd_risk" "psychostimulant"
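Keeping only the new variables amounts to subsetting the data frame by a vector of names. A minimal base R sketch follows, using a hypothetical toy data frame in place of the real data:

```r
# Hypothetical toy data mixing original (Portuguese) and new (English) columns.
ds <- data.frame(
  idade  = c(7, 9, 12),
  age    = c(7, 9, 12),
  regiao = c("SE", "SU", "NE"),
  region = factor(c("SE", "SU", "NE"))
)

english_vars <- c("region", "age")

# Fail early if a requested column is missing (for example, after a typo).
stopifnot(all(english_vars %in% names(ds)))

# drop = FALSE keeps a data frame even if only one column is selected.
ds_selected <- ds[, english_vars, drop = FALSE]
names(ds_selected)
```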
Insert a unique identifier for each participant.
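A simple way to do this is a running row number. The sketch below assumes one row per participant. The column name `id` and the toy data are hypothetical.

```r
# Hypothetical toy data standing in for the selected dataset.
ds_selected <- data.frame(age = c(7, 9, 12), region = c("SE", "SU", "NE"))

# One sequential identifier per row.
ds_selected$id <- seq_len(nrow(ds_selected))

# anyDuplicated() returns 0 when every id is unique.
stopifnot(anyDuplicated(ds_selected$id) == 0)
```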
Check for missing values.
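A per-variable count of missing values is usually enough for this check. The sketch below uses base R and a hypothetical toy data frame. The original may have used a dedicated package instead.

```r
# Hypothetical toy data with a few missing entries.
ds_selected <- data.frame(
  smoking = factor(c("no", "yes", NA, "no")),
  alcohol = factor(c("no", NA, NA, "yes")),
  age     = c(7, 9, 12, 15)
)

# Number and percentage of missing values in each column.
n_missing   <- colSums(is.na(ds_selected))
pct_missing <- round(100 * n_missing / nrow(ds_selected), 1)

data.frame(n_missing, pct_missing)
```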
Export this dataset as a csv file for reproduction.
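The export can be sketched with write.csv(). The file name and the toy data below are hypothetical, and the file is written to a temporary directory so the example has no side effects.

```r
# Hypothetical toy data standing in for the selected dataset.
ds_selected <- data.frame(
  id      = 1:3,
  age     = c(7, 9, 12),
  smoking = c("no", NA, "yes")
)

# Hypothetical output path inside the session's temporary directory.
out_file <- file.path(tempdir(), "ds_selected.csv")

# row.names = FALSE avoids an extra unnamed index column in the file.
write.csv(ds_selected, out_file, row.names = FALSE)

# Read the file back to confirm that it round-trips.
check <- read.csv(out_file)
dim(check)
```

A CSV file does not store factor levels, so any level order (for example, the reference category) has to be re-applied after reading the file back.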
We have two nearly identical datasets. The first, ds, keeps the Portuguese variable names with the original values. The second, ds_selected, contains the variables renamed in English, with the same values as those found in ds.
Done!
If you use this material, please cite it. Thanks, Luis Anunciação, 2020 (updated in January 2022).