Pesquisa PD-CRS 2021

Relatório dos resultados parciais do banco de dados revisados da pesquisa sobre o PD-CRS. Primeiro, vamos ler o banco de dados e fazer uma limpeza nos nomes das colunas para facilitar o uso no R.

Bancos de dados

Banco geral

O arquivo .sav foi importado para o R e depois exportado como .csv e, em seguida, importado novamente para o R. Isto foi feito para facilitar a transformacao das variaveis caracteres em numericas, que nao foram possiveis usando outros metodos tradicionais.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(haven)

# No PC Avell
#setwd("C:/Dropbox/Laboratorio/Pedro Brandao/PDCRS")
# No Mac M1
setwd("~/Dropbox/Laboratorio/Pedro Brandao/PDCRS")

# Le o banco .sav exportado pelo SurveyMonkey
dataset = read_sav("normas_pdcrs_2021_3.sav")

# Limpa os nomes das variaveis
dataset <- tibble::as_tibble(dataset, .name_repair = janitor::make_clean_names)
#dataset %>% glimpse

# Exporta para o formato .csv
write_csv(dataset, file ="dados.csv")

library(readr)
dados <- read_csv("dados.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   id = col_character(),
##   data_nasc = col_character(),
##   pesquisador = col_character(),
##   cidade = col_character(),
##   estado = col_character(),
##   nome = col_character(),
##   idade = col_character(),
##   naturalidade = col_character(),
##   estado_naturalidade = col_character(),
##   data_coleta = col_character(),
##   exclu02_psiquiatrico = col_logical(),
##   exclu03_recusa_tcle = col_logical(),
##   exclu05_experiencia_teste = col_logical(),
##   exclu06_linguaportuguesa = col_logical(),
##   exclu07_etilismo = col_logical(),
##   exclu08_problema_motor_mmss = col_logical(),
##   exclu10_afasia = col_logical(),
##   exclu12_tce_epilepsia = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.

dados %>% glimpse

## Rows: 558
## Columns: 192
## $ id                                        <chr> "2019001", "2019002", "20190…
## $ data_nasc                                 <chr> "02/05/2002", "18/02/2004", …
## $ pesquisador                               <chr> "Tatiane H. Werle", "Tatiane…
## $ cidade                                    <chr> "São Carlos", "Palmitos", "P…
## $ estado                                    <chr> "SC", "SC", "SC", "DF", "SC"…
## $ nome                                      <chr> "Gabriela Luft Werlans", "Br…
## $ idade                                     <chr> "16", "14", "17", "16", "18"…
## $ genero                                    <dbl> 2, 2, 1, 2, 1, 1, 2, 1, 1, 1…
## $ naturalidade                              <chr> "São Carlos", "Palmitos", "P…
## $ estado_naturalidade                       <chr> "SC", "SC", "SC", "DF", "SC"…
## $ regiao_brasil                             <dbl> 5, 5, 5, 3, 5, 5, 5, 3, 3, 3…
## $ data_coleta                               <chr> "12/12/2018", "13/12/2018", …
## $ escolaridade                              <dbl> 10, 8, 11, 10, 12, 10, 12, 9…
## $ faixa_escolaridade                        <dbl> 5, 2, 5, 4, 5, 4, 5, 4, 4, 6…
## $ criterio_exclusao                         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ exclu01_faixa_etaria                      <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu02_psiquiatrico                      <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu03_recusa_tcle                       <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu04_deficiencia                       <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu05_experiencia_teste                 <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu06_linguaportuguesa                  <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu07_etilismo                          <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu08_problema_motor_mmss               <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu09_psicose                           <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu10_afasia                            <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu11_prob_aprendizagem                 <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ exclu12_tce_epilepsia                     <lgl> NA, NA, NA, NA, NA, NA, NA, …
## $ fpt_cinco_pontos_total                    <dbl> 18, 12, 24, 23, 22, 25, 22, …
## $ fpt_cinco_pontos_persev                   <dbl> 0, 1, 0, 2, 0, 1, 0, 0, 2, 0…
## $ fluencia_animais                          <dbl> 18, 19, 27, 12, 14, 20, 20, …
## $ fluencia_frutas                           <dbl> 19, 13, 19, 11, 13, 10, 12, …
## $ fluencia_alternada_pares                  <dbl> 9, 7, 11, 6, 6, 7, 8, 7, 10,…
## $ fluencia_alternada_indiv_total            <dbl> 18, 14, 22, 12, 12, 14, 16, …
## $ ccc_brown_peterson_total                  <dbl> 41, 44, 41, 43, 41, 36, 45, …
## $ ccc_brown_peterson_0seg                   <dbl> 15, 15, 15, 15, 15, 15, 15, …
## $ ccc_brown_peterson_9seg                   <dbl> 13, 9, 5, 13, 11, 5, 9, 15, …
## $ ccc_brown_peterson_18seg                  <dbl> 9, 10, 13, 5, 10, 8, 10, 11,…
## $ ccc_brown_peterson_36seg                  <dbl> 4, 10, 8, 10, 5, 8, 11, 9, 5…
## $ tmt_a                                     <dbl> 52, 31, 15, 28, 23, 25, 31, …
## $ tmt_b                                     <dbl> 64, 59, 41, 58, 61, 57, 49, …
## $ hads_a1_tenso_contraido                   <dbl> 1, 2, 0, 1, 1, 1, 2, 1, 2, 1…
## $ hads_d1_gosto_pelas_coisas                <dbl> 1, 0, 0, 1, 1, 1, 2, 0, 0, 0…
## $ hads_a3_medo                              <dbl> 2, 3, 0, 2, 0, 0, 2, 0, 2, 0…
## $ hads_d4_risada                            <dbl> 1, 1, 0, 0, 0, 1, 1, 0, 0, 0…
## $ hads_a5_preocupacoes                      <dbl> 3, 2, 1, 1, 1, 1, 2, 2, 3, 2…
## $ hads_d6_alegre                            <dbl> 0, 1, 0, 1, 2, 1, 2, 0, 1, 0…
## $ hads_a7_relaxado                          <dbl> 2, 2, 0, 1, 2, 1, 2, 1, 2, 1…
## $ hads_d8_lento_pensar                      <dbl> 1, 2, 0, 0, 1, 1, 3, 1, 2, 2…
## $ hads_a9_medo                              <dbl> 2, 3, 1, 1, 1, 1, 3, 1, 1, 0…
## $ hads_d10_interesse_aparencia              <dbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 1…
## $ hads_a11_inquieto                         <dbl> 1, 3, 0, 0, 1, 2, 1, 0, 0, 0…
## $ hads_d12_animado                          <dbl> 1, 0, 0, 0, 0, 0, 3, 2, 0, 2…
## $ hads_a13_panico                           <dbl> 2, 1, 0, 0, 0, 0, 2, 0, 0, 0…
## $ hads_d14_prazer                           <dbl> 0, 2, 0, 1, 2, 0, 3, 3, 1, 1…
## $ pdcrs_1a_tent_01_luz                      <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_1a_tent_02_seda                     <dbl> 0, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ pdcrs_1a_tent_03_areia                    <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 1…
## $ pdcrs_1a_tent_04_cilio                    <dbl> 0, 1, 0, 0, 0, 1, 1, 0, 1, 0…
## $ pdcrs_1a_tent_05_arroz                    <dbl> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0…
## $ pdcrs_1a_tent_06_gravata                  <dbl> 1, 1, 1, 1, 1, 0, 1, 0, 1, 0…
## $ pdcrs_1a_tent_07_quadro                   <dbl> 1, 0, 0, 0, 0, 0, 0, 1, 0, 1…
## $ pdcrs_1a_tent_08_bicicleta                <dbl> 0, 1, 0, 1, 0, 0, 0, 1, 1, 1…
## $ pdcrs_1a_tent_09_estrela                  <dbl> 1, 0, 0, 0, 0, 0, 0, 1, 1, 0…
## $ pdcrs_1a_tent_10_leao                     <dbl> 0, 0, 1, 0, 1, 1, 0, 1, 1, 1…
## $ pdcrs_1a_tent_11_anel                     <dbl> 1, 1, 1, 0, 0, 0, 0, 1, 1, 0…
## $ pdcrs_1a_tent_12_perfume                  <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0…
## $ pdcrs_2a_tent_01_luz                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_2a_tent_02_seda                     <dbl> 0, 0, 1, 1, 0, 1, 1, 1, 1, 1…
## $ pdcrs_2a_tent_03_areia                    <dbl> 1, 1, 0, 1, 1, 0, 1, 1, 0, 0…
## $ pdcrs_2a_tent_04_cilio                    <dbl> 1, 1, 1, 1, 0, 0, 0, 1, 1, 1…
## $ pdcrs_2a_tent_05_arroz                    <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 0, 1…
## $ pdcrs_2a_tent_06_gravata                  <dbl> 1, 1, 1, 1, 0, 0, 1, 0, 1, 0…
## $ pdcrs_2a_tent_07_quadro                   <dbl> 1, 0, 1, 0, 1, 0, 0, 0, 0, 1…
## $ pdcrs_2a_tent_08_bicicleta                <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_2a_tent_09_estrela                  <dbl> 1, 0, 0, 1, 0, 1, 0, 1, 1, 0…
## $ pdcrs_2a_tent_10_leao                     <dbl> 0, 1, 1, 1, 0, 1, 1, 1, 1, 0…
## $ pdcrs_2a_tent_11_anel                     <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_2a_tent_12_perfume                  <dbl> 1, 1, 1, 0, 1, 0, 0, 0, 1, 1…
## $ pdcrs_3a_tent_01_luz                      <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1…
## $ pdcrs_3a_tent_02_seda                     <dbl> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ pdcrs_3a_tent_03_areia                    <dbl> 1, 1, 1, 1, 1, 1, 0, 1, 0, 1…
## $ pdcrs_3a_tent_04_cilio                    <dbl> 1, 1, 1, 1, 0, 1, 0, 1, 1, 0…
## $ pdcrs_3a_tent_05_arroz                    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 0…
## $ pdcrs_3a_tent_06_gravata                  <dbl> 1, 1, 1, 1, 0, 1, 0, 0, 1, 1…
## $ pdcrs_3a_tent_07_quadro                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_3a_tent_08_bicicleta                <dbl> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ pdcrs_3a_tent_09_estrela                  <dbl> 1, 1, 0, 1, 1, 0, 0, 1, 1, 1…
## $ pdcrs_3a_tent_10_leao                     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1…
## $ pdcrs_3a_tent_11_anel                     <dbl> 1, 1, 1, 1, 1, 0, 1, 0, 1, 0…
## $ pdcrs_3a_tent_12_perfume                  <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1…
## $ pdcrs_s1_evoc_imediata                    <dbl> 12, 13, 12, 13, 10, 9, 9, 11…
## $ pdcrs_nomeacao_01_babador                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_02_vela                    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_03_cereja                  <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1…
## $ pdcrs_nomeacao_04_banqueta                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0…
## $ pdcrs_nomeacao_05_ancora                  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_06_tartaruga               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_07_pipa                    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_08_aquario                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_09_lampada                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_10_violao                  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_11_fivela                  <dbl> 1, 0, 1, 1, 0, 0, 1, 1, 1, 1…
## $ pdcrs_nomeacao_12_crina                   <dbl> 1, 1, 1, 0, 1, 1, 1, 0, 0, 0…
## $ pdcrs_nomeacao_13_anzol                   <dbl> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ pdcrs_nomeacao_14_chave_fenda             <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_15_biombo                  <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1…
## $ pdcrs_nomeacao_16_alfinete                <dbl> 1, 0, 1, 1, 0, 1, 1, 0, 0, 1…
## $ pdcrs_nomeacao_17_chocalho                <dbl> 1, 1, 1, 0, 0, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_18_casco                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_19_extintor                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_nomeacao_20_trinco                  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item01                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item02                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item03                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item04                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item05                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item06                      <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1…
## $ pdcrs_atencao_item07                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item08                      <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 0, 1…
## $ pdcrs_atencao_item09                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_atencao_item10                      <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item01                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item02                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item03                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item04                   <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item05                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item06                   <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0…
## $ pdcrs_mem_operac_item07                   <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 0, 0…
## $ pdcrs_mem_operac_item08                   <dbl> 1, 1, 1, 1, 1, 0, 0, 0, 1, 0…
## $ pdcrs_mem_operac_item09                   <dbl> 1, 0, 0, 0, 0, 0, 1, 1, 1, 1…
## $ pdcrs_mem_operac_item10                   <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0…
## $ pdcrs_relogio_item01_parece_relog         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item02_sem_setores          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item03_simetria             <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0…
## $ pdcrs_relogio_item04_num_1a12             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item05_sequencia_ok         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item06_dois_pont            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0…
## $ pdcrs_relogio_item07_setas                <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item08_pont_horas_menor     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item09_sem_palavra          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_relogio_item10_sem_num25            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item01_parece_relog     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item02_sem_setores      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item03_simetria         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0…
## $ pdcrs_copia_relog_item04_num_1a12         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item05_sequencia_ok     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item06_dois_pont        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item07_setas            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item08_pont_horas_menor <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item09_sem_palavra      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_copia_relog_item10_sem_num25        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdcrs_evoc_tardia_01_luz                  <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 1…
## $ pdcrs_evoc_tardia_02_seda                 <dbl> 1, 1, 1, 1, 0, 0, 1, 1, 1, 1…
## $ pdcrs_evoc_tardia_03_areia                <dbl> 1, 1, 0, 1, 1, 0, 1, 1, 0, 1…
## $ pdcrs_evoc_tardia_04_cilio                <dbl> 0, 1, 1, 1, 1, 1, 0, 1, 1, 0…
## $ pdcrs_evoc_tardia_05_arroz                <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0…
## $ pdcrs_evoc_tardia_06_gravata              <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 1, 1…
## $ pdcrs_evoc_tardia_07_quadro               <dbl> 1, 1, 1, 0, 0, 1, 1, 1, 1, 1…
## $ pdcrs_evoc_tardia_08_bicicleta            <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 1…
## $ pdcrs_evoc_tardia_09_estrela              <dbl> 1, 0, 1, 1, 0, 0, 1, 0, 0, 1…
## $ pdcrs_evoc_tardia_10_leao                 <dbl> 0, 0, 1, 0, 1, 1, 0, 1, 1, 0…
## $ pdcrs_evoc_tardia_11_anel                 <dbl> 1, 1, 0, 1, 1, 0, 1, 1, 0, 0…
## $ pdcrs_evoc_tardia_12_perfume              <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 0, 0…
## $ pdcrs_s8_fluencia_alternada               <dbl> 13, 13, 13, 11, 13, 7, 13, 1…
## $ pdcrs_s9_fluencia_acoes                   <dbl> 16, 17, 24, 13, 13, 9, 15, 1…
## $ pdcfrs01lidardinheiro                     <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs02adm_contas                        <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs03plan_ferias                       <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs04recibos_faturas                   <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs05medicamentos                      <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs06ativ_cotidianas                   <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs07eletrodomesticos                  <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs08transp_publico                    <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs09resolv_imprevistos                <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs10explicaroquediz                   <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs11entenderoquele                    <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcfrs12tel_celular                       <dbl> NA, NA, NA, NA, NA, NA, NA, …
## $ pdcrs_memoria1                            <dbl> 6, 5, 5, 5, 5, 5, 4, 7, 9, 6…
## $ pdcrs_memoria2                            <dbl> 8, 9, 9, 10, 6, 6, 8, 9, 9, …
## $ pdcrs_memoria3                            <dbl> 11, 12, 11, 12, 9, 8, 6, 10,…
## $ pdcrs_s1_memoria_imediata                 <dbl> 11, 12, 11, 12, 9, 8, 8, 10,…
## $ pdcrs_s2_nomeacao                         <dbl> 20, 17, 19, 16, 16, 18, 18, …
## $ pdcrs_s3_atencao_sust                     <dbl> 10, 9, 10, 10, 10, 10, 10, 8…
## $ pdcrs_s4_memoria_operac                   <dbl> 9, 9, 8, 7, 8, 6, 8, 8, 8, 6…
## $ pdcrs_s5_desenho_relogio                  <dbl> 10, 10, 9, 10, 10, 10, 10, 9…
## $ pdcrs_s6_copia_relogio                    <dbl> 10, 10, 10, 10, 10, 10, 10, …
## $ pdcrs_s7_memoria_tardia                   <dbl> 7, 8, 9, 8, 4, 7, 6, 7, 7, 7…
## $ pdcrs_s8_fluencia_alternada_2             <dbl> 13, 13, 13, 11, 13, 7, 13, 1…
## $ pdcrs_s9_fluencia_acoes_2                 <dbl> 16, 17, 24, 13, 13, 9, 15, 1…
## $ pdcrs_frontal_subcortical_score           <dbl> 76, 78, 84, 71, 67, 57, 70, …
## $ pdcrs_posterior_cortical_score            <dbl> 30, 27, 29, 26, 26, 28, 28, …
## $ pdcrs_total_score                         <dbl> 106, 105, 113, 97, 93, 85, 9…

# Transforma a idade em variavel numerica
dados$idade <- as.numeric(dados$idade)

## Warning: NAs introduzidos por coerção

# Plota a distribuicao da idade
hist(dados$idade, breaks = 50)

Bancos individuais

Agora os bancos individuais foram criados para que se possa fazer o metodo de imputacao de forma apropriada. Em todos os bancos, a variavel ID foi selecionada para permitir uma futura comunicacao entre os diferentes bancos (correlacoes entre testes, analises MIMIC, etc).

# Dados demograficos
id <- dados %>% select(id,estado,idade:regiao_brasil,escolaridade,faixa_escolaridade)

# FDT - Five Digits Test
fpt <- dados %>% select(id, idade, escolaridade, genero, starts_with("fpt"))

# Fluencia verbal
fluencia <- dados %>% select(id, idade, escolaridade, genero, starts_with("fluencia"))

# CCC - Trigramas de Consoantes
ccc <- dados %>% select(id, idade, escolaridade, genero, starts_with("ccc"))

# TMT - Trilhas
tmt <- dados %>% select(id, idade, escolaridade, genero, starts_with("tmt"))

# HADS - Escala Hospitalar de Ansiedade e Depressao
hads <- dados %>% select(id, idade, escolaridade, genero, starts_with("hads"))

# PD-CFRS - Escala Funcional
cfrs <- dados %>% select(id, idade, escolaridade, genero, starts_with("pdcfrs"))

# PD-CRS - Escala Cognitiva
pdcrs <- dados %>% select(id, idade, escolaridade, genero, pdcrs_s1_memoria_imediata:pdcrs_posterior_cortical_score)

Teste TMT (Trilhas)

Tratamento dos dados

O problema do TMT é a cauda longa na direita. Essa cauda pode ocorrer por erro de aplicacao, descuido do participante, ou devido a lentidao psicomotora do respondente. Muitas vezes, um unico numero (ou letra) perdido pode levar um longo tempo. A literatura sobre o TMT é enorme, por isso irei focar naqueles artigos que usaram metodos de regressao na elaboracao das normas.

Neste artigo português, os autores apresentam também os escores derivados Trail Making Test: Regression-based Norms for the Portuguese Population de Cavaco et al. (2013) https://academic.oup.com/acn/article/28/2/189/5653. O primeiro problema agora é que não temos os valores corretos (em segundos) para dividir B/A ou subtrair B-A. Talvez, a nossa melhor estrategia nao seria impedir os alunos de inserir tempos maiores que 180 segundos em TMT-A e 300 segundos em TMT-B, já que temos a opcao de fazermos isso no R. Por exemplo, B-A: 340-225=115, o mesmo que 150-35=115. Pro primeiro individuo B/A seria 340/225=1.51, considerado valido em termos de controle, e pro segundo 4.29 o que ultrapassa o criterio de 2.5, sendo o dado invalido.

De qualquer modo, temos alguns casos ausentes (missing values). Iremos abordar duas estrategias para lidar com estes casos: a) eliminar os sujeitos que estao ausentes em TMT-A ou TMT-B, e depois imputar os que forem retirados com valores em A=180 e B=300. Isto é necessario para que a distribuicao do teste nao apresente um pico na extremidade da direita (num histograma, por exemplo). b) imputar os dados ausentes antes de eliminar os outliers e imputar novamente apos a primeira imputacao.

# Descricao dos dados TMT
sum(tmt$tmt_a < 6, na.rm = TRUE) # detecta casos muito rapidos

## [1] 0

sum(tmt$tmt_a > 179, na.rm = TRUE) # detecta casos muito lentos

## [1] 23

sum(tmt$tmt_b < 8, na.rm = TRUE)

## [1] 0

sum(tmt$tmt_b > 299, na.rm = TRUE)

## [1] 38

knitr::kable(psych::describe(tmt, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	558	275.3	159.2	1	551	550	6.7	54.7	137.2	275.5	412.8	495.3
idade	2	555	44.9	25.4	6	89	83	1.1	15.0	18.0	44.0	70.0	80.0
escolaridade	3	556	10.0	3.9	1	26	25	0.2	4.5	8.0	10.0	12.0	15.0
genero	4	558	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	535	60.2	45.9	15	381	366	2.0	26.0	32.0	45.0	68.5	111.6
tmt_b	6	527	133.5	104.1	13	1404	1391	4.5	55.0	68.5	105.0	160.0	266.0

# Temos duas estrategias: a) eliminar os sujeitos com dados ausentes, b) imputar antes de eliminar os outliers

# a) elimina os sujeitos com dados ausentes
tmt.na <- na.omit(tmt)
knitr::kable(psych::describe(tmt.na, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	522	257.4	148.9	1	515	514	6.5	51.1	128.2	257.5	385.8	462.9
idade	2	522	43.6	25.1	6	89	83	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	3	522	10.1	3.8	2	26	24	0.2	5.0	8.0	10.0	12.0	15.0
genero	4	522	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	522	59.6	45.5	15	381	366	2.0	26.0	32.0	44.5	67.8	109.8
tmt_b	6	522	133.2	104.2	13	1404	1391	4.6	55.0	68.2	105.0	160.0	265.5

# Agora elimina os sujeitos com TMT-A de 180 seg e TMT-B de 300 seg
sum(tmt.na$tmt_a > 179, na.rm = TRUE) # detecta cutoff

## [1] 21

sum(tmt.na$tmt_b > 299, na.rm = TRUE) # detecta cutoff

## [1] 37

tmt.na$tmt_a[tmt.na$tmt_a > 179] <- NA # elimina os valores muito lentos em TMT-A
tmt.na$tmt_b[tmt.na$tmt_b > 299] <- NA # elimina os valores muito lentos em TMT-B

knitr::kable(psych::describe(tmt.na, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	522	257.4	148.9	1	515	514	6.5	51.1	128.2	257.5	385.8	462.9
idade	2	522	43.6	25.1	6	89	83	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	3	522	10.1	3.8	2	26	24	0.2	5.0	8.0	10.0	12.0	15.0
genero	4	522	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	501	52.9	30.0	15	169	154	1.3	26.0	32.0	44.0	65.0	94.0
tmt_b	6	485	114.3	59.9	13	298	285	2.7	54.0	66.0	100.0	147.0	199.0

# Imputacao dos outliers em tmt.na
library(mice)

## 
## Attaching package: 'mice'

## The following object is masked from 'package:stats':
## 
##     filter

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

imp.tmt.na <- mice(tmt.na, method = "cart", seed = 123) # Classification and Regression Trees

## 
##  iter imp variable
##   1   1  tmt_a  tmt_b
##   1   2  tmt_a  tmt_b
##   1   3  tmt_a  tmt_b
##   1   4  tmt_a  tmt_b
##   1   5  tmt_a  tmt_b
##   2   1  tmt_a  tmt_b
##   2   2  tmt_a  tmt_b
##   2   3  tmt_a  tmt_b
##   2   4  tmt_a  tmt_b
##   2   5  tmt_a  tmt_b
##   3   1  tmt_a  tmt_b
##   3   2  tmt_a  tmt_b
##   3   3  tmt_a  tmt_b
##   3   4  tmt_a  tmt_b
##   3   5  tmt_a  tmt_b
##   4   1  tmt_a  tmt_b
##   4   2  tmt_a  tmt_b
##   4   3  tmt_a  tmt_b
##   4   4  tmt_a  tmt_b
##   4   5  tmt_a  tmt_b
##   5   1  tmt_a  tmt_b
##   5   2  tmt_a  tmt_b
##   5   3  tmt_a  tmt_b
##   5   4  tmt_a  tmt_b
##   5   5  tmt_a  tmt_b

## Warning: Number of logged events: 1

compl.tmt.na <- complete(imp.tmt.na)
sum(is.na(compl.tmt.na)) # checa se ha missing

## [1] 0

knitr::kable(psych::describe(tmt.na, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	522	257.4	148.9	1	515	514	6.5	51.1	128.2	257.5	385.8	462.9
idade	2	522	43.6	25.1	6	89	83	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	3	522	10.1	3.8	2	26	24	0.2	5.0	8.0	10.0	12.0	15.0
genero	4	522	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	501	52.9	30.0	15	169	154	1.3	26.0	32.0	44.0	65.0	94.0
tmt_b	6	485	114.3	59.9	13	298	285	2.7	54.0	66.0	100.0	147.0	199.0

knitr::kable(psych::describe(compl.tmt.na, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	522	257.4	148.9	1	515	514	6.5	51.1	128.2	257.5	385.8	462.9
idade	2	522	43.6	25.1	6	89	83	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	3	522	10.1	3.8	2	26	24	0.2	5.0	8.0	10.0	12.0	15.0
genero	4	522	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	522	54.9	32.1	15	169	154	1.4	26.0	32.0	44.0	66.0	101.0
tmt_b	6	522	120.1	63.5	13	298	285	2.8	55.0	68.2	105.0	153.0	221.9

# b) imputa antes de eliminar os outliers em tmt
imp.tmt <- mice(tmt, method = "cart", seed = 123) # Classification and Regression Trees

## 
##  iter imp variable
##   1   1  idade  escolaridade  tmt_a  tmt_b
##   1   2  idade  escolaridade  tmt_a  tmt_b
##   1   3  idade  escolaridade  tmt_a  tmt_b
##   1   4  idade  escolaridade  tmt_a  tmt_b
##   1   5  idade  escolaridade  tmt_a  tmt_b
##   2   1  idade  escolaridade  tmt_a  tmt_b
##   2   2  idade  escolaridade  tmt_a  tmt_b
##   2   3  idade  escolaridade  tmt_a  tmt_b
##   2   4  idade  escolaridade  tmt_a  tmt_b
##   2   5  idade  escolaridade  tmt_a  tmt_b
##   3   1  idade  escolaridade  tmt_a  tmt_b
##   3   2  idade  escolaridade  tmt_a  tmt_b
##   3   3  idade  escolaridade  tmt_a  tmt_b
##   3   4  idade  escolaridade  tmt_a  tmt_b
##   3   5  idade  escolaridade  tmt_a  tmt_b
##   4   1  idade  escolaridade  tmt_a  tmt_b
##   4   2  idade  escolaridade  tmt_a  tmt_b
##   4   3  idade  escolaridade  tmt_a  tmt_b
##   4   4  idade  escolaridade  tmt_a  tmt_b
##   4   5  idade  escolaridade  tmt_a  tmt_b
##   5   1  idade  escolaridade  tmt_a  tmt_b
##   5   2  idade  escolaridade  tmt_a  tmt_b
##   5   3  idade  escolaridade  tmt_a  tmt_b
##   5   4  idade  escolaridade  tmt_a  tmt_b
##   5   5  idade  escolaridade  tmt_a  tmt_b

## Warning: Number of logged events: 1

compl.tmt <- complete(imp.tmt)
sum(is.na(compl.tmt)) # checa se ha missing

## [1] 0

# Agora elimina os sujeitos com TMT-A de 180 seg e TMT-B de 300 seg
sum(tmt$tmt_a > 179, na.rm = TRUE) # detecta cutoff

## [1] 23

sum(tmt$tmt_b > 299, na.rm = TRUE) # detecta cutoff

## [1] 38

tmt$tmt_a[tmt$tmt_a > 179] <- NA # elimina os valores muito lentos em TMT-A
tmt$tmt_b[tmt$tmt_b > 299] <- NA # elimina os valores muito lentos em TMT-B

# imputa novamente em b)
imp.tmt2 <- mice(tmt, method = "cart", seed = 123) # Classification and Regression Trees

## 
##  iter imp variable
##   1   1  idade  escolaridade  tmt_a  tmt_b
##   1   2  idade  escolaridade  tmt_a  tmt_b
##   1   3  idade  escolaridade  tmt_a  tmt_b
##   1   4  idade  escolaridade  tmt_a  tmt_b
##   1   5  idade  escolaridade  tmt_a  tmt_b
##   2   1  idade  escolaridade  tmt_a  tmt_b
##   2   2  idade  escolaridade  tmt_a  tmt_b
##   2   3  idade  escolaridade  tmt_a  tmt_b
##   2   4  idade  escolaridade  tmt_a  tmt_b
##   2   5  idade  escolaridade  tmt_a  tmt_b
##   3   1  idade  escolaridade  tmt_a  tmt_b
##   3   2  idade  escolaridade  tmt_a  tmt_b
##   3   3  idade  escolaridade  tmt_a  tmt_b
##   3   4  idade  escolaridade  tmt_a  tmt_b
##   3   5  idade  escolaridade  tmt_a  tmt_b
##   4   1  idade  escolaridade  tmt_a  tmt_b
##   4   2  idade  escolaridade  tmt_a  tmt_b
##   4   3  idade  escolaridade  tmt_a  tmt_b
##   4   4  idade  escolaridade  tmt_a  tmt_b
##   4   5  idade  escolaridade  tmt_a  tmt_b
##   5   1  idade  escolaridade  tmt_a  tmt_b
##   5   2  idade  escolaridade  tmt_a  tmt_b
##   5   3  idade  escolaridade  tmt_a  tmt_b
##   5   4  idade  escolaridade  tmt_a  tmt_b
##   5   5  idade  escolaridade  tmt_a  tmt_b

## Warning: Number of logged events: 1

compl.tmt2 <- complete(imp.tmt2)
sum(is.na(compl.tmt2)) # checa se ha missing

## [1] 0

# Banco original
knitr::kable(psych::describe(tmt, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	558	275.3	159.2	1	551	550	6.7	54.7	137.2	275.5	412.8	495.3
idade	2	555	44.9	25.4	6	89	83	1.1	15.0	18.0	44.0	70.0	80.0
escolaridade	3	556	10.0	3.9	1	26	25	0.2	4.5	8.0	10.0	12.0	15.0
genero	4	558	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	512	53.3	30.4	15	169	154	1.3	26.0	32.0	44.0	65.0	95.8
tmt_b	6	489	114.4	60.1	13	298	285	2.7	54.0	66.0	100.0	147.0	199.0

# Banco imputado a)
knitr::kable(psych::describe(compl.tmt.na, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	522	257.4	148.9	1	515	514	6.5	51.1	128.2	257.5	385.8	462.9
idade	2	522	43.6	25.1	6	89	83	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	3	522	10.1	3.8	2	26	24	0.2	5.0	8.0	10.0	12.0	15.0
genero	4	522	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	522	54.9	32.1	15	169	154	1.4	26.0	32.0	44.0	66.0	101.0
tmt_b	6	522	120.1	63.5	13	298	285	2.8	55.0	68.2	105.0	153.0	221.9

# Banco imputado b)
knitr::kable(psych::describe(compl.tmt2, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	558	275.3	159.2	1	551	550	6.7	54.7	137.2	275.5	412.8	495.3
idade	2	558	44.9	25.4	6	89	83	1.1	15.0	18.0	44.0	70.0	80.0
escolaridade	3	558	10.1	3.9	1	26	25	0.2	4.7	8.0	10.0	12.0	16.0
genero	4	558	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	5	558	57.2	33.6	15	169	154	1.4	26.0	34.0	46.0	70.0	106.0
tmt_b	6	558	121.3	63.4	13	298	285	2.7	55.0	70.2	107.0	151.8	223.5

Analises graficas (TMT-A)

# Seleciona qual banco de dados usar
df = compl.tmt.na

# devtools::install_github("dustinfife/flexplot")
library(flexplot)

## 
## Attaching package: 'flexplot'

## The following object is masked from 'package:ggplot2':
## 
##     flip_data

# Relacoes univariadas (TMT-A)
flexplot(tmt_a ~ idade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito quadratico para a idade
flexplot(tmt_a ~ idade, data=df, method = 'quadratic', se=T)

flexplot(tmt_a ~ escolaridade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito cubico para a escolaridade
flexplot(tmt_a ~ escolaridade, data=df, method = 'cubic', se=T)

# Relacoes multivariadas
flexplot(tmt_a ~ idade + escolaridade | genero, data=df, method = 'lm')

## `geom_smooth()` using formula 'y ~ x'

# Com IC-95%
flexplot(tmt_a ~ idade + escolaridade | genero, data=df, method = 'lm', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Modelo com interacao
full.mod <- lm(tmt_a ~ idade * escolaridade, data=df)
full.mod

## 
## Call:
## lm(formula = tmt_a ~ idade * escolaridade, data = df)
## 
## Coefficients:
##        (Intercept)               idade        escolaridade  idade:escolaridade  
##           32.71287             0.96700            -1.13881            -0.01863

#estimates(full.mod)

# Modelo sem interacao
reduced.mod = lm(tmt_a ~ idade + escolaridade, data=df)
reduced.mod

## 
## Call:
## lm(formula = tmt_a ~ idade + escolaridade, data = df)
## 
## Coefficients:
##  (Intercept)         idade  escolaridade  
##      44.0497        0.7753       -2.2669

#estimates(reduced.mod)

visualize(full.mod)

model.comparison(full.mod, reduced.mod)

## $statistics
##                  aic      bic bayes.factor p.value r.squared
## full.mod    4840.844 4862.132        0.102   0.196     0.403
## reduced.mod 4840.528 4857.558        9.844             0.401
## 
## $pred.difference
##    0%   25%   50%   75%  100% 
## 0.003 0.314 0.835 1.628 4.910

Analises graficas (TMT-B)

# Relacoes univariadas (TMT-A)
flexplot(tmt_b ~ idade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito linear para a idade
flexplot(tmt_b ~ idade, data=df, method = 'lm', se=T)

## `geom_smooth()` using formula 'y ~ x'

flexplot(tmt_b ~ escolaridade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito cubico para a escolaridade
flexplot(tmt_b ~ escolaridade, data=df, method = 'cubic', se=T)

# Relacoes multivariadas
flexplot(tmt_b ~ idade + escolaridade | genero, data=df, method = 'lm')

## `geom_smooth()` using formula 'y ~ x'

# Com IC-95%
flexplot(tmt_b ~ idade + escolaridade | genero, data=df, method = 'lm', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Modelo com interacao
full.mod = lm(tmt_b ~ idade * escolaridade, data=df)
full.mod

## 
## Call:
## lm(formula = tmt_b ~ idade * escolaridade, data = df)
## 
## Coefficients:
##        (Intercept)               idade        escolaridade  idade:escolaridade  
##          104.51692             1.48673            -5.38926             0.01161

#estimates(full.mod)

# Modelo sem interacao
reduced.mod = lm(tmt_b ~ idade + escolaridade, data=df)
reduced.mod

## 
## Call:
## lm(formula = tmt_b ~ idade + escolaridade, data = df)
## 
## Coefficients:
##  (Intercept)         idade  escolaridade  
##       97.451         1.606        -4.686

#estimates(reduced.mod)

visualize(full.mod)

model.comparison(full.mod, reduced.mod)

## $statistics
##                  aic      bic bayes.factor p.value r.squared
## full.mod    5523.894 5545.182        0.048   0.675     0.438
## reduced.mod 5522.071 5539.102       20.912             0.438
## 
## $pred.difference
##    0%   25%   50%   75%  100% 
## 0.002 0.196 0.520 1.015 3.060

Escores derivados

Os escores derivados foram calculados a partir do artigo https://academic.oup.com/acn/article/28/2/189/5653, considerando a subtração B-A do teste mais complexo (TMT-B) pelo mais facil (TMT-A), a divisao B/A, a subtracao do mais dificil pelo mais facil divido pelo mais facil (B-A)/A, a soma dos tempos divido por 100 (A+B)/100 e a simples soma dos tempos A+B.

# B-A
df$`B-A` <- df$tmt_b - df$tmt_a
# B/A
df$`B/A` <- df$tmt_b / df$tmt_a
# (B-A)/A
df$`(B-A)/A` <- (df$tmt_b - df$tmt_a) / df$tmt_a
# A+B
df$`A+B` <- df$tmt_a + df$tmt_b
# (AxB)/100
df$`(A+B)/100` <- (df$tmt_a * df$tmt_b) / 100

# Descritivas dos escores derivados
knitr::kable(psych::describe(df[-1], skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
idade	1	522	43.6	25.1	6.0	89.0	83.0	1.1	15.0	18.0	41.0	67.0	79.9
escolaridade	2	522	10.1	3.8	2.0	26.0	24.0	0.2	5.0	8.0	10.0	12.0	15.0
genero	3	522	1.5	0.5	1.0	2.0	1.0	0.0	1.0	1.0	2.0	2.0	2.0
tmt_a	4	522	54.9	32.1	15.0	169.0	154.0	1.4	26.0	32.0	44.0	66.0	101.0
tmt_b	5	522	120.1	63.5	13.0	298.0	285.0	2.8	55.0	68.2	105.0	153.0	221.9
B-A	6	522	65.2	46.1	-28.0	225.0	253.0	2.0	20.0	31.2	53.0	90.0	128.0
B/A	7	522	2.4	0.9	0.5	5.8	5.3	0.0	1.4	1.7	2.2	2.7	3.8
(B-A)/A	8	522	1.4	0.9	-0.5	4.8	5.3	0.0	0.4	0.7	1.2	1.7	2.8
A+B	9	522	175.0	89.5	38.0	453.0	415.0	3.9	84.0	106.0	150.5	219.2	314.6
(A+B)/100	10	522	80.6	88.6	3.2	471.5	468.3	3.9	14.9	24.9	45.9	98.6	199.6

# Histogramas
DataExplorer::plot_histogram(df[4:11], ggtheme = theme_bw())

# Histogramas e densidades
par(mfrow = c(3, 3), mar = c(4, 2, 2, 2))
for (var in c("tmt_a", "tmt_b", "B-A", "B/A", "(B-A)/A", "A+B", "(A+B)/100")) {
  x = na.omit(df[-1])[, var]
  hist(x, prob = TRUE, xlab = var, ylab = "", main = "", col = "skyblue")
  lines(density(x), lwd = 2, col = "#FF80AA")
  curve(dnorm(x, mean = mean(x), sd = sd(x)), from = min(x), to = max(x),
        add = TRUE, lwd = 2, col = "steelblue")
}

par(mfrow = c(1, 1))

# Correlacoes robustas (matriz de correlacao Winzorizada)
corr <- WRS2::pball(df[2:11])
DataExplorer::plot_correlation(corr$pbcorm)

Alguns escores derivados são praticamente identicos, como o A+B e (A+B)/100, B/A e (B-A/A), e outros possuem correlacoes superiores a 0,90. Considerando os manuais tradicionais como o Compendium de Strauss, os escores derivados B-A e B/A sao mais informativos sobre a diferenca individual entre as tarefas.

Regressoes TMT-A

Normas baseadas em regressao, obvio! A questão é selecionar qual tipo de regressao (linear, quadratica, robusta, Bayesiana etc). Uma boa solucao é usar a distribuicao-t, ja que o pressuposto de normalidade dos residuos da regressao foi violado.

# Function for Root Mean Squared Error
RMSE <- function(error) { sqrt(mean(error^2)) }
#RMSE(fit$residuals)

# Function for Mean Absolute Error
MAE <- function(error) { mean(abs(error)) }
#mae(fit$residuals)

# https://easystats.github.io/performance/
library(performance)

## 
## Attaching package: 'performance'

## The following object is masked from 'package:flexplot':
## 
##     icc

library(heavy)

# Estima o vetor central e a matriz de dispersão assumindo que os dados vieram de uma distribuição multivariada de cauda pesada. Isso fornece algum grau de robustez para outliers sem fornecer um ponto de ruptura alto.
fit <- heavyFit(~ tmt_a + idade + escolaridade, data = df, family = Student(df = 4))
summary(fit)

## Estimation under multivariate heavy-tailed distributions
##  Data: df; Family: Student(df = 6.13372) 
## 
## Center:
##               Estimate Std.Error Z value p-value
## tmt_a        48.3682  27.5228    1.7574  0.0789
## idade        38.6864  25.7832    1.5005  0.1335
## escolaridade 10.0355   3.5626    2.8169  0.0048
## 
## Scatter matrix estimate:
##              tmt_a      idade      escolaridade
## tmt_a        621.429036                        
## idade        354.867533 545.356032             
## escolaridade -15.108398   8.133473  10.412028  
## 
## Number of Observations: 522
## Log-likelihood: -4707.324 on 9 degrees of freedom

# Modelo linear tradicional
fit.tmta <- lm(tmt_a ~ idade + escolaridade, data = df)
summary(fit.tmta)

## 
## Call:
## lm(formula = tmt_a ~ idade + escolaridade, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.526 -15.097  -3.785   8.480 117.991 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   44.0498     3.4430  12.794  < 2e-16 ***
## idade          0.7753     0.0437  17.743  < 2e-16 ***
## escolaridade  -2.2669     0.2863  -7.918 1.47e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.85 on 519 degrees of freedom
## Multiple R-squared:  0.4013, Adjusted R-squared:  0.399 
## F-statistic: 173.9 on 2 and 519 DF,  p-value: < 2.2e-16

# Modelo linear com distribuicao-t
fit.tmta1 <- heavyLm(tmt_a ~ idade + escolaridade, data = df, family = Student(df = 4))
summary(fit.tmta1)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 2.22383) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -36.9235  -9.5058   0.5408  13.4075 119.0527 
## 
## Coefficients:
##               Estimate Std.Error Z value p-value
## (Intercept)  42.4967   2.4296   17.4915  0.0000
## idade         0.5996   0.0308   19.4466  0.0000
## escolaridade -1.9543   0.2020   -9.6737  0.0000
## 
## Degrees of freedom: 522 total; 519 residual
## Scale estimate: 189.7424
## Log-likelihood: -2362.758 on 4 degrees of freedom

# Checa a distribuicao do modelo
check_distribution(fit.tmta1)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.28125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.06250	0.03125
F	0.00000	0.00000
gamma	0.00000	0.06250
lognormal	0.00000	0.09375
neg. binomial (zero-infl.)	0.00000	0.53125
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.40625	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.tmta1))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (idade ao quadrado)
fit.tmta2 <- heavyLm(tmt_a ~ idade + I(idade^2) + escolaridade, data = df, family = Student(df = 4))
summary(fit.tmta2)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 2.42305) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -51.1189  -9.5402   0.1469  12.7512 116.0328 
## 
## Coefficients:
##               Estimate Std.Error Z value  p-value 
## (Intercept)   61.3525   3.8386   15.9832   0.0000
## idade         -0.5281   0.1677   -3.1480   0.0016
## I(idade^2)     0.0133   0.0018    7.4628   0.0000
## escolaridade  -2.1453   0.2033  -10.5548   0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 192.4479
## Log-likelihood: -2344.778 on 5 degrees of freedom

check_distribution(fit.tmta2)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.28125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.06250	0.03125
F	0.00000	0.00000
gamma	0.00000	0.06250
lognormal	0.00000	0.09375
neg. binomial (zero-infl.)	0.00000	0.53125
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.40625	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.tmta2))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmta3 <- heavyLm(tmt_a ~ idade + I(idade^2) + I(sqrt(escolaridade)), data = df, family = Student(df = 4))
summary(fit.tmta3)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 2.40837) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.093  -9.373   0.254  12.692 116.215 
## 
## Coefficients:
##                        Estimate Std.Error Z value  p-value 
## (Intercept)            82.3300   5.1898   15.8639   0.0000
## idade                  -0.5286   0.1675   -3.1553   0.0016
## I(idade^2)              0.0129   0.0018    7.2897   0.0000
## I(sqrt(escolaridade)) -13.4483   1.2345  -10.8935   0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 190.7047
## Log-likelihood: -2343.874 on 5 degrees of freedom

check_distribution(fit.tmta3)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.28125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.06250	0.03125
F	0.00000	0.00000
gamma	0.00000	0.06250
lognormal	0.00000	0.09375
neg. binomial (zero-infl.)	0.00000	0.53125
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.40625	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.tmta3))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmta4 <- heavyLm(tmt_a ~ idade + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.tmta4)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 2.32323) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -40.3894  -9.7171   0.4895  13.9554 118.6414 
## 
## Coefficients:
##                    Estimate Std.Error Z value p-value
## (Intercept)       26.2944   1.6244   16.1874  0.0000
## idade              0.6505   0.0329   19.7544  0.0000
## I(escolaridade^3) -0.0037   0.0005   -7.9339  0.0000
## 
## Degrees of freedom: 522 total; 519 residual
## Scale estimate: 205.8183
## Log-likelihood: -2372.694 on 4 degrees of freedom

check_distribution(fit.tmta4)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.28125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.03125	0.03125
F	0.00000	0.00000
gamma	0.03125	0.06250
lognormal	0.00000	0.09375
neg. binomial (zero-infl.)	0.00000	0.53125
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.40625	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.tmta4))

## Warning: Removed 5 rows containing missing values (geom_segment).

## Warning: Removed 5 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmta5 <- heavyLm(tmt_a ~ idade + I(idade^2) + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.tmta5)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 2.47132) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -55.1241  -9.8731  -0.1095  12.6497 115.7711 
## 
## Coefficients:
##                    Estimate Std.Error Z value p-value
## (Intercept)       43.1601   3.2262   13.3781  0.0000
## idade             -0.4422   0.1716   -2.5767  0.0100
## I(idade^2)         0.0130   0.0018    7.1353  0.0000
## I(escolaridade^3) -0.0042   0.0005   -9.1246  0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 205.1573
## Log-likelihood: -2356.754 on 5 degrees of freedom

check_distribution(fit.tmta5)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.28125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.06250	0.03125
F	0.00000	0.00000
gamma	0.00000	0.06250
lognormal	0.00000	0.09375
neg. binomial (zero-infl.)	0.00000	0.53125
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.40625	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.tmta5))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Encontra o modelo com menor residuo (Root Mean Squared Error)
b1 <- RMSE(fit.tmta$residuals)
b2 <- RMSE(fit.tmta1$residuals)
b3 <- RMSE(fit.tmta2$residuals) 
b4 <- RMSE(fit.tmta3$residuals)
b5 <- RMSE(fit.tmta4$residuals)
b6 <- RMSE(fit.tmta5$residuals)

# Encontra o modelo com menor residuo (Mean Absolute Error)
c1 <- MAE(fit.tmta$residuals)
c2 <- MAE(fit.tmta1$residuals)
c3 <- MAE(fit.tmta2$residuals)
c4 <- MAE(fit.tmta3$residuals)
c5 <- MAE(fit.tmta4$residuals)
c6 <- MAE(fit.tmta5$residuals)

knitr::kable(data.frame(Model = c("tmta.1","tmta.2","tmta.3","tmta.4","tmta.5","tmta.6"),
           RMSE = c(b1,b2,b3,b4,b5,b6),
           MAE = c(c1,c2,c3,c4,c5,c6)), digits = 2)

Model	RMSE	MAE
tmta.1	24.78	17.63
tmta.2	25.89	17.13
tmta.3	24.34	16.45
tmta.4	24.38	16.43
tmta.5	26.05	17.38
tmta.6	24.72	16.86

Regressoes TMT-B

# Estima o vetor central e a matriz de dispersão assumindo que os dados vieram de uma distribuição multivariada de cauda pesada. Isso fornece algum grau de robustez para outliers sem fornecer um ponto de ruptura alto.
fit <- heavyFit(~ tmt_b + idade + escolaridade, data = df, family = Student(df = 4))
summary(fit)

## Estimation under multivariate heavy-tailed distributions
##  Data: df; Family: Student(df = 36.2447) 
## 
## Center:
##               Estimate Std.Error Z value  p-value 
## tmt_b        117.7256  63.3026    1.8597   0.0629
## idade         42.7889  25.4806    1.6793   0.0931
## escolaridade  10.1015   3.7849    2.6689   0.0076
## 
## Scatter matrix estimate:
##              tmt_b      idade      escolaridade
## tmt_b        3812.90632                        
## idade         931.47365  617.77939             
## escolaridade  -47.45824   11.07252   13.63082  
## 
## Number of Observations: 522
## Log-likelihood: -4144.964 on 9 degrees of freedom

# Modelo linear tradicional
fit.tmtb <- lm(tmt_b ~ idade + escolaridade, data = df)
summary(fit.tmtb)

## 
## Call:
## lm(formula = tmt_b ~ idade + escolaridade, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -117.058  -31.119   -9.214   28.551  212.685 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  97.45051    6.61386  14.734   <2e-16 ***
## idade         1.60624    0.08394  19.136   <2e-16 ***
## escolaridade -4.68608    0.54996  -8.521   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.73 on 519 degrees of freedom
## Multiple R-squared:  0.438,  Adjusted R-squared:  0.4358 
## F-statistic: 202.2 on 2 and 519 DF,  p-value: < 2.2e-16

# Modelo linear com distribuicao-t
fit.tmtb1 <- heavyLm(tmt_b ~ idade + escolaridade, data = df, family = Student(df = 4))
summary(fit.tmtb1)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 9.75453) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -114.063  -28.849   -6.894   30.754  215.886 
## 
## Coefficients:
##               Estimate Std.Error Z value p-value
## (Intercept)  97.3058   6.4152   15.1680  0.0000
## idade         1.5923   0.0814   19.5582  0.0000
## escolaridade -4.8667   0.5334   -9.1231  0.0000
## 
## Degrees of freedom: 522 total; 519 residual
## Scale estimate: 1807.479
## Log-likelihood: -2752.89 on 4 degrees of freedom

check_distribution(fit.tmtb1)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.21875
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.03125
F	0.06250	0.00000
gamma	0.03125	0.03125
lognormal	0.03125	0.12500
neg. binomial (zero-infl.)	0.06250	0.59375
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.25000	0.00000
uniform	0.00000	0.00000
weibull	0.03125	0.00000

plot(check_distribution(fit.tmtb1))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (idade ao quadrado)
fit.tmtb2 <- heavyLm(tmt_b ~ idade + I(idade^2) + escolaridade, data = df, family = Student(df = 4))
summary(fit.tmtb2)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 6.39897) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -119.48  -25.83   -6.63   31.40  225.42 
## 
## Coefficients:
##               Estimate Std.Error Z value  p-value 
## (Intercept)  144.1923   9.5837   15.0456   0.0000
## idade         -1.1474   0.4188   -2.7397   0.0061
## I(idade^2)     0.0299   0.0044    6.7383   0.0000
## escolaridade  -5.3098   0.5075  -10.4634   0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 1496.113
## Log-likelihood: -2733.166 on 5 degrees of freedom

check_distribution(fit.tmtb2)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.21875
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.03125
F	0.06250	0.00000
gamma	0.03125	0.03125
lognormal	0.03125	0.12500
neg. binomial (zero-infl.)	0.06250	0.59375
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.25000	0.00000
uniform	0.00000	0.00000
weibull	0.03125	0.00000

plot(check_distribution(fit.tmtb2))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmtb3 <- heavyLm(tmt_b ~ idade + I(idade^2) + I(sqrt(escolaridade)), data = df, family = Student(df = 4))
summary(fit.tmtb3)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 6.3069) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -115.462  -25.612   -6.896   29.829  223.346 
## 
## Coefficients:
##                        Estimate Std.Error Z value  p-value 
## (Intercept)           195.8998  12.9399   15.1392   0.0000
## idade                  -1.2359   0.4177   -2.9591   0.0031
## I(idade^2)              0.0303   0.0044    6.8354   0.0000
## I(sqrt(escolaridade)) -32.8568   3.0781  -10.6743   0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 1476.977
## Log-likelihood: -2731.075 on 5 degrees of freedom

check_distribution(fit.tmtb3)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.21875
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.03125
F	0.06250	0.00000
gamma	0.03125	0.03125
lognormal	0.03125	0.12500
neg. binomial (zero-infl.)	0.06250	0.59375
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.25000	0.00000
uniform	0.00000	0.00000
weibull	0.03125	0.00000

plot(check_distribution(fit.tmtb3))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmtb4 <- heavyLm(tmt_b ~ idade + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.tmtb4)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 11.415) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -120.91  -30.46   -6.18   28.88  209.72 
## 
## Coefficients:
##                    Estimate Std.Error Z value p-value
## (Intercept)       57.6593   4.2761   13.4839  0.0000
## idade              1.6794   0.0867   19.3728  0.0000
## I(escolaridade^3) -0.0087   0.0012   -7.0990  0.0000
## 
## Degrees of freedom: 522 total; 519 residual
## Scale estimate: 1967.719
## Log-likelihood: -2766.942 on 4 degrees of freedom

check_distribution(fit.tmtb4)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.21875
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.03125	0.03125
F	0.06250	0.00000
gamma	0.03125	0.03125
lognormal	0.00000	0.12500
neg. binomial (zero-infl.)	0.06250	0.59375
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.25000	0.00000
uniform	0.00000	0.00000
weibull	0.03125	0.00000

plot(check_distribution(fit.tmtb4))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.tmtb5 <- heavyLm(tmt_b ~ idade + I(idade^2) + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.tmtb5)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 7.35838) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -126.358  -28.227   -6.273   30.471  218.165 
## 
## Coefficients:
##                    Estimate Std.Error Z value p-value
## (Intercept)       96.3370   8.1916   11.7604  0.0000
## idade             -0.7536   0.4358   -1.7295  0.0837
## I(idade^2)         0.0267   0.0046    5.7746  0.0000
## I(escolaridade^3) -0.0096   0.0012   -8.1597  0.0000
## 
## Degrees of freedom: 522 total; 518 residual
## Scale estimate: 1682.2
## Log-likelihood: -2752.473 on 5 degrees of freedom

check_distribution(fit.tmtb5)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.21875
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.03125
F	0.06250	0.00000
gamma	0.03125	0.03125
lognormal	0.03125	0.12500
neg. binomial (zero-infl.)	0.06250	0.59375
negative binomial	0.00000	0.00000
normal	0.53125	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.25000	0.00000
uniform	0.00000	0.00000
weibull	0.03125	0.00000

plot(check_distribution(fit.tmtb5))

## Warning: Removed 6 rows containing missing values (geom_segment).

## Warning: Removed 6 rows containing missing values (geom_point).

# Encontra o modelo com menor residuo (Root Mean Squared Error)
b1 <- RMSE(fit.tmtb$residuals)
b2 <- RMSE(fit.tmtb1$residuals)
b3 <- RMSE(fit.tmtb2$residuals) 
b4 <- RMSE(fit.tmtb3$residuals) # modelo vencedor
b5 <- RMSE(fit.tmtb4$residuals)
b6 <- RMSE(fit.tmtb5$residuals)

# Encontra o modelo com menor residuo (Mean Absolute Error)
c1 <- MAE(fit.tmtb$residuals)
c2 <- MAE(fit.tmtb1$residuals)
c3 <- MAE(fit.tmtb2$residuals)
c4 <- MAE(fit.tmtb3$residuals) # modelo vencedor
c5 <- MAE(fit.tmtb4$residuals)
c6 <- MAE(fit.tmtb5$residuals)

knitr::kable(data.frame(Model = c("tmtb.1","tmtb.2","tmtb.3","tmtb.4","tmtb.5","tmtb.6"),
           RMSE = c(b1,b2,b3,b4,b5,b6),
           MAE = c(c1,c2,c3,c4,c5,c6)), digits = 2)

Model	RMSE	MAE
tmtb.1	47.60	37.38
tmtb.2	47.67	37.04
tmtb.3	46.45	35.42
tmtb.4	46.28	35.32
tmtb.5	48.85	38.04
tmtb.6	48.00	37.05

Obviamente, podemos procurar por modelos alternativos, como os obtidos pelo pacote glmulti. Aqui tem um exemplo de sintaxe:

glmulti.lm.out <- glmulti(tmt_a ~ idade * I(idade^2) * escolaridade * I(sqrt(escolaridade)) * genero, data = df, level = 2, # No interaction considered method = “h”, # Exhaustive approach crit = “aic”, # AIC as criteria confsetsize = 5, # Keep 5 best models plotty = F, report = F, # No plot or interim reports fitfunction = “lm”) # lm function

Show 5 best models (Use @ instead of $ for an S4 object) glmulti.lm.out@formulas

Onde os resultados podem ser tmt_a ~ 1 + I(sqrt(escolaridade)) + I(idade^2):idade para o TMT-A, ou tmt_b ~ 1 + I(idade^2) + escolaridade + I(sqrt(escolaridade)) para o TMT-B.

Sintaxes robustas

## Robust statistics
library(WRS2)

## 
## Attaching package: 'WRS2'

## The following object is masked from 'package:flexplot':
## 
##     diet

# Function to calculate 20% trimmed mean
tmean <- function(x,tr=.2,na.rm=FALSE,STAND=NULL){
  if(na.rm)x<-x[!is.na(x)]
  val<-mean(x,tr)
  val
}

# Function to calculate 20% trimmed standard deviation (SD)
sd_trim <- function(x,trim=0.2, const=TRUE){
  # trimmed sd, where x is a matrix (column-wise)
  x <- as.matrix(x)
  if (const){
    if (trim==0.1){const <- 0.7892}
    else if (trim==0.2){const <- 0.6615}
    else {warning("Did you specify the correct consistency constant for trimming?")}
  }
  else{const <- 1}
  m <- apply(x,2,mean,trim)
  res <- x-rep(1,nrow(x))%*%t(m)
  qu <- apply(abs(res),2,quantile,1-trim)
  sdtrim <- apply(matrix(res[t(abs(t(res))<=qu)]^2,ncol=ncol(x),byrow=FALSE),2,sum)
  sdtrim <- sqrt(sdtrim/((nrow(x)*(1-trim)-1)))/const
  return(sdtrim)
}

Árvores de decisão (TMT-A)

# Ver isto http://melanie-roethlisberger.ch/wp-content/uploads/2018/01/01_datives_Conditional_random_forest.R
library(caret)

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following objects are masked _by_ '.GlobalEnv':
## 
##     MAE, RMSE

## The following object is masked from 'package:purrr':
## 
##     lift

library(partykit)

## Loading required package: grid

## Loading required package: libcoin

## Loading required package: mvtnorm

detach("package:partykit", unload=TRUE)
library(party)

## Loading required package: modeltools

## Loading required package: stats4

## Loading required package: strucchange

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## 
## Attaching package: 'strucchange'

## The following object is masked from 'package:stringr':
## 
##     boundary

# Análise exploratória
plot(ctree(tmt_a ~ idade + escolaridade + genero, data = compl.tmt))

# Árvores condicionais com 5-fold cross-validation
set.seed(3456)
model <- train(
  tmt_a ~ idade + escolaridade,
  data = compl.tmt, 
  method = 'ctree2',
  trControl = trainControl("cv", number = 5, classProbs = FALSE),
  tuneGrid = expand.grid(maxdepth = 3, mincriterion = 0.95)
)
plot(model$finalModel)

t(sapply(unique(where(model$finalModel)), function(x) {
  n <- nodes(model$finalModel, x)[[1]]
  tmt_a <- compl.tmt[as.logical(n$weights), "tmt_a"]
  cbind.data.frame("Node" = as.integer(x),
                   "T.Mean" = mean(tmt_a, trim = 0.2),
                   "T.SD" = sd_trim(tmt_a),
                   psych::describe(tmt_a, quant=c(.25,.50,.75), skew = FALSE))
}))

##      Node T.Mean   T.SD     vars n   mean     sd       min max range se      
## [1,] 5    35.60465 11.89395 1    286 39.18182 18.60565 15  151 136   1.100175
## [2,] 8    50.65385 17.55044 1    84  53.25    18.5556  19  105 86    2.024582
## [3,] 11   66.43333 35.84761 1    50  78.76    52.48928 20  267 247   7.423106
## [4,] 10   112.5179 70.62699 1    92  120.6413 65.6244  31  381 350   6.841817
## [5,] 7    72.16667 24.51797 1    38  77.39474 29.06334 41  169 128   4.714697
## [6,] 4    67.5     23.06892 1    8   83       56.93104 41  218 177   20.12816
##      Q0.25 Q0.5  Q0.75
## [1,] 28    35    44   
## [2,] 41    47.5  65.25
## [3,] 49    61.5  101  
## [4,] 70    102.5 165  
## [5,] 56    71    87.25
## [6,] 55.25 62    82.5

# Obtenção das médias, medianas, DP e SSEs das árvores condicionais
tree <- party::ctree(tmt_a ~ idade + escolaridade, data = compl.tmt)

t(sapply(unique(where(tree)), function(x) {
    n <- nodes(tree, x)[[1]]
    tmt_a <- compl.tmt[as.logical(n$weights), "tmt_a"]
    cbind.data.frame("Node" = as.integer(x),
                     psych::describe(tmt_a, quant=c(.25,.50,.75), skew = FALSE),
                     "SSE" = sum((tmt_a - mean(tmt_a))^2))  
}))

##      Node vars n   mean     sd       min max range se       Q0.25 Q0.5  Q0.75
## [1,] 6    1    167 41.8024  20.85528 17  151 134   1.61383  31    36    48   
## [2,] 7    1    119 35.5042  14.17259 15  90  75    1.299199 26.5  32    40.5 
## [3,] 11   1    72  55.59722 18.75765 19  105 86    2.210611 42.75 50    66   
## [4,] 15   1    50  78.76    52.48928 20  267 247   7.423106 49    61.5  101  
## [5,] 14   1    92  120.6413 65.6244  31  381 350   6.841817 70    102.5 165  
## [6,] 9    1    38  77.39474 29.06334 41  169 128   4.714697 56    71    87.25
## [7,] 4    1    8   83       56.93104 41  218 177   20.12816 55.25 62    82.5 
## [8,] 12   1    12  39.16667 8.632216 26  50  24    2.491906 32.5  41    46.25
##      SSE     
## [1,] 72200.48
## [2,] 23701.75
## [3,] 24981.32
## [4,] 135001.1
## [5,] 391897.2
## [6,] 31253.08
## [7,] 22688   
## [8,] 819.6667

# Calcula os valores p dentro de cada node
terNodes <- unique(where(tree))
setdiff(1:max(terNodes), terNodes)

## [1]  1  2  3  5  8 10 13

sapply(setdiff(1:max(terNodes), terNodes), function(x) {
          n <- nodes(tree, x)[[1]]
          pvalue <- 1 - nodes(tree, x)[[1]]$criterion$maxcriterion
          plab <- ifelse(pvalue < 10^(-3),
                         paste("p <", 10^(-3)),
                         paste("p =", round(pvalue, digits = 3)))
          c("Node" = x, "P-value" = plab)
})

##         [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       
## Node    "1"         "2"         "3"         "5"         "8"         "10"       
## P-value "p < 0.001" "p < 0.001" "p < 0.001" "p = 0.012" "p < 0.001" "p = 0.003"
##         [,7]       
## Node    "13"       
## P-value "p < 0.001"

Árvores de decisão (TMT-B)

library(partykit)

## 
## Attaching package: 'partykit'

## The following objects are masked from 'package:party':
## 
##     cforest, ctree, ctree_control, edge_simple, mob, mob_control,
##     node_barplot, node_bivplot, node_boxplot, node_inner, node_surv,
##     node_terminal, varimp

detach("package:partykit", unload=TRUE)
library(party)
# Análise exploratória
plot(ctree(tmt_b ~ idade + escolaridade + genero, data = compl.tmt))

# Árvores condicionais com 5-fold cross-validation
set.seed(3456)
model <- train(
  tmt_b ~ idade + escolaridade,
  data = compl.tmt, 
  method = 'ctree2',
  trControl = trainControl("cv", number = 5, classProbs = FALSE),
  tuneGrid = expand.grid(maxdepth = 3, mincriterion = 0.95)
)
plot(model$finalModel)

t(sapply(unique(where(model$finalModel)), function(x) {
  n <- nodes(model$finalModel, x)[[1]]
  tmt_b <- compl.tmt[as.logical(n$weights), "tmt_b"]
  cbind.data.frame("Node" = as.integer(x),
                   "T.Mean" = mean(tmt_b, trim = 0.2),
                   "T.SD" = sd_trim(tmt_b),
                   psych::describe(tmt_b, quant=c(.25,.50,.75), skew = FALSE))
}))

##      Node T.Mean   T.SD     vars n   mean     sd       min max  range se      
## [1,] 5    78.12805 32.30328 1    272 85.12868 37.03736 13  280  267   2.24572 
## [2,] 4    133.7143 65.01829 1    22  146.6364 72.27365 56  300  244   15.40879
## [3,] 8    109.5    41.85892 1    62  114.1452 44.96514 42  266  224   5.710579
## [4,] 7    167.1389 74.50546 1    60  175.0167 70.4731  47  300  253   9.098038
## [5,] 11   195.5156 110.8798 1    106 205.0849 99.59471 54  587  533   9.673494
## [6,] 10   230.8182 81.14701 1    36  282.4444 229.0394 105 1404 1299  38.17324
##      Q0.25  Q0.5  Q0.75 
## [1,] 58     75    105   
## [2,] 93     139.5 184.25
## [3,] 78.75  109   138.75
## [4,] 122.25 157   231.5 
## [5,] 120.25 193   281.5 
## [6,] 180    226   298.5

# Obtenção das médias, medianas, DP e SSEs das árvores condicionais
tree <- party::ctree(tmt_b ~ idade + escolaridade, data = compl.tmt)

t(sapply(unique(where(tree)), function(x) {
    n <- nodes(tree, x)[[1]]
    tmt_b <- compl.tmt[as.logical(n$weights), "tmt_b"]
    cbind.data.frame("Node" = as.integer(x),
                     psych::describe(tmt_b, quant=c(.25,.50,.75), skew = FALSE),
                     "SSE" = sum((tmt_b - mean(tmt_b))^2))  
}))

##      Node vars n   mean     sd       min max  range se       Q0.25  Q0.5 
## [1,] 6    1    193 79.29016 32.01707 13  180  167   2.304639 57     69   
## [2,] 4    1    22  146.6364 72.27365 56  300  244   15.40879 93     139.5
## [3,] 10   1    62  114.1452 44.96514 42  266  224   5.710579 78.75  109  
## [4,] 9    1    60  175.0167 70.4731  47  300  253   9.098038 122.25 157  
## [5,] 13   1    106 205.0849 99.59471 54  587  533   9.673494 120.25 193  
## [6,] 12   1    36  282.4444 229.0394 105 1404 1299  38.17324 180    226  
## [7,] 7    1    79  99.39241 44.18476 27  280  253   4.971174 63     95   
##      Q0.75  SSE     
## [1,] 97     196817.8
## [2,] 184.25 109693.1
## [3,] 138.75 123333.7
## [4,] 231.5  293021  
## [5,] 281.5  1041506 
## [6,] 298.5  1836067 
## [7,] 131.5  152278.8

# Calcula os valores p dentro de cada node
terNodes <- unique(where(tree))
setdiff(1:max(terNodes), terNodes)

## [1]  1  2  3  5  8 11

sapply(setdiff(1:max(terNodes), terNodes), function(x) {
          n <- nodes(tree, x)[[1]]
          pvalue <- 1 - nodes(tree, x)[[1]]$criterion$maxcriterion
          plab <- ifelse(pvalue < 10^(-3),
                         paste("p <", 10^(-3)),
                         paste("p =", round(pvalue, digits = 3)))
          c("Node" = x, "P-value" = plab)
})

##         [,1]        [,2]        [,3]        [,4]       [,5]        [,6]       
## Node    "1"         "2"         "3"         "5"        "8"         "11"       
## P-value "p < 0.001" "p < 0.001" "p < 0.001" "p = 0.01" "p < 0.001" "p = 0.011"

Teste FPT (5-Pontos)

knitr::kable(psych::describe(fpt, skew = F, quant=c(.1,.25,.5,.75,.90)), digits = 1)

	vars	n	mean	sd	min	max	range	se	Q0.1	Q0.25	Q0.5	Q0.75	Q0.9
id*	1	558	275.3	159.2	1	551	550	6.7	54.7	137.2	275.5	412.8	495.3
idade	2	555	44.9	25.4	6	89	83	1.1	15.0	18.0	44.0	70.0	80.0
escolaridade	3	556	10.0	3.9	1	26	25	0.2	4.5	8.0	10.0	12.0	15.0
genero	4	558	1.5	0.5	1	2	1	0.0	1.0	1.0	2.0	2.0	2.0
fpt_cinco_pontos_total	5	558	19.9	9.2	0	46	46	0.4	8.0	13.0	19.0	26.0	33.0
fpt_cinco_pontos_persev	6	558	2.9	4.5	0	36	36	0.2	0.0	0.0	2.0	4.0	7.0

Analises graficas (FPT)

library(flexplot)

# Cria o banco para as analises
df = fpt[-1]

# Relacoes univariadas (TMT-A)
flexplot(fpt_cinco_pontos_total ~ idade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito quadratico para a idade
flexplot(fpt_cinco_pontos_total ~ idade, data=df, method = 'quadratic', se=T)

flexplot(fpt_cinco_pontos_total ~ escolaridade, data=df, method = 'loess', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Tem um efeito cubico para a escolaridade
flexplot(fpt_cinco_pontos_total ~ escolaridade, data=df, method = 'cubic', se=T)

# Relacoes multivariadas
flexplot(fpt_cinco_pontos_total ~ idade + escolaridade | genero, data=df, method = 'lm')

## `geom_smooth()` using formula 'y ~ x'

# Com IC-95%
flexplot(fpt_cinco_pontos_total ~ idade + escolaridade | genero, data=df, method = 'lm', se=T)

## `geom_smooth()` using formula 'y ~ x'

# Modelo com interacao
full.mod <- lm(fpt_cinco_pontos_total ~ idade * escolaridade, data=df)
full.mod

## 
## Call:
## lm(formula = fpt_cinco_pontos_total ~ idade * escolaridade, data = df)
## 
## Coefficients:
##        (Intercept)               idade        escolaridade  idade:escolaridade  
##          23.438212           -0.222555            0.307970            0.007351

#estimates(full.mod)

# Modelo sem interacao
reduced.mod = lm(fpt_cinco_pontos_total ~ idade + escolaridade, data=df)
reduced.mod

## 
## Call:
## lm(formula = fpt_cinco_pontos_total ~ idade + escolaridade, data = df)
## 
## Coefficients:
##  (Intercept)         idade  escolaridade  
##      18.9070       -0.1479        0.7622

#estimates(reduced.mod)

visualize(full.mod)

model.comparison(full.mod, reduced.mod)

## $statistics
##                  aic      bic bayes.factor p.value r.squared
## full.mod    3871.348 3892.925        0.165   0.101     0.251
## reduced.mod 3872.059 3889.320        6.065             0.247
## 
## $pred.difference
##    0%   25%   50%   75%  100% 
## 0.000 0.105 0.327 0.675 2.004

Regressoes FPT

fit <- heavyFit(~ fpt_cinco_pontos_total + idade + escolaridade, data = df, family = Student(df = 4))
summary(fit)

## Estimation under multivariate heavy-tailed distributions
##  Data: df; Family: Student(df = 131.698) 
## 
## Center:
##                         Estimate Std.Error Z value p-value
## fpt_cinco_pontos_total 19.9239   9.1904    2.1679  0.0302
## idade                  44.6760  25.5216    1.7505  0.0800
## escolaridade           10.0198   3.8863    2.5782  0.0099
## 
## Scatter matrix estimate:
##                        fpt_cinco_pontos_total idade      escolaridade
## fpt_cinco_pontos_total  83.227268                                    
## idade                  -88.263277             641.821757             
## escolaridade            10.013815               8.799284  14.882438  
## 
## Number of Observations: 553
## Log-likelihood: -2708.326 on 9 degrees of freedom

# Modelo linear tradicional
fit.fpt <- lm(fpt_cinco_pontos_total ~ idade + escolaridade, data = df)
summary(fit.fpt)

## 
## Call:
## lm(formula = fpt_cinco_pontos_total ~ idade + escolaridade, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.1628  -5.3693  -0.6611   4.5291  23.4323 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.90703    1.07417  17.602   <2e-16 ***
## idade        -0.14790    0.01344 -11.005   <2e-16 ***
## escolaridade  0.76222    0.08763   8.698   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.985 on 550 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.2473, Adjusted R-squared:  0.2446 
## F-statistic: 90.35 on 2 and 550 DF,  p-value: < 2.2e-16

# Modelo linear com distribuicao-t
fit.fpt1 <- heavyLm(fpt_cinco_pontos_total ~ idade + escolaridade, data = df, family = Student(df = 4))
summary(fit.fpt1)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 25.3032) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.0884  -5.2776  -0.5808   4.6559  23.5201 
## 
## Coefficients:
##               Estimate Std.Error Z value  p-value 
## (Intercept)   18.7998   1.0669   17.6203   0.0000
## idade         -0.1483   0.0133  -11.1091   0.0000
## escolaridade   0.7661   0.0870    8.8020   0.0000
## 
## Degrees of freedom: 553 total; 550 residual
## Scale estimate: 58.4532
## Log-likelihood: -1931.599 on 4 degrees of freedom

# Checa a distribuicao do modelo
check_distribution(fit.fpt1)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.78125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.00000
F	0.00000	0.00000
gamma	0.00000	0.00000
lognormal	0.00000	0.00000
neg. binomial (zero-infl.)	0.00000	0.21875
negative binomial	0.00000	0.00000
normal	0.71875	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.28125	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.fpt1))

## Warning: Removed 4 rows containing missing values (geom_segment).

## Warning: Removed 4 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (idade ao quadrado)
fit.fpt2 <- heavyLm(fpt_cinco_pontos_total ~ idade + I(idade^2) + escolaridade, data = df, family = Student(df = 4))
summary(fit.fpt2)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 32.3574) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.2672  -4.8687  -0.2156   4.7167  22.1777 
## 
## Coefficients:
##               Estimate Std.Error Z value p-value
## (Intercept)  11.6795   1.6470    7.0914  0.0000
## idade         0.2518   0.0732    3.4381  0.0006
## I(idade^2)   -0.0043   0.0008   -5.5365  0.0000
## escolaridade  0.8275   0.0854    9.6846  0.0000
## 
## Degrees of freedom: 553 total; 549 residual
## Scale estimate: 56.35343
## Log-likelihood: -1916.636 on 5 degrees of freedom

check_distribution(fit.fpt2)

Distribution	p_Residuals	p_Response
bernoulli	0.0000	0.00000
beta	0.0000	0.00000
beta-binomial	0.0000	0.78125
binomial	0.0000	0.00000
chi	0.0000	0.00000
exponential	0.0000	0.00000
F	0.0000	0.00000
gamma	0.0000	0.00000
lognormal	0.0000	0.00000
neg. binomial (zero-infl.)	0.0000	0.21875
negative binomial	0.0000	0.00000
normal	0.6875	0.00000
pareto	0.0000	0.00000
poisson	0.0000	0.00000
poisson (zero-infl.)	0.0000	0.00000
tweedie	0.3125	0.00000
uniform	0.0000	0.00000
weibull	0.0000	0.00000

plot(check_distribution(fit.fpt2))

## Warning: Removed 4 rows containing missing values (geom_segment).

## Warning: Removed 4 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.fpt3 <- heavyLm(fpt_cinco_pontos_total ~ idade + I(idade^2) + I(sqrt(escolaridade)), data = df, family = Student(df = 4))
summary(fit.fpt3)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 48.967) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.9636  -4.9298  -0.2239   4.8150  22.1624 
## 
## Coefficients:
##                        Estimate Std.Error Z value p-value
## (Intercept)            4.4226   2.2079    2.0031  0.0452
## idade                  0.2634   0.0737    3.5734  0.0004
## I(idade^2)            -0.0043   0.0008   -5.5486  0.0000
## I(sqrt(escolaridade))  4.8806   0.5186    9.4104  0.0000
## 
## Degrees of freedom: 553 total; 549 residual
## Scale estimate: 57.95441
## Log-likelihood: -1918.491 on 5 degrees of freedom

check_distribution(fit.fpt3)

Distribution	p_Residuals	p_Response
bernoulli	0.0000	0.00000
beta	0.0000	0.00000
beta-binomial	0.0000	0.78125
binomial	0.0000	0.00000
chi	0.0000	0.00000
exponential	0.0000	0.00000
F	0.0000	0.00000
gamma	0.0000	0.00000
lognormal	0.0000	0.00000
neg. binomial (zero-infl.)	0.0000	0.21875
negative binomial	0.0000	0.00000
normal	0.6875	0.00000
pareto	0.0000	0.00000
poisson	0.0000	0.00000
poisson (zero-infl.)	0.0000	0.00000
tweedie	0.3125	0.00000
uniform	0.0000	0.00000
weibull	0.0000	0.00000

plot(check_distribution(fit.fpt3))

## Warning: Removed 4 rows containing missing values (geom_segment).

## Warning: Removed 4 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.fpt4 <- heavyLm(fpt_cinco_pontos_total ~ idade + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.fpt4)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 17.5691) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.783  -5.473  -0.477   4.670  24.744 
## 
## Coefficients:
##                    Estimate Std.Error Z value  p-value 
## (Intercept)        24.7752   0.6985   35.4705   0.0000
## idade              -0.1679   0.0137  -12.2207   0.0000
## I(escolaridade^3)   0.0017   0.0002    8.5586   0.0000
## 
## Degrees of freedom: 553 total; 550 residual
## Scale estimate: 56.7942
## Log-likelihood: -1933.481 on 4 degrees of freedom

check_distribution(fit.fpt4)

Distribution	p_Residuals	p_Response
bernoulli	0.00000	0.00000
beta	0.00000	0.00000
beta-binomial	0.00000	0.78125
binomial	0.00000	0.00000
chi	0.00000	0.00000
exponential	0.00000	0.00000
F	0.00000	0.00000
gamma	0.00000	0.00000
lognormal	0.00000	0.00000
neg. binomial (zero-infl.)	0.00000	0.21875
negative binomial	0.00000	0.00000
normal	0.65625	0.00000
pareto	0.00000	0.00000
poisson	0.00000	0.00000
poisson (zero-infl.)	0.00000	0.00000
tweedie	0.34375	0.00000
uniform	0.00000	0.00000
weibull	0.00000	0.00000

plot(check_distribution(fit.fpt4))

## Warning: Removed 4 rows containing missing values (geom_segment).

## Warning: Removed 4 rows containing missing values (geom_point).

# Modelo quadratico com distribuicao-t (raiz quadrada da escolaridade)
fit.fpt5 <- heavyLm(fpt_cinco_pontos_total ~ idade + I(idade^2) + I(escolaridade^3), data = df, family = Student(df = 4))
summary(fit.fpt5)

## Linear model under heavy-tailed distributions
##  Data: df; Family: Student(df = 35.9384) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.4335  -5.1236  -0.2857   4.9079  22.9268 
## 
## Coefficients:
##                    Estimate Std.Error Z value p-value
## (Intercept)       18.9673   1.3791   13.7531  0.0000
## idade              0.1874   0.0736    2.5466  0.0109
## I(idade^2)        -0.0038   0.0008   -4.8928  0.0000
## I(escolaridade^3)  0.0018   0.0002    8.9888  0.0000
## 
## Degrees of freedom: 553 total; 549 residual
## Scale estimate: 57.80182
## Log-likelihood: -1921.923 on 5 degrees of freedom

check_distribution(fit.fpt5)

Distribution	p_Residuals	p_Response
bernoulli	0.0000	0.00000
beta	0.0000	0.00000
beta-binomial	0.0000	0.78125
binomial	0.0000	0.00000
chi	0.0000	0.00000
exponential	0.0000	0.00000
F	0.0000	0.00000
gamma	0.0000	0.00000
lognormal	0.0000	0.00000
neg. binomial (zero-infl.)	0.0000	0.21875
negative binomial	0.0000	0.00000
normal	0.6875	0.00000
pareto	0.0000	0.00000
poisson	0.0000	0.00000
poisson (zero-infl.)	0.0000	0.00000
tweedie	0.3125	0.00000
uniform	0.0000	0.00000
weibull	0.0000	0.00000

plot(check_distribution(fit.fpt5))

## Warning: Removed 4 rows containing missing values (geom_segment).

## Warning: Removed 4 rows containing missing values (geom_point).

# Encontra o modelo com menor residuo (Root Mean Squared Error)
b1 <- RMSE(fit.fpt$residuals)
b2 <- RMSE(fit.fpt1$residuals)
b3 <- RMSE(fit.fpt2$residuals) # modelo vencedor
b4 <- RMSE(fit.fpt3$residuals) 
b5 <- RMSE(fit.fpt4$residuals)
b6 <- RMSE(fit.fpt5$residuals)

# Encontra o modelo com menor residuo (Mean Absolute Error)
c1 <- MAE(fit.fpt$residuals)
c2 <- MAE(fit.fpt1$residuals)
c3 <- MAE(fit.fpt2$residuals) # modelo vencedor
c4 <- MAE(fit.fpt3$residuals) 
c5 <- MAE(fit.fpt4$residuals)
c6 <- MAE(fit.fpt5$residuals)

knitr::kable(data.frame(Model = c("fpt.1","fpt.2","fpt.3","fpt.4","fpt.5","fpt.6"),
           RMSE = c(b1,b2,b3,b4,b5,b6),
           MAE = c(c1,c2,c3,c4,c5,c6)), digits = 2)

Model	RMSE	MAE
fpt.1	7.96	6.22
fpt.2	7.96	6.22
fpt.3	7.75	6.04
fpt.4	7.77	6.09
fpt.5	8.00	6.22
fpt.6	7.82	6.10

Árvores de decisão (FPT)

# Remove NAs
fpt.na <- na.omit(fpt)

library(partykit)

## 
## Attaching package: 'partykit'

## The following objects are masked from 'package:party':
## 
##     cforest, ctree, ctree_control, edge_simple, mob, mob_control,
##     node_barplot, node_bivplot, node_boxplot, node_inner, node_surv,
##     node_terminal, varimp

detach("package:partykit", unload=TRUE)
library(party)
# Análise exploratória
plot(ctree(fpt_cinco_pontos_total ~ idade + escolaridade + genero, data = fpt.na))

# Árvores condicionais com 5-fold cross-validation
set.seed(3456)
model <- train(
  fpt_cinco_pontos_total ~ idade + escolaridade,
  data = fpt.na, 
  method = 'ctree2',
  trControl = trainControl("cv", number = 5, classProbs = FALSE),
  tuneGrid = expand.grid(maxdepth = 2, mincriterion = 0.95)
)
plot(model$finalModel)

#t(sapply(unique(where(model$finalModel)), function(x) {
#  n <- nodes(model$finalModel, x)[[1]]
#  fpt_cinco_pontos_total <- fpt.na[as.logical(n$weights), "fpt_cinco_pontos_total"]
#  cbind.data.frame("Node" = as.integer(x),
#                   "T.Mean" = mean(fpt_cinco_pontos_total, trim = 0.2),
#                   "T.SD" = sd_trim(fpt_cinco_pontos_total),
#                   psych::describe(fpt_cinco_pontos_total, quant=c(.25,.50,.75), skew = FALSE))
#}))


# Obtenção das médias, medianas, DP e SSEs das árvores condicionais
tree <- party::ctree(fpt_cinco_pontos_total ~ idade + escolaridade, data = fpt.na)

#t(sapply(unique(where(tree)), function(x) {
#    n <- nodes(tree, x)[[1]]
#    fpt_cinco_pontos_total <- fpt.na[as.logical(n$weights), "fpt_cinco_pontos_total"]
#    cbind.data.frame("Node" = as.integer(x),
#                     psych::describe(fpt_cinco_pontos_total, quant=c(.25,.50,.75), skew = FALSE),
#                     "SSE" = sum((fpt_cinco_pontos_total - mean(fpt_cinco_pontos_total))^2))  
#}))

# Calcula os valores p dentro de cada node
terNodes <- unique(where(tree))
setdiff(1:max(terNodes), terNodes)

## [1]  1  2  3  6  8 11

sapply(setdiff(1:max(terNodes), terNodes), function(x) {
          n <- nodes(tree, x)[[1]]
          pvalue <- 1 - nodes(tree, x)[[1]]$criterion$maxcriterion
          plab <- ifelse(pvalue < 10^(-3),
                         paste("p <", 10^(-3)),
                         paste("p =", round(pvalue, digits = 3)))
          c("Node" = x, "P-value" = plab)
})

##         [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       
## Node    "1"         "2"         "3"         "6"         "8"         "11"       
## P-value "p < 0.001" "p < 0.001" "p = 0.005" "p < 0.001" "p = 0.006" "p < 0.001"

PD-CRS 2021

Pedro Brandão & Danilo Pereira

03/03/2021