Análisis estadístico con beneficio clínico sustancial como variable respuesta

Para valorar la normalidad de la muestra se utilizó el test estadístico de Shapiro-Wilk y la evaluación gráfica mediante gráficos cuantil-cuantil, histogramas y gráficos de cajas y bigotes. Para la valoración del supuesto de homocedasticidad se utilizó la prueba de Levene. Las comparaciones entre los grupos con y sin beneficio clínico sustancial, en cuanto a las variables numéricas continuas se realizaron mediante los Test-T, Test de Welch, Test de Mann Whitney o el Test de la mediana, según cumplimiento de supuestos. En cuanto a las variables categóricas, se utilizó el test Chi-cuadrado o el Test Exacto de Fisher, según correspondiera.

Con el objetivo de identificar los predictores de beneficio clínico sustancial, se ajustó un modelo de regresión logística con variables predictoras basalescomo variables independientes y la presencia de beneficio clínico sustancial como variable de respuesta, definida como puntaje de la escala GROC > o = a +5. La relación entre la variable de resultado y la exposición se determinó de forma inicial mediante la realización de análisis de regresión simple. De dicho análisis, se seleccionaron para el análisis de regresión múltiple aquellas consideradas de relevancia clínica por parte de los autores y/o que presentaran un p-valor < 0.2 en el análisis de regresión simple y que presenten relevancia clínica.

Para facilitar la interpretación del Odds Ratio (OR), se utilizó la función “cutpointr” para elegir el mejor punto de corte para su categorización dicotómica. Se seleccionó el modelo con menor valor de criterio de información de Akaike.

La bondad de ajuste del modelo se analizó mediante el test de Hosmer-Lemeshow. Un valor de p<0.05 se interpretó como indicador de que el modelo no se ajustó satisfactoriamente a los datos. Para establecer el nivel de presición se utilizó el área bajo la curva (AUC). El AUC presenta un rango de 0 a 1 (un AUC < 0.5 indica que el desempeño del modelo es peor que el azar, mientras que un AUC = 1 indica desempeño perfecto). Áreas bajo la curva >0.7 y >0.9 se consideran desempeños aceptables y excelentes respectivamente. Asi mismo, se calculo la sensibilidad y especificidad del modelo final.

Como medida de asociación se reportaron los OR’s con sus correspondientes intervalos de confianza al 95%.

Para todos los análisis se utilizó el programa R versión 4.2.1.20.* Se consideró como estadísticamente significativo un p ≤ 0,05.

** Kuhn M. caret: Classification and Regression Training. R package. Disponible en: https://CRAN.R-project.org/package=caret. * R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/.

Análisis en subgrupos BCS

Evaluación de la distribución de cada variable para determinar como reportarla y con qué estadístico compararlas en base al puntaje GROC.

Tabla comparativa de pacientes con y sin BCS

labels(df_comp) <- c(Edad="Edad (años)", Nacionalidad = "Nacionalidad", nac_cat = "Nacionalidad categorizada", Sexo = "Sexo", Ocupacion = "Ocupación", Estudios = "Nivel de estudios", Estado_civil = "Estado civil", Diagnostico = "Diagnóstico médico", Duracion = "Duración del síntoma", T1_EVA = "Dolor al ingreso", T1_PCS = "Catastrofismo al ingreso", T1_IDB = "Índice de depresión de Beck al ingreso", T1_RM = "Roland Morris al ingreso", Indice1 = "EuroQoL al ingreso", T1_Analgesicos = "Uso de analgésicos")

tablauno <- tableby(GROC~medtest(Edad, "median", "q1q3")+
                      chisq(Sexo)+
                      fe(Nacionalidad)+
                      chisq(nac_cat)+
                      fe(Ocupacion)+
                      fe(Estudios)+
                      fe(Estado_civil)+
                      fe(Diagnostico)+
                      medtest(Duracion, "median", "q1q3")+
                      medtest(T1_PCS, "median", "q1q3")+
                      medtest(T1_EVA, "median", "q1q3")+
                      medtest(T1_IDB, "median", "q1q3")+
                      T1_RM+
                      chisq(T1_Analgesicos)+
                      Indice1, 
                    data = df_comp, 
                    control=tableby.control(stats.labels = list(meansd= "Media (DS)", median="Mediana", range="Rango", q1q3 = "RIQ"),
                    test.pname = "p-valor", digits=2 ))

summary(tablauno, text=T, pfootnote = T) %>%
  kbl(caption = "Tabla 1: Valores basales por grupo")%>%
  footnote(
    number=c("Test de la Mediana", "Test de Chi Cuadrado", "Test Exacto de Fisher", "Test-T"),
    general_title = "Referencias: DS: Desvío standard; RIQ: Rango intercuartílico")%>%
                      kable_styling()

Tabla 1: Valores basales por grupo
	0 (N=49)	1 (N=53)	Total (N=102)	p-valor
Edad (años)				0.844 (1)
Mediana	48.00	46.00	46.50
RIQ	31.00, 57.00	36.00, 54.00	34.50, 55.75
Sexo				0.639 (2)
Femenino	34 (69.4%)	39 (73.6%)	73 (71.6%)
Masculino	15 (30.6%)	14 (26.4%)	29 (28.4%)
Nacionalidad				0.769 (3)
Argentina	25 (51.0%)	34 (64.2%)	59 (57.8%)
Boliviana	5 (10.2%)	5 (9.4%)	10 (9.8%)
Brasilera	1 (2.0%)	0 (0.0%)	1 (1.0%)
Colombiana	0 (0.0%)	1 (1.9%)	1 (1.0%)
Iraní	1 (2.0%)	0 (0.0%)	1 (1.0%)
Paraguaya	5 (10.2%)	5 (9.4%)	10 (9.8%)
Peruana	9 (18.4%)	6 (11.3%)	15 (14.7%)
Uruguaya	1 (2.0%)	0 (0.0%)	1 (1.0%)
Venezolana	2 (4.1%)	2 (3.8%)	4 (3.9%)
Nacionalidad categorizada				0.180 (2)
Media (DS)	0.51 (0.51)	0.64 (0.48)	0.58 (0.50)
Rango	0.00 - 1.00	0.00 - 1.00	0.00 - 1.00
Ocupación				0.881 (3)
Ama de casa	7 (14.3%)	10 (18.9%)	17 (16.7%)
Comerciante	9 (18.4%)	10 (18.9%)	19 (18.6%)
Construcción	2 (4.1%)	5 (9.4%)	7 (6.9%)
Costura	3 (6.1%)	2 (3.8%)	5 (4.9%)
Desocupado	6 (12.2%)	7 (13.2%)	13 (12.7%)
Empleada doméstica	10 (20.4%)	10 (18.9%)	20 (19.6%)
Estudiante	2 (4.1%)	3 (5.7%)	5 (4.9%)
Otros	10 (20.4%)	6 (11.3%)	16 (15.7%)
Nivel de estudios				0.413 (3)
Primaria completa	9 (18.4%)	9 (17.0%)	18 (17.6%)
Primaria incompleta	3 (6.1%)	4 (7.5%)	7 (6.9%)
Secundaria completa	17 (34.7%)	25 (47.2%)	42 (41.2%)
Secundaria incompleta	5 (10.2%)	7 (13.2%)	12 (11.8%)
Terciario/Universitario	15 (30.6%)	8 (15.1%)	23 (22.5%)
Estado civil				0.008 (3)
Casado	15 (30.6%)	12 (22.6%)	27 (26.5%)
Conviviente	6 (12.2%)	4 (7.5%)	10 (9.8%)
Divorciado	10 (20.4%)	2 (3.8%)	12 (11.8%)
Soltero	18 (36.7%)	31 (58.5%)	49 (48.0%)
Viudo	0 (0.0%)	4 (7.5%)	4 (3.9%)
Diagnóstico médico				0.139 (3)
Artrosis facetaria	2 (4.1%)	0 (0.0%)	2 (2.0%)
Espondilolisis/listesis	2 (4.1%)	1 (1.9%)	3 (2.9%)
Lumbalgia	32 (65.3%)	36 (67.9%)	68 (66.7%)
Lumbociatalgia	5 (10.2%)	12 (22.6%)	17 (16.7%)
Otros	8 (16.3%)	4 (7.5%)	12 (11.8%)
Duración del síntoma				0.338 (1)
Mediana	15.00	11.00	12.00
RIQ	7.00, 48.00	5.00, 48.00	6.00, 48.00
Catastrofismo al ingreso				0.574 (1)
Mediana	33.00	33.00	33.00
RIQ	26.00, 40.00	23.00, 39.00	23.50, 40.00
Dolor al ingreso				0.651 (1)
Mediana	7.00	7.00	7.00
RIQ	6.00, 8.00	5.00, 8.00	5.47, 8.00
Índice de depresión de Beck al ingreso				0.844 (1)
Mediana	13.00	12.00	12.50
RIQ	9.00, 20.00	7.00, 22.00	8.00, 20.00
Roland Morris al ingreso				0.496 (4)
Media (DS)	10.71 (4.51)	10.08 (4.90)	10.38 (4.71)
Rango	3.00 - 23.00	0.00 - 23.00	0.00 - 23.00
Uso de analgésicos				0.180 (2)
No	25 (51.0%)	34 (64.2%)	59 (57.8%)
Si	24 (49.0%)	19 (35.8%)	43 (42.2%)
EuroQoL al ingreso				0.460 (4)
Media (DS)	0.49 (0.21)	0.52 (0.26)	0.50 (0.24)
Rango	0.10 - 0.88	-0.18 - 1.00	-0.18 - 1.00
¹ Test de la Mediana
² Test de Chi Cuadrado
³ Test Exacto de Fisher
⁴ Test-T

Selección de variables para el modelo

Utilizo de la base de datos todas las variables del tiempo de medición 1 que hay (excepto la valoración cruda del EuroQoL y el EQVAS que tiene muchos faltantes)

Realizamos un analisis de regresion simple

df_seleccion<-df_comp[,c(1:11, 14:16, 33, 35) ]

modelo_todas<-glm(GROC~. , data = df_seleccion, family = binomial())
summary(modelo_todas)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_seleccion)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.14253  -0.90559   0.00021   0.89075   1.90808  
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                        -1.681e+01  2.275e+03  -0.007   0.9941  
## Edad                                6.457e-03  2.854e-02   0.226   0.8210  
## NacionalidadBoliviana               2.010e-01  1.093e+00   0.184   0.8541  
## NacionalidadBrasilera              -1.582e+01  3.956e+03  -0.004   0.9968  
## NacionalidadColombiana              1.910e+01  3.956e+03   0.005   0.9961  
## NacionalidadIraní                  -1.600e+01  3.956e+03  -0.004   0.9968  
## NacionalidadParaguaya               8.165e-02  1.022e+00   0.080   0.9363  
## NacionalidadPeruana                -7.734e-01  8.191e-01  -0.944   0.3450  
## NacionalidadUruguaya               -1.649e+01  3.956e+03  -0.004   0.9967  
## NacionalidadVenezolana              8.342e-01  1.424e+00   0.586   0.5579  
## SexoMasculino                      -7.999e-02  7.934e-01  -0.101   0.9197  
## OcupacionComerciante               -6.454e-01  1.007e+00  -0.641   0.5215  
## OcupacionConstrucción               1.304e+00  1.871e+00   0.697   0.4857  
## OcupacionCostura                   -5.952e-01  1.897e+00  -0.314   0.7537  
## OcupacionDesocupado                -1.098e+00  1.113e+00  -0.986   0.3240  
## OcupacionEmpleada doméstica        -1.018e+00  9.624e-01  -1.057   0.2903  
## OcupacionEstudiante                -5.902e-01  1.508e+00  -0.391   0.6956  
## OcupacionOtros                     -1.326e+00  1.334e+00  -0.994   0.3203  
## EstudiosPrimaria incompleta         1.465e-01  1.292e+00   0.113   0.9097  
## EstudiosSecundaria completa         2.423e-01  7.878e-01   0.308   0.7584  
## EstudiosSecundaria incompleta       1.343e-01  1.072e+00   0.125   0.9003  
## EstudiosTerciario/Universitario    -1.348e+00  1.021e+00  -1.320   0.1868  
## Estado_civilConviviente            -2.223e-01  1.072e+00  -0.207   0.8357  
## Estado_civilDivorciado             -1.528e+00  1.144e+00  -1.336   0.1817  
## Estado_civilSoltero                 1.206e+00  7.168e-01   1.682   0.0925 .
## Estado_civilViudo                   1.764e+01  1.937e+03   0.009   0.9927  
## DiagnosticoEspondilolisis/listesis  1.688e+01  2.275e+03   0.007   0.9941  
## DiagnosticoLumbalgia                1.818e+01  2.275e+03   0.008   0.9936  
## DiagnosticoLumbociatalgia           1.895e+01  2.275e+03   0.008   0.9934  
## DiagnosticoOtros                    1.892e+01  2.275e+03   0.008   0.9934  
## Duracion                            4.462e-03  4.746e-03   0.940   0.3472  
## T1_PCS                             -3.091e-02  3.237e-02  -0.955   0.3396  
## T1_EVA                              7.987e-02  1.714e-01   0.466   0.6411  
## T1_IDB                             -1.003e-03  3.396e-02  -0.030   0.9764  
## T1_EQporc                          -7.773e-03  1.550e-02  -0.502   0.6160  
## T1_RM                              -2.865e-02  6.900e-02  -0.415   0.6780  
## T1_AnalgesicosSi                   -2.940e-01  6.747e-01  -0.436   0.6630  
## Indice1                            -4.025e-01  1.657e+00  -0.243   0.8081  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 104.10  on  64  degrees of freedom
## AIC: 180.1
## 
## Number of Fisher Scoring iterations: 16

modelo_edad<-glm(GROC~Edad, data = df_seleccion, family = binomial)
summary(modelo_edad)

## 
## Call:
## glm(formula = GROC ~ Edad, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.244  -1.205   1.119   1.146   1.169  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.217512   0.738739   0.294    0.768
## Edad        -0.003074   0.015732  -0.195    0.845
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 141.21  on 100  degrees of freedom
## AIC: 145.21
## 
## Number of Fisher Scoring iterations: 3

modelo_duracion<-glm(GROC~Duracion, data = df_seleccion, family = binomial)
summary(modelo_duracion)

## 
## Call:
## glm(formula = GROC ~ Duracion, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.230  -1.223   1.126   1.130   1.257  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.126027   0.237185   0.531    0.595
## Duracion    -0.001175   0.003218  -0.365    0.715
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 141.11  on 100  degrees of freedom
## AIC: 145.11
## 
## Number of Fisher Scoring iterations: 3

modelo_pcs<-glm(GROC~T1_PCS, data = df_seleccion, family = binomial)
summary(modelo_pcs)

## 
## Call:
## glm(formula = GROC ~ T1_PCS, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.280  -1.203   1.088   1.149   1.192  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.25393    0.58958   0.431    0.667
## T1_PCS      -0.00563    0.01781  -0.316    0.752
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 141.14  on 100  degrees of freedom
## AIC: 145.14
## 
## Number of Fisher Scoring iterations: 3

modelo_dolor<-glm(GROC~T1_EVA, data = df_seleccion, family = binomial)
summary(modelo_dolor)

## 
## Call:
## glm(formula = GROC ~ T1_EVA, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.227  -1.212   1.121   1.143   1.170  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.04487    0.79751  -0.056    0.955
## T1_EVA       0.01797    0.11255   0.160    0.873
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 141.22  on 100  degrees of freedom
## AIC: 145.22
## 
## Number of Fisher Scoring iterations: 3

modelo_depre<-glm(GROC~T1_IDB, data = df_seleccion, family = binomial)
summary(modelo_depre)

## 
## Call:
## glm(formula = GROC ~ T1_IDB, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.213  -1.211   1.142   1.144   1.152  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.0840556  0.3658348   0.230    0.818
## T1_IDB      -0.0003737  0.0205792  -0.018    0.986
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 141.24  on 100  degrees of freedom
## AIC: 145.24
## 
## Number of Fisher Scoring iterations: 3

modelo_rm<-glm(GROC~T1_RM, data = df_seleccion, family = binomial)
summary(modelo_rm)

## 
## Call:
## glm(formula = GROC ~ T1_RM, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.305  -1.203   1.037   1.139   1.303  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.38241    0.48533   0.788    0.431
## T1_RM       -0.02925    0.04258  -0.687    0.492
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 140.77  on 100  degrees of freedom
## AIC: 144.77
## 
## Number of Fisher Scoring iterations: 3

modelo_eq<-glm(GROC~Indice1, data = df_seleccion, family = binomial)
summary(modelo_eq)

## 
## Call:
## glm(formula = GROC ~ Indice1, family = binomial, data = df_seleccion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.314  -1.191   1.032   1.148   1.329  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.2385     0.4692  -0.508    0.611
## Indice1       0.6285     0.8438   0.745    0.456
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 140.69  on 100  degrees of freedom
## AIC: 144.69
## 
## Number of Fisher Scoring iterations: 3

Genero una tabla con la misma salida que el modelo anterior pero con mayor información

# Hago un análisis de regresión simple con todas las variables valoradas en el primer tiempo de evaluación

df_comp %>%
  select(GROC, Edad, Nacionalidad, nac_cat, Sexo, Ocupacion, Estudios, Estado_civil, Duracion, T1_PCS, T1_EVA, T1_IDB, Indice1, T1_RM,T1_Analgesicos) %>%
  tbl_uvregression(
    method = glm,
    y = GROC,
    method.args = list(family = binomial(link = "logit")),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%  # add global p-value 
  add_nevent() %>%    # add number of events of the outcome
  bold_p() %>%        # bold p-values under a given threshold (default 0.05)
  bold_labels()

Characteristic	N	Event N	OR¹	95% CI¹	p-value
Edad (años)	102	53	1.00	0.97, 1.03	0.85
Nacionalidad	102	53			0.50
Argentina			—	—
Boliviana			0.74	0.19, 2.91
Brasilera			0.00
Colombiana			11,508,354	0.00, NA
Iraní			0.00
Paraguaya			0.74	0.19, 2.91
Peruana			0.49	0.15, 1.54
Uruguaya			0.00
Venezolana			0.74	0.08, 6.47
Nacionalidad categorizada	102	53	1.72	0.78, 3.83	0.18
Sexo	102	53			0.64
Femenino			—	—
Masculino			0.81	0.34, 1.93
Ocupación	102	53			0.86
Ama de casa			—	—
Comerciante			0.78	0.20, 2.92
Construcción			1.75	0.28, 14.8
Costura			0.47	0.05, 3.54
Desocupado			0.82	0.19, 3.56
Empleada doméstica			0.70	0.18, 2.57
Estudiante			1.05	0.14, 9.61
Otros			0.42	0.10, 1.67
Nivel de estudios	102	53			0.40
Primaria completa			—	—
Primaria incompleta			1.33	0.23, 8.48
Secundaria completa			1.47	0.48, 4.53
Secundaria incompleta			1.40	0.32, 6.38
Terciario/Universitario			0.53	0.15, 1.87
Estado civil	102	53			0.004
Casado			—	—
Conviviente			0.83	0.18, 3.61
Divorciado			0.25	0.03, 1.18
Soltero			2.15	0.83, 5.70
Viudo			19,564,201	0.00, NA
Duración del síntoma	102	53	1.00	0.99, 1.01	0.71
Catastrofismo al ingreso	102	53	0.99	0.96, 1.03	0.75
Dolor al ingreso	102	53	1.02	0.82, 1.27	0.87
Índice de depresión de Beck al ingreso	102	53	1.00	0.96, 1.04	0.99
EuroQoL al ingreso	102	53	1.87	0.36, 10.1	0.45
Roland Morris al ingreso	102	53	0.97	0.89, 1.06	0.49
Uso de analgésicos	102	53			0.18
No			—	—
Si			0.58	0.26, 1.28
¹ OR = Odds Ratio, CI = Confidence Interval

Debido a que es posible que no todas las variables continuas tengan una relación lineal con el objetivo primario y que realizar categorias en ellas tenga mejor sentido.

Utilizo la funcion cutpointr para elegir el mejor punto de corte para la categorizacion de las variables elegidas segun criterio clinico

Realizo el modelo solamente con todas las variables categóricas

df_comp_cat<-df_comp[,c(35:42)]

modelo_cat<-glm(GROC~. , data = df_comp_cat, family = binomial())
summary(modelo_cat)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_comp_cat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8794  -1.0689   0.6231   1.0758   1.8509  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    1.11807    0.62578   1.787   0.0740 .
## nac_cat        0.48640    0.43486   1.119   0.2633  
## edad_cat1     -0.76882    0.48758  -1.577   0.1148  
## duracion_cat1 -0.87498    0.44421  -1.970   0.0489 *
## pcs_cat1      -0.02587    0.47449  -0.055   0.9565  
## eva_cat1      -0.06374    0.52564  -0.121   0.9035  
## idb_cat1      -0.50878    0.47998  -1.060   0.2891  
## rm_cat1       -0.39000    0.47575  -0.820   0.4124  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 129.55  on  94  degrees of freedom
## AIC: 145.55
## 
## Number of Fisher Scoring iterations: 4

Genero la tabla que me da mayor información

# Repito el analisis de regresión simple ahora solo con las variables categorizadas según su punto de corte óptimo

df_comp %>%
  select(GROC, edad_cat, duracion_cat, pcs_cat, eva_cat, idb_cat, rm_cat) %>%
  tbl_uvregression(
    method = glm,
    y = GROC,
    method.args = list(family = binomial(link = "logit")),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2),
    show_single_row = c(edad_cat, duracion_cat, pcs_cat, eva_cat, idb_cat, rm_cat),
    label = c(edad_cat~"Edad > 55 años", duracion_cat~ "Duración > 12 meses", pcs_cat~ "PCS > 37",
              eva_cat~"Dolor > 6", idb_cat ~ "Indice de Beck > 10", rm_cat  ~ "Roland Morris > 11")
  ) %>%
  add_global_p() %>%  # add global p-value 
  add_nevent() %>%    # add number of events of the outcome
  bold_p() %>%        # bold p-values under a given threshold (default 0.05)
  bold_labels()

Characteristic	N	Event N	OR¹	95% CI¹	p-value
Edad > 55 años	102	53	0.50	0.21, 1.19	0.12
Duración > 12 meses	102	53	0.42	0.19, 0.95	0.036
PCS > 37	102	53	0.69	0.31, 1.52	0.35
Dolor > 6	102	53	0.59	0.23, 1.46	0.26
Indice de Beck > 10	102	53	0.55	0.23, 1.26	0.16
Roland Morris > 11	102	53	0.54	0.24, 1.17	0.12
¹ OR = Odds Ratio, CI = Confidence Interval

#  add_q() %>%         # adjusts global p-values for multiple testing
#  bold_p(t = 0.10, q = TRUE) %>% # now bold q-values under the threshold of 0.10
#> add_q: Adjusting p-values with
#> `stats::p.adjust(x$table_body$p.value, method = "fdr")`

Best GLM

df_best<-df_seleccion[,c(1, 3, 8:14, 16)]

modelo_best<-glm(GROC~. , data = df_best, family = binomial())

modelo<-model.matrix(modelo_best)

Xy<-data.frame(cbind(modelo[,-1], (df_best$GROC)))

Xy$V10<-Xy$V10-1

res.bestglm<-bestglm(Xy = Xy,
                     family = binomial,
                     IC = "AIC",
                     method = "exhaustive")

res.bestglm$BestModels

res.bestglm$Subsets # observar que pone asterísco en el mejor (Menor AIC)

Random Forest

Utilizamos Random Forest como metodo alternativo de seleccion de variables. Antes de eso armo una base de datos con las variables que serían de interés para desarrollar el modelo. Todas las de la evaluación basal con su forma continua y categórica (se llamará df_comp_rf).

Lo primero que se puede observar en el random Random Forest es que parecen haber varias variables involucradas en el desarrollo del outcome primario.

Aquellas con mayor importancia parecen ser Ocupación, Roland Morris categorizado y Edad. Llamativamente, Estado civil y Duración parecen tener un valor intermedio con respecto a otras variables. Incluso Duración categorizada es la que menos importancia tendría.

# Prueba: Que pasa con los que cambian el valor de PCS

df_seleccion$PCS_cambio<-df_comp$T1_PCS-df_comp$T2_PCS

modelo_cambio<-glm(GROC~. , data = df_seleccion, family = binomial())

summary(modelo_cambio)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_seleccion)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.14706  -0.85152   0.00022   0.88283   1.87047  
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                        -1.594e+01  2.205e+03  -0.007   0.9942  
## Edad                                3.299e-04  2.933e-02   0.011   0.9910  
## NacionalidadBoliviana               4.174e-01  1.115e+00   0.374   0.7082  
## NacionalidadBrasilera              -1.536e+01  3.956e+03  -0.004   0.9969  
## NacionalidadColombiana              1.915e+01  3.956e+03   0.005   0.9961  
## NacionalidadIraní                  -1.603e+01  3.956e+03  -0.004   0.9968  
## NacionalidadParaguaya              -3.298e-02  1.048e+00  -0.031   0.9749  
## NacionalidadPeruana                -8.778e-01  8.126e-01  -1.080   0.2800  
## NacionalidadUruguaya               -1.577e+01  3.956e+03  -0.004   0.9968  
## NacionalidadVenezolana              7.951e-01  1.434e+00   0.555   0.5792  
## SexoMasculino                       6.031e-02  8.259e-01   0.073   0.9418  
## OcupacionComerciante               -8.691e-01  1.036e+00  -0.839   0.4015  
## OcupacionConstrucción               1.148e+00  1.968e+00   0.583   0.5597  
## OcupacionCostura                   -1.230e+00  1.985e+00  -0.620   0.5354  
## OcupacionDesocupado                -1.079e+00  1.147e+00  -0.941   0.3468  
## OcupacionEmpleada doméstica        -1.035e+00  9.757e-01  -1.061   0.2889  
## OcupacionEstudiante                -7.549e-01  1.576e+00  -0.479   0.6320  
## OcupacionOtros                     -1.478e+00  1.355e+00  -1.091   0.2754  
## EstudiosPrimaria incompleta         5.651e-01  1.336e+00   0.423   0.6724  
## EstudiosSecundaria completa         2.071e-01  7.938e-01   0.261   0.7941  
## EstudiosSecundaria incompleta       3.502e-01  1.086e+00   0.322   0.7471  
## EstudiosTerciario/Universitario    -1.505e+00  1.058e+00  -1.423   0.1549  
## Estado_civilConviviente            -1.414e-01  1.103e+00  -0.128   0.8980  
## Estado_civilDivorciado             -2.006e+00  1.224e+00  -1.639   0.1013  
## Estado_civilSoltero                 1.196e+00  7.238e-01   1.652   0.0985 .
## Estado_civilViudo                   1.726e+01  1.911e+03   0.009   0.9928  
## DiagnosticoEspondilolisis/listesis  1.688e+01  2.205e+03   0.008   0.9939  
## DiagnosticoLumbalgia                1.827e+01  2.205e+03   0.008   0.9934  
## DiagnosticoLumbociatalgia           1.900e+01  2.205e+03   0.009   0.9931  
## DiagnosticoOtros                    1.944e+01  2.205e+03   0.009   0.9930  
## Duracion                            4.526e-03  4.853e-03   0.933   0.3511  
## T1_PCS                             -4.897e-02  3.535e-02  -1.385   0.1659  
## T1_EVA                              7.158e-02  1.746e-01   0.410   0.6819  
## T1_IDB                              2.470e-03  3.533e-02   0.070   0.9443  
## T1_EQporc                          -1.330e-02  1.625e-02  -0.818   0.4131  
## T1_RM                              -1.068e-02  7.175e-02  -0.149   0.8816  
## T1_AnalgesicosSi                   -4.811e-01  6.983e-01  -0.689   0.4909  
## Indice1                            -4.361e-01  1.690e+00  -0.258   0.7964  
## PCS_cambio                          5.013e-02  3.867e-02   1.296   0.1948  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 102.38  on  63  degrees of freedom
## AIC: 180.38
## 
## Number of Fisher Scoring iterations: 16

# Establezco punto de corte del cambio y hago un modelo con todas las variables categorizadas

PCS_cambio <-cutpointr(df_seleccion, PCS_cambio  , GROC, method = maximize_metric, metric = youden)

## Assuming the positive class is 1

## Assuming the positive class has higher x values

summary(PCS_cambio)

df_comp_cat<-df_comp[,c(35:42)]

df_comp_cat$PCS_cambio<-as.factor(ifelse(df_seleccion$PCS_cambio > 7, 1, 0))

modelo_cat_cambio<-glm(GROC~. , data = df_comp_cat, family = binomial())

summary(modelo_cat_cambio)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_comp_cat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2541  -0.9987   0.5578   0.9726   1.6407  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     1.0402     0.6404   1.624   0.1043  
## nac_cat         0.4272     0.4471   0.955   0.3394  
## edad_cat1      -1.0260     0.5154  -1.991   0.0465 *
## duracion_cat1  -0.9492     0.4573  -2.075   0.0379 *
## pcs_cat1       -0.1096     0.4985  -0.220   0.8260  
## eva_cat1       -0.2027     0.5452  -0.372   0.7100  
## idb_cat1       -0.4742     0.5002  -0.948   0.3431  
## rm_cat1        -0.3333     0.4896  -0.681   0.4960  
## PCS_cambio1     1.1006     0.5129   2.146   0.0319 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 124.60  on  93  degrees of freedom
## AIC: 142.6
## 
## Number of Fisher Scoring iterations: 4

Modelos armados en base a lo hallado

modelo_1 <- glm(GROC ~ duracion_cat + Edad, data = df_comp_rf, family = binomial())

summary(modelo_1)

## 
## Call:
## glm(formula = GROC ~ duracion_cat + Edad, family = binomial(), 
##     data = df_comp_rf)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4862  -1.0966   0.8984   1.1662   1.3390  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    0.847297   0.814591   1.040   0.2983  
## duracion_cat1 -0.866969   0.415855  -2.085   0.0371 *
## Edad          -0.005596   0.016096  -0.348   0.7281  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 136.73  on  99  degrees of freedom
## AIC: 142.73
## 
## Number of Fisher Scoring iterations: 4

modelo_2 <- glm(GROC ~ duracion_cat + edad_cat, data = df_comp_rf, family = binomial())

summary(modelo_2)

## 
## Call:
## glm(formula = GROC ~ duracion_cat + edad_cat, family = binomial(), 
##     data = df_comp_rf)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5626  -1.1489   0.8361   1.1912   1.5597  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     0.8714     0.3701   2.355   0.0185 *
## duracion_cat1  -0.9388     0.4259  -2.204   0.0275 *
## edad_cat1      -0.7975     0.4592  -1.737   0.0824 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 133.75  on  99  degrees of freedom
## AIC: 139.75
## 
## Number of Fisher Scoring iterations: 4

modelo_3 <- glm(GROC ~ duracion_cat + edad_cat + PCS_cambio, data = df_comp_cat, family = binomial())

summary(modelo_3)

## 
## Call:
## glm(formula = GROC ~ duracion_cat + edad_cat + PCS_cambio, family = binomial(), 
##     data = df_comp_cat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9812  -1.0336   0.5503   0.9005   1.8278  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     0.6932     0.3818   1.816   0.0694 .
## duracion_cat1  -1.0412     0.4415  -2.358   0.0184 *
## edad_cat1      -1.1140     0.5015  -2.221   0.0263 *
## PCS_cambio1     1.1181     0.4992   2.240   0.0251 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 128.34  on  98  degrees of freedom
## AIC: 136.34
## 
## Number of Fisher Scoring iterations: 4

modelo_4 <- glm(GROC ~ duracion_cat + edad_cat + PCS_cambio + rm_cat, data = df_comp_cat, family = binomial())

summary(modelo_4)

## 
## Call:
## glm(formula = GROC ~ duracion_cat + edad_cat + PCS_cambio + rm_cat, 
##     family = binomial(), data = df_comp_cat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0709  -1.1193   0.4992   0.9964   1.6887  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     0.9102     0.4267   2.133   0.0329 *
## duracion_cat1  -1.0485     0.4447  -2.358   0.0184 *
## edad_cat1      -1.0127     0.5097  -1.987   0.0469 *
## PCS_cambio1     1.1094     0.5037   2.203   0.0276 *
## rm_cat1        -0.5292     0.4336  -1.221   0.2222  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 126.84  on  97  degrees of freedom
## AIC: 136.84
## 
## Number of Fisher Scoring iterations: 4

library(ResourceSelection)

## ResourceSelection 0.3-5   2019-07-22

df_comp_cat$GROC<-as.numeric(df_comp_cat$GROC)-1 # Hosmer Lemenshow necesita numericos

htest<-hoslem.test(df_comp_cat$GROC, modelo_3$fitted.values, g = 10)
htest # no rechaza

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  df_comp_cat$GROC, modelo_3$fitted.values
## X-squared = 12.247, df = 8, p-value = 0.1405

htest$observed

##                
## cutyhat         y0 y1
##   [0.188,0.396]  9  7
##   (0.396,0.414] 22 11
##   (0.414,0.415]  7  1
##   (0.415,0.667]  7 15
##   (0.667,0.668]  2  4
##   (0.668,0.684]  0 11
##   (0.684,0.86]   2  4

htest$expected # tienen que ser mayores a 5

##                
## cutyhat              yhat0      yhat1
##   [0.188,0.396] 11.3241759  4.6758241
##   (0.396,0.414] 19.3427108 13.6572892
##   (0.414,0.415]  4.6812801  3.3187199
##   (0.415,0.667]  7.3331133 14.6668867
##   (0.667,0.668]  1.9945440  4.0054560
##   (0.668,0.684]  3.4812485  7.5187515
##   (0.684,0.86]   0.8429274  5.1570726

sum(htest$expected<5) # Si hay menores de 5 tengo que unir grupos para que el test sea valido

## [1] 7

# Reagrupo en 6 en lugar de 10
htest<-hoslem.test(df_comp_cat$GROC, modelo_3$fitted.values, g = 6)
htest # no rechaza

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  df_comp_cat$GROC, modelo_3$fitted.values
## X-squared = 4.4763, df = 4, p-value = 0.3454

sum(htest$expected<5)

## [1] 5

Valoracion de bondad de ajuste mediante k-Folds

library(caTools)

## Warning: package 'caTools' was built under R version 4.2.1

set.seed(123)
split <- sample.split(df_comp_cat$GROC, SplitRatio = 0.80)
training_set <- subset(df_comp_cat, split == TRUE)
test_set <- subset(df_comp_cat, split == FALSE)

library(caret)

## Warning: package 'caret' was built under R version 4.2.1

## 
## Attaching package: 'caret'

## The following objects are masked from 'package:cutpointr':
## 
##     precision, recall, sensitivity, specificity

## The following object is masked from 'package:survival':
## 
##     cluster

## The following object is masked from 'package:purrr':
## 
##     lift

folds <- createFolds(df_comp_cat$GROC, k = 5)


# Regresion Logistica
cvRegresionLogistica <- lapply(folds, function(x){
  training_fold <- training_set[-x, ]
  test_fold <- training_set[x, ]
  clasificador <- glm(GROC~duracion_cat + edad_cat + PCS_cambio,
                      family =  binomial, data = training_fold)
  y_pred <- predict(clasificador, type = 'response', newdata = test_fold)
  y_pred <- ifelse(y_pred > 0.5, 1, 0)
  y_pred <- factor(y_pred, levels = c("0", "1"), labels = c("BCS no", "BCS si"))
  cm <- table(test_fold$GROC, y_pred)
  precision <- (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] +cm[1,2] + cm[2,1])
  return(precision)
})
precisionRegresionLogistica <- mean(as.numeric(cvRegresionLogistica))

precisionRegresionLogistica

## [1] 0.7266616

Valoración de bondad de ajuste mediante Curva ROC

library(ROCit)

clase<-df_comp_cat$GROC
ROC <- rocit(modelo_3$fitted.values,class=clase)
score<-modelo_3$fitted.values
plot(ROC, col="#1c61b6AA")

ROC$AUC

## [1] 0.6952253

# busco la distancia del (fpr,tpr) al (0,1)

distancia<-sqrt(ROC$FPR^2 +(ROC$TPR-1)^2)
plot(ROC$Cutoff,distancia,pch=20)

minimo<-ROC$Cutoff[which.min(distancia)] # 0.6675760

measure <- measureit(score = score, class = clase, measure = c("ACC", "SENS", "SPEC"))
measure

##      Cutoff      Depth TP FP TN FN       ACC      SENS      SPEC
## 1       Inf 0.00000000  0  0 49 53 0.4803922 0.0000000 1.0000000
## 2 0.8595121 0.05882353  4  2 47 49 0.5000000 0.0754717 0.9591837
## 3 0.6835229 0.16666667 15  2 47 38 0.6078431 0.2830189 0.9591837
## 4 0.6675760 0.22549020 19  4 45 34 0.6274510 0.3584906 0.9183673
## 5 0.6666767 0.44117647 34 11 38 19 0.7058824 0.6415094 0.7755102
## 6 0.4148400 0.51960784 35 18 31 18 0.6470588 0.6603774 0.6326531
## 7 0.4138572 0.84313725 46 40  9  7 0.5392157 0.8679245 0.1836735
## 8 0.3963231 0.92156863 50 44  5  3 0.5392157 0.9433962 0.1020408
## 9 0.1881549 1.00000000 53 49  0  0 0.5196078 1.0000000 0.0000000

Realizamos la AUC y le agregamos el IC95%

library(pROC)

rocobj <- plot.roc(df_comp_cat$GROC,modelo_3$fitted.values,
                   main="Confidence intervals", 
                   percent=TRUE,ci=TRUE,print.auc=TRUE)

ciobj <- ci.se(rocobj, specificities=seq(0, 100, 5))
#insertamos el IC
plot(ciobj, type="shape", col="#1c61b6AA")
plot(ci(rocobj, of="thresholds", thresholds="best"))

Prueba solo con estables

df_prueba<-df_comp[, c(20, 35:42)]

modelo_prueba <- glm(GROC ~ . , data = df_prueba, family = binomial())

summary(modelo_prueba)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_prueba)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9978  -1.0779   0.6016   1.0491   1.8376  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    1.08735    0.62653   1.736   0.0827 .
## T2_GROC        0.09134    0.09744   0.937   0.3486  
## nac_cat        0.56200    0.44564   1.261   0.2073  
## edad_cat1     -0.72441    0.49005  -1.478   0.1393  
## duracion_cat1 -0.83175    0.44819  -1.856   0.0635 .
## pcs_cat1       0.01756    0.47872   0.037   0.9707  
## eva_cat1      -0.06315    0.52775  -0.120   0.9047  
## idb_cat1      -0.54219    0.48400  -1.120   0.2626  
## rm_cat1       -0.42736    0.48033  -0.890   0.3736  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.25  on 101  degrees of freedom
## Residual deviance: 128.65  on  93  degrees of freedom
## AIC: 146.65
## 
## Number of Fisher Scoring iterations: 4

df_estable<-df_prueba

df_estable$estables<-ifelse(df_estable$T2_GROC > 2, 1, ifelse(df_estable$T2_GROC < -2, 1, 0))

df_estable<-df_estable[df_estable$estables == 0, ]

modelo_estable <- glm(GROC ~ . , data = df_estable, family = binomial())

summary(modelo_estable)

## 
## Call:
## glm(formula = GROC ~ ., family = binomial(), data = df_estable)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2977  -1.0486   0.0372   1.0652   1.7621  
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    0.60324    0.65960   0.915   0.3604  
## T2_GROC        0.56526    0.41385   1.366   0.1720  
## nac_cat        0.38476    0.50752   0.758   0.4484  
## edad_cat1     -1.01573    0.58773  -1.728   0.0839 .
## duracion_cat1 -0.86975    0.51567  -1.687   0.0917 .
## pcs_cat1       0.44708    0.55529   0.805   0.4208  
## eva_cat1       0.06022    0.64134   0.094   0.9252  
## idb_cat1      -0.34719    0.57257  -0.606   0.5443  
## rm_cat1       -0.19252    0.57477  -0.335   0.7377  
## estables            NA         NA      NA       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 110.904  on 79  degrees of freedom
## Residual deviance:  98.525  on 71  degrees of freedom
## AIC: 116.52
## 
## Number of Fisher Scoring iterations: 4

PCS - Análisis secundario

Villalba Federico