1 Univariate table

Data Frame Summary

datisc

Dimensions: 64 x 62
Duplicates: 0
No | Variable [type] | Stats / Values: Freqs (% of Valid) | Valid | Missing
1 | articulacion [factor] | 1. CADERA: 32 (50.0%); 2. RODILLA: 32 (50.0%) | 64 (100.0%) | 0 (0.0%)
2 | tipo_intervencion [factor] | 1. PRIMARIA: 52 (81.2%); 2. REVISION: 12 (18.8%) | 64 (100.0%) | 0 (0.0%)
3 | indicacion [factor] | 1. Artrosis: 44 (68.8%); 2. Otros: 20 (31.2%) | 64 (100.0%) | 0 (0.0%)
4 | cementada [factor] | 1. NO: 29 (45.3%); 2. Cementado: 35 (54.7%) | 64 (100.0%) | 0 (0.0%)
5 | polietileno [factor] | 1. SI: 34 (53.1%); 2. NO: 30 (46.9%) | 64 (100.0%) | 0 (0.0%)
6 | edad [numeric] | Mean (sd): 66.4 (12.2); min ≤ med ≤ max: 16 ≤ 68 ≤ 85; IQR (CV): 17.5 (0.2); 35 distinct values | 64 (100.0%) | 0 (0.0%)
7 | sexo [factor] | 1. H: 44 (68.8%); 2. M: 20 (31.2%) | 64 (100.0%) | 0 (0.0%)
8 | asa [factor] | 1. I: 7 (10.9%); 2. II: 34 (53.1%); 3. III: 22 (34.4%); 4. IV: 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
9 | ecv [factor] | 1. FALSE: 42 (65.6%); 2. TRUE: 22 (34.4%) | 64 (100.0%) | 0 (0.0%)
10 | ete_previa [factor] | 1. FALSE: 59 (92.2%); 2. TRUE: 5 (7.8%) | 64 (100.0%) | 0 (0.0%)
11 | neumopatia [factor] | 1. FALSE: 48 (75.0%); 2. TRUE: 16 (25.0%) | 64 (100.0%) | 0 (0.0%)
12 | hta [factor] | 1. FALSE: 25 (39.1%); 2. TRUE: 39 (60.9%) | 64 (100.0%) | 0 (0.0%)
13 | psiquiatrica [factor] | 1. FALSE: 62 (96.9%); 2. TRUE: 2 (3.1%) | 64 (100.0%) | 0 (0.0%)
14 | diabetes [factor] | 1. FALSE: 46 (71.9%); 2. TRUE: 18 (28.1%) | 64 (100.0%) | 0 (0.0%)
15 | imc [numeric] | Mean (sd): 31.7 (5.9); min ≤ med ≤ max: 19 ≤ 32 ≤ 51; IQR (CV): 7.2 (0.2); 25 distinct values | 64 (100.0%) | 0 (0.0%)
16 | gastropatia [factor] | 1. FALSE: 52 (81.2%); 2. TRUE: 12 (18.8%) | 64 (100.0%) | 0 (0.0%)
17 | inmunodepresion [factor] | 1. FALSE: 60 (93.8%); 2. TRUE: 4 (6.2%) | 64 (100.0%) | 0 (0.0%)
18 | malignidad [factor] | 1. FALSE: 54 (84.4%); 2. TRUE: 10 (15.6%) | 64 (100.0%) | 0 (0.0%)
19 | marcapasos [factor] | 1. FALSE: 63 (98.4%); 2. TRUE: 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
20 | renopatia [factor] | 1. FALSE: 57 (89.1%); 2. TRUE: 7 (10.9%) | 64 (100.0%) | 0 (0.0%)
21 | reumatismo [factor] | 1. FALSE: 51 (79.7%); 2. TRUE: 13 (20.3%) | 64 (100.0%) | 0 (0.0%)
22 | hepatopatia [factor] | 1. FALSE: 62 (96.9%); 2. TRUE: 2 (3.1%) | 64 (100.0%) | 0 (0.0%)
23 | iproprevia [factor] | 1. FALSE: 63 (98.4%); 2. TRUE: 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
24 | vih [factor] | 1. FALSE: 64 (100.0%) | 64 (100.0%) | 0 (0.0%)
25 | coagulopatia [factor] | 1. FALSE: 54 (84.4%); 2. TRUE: 10 (15.6%) | 64 (100.0%) | 0 (0.0%)
26 | anticoagulantes [factor] | 1. FALSE: 56 (87.5%); 2. TRUE: 8 (12.5%) | 64 (100.0%) | 0 (0.0%)
27 | antiagregacion [factor] | 1. FALSE: 48 (75.0%); 2. TRUE: 16 (25.0%) | 64 (100.0%) | 0 (0.0%)
28 | tabac2 [factor] | 1. NO: 34 (53.1%); 2. SI: 30 (46.9%) | 64 (100.0%) | 0 (0.0%)
29 | alcohol [factor] | 1. FALSE: 51 (79.7%); 2. TRUE: 13 (20.3%) | 64 (100.0%) | 0 (0.0%)
30 | seroma [factor] | 1. FALSE: 27 (42.2%); 2. TRUE: 37 (57.8%) | 64 (100.0%) | 0 (0.0%)
31 | hematoma [factor] | 1. FALSE: 46 (71.9%); 2. TRUE: 18 (28.1%) | 64 (100.0%) | 0 (0.0%)
32 | skin_infection [factor] | 1. FALSE: 35 (54.7%); 2. TRUE: 29 (45.3%) | 64 (100.0%) | 0 (0.0%)
33 | fistula [factor] | 1. FALSE: 40 (62.5%); 2. TRUE: 24 (37.5%) | 64 (100.0%) | 0 (0.0%)
34 | fiebre [factor] | 1. FALSE: 42 (65.6%); 2. TRUE: 22 (34.4%) | 64 (100.0%) | 0 (0.0%)
35 | tipo_ip [factor] | 1. AGUDA: 55 (85.9%); 2. HEMATOGENA: 9 (14.1%) | 64 (100.0%) | 0 (0.0%)
36 | diasadair [numeric] | Mean (sd): 35.1 (17.6); min ≤ med ≤ max: 13 ≤ 29 ≤ 87; IQR (CV): 20.5 (0.5); 34 distinct values | 55 (85.9%) | 9 (14.1%)
37 | dias_clinica [numeric] | Mean (sd): 15.3 (13.7); min ≤ med ≤ max: 2 ≤ 8 ≤ 43; IQR (CV): 25 (0.9); 15 distinct values | 21 (32.8%) | 43 (67.2%)
38 | pcr_sangre [numeric] | Mean (sd): 10.8 (10.1); min ≤ med ≤ max: 0.1 ≤ 7.3 ≤ 33; IQR (CV): 13.2 (0.9); 56 distinct values | 64 (100.0%) | 0 (0.0%)
39 | wbc_sangre [numeric] | Mean (sd): 9.2 (4.1); min ≤ med ≤ max: 3.2 ≤ 8.1 ≤ 24; IQR (CV): 4.4 (0.4); 48 distinct values | 64 (100.0%) | 0 (0.0%)
40 | hemocultivo [factor] | 1. FALSE: 60 (93.8%); 2. TRUE: 4 (6.2%) | 64 (100.0%) | 0 (0.0%)
41 | germen [factor] | 1. CoNS: 17 (26.6%); 2. CoNS+ E.faecalis: 1 (1.6%); 3. CoNS+ Peptoniphilus spp.: 1 (1.6%); 4. CoNS+E. faecalis: 1 (1.6%); 5. CoNS+S. lugdunensis: 1 (1.6%); 6. Corynebacterium spp.: 1 (1.6%); 7. Cultivo negativo: 4 (6.2%); 8. E. faecalis: 1 (1.6%); 9. E. faecalis+ S. marcescen: 1 (1.6%); 10. K. pneumoniae+ CoNS: 1 (1.6%); [ 14 others ]: 35 (54.7%) | 64 (100.0%) | 0 (0.0%)
42 | organism [factor] | 1. CoNS: 17 (26.6%); 2. Corynebacterium spp.: 1 (1.6%); 3. Culture negative: 4 (6.2%); 4. E. faecalis: 1 (1.6%); 5. L. monocytogenes: 1 (1.6%); 6. P. acnes: 2 (3.1%); 7. P. aeruginosa: 2 (3.1%); 8. Polymicrobial: 10 (15.6%); 9. S. dysgalactiae: 1 (1.6%); 10. S. lugdunensis: 2 (3.1%); [ 5 others ]: 23 (35.9%) | 64 (100.0%) | 0 (0.0%)
43 | cultivoporc [numeric] | Mean (sd): 61.9 (34.8); min ≤ med ≤ max: 0 ≤ 66.7 ≤ 100; IQR (CV): 66.7 (0.6); 18 distinct values | 64 (100.0%) | 0 (0.0%)
44 | resdair [factor] | 1. EXITO: 39 (60.9%); 2. FRACASO: 25 (39.1%) | 64 (100.0%) | 0 (0.0%)
45 | klic_score [numeric] | Mean (sd): 2.1 (1.8); min ≤ med ≤ max: 0 ≤ 2 ≤ 6.5; IQR (CV): 3.2 (0.9); 10 distinct values | 42 (65.6%) | 22 (34.4%)
46 | shohat [numeric] | Mean (sd): 65.2 (9.9); min ≤ med ≤ max: 44.2 ≤ 66.2 ≤ 86.9; IQR (CV): 15.4 (0.2); 62 distinct values | 64 (100.0%) | 0 (0.0%)
47 | team_main [factor] | 1. FALSE: 6 (9.4%); 2. TRUE: 58 (90.6%) | 64 (100.0%) | 0 (0.0%)
48 | er [factor] | 1. FALSE: 53 (82.8%); 2. TRUE: 11 (17.2%) | 64 (100.0%) | 0 (0.0%)
49 | exitus [factor] | 1. 2016: 1 (25.0%); 2. 2017: 1 (25.0%); 3. 2018: 1 (25.0%); 4. 2021: 1 (25.0%) | 4 (6.2%) | 60 (93.8%)
50 | edadmas70 [logical] | 1. FALSE: 39 (60.9%); 2. TRUE: 25 (39.1%) | 64 (100.0%) | 0 (0.0%)
51 | imc30 [logical] | 1. FALSE: 23 (35.9%); 2. TRUE: 41 (64.1%) | 64 (100.0%) | 0 (0.0%)
52 | asamas2 [logical] | 1. FALSE: 41 (64.1%); 2. TRUE: 23 (35.9%) | 64 (100.0%) | 0 (0.0%)
53 | pcrmas11 [logical] | 1. FALSE: 45 (70.3%); 2. TRUE: 19 (29.7%) | 64 (100.0%) | 0 (0.0%)
54 | diasdmas45 [logical] | 1. FALSE: 42 (76.4%); 2. TRUE: 13 (23.6%) | 55 (85.9%) | 9 (14.1%)
55 | diascmas12 [logical] | 1. FALSE: 13 (61.9%); 2. TRUE: 8 (38.1%) | 21 (32.8%) | 43 (67.2%)
56 | wbcmas10 [logical] | 1. FALSE: 43 (67.2%); 2. TRUE: 21 (32.8%) | 64 (100.0%) | 0 (0.0%)
57 | pcrmas3 [logical] | 1. FALSE: 15 (23.4%); 2. TRUE: 49 (76.6%) | 64 (100.0%) | 0 (0.0%)
58 | wbcmas6.8 [logical] | 1. FALSE: 19 (29.7%); 2. TRUE: 45 (70.3%) | 64 (100.0%) | 0 (0.0%)
59 | diasdmas25 [logical] | 1. FALSE: 18 (32.7%); 2. TRUE: 37 (67.3%) | 55 (85.9%) | 9 (14.1%)
60 | diascmas7 [logical] | 1. FALSE: 8 (38.1%); 2. TRUE: 13 (61.9%) | 21 (32.8%) | 43 (67.2%)
61 | dias_cd [numeric] | Mean (sd): 31.5 (18.9); min ≤ med ≤ max: 2 ≤ 27.5 ≤ 87; IQR (CV): 21.5 (0.6); 39 distinct values | 64 (100.0%) | 0 (0.0%)
62 | tabaco [factor] | 1. FALSE: 34 (53.1%); 2. TRUE: 12 (18.8%); 3. VERDADERO EX: 18 (28.1%) | 64 (100.0%) | 0 (0.0%)

Generated by summarytools 1.0.0 (R version 4.0.2)
2022-02-09

2 Plots of continuous variables

## Warning: Removed 7 rows containing non-finite values (stat_density).
## Warning: Removed 9 rows containing non-finite values (stat_density).
## Warning: Removed 43 rows containing non-finite values (stat_density).

3 Bivariate table

4 Validation of the Shohat score

4.1 Sample and preliminary parameters

A sample of 64 patients is available for shohat and 41 for klic_score. The sample size needed for a validation study is a matter of debate. Harrell et al. (1996) suggest at least 100 cases. Vergouwe et al. (2005) suggest that, for a binary outcome, these 100 cases be accompanied by at least 100 controls. These recommendations were based on the sample size needed to detect a significant difference between the observed and the pre-specified performance measures with 80% power at a 5% significance level (for example, assuming a difference of 0.1 in the C-statistic), and on the assumption that the prevalence remained constant. Recent studies have used simulation to relax these assumptions (Riley et al., 2021; Pavlou et al., 2021).

Harrell FE, Lee KL and Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15: 361–387.

Vergouwe Y, Steyerberg EW, Eijkemans MJC, et al. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 2005; 58: 475–483.

Pavlou M, Qu C, Omar RZ, Seaman SR, Steyerberg EW, White IR, Ambler G. Estimation of required sample size for external validation of risk models for binary outcomes. Statistical Methods in Medical Research. 2021 Apr

Riley RD, Debray TP, Collins GS, Archer L, Ensor J, van Smeden M, Snell KI. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Statistics in Medicine. 2021 May 24.

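## following the work of Pavlou et al. (the call below uses p=0.4, c=0.76, se_c=0.06, se_cs=0.2 and se_cl=0.15 rather than the example values quoted in the next two comments):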

#The target is to calculate the size of the validation data so as to estimate the C-statistic, the Calibration Slope and the Calibration in the Large with sufficient precision. In this example, the required precision is reflected by a SE of the estimated C-statistic of at most 0.025, and SE of the estimated Calibration Slope and Calibration in the Large of at most 0.1.

# The anticipated values of the outcome prevalence and the C-statistic are p=0. and C=0.75, respectively.

library(sampsizeval)
sampsizeval(p = 0.4, c = 0.76, se_c = 0.06, se_cs = 0.2, se_cl = 0.15)
## $size_c_statistic
## [1] 65
## 
## $size_calibration_slope
## [1] 159
## 
## $size_calibration_large
## [1] 231
## 
## $size_recommended
## [1] 231

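We conclude that with 64 patients, assuming a prevalence (incidence of failure) of 0.4, a C-statistic of 0.76 could be estimated with a precision (standard error) of about 0.06. The same output shows that the sample falls well short of the 159 and 231 patients needed to estimate the calibration slope and the calibration in the large with standard errors of 0.2 and 0.15, respectively.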
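
As a rough cross-check of the precision for the C-statistic, the sketch below uses the Hanley and McNeil approximation to the standard error of an AUC. This is a swapped-in, simpler formula and not the one implemented in sampsizeval; the function name se_auc_hanley is hypothetical. The group sizes (25 failures, 39 successes) are taken from the resdair row of the univariate table, and failure is treated as the event.

```r
# Hanley-McNeil approximation to the standard error of an AUC (C-statistic).
# auc: anticipated C-statistic; n_pos / n_neg: number of events / non-events.
se_auc_hanley <- function(auc, n_pos, n_neg) {
  q1 <- auc / (2 - auc)
  q2 <- 2 * auc^2 / (1 + auc)
  v <- (auc * (1 - auc) +
          (n_pos - 1) * (q1 - auc^2) +
          (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg)
  sqrt(v)
}

# Assumed inputs: C = 0.76, 25 failures (events) and 39 successes (non-events).
se_c <- se_auc_hanley(auc = 0.76, n_pos = 25, n_neg = 39)
print(round(se_c, 3))
```

With these inputs the approximation gives a standard error close to 0.06, in line with the size_c_statistic result above.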

4.2 Shohat score

4.2.1 Performance Score and optimal cutoff

perfScores(preds, truth = datiscs$exito, namePos = 1)
## 
##      Performance Score(s)
## 
##                        Score     Value
## 1     area under curve (AUC) 0.6994872
## 2          Gini index (GINI) 0.3989744
## 3           Brier score (BS) 0.2148645
## 4 positive Brier score (PBS) 0.1468029
## 5 negative Brier score (NBS) 0.3210406
## 6 weighted Brier score (WBS) 0.2339218
## 7 balanced Brier score (BBS) 0.2339218
optCutoff(preds, truth =  datiscs$exito, namePos = 1)
## Optimal Cut-off             YJS 
##       0.7016923       0.3928205

The optimal cut-off above is the one that maximises a single metric, Youden's J statistic (YJS). In practice the cut-off should be chosen according to the "cost" of a false positive versus a false negative, that is, according to the intended use of the score and the consequences of each type of error. Plots can also help with this choice. Whether we work, as here, with the probability predicted by the model or with a points-based scoring system is less relevant.
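
The idea of choosing a cut-off by error cost can be sketched as follows. This is only an illustration on simulated data: p_sim and y_sim are stand-ins for preds and the observed outcome, and expected_cost, cost_fp and cost_fn are hypothetical names and weights, not part of the analysis above.

```r
set.seed(1)

# Simulated stand-ins for the predicted probabilities and the outcome (1 = success).
n <- 200
p_sim <- runif(n, 0.2, 0.95)
y_sim <- rbinom(n, 1, p_sim)

# Average misclassification cost at a given cut-off.
# cost_fp and cost_fn are relative costs chosen by the analyst (assumed values).
expected_cost <- function(cutoff, pred, truth, cost_fp = 1, cost_fn = 1) {
  pred_pos <- pred > cutoff
  fp <- sum(pred_pos & truth == 0)   # predicted success, observed failure
  fn <- sum(!pred_pos & truth == 1)  # predicted failure, observed success
  (cost_fp * fp + cost_fn * fn) / length(truth)
}

cutoffs <- seq(0.05, 0.95, by = 0.05)

# Equal costs versus a false positive that is three times as costly.
cost_equal <- sapply(cutoffs, expected_cost, pred = p_sim, truth = y_sim)
cost_fp3 <- sapply(cutoffs, expected_cost, pred = p_sim, truth = y_sim,
                   cost_fp = 3)

best_equal <- cutoffs[which.min(cost_equal)]
best_fp3 <- cutoffs[which.min(cost_fp3)]
print(c(equal_costs = best_equal, fp_three_times = best_fp3))
```

Making false positives more expensive can only keep the selected cut-off where it is or push it upwards, so the cost weights translate directly into how conservative the rule is.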

4.2.2 Confusion Matrix and Statistics (threshold=0.30/0.50/0.70)

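library(caret)
# Fixing the levels keeps the table 2x2 even if every prediction falls on one side.
confusionMatrix(data = factor(as.numeric(preds > 0.70), levels = c(0, 1)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")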
## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0 21 19
##         1  4 20
##                                          
##                Accuracy : 0.6406         
##                  95% CI : (0.511, 0.7568)
##     No Information Rate : 0.6094         
##     P-Value [Acc > NIR] : 0.353528       
##                                          
##                   Kappa : 0.3185         
##                                          
##  Mcnemar's Test P-Value : 0.003509       
##                                          
##             Sensitivity : 0.5128         
##             Specificity : 0.8400         
##          Pos Pred Value : 0.8333         
##          Neg Pred Value : 0.5250         
##               Precision : 0.8333         
##                  Recall : 0.5128         
##                      F1 : 0.6349         
##              Prevalence : 0.6094         
##          Detection Rate : 0.3125         
##    Detection Prevalence : 0.3750         
##       Balanced Accuracy : 0.6764         
##                                          
##        'Positive' Class : 1              
## 
confusionMatrix(data = factor(as.numeric(preds > 0.50), levels = c(0, 1)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")
## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0 11  7
##         1 14 32
##                                           
##                Accuracy : 0.6719          
##                  95% CI : (0.5431, 0.7841)
##     No Information Rate : 0.6094          
##     P-Value [Acc > NIR] : 0.1855          
##                                           
##                   Kappa : 0.2743          
##                                           
##  Mcnemar's Test P-Value : 0.1904          
##                                           
##             Sensitivity : 0.8205          
##             Specificity : 0.4400          
##          Pos Pred Value : 0.6957          
##          Neg Pred Value : 0.6111          
##               Precision : 0.6957          
##                  Recall : 0.8205          
##                      F1 : 0.7529          
##              Prevalence : 0.6094          
##          Detection Rate : 0.5000          
##    Detection Prevalence : 0.7188          
##       Balanced Accuracy : 0.6303          
##                                           
##        'Positive' Class : 1               
## 
confusionMatrix(data = factor(as.numeric(preds > 0.30), levels = c(0, 1)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")
## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0  1  0
##         1 24 39
##                                          
##                Accuracy : 0.625          
##                  95% CI : (0.4951, 0.743)
##     No Information Rate : 0.6094         
##     P-Value [Acc > NIR] : 0.4528         
##                                          
##                   Kappa : 0.0483         
##                                          
##  Mcnemar's Test P-Value : 2.668e-06      
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.0400         
##          Pos Pred Value : 0.6190         
##          Neg Pred Value : 1.0000         
##               Precision : 0.6190         
##                  Recall : 1.0000         
##                      F1 : 0.7647         
##              Prevalence : 0.6094         
##          Detection Rate : 0.6094         
##    Detection Prevalence : 0.9844         
##       Balanced Accuracy : 0.5200         
##                                          
##        'Positive' Class : 1              
## 
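
The statistics printed by confusionMatrix can be reproduced by hand from the four cells of each table. A minimal sketch for the threshold = 0.70 table, with the cell counts typed in rather than computed from preds (the variable names are illustrative):

```r
# Cell counts copied from the threshold = 0.70 table (rows = Predicted, columns = Actual).
tp <- 20  # predicted 1, actual 1
fn <- 19  # predicted 0, actual 1
fp <- 4   # predicted 1, actual 0
tn <- 21  # predicted 0, actual 0

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv <- tp / (tp + fp)  # positive predictive value (precision)
npv <- tn / (tn + fn)  # negative predictive value
accuracy <- (tp + tn) / (tp + fn + fp + tn)
balanced_accuracy <- (sensitivity + specificity) / 2
f1 <- 2 * ppv * sensitivity / (ppv + sensitivity)

print(round(c(sensitivity = sensitivity, specificity = specificity,
              ppv = ppv, npv = npv, accuracy = accuracy,
              balanced_accuracy = balanced_accuracy, f1 = f1), 4))
```

The values agree with the printed output for that threshold (sensitivity 0.5128, specificity 0.8400, F1 0.6349).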

### ROCs, Scoring classifiers

library(ROCR)
ROCRpred <- prediction(preds, datiscs$exito)

perf <- performance(ROCRpred, "tpr", "fpr")
plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
     main= " ROC curves ...")
plot(perf, lty=3, col="grey78", add=TRUE)
abline(0,1)

perf <- performance(ROCRpred, "sens", "spec")
plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
     main="... Sensitivity/Specificity plots ...")
plot(perf, lty=3, col="grey78", add=TRUE)

# perf <- performance(ROCRpred, "prec", "rec")
# plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
#      main= "... Precision/Recall graphs (tpr/sens)->F1 metric")
# plot(perf, lty=3, col="grey78", add=TRUE)

These plots show different metrics, with a colour code indicating the possible cut-off points.
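
The area under the ROC curve summarises these plots in one number, and the Gini index reported earlier is simply 2 × AUC − 1 (0.3989744 = 2 × 0.6994872 − 1). A small sketch on simulated scores, assuming no ties, shows the rank-based (Mann-Whitney) computation; auc_rank, score and label are hypothetical names, and the data are not the study data.

```r
set.seed(2)

# Simulated scores: 40 positives with higher scores on average, 30 negatives.
score <- c(rnorm(40, mean = 1), rnorm(30, mean = 0))
label <- c(rep(1, 40), rep(0, 30))

# AUC from the rank sum of the positives (Mann-Whitney U divided by n_pos * n_neg).
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(score, label)
gini <- 2 * auc - 1

# Brute-force check: share of (positive, negative) pairs ranked correctly.
wins <- outer(score[label == 1], score[label == 0], ">")
auc_pairs <- mean(wins)

print(c(auc = auc, auc_pairs = auc_pairs, gini = gini))
```

The two AUC values coincide, which is the usual interpretation of the C-statistic: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.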

4.3 Goodness of fit: Hosmer-Lemeshow test and calibration plot

HLgof.test(fit = preds, obs = as.numeric(datiscs$exito))
## $C
## 
##  Hosmer-Lemeshow C statistic
## 
## data:  preds and as.numeric(datiscs$exito)
## X-squared = 15.676, df = 8, p-value = 0.04727
## 
## 
## $H
## 
##  Hosmer-Lemeshow H statistic
## 
## data:  preds and as.numeric(datiscs$exito)
## X-squared = 13.006, df = 8, p-value = 0.1116
plot(cfs, las=1)

## 
## n=64   Mean absolute error=0.047   Mean squared error=0.0032
## 0.9 Quantile of absolute error=0.094

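Conclusion: the fit (calibration) is not good according to the Hosmer-Lemeshow C statistic (p = 0.047 < 0.05), although the H statistic does not reach significance (p = 0.11).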
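
For reference, the decile-of-risk (C) version of the Hosmer-Lemeshow statistic can be sketched in base R as below. This is an illustrative reimplementation on simulated data, not the code behind HLgof.test; hl_c_statistic, p_sim and y_sim are hypothetical names, and the sketch assumes predicted probabilities without ties.

```r
# Hosmer-Lemeshow C statistic: group by quantiles of predicted risk and
# compare observed with expected event counts in each group.
hl_c_statistic <- function(pred, obs, g = 10) {
  breaks <- quantile(pred, probs = seq(0, 1, length.out = g + 1))
  grp <- cut(pred, breaks = breaks, include.lowest = TRUE)
  observed <- tapply(obs, grp, sum)
  expected <- tapply(pred, grp, sum)
  n <- tapply(obs, grp, length)
  chisq <- sum((observed - expected)^2 / (expected * (1 - expected / n)))
  df <- g - 2
  list(statistic = chisq, df = df,
       p.value = pchisq(chisq, df = df, lower.tail = FALSE))
}

set.seed(3)
# Simulated, well-calibrated predictions (stand-ins for preds and the outcome).
p_sim <- runif(500, 0.05, 0.95)
y_sim <- rbinom(500, 1, p_sim)

hl <- hl_c_statistic(p_sim, y_sim, g = 10)
print(hl)
```

With 10 groups the statistic is referred to a chi-squared distribution with 8 degrees of freedom, matching the df = 8 shown in the output above.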