1 Univariate table

Data Frame Summary

datisc

Dimensions: 64 x 62
Duplicates: 0

Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
articulacion [factor] | 1. CADERA; 2. RODILLA | 32 (50.0%); 32 (50.0%) | 64 (100.0%) | 0 (0.0%)
tipo_intervencion [factor] | 1. PRIMARIA; 2. REVISION | 52 (81.2%); 12 (18.8%) | 64 (100.0%) | 0 (0.0%)
indicacion [factor] | 1. Artrosis; 2. Otros | 44 (68.8%); 20 (31.2%) | 64 (100.0%) | 0 (0.0%)
cementada [factor] | 1. NO; 2. Cementado | 29 (45.3%); 35 (54.7%) | 64 (100.0%) | 0 (0.0%)
polietileno [factor] | 1. SI; 2. NO | 34 (53.1%); 30 (46.9%) | 64 (100.0%) | 0 (0.0%)
edad [numeric] | Mean (sd): 66.4 (12.2); min ≤ med ≤ max: 16 ≤ 68 ≤ 85; IQR (CV): 17.5 (0.2) | 35 distinct values | 64 (100.0%) | 0 (0.0%)
sexo [factor] | 1. H; 2. M | 44 (68.8%); 20 (31.2%) | 64 (100.0%) | 0 (0.0%)
asa [factor] | 1. I; 2. II; 3. III; 4. IV | 7 (10.9%); 34 (53.1%); 22 (34.4%); 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
ecv [factor] | 1. FALSE; 2. TRUE | 42 (65.6%); 22 (34.4%) | 64 (100.0%) | 0 (0.0%)
ete_previa [factor] | 1. FALSE; 2. TRUE | 59 (92.2%); 5 (7.8%) | 64 (100.0%) | 0 (0.0%)
neumopatia [factor] | 1. FALSE; 2. TRUE | 48 (75.0%); 16 (25.0%) | 64 (100.0%) | 0 (0.0%)
hta [factor] | 1. FALSE; 2. TRUE | 25 (39.1%); 39 (60.9%) | 64 (100.0%) | 0 (0.0%)
psiquiatrica [factor] | 1. FALSE; 2. TRUE | 62 (96.9%); 2 (3.1%) | 64 (100.0%) | 0 (0.0%)
diabetes [factor] | 1. FALSE; 2. TRUE | 46 (71.9%); 18 (28.1%) | 64 (100.0%) | 0 (0.0%)
imc [numeric] | Mean (sd): 31.7 (5.9); min ≤ med ≤ max: 19 ≤ 32 ≤ 51; IQR (CV): 7.2 (0.2) | 25 distinct values | 64 (100.0%) | 0 (0.0%)
gastropatia [factor] | 1. FALSE; 2. TRUE | 52 (81.2%); 12 (18.8%) | 64 (100.0%) | 0 (0.0%)
inmunodepresion [factor] | 1. FALSE; 2. TRUE | 60 (93.8%); 4 (6.2%) | 64 (100.0%) | 0 (0.0%)
malignidad [factor] | 1. FALSE; 2. TRUE | 54 (84.4%); 10 (15.6%) | 64 (100.0%) | 0 (0.0%)
marcapasos [factor] | 1. FALSE; 2. TRUE | 63 (98.4%); 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
renopatia [factor] | 1. FALSE; 2. TRUE | 57 (89.1%); 7 (10.9%) | 64 (100.0%) | 0 (0.0%)
reumatismo [factor] | 1. FALSE; 2. TRUE | 51 (79.7%); 13 (20.3%) | 64 (100.0%) | 0 (0.0%)
hepatopatia [factor] | 1. FALSE; 2. TRUE | 62 (96.9%); 2 (3.1%) | 64 (100.0%) | 0 (0.0%)
iproprevia [factor] | 1. FALSE; 2. TRUE | 63 (98.4%); 1 (1.6%) | 64 (100.0%) | 0 (0.0%)
vih [factor] | 1. FALSE | 64 (100.0%) | 64 (100.0%) | 0 (0.0%)
coagulopatia [factor] | 1. FALSE; 2. TRUE | 54 (84.4%); 10 (15.6%) | 64 (100.0%) | 0 (0.0%)
anticoagulantes [factor] | 1. FALSE; 2. TRUE | 56 (87.5%); 8 (12.5%) | 64 (100.0%) | 0 (0.0%)
antiagregacion [factor] | 1. FALSE; 2. TRUE | 48 (75.0%); 16 (25.0%) | 64 (100.0%) | 0 (0.0%)
tabac2 [factor] | 1. NO; 2. SI | 34 (53.1%); 30 (46.9%) | 64 (100.0%) | 0 (0.0%)
alcohol [factor] | 1. FALSE; 2. TRUE | 51 (79.7%); 13 (20.3%) | 64 (100.0%) | 0 (0.0%)
seroma [factor] | 1. FALSE; 2. TRUE | 27 (42.2%); 37 (57.8%) | 64 (100.0%) | 0 (0.0%)
hematoma [factor] | 1. FALSE; 2. TRUE | 46 (71.9%); 18 (28.1%) | 64 (100.0%) | 0 (0.0%)
skin_infection [factor] | 1. FALSE; 2. TRUE | 35 (54.7%); 29 (45.3%) | 64 (100.0%) | 0 (0.0%)
fistula [factor] | 1. FALSE; 2. TRUE | 40 (62.5%); 24 (37.5%) | 64 (100.0%) | 0 (0.0%)
fiebre [factor] | 1. FALSE; 2. TRUE | 42 (65.6%); 22 (34.4%) | 64 (100.0%) | 0 (0.0%)
tipo_ip [factor] | 1. AGUDA; 2. HEMATOGENA | 55 (85.9%); 9 (14.1%) | 64 (100.0%) | 0 (0.0%)
diasadair [numeric] | Mean (sd): 35.1 (17.6); min ≤ med ≤ max: 13 ≤ 29 ≤ 87; IQR (CV): 20.5 (0.5) | 34 distinct values | 55 (85.9%) | 9 (14.1%)
dias_clinica [numeric] | Mean (sd): 15.3 (13.7); min ≤ med ≤ max: 2 ≤ 8 ≤ 43; IQR (CV): 25 (0.9) | 15 distinct values | 21 (32.8%) | 43 (67.2%)
pcr_sangre [numeric] | Mean (sd): 10.8 (10.1); min ≤ med ≤ max: 0.1 ≤ 7.3 ≤ 33; IQR (CV): 13.2 (0.9) | 56 distinct values | 64 (100.0%) | 0 (0.0%)
wbc_sangre [numeric] | Mean (sd): 9.2 (4.1); min ≤ med ≤ max: 3.2 ≤ 8.1 ≤ 24; IQR (CV): 4.4 (0.4) | 48 distinct values | 64 (100.0%) | 0 (0.0%)
hemocultivo [factor] | 1. FALSE; 2. TRUE | 60 (93.8%); 4 (6.2%) | 64 (100.0%) | 0 (0.0%)
germen [factor] | 1. CoNS; 2. CoNS+ E.faecalis; 3. CoNS+ Peptoniphilus spp.; 4. CoNS+E. faecalis; 5. CoNS+S. lugdunensis; 6. Corynebacterium spp.; 7. Cultivo negativo; 8. E. faecalis; 9. E. faecalis+ S. marcescen; 10. K. pneumoniae+ CoNS; [ 14 others ] | 17 (26.6%); 1 (1.6%); 1 (1.6%); 1 (1.6%); 1 (1.6%); 1 (1.6%); 4 (6.2%); 1 (1.6%); 1 (1.6%); 1 (1.6%); 35 (54.7%) | 64 (100.0%) | 0 (0.0%)
organism [factor] | 1. CoNS; 2. Corynebacterium spp.; 3. Culture negative; 4. E. faecalis; 5. L. monocytogenes; 6. P. acnes; 7. P. aeruginosa; 8. Polymicrobial; 9. S. dysgalactiae; 10. S. lugdunensis; [ 5 others ] | 17 (26.6%); 1 (1.6%); 4 (6.2%); 1 (1.6%); 1 (1.6%); 2 (3.1%); 2 (3.1%); 10 (15.6%); 1 (1.6%); 2 (3.1%); 23 (35.9%) | 64 (100.0%) | 0 (0.0%)
cultivoporc [numeric] | Mean (sd): 61.9 (34.8); min ≤ med ≤ max: 0 ≤ 66.7 ≤ 100; IQR (CV): 66.7 (0.6) | 18 distinct values | 64 (100.0%) | 0 (0.0%)
resdair [factor] | 1. EXITO; 2. FRACASO | 39 (60.9%); 25 (39.1%) | 64 (100.0%) | 0 (0.0%)
klic_score [numeric] | Mean (sd): 2.1 (1.8); min ≤ med ≤ max: 0 ≤ 2 ≤ 6.5; IQR (CV): 3.2 (0.9) | 10 distinct values | 42 (65.6%) | 22 (34.4%)
shohat [numeric] | Mean (sd): 65.2 (9.9); min ≤ med ≤ max: 44.2 ≤ 66.2 ≤ 86.9; IQR (CV): 15.4 (0.2) | 62 distinct values | 64 (100.0%) | 0 (0.0%)
team_main [factor] | 1. FALSE; 2. TRUE | 6 (9.4%); 58 (90.6%) | 64 (100.0%) | 0 (0.0%)
er [factor] | 1. FALSE; 2. TRUE | 53 (82.8%); 11 (17.2%) | 64 (100.0%) | 0 (0.0%)
exitus [factor] | 1. 2016; 2. 2017; 3. 2018; 4. 2021 | 1 (25.0%); 1 (25.0%); 1 (25.0%); 1 (25.0%) | 4 (6.2%) | 60 (93.8%)
edadmas70 [logical] | 1. FALSE; 2. TRUE | 39 (60.9%); 25 (39.1%) | 64 (100.0%) | 0 (0.0%)
imc30 [logical] | 1. FALSE; 2. TRUE | 23 (35.9%); 41 (64.1%) | 64 (100.0%) | 0 (0.0%)
asamas2 [logical] | 1. FALSE; 2. TRUE | 41 (64.1%); 23 (35.9%) | 64 (100.0%) | 0 (0.0%)
pcrmas11 [logical] | 1. FALSE; 2. TRUE | 45 (70.3%); 19 (29.7%) | 64 (100.0%) | 0 (0.0%)
diasdmas45 [logical] | 1. FALSE; 2. TRUE | 42 (76.4%); 13 (23.6%) | 55 (85.9%) | 9 (14.1%)
diascmas12 [logical] | 1. FALSE; 2. TRUE | 13 (61.9%); 8 (38.1%) | 21 (32.8%) | 43 (67.2%)
wbcmas10 [logical] | 1. FALSE; 2. TRUE | 43 (67.2%); 21 (32.8%) | 64 (100.0%) | 0 (0.0%)
pcrmas3 [logical] | 1. FALSE; 2. TRUE | 15 (23.4%); 49 (76.6%) | 64 (100.0%) | 0 (0.0%)
wbcmas6.8 [logical] | 1. FALSE; 2. TRUE | 19 (29.7%); 45 (70.3%) | 64 (100.0%) | 0 (0.0%)
diasdmas25 [logical] | 1. FALSE; 2. TRUE | 18 (32.7%); 37 (67.3%) | 55 (85.9%) | 9 (14.1%)
diascmas7 [logical] | 1. FALSE; 2. TRUE | 8 (38.1%); 13 (61.9%) | 21 (32.8%) | 43 (67.2%)
dias_cd [numeric] | Mean (sd): 31.5 (18.9); min ≤ med ≤ max: 2 ≤ 27.5 ≤ 87; IQR (CV): 21.5 (0.6) | 39 distinct values | 64 (100.0%) | 0 (0.0%)
tabaco [factor] | 1. FALSE; 2. TRUE; 3. VERDADERO EX | 34 (53.1%); 12 (18.8%); 18 (28.1%) | 64 (100.0%) | 0 (0.0%)

Generated by summarytools 1.0.0 (R version 4.0.2)
2022-02-09

2 Plots of continuous variables

## Warning: Removed 7 rows containing non-finite values (stat_density).

## Warning: Removed 9 rows containing non-finite values (stat_density).

## Warning: Removed 43 rows containing non-finite values (stat_density).

3 Bivariate table

4 Validation of the Shohat score

4.1 Sample and preliminary parameters

A sample of 64 patients is available for the Shohat score and 41 for klic_score. The sample size needed for a validation study is a matter of debate. Harrell et al. suggest at least 100 events (Harrell et al., 1996). Vergouwe et al. recommend that, for a binary outcome, these 100 events be accompanied by at least 100 non-events (Vergouwe et al., 2005). These recommendations were based on the sample size needed to detect a significant difference between the observed and the pre-specified performance measures with 80% power at a 5% significance level (for example, assuming a difference of 0.1 in the C-statistic), and on the assumption that the prevalence remains constant. Recent simulation studies have relaxed these assumptions (Riley et al., 2021; Pavlou et al., 2021).

Harrell FE, Lee KL and Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15: 361–387.

Vergouwe Y, Steyerberg EW, Eijkemans MJC, et al. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 2005; 58: 475–483.

Pavlou M, Qu C, Omar RZ, Seaman SR, Steyerberg EW, White IR, Ambler G. Estimation of required sample size for external validation of risk models for binary outcomes. Statistical Methods in Medical Research. 2021 Apr

Riley RD, Debray TP, Collins GS, Archer L, Ensor J, van Smeden M, Snell KI. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Statistics in Medicine. 2021 May 24.

## Following the work of Pavlou et al.:

# The target is to calculate the size of the validation data so as to estimate the C-statistic, the Calibration Slope and the Calibration in the Large with sufficient precision. In the call below, the required precision is a SE of the estimated C-statistic of at most 0.06, a SE of the estimated Calibration Slope of at most 0.2 and a SE of the estimated Calibration in the Large of at most 0.15.

# The anticipated values of the outcome prevalence and the C-statistic in the call below are p = 0.4 and C = 0.76, respectively.

library(sampsizeval)  # provides sampsizeval()

sampsizeval(p = 0.4, c = 0.76, se_c = 0.06, se_cs = 0.2, se_cl = 0.15)

## $size_c_statistic
## [1] 65
## 
## $size_calibration_slope
## [1] 159
## 
## $size_calibration_large
## [1] 231
## 
## $size_recommended
## [1] 231

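We conclude that with 64 patients, assuming a prevalence (incidence of failure) of 0.4, a C-statistic of 0.76 could be estimated with a standard error of about 0.06 (65 patients required). The sample is well below the sizes required for the calibration slope (159) and the calibration in the large (231) at the precision requested above.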
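As a rough cross-check of the precision of the C-statistic, the sketch below applies the Hanley-McNeil approximation to its standard error. This is not the formula used by `sampsizeval()`; it is a different, simpler approximation. The helper name `se_auc_hanley` is hypothetical, and the failure count (25 of 64) is taken from the `resdair` row of the univariate table.

```r
# Hanley-McNeil approximation to the standard error of an AUC / C-statistic.
# NOTE: an independent cross-check, not the method implemented in sampsizeval().
se_auc_hanley <- function(auc, n_pos, n_neg) {
  q1 <- auc / (2 - auc)          # P(two random positives both outrank one negative)
  q2 <- 2 * auc^2 / (1 + auc)    # P(one random positive outranks two negatives)
  sqrt((auc * (1 - auc) +
          (n_pos - 1) * (q1 - auc^2) +
          (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg))
}

n      <- 64
n_fail <- 25            # FRACASO in resdair (about 0.39 of the sample)
n_succ <- n - n_fail    # EXITO

se_c <- se_auc_hanley(auc = 0.76, n_pos = n_fail, n_neg = n_succ)
round(se_c, 3)          # of the order of 0.06

# Events available versus the "at least 100 events" rule of thumb
n_fail / 100
```

Under these assumptions the approximation agrees with the order of magnitude requested in the `sampsizeval()` call (SE of about 0.06), while the number of events is roughly a quarter of the 100 suggested by the older rules of thumb.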

4.2 shohat

4.2.1 Performance Score and optimal cutoff

perfScores(preds, truth = datiscs$exito, namePos = 1)

## 
##      Performance Score(s)
## 
##                        Score     Value
## 1     area under curve (AUC) 0.6994872
## 2          Gini index (GINI) 0.3989744
## 3           Brier score (BS) 0.2148645
## 4 positive Brier score (PBS) 0.1468029
## 5 negative Brier score (NBS) 0.3210406
## 6 weighted Brier score (WBS) 0.2339218
## 7 balanced Brier score (BBS) 0.2339218
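The scores in this table are related to one another, which gives a quick consistency check. The sketch below uses only the numbers printed above and the class counts from the `resdair` row (39 successes, 25 failures). Treating class 1 as the 39 successes is an assumption, although it agrees with the prevalence of 0.6094 in the confusion matrices.

```r
# Values copied from the perfScores() output
auc  <- 0.6994872
gini <- 0.3989744
bs   <- 0.2148645   # overall Brier score
pbs  <- 0.1468029   # Brier score among positives
nbs  <- 0.3210406   # Brier score among negatives
bbs  <- 0.2339218   # balanced Brier score

n_pos <- 39         # assumed: positives = EXITO
n_neg <- 25         # assumed: negatives = FRACASO

# Gini index is a rescaling of the AUC
gini_check <- 2 * auc - 1

# Overall Brier score is the class-size-weighted mean of PBS and NBS
bs_check <- (n_pos * pbs + n_neg * nbs) / (n_pos + n_neg)

# Balanced Brier score is the unweighted mean of PBS and NBS
bbs_check <- (pbs + nbs) / 2

round(c(gini = gini_check, bs = bs_check, bbs = bbs_check), 7)
```

The Brier score among negatives (0.32) is more than twice that among positives (0.15), which suggests the predictions are less accurate for the failures than for the successes.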

optCutoff(preds, truth =  datiscs$exito, namePos = 1)

## Optimal Cut-off             YJS 
##       0.7016923       0.3928205

This is the optimal cut-off according to one particular metric (to be checked; the output label YJS indicates Youden's J statistic). In practice, the cut-off should be chosen according to the "cost" of a false positive versus a false negative, that is, according to the intended use of the score and the consequences of each type of error. Plots can also help with this choice. Whether we work with the probability predicted by the model, as here, or with a points-based scoring system is of minor importance.
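To illustrate how the chosen cut-off depends on the relative cost of the two errors, here is a minimal base-R sketch on made-up data. The helper `scan_cutoffs`, the toy vectors and the cost weights are all hypothetical and are not part of the analysis above.

```r
# Hypothetical helper: evaluate every observed probability as a cut-off
# (predict 1 when prob >= cut-off) and score it in two ways.
scan_cutoffs <- function(prob, truth, cost_fp = 1, cost_fn = 1) {
  cuts <- sort(unique(prob))
  res <- t(sapply(cuts, function(k) {
    pred <- as.integer(prob >= k)
    tp <- sum(pred == 1 & truth == 1)
    fn <- sum(pred == 0 & truth == 1)
    tn <- sum(pred == 0 & truth == 0)
    fp <- sum(pred == 1 & truth == 0)
    sens <- tp / (tp + fn)
    spec <- tn / (tn + fp)
    c(cutoff = k, sens = sens, spec = spec,
      youden = sens + spec - 1,               # Youden's J statistic
      cost   = cost_fp * fp + cost_fn * fn)   # total misclassification cost
  }))
  as.data.frame(res)
}

# Toy data (not the study data)
prob  <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)
truth <- c(0,   0,   1,   0,   1,   0,   1,   1)

equal   <- scan_cutoffs(prob, truth)
fn_cost <- scan_cutoffs(prob, truth, cost_fp = 1, cost_fn = 5)
fp_cost <- scan_cutoffs(prob, truth, cost_fp = 5, cost_fn = 1)

cut_youden <- equal$cutoff[which.max(equal$youden)]     # first maximum of J
cut_fn     <- fn_cost$cutoff[which.min(fn_cost$cost)]   # false negatives costly
cut_fp     <- fp_cost$cutoff[which.min(fp_cost$cost)]   # false positives costly
c(youden = cut_youden, fn_costly = cut_fn, fp_costly = cut_fp)
```

On this toy example the preferred cut-off moves from 0.3 to 0.8 when false positives are weighted five times more heavily than false negatives, which is the reason a single "optimal" value should not be reported without stating the intended use.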

4.2.2 Confusion Matrix and Statistics (threshold=0.30/0.50/0.70)

confusionMatrix(data = factor(as.numeric(preds > 0.70)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0 21 19
##         1  4 20
##                                          
##                Accuracy : 0.6406         
##                  95% CI : (0.511, 0.7568)
##     No Information Rate : 0.6094         
##     P-Value [Acc > NIR] : 0.353528       
##                                          
##                   Kappa : 0.3185         
##                                          
##  Mcnemar's Test P-Value : 0.003509       
##                                          
##             Sensitivity : 0.5128         
##             Specificity : 0.8400         
##          Pos Pred Value : 0.8333         
##          Neg Pred Value : 0.5250         
##               Precision : 0.8333         
##                  Recall : 0.5128         
##                      F1 : 0.6349         
##              Prevalence : 0.6094         
##          Detection Rate : 0.3125         
##    Detection Prevalence : 0.3750         
##       Balanced Accuracy : 0.6764         
##                                          
##        'Positive' Class : 1              
##

confusionMatrix(data = factor(as.numeric(preds > 0.50)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0 11  7
##         1 14 32
##                                           
##                Accuracy : 0.6719          
##                  95% CI : (0.5431, 0.7841)
##     No Information Rate : 0.6094          
##     P-Value [Acc > NIR] : 0.1855          
##                                           
##                   Kappa : 0.2743          
##                                           
##  Mcnemar's Test P-Value : 0.1904          
##                                           
##             Sensitivity : 0.8205          
##             Specificity : 0.4400          
##          Pos Pred Value : 0.6957          
##          Neg Pred Value : 0.6111          
##               Precision : 0.6957          
##                  Recall : 0.8205          
##                      F1 : 0.7529          
##              Prevalence : 0.6094          
##          Detection Rate : 0.5000          
##    Detection Prevalence : 0.7188          
##       Balanced Accuracy : 0.6303          
##                                           
##        'Positive' Class : 1               
##

confusionMatrix(data = factor(as.numeric(preds > 0.30)),
                reference = factor(datiscs$exito),
                dnn = c("Predicted", "Actual"),
                mode = "everything",
                positive = "1")

## Confusion Matrix and Statistics
## 
##          Actual
## Predicted  0  1
##         0  1  0
##         1 24 39
##                                          
##                Accuracy : 0.625          
##                  95% CI : (0.4951, 0.743)
##     No Information Rate : 0.6094         
##     P-Value [Acc > NIR] : 0.4528         
##                                          
##                   Kappa : 0.0483         
##                                          
##  Mcnemar's Test P-Value : 2.668e-06      
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.0400         
##          Pos Pred Value : 0.6190         
##          Neg Pred Value : 1.0000         
##               Precision : 0.6190         
##                  Recall : 1.0000         
##                      F1 : 0.7647         
##              Prevalence : 0.6094         
##          Detection Rate : 0.6094         
##    Detection Prevalence : 0.9844         
##       Balanced Accuracy : 0.5200         
##                                          
##        'Positive' Class : 1              
##
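The statistics reported by `confusionMatrix()` can be reproduced by hand from the four cell counts. As a worked check, the sketch below uses the counts from the threshold = 0.70 table (positive class = 1); the variable names are illustrative only.

```r
# Cell counts from the threshold = 0.70 table (rows = Predicted, cols = Actual)
tn <- 21; fn <- 19   # predicted 0: actual 0, actual 1
fp <- 4;  tp <- 20   # predicted 1: actual 0, actual 1
n  <- tp + tn + fp + fn

sens <- tp / (tp + fn)               # sensitivity (recall)
spec <- tn / (tn + fp)               # specificity
ppv  <- tp / (tp + fp)               # positive predictive value (precision)
npv  <- tn / (tn + fn)               # negative predictive value
acc  <- (tp + tn) / n                # accuracy
f1   <- 2 * ppv * sens / (ppv + sens)
bal  <- (sens + spec) / 2            # balanced accuracy

# Cohen's kappa: observed agreement corrected for chance agreement
p_obs    <- acc
p_chance <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kappa    <- (p_obs - p_chance) / (1 - p_chance)

round(c(sens = sens, spec = spec, ppv = ppv, npv = npv,
        acc = acc, f1 = f1, bal = bal, kappa = kappa), 4)
```

The same arithmetic applied to the other two tables shows the trade-off: lowering the threshold from 0.70 to 0.30 raises sensitivity from 0.51 to 1.00 while specificity falls from 0.84 to 0.04.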

### ROCs, Scoring classifiers

ROCRpred = prediction(preds, datiscs$exito)

perf <- performance(ROCRpred, "tpr", "fpr")
plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
     main= " ROC curves ...")
plot(perf, lty=3, col="grey78", add=TRUE)
abline(0,1)

perf <- performance(ROCRpred, "sens", "spec")
plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
     main="... Sensitivity/Specificity plots ...")
plot(perf, lty=3, col="grey78", add=TRUE)

# perf <- performance(ROCRpred, "prec", "rec")
# plot(perf, avg= "threshold", colorize=TRUE, lwd= 3,
#      main= "... Precision/Recall graphs (tpr/sens)->F1 metric")
# plot(perf, lty=3, col="grey78", add=TRUE)

These plots show different metrics, with a colour code for the candidate cut-off points.

4.3 Goodness of fit: Hosmer-Lemeshow test and calibration plot

 HLgof.test(fit = preds, obs = as.numeric(datiscs$exito))

## $C
## 
##  Hosmer-Lemeshow C statistic
## 
## data:  preds and as.numeric(datiscs$exito)
## X-squared = 15.676, df = 8, p-value = 0.04727
## 
## 
## $H
## 
##  Hosmer-Lemeshow H statistic
## 
## data:  preds and as.numeric(datiscs$exito)
## X-squared = 13.006, df = 8, p-value = 0.1116
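For readers unfamiliar with the test, the following base-R sketch shows the usual construction of a Hosmer-Lemeshow statistic with groups defined by quantiles of the predicted probability. It runs on simulated data, the helper `hl_quantile_test` is hypothetical, and it is not guaranteed to reproduce the grouping rules of `HLgof.test()` exactly.

```r
# Hypothetical helper: Hosmer-Lemeshow statistic with g quantile-based groups.
hl_quantile_test <- function(prob, y, g = 10) {
  breaks <- quantile(prob, probs = seq(0, 1, length.out = g + 1))
  grp    <- cut(prob, breaks = breaks, include.lowest = TRUE)
  obs    <- tapply(y, grp, sum)        # observed events per group
  expd   <- tapply(prob, grp, sum)     # expected events per group
  n_g    <- tapply(y, grp, length)     # group sizes
  # sum of (O - E)^2 / (n * pbar * (1 - pbar)), with pbar = E / n
  stat   <- sum((obs - expd)^2 / (expd * (1 - expd / n_g)))
  df     <- g - 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Simulated, well-calibrated predictions (not the study data)
set.seed(1)
prob <- runif(200, min = 0.05, max = 0.95)
y    <- rbinom(200, size = 1, prob = prob)

res <- hl_quantile_test(prob, y)
str(res)
```

With 10 groups the reference distribution has 8 degrees of freedom, as in the output above. With only 64 patients each group holds about six observations, so the test result is sensitive to how the groups are formed, which may explain why the C and H statistics disagree here.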

plot(cfs, las=1)

## 
## n=64   Mean absolute error=0.047   Mean squared error=0.0032
## 0.9 Quantile of absolute error=0.094
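The calibration in the large and the calibration slope mentioned in section 4.1 can be estimated with two logistic regressions on the logit of the predicted probability. The sketch below uses simulated predictions that are deliberately miscalibrated; all object names and the simulation settings are assumptions for illustration, not the study data.

```r
set.seed(42)
n       <- 2000
lp_true <- rnorm(n)                       # true linear predictor
y       <- rbinom(n, size = 1, prob = plogis(lp_true))

# Hypothetical model whose predictions are too extreme and too high
prob <- plogis(0.5 + 1.5 * lp_true)
lp   <- qlogis(prob)                      # logit of the predicted probability

# Calibration in the large: intercept with the logit fixed as an offset
citl <- unname(coef(glm(y ~ offset(lp), family = binomial))[1])

# Calibration slope: coefficient of the logit (ideal value 1)
slope <- unname(coef(glm(y ~ lp, family = binomial))[2])

round(c(calibration_in_the_large = citl, calibration_slope = slope), 2)
```

A slope below 1 indicates predictions that are too extreme, and a negative calibration in the large indicates predictions that are too high on average. Both quantities are easier to interpret than a single goodness-of-fit p-value, although with 64 patients their standard errors would be wide.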

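Conclusion: calibration is not good. The Hosmer-Lemeshow C statistic rejects adequate fit (p = 0.047, below 0.05), although the H statistic does not reach significance (p = 0.112).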

DAIR outcome
