Exploratory analysis of the database

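library(readxl)       # read_excel()
library(summarytools) # dfSummary(), ctable()

BASE <- read_excel("Anoscopia 2023-2025.xlsx")

# One-page summary of every variable, rendered as an HTML table
print(dfSummary(BASE), method = 'render')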

Data Frame Summary

BASE

Dimensions: 1145 x 9
Duplicates: 0
No  Variable  Stats / Values  Freqs (% of Valid)  Valid  Missing
1 IDENTIFICACION [character]
   1. 1                 4 (0.3%)
   2. 1010191309        3 (0.3%)
   3. 1022987036        3 (0.3%)
   4. 1033256605        3 (0.3%)
   5. 1053805912        3 (0.3%)
   6. 1068974644        3 (0.3%)
   7. 1073519382        3 (0.3%)
   8. 21021388          3 (0.3%)
   9. 3275519           3 (0.3%)
   10. 51646616         3 (0.3%)
   [ 997 others ]    1114 (97.3%)
   Valid: 1145 (100.0%)   Missing: 0 (0.0%)
2 NOMBRE [character]
   1. JUAN             59 (5.2%)
   2. JOSE             46 (4.0%)
   3. LUIS             34 (3.0%)
   4. CARLOS           30 (2.6%)
   5. DIEGO            26 (2.3%)
   6. JORGE            23 (2.0%)
   7. MARIA            22 (1.9%)
   8. JHON             18 (1.6%)
   9. DANIEL           16 (1.4%)
   10. OSCAR           16 (1.4%)
   [ 380 others ]     854 (74.7%)
   Valid: 1144 (99.9%)   Missing: 1 (0.1%)
3 APELLIDO [character]
   1. RODRIGUEZ        37 (3.2%)
   2. RAMIREZ          23 (2.0%)
   3. GONZALEZ         18 (1.6%)
   4. SANCHEZ          17 (1.5%)
   5. GOMEZ            16 (1.4%)
   6. MARTINEZ         14 (1.2%)
   7. GARCIA           12 (1.1%)
   8. GUTIERREZ        11 (1.0%)
   9. HERRERA          11 (1.0%)
   10. DIAZ            10 (0.9%)
   [ 501 others ]     971 (85.2%)
   Valid: 1140 (99.6%)   Missing: 5 (0.4%)
4 FECHA CITA [POSIXct, POSIXt]
   min : 2024-01-10 13:45:00
   med : 2024-11-13 14:20:00
   max : 2025-09-20
   range : 1y 8m 9d 10H 15M 0S
   1141 distinct values
   Valid: 1144 (99.9%)   Missing: 1 (0.1%)
5 EDAD [numeric]
   Mean (sd) : 42.4 (12.9)
   min ≤ med ≤ max : 22 ≤ 42 ≤ 66
   IQR (CV) : 24 (0.3)
   22 distinct values
   Valid: 1145 (100.0%)   Missing: 0 (0.0%)
6 TOMA DE BIOPSIA [character]
   1. NO                1 (0.1%)
   2. SI             1144 (99.9%)
   Valid: 1145 (100.0%)   Missing: 0 (0.0%)
7 DIAGNÓSTICO ENDOSCOPICO [character]
   1. CONDILOMAS                  1 (0.1%)
   2. HSIL CUADRANTE ANTERIOR    36 (3.1%)
   3. HSIL LATERAL IZQUIERDA     72 (6.3%)
   4. HSIL POSTERIOR LSIL         1 (0.1%)
   5. LSIL                     1034 (90.4%)
   Valid: 1144 (99.9%)   Missing: 1 (0.1%)
8 RESULTADO DE BIOPSIA [character]
   (value labels truncated to 25 characters by dfSummary, so distinct results can look identical)
   1. NEOPLASIA ESCAMOSA INTRAE    635 (55.5%)
   2. LESION ESCAMOSA ANAL INTR    107 (9.4%)
   3. LESION ESCAMOSA INTRAEPIT    106 (9.3%)
   4. DERECHO LESION ESCAMOSA I    105 (9.2%)
   5. LESION ESCAMOSA INTRAEPIT     94 (8.2%)
   6. NEOPLASIA ESCAMOSA INTRAE     16 (1.4%)
   7. LESION ESCAMOSA INTRAEPIT     15 (1.3%)
   8. LSIL GRADO III (DISPLASIA     15 (1.3%)
   9. NEOPLASIA ESCAMOSA INTRAE     15 (1.3%)
   10. PARED ANTERIOR: NEOPLASIA    15 (1.3%)
   [ 6 others ]                     21 (1.8%)
   Valid: 1144 (99.9%)   Missing: 1 (0.1%)
9 VIH [character]
   1. NO              275 (24.0%)
   2. SI              870 (76.0%)
   Valid: 1145 (100.0%)   Missing: 0 (0.0%)

Generated by summarytools 1.1.4 (R version 4.4.2)
2025-10-13
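The "% of Valid" column divides each frequency by the number of non-missing values of that variable, not by the 1145 rows. A quick check, using the APELLIDO frequencies copied by hand from the summary above (the object names are illustrative, and the data file is not read):

```r
# APELLIDO frequencies copied from the dfSummary output (top 10 + "others")
apellido_freq <- c(RODRIGUEZ = 37, RAMIREZ = 23, GONZALEZ = 18, SANCHEZ = 17,
                   GOMEZ = 16, MARTINEZ = 14, GARCIA = 12, GUTIERREZ = 11,
                   HERRERA = 11, DIAZ = 10, others = 971)

n_rows  <- 1145                 # rows in BASE (Dimensions: 1145 x 9)
n_valid <- sum(apellido_freq)   # non-missing APELLIDO values: 1140
n_rows - n_valid                # 5, the reported Missing count

# Percentages over valid values vs. over all rows
pct_valid <- round(100 * apellido_freq / n_valid, 1)
pct_rows  <- round(100 * apellido_freq / n_rows, 1)

pct_valid[["others"]]  # 85.2, matches the reported 85.2%
pct_rows[["others"]]   # 84.8, does not match
```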

Table 1

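Table 1 includes the variables age (EDAD) and biopsy taken (TOMA DE BIOPSIA). For text cleaning, the regular expressions work as follows:

Grado\s+II detects “Grado II” with one or more spaces between the words (Grado\sII alone allows exactly one whitespace character). LSIL[ -]II detects “LSIL-II” and “LSIL II”, because the class [ -] matches exactly one space or one hyphen; “LSIL – II” (spaces around an en dash) needs a looser pattern such as LSIL\s*[-–]?\s*II. The negative lookahead (?!I), as in Grado\s+I(?!I), keeps the “Grado I” pattern from matching the start of “Grado II”. perl = TRUE is required for this exclusion, because R's default regex engine does not support lookahead.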
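A minimal sketch of how these patterns behave with grepl(), assuming the cleaning uses base R regex functions (the actual cleaning code is not shown). The strings in x are hypothetical examples, not values from BASE, and the looser LSIL pattern is one possible alternative rather than the author's:

```r
# Hypothetical example strings (not taken from BASE); "\u2013" is an en dash
x <- c("Grado II", "Grado  II", "GradoII", "Grado I",
       "LSIL-II", "LSIL II", "LSIL \u2013 II")

# \s+ = one or more whitespace characters ("Grado\\sII" would allow exactly one)
grado_2 <- grepl("Grado\\s+II", x)

# [ -] = exactly one character, either a space or a hyphen
lsil_2_strict <- grepl("LSIL[ -]II", x)

# Looser alternative: optional spaces around an optional hyphen or en dash
lsil_2_loose <- grepl("LSIL\\s*[-\u2013]?\\s*II", x, perl = TRUE)

# Negative lookahead (?!I): match "Grado I" only when no second "I" follows;
# lookahead needs perl = TRUE (the default engine does not support it)
grado_1 <- grepl("Grado\\s+I(?!I)", x, perl = TRUE)

data.frame(x, grado_2, lsil_2_strict, lsil_2_loose, grado_1)
```

Note that Grado\s+II on its own would also match “Grado III”, so the same lookahead idea applies if both grades must be kept apart.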

Cross-tabulation of variables

Endoscopic diagnosis and HIV status vs. biopsy result

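# Keep only rows with a biopsy result. Rbiopsia (Alto Grado / Bajo Grado) is a
# derived variable; its construction is not shown in this section.
BASE_sinNA <- subset(BASE, !is.na(Rbiopsia))

# Endoscopic diagnosis vs. biopsy result, with row percentages (prop = "r")
print(ctable(
  x = BASE_sinNA$`DIAGNÓSTICO ENDOSCOPICO`,
  y = BASE_sinNA$Rbiopsia,
  prop = "r",
  headings = FALSE), method = "render")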
                                       Rbiopsia
DIAGNÓSTICO ENDOSCOPICO     Alto Grado    Bajo Grado            Total
HSIL CUADRANTE ANTERIOR     21 (58.3%)    15 (41.7%)      36 (100.0%)
HSIL LATERAL IZQUIERDA      37 (51.4%)    35 (48.6%)      72 (100.0%)
HSIL POSTERIOR LSIL          0 (0.0%)      1 (100.0%)      1 (100.0%)
LSIL                       595 (57.5%)   439 (42.5%)    1034 (100.0%)
<NA>                         0 (0.0%)      1 (100.0%)      1 (100.0%)
Total                      653 (57.1%)   491 (42.9%)    1144 (100.0%)
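With prop = "r", each cell is divided by its row total, so every row adds up to 100%. The same percentages can be reproduced in base R with prop.table(); this sketch re-enters the counts reported above by hand (the <NA> row is omitted) instead of reading the data file:

```r
# Counts copied from the ctable output above; the single <NA> diagnosis row is left out
counts <- matrix(
  c( 21,  15,
     37,  35,
      0,   1,
    595, 439),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    DIAGNOSTICO = c("HSIL CUADRANTE ANTERIOR", "HSIL LATERAL IZQUIERDA",
                    "HSIL POSTERIOR LSIL", "LSIL"),
    Rbiopsia = c("Alto Grado", "Bajo Grado")
  )
)

# Row proportions: each cell divided by its row total (what prop = "r" reports)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```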


# HIV status vs. biopsy result: row percentages, chi-square test and odds ratio
print(ctable(
  x = BASE_sinNA$VIH,
  y = BASE_sinNA$Rbiopsia,
  prop = "r",
  chisq = TRUE,
  OR = TRUE,
  headings = FALSE), method = "render")
                    Rbiopsia
VIH      Alto Grado    Bajo Grado            Total
SI      501 (57.7%)   368 (42.3%)     869 (100.0%)
NO      152 (55.3%)   123 (44.7%)     275 (100.0%)
Total   653 (57.1%)   491 (42.9%)    1144 (100.0%)

χ² = 0.3906   df = 1   p = .5320
O.R. (95% C.I.) = 1.10 (0.84 - 1.45)
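These statistics can be cross-checked from the four cell counts alone. The sketch below assumes that ctable() reports Pearson's chi-square with Yates' continuity correction (the default of chisq.test() for 2x2 tables) and a Wald 95% interval for the odds ratio on the log scale; under those assumptions it reproduces the values reported above. It is a check, not summarytools' own code. The OR is the odds of Alto Grado with VIH = SI divided by the odds with VIH = NO:

```r
# 2x2 counts copied from the ctable output above (rows: VIH, columns: Rbiopsia)
tab <- matrix(c(501, 368,
                152, 123),
              nrow = 2, byrow = TRUE,
              dimnames = list(VIH = c("SI", "NO"),
                              Rbiopsia = c("Alto Grado", "Bajo Grado")))

# Pearson chi-square; chisq.test() applies Yates' correction to 2x2 tables by default
chi <- chisq.test(tab)
chi$statistic  # X-squared, about 0.3906
chi$p.value    # about 0.532

# Odds ratio (a*d)/(b*c) with a Wald 95% CI: log(OR) +/- z * sqrt(sum of 1/cell)
or     <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
se_log <- sqrt(sum(1 / tab))
ci     <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log)
round(c(OR = or, lower = ci[1], upper = ci[2]), 2)  # 1.10, 0.84, 1.45
```

Running chisq.test(tab, correct = FALSE) instead gives about 0.48, so the continuity correction is what brings the statistic to the reported 0.3906.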
