Problemática de investigación

La inasistencia a citas médicas es un problema recurrente en los hospitales, que afecta la eficiencia en el uso de recursos y genera barreras en el acceso oportuno a la atención.
Pregunta de investigación: ¿Qué factores influyen en la asistencia a citas médicas y cómo impactan en la eficiencia hospitalaria?

#Cargar librerias#

Importar base de datos

## Rows: 111488 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): appointment_id, slot_id, status, patient_id, sex, age_group
## dbl  (4): scheduling_interval, appointment_duration, waiting_time, age
## date (2): scheduling_date, appointment_date
## time (4): appointment_time, check_in_time, start_time, end_time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Crear variable no_show

appointments <- appointments %>% mutate(no_show = ifelse(status == “did not attend”, 1, 0))


# Ver valores únicos de la columna 'status'
unique(appointments$status)

# Ver estructura de la base
str(appointments)

# Contar valores faltantes por columna
colSums(is.na(appointments))


# Exploración inicial

``` r
unique(appointments$status)

## [1] "did not attend" "attended"       "cancelled"      "unknown"       
## [5] "scheduled"

str(appointments)

## tibble [111,488 × 17] (S3: tbl_df/tbl/data.frame)
##  $ appointment_id      : chr [1:111488] "0000138" "0000146" "0000021" "0000233" ...
##  $ slot_id             : chr [1:111488] "0000001" "0000023" "0000024" "0000025" ...
##  $ scheduling_date     : Date[1:111488], format: "2014-12-28" "2014-12-29" ...
##  $ appointment_date    : Date[1:111488], format: "2015-01-01" "2015-01-01" ...
##  $ appointment_time    : 'hms' num [1:111488] 08:00:00 13:30:00 13:45:00 14:00:00 ...
##   ..- attr(*, "units")= chr "secs"
##  $ scheduling_interval : num [1:111488] 4 3 15 1 6 2 2 2 4 4 ...
##  $ status              : chr [1:111488] "did not attend" "did not attend" "attended" "attended" ...
##  $ check_in_time       : 'hms' num [1:111488] NA NA 13:36:45 13:59:32 ...
##   ..- attr(*, "units")= chr "secs"
##  $ appointment_duration: num [1:111488] NA NA 5.2 28.9 NA 7.7 4.2 27.1 NA 1.2 ...
##  $ start_time          : 'hms' num [1:111488] NA NA 13:37:57 14:00:40 ...
##   ..- attr(*, "units")= chr "secs"
##  $ end_time            : 'hms' num [1:111488] NA NA 13:43:09 14:29:34 ...
##   ..- attr(*, "units")= chr "secs"
##  $ waiting_time        : num [1:111488] NA NA 1.2 1.1 NA 21.7 16.2 1 NA 8.5 ...
##  $ patient_id          : chr [1:111488] "08285" "05972" "06472" "05376" ...
##  $ sex                 : chr [1:111488] "Male" "Male" "Male" "Female" ...
##  $ age                 : num [1:111488] 37 84 77 37 72 51 28 33 29 90 ...
##  $ age_group           : chr [1:111488] "35-39" "80-84" "75-79" "35-39" ...
##  $ no_show             : num [1:111488] 1 1 0 0 0 0 0 0 0 0 ...

colSums(is.na(appointments))

##       appointment_id              slot_id      scheduling_date 
##                    0                    0                    0 
##     appointment_date     appointment_time  scheduling_interval 
##                    0                    0                    0 
##               status        check_in_time appointment_duration 
##                    0                25456                25456 
##           start_time             end_time         waiting_time 
##                25456                25456                25456 
##           patient_id                  sex                  age 
##                    0                    0                    0 
##            age_group              no_show 
##                    0                    0

Ver estructura de la base

str(appointments)

Contar valores faltantes por columna

colSums(is.na(appointments))

#ESTADISTICAS DESCRIPTIVAS# par(mfrow=c(2,2))

barplot(table(appointments$sex), main="Distribución por sexo", col="lightgreen") barplot(table(appointments$age_group), main=“Distribución por grupos de edad”, col=“lightyellow”) barplot(table(appointments$status), main=“Estado de las citas médicas”, col=“lightpink”)

#Modelo matemático: Regresión logística binaria# appointments <- appointments %>% mutate(no_show = ifelse(status == “did not attend”, 1, 0))

modelo_logistico <- glm(no_show ~ age + sex + waiting_time, data = appointments, family = binomial)

modelo_lineal <- lm(waiting_time ~ age + sex, data = appointments) summary(modelo_lineal)

Resumen estadístico de todas las variables

summary(modelo_logistico) summary(modelo_lineal)

#Validación y testeo del modelo logístico#

par(mfrow=c(2,2)) plot(modelo_logistico)

Validación cruzada

set.seed(123) train_control <- trainControl(method=“cv”, number=5) modelo_cv <- train(no_show ~ age + sex + waiting_time, data=appointments, method=“glm”, family=“binomial”, trControl=train_control) print(modelo_cv)

#Modelo matemático: Regresión lineal múltiple#

appointments$sex <- as.factor(appointments$sex)

modelo_lineal <- lm(waiting_time ~ age + sex, data=appointments) summary(modelo_lineal)

#Comparación de modelos# comparacion <- data.frame( Modelo = c(“Regresión Logística Binaria”,“Regresión Lineal Múltiple”), Indicador = c(“AIC”,“R²”), Valor = c(AIC(modelo_logistico), round(summary(modelo_lineal)$r.squared, 4)) ) comparacion

#Indicadores de eficiencia hospitalaria#

Tasa global de inasistencia

tasa_inasistencia <- mean(appointments$no_show) tasa_inasistencia

Tasa de inasistencia por sexo

tasa_inasistencia_sexo <- appointments %>% group_by(sex) %>% summarise(inasistencia = mean(no_show)) tasa_inasistencia_sexo

#Gráficos de comportamiento y variabilidad#

Edad vs probabilidad de faltar

## `geom_smooth()` using formula = 'y ~ x'

Sexo vs variabilidad

## `geom_smooth()` using formula = 'y ~ x'

Actividad – Análisis de asistencia a citas médicas

Laura e Isabel

2026-06-23