Libraries required for the study

library(readxl)
library(ggplot2)
library(dplyr)

## 
## Adjuntando el paquete: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tibble)
library(base)
library(Amelia)

## Cargando paquete requerido: Rcpp

## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.3, built: 2024-11-07)
## ## Copyright (C) 2005-2025 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##

library(EnvStats)

## 
## Adjuntando el paquete: 'EnvStats'

## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm

## The following object is masked from 'package:base':
## 
##     print.default

library(nortest)
library(corrplot)

## corrplot 0.95 loaded

library(gridExtra)

## 
## Adjuntando el paquete: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(MASS)

## 
## Adjuntando el paquete: 'MASS'

## The following object is masked from 'package:EnvStats':
## 
##     boxcox

## The following object is masked from 'package:dplyr':
## 
##     select

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ lubridate 1.9.4     ✔ stringr   1.5.1
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ gridExtra::combine() masks dplyr::combine()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ MASS::select()       masks dplyr::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidyr)
library(GGally)

Hourly radiation and meteorological data for the analysis of skin health and sustainable buildings in Chía, Cundinamarca (NASA POWER)

datos <- read.csv("C:/Users/Administrador/Downloads/DATASET_PROY_OFICIAL.csv",
                      sep=",",header=TRUE,
                      fileEncoding = "UTF-8")

Nature of the data

head(datos)

##   YEAR MO DY HR ALLSKY_SFC_UVB ALLSKY_SFC_LW_DWN PRECTOTCORR    PS
## 1 2025  1  1  0              0            370.20        0.48 84.53
## 2 2025  1  1  1              0            372.25        0.49 84.48
## 3 2025  1  1  2              0            363.10        0.53 84.44
## 4 2025  1  1  3              0            367.58        0.48 84.43
## 5 2025  1  1  4              0            366.67        0.37 84.43
## 6 2025  1  1  5              0            365.10        0.30 84.47

tail(datos)

##      YEAR MO DY HR ALLSKY_SFC_UVB ALLSKY_SFC_LW_DWN PRECTOTCORR   PS
## 5131 2025  8  2 18           -999              -999        -999 -999
## 5132 2025  8  2 19           -999              -999        -999 -999
## 5133 2025  8  2 20           -999              -999        -999 -999
## 5134 2025  8  2 21           -999              -999        -999 -999
## 5135 2025  8  2 22           -999              -999        -999 -999
## 5136 2025  8  2 23           -999              -999        -999 -999

dim(datos)

## [1] 5136    8

colnames(datos)

## [1] "YEAR"              "MO"                "DY"               
## [4] "HR"                "ALLSKY_SFC_UVB"    "ALLSKY_SFC_LW_DWN"
## [7] "PRECTOTCORR"       "PS"

summary(datos)

##       YEAR            MO              DY              HR       
##  Min.   :2025   Min.   :1.000   Min.   : 1.00   Min.   : 0.00  
##  1st Qu.:2025   1st Qu.:2.000   1st Qu.: 8.00   1st Qu.: 5.75  
##  Median :2025   Median :4.000   Median :15.50   Median :11.50  
##  Mean   :2025   Mean   :4.056   Mean   :15.53   Mean   :11.50  
##  3rd Qu.:2025   3rd Qu.:6.000   3rd Qu.:23.00   3rd Qu.:17.25  
##  Max.   :2025   Max.   :8.000   Max.   :31.00   Max.   :23.00  
##  ALLSKY_SFC_UVB    ALLSKY_SFC_LW_DWN  PRECTOTCORR             PS         
##  Min.   :-999.00   Min.   :-999.0    Min.   :-999.000   Min.   :-999.00  
##  1st Qu.:-999.00   1st Qu.:-999.0    1st Qu.:   0.550   1st Qu.:  84.39  
##  Median :   0.00   Median : 366.5    Median :   2.420   Median :  84.54  
##  Mean   :-443.27   Mean   :-227.9    Mean   :  -6.914   Mean   :  69.21  
##  3rd Qu.:   0.01   3rd Qu.: 389.4    3rd Qu.:   6.970   3rd Qu.:  84.65  
##  Max.   :   2.21   Max.   : 436.5    Max.   : 455.330   Max.   :  84.93

YEAR → Year of the record.

MO → Month of the record.

DY → Day of the record.

HR → Hour of the record (local time or UTC).

ALLSKY_SFC_UVB → All-sky surface UVB irradiance, in W/m².

ALLSKY_SFC_LW_DWN → All-sky surface downward longwave irradiance, in W/m².

PRECTOTCORR → Corrected total precipitation, in mm/h.

PS → Surface pressure, in kPa.
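For time-series plots it can help to combine the four date columns into one timestamp. A minimal sketch with base R's ISOdatetime(); the toy rows and the choice tz = "UTC" are assumptions, since the source does not say whether HR is local time or UTC:

```r
# Toy rows mimicking the date columns of the dataset (values are illustrative).
ejemplo <- data.frame(YEAR = c(2025, 2025), MO = c(1, 8),
                      DY = c(1, 2), HR = c(0, 23))

# Combine the four columns into a single POSIXct timestamp.
# tz = "UTC" is an assumption; use the local zone if HR is local time.
ejemplo$fecha_hora <- ISOdatetime(ejemplo$YEAR, ejemplo$MO, ejemplo$DY,
                                  ejemplo$HR, 0, 0, tz = "UTC")

etiquetas <- format(ejemplo$fecha_hora, "%Y-%m-%d %H:%M")
print(etiquetas)
```

The new column can then be used directly as the x axis of a plot instead of MO, DY and HR separately.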

Data analysis

str(datos)

## 'data.frame':    5136 obs. of  8 variables:
##  $ YEAR             : int  2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
##  $ MO               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DY               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ HR               : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ ALLSKY_SFC_UVB   : num  0 0 0 0 0 0 0.01 0.14 0.56 1.16 ...
##  $ ALLSKY_SFC_LW_DWN: num  370 372 363 368 367 ...
##  $ PRECTOTCORR      : num  0.48 0.49 0.53 0.48 0.37 0.3 0.27 1.45 1.13 0.85 ...
##  $ PS               : num  84.5 84.5 84.4 84.4 84.4 ...

table(datos$YEAR)

## 
## 2025 
## 5136

The goal with this dataset is to analyze UVB radiation exposure and meteorological conditions in Chía in order to assess risks to the population's skin health and to derive design recommendations for sustainable buildings that reduce harmful sun exposure.

Categorical variables:

table(datos$YEAR)

## 
## 2025 
## 5136

table(datos$MO)

## 
##   1   2   3   4   5   6   7   8 
## 744 672 744 720 744 720 744  48

table(datos$DY)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 192 192 168 168 168 168 168 168 168 168 168 168 168 168 168 168 168 168 168 168 
##  21  22  23  24  25  26  27  28  29  30  31 
## 168 168 168 168 168 168 168 168 144 144  96
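The month counts can be checked against the calendar: each complete day contributes 24 hourly rows, and 2025 is not a leap year. A short worked check, assuming the series runs from 1 January to 2 August with no missing hours:

```r
# Days covered per month in 2025: January to July complete, plus 2 days of August.
dias_por_mes <- c(31, 28, 31, 30, 31, 30, 31, 2)

# Each day contributes 24 hourly records.
horas_por_mes <- dias_por_mes * 24

print(horas_por_mes)       # compare with table(datos$MO)
print(sum(horas_por_mes))  # compare with nrow(datos)
```

Both results agree with the table above and with dim(datos), so no hourly rows are absent from the file.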

barplot(table(datos$ALLSKY_SFC_UVB),las=2)

barplot(table(datos$ALLSKY_SFC_LW_DWN),las=2)

barplot(table(datos$PRECTOTCORR),las=2)

barplot(table(datos$PS),las=2)

Missing data

is.na(datos[1:8,])

##    YEAR    MO    DY    HR ALLSKY_SFC_UVB ALLSKY_SFC_LW_DWN PRECTOTCORR    PS
## 1 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 2 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 3 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 4 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 5 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 6 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 7 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE
## 8 FALSE FALSE FALSE FALSE          FALSE             FALSE       FALSE FALSE

suppressWarnings(missmap(datos))

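There are no NA values in this table. However, the value -999 that appears in tail(datos) and in the minimums of summary(datos) is the fill value NASA POWER uses for missing data, so the missing records are encoded rather than absent.

Since negative values are physically impossible for these variables, we remove them.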
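An alternative to dropping rows is to recode the -999 code as NA, so that is.na() and missmap() can see it. A minimal sketch on a toy data frame, assuming -999 is the only missing-value code in the measurement columns:

```r
# Toy data frame; -999 plays the role of the missing-value code.
toy <- data.frame(HR = c(0, 1, 2, 3),
                  ALLSKY_SFC_UVB = c(0, 0.14, -999, -999),
                  PS = c(84.53, 84.48, 84.44, -999))

# Recode the sentinel as NA in the measurement columns only.
medidas <- c("ALLSKY_SFC_UVB", "PS")
toy[medidas] <- lapply(toy[medidas], function(x) replace(x, x == -999, NA))

# Count the missing values per column after recoding.
na_por_columna <- colSums(is.na(toy))
print(na_por_columna)
```

With the values recoded, functions that accept na.rm = TRUE can skip them without changing the number of rows.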

datos_limpios <- datos %>%
  filter(PS >= 0)
summary(datos_limpios$PS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   82.96   84.40   84.54   84.40   84.65   84.93

datos_limpios <- datos %>%
  filter(ALLSKY_SFC_UVB >= 0)
summary(datos_limpios$ALLSKY_SFC_UVB)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.3831  0.6200  2.2100

datos_limpios <- datos %>%
  filter(ALLSKY_SFC_LW_DWN >= 0)
summary(datos_limpios$ALLSKY_SFC_LW_DWN)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   309.0   375.6   387.2   387.6   400.9   436.5

dim(datos_limpios)

## [1] 2856    8

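The number of rows drops from 5136 to 2856. Note that each filter above starts again from datos, so datos_limpios reflects only the last condition (ALLSKY_SFC_LW_DWN >= 0); in this dataset the remaining rows also have non-negative ALLSKY_SFC_UVB, PRECTOTCORR and PS, as the later summaries show.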
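A single combined condition makes the intent explicit. The sketch below uses base R and a made-up toy data frame (not the project data) to compare filtering on all measurement columns with filtering on the last one only:

```r
# Toy data with one invalid value (-999) in a different column of rows 2 to 4.
toy <- data.frame(
  PS                = c(84.5, -999, 84.4, 84.6),
  ALLSKY_SFC_UVB    = c(0.10, 0.20, -999, 0.00),
  ALLSKY_SFC_LW_DWN = c(370, 372, 365, -999)
)

# Keep a row only if all three measurements are non-negative.
validas <- toy$PS >= 0 & toy$ALLSKY_SFC_UVB >= 0 & toy$ALLSKY_SFC_LW_DWN >= 0
toy_limpio <- toy[validas, ]

# Filtering on the last column alone keeps rows that are invalid elsewhere.
solo_ultima <- toy[toy$ALLSKY_SFC_LW_DWN >= 0, ]

print(nrow(toy_limpio))
print(nrow(solo_ultima))
```

With dplyr, the equivalent is one filter() call that lists all the conditions.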

We check normality with a Kolmogorov-Smirnov test on each standardized variable:

safe_ks_test <- function(x) {
  if(length(unique(x)) < 2) return(NA)  
  if(sd(x) == 0) return(NA)             
  
  x_scaled <- scale(x)
  suppressWarnings({
    tryCatch({
      ks.test(x_scaled, "pnorm", mean = 0, sd = 1)$p.value
    }, error = function(e) NA)
  })
}
ks_resultados <- sapply(datos_limpios, safe_ks_test)
ks_tabla <- data.frame(
  Variable = names(ks_resultados),
  KS_pvalue = round(ks_resultados, 4),
  Normalidad = ifelse(ks_resultados > 0.05, "Normal", "No normal"),
  row.names = NULL
)
print(ks_tabla)

##            Variable KS_pvalue Normalidad
## 1              YEAR        NA       <NA>
## 2                MO     0.000  No normal
## 3                DY     0.000  No normal
## 4                HR     0.000  No normal
## 5    ALLSKY_SFC_UVB     0.000  No normal
## 6 ALLSKY_SFC_LW_DWN     0.020  No normal
## 7       PRECTOTCORR     0.000  No normal
## 8                PS     0.008  No normal

We identify any problematic variables:

variables_problematicas <- names(which(is.na(ks_resultados)))
if(length(variables_problematicas) > 0) {
  cat("\nVariables con algún problema técnico:\n")
  print(variables_problematicas)
}

## 
## Variables con algún problema técnico:
## [1] "YEAR"

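YEAR holds a single constant value (2025), so the test cannot be computed for it (hence NA) and the variable carries no information for this study. For all the other variables the p-value is below 0.05, so none of them can be considered normally distributed.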
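As a cross-check, a different normality test can be run on the same kind of data. The sketch below uses base R's shapiro.test() on simulated vectors (not the project data); the helper p_shapiro is hypothetical and returns NA for a constant variable, in the same spirit as safe_ks_test does for YEAR:

```r
set.seed(1)
normal_x    <- rnorm(500)       # simulated normal sample
sesgada_x   <- rexp(500)        # simulated right-skewed sample
constante_x <- rep(2025, 500)   # constant, like YEAR

# Hypothetical helper: Shapiro-Wilk p-value, or NA when the variable is constant.
# shapiro.test() accepts between 3 and 5000 observations.
p_shapiro <- function(x) {
  if (sd(x) == 0) return(NA_real_)
  shapiro.test(x)$p.value
}

p_valores <- c(normal   = p_shapiro(normal_x),
               sesgada  = p_shapiro(sesgada_x),
               constante = p_shapiro(constante_x))
print(p_valores)
```

A strongly skewed variable such as precipitation would be expected to behave like sesgada_x here and be rejected under either test.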

Boxplot of each numeric variable:

datos_limpios %>%
  pivot_longer(cols = where(is.numeric), names_to = "Variable", values_to = "Valor") %>%
  ggplot(aes(x = Variable, y = Valor)) +
  geom_boxplot(outlier.colour = "red", fill = "lightblue") +
  theme_minimal() +
  coord_flip() +
  labs(title = "Detección visual de outliers", x = "Variable", y = "Valor")

We detect and impute the outliers

datos_dect <- datos_limpios
outliers_rosner <- list()
for (var in names(datos_dect)) {
  x <- datos_dect[[var]]
  
  # Check whether the variable is numeric
  if (!is.numeric(x)) {
    cat("\nVariable", var, "no es numérica. Se omite.\n")
    next  # Skip to the next variable
  }
  
  # Compute a safe k_max (avoid k = 0 or k > n)
  n <- length(x)
  k_max <- min(10, floor(n * 0.1))
  if (k_max < 1) {
    cat("\nVariable", var, "tiene muy pocos datos. Se omite.\n")
    next
  }
  
  
  if (length(unique(x)) == 1) {
    cat("\nVariable", var, "es constante. No hay outliers.\n")
    next
  }
  
  ros <- tryCatch(
    expr = {
      rosnerTest(x, k = k_max)
    },
    error = function(e) {
      cat("\nError en", var, ":", e$message, "\n")
      NULL
    }
  )
  
  if (!is.null(ros)) {
    outliers_rosner[[var]] <- ros
    cat("\n=== Variable:", var, "===\n")
    print(ros)
  }
}

## 
## Variable YEAR es constante. No hay outliers.
## 
## === Variable: MO ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 1.348239
##                                  R.2  = 1.348905
##                                  R.3  = 1.349572
##                                  R.4  = 1.350239
##                                  R.5  = 1.350908
##                                  R.6  = 1.351577
##                                  R.7  = 1.352248
##                                  R.8  = 1.352920
##                                  R.9  = 1.353592
##                                  R.10 = 1.354266
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     0
## 
##    i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 2.487395 1.121911     4    2161 1.348239   4.287968   FALSE
## 2  1 2.486865 1.121751     4    2162 1.348905   4.287888   FALSE
## 3  2 2.486335 1.121589     4    2163 1.349572   4.287809   FALSE
## 4  3 2.485804 1.121428     4    2164 1.350239   4.287729   FALSE
## 5  4 2.485273 1.121266     4    2165 1.350908   4.287649   FALSE
## 6  5 2.484742 1.121103     4    2166 1.351577   4.287570   FALSE
## 7  6 2.484211 1.120940     4    2167 1.352248   4.287490   FALSE
## 8  7 2.483678 1.120777     4    2168 1.352920   4.287410   FALSE
## 9  8 2.483146 1.120614     4    2169 1.353592   4.287330   FALSE
## 10 9 2.482613 1.120450     4    2170 1.354266   4.287250   FALSE
## 
## 
## 
## === Variable: DY ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 1.806546
##                                  R.2  = 1.807896
##                                  R.3  = 1.809249
##                                  R.4  = 1.810605
##                                  R.5  = 1.811965
##                                  R.6  = 1.813327
##                                  R.7  = 1.814693
##                                  R.8  = 1.816061
##                                  R.9  = 1.817433
##                                  R.10 = 1.818808
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     0
## 
##    i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 15.40336 8.633404    31     721 1.806546   4.287968   FALSE
## 2  1 15.39790 8.629978    31     722 1.807896   4.287888   FALSE
## 3  2 15.39243 8.626545    31     723 1.809249   4.287809   FALSE
## 4  3 15.38696 8.623104    31     724 1.810605   4.287729   FALSE
## 5  4 15.38149 8.619656    31     725 1.811965   4.287649   FALSE
## 6  5 15.37601 8.616201    31     726 1.813327   4.287570   FALSE
## 7  6 15.37053 8.612738    31     727 1.814693   4.287490   FALSE
## 8  7 15.36504 8.609269    31     728 1.816061   4.287410   FALSE
## 9  8 15.35955 8.605792    31     729 1.817433   4.287330   FALSE
## 10 9 15.35406 8.602307    31     730 1.818808   4.287250   FALSE
## 
## 
## 
## === Variable: HR ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 1.661034
##                                  R.2  = 1.662128
##                                  R.3  = 1.663225
##                                  R.4  = 1.664324
##                                  R.5  = 1.665424
##                                  R.6  = 1.666528
##                                  R.7  = 1.667633
##                                  R.8  = 1.668740
##                                  R.9  = 1.669850
##                                  R.10 = 1.670962
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     0
## 
##    i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 11.50000 6.923399     0       1 1.661034   4.287968   FALSE
## 2  1 11.50403 6.921264     0      25 1.662128   4.287888   FALSE
## 3  2 11.50806 6.919124     0      49 1.663225   4.287809   FALSE
## 4  3 11.51209 6.916980     0      73 1.664324   4.287729   FALSE
## 5  4 11.51613 6.914831     0      97 1.665424   4.287649   FALSE
## 6  5 11.52017 6.912678     0     121 1.666528   4.287570   FALSE
## 7  6 11.52421 6.910520     0     145 1.667633   4.287490   FALSE
## 8  7 11.52826 6.908358     0     169 1.668740   4.287410   FALSE
## 9  8 11.53230 6.906191     0     193 1.669850   4.287330   FALSE
## 10 9 11.53635 6.904020     0     217 1.670962   4.287250   FALSE
## 
## 
## 
## === Variable: ALLSKY_SFC_UVB ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 3.117333
##                                  R.2  = 3.106111
##                                  R.3  = 3.060574
##                                  R.4  = 3.049009
##                                  R.5  = 3.054527
##                                  R.6  = 3.025694
##                                  R.7  = 2.996667
##                                  R.8  = 2.967449
##                                  R.9  = 2.972570
##                                  R.10 = 2.873994
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     0
## 
##    i    Mean.i      SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 0.3831057 0.5860440  2.21    2460 3.117333   4.287968   FALSE
## 2  1 0.3824658 0.5851479  2.20    2317 3.106111   4.287888   FALSE
## 3  2 0.3818290 0.5842600  2.17    2436 3.060574   4.287809   FALSE
## 4  3 0.3812022 0.5834020  2.16     421 3.049009   4.287729   FALSE
## 5  4 0.3805785 0.5825522  2.16    1956 3.054527   4.287649   FALSE
## 6  5 0.3799544 0.5816999  2.14     420 3.025694   4.287570   FALSE
## 7  6 0.3793368 0.5808665  2.12     396 2.996667   4.287490   FALSE
## 8  7 0.3787259 0.5800518  2.10     397 2.967449   4.287410   FALSE
## 9  8 0.3781215 0.5792557  2.10     444 2.972570   4.287330   FALSE
## 10 9 0.3775167 0.5784574  2.04     348 2.873994   4.287250   FALSE
## 
## 
## 
## === Variable: ALLSKY_SFC_LW_DWN ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 4.294229
##                                  R.2  = 3.983582
##                                  R.3  = 3.872947
##                                  R.4  = 3.758889
##                                  R.5  = 3.625438
##                                  R.6  = 3.622852
##                                  R.7  = 3.612468
##                                  R.8  = 3.589756
##                                  R.9  = 3.484462
##                                  R.10 = 3.466326
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     1
## 
##    i   Mean.i     SD.i  Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 387.6236 18.31380 308.98     512 4.294229   4.287968    TRUE
## 2  1 387.6512 18.25773 314.92     530 3.983582   4.287888   FALSE
## 3  2 387.6767 18.21008 317.15     536 3.872947   4.287809   FALSE
## 4  3 387.7014 18.16531 319.42     464 3.758889   4.287729   FALSE
## 5  4 387.7253 18.12342 322.02     532 3.625438   4.287649   FALSE
## 6  5 387.7484 18.08475 322.23     511 3.622852   4.287570   FALSE
## 7  6 387.7714 18.04621 322.58     533 3.612468   4.287490   FALSE
## 8  7 387.7943 18.00798 323.15     535 3.589756   4.287410   FALSE
## 9  8 387.8169 17.97034 325.20     529 3.484462   4.287330   FALSE
## 10 9 387.8389 17.93511 325.67     416 3.466326   4.287250   FALSE
## 
## 
## 
## === Variable: PRECTOTCORR ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 16.27643
##                                  R.2  = 16.64534
##                                  R.3  = 16.62721
##                                  R.4  = 16.80364
##                                  R.5  = 16.17348
##                                  R.6  = 16.61217
##                                  R.7  = 16.35847
##                                  R.8  = 15.61968
##                                  R.9  = 16.28419
##                                  R.10 = 16.80536
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     10
## 
##    i   Mean.i     SD.i  Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 7.648169 27.50491 455.33    2660 16.27643   4.287968    TRUE
## 2  1 7.491363 26.20185 443.63    2661 16.64534   4.287888    TRUE
## 3  2 7.338546 24.90143 421.38    2659 16.62721   4.287809    TRUE
## 4  3 7.193421 23.66789 404.90    2662 16.80364   4.287729    TRUE
## 5  4 7.053973 22.46925 370.46    2663 16.17348   4.287649    TRUE
## 6  5 6.926506 21.41704 362.71    2658 16.61217   4.287570    TRUE
## 7  6 6.801670 20.35694 339.81    2664 16.35847   4.287490    TRUE
## 8  7 6.684784 19.38038 309.40    2665 15.61968   4.287410    TRUE
## 9  8 6.578494 18.53463 308.40    2670 16.28419   4.287330    TRUE
## 10 9 6.472480 17.65315 303.14    2671 16.80536   4.287250    TRUE
## 
## 
## 
## === Variable: PS ===
## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            x
## 
## Sample Size:                     2856
## 
## Test Statistics:                 R.1  = 3.050053
##                                  R.2  = 3.055570
##                                  R.3  = 2.919352
##                                  R.4  = 2.853259
##                                  R.5  = 2.857842
##                                  R.6  = 2.791292
##                                  R.7  = 2.795607
##                                  R.8  = 2.799941
##                                  R.9  = 2.804297
##                                  R.10 = 2.768189
## 
## Test Statistic Parameter:        k = 10
## 
## Alternative Hypothesis:          Up to 10 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     0
## 
##    i   Mean.i      SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 84.54155 0.1414897 84.11     760 3.050053   4.287968   FALSE
## 2  1 84.54170 0.1412837 84.11     784 3.055570   4.287888   FALSE
## 3  2 84.54185 0.1410770 84.13     759 2.919352   4.287809   FALSE
## 4  3 84.54200 0.1408908 84.14     785 2.853259   4.287729   FALSE
## 5  4 84.54214 0.1407142 84.14    1000 2.857842   4.287649   FALSE
## 6  5 84.54228 0.1405370 84.15     761 2.791292   4.287570   FALSE
## 7  6 84.54242 0.1403694 84.15     783 2.795607   4.287490   FALSE
## 8  7 84.54256 0.1402012 84.15     808 2.799941   4.287410   FALSE
## 9  8 84.54269 0.1400327 84.15    1001 2.804297   4.287330   FALSE
## 10 9 84.54283 0.1398636 84.93    1834 2.768189   4.287250   FALSE
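The first Rosner statistic can be reproduced by hand from the tables: R.1 is the largest absolute deviation from the mean divided by the standard deviation, and the observation is flagged when R.1 exceeds lambda.1. Using the rounded values printed for ALLSKY_SFC_LW_DWN (so the result matches only up to rounding):

```r
# Values copied from the first row of the ALLSKY_SFC_LW_DWN table.
media   <- 387.6236   # Mean.i
desv    <- 18.31380   # SD.i
valor   <- 308.98     # most extreme observation
lambda1 <- 4.287968   # critical value lambda.1

# Rosner statistic for the most extreme observation.
R1 <- abs(valor - media) / desv

print(round(R1, 4))   # close to the reported R.1 = 4.294229
print(R1 > lambda1)   # TRUE means the observation is flagged as an outlier
```

The margin over the critical value is small, which is why only this single observation of ALLSKY_SFC_LW_DWN is flagged.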

for (var in names(outliers_rosner)) {
  
  if (!is.null(outliers_rosner[[var]]) && 
      "all.stats" %in% names(outliers_rosner[[var]])) {
    
    ros_stats <- outliers_rosner[[var]]$all.stats
    filas_out <- ros_stats$Obs.Num[ros_stats$Outlier]  
    
    
    if (length(filas_out) > 0 && all(filas_out %in% 1:nrow(datos_dect))) {
      datos_dect[filas_out, var] <- NA
    }
  }
}

datos_imput <- datos_dect  # Work on a copy so datos_dect keeps the NAs

for (var in names(datos_imput)) {
  if (is.numeric(datos_imput[[var]])) {
    na_count <- sum(is.na(datos_imput[[var]]))
    
    if (na_count > 0) {
      mediana_val <- median(datos_imput[[var]], na.rm = TRUE)
      datos_imput[[var]][is.na(datos_imput[[var]])] <- mediana_val
      cat(sprintf("Imputados %d outliers en %s con mediana = %.2f\n", 
                  na_count, var, mediana_val))
    }
  }
}

## Imputados 1 outliers en ALLSKY_SFC_LW_DWN con mediana = 387.23
## Imputados 10 outliers en PRECTOTCORR con mediana = 2.09

cat("\nResumen de NAs después del reemplazo:\n")

## 
## Resumen de NAs después del reemplazo:

print(colSums(is.na(datos_dect)))

##              YEAR                MO                DY                HR 
##                 0                 0                 0                 0 
##    ALLSKY_SFC_UVB ALLSKY_SFC_LW_DWN       PRECTOTCORR                PS 
##                 0                 1                10                 0
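The NA counts above come from datos_dect, which keeps the flagged outliers as NA, while datos_imput holds the filled-in copy. A toy sketch of the same two steps; the vector and the outlier flag are made up for illustration:

```r
# Toy precipitation values with one extreme observation.
x <- c(0.5, 2.1, 455.33, 1.0, 3.2)
es_atipico <- c(FALSE, FALSE, TRUE, FALSE, FALSE)  # assumed outlier flag

# Step 1: mark the flagged value as missing.
x_na <- replace(x, es_atipico, NA)

# Step 2: fill the gap with the median of the remaining values.
mediana <- median(x_na, na.rm = TRUE)
x_imput <- replace(x_na, is.na(x_na), mediana)

print(sum(is.na(x_na)))     # NAs before imputation
print(sum(is.na(x_imput)))  # NAs after imputation
print(x_imput)
```

Running colSums(is.na(...)) on the imputed object rather than on the intermediate one is the quickest way to confirm that no NA remains.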

cat("\nResumen estadístico después de imputación:\n")

## 
## Resumen estadístico después de imputación:

print(summary(datos_imput))

##       YEAR            MO              DY             HR        ALLSKY_SFC_UVB  
##  Min.   :2025   Min.   :1.000   Min.   : 1.0   Min.   : 0.00   Min.   :0.0000  
##  1st Qu.:2025   1st Qu.:1.000   1st Qu.: 8.0   1st Qu.: 5.75   1st Qu.:0.0000  
##  Median :2025   Median :3.000   Median :15.0   Median :11.50   Median :0.0000  
##  Mean   :2025   Mean   :2.487   Mean   :15.4   Mean   :11.50   Mean   :0.3831  
##  3rd Qu.:2025   3rd Qu.:3.000   3rd Qu.:23.0   3rd Qu.:17.25   3rd Qu.:0.6200  
##  Max.   :2025   Max.   :4.000   Max.   :31.0   Max.   :23.00   Max.   :2.2100  
##  ALLSKY_SFC_LW_DWN  PRECTOTCORR            PS       
##  Min.   :314.9     Min.   :  0.000   Min.   :84.11  
##  1st Qu.:375.6     1st Qu.:  0.450   1st Qu.:84.44  
##  Median :387.2     Median :  2.095   Median :84.54  
##  Mean   :387.7     Mean   :  6.353   Mean   :84.54  
##  3rd Qu.:400.9     3rd Qu.:  6.683   3rd Qu.:84.64  
##  Max.   :436.5     Max.   :293.580   Max.   :84.93

datos_imput %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Valor") %>%
  ggplot(aes(x = Variable, y = Valor)) +
  geom_boxplot(outlier.colour = "red", fill = "lightblue", alpha = 0.7) +
  stat_summary(fun = median, geom = "point", shape = 18, size = 3, color = "blue") +
  theme_minimal() +
  coord_flip() +
  labs(title = "Distribución después de imputación de outliers",
       subtitle = "Puntos azules = medianas imputadas",
       x = "Variable",
       y = "Valor") +
  theme(plot.title = element_text(face = "bold"))

Everything is now clean

datos_oficiales <- datos_imput

With the clean dataset, we begin the descriptive analysis and visualize the relationships between variables to draw conclusions.

summary(datos_oficiales)

##       YEAR            MO              DY             HR        ALLSKY_SFC_UVB  
##  Min.   :2025   Min.   :1.000   Min.   : 1.0   Min.   : 0.00   Min.   :0.0000  
##  1st Qu.:2025   1st Qu.:1.000   1st Qu.: 8.0   1st Qu.: 5.75   1st Qu.:0.0000  
##  Median :2025   Median :3.000   Median :15.0   Median :11.50   Median :0.0000  
##  Mean   :2025   Mean   :2.487   Mean   :15.4   Mean   :11.50   Mean   :0.3831  
##  3rd Qu.:2025   3rd Qu.:3.000   3rd Qu.:23.0   3rd Qu.:17.25   3rd Qu.:0.6200  
##  Max.   :2025   Max.   :4.000   Max.   :31.0   Max.   :23.00   Max.   :2.2100  
##  ALLSKY_SFC_LW_DWN  PRECTOTCORR            PS       
##  Min.   :314.9     Min.   :  0.000   Min.   :84.11  
##  1st Qu.:375.6     1st Qu.:  0.450   1st Qu.:84.44  
##  Median :387.2     Median :  2.095   Median :84.54  
##  Mean   :387.7     Mean   :  6.353   Mean   :84.54  
##  3rd Qu.:400.9     3rd Qu.:  6.683   3rd Qu.:84.64  
##  Max.   :436.5     Max.   :293.580   Max.   :84.93

Analysis and visualization

Boxplot of UVB irradiance by month

ggplot(datos_oficiales, aes(x = MO, y = ALLSKY_SFC_UVB)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Irradiancia UVB por mes", x = "Mes", y = "ALLSKY_SFC_UVB (W/m²)") +
  theme_minimal()

## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

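The height of a box indicates the spread of UVB irradiance. No per-month boxes are visible here: as the warning indicates, MO is a continuous variable, so ggplot2 does not group the data by month. In addition, the median and first quartile of UVB are both 0, so the box is compressed against zero and the plot cannot be read as evidence of low variation.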
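The warning goes away once the month is treated as a grouping variable, for example with factor(MO) (in ggplot2, aes(x = factor(MO), ...)). A small base-R sketch on simulated data, with an assumed day/night UVB shape, showing that one box per month is produced:

```r
set.seed(42)

# Simulated data: 4 months, 2 days of 24 hourly records each (illustrative only).
toy <- data.frame(MO = rep(1:4, each = 48),
                  HR = rep(0:23, times = 8))

# UVB is zero at night and positive between 6 and 17 h (assumed shape).
toy$UVB <- ifelse(toy$HR >= 6 & toy$HR <= 17, runif(nrow(toy), 0.1, 2), 0)

# factor(MO) makes the month a grouping variable: one box per month.
# plot = FALSE returns the box statistics without drawing.
cajas <- boxplot(UVB ~ factor(MO), data = toy, plot = FALSE)

print(cajas$names)        # group labels
print(ncol(cajas$stats))  # number of boxes
```

Each column of cajas$stats holds the five numbers of one month's box, so months can be compared directly.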

Scatter plot of UVB against surface pressure (PS):

ggplot(datos_oficiales, aes(x = PS, y = ALLSKY_SFC_UVB)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", color = "blue") +
  labs(title = "Relación entre Presión Superficial y UVB", x = "PS (kPa)", y = "ALLSKY_SFC_UVB (W/m²)") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

In this plot, as surface pressure rises slightly from 84.25 to 84.75 kPa, the smoothed UVB curve shows no monotonic pattern: it starts near 0.5, rises to about 1.0, returns to 0.5 and falls to 0.0. This indicates that surface pressure is not the only factor influencing UVB radiation; cloud cover, atmospheric ozone and the time of day at which the value was recorded may also affect it.

The smoothed curve peaks at about 1.0 W/m² near 84.50 kPa (the largest single observation is 2.21 W/m²), suggesting that high UVB values can coincide with relatively high pressures, which may hint at an association between the two variables.

We therefore build a correlation matrix to check how strongly the variables are related to one another:

ggpairs(datos_oficiales, progress = FALSE)

## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero
## Warning in cor(x, y): La desviación estándar es cero

The correlation between ALLSKY_SFC_UVB and PRECTOTCORR is negative but significant: the more precipitation, the less UVB radiation.

UVB and surface pressure show a moderate negative correlation: higher surface pressure tends to coincide with lower UVB. The physical reading is uncertain, since high pressure is usually associated with clear skies.

Precipitation and surface pressure show a strong negative correlation: higher pressure is associated with less precipitation, as in dry climates or under anticyclones.
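Since none of the variables passed the normality check, a rank-based coefficient such as Spearman's is a reasonable companion to the correlation values shown by ggpairs (Pearson by default). A sketch on simulated data; the relationship between precip and uvb below is invented purely to illustrate the calls:

```r
set.seed(7)

# Simulated skewed "precipitation" and a "UVB" that decreases with it (assumed).
precip <- rexp(200, rate = 0.2)
uvb    <- pmax(0, 1.5 - 0.1 * precip + rnorm(200, sd = 0.2))

# Spearman's rho uses ranks, so it does not require normality.
rho <- cor(precip, uvb, method = "spearman")

# Ties (the zeros in uvb) prevent an exact p-value; the warning is suppressed.
prueba <- suppressWarnings(cor.test(precip, uvb, method = "spearman"))

print(round(rho, 2))
print(prueba$p.value < 0.05)
```

On the project data the same two calls would take columns such as PRECTOTCORR and ALLSKY_SFC_UVB of datos_oficiales.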

Next, we look at the statistics by month:

resumen_mes <- datos_oficiales %>%
  group_by(MO) %>%
  summarise(
    UVB_mediana = median(ALLSKY_SFC_UVB, na.rm = TRUE),
    UVB_media = mean(ALLSKY_SFC_UVB, na.rm = TRUE),
    PS_media = mean(PS, na.rm = TRUE),
    Precip_mediana = median(PRECTOTCORR, na.rm = TRUE)
  )
print(resumen_mes)

##   MO UVB_mediana UVB_media PS_media Precip_mediana
## 1  1           0 0.3960753 84.56856          0.365
## 2  2           0 0.3535565 84.49644          2.240
## 3  3           0 0.3692204 84.56747          3.200
## 4  4           0 0.4126149 84.52852          4.605

These statistics show the following:

UVB radiation has a median of 0 in every month, which means that at least half of the UVB values are zero, that is, periods without UVB radiation such as nights or heavily overcast hours. The mean is slightly higher in April (0.413) than in January (0.396), although it dips in February (0.354) and March (0.369), possibly reflecting greater insolation towards April. Only January to April appear because the records from later months were filtered out during cleaning.
Surface pressure varies little between months, which points to stable conditions.
Precipitation increases progressively: the median is 0.365 mm/h in January, a value typical of a dry period, and reaches 4.605 mm/h in April, the highest monthly median in the period. This suggests that rainfall increases from the start of the year towards April.

In summary, UVB radiation is very low overall, but there are hours and days with sustained exposure. Pressure is stable and precipitation increases markedly from January to April.
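A median of 0 is what one would expect from hourly data that includes the night. A toy sketch, assuming 11 daylight hours per day with a constant illustrative UVB value, of how restricting to daytime records changes the summary:

```r
# Two toy days of hourly records.
toy <- data.frame(HR = rep(0:23, times = 2))

# Assumed shape: UVB of 0.8 W/m² between 7 and 17 h, zero otherwise.
toy$UVB <- ifelse(toy$HR >= 7 & toy$HR <= 17, 0.8, 0)

# Share of records with no UVB, and the median over all hours.
prop_cero     <- mean(toy$UVB == 0)
mediana_total <- median(toy$UVB)

# Median restricted to records with measurable UVB.
diurno         <- toy[toy$UVB > 0, ]
mediana_diurna <- median(diurno$UVB)

print(c(prop_cero = prop_cero,
        mediana_total = mediana_total,
        mediana_diurna = mediana_diurna))
```

For exposure questions, summaries restricted to daytime records may therefore be more informative than the all-hours median.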

EDA_PROY_FINAL

María Clara Ávila y Mateo José Giraldo

2025-08-14
