El presente estudio tiene como objetivo analizar la evolución del AQI en varios municipios de Puerto Rico durante el período 2020–2024, utilizando técnicas estadísticas abordadas en el curso. Para ello, se emplean métodos de regresión lineal y análisis de series de tiempo con el fin de evaluar diferencias entre municipios, identificar tendencias temporales y explorar la estructura dinámica del índice a nivel mensual. Los datos originales fueron procesados mediante agregación mensual e interpolación de valores faltantes, garantizando así la continuidad temporal necesaria para los análisis realizados.
Descripción de los datos
El estudio se realizó utilizando una base de datos de Índice de Calidad del Aire (AQI) correspondiente al período 2020–2024, con observaciones diarias para distintos municipios de Puerto Rico. Las variables principales incluyen la fecha de medición, el municipio (county Name) y el valor del AQI, que mide la calidad del aire y permite evaluar posibles riesgos para la salud.
Como primer paso, la variable de fecha fue transformada al formato adecuado (Date) y posteriormente agregada a nivel mensual, con el fin de reducir la variabilidad diaria y facilitar tanto el análisis comparativo entre municipios como el análisis de series de tiempo.
Durante la exploración inicial se identificaron valores faltantes en la variable AQI, los cuales fueron tratados mediante interpolación lineal, utilizando el método na.approx. Este procedimiento permite mantener la continuidad temporal de la serie sin introducir cambios abruptos artificiales en los datos.
Finalmente, se calculó el AQI promedio mensual por municipio, generando una base de datos agregada (data_mensual) que sirvió como insumo para los modelos de regresión y las series de tiempo.
La metodología se dividió en dos enfoques principales:
análisis transversal mediante modelos de regresión lineal
análisis temporal mediante modelos de series de tiempo (ARIMA).
library(zoo)
library(dplyr)
library(lubridate)
library(plotly)
library(ggplot2)
library(dynlm)
library(tseries)
library(forecast)
library(broom)
library(lmtest)
data <- (read_xlsx("AQI 2020-2024.xlsx",sheet = 3))
data$Date <- as.Date(data$Date)
data <- data %>%
mutate(YearMonth = floor_date(Date, "month"))
data$AQI <- na.approx(data$AQI, maxgap = Inf) #codigo para poder interpolar los datos faltantes
data_mensual <- data %>%
group_by(`county Name`, YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
Year = year(YearMonth),
Month = month(YearMonth)
)
## `summarise()` has grouped output by 'county Name'. You can override using the
## `.groups` argument.
modelorlm<- lm(
AQI ~ Year + Month + `county Name`,
data = data_mensual
)
summary(modelorlm)
##
## Call:
## lm(formula = AQI ~ Year + Month + `county Name`, data = data_mensual)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.558 -8.015 -1.463 7.228 57.774
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 663.02791 1185.39587 0.559 0.576376
## Year -0.31666 0.58625 -0.540 0.589523
## Month -0.01077 0.23953 -0.045 0.964156
## `county Name`Caguas -6.36678 2.63095 -2.420 0.016148 *
## `county Name`Catano 23.41974 2.60791 8.980 < 2e-16 ***
## `county Name`Mayagnez 8.70337 2.60776 3.337 0.000958 ***
## `county Name`Ponce 3.10488 2.63095 1.180 0.238930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.22 on 285 degrees of freedom
## Multiple R-squared: 0.3404, Adjusted R-squared: 0.3265
## F-statistic: 24.51 on 6 and 285 DF, p-value: < 2.2e-16
#Modelos de nulo y saturado
formulalm <- AQI~Year + Month + `county Name`
m0_log <- lm(AQI~ 1, data = data_mensual) # sin predictores
mF_log <- lm(formulalm, data = data_mensual) # todos los predictores
# Hacia adelante
m1_step_log <- step(m0_log,
scope = list(lower = ~1, upper = formula(mF_log)),
direction = "forward",
k = log(nrow(data_mensual)),
trace = TRUE)
## Start: AIC=1670.54
## AQI ~ 1
##
## Df Sum of Sq RSS AIC
## + `county Name` 4 29695.0 57713 1572.0
## <none> 87408 1670.5
## + Year 1 67.3 87341 1676.0
## + Month 1 11.5 87397 1676.2
##
## Step: AIC=1572.04
## AQI ~ `county Name`
##
## Df Sum of Sq RSS AIC
## <none> 57713 1572.0
## + Year 1 59.035 57654 1577.4
## + Month 1 0.425 57713 1577.7
#Modelo Final
data_mensual$`county Name` <- as.factor(data_mensual$`county Name`)
data_mensual$`county Name` <- relevel(
data_mensual$`county Name`,
ref = "Catano"
)
mod.final <-lm(AQI~`county Name`,data= data_mensual)
summary(mod.final)
##
## Call:
## lm(formula = AQI ~ `county Name`, data = data_mensual)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.172 -8.341 -1.297 7.039 57.178
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.089 1.846 24.965 < 2e-16 ***
## `county Name`Bayamon -23.410 2.600 -9.004 < 2e-16 ***
## `county Name`Caguas -29.794 2.634 -11.313 < 2e-16 ***
## `county Name`Mayagnez -14.701 2.611 -5.631 4.27e-08 ***
## `county Name`Ponce -20.323 2.634 -7.716 1.99e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.18 on 287 degrees of freedom
## Multiple R-squared: 0.3397, Adjusted R-squared: 0.3305
## F-statistic: 36.92 on 4 and 287 DF, p-value: < 2.2e-16
df <- data.frame(
yhat = fitted.values(mod.final),
res = rstandard(mod.final))
ggplot(df, aes(sample = res)) +
stat_qq(color = "blue") +
stat_qq_line(linewidth = 1) +
labs(x = "Cuantiles teóricos", y = "Cuantiles muestrales") +
theme_minimal(base_size = 14)
shapiro.test(df$res)
##
## Shapiro-Wilk normality test
##
## data: df$res
## W = 0.96068, p-value = 4.181e-07
#Varianza Constante
ggplot(df, aes(x = yhat, y = res)) +
geom_point(alpha = 0.6, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
labs(x = "Valores ajustados", y = "Residuales estandarizados") +
theme_minimal(base_size = 14)
bptest(mod.final)
##
## studentized Breusch-Pagan test
##
## data: mod.final
## BP = 27.43, df = 4, p-value = 1.627e-05
#Independencia
df1 <- data.frame(
res = rstandard(mod.final)) %>%
mutate(orden = 1:length(res))
ggplot(df1, aes(x = orden, y = res)) +
geom_point(alpha = 0.6, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
labs(x = "Orden/tiempo", y = "Residuales estandarizados") +
theme_minimal(base_size = 14)
dwtest(mod.final)
##
## Durbin-Watson test
##
## data: mod.final
## DW = 0.80827, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
caguas_m <- data %>%
filter(`county Name` == "Caguas") %>%
mutate(YearMonth = floor_date(Date, "month")) %>%
group_by(YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE))
ts_caguas <- ts(caguas_m$AQI, start = c(2020,1), end = c(2024,12), frequency = 12)
plot(ts_caguas)
d_caguas <- decompose(ts_caguas)
plot(d_caguas)
adf.test(ts_caguas)
##
## Augmented Dickey-Fuller Test
##
## data: ts_caguas
## Dickey-Fuller = -3.5917, Lag order = 3, p-value = 0.0415
## alternative hypothesis: stationary
acf(ts_caguas, lag.max = 36, main = "ACF - Serie original")
pacf(ts_caguas, lag.max = 36, main = "PACF - Serie Caguas")
AR1.caguas <- dynlm(ts_caguas ~ L(ts_caguas, 1))
AR2.caguas <- dynlm(ts_caguas ~ L(ts_caguas, 1:2))
summary(AR1.caguas); summary(AR2.caguas)
##
## Time series regression with "ts" data:
## Start = 2020(2), End = 2024(12)
##
## Call:
## dynlm(formula = ts_caguas ~ L(ts_caguas, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5320 -3.6080 -0.6528 1.3234 20.6816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.6929 1.8793 3.029 0.00368 **
## L(ts_caguas, 1) 0.6466 0.1014 6.374 3.48e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.929 on 57 degrees of freedom
## Multiple R-squared: 0.4162, Adjusted R-squared: 0.4059
## F-statistic: 40.63 on 1 and 57 DF, p-value: 3.484e-08
##
## Time series regression with "ts" data:
## Start = 2020(3), End = 2024(12)
##
## Call:
## dynlm(formula = ts_caguas ~ L(ts_caguas, 1:2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.8333 -3.5493 -0.5448 2.1534 20.2125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.2364 1.9907 3.635 0.000613 ***
## L(ts_caguas, 1:2)1 0.8259 0.1302 6.342 4.48e-08 ***
## L(ts_caguas, 1:2)2 -0.2760 0.1301 -2.121 0.038449 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.779 on 55 degrees of freedom
## Multiple R-squared: 0.4607, Adjusted R-squared: 0.441
## F-statistic: 23.49 on 2 and 55 DF, p-value: 4.231e-08
ma.caguas <- arima(ts_caguas, order = c(0,0,1))
ar.caguas <- arima(ts_caguas, order = c(1,0,0))
arma.caguas <- arima(ts_caguas, order = c(1,0,1))
ma2.caguas <- arima(ts_caguas, order = c(0,0,2))
ar2.caguas <- arima(ts_caguas, order = c(2,0,0))
arma2.caguas <- arima(ts_caguas, order = c(2,0,2))
AIC(ma.caguas, ar.caguas, arma.caguas,ma2.caguas,ar2.caguas,arma2.caguas)
## df AIC
## ma.caguas 3 409.8418
## ar.caguas 3 406.0589
## arma.caguas 4 404.3293
## ma2.caguas 4 401.2185
## ar2.caguas 4 403.5017
## arma2.caguas 6 404.8518
prediccion.caguas <- forecast(ma2.caguas, h=12)
autoplot(prediccion.caguas)
bayamon_m <- data %>%
filter(`county Name` == "Bayamon") %>%
mutate(YearMonth = floor_date(Date, "month")) %>%
group_by(YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE))
ts_bayamon <- ts(bayamon_m$AQI,
start = c(2020,1),end = c(2024,12),
frequency = 12)
plot(ts_bayamon)
d_bayamon <- decompose(ts_bayamon)
plot(d_bayamon)
adf.test(ts_bayamon)
##
## Augmented Dickey-Fuller Test
##
## data: ts_bayamon
## Dickey-Fuller = -3.6814, Lag order = 3, p-value = 0.03394
## alternative hypothesis: stationary
acf(ts_bayamon, lag.max = 36, main = "ACF - Serie original")
pacf(ts_bayamon, lag.max = 36, main = "PACF - Serie Caguas")
AR1.bayamon <- dynlm(ts_bayamon ~ L(ts_bayamon, 1))
AR2.bayamon <- dynlm(ts_bayamon ~ L(ts_bayamon, 1:2))
summary(AR1.bayamon); summary(AR2.bayamon)
##
## Time series regression with "ts" data:
## Start = 2020(2), End = 2024(12)
##
## Call:
## dynlm(formula = ts_bayamon ~ L(ts_bayamon, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.0841 -4.2185 -0.0818 4.5026 26.0417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.8566 2.6415 3.731 0.000441 ***
## L(ts_bayamon, 1) 0.5594 0.1083 5.164 3.2e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.738 on 57 degrees of freedom
## Multiple R-squared: 0.3188, Adjusted R-squared: 0.3068
## F-statistic: 26.67 on 1 and 57 DF, p-value: 3.199e-06
##
## Time series regression with "ts" data:
## Start = 2020(3), End = 2024(12)
##
## Call:
## dynlm(formula = ts_bayamon ~ L(ts_bayamon, 1:2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.0058 -4.5903 0.1594 4.1758 25.5248
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.08012 2.99150 3.035 0.003667 **
## L(ts_bayamon, 1:2)1 0.52195 0.13551 3.852 0.000308 ***
## L(ts_bayamon, 1:2)2 0.07295 0.13337 0.547 0.586588
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.852 on 55 degrees of freedom
## Multiple R-squared: 0.3191, Adjusted R-squared: 0.2944
## F-statistic: 12.89 on 2 and 55 DF, p-value: 2.566e-05
ma.bayamon <- arima(ts_bayamon, order = c(0,0,1))
ar.bayamon <- arima(ts_bayamon, order = c(1,0,0))
arma.bayamon <- arima(ts_bayamon, order = c(1,0,1))
ma2.bayamon <- arima(ts_bayamon, order = c(0,0,2))
ar2.bayamon <- arima(ts_bayamon, order = c(2,0,0))
arma2.bayamon <- arima(ts_bayamon, order = c(2,0,2))
AIC(ma.bayamon, ar.bayamon, arma.bayamon,ma2.bayamon,ar2.bayamon,arma2.bayamon)
## df AIC
## ma.bayamon 3 425.8962
## ar.bayamon 3 420.9369
## arma.bayamon 4 422.4684
## ma2.bayamon 4 426.4465
## ar2.bayamon 4 422.6160
## arma2.bayamon 6 425.0508
prediccion.bayamon <- forecast(ar.bayamon, h=12)
autoplot(prediccion.bayamon)
catano_m <- data %>%
filter(`county Name` == "Catano") %>%
mutate(YearMonth = floor_date(Date, "month")) %>%
group_by(YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE))
ts_catano <- ts(catano_m$AQI, start = c(2020,1), end = c(2024,12), frequency = 12)
plot_catano <- plot(ts_catano)
d_catano <- decompose(ts_catano)
plot(d_catano)
adf.test(ts_catano)
##
## Augmented Dickey-Fuller Test
##
## data: ts_catano
## Dickey-Fuller = -3.3516, Lag order = 3, p-value = 0.07202
## alternative hypothesis: stationary
log.datos.ts.catano <- log(ts_catano)
dlog.datos.ts.catano <- diff(log.datos.ts.catano)
adf.test(dlog.datos.ts.catano)
## Warning in adf.test(dlog.datos.ts.catano): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dlog.datos.ts.catano
## Dickey-Fuller = -5.0429, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
acf(dlog.datos.ts.catano, lag.max = 36, main = "ACF - Serie Cataño")
pacf(dlog.datos.ts.catano, lag.max = 36, main = "PACF - Serie Cataño ")
AR1.catano <- dynlm(dlog.datos.ts.catano ~ L(dlog.datos.ts.catano, 1))
AR2.catano <- dynlm(dlog.datos.ts.catano ~ L(dlog.datos.ts.catano, 1:2))
summary(AR1.catano); summary(AR2.catano)
##
## Time series regression with "ts" data:
## Start = 2020(3), End = 2024(12)
##
## Call:
## dynlm(formula = dlog.datos.ts.catano ~ L(dlog.datos.ts.catano,
## 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.04153 -0.19141 0.01888 0.24977 0.83514
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00561 0.05059 -0.111 0.912
## L(dlog.datos.ts.catano, 1) -0.02788 0.13962 -0.200 0.842
##
## Residual standard error: 0.3849 on 56 degrees of freedom
## Multiple R-squared: 0.0007114, Adjusted R-squared: -0.01713
## F-statistic: 0.03987 on 1 and 56 DF, p-value: 0.8425
##
## Time series regression with "ts" data:
## Start = 2020(4), End = 2024(12)
##
## Call:
## dynlm(formula = dlog.datos.ts.catano ~ L(dlog.datos.ts.catano,
## 1:2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02675 -0.19443 0.01985 0.26504 0.86203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00731 0.05183 -0.141 0.888
## L(dlog.datos.ts.catano, 1:2)1 -0.03358 0.14291 -0.235 0.815
## L(dlog.datos.ts.catano, 1:2)2 -0.05591 0.14262 -0.392 0.697
##
## Residual standard error: 0.391 on 54 degrees of freedom
## Multiple R-squared: 0.003876, Adjusted R-squared: -0.03302
## F-statistic: 0.1051 on 2 and 54 DF, p-value: 0.9004
ima.catano <- arima(dlog.datos.ts.catano, order = c(0,1,1))
ari.catano <- arima(dlog.datos.ts.catano, order = c(1,1,0))
arima.catano <- arima(dlog.datos.ts.catano, order = c(1,1,1))
ima2.catano <- arima(dlog.datos.ts.catano, order = c(0,1,2))
ari2.catano <- arima(dlog.datos.ts.catano, order = c(2,1,0))
arima2.catano <- arima(dlog.datos.ts.catano, order = c(2,1,2))
AIC(ima.catano, ari.catano, arima.catano,ima2.catano,ari2.catano,arima2.catano)
## df AIC
## ima.catano 2 60.80490
## ari.catano 2 81.21995
## arima.catano 3 62.79980
## ima2.catano 3 62.79937
## ari2.catano 3 77.38081
## arima2.catano 5 64.50649
prediccion.catano<- forecast(ima.catano, h=12)
autoplot(prediccion.catano)
Mayaguez_m <- data %>%
filter(`county Name` == "Mayagnez") %>%
mutate(YearMonth = floor_date(Date, "month")) %>%
group_by(YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE))
ts_mayaguez <- ts(Mayaguez_m$AQI, start = c(2020,1), end = c(2024,12), frequency = 12)
plot_mayaguez <- plot(ts_mayaguez)
d_mayaguez <- decompose(ts_mayaguez)
plot(d_mayaguez)
adf.test(ts_mayaguez)
##
## Augmented Dickey-Fuller Test
##
## data: ts_mayaguez
## Dickey-Fuller = -2.9852, Lag order = 3, p-value = 0.1766
## alternative hypothesis: stationary
log.datos.ts.mayaguez <- log(ts_mayaguez)
dlog.datos.ts.mayaguez <- diff(log.datos.ts.mayaguez)
adf.test(dlog.datos.ts.mayaguez)
## Warning in adf.test(dlog.datos.ts.mayaguez): p-value smaller than printed
## p-value
##
## Augmented Dickey-Fuller Test
##
## data: dlog.datos.ts.mayaguez
## Dickey-Fuller = -4.8527, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
acf(ts_mayaguez, lag.max = 36, main = "ACF - Serie Mayaguez")
pacf(ts_mayaguez, lag.max = 36, main = "PACF - Serie Mayaguez")
AR1.mayaguez <- dynlm(dlog.datos.ts.mayaguez ~ L(dlog.datos.ts.mayaguez, 1))
AR2.mayaguez <- dynlm(dlog.datos.ts.mayaguez ~ L(dlog.datos.ts.mayaguez, 1:2))
summary(AR1.mayaguez); summary(AR2.mayaguez)
##
## Time series regression with "ts" data:
## Start = 2020(3), End = 2024(12)
##
## Call:
## dynlm(formula = dlog.datos.ts.mayaguez ~ L(dlog.datos.ts.mayaguez,
## 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.04929 -0.24513 0.03042 0.21840 1.38498
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00413 0.06136 -0.067 0.947
## L(dlog.datos.ts.mayaguez, 1) -0.05642 0.13360 -0.422 0.674
##
## Residual standard error: 0.4673 on 56 degrees of freedom
## Multiple R-squared: 0.003174, Adjusted R-squared: -0.01463
## F-statistic: 0.1783 on 1 and 56 DF, p-value: 0.6744
##
## Time series regression with "ts" data:
## Start = 2020(4), End = 2024(12)
##
## Call:
## dynlm(formula = dlog.datos.ts.mayaguez ~ L(dlog.datos.ts.mayaguez,
## 1:2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02636 -0.27426 0.03899 0.23721 1.14723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01370 0.06023 -0.227 0.8209
## L(dlog.datos.ts.mayaguez, 1:2)1 -0.08024 0.13048 -0.615 0.5411
## L(dlog.datos.ts.mayaguez, 1:2)2 -0.26563 0.13020 -2.040 0.0462 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4546 on 54 degrees of freedom
## Multiple R-squared: 0.07552, Adjusted R-squared: 0.04128
## F-statistic: 2.206 on 2 and 54 DF, p-value: 0.12
ima.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(0,1,1))
ari.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(1,1,0))
arima.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(1,1,1))
ima2.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(0,1,2))
ari2.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(2,1,0))
arima2.mayaguez <- arima(dlog.datos.ts.mayaguez, order = c(2,1,2))
AIC(ima.mayaguez, ari.mayaguez, arima.mayaguez,ima2.mayaguez,ari2.mayaguez,arima2.mayaguez)
## df AIC
## ima.mayaguez 2 82.80429
## ari.mayaguez 2 111.53462
## arima.mayaguez 3 84.71496
## ima2.mayaguez 3 84.62231
## ari2.mayaguez 3 101.85716
## arima2.mayaguez 5 79.48979
prediccion.mayaguez<- forecast(arima2.mayaguez, h=12)
autoplot(prediccion.mayaguez)
Ponce_m <- data %>%
filter(`county Name` == "Ponce") %>%
mutate(YearMonth = floor_date(Date, "month")) %>%
group_by(YearMonth) %>%
summarise(AQI = mean(AQI, na.rm = TRUE))
ts_ponce <- ts(Ponce_m$AQI, start = c(2020,1), end = c(2024,12), frequency = 12)
plot_ponce <- plot(ts_ponce)
d_ponce <- decompose(ts_ponce)
plot(d_ponce)
adf.test(ts_ponce)
##
## Augmented Dickey-Fuller Test
##
## data: ts_ponce
## Dickey-Fuller = -3.9734, Lag order = 3, p-value = 0.01686
## alternative hypothesis: stationary
acf(ts_ponce, lag.max = 36, main = "ACF - Serie Ponce")
pacf(ts_ponce, lag.max = 36, main = "PACF - Serie Ponce")
AR1.ponce <- dynlm(ts_ponce ~ L(ts_ponce, 1))
AR2.ponce <- dynlm(ts_ponce ~ L(ts_ponce, 1:2))
summary(AR1.ponce); summary(AR2.ponce)
##
## Time series regression with "ts" data:
## Start = 2020(2), End = 2024(12)
##
## Call:
## dynlm(formula = ts_ponce ~ L(ts_ponce, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.905 -8.322 -1.915 5.128 45.166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.9138 3.4341 4.343 5.84e-05 ***
## L(ts_ponce, 1) 0.4162 0.1212 3.433 0.00112 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.05 on 57 degrees of freedom
## Multiple R-squared: 0.1713, Adjusted R-squared: 0.1568
## F-statistic: 11.78 on 1 and 57 DF, p-value: 0.001119
##
## Time series regression with "ts" data:
## Start = 2020(3), End = 2024(12)
##
## Call:
## dynlm(formula = ts_ponce ~ L(ts_ponce, 1:2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.800 -7.671 -1.016 4.951 45.394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.9539 3.9134 2.799 0.00705 **
## L(ts_ponce, 1:2)1 0.3103 0.1315 2.360 0.02183 *
## L(ts_ponce, 1:2)2 0.2578 0.1313 1.964 0.05464 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.86 on 55 degrees of freedom
## Multiple R-squared: 0.2271, Adjusted R-squared: 0.199
## F-statistic: 8.079 on 2 and 55 DF, p-value: 0.0008389
ma.ponce <- arima(ts_ponce, order = c(0,0,1))
ar.ponce <- arima(ts_ponce, order = c(1,0,0))
arma.ponce <- arima(ts_ponce, order = c(1,0,1))
ma2.ponce <- arima(ts_ponce, order = c(0,0,2))
ar2.ponce <- arima(ts_ponce, order = c(2,0,0))
arma2.ponce <- arima(ts_ponce, order = c(2,0,2))
AIC(ma.ponce, ar.ponce, arma.ponce,ma2.ponce,ar2.ponce,arma2.ponce)
## df AIC
## ma.ponce 3 465.9470
## ar.ponce 3 461.8253
## arma.ponce 4 460.6637
## ma2.ponce 4 461.9600
## ar2.ponce 4 459.8888
## arma2.ponce 6 463.6903
prediccion.ponce<- forecast(ar2.ponce, h=12)
autoplot(prediccion.ponce)
El análisis del AQI en Puerto Rico durante 2020–2024 muestra que la calidad del aire varía significativamente entre municipios, siendo Cataño y Mayagüez los de mayor exposición, mientras que Caguas y Bayamón presentan valores menores. Los modelos de regresión lineal indicaron que la localización geográfica explica alrededor del 34% de la variabilidad del AQI. Por su parte, el análisis de series de tiempo reveló dependencia temporal en la mayoría de los municipios, con modelos ARIMA capaces de proyectar tendencias estables a corto plazo. Estos hallazgos sugieren que tanto factores espaciales como temporales influyen en la calidad del aire, proporcionando información útil para políticas ambientales y estrategias de mitigación específicas por municipio.