class: center, middle, inverse, title-slide .title[ # Modelos GAMLSS e o pacote gamlss ] .subtitle[ ## Aplicação em dados de temperatura máxima ] .author[ ### Elias Silva de Medeiros ] .date[ ### 2025-11-17 ] --- class: center, middle, inverse # **GAMLSS** ## *Generalized Additive Models for Location, Scale and Shape* ### Elias Silva de Medeiros --- # **Roteiro da Apresentação** - Motivação - O que é GAMLSS - Estrutura geral - Tipos de distribuições - Ajuste no R com o pacote **gamlss** - Splines e suavizadores - Diagnósticos - Exemplo completo - Conclusões --- # Artigo 1: Início da nossa pesquisa <https://doi.org/10.3390/w11091843>  --- # Artigo 2: <https://doi.org/10.3390/w11112368>  --- # **Por que GAMLSS?** - GLM e GAM modelam **apenas a média** - Muitos dados possuem: - Heterocedasticidade - Assimetria - Caudas pesadas - Outliers - GAMLSS modela até **quatro parâmetros**: `$$\mu; \sigma; \nu; \tau$$` 💡 **Grande catálogo** (mais de 100 distribuições) --- # **Estrutura do Modelo** GAMLSS modela cada parâmetro com uma fórmula distinta: $$ g_1(\mu) = \eta_1 = X_1 \beta_1$$ `$$g_2(\sigma) = \eta_2 = X_2 \beta_2$$` `$$g_3(\nu) = \eta_3 = X_3 \beta_3$$` `$$g_4(\tau) = \eta_4 = X_4 \beta_4$$` Cada função de ligação pode ser diferente. --- # **Exemplos de Distribuições** Distribuições frequentemente usadas: - **NO** --- Normal - **GA** --- Gamma - **BE** --- Beta - **GG** --- Generalized Gamma - **ST3, ST4** --- Skew t - **DAGUM** - **GB2** - **BCT** --- Box-Cox t 💡 Boa parte dessas não existe em GLM/GAM tradicionais. --- # **Carregando Pacotes** ``` r library(gamlss) library(gamlss.dist) library(gamlss.add) ``` --- # **Ajuste Básico** ``` r mod <- gamlss( y ~ x1 + pb(x2), sigma.formula = ~ x1, family = GA, data = dados ) ``` --- # **Exemplo Prático** ## Dados simulados ``` r set.seed(123) n <- 200 x <- runif(n, 0, 10) y <- 3 + 0.7*x + rnorm(n, sd = 0.5 + 0.05*x) dados <- data.frame(x, y) ``` --- # **Ajuste com Splines** ``` r mod1 <- gamlss( y ~ pb(x), sigma.formula = ~ pb(x), family = NO, data = dados ) ``` 📝 Aqui modelamos média **e** dispersão como funções não paramétricas de x. --- # **Splines Disponíveis** - `pb(x)` --- P-splines - `cs(x)` --- Cubic splines - `lo(x)` --- Loess smoother - `gam(x)` --- GAM smoother - `fp(x)` --- Fractional polynomials Exemplo: ``` r mod2 <- gamlss( y ~ pb(x), sigma.formula = ~ cs(x), family = NO, data = dados ) ``` --- # **Diagnósticos** ### Gráficos de diagnóstico gerais ``` r plot(mod1) ``` --- # **Worm Plot** Indicador visual de ajuste: ``` r wp(mod1) ``` --- # **Resíduos vs Ajustados** ``` r plot(resid(mod1) ~ fitted(mod1)) ``` --- # **Comparação entre Distribuições** ``` r mod_NO <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = NO, data = dados) mod_ST3 <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = ST3, data = dados) mod_GA <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = GA, data = dados) GAIC(mod_NO, mod_ST3, mod_GA) ``` --- # **Modelo Mais Geral: BCT** ``` r mod3 <- gamlss( y ~ pb(x), sigma.formula = ~ pb(x), nu.formula = ~ 1, tau.formula = ~ 1, family = BCT, data = dados ) ``` --- # **Interpretação dos Parâmetros** - **μ** -- tendência central - **σ** -- dispersão (heterocedasticidade) - **ν** -- assimetria - **τ** -- peso das caudas Exemplo interpretativo: > O aumento de x provoca aumento tanto da média quanto da variabilidade > de y, sugerindo heterocedasticidade capturada pelo modelo. --- # **Resultados e Coeficientes** ``` r summary(mod3) ``` --- # *Aplicações - temperatura máxima* - **Pacotes** ``` r library(gamlss) library(geobr) library(ggplot2) library(sf) library(dplyr) library(tidyr) library(gridExtra) ``` - **Dados** [Baixar dados](https://docs.google.com/spreadsheets/d/1j9O3btW2QND798omSEWI0JRyWNZbND1D/edit?usp=sharing&ouid=104806220123248509303&rtpof=true&sd=true) ``` r dados_long <- readxl::read_xlsx("nordesteTmax.xlsx") head(dados_long) ``` ``` ## # A tibble: 6 × 12 ## PARAMETER.x YEAR Station Mes tmax code_state abbrev_state name_state ## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> ## 1 T2M_MAX 1984 S_-10_-41.875 JAN 37.0 29 BA Bahia ## 2 T2M_MAX 1984 S_-10_-41.875 FEB 37.2 29 BA Bahia ## 3 T2M_MAX 1984 S_-10_-41.875 MAR 37.0 29 BA Bahia ## 4 T2M_MAX 1984 S_-10_-41.875 APR 32.2 29 BA Bahia ## 5 T2M_MAX 1984 S_-10_-41.875 MAY 33.4 29 BA Bahia ## 6 T2M_MAX 1984 S_-10_-41.875 JUN 33.6 29 BA Bahia ## # ℹ 4 more variables: code_region <dbl>, name_region <chr>, LON <dbl>, ## # LAT <dbl> ``` --- # Mapa Brasil por Regiões ``` r reg = geobr::read_region(year=2018, showProgress = F) coods = unique.data.frame(cbind(dados_long[,c("LAT","LON","Station")])) ggplot() + geom_sf(data = reg, aes(fill = name_region))+ geom_point(data = coods, aes(x = LON, y = LAT)) ``` <!-- --> --- # Mapa Brasil por UF ``` r UF = geobr::read_state(year=2020, showProgress = F) ggplot() + geom_sf(data = UF, aes(fill = name_region))+ geom_point(data = coods, aes(x = LON, y = LAT)) ``` <!-- --> --- # Mapa Nordetes por UF ``` r Nordeste = dplyr::filter(UF, name_region == "Nordeste") ggplot() + geom_sf(data = Nordeste, aes(fill = abbrev_state ))+ geom_point(data = coods, aes(x = LON, y = LAT)) ``` <!-- --> --- ### Fernando de Noronha? ``` r muni <- geobr::read_municipality(year = 2020, showProgress = F) Nordeste2 = dplyr::filter(muni, name_region == "Nordeste" & name_muni != "Fernando De Noronha") ggplot() + geom_sf(data = Nordeste2, aes(fill = abbrev_state ))+ geom_point(data = coods, aes(x = LON, y = LAT)) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-6-1.png" width="60%" /> --- ### Modelando tmax Janeiro ``` r Janeiro = dplyr::filter(dados_long, Mes == "JAN" & YEAR == 2015) Janeiro ``` ``` ## # A tibble: 410 × 12 ## PARAMETER.x YEAR Station Mes tmax code_state abbrev_state name_state ## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> ## 1 T2M_MAX 2015 S_-10_-41.8… JAN 37.1 29 BA Bahia ## 2 T2M_MAX 2015 S_-10_-42.5 JAN 35.7 29 BA Bahia ## 3 T2M_MAX 2015 S_-10_-43.1… JAN 37.5 29 BA Bahia ## 4 T2M_MAX 2015 S_-10_-43.75 JAN 37.7 22 PI Piauí ## 5 T2M_MAX 2015 S_-10_-44.3… JAN 37.8 22 PI Piauí ## 6 T2M_MAX 2015 S_-10_-45 JAN 37.3 22 PI Piauí ## 7 T2M_MAX 2015 S_-10_-45.6… JAN 35.7 22 PI Piauí ## 8 T2M_MAX 2015 S_-10_-46.25 JAN 35.0 21 MA Maranhão ## 9 T2M_MAX 2015 S_-10.5_-41… JAN 38.5 29 BA Bahia ## 10 T2M_MAX 2015 S_-10.5_-42… JAN 36.8 29 BA Bahia ## # ℹ 400 more rows ## # ℹ 4 more variables: code_region <dbl>, name_region <chr>, LON <dbl>, ## # LAT <dbl> ``` --- ### Dados observados ``` r p1 = ggplot(data = Janeiro) + geom_raster(aes(x = LON, y = LAT, fill = tmax))+ geom_sf(data = Nordeste2, fill = "transparent")+ scale_fill_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish)+ labs(x = NULL, y = NULL, subtitle = "Dados observados") p1 ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-8-1.png" width="60%" /> --- ### Ajustes gamlss com Normal ``` r mod2 = gamlss(tmax ~ LON + LAT, family = NO(mu.link = "identity"), data = Janeiro, trace = F) plot(mod2) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-9-1.png" width="60%" /> ``` ## ****************************************************************** ## Summary of the Quantile Residuals ## mean = -1.402021e-16 ## variance = 1.002445 ## coef. of skewness = -0.5721595 ## coef. of kurtosis = 3.434666 ## Filliben correlation coefficient = 0.9898436 ## ****************************************************************** ``` --- ### Gráfico do ajuste Normal ``` r grid = st_sample(Nordeste2, size = 10000, type = "regular") gridNordeste = data.frame(sf::st_coordinates(grid)) colnames(gridNordeste) ``` ``` ## [1] "X" "Y" ``` ``` r colnames(gridNordeste) = c("LON", "LAT") preds = predict(mod2, newdata = gridNordeste, type = "response") gridNordeste$pred = preds kpred <- st_as_sf(gridNordeste, coords = c("LON", "LAT"), crs = 4326) p2 = ggplot() + geom_sf(data = kpred, aes(color = pred)) + geom_sf(data = Nordeste2, fill = "transparent")+ scale_color_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish) + labs(subtitle = "Ajuste Normal") ``` --- ### Comparando o ajuste Normal ``` r gridExtra::grid.arrange(p1,p2, nrow = 1, ncol = 2) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-11-1.png" width="90%" /> --- ### Analisando a relação entre as coordenadas ``` r pa = ggplot(Janeiro, aes(x = LAT, y = tmax))+ geom_point()+ geom_smooth(span = 0.3) pb = ggplot(Janeiro, aes(x = LON, y = tmax))+ geom_point()+ geom_smooth(span = 0.3) gridExtra::grid.arrange(pa,pb, nrow = 1, ncol = 2) ``` ``` ## `geom_smooth()` using method = 'loess' and formula = 'y ~ x' ## `geom_smooth()` using method = 'loess' and formula = 'y ~ x' ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-12-1.png" width="90%" /> --- ### Ajustes gamlss com Normal + splines ``` r mod3 = gamlss(tmax ~ pb(LON) + pb(LAT), family = NO(mu.link = "identity"), data = Janeiro, trace = F) plot(mod3) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-13-1.png" width="60%" /> ``` ## ****************************************************************** ## Summary of the Quantile Residuals ## mean = -4.875685e-15 ## variance = 1.002445 ## coef. of skewness = -0.5803631 ## coef. of kurtosis = 3.682573 ## Filliben correlation coefficient = 0.9884347 ## ****************************************************************** ``` --- ### Ajustes gamlss com Normal + splines ``` r gridNordeste = data.frame(sf::st_coordinates(grid)) colnames(gridNordeste) ``` ``` ## [1] "X" "Y" ``` ``` r colnames(gridNordeste) = c("LON", "LAT") preds = predict(mod3, newdata = gridNordeste, type = "response") gridNordeste$pred = preds kpred <- st_as_sf(gridNordeste, coords = c("LON", "LAT"), crs = 4326) p3 = ggplot() + geom_sf(data = kpred, aes(color = pred)) + geom_sf(data = Nordeste2, fill = "transparent")+ scale_color_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish) + labs(subtitle = "Normal + splines") ``` --- ### Comparando os ajustes ``` r gridExtra::grid.arrange(p1,p2,p3, nrow = 1, ncol = 3) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-15-1.png" width="90%" /> --- ### Ajustes gamlss com Normal + splines + link ``` r mod4 = gamlss(tmax ~ pb(LON) + pb(LAT), family = NO(mu.link = "log"), data = Janeiro, trace = F) plot(mod4) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-16-1.png" width="60%" /> ``` ## ****************************************************************** ## Summary of the Quantile Residuals ## mean = -0.001261076 ## variance = 1.002443 ## coef. of skewness = -0.5745155 ## coef. of kurtosis = 3.668736 ## Filliben correlation coefficient = 0.9885206 ## ****************************************************************** ``` --- ### Ajustes gamlss com Normal + splines + link ``` r gridNordeste = data.frame(sf::st_coordinates(grid)) colnames(gridNordeste) ``` ``` ## [1] "X" "Y" ``` ``` r colnames(gridNordeste) = c("LON", "LAT") preds = predict(mod4, newdata = gridNordeste, type = "response") gridNordeste$pred = preds kpred <- st_as_sf(gridNordeste, coords = c("LON", "LAT"), crs = 4326) p4 = ggplot() + geom_sf(data = kpred, aes(color = pred)) + geom_sf(data = Nordeste2, fill = "transparent")+ scale_color_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish) + labs(subtitle = "Normal + splines + link") ``` --- ### Comparando os ajustes ``` r gridExtra::grid.arrange(p1,p2,p3,p4, nrow = 2, ncol = 3) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-18-1.png" width="90%" /> --- ### Ajustes gamlss com Gama + splines + link ``` r mod5 = gamlss(tmax ~ pb(LON) + pb(LAT), family = GA(mu.link = "log"), data = Janeiro, trace = F) plot(mod5) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-19-1.png" width="60%" /> ``` ## ****************************************************************** ## Summary of the Quantile Residuals ## mean = 2.936261e-05 ## variance = 1.002446 ## coef. of skewness = -0.6783456 ## coef. of kurtosis = 3.840118 ## Filliben correlation coefficient = 0.9852758 ## ****************************************************************** ``` --- ### Ajustes gamlss com Gama + splines + link ``` r gridNordeste = data.frame(sf::st_coordinates(grid)) colnames(gridNordeste) ``` ``` ## [1] "X" "Y" ``` ``` r colnames(gridNordeste) = c("LON", "LAT") preds = predict(mod5, newdata = gridNordeste, type = "response") gridNordeste$pred = preds kpred <- st_as_sf(gridNordeste, coords = c("LON", "LAT"), crs = 4326) p5 = ggplot() + geom_sf(data = kpred, aes(color = pred)) + geom_sf(data = Nordeste2, fill = "transparent")+ scale_color_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish) + labs(subtitle = "Gama + splines + link") ``` --- ### Comparando os ajustes ``` r gridExtra::grid.arrange(p1,p2,p3,p4,p5, nrow = 2, ncol = 3) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-21-1.png" width="90%" /> --- ### Ajustes gamlss com Weibull + splines + link ``` r mod6 = gamlss(tmax ~ pb(LON) + pb(LAT), family = WEI(mu.link = "log"), data = Janeiro, trace = F) plot(mod6) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-22-1.png" width="60%" /> ``` ## ****************************************************************** ## Summary of the Quantile Residuals ## mean = -0.004236434 ## variance = 1.036808 ## coef. of skewness = -0.1918765 ## coef. of kurtosis = 2.755928 ## Filliben correlation coefficient = 0.9976105 ## ****************************************************************** ``` --- ### Ajustes gamlss com Weibull + splines + link ``` r gridNordeste = data.frame(sf::st_coordinates(grid)) colnames(gridNordeste) ``` ``` ## [1] "X" "Y" ``` ``` r colnames(gridNordeste) = c("LON", "LAT") preds = predict(mod6, newdata = gridNordeste, type = "response") gridNordeste$pred = preds kpred <- st_as_sf(gridNordeste, coords = c("LON", "LAT"), crs = 4326) p6 = ggplot() + geom_sf(data = kpred, aes(color = pred)) + geom_sf(data = Nordeste2, fill = "transparent")+ scale_color_viridis_c(option = "turbo", name = "Tmax", limits = c(30, 41), # <<< Fixando mínimo e máximo da escala oob = scales::squish) + labs(subtitle = "Weibull + splines + link") ``` --- ### Comparando os ajustes ``` r gridExtra::grid.arrange(p1,p2,p3,p4,p5,p6, nrow = 2, ncol = 3) ``` <img src="data:image/png;base64,#gamlss_apresentacao_files/figure-html/unnamed-chunk-24-1.png" width="90%" /> --- ``` r GAIC(mod2,mod3,mod4,mod5,mod6) ``` ``` ## df AIC ## mod6 18.72661 1427.147 ## mod3 16.28288 1480.206 ## mod4 16.33821 1481.233 ## mod5 16.38368 1489.711 ## mod2 4.00000 1659.626 ``` --- # **Conclusões** - GAMLSS é altamente flexível - Permite modelar média, variância, assimetria e curtose - Possui grande catálogo de distribuições - Ideal para dados com heterocedasticidade e assimetria - Fácil de integrar com splines e modelos aditivos --- class: center, middle # **Obrigado!** Elias Silva de Medeiros UFGD - eliasmedeiros@ufgd.edu.br Dourados - MS