This analysis uses OLS regression and quantile regression to investigate
the relationship between the Clean_Energy index (dependent variable) and
the Energy index (explanatory variable), using the dataset
BaseDadosEstatisticaBruno.xlsx.
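For reference, these are the standard textbook objectives of the two estimators. They are written with the same `y` (Clean_Energy) and `x` (Energy) used in the code. OLS minimizes squared residuals. Quantile regression at level τ minimizes the asymmetric check loss ρ_τ:

$$
(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2
$$

$$
(\hat\beta_0(\tau), \hat\beta_1(\tau)) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - b_0 - b_1 x_i\right),
\qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
$$

The first estimates the conditional mean E[y | x] = β₀ + β₁x. The second estimates the conditional τ-quantile Q_τ(y | x) = β₀(τ) + β₁(τ)x, so each τ can have its own intercept and slope.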
# Load packages
library(readxl)
library(ggplot2)
library(quantreg)
library(dplyr)
# Read the dataset
BaseDadosEstatisticaBruno <- read_excel("~/aula_3_atividade/BaseDadosEstatisticaBruno.xlsx")
# Inspect the structure
head(BaseDadosEstatisticaBruno)
## # A tibble: 6 × 3
## date Clean_Energy Energy
## <dttm> <dbl> <dbl>
## 1 2005-02-18 00:00:00 1251. 334.
## 2 2005-02-22 00:00:00 1257. 330.
## 3 2005-02-23 00:00:00 1245. 335.
## 4 2005-02-24 00:00:00 1261. 343.
## 5 2005-02-25 00:00:00 1279. 352.
## 6 2005-02-28 00:00:00 1287. 351.
summary(BaseDadosEstatisticaBruno)
## date Clean_Energy Energy
## Min. :2005-02-18 00:00:00.00 Min. : 392.6 Min. :179.9
## 1st Qu.:2010-02-19 00:00:00.00 1st Qu.: 613.8 1st Qu.:424.6
## Median :2015-02-19 00:00:00.00 Median : 904.5 Median :507.1
## Mean :2015-02-17 10:42:36.41 Mean :1138.6 Mean :506.0
## 3rd Qu.:2020-02-19 00:00:00.00 3rd Qu.:1375.5 3rd Qu.:583.4
## Max. :2025-02-20 00:00:00.00 Max. :3911.7 Max. :749.4
# Define the variables: Energy (x, explanatory) and Clean_Energy (y, dependent)
x <- BaseDadosEstatisticaBruno$Energy
y <- BaseDadosEstatisticaBruno$Clean_Energy
dat <- data.frame(x = x, y = y)
# Scatter plot
ggplot(dat, aes(x, y)) +
  geom_point(alpha = 0.4) +
  labs(x = "Energy", y = "Clean Energy",
       title = "Scatter plot: Clean Energy vs. Energy")
# OLS regression of Clean_Energy on Energy
ols <- lm(y ~ x, data = dat)
summary(ols)
##
## Call:
## lm(formula = y ~ x, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -810.2 -519.6 -170.2 214.9 2838.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1452.99999 46.75648 31.076 < 2e-16 ***
## x -0.62124 0.09024 -6.884 6.52e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 712.2 on 5031 degrees of freedom
## Multiple R-squared: 0.009332, Adjusted R-squared: 0.009135
## F-statistic: 47.39 on 1 and 5031 DF, p-value: 6.52e-12
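A quick check on how `lm()` arrives at these numbers. In simple regression the OLS slope equals cov(x, y)/var(x), and R² equals the squared Pearson correlation. Applied to the output above, sqrt(0.009332) ≈ 0.097, and because the slope is negative the sample correlation between Energy and Clean_Energy is about -0.097. The minimal sketch below uses hypothetical simulated vectors (`x_sim`, `y_sim`), not the project series, only to confirm the identities:

```r
# Synthetic data for illustration only (NOT the BaseDadosEstatisticaBruno series)
set.seed(123)
x_sim <- runif(200, min = 200, max = 750)
y_sim <- 1400 - 0.6 * x_sim + rnorm(200, sd = 700)

fit_sim <- lm(y_sim ~ x_sim)

# Closed-form OLS estimates for simple linear regression
slope_manual <- cov(x_sim, y_sim) / var(x_sim)
intercept_manual <- mean(y_sim) - slope_manual * mean(x_sim)

# In simple linear regression, R^2 equals the squared correlation
r2_manual <- cor(x_sim, y_sim)^2

all.equal(unname(coef(fit_sim)), c(intercept_manual, slope_manual))  # TRUE
all.equal(summary(fit_sim)$r.squared, r2_manual)                     # TRUE
```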
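# Scatter plot with the fitted OLS line and its 95% confidence band
ggplot(dat, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, colour = "blue") +
  labs(x = "Energy", y = "Clean Energy",
       title = "OLS: Clean Energy ~ Energy")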
# Quantile regressions at five quantiles ("fn" = Frisch-Newton interior-point method)
taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)
mod.qr <- rq(y ~ x, tau = taus, data = dat, method = "fn")
summary(mod.qr)
##
## Call: rq(formula = y ~ x, tau = taus, data = dat, method = "fn")
##
## tau: [1] 0.1
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) 682.61357 6.93337 98.45335 0.00000
## x -0.27032 0.01536 -17.60194 0.00000
##
## Call: rq(formula = y ~ x, tau = taus, data = dat, method = "fn")
##
## tau: [1] 0.25
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) 744.45160 18.51991 40.19737 0.00000
## x -0.25534 0.03287 -7.76847 0.00000
##
## Call: rq(formula = y ~ x, tau = taus, data = dat, method = "fn")
##
## tau: [1] 0.5
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) 1567.40920 59.37433 26.39877 0.00000
## x -1.15384 0.10666 -10.81844 0.00000
##
## Call: rq(formula = y ~ x, tau = taus, data = dat, method = "fn")
##
## tau: [1] 0.75
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) 1993.82326 23.66437 84.25421 0.00000
## x -1.33261 0.05859 -22.74390 0.00000
##
## Call: rq(formula = y ~ x, tau = taus, data = dat, method = "fn")
##
## tau: [1] 0.9
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) 193.07785 169.52289 1.13895 0.25478
## x 4.71251 0.47098 10.00575 0.00000
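Quantile regression replaces squared error with the check loss ρ_τ. A minimal base-R sketch shows the key property behind `rq()`: with no regressors, minimizing the check loss returns the sample τ-quantile. It uses a hypothetical simulated sample rather than the project data, and `check_loss()` is a hypothetical helper. `rq()` itself solves the full problem with regressors as a linear program.

```r
# Hypothetical helper: check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), summed
check_loss <- function(u, tau) sum(u * (tau - (u < 0)))

set.seed(42)
y_demo <- rnorm(101, mean = 1000, sd = 300)  # hypothetical sample, n = 101
taus_demo <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# For an intercept-only model the minimizer is one of the observations,
# so evaluating the loss at every data point finds it exactly
qr_intercept <- sapply(taus_demo, function(tau) {
  losses <- sapply(y_demo, function(q) check_loss(y_demo - q, tau))
  y_demo[which.min(losses)]
})

# Compare with empirical quantiles (type = 1: inverse of the empirical CDF)
cbind(tau = taus_demo,
      check_loss_min = qr_intercept,
      quantile = quantile(y_demo, taus_demo, type = 1))
```

With n = 101, n·τ is never an integer for these τ, so the minimizer is unique and the two columns coincide.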
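# Base R plot with several quantile regression lines
plot(dat$x, dat$y, xlab = "Energy", ylab = "Clean Energy",
     pch = 20, col = rgb(0, 0, 0, 0.4))
# One line per quantile (palette colours 2, 3, ...), from each column of coefficients
for (i in seq_along(taus)) {
  abline(mod.qr$coefficients[, i], col = i + 1, lwd = 1.5)
}
# OLS fit as a dashed blue reference line
abline(ols, col = "blue", lty = 2, lwd = 2)
# Legend covers both the quantile lines and the OLS line
legend("topleft", legend = c(paste("Tau =", taus), "OLS"),
       col = c(palette()[2:(length(taus) + 1)], "blue"),
       lty = c(rep(1, length(taus)), 2), cex = 0.9, bty = "n")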
# ggplot2 alternative: add the quantile regression lines manually
coefs <- as.data.frame(t(mod.qr$coefficients))
colnames(coefs) <- c("Intercept", "Slope")
coefs$tau <- factor(taus)
ggplot(dat, aes(x, y)) +
  geom_point(alpha = 0.35) +
  # One solid line per quantile, coloured by tau
  geom_abline(data = coefs, aes(intercept = Intercept, slope = Slope, colour = tau),
              linetype = "solid", linewidth = 0.8, alpha = 0.9) +
  # OLS fit (dashed); linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed",
              linewidth = 0.9, colour = "blue") +
  labs(title = "OLS (dashed) and quantile regressions (solid)",
       subtitle = "Quantiles: 0.10, 0.25, 0.50, 0.75, 0.90",
       x = "Energy", y = "Clean Energy", colour = "Tau")
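Is OLS sufficient?
No. OLS provides a single average slope (about -0.62) with a very low R² (0.0093). It does not capture the relevant differences across quantiles, including the sign change in the upper tail. The slope is negative from τ = 0.10 to τ = 0.75 (about -0.27, -0.26, -1.15 and -1.33) but positive at τ = 0.90 (about 4.71).
Is quantile regression useful?
Yes. It shows how both the magnitude and the sign of the effect of Energy on Clean_Energy vary across the conditional distribution, giving a more complete picture than the mean effect alone.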
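Why can quantile slopes change sign while OLS reports a single negative slope? One textbook mechanism is heteroscedasticity. The minimal sketch below assumes a hypothetical location-scale model y = a + b·x + (c0 + d·x)·ε with ε ~ N(0, 1). Under this model the conditional τ-quantile slope is b + d·Φ⁻¹(τ), while OLS estimates only the mean slope b. The parameter values are illustrative, not estimates from the Clean_Energy/Energy data.

```r
# Hypothetical location-scale model: y = a + b*x + (c0 + d*x) * eps, eps ~ N(0, 1)
a <- 100; b <- -0.5; c0 <- 1; d <- 1
taus_demo <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Conditional tau-quantile is a + b*x + (c0 + d*x) * qnorm(tau),
# so its slope with respect to x is b + d * qnorm(tau)
slope_tau <- b + d * qnorm(taus_demo)
round(setNames(slope_tau, paste0("tau=", taus_demo)), 3)
# -1.782 -1.174 -0.500  0.174  0.782

# OLS on data simulated from the same model recovers only the mean slope b
set.seed(1)
n <- 1e5
x_demo <- runif(n, 0, 10)
y_demo <- a + b * x_demo + (c0 + d * x_demo) * rnorm(n)
ols_slope <- coef(lm(y_demo ~ x_demo))[["x_demo"]]
ols_slope  # close to -0.5
```

In this toy model, a negative mean slope coexists with positive slopes in the upper quantiles, which is the kind of pattern a single OLS coefficient cannot reveal.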