We will use the "tidyverse" collection of packages for some of the operations, mainly for its plotting tools.
# ---- Loading the libraries ----
library("MASS")
library("AER")
library("tidyverse")
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.4
df <- as_tibble(Boston)
##
## Call:
## lm(formula = df$medv ~ df$lstat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.168 -3.990 -1.318 2.034 24.500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.55384 0.56263 61.41 <2e-16 ***
## df$lstat -0.95005 0.03873 -24.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
## F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = df$medv ~ I(log(df$lstat, base = exp(1))))
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.4599 -3.5006 -0.6686 2.1688 26.0129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.1248 0.9652 54.00 <2e-16 ***
## I(log(df$lstat, base = exp(1))) -12.4810 0.3946 -31.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.329 on 504 degrees of freedom
## Multiple R-squared: 0.6649, Adjusted R-squared: 0.6643
## F-statistic: 1000 on 1 and 504 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = df$medv ~ log(df$lstat) + I(log(df$lstat)^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.7372 -3.2447 -0.6264 2.3164 26.8335
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.8109 2.7446 22.885 < 2e-16 ***
## log(df$lstat) -22.5970 2.4683 -9.155 < 2e-16 ***
## I(log(df$lstat)^2) 2.2232 0.5357 4.150 3.9e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.245 on 503 degrees of freedom
## Multiple R-squared: 0.676, Adjusted R-squared: 0.6748
## F-statistic: 524.8 on 2 and 503 DF, p-value: < 2.2e-16
## Arguments
x1 <- 10
x2 <- 11
# Returns the change in predicted y when x moves from x1 to x2
delta_y <- function(x1, x2) {
  # Coefficients of log2_mod, in order: intercept, log(lstat), log(lstat)^2
  coefs <- coef(log2_mod)
  c0 <- coefs[[1]]
  b <- coefs[[2]]
  a <- coefs[[3]]
  y1 <- c0 + b * log(x1) + a * log(x1)^2
  y2 <- c0 + b * log(x2) + a * log(x2)^2
  y2 - y1
}
delta_y(x1, x2)
## [1] -1.157735
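As a cross-check, the same difference can be obtained with predict() instead of rebuilding the fitted equation by hand. This is only a sketch: it assumes that log2_mod is the quadratic-in-log model whose Call appears in the output above, and it refits that model with a data argument (a choice made here, not taken from the original code) so that predict() can evaluate it on new values of lstat.

```r
library(MASS)  # provides the Boston data set

# Assumption: same specification as the model printed above, refit with
# `data =` so that predict() accepts a newdata argument.
log2_mod <- lm(medv ~ log(lstat) + I(log(lstat)^2), data = Boston)

# Predicted medv at lstat = 10 and lstat = 11
preds <- predict(log2_mod, newdata = data.frame(lstat = c(10, 11)))

# Change in predicted medv when lstat moves from 10 to 11
delta <- unname(diff(preds))
print(delta)
```

The printed value should agree with the result of delta_y(10, 11).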
df <- df %>%
  mutate(old = ifelse(age >= 95, 1, 0))
mod_co <- lm(df$medv ~ df$chas + df$old + I(df$chas*df$old))
summary(mod_co)
##
## Call:
## lm(formula = df$medv ~ df$chas + df$old + I(df$chas * df$old))
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.771 -4.765 -1.683 2.776 33.433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.6989 0.4503 52.624 < 2e-16 ***
## df$chas 4.0582 1.6872 2.405 0.0165 *
## df$old -7.1319 0.9493 -7.513 2.67e-13 ***
## I(df$chas * df$old) 10.5462 3.7577 2.807 0.0052 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.604 on 502 degrees of freedom
## Multiple R-squared: 0.1301, Adjusted R-squared: 0.1249
## F-statistic: 25.02 on 3 and 502 DF, p-value: 4.244e-15
Reading the summary of the “mod_co” model, we can draw three conclusions, each relative to the baseline of a tract with “old” = 0 and “chas” = 0 (expected “medv” of 23.6989, i.e. US$ 23,698.9):
1- If the dummy variable “old” = 1 (“age” >= 95) and “chas” = 0, the expected median home value is 7.1319 lower (US$ 7,131.9).
2- If “old” = 0 and “chas” = 1, the expected median home value is 4.0582 higher (US$ 4,058.2).
3- If both variables equal 1, the interaction coefficient of 10.5462 is added to the two effects above, so the expected median home value is 4.0582 - 7.1319 + 10.5462 = 7.4725 higher (US$ 7,472.5) than the baseline.
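The arithmetic behind the four combinations of “chas” and “old” can be laid out in a few lines of R. This is a minimal sketch that copies the coefficient estimates from the summary(mod_co) output above instead of refitting the model; the names b, expected_medv and cells are illustrative and do not come from the original code.

```r
# Coefficient estimates copied from the summary(mod_co) output above
b <- c(intercept = 23.6989, chas = 4.0582, old = -7.1319, chas_old = 10.5462)

# Expected medv (in thousands of US$) for given values of the two dummies
expected_medv <- function(chas, old) {
  unname(b["intercept"] + b["chas"] * chas + b["old"] * old +
           b["chas_old"] * chas * old)
}

# All four combinations of the two dummy variables
cells <- expand.grid(chas = 0:1, old = 0:1)
cells$medv_hat <- expected_medv(cells$chas, cells$old)

# Difference relative to the baseline cell (chas = 0, old = 0)
cells$diff_vs_baseline <- cells$medv_hat - expected_medv(0, 0)
print(cells)
```

The last column makes explicit that the interaction coefficient is not itself the effect of having both dummies equal to 1; that effect is the sum of all three slope coefficients.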
##
## Call:
## lm(formula = df$medv ~ df$indus + df$old + I(df$indus * df$old))
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.379 -5.066 -1.588 3.015 33.046
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.06350 0.72882 41.250 <2e-16 ***
## df$indus -0.65844 0.06569 -10.024 <2e-16 ***
## df$old -7.48438 3.07918 -2.431 0.0154 *
## I(df$indus * df$old) 0.37115 0.17558 2.114 0.0350 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.024 on 502 degrees of freedom
## Multiple R-squared: 0.2434, Adjusted R-squared: 0.2389
## F-statistic: 53.83 on 3 and 502 DF, p-value: < 2.2e-16
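The pattern resembles the one seen in exercise 8. For tracts with “old” = 0, each one-unit increase in “indus” is associated with a drop of 0.65844 in the expected “medv”. The interaction coefficient (0.37115) does not mean that raising both variables together increases the price: it means that for tracts with “old” = 1 the slope on “indus” is less negative, -0.65844 + 0.37115 = -0.28729, while the intercept for those tracts is 7.48438 lower.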
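With a continuous regressor interacted with a dummy, the model amounts to two straight lines, one per value of “old”. The sketch below derives the intercept and slope of each line from the coefficient estimates printed above; the values are copied by hand from that output, and the object names are illustrative only.

```r
# Coefficient estimates copied from the summary output above
b0      <- 30.06350   # intercept
b_indus <- -0.65844   # slope on indus when old = 0
b_old   <- -7.48438   # intercept shift when old = 1
b_inter <- 0.37115    # change in the indus slope when old = 1

# Implied regression line for each group
lines_by_group <- data.frame(
  old       = c(0, 1),
  intercept = c(b0, b0 + b_old),
  slope     = c(b_indus, b_indus + b_inter)
)
print(lines_by_group)
```

Both slopes are negative, so a higher “indus” is associated with a lower expected “medv” in both groups; the interaction only makes the decline less steep when “old” = 1.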