Multiple Linear Regression

Multiple linear regression builds a linear model in which the value of the dependent (response) variable (\(Y\)) is determined from a set of independent variables called predictors (\(X_{1}\), \(X_{2}\), \(X_{3}\), …, \(X_{n}\)). It is an extension of simple linear regression.

Multiple linear models have the following form: \(Y_{i}=(\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\cdots+\beta_{n}X_{ni})+e_{i}\)

where \(Y\) is the dependent variable;
\(\beta_{0}\) is the intercept, i.e., the value of the dependent variable \(Y\) when all predictors are zero;
\(\beta_{i}\) is the average effect on the dependent variable \(Y\) of a one-unit increase in the predictor \(X_{i}\), holding the remaining variables constant; these are known as partial regression coefficients;
\(e_{i}\) is the residual or error, computed as the difference between the observed value and the value estimated by the model.
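The coefficients are the values that make the residuals as small as possible in the least-squares sense; as a reminder, the residual and the estimation criterion can be written as: \[ e_{i}=Y_{i}-\hat{Y}_{i}, \qquad \hat{\beta}=\underset{\beta}{\arg\min}\sum_{i=1}^{n} e_{i}^{2} \]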

The Multiple Linear Regression (MLR) model has the following matrix structure: \[Y_{n\times 1}=X_{n\times (k+1)}\beta_{(k+1)\times 1}+\varepsilon_{n\times 1}\]

In expanded form: \[\begin{bmatrix} Y_{1} \\ Y_{2} \\ \vdots \\ Y_{n} \end{bmatrix}=\begin{bmatrix} 1 & X_{11} & \cdots & X_{1k}\\ 1 & X_{21} & \cdots & X_{2k}\\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{nk} \end{bmatrix}\begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots \\ \beta_{k} \end{bmatrix}+\begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}\]
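As a quick numerical check of the matrix expression, the sketch below computes \(\hat{\beta}=(X^{\top}X)^{-1}X^{\top}Y\) by hand with R's built-in trees data (introduced below, in its original imperial units) and compares it with lm(); object names such as X, Y, and beta_hat are only illustrative.

data(trees)
X <- cbind(1, trees$Girth, trees$Height)          # design matrix with an intercept column
Y <- trees$Volume
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y      # (X'X)^(-1) X'Y
beta_hat
coef(lm(Volume ~ Girth + Height, data = trees))   # same estimates from lm()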

The libraries

library(tidyr)
library(pastecs)
library(correlation)
library(boot)
library(corrplot)
library(qgraph)

In this video I explain everything

If it does not open, copy and paste the link into your browser

library(vembedr)
embed_url("https://www.youtube.com/watch?v=cpLoEcJC1m4")

The data

The data come from R's built-in datasets: data("trees"). The variables are "Girth", "Height", and "Volume", i.e., the stem diameter measured at about 1.37 m (4 ft 6 in) above the ground, the total tree height, and the timber volume; note that the diameter (in inches) is erroneously labelled Girth in the data. There are 31 observations of 3 variables for black cherry trees (Prunus serotina). The units are inches, feet, and cubic feet, respectively, so we will convert them to the metric system for easier interpretation and also rename the variables.

Datos<-trees                    # the data object
Datos<-Datos %>% drop_na()      # remove rows with NA
attach(Datos)                   # crucial: makes the columns directly accessible
head(Datos)
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

The models to fit

  1. Vol = β0 + (β1*D) + ε
  2. Vol = β0 + (β1*H) + ε
  3. Vol = β0 + (β1*D) + (β2*H) + ε
  4. Vol = β0 + (β1*D) + (β2*H) + (β3*DH) + ε
  5. Vol = β0 + (β1*D2) + (β2*H2) + ε
  6. Vol = β0 + (β1*D) + (β2*H) + (β3*D2H) + ε
  7. Vol = β0 + (β1*DH) + ε
  8. Vol = β0 + (β1*D2H) + ε
  9. Vol = β0 + (β1*D2H2) + ε
  10. lnVol = β0 + (β1*lnD) + ε
  11. lnVol = β0 + (β1*lnH) + ε
  12. lnVol = β0 + (β1*lnDH) + ε
  13. lnVol = β0 + (β1*lnD2H) + ε
  14. lnVol = β0 + (β1*lnD2H2) + ε
  15. lnVol = β0 + (β1*lnD) + (β2*lnH) + (β3*lnD2H) + ε
  16. lnVol = β0 + (β1*lnD) + (β2*lnH) + ε

Generating the necessary variables

Datos$D<-(Girth*2.54)            # convert "Girth" from inches to cm and call it D (diameter)
Datos$H<-(Height*0.3048)         # convert "Height" from ft to m and call it H (height)
Datos$V<-(Volume*0.0283168)      # convert "Volume" from cubic ft to m3 and call it V (volume)
attach(Datos)
plot(D, V)

# Generating the variables needed for the models
Datos$DH<-D*H; attach(Datos)
Datos$D2<-D*D; attach(Datos)
Datos$H2<-H*H; attach(Datos)
Datos$D2H<-D2*H; attach(Datos)
Datos$D2H2<-D2*H2; attach(Datos)
Datos$lnDH<-log(DH); attach(Datos)
Datos$lnD2H<-log(D2H); attach(Datos)
Datos$lnD2H2<-log(D2H2); attach(Datos)
Datos$lnD<-log(D); attach(Datos)
Datos$lnH<-log(H); attach(Datos)
Datos$lnV<-log(V); attach(Datos)
View(Datos)
Datos<-Datos %>% drop_na()      # remove rows with NA
attach(Datos)
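As an aside, the repeated attach() calls above can be avoided; a minimal alternative sketch (assuming the dplyr package is installed) creates the same derived columns in a single mutate():

library(dplyr)
Datos <- Datos %>%
  mutate(DH = D * H, D2 = D^2, H2 = H^2,
         D2H = D^2 * H, D2H2 = D^2 * H^2,
         lnD = log(D), lnH = log(H), lnV = log(V),
         lnDH = log(DH), lnD2H = log(D2H), lnD2H2 = log(D2H2))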

Visualizing the correlation between the variables

library(psych)
library(colorRamps)
library(corrr)
res.cor <- correlate(Datos); res.cor
## # A tibble: 17 x 18
##    term    Girth Height Volume      D      H      V     DH     D2     H2    D2H
##    <chr>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Girth  NA      0.519  0.967  1      0.519  0.967  0.972  0.993  0.520  0.980
##  2 Height  0.519 NA      0.598  0.519  1      0.598  0.697  0.508  0.999  0.596
##  3 Volume  0.967  0.598 NA      0.967  0.598  1      0.977  0.979  0.603  0.989
##  4 D       1      0.519  0.967 NA      0.519  0.967  0.972  0.993  0.520  0.980
##  5 H       0.519  1      0.598  0.519 NA      0.598  0.697  0.508  0.999  0.596
##  6 V       0.967  0.598  1      0.967  0.598 NA      0.977  0.979  0.603  0.989
##  7 DH      0.972  0.697  0.977  0.972  0.697  0.977 NA      0.971  0.699  0.987
##  8 D2      0.993  0.508  0.979  0.993  0.508  0.979  0.971 NA      0.512  0.992
##  9 H2      0.520  0.999  0.603  0.520  0.999  0.603  0.699  0.512 NA      0.601
## 10 D2H     0.980  0.596  0.989  0.980  0.596  0.989  0.987  0.992  0.601 NA    
## 11 D2H2    0.957  0.655  0.984  0.957  0.655  0.984  0.987  0.974  0.662  0.995
## 12 lnDH    0.957  0.726  0.938  0.957  0.726  0.938  0.984  0.936  0.724  0.947
## 13 lnD2H   0.981  0.644  0.946  0.981  0.644  0.946  0.982  0.959  0.642  0.957
## 14 lnD2H2  0.957  0.726  0.938  0.957  0.726  0.938  0.984  0.936  0.724  0.947
## 15 lnD     0.992  0.530  0.940  0.992  0.530  0.940  0.961  0.970  0.529  0.953
## 16 lnH     0.517  0.999  0.592  0.517  0.999  0.592  0.693  0.504  0.995  0.590
## 17 lnV     0.969  0.648  0.960  0.969  0.648  0.960  0.974  0.950  0.646  0.950
## # ... with 7 more variables: D2H2 <dbl>, lnDH <dbl>, lnD2H <dbl>, lnD2H2 <dbl>,
## #   lnD <dbl>, lnH <dbl>, lnV <dbl>
corPlot(Datos[, 4:15],
        gr = colorRampPalette(heat.colors(40)))   # correlation plot
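For a quick look at just the three core variables, a base-R scatterplot matrix is a useful complement (a minimal sketch):

# Scatterplot matrix of diameter, height and volume with smoothed trends
pairs(Datos[, c("D", "H", "V")], panel = panel.smooth)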

Fitting the models

  1. Vol = β0 + (β1*D) + ε
  2. Vol = β0 + (β1*H) + ε
  3. Vol = β0 + (β1*D) + (β2*H) + ε
  4. Vol = β0 + (β1*D) + (β2*H) + (β3*DH) + ε
  5. Vol = β0 + (β1*D2) + (β2*H2) + ε
  6. Vol = β0 + (β1*D) + (β2*H) + (β3*D2H) + ε
  7. Vol = β0 + (β1*DH) + ε
  8. Vol = β0 + (β1*D2H) + ε
  9. Vol = β0 + (β1*D2H2) + ε
  10. lnVol = β0 + (β1*lnD) + ε
  11. lnVol = β0 + (β1*lnH) + ε
  12. lnVol = β0 + (β1*lnDH) + ε
  13. lnVol = β0 + (β1*lnD2H) + ε
  14. lnVol = β0 + (β1*lnD2H2) + ε
  15. lnVol = β0 + (β1*lnD) + (β2*lnH) + (β3*lnD2H) + ε
  16. lnVol = β0 + (β1*lnD) + (β2*lnH) + ε
mod_1 <- lm(V ~ D, data=Datos); summary(mod_1)
## 
## Call:
## lm(formula = V ~ D, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.228385 -0.087972  0.004303  0.098961  0.271468 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.046121   0.095290  -10.98 7.62e-12 ***
## D            0.056476   0.002758   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1204 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
mod_2 <- lm(V ~ H, data=Datos); summary(mod_2)
## 
## Call:
## lm(formula = V ~ H, data = Datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60242 -0.28018 -0.08195  0.34171  0.84532 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.46706    0.82892  -2.976 0.005835 ** 
## H            0.14338    0.03566   4.021 0.000378 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3794 on 29 degrees of freedom
## Multiple R-squared:  0.3579, Adjusted R-squared:  0.3358 
## F-statistic: 16.16 on 1 and 29 DF,  p-value: 0.0003784
mod_3 <- lm(V ~ D + H, data=Datos); summary(mod_3)
## 
## Call:
## lm(formula = V ~ D + H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181411 -0.075021 -0.008143  0.062306  0.240259 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.642025   0.244607  -6.713 2.75e-07 ***
## D            0.052488   0.002946  17.816  < 2e-16 ***
## H            0.031517   0.012091   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
mod_4 <- lm(V ~ D + H + DH, data=Datos); summary(mod_4)
## 
## Call:
## lm(formula = V ~ D + H + DH, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.186383 -0.030222  0.008568  0.044291  0.132096 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.9650816  0.6749523   2.911  0.00713 ** 
## D           -0.0652830  0.0214197  -3.048  0.00511 ** 
## H           -0.1205028  0.0287853  -4.186  0.00027 ***
## DH           0.0049251  0.0008916   5.524 7.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0767 on 27 degrees of freedom
## Multiple R-squared:  0.9756, Adjusted R-squared:  0.9728 
## F-statistic: 359.3 on 3 and 27 DF,  p-value: < 2.2e-16
mod_5 <- lm(V ~ D2 + H2, data=Datos); summary(mod_5)
## 
## Call:
## lm(formula = V ~ D2 + H2, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.145436 -0.062615  0.005763  0.069788  0.120467 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.162e-01  9.040e-02  -4.604 8.19e-05 ***
## D2           7.381e-04  2.919e-05  25.283  < 2e-16 ***
## H2           7.206e-04  1.887e-04   3.818 0.000682 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07875 on 28 degrees of freedom
## Multiple R-squared:  0.9733, Adjusted R-squared:  0.9714 
## F-statistic:   510 on 2 and 28 DF,  p-value: < 2.2e-16
mod_6 <- lm(V ~ D + H + D2H, data=Datos); summary(mod_6)
## 
## Call:
## lm(formula = V ~ D + H + D2H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.139710 -0.031007 -0.004463  0.048810  0.119015 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.682e-02  3.097e-01  -0.151    0.881    
## D           -1.050e-03  9.067e-03  -0.116    0.909    
## H            2.780e-03  9.326e-03   0.298    0.768    
## D2H          3.092e-05  5.113e-06   6.047 1.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07295 on 27 degrees of freedom
## Multiple R-squared:  0.9779, Adjusted R-squared:  0.9754 
## F-statistic: 398.1 on 3 and 27 DF,  p-value: < 2.2e-16
mod_7 <- lm(V ~ DH, data=Datos); summary(mod_7)
## 
## Call:
## lm(formula = V ~ DH, data = Datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27423 -0.05360 -0.01545  0.07503  0.15607 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.148e-01  6.652e-02  -10.75 1.26e-11 ***
## DH           1.993e-03  8.125e-05   24.53  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1015 on 29 degrees of freedom
## Multiple R-squared:  0.954,  Adjusted R-squared:  0.9524 
## F-statistic: 601.7 on 1 and 29 DF,  p-value: < 2.2e-16
mod_8 <- lm(V ~ D2H -1, data=Datos); summary(mod_8)   # "-1" removes the intercept (fit through the origin)
## 
## Call:
## lm(formula = V ~ D2H - 1, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.132227 -0.030672 -0.009461  0.045434  0.121603 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)    
## D2H 3.036e-05  3.920e-07   77.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06952 on 30 degrees of freedom
## Multiple R-squared:  0.995,  Adjusted R-squared:  0.9949 
## F-statistic:  5996 on 1 and 30 DF,  p-value: < 2.2e-16
mod_9 <- lm(V ~ D2H2, data=Datos); summary(mod_9)
## 
## Call:
## lm(formula = V ~ D2H2, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.210030 -0.062616 -0.005158  0.062191  0.151258 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 8.058e-02  2.975e-02   2.709   0.0112 *  
## D2H2        1.155e-06  3.836e-08  30.100   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08338 on 29 degrees of freedom
## Multiple R-squared:  0.969,  Adjusted R-squared:  0.9679 
## F-statistic:   906 on 1 and 29 DF,  p-value: < 2.2e-16
mod_10 <- lm(lnV ~ lnD, data=Datos); summary(mod_10)
## 
## Call:
## lm(formula = lnV ~ lnD, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.205999 -0.068702  0.001011  0.072585  0.247963 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.96836    0.31416  -25.36   <2e-16 ***
## lnD          2.19997    0.08983   24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.115 on 29 degrees of freedom
## Multiple R-squared:  0.9539, Adjusted R-squared:  0.9523 
## F-statistic: 599.7 on 1 and 29 DF,  p-value: < 2.2e-16
mod_11 <- lm(lnV ~ lnH, data=Datos); summary(mod_11)
## 
## Call:
## lm(formula = lnV ~ lnH, data = Datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.65691 -0.27917 -0.08039  0.42193  0.61252 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.7919     2.7247  -4.695 5.92e-05 ***
## lnH           3.9821     0.8677   4.589 7.93e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4074 on 29 degrees of freedom
## Multiple R-squared:  0.4207, Adjusted R-squared:  0.4008 
## F-statistic: 21.06 on 1 and 29 DF,  p-value: 7.928e-05
mod_12 <- lm(lnV ~ lnDH, data=Datos); summary(mod_12)
## 
## Call:
## lm(formula = lnV ~ lnDH, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.245089 -0.032449  0.006533  0.067887  0.150925 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.19442    0.40029  -30.46   <2e-16 ***
## lnDH          1.79567    0.06033   29.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0953 on 29 degrees of freedom
## Multiple R-squared:  0.9683, Adjusted R-squared:  0.9672 
## F-statistic: 885.8 on 1 and 29 DF,  p-value: < 2.2e-16
mod_13 <- lm(lnV ~ lnD2H, data=Datos); summary(mod_13)
## 
## Call:
## lm(formula = lnV ~ lnD2H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.167714 -0.048284 -0.007407  0.064576  0.137522 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.45760    0.28721  -36.41   <2e-16 ***
## lnD2H         1.00473    0.02835   35.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08041 on 29 degrees of freedom
## Multiple R-squared:  0.9774, Adjusted R-squared:  0.9767 
## F-statistic:  1256 on 1 and 29 DF,  p-value: < 2.2e-16
mod_14 <- lm(lnV ~ lnD2H2, data=Datos); summary(mod_14)
## 
## Call:
## lm(formula = lnV ~ lnD2H2, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.245089 -0.032449  0.006533  0.067887  0.150925 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.19442    0.40029  -30.46   <2e-16 ***
## lnD2H2        0.89784    0.03017   29.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0953 on 29 degrees of freedom
## Multiple R-squared:  0.9683, Adjusted R-squared:  0.9672 
## F-statistic: 885.8 on 1 and 29 DF,  p-value: < 2.2e-16
mod_15 <- lm(lnV ~ lnD + lnH + lnD2H, data=Datos); summary(mod_15)
## 
## Call:
## lm(formula = lnV ~ lnD + lnH + lnD2H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.168561 -0.048488  0.002431  0.063637  0.129223 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.71682    0.54996 -19.487  < 2e-16 ***
## lnD           1.98265    0.07501  26.432  < 2e-16 ***
## lnH           1.11712    0.20444   5.464 7.81e-06 ***
## lnD2H              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared:  0.9777, Adjusted R-squared:  0.9761 
## F-statistic: 613.2 on 2 and 28 DF,  p-value: < 2.2e-16
mod_16 <- lm(lnV ~ lnD + lnH, data=Datos); summary(mod_16)
## 
## Call:
## lm(formula = lnV ~ lnD + lnH, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.168561 -0.048488  0.002431  0.063637  0.129223 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.71682    0.54996 -19.487  < 2e-16 ***
## lnD           1.98265    0.07501  26.432  < 2e-16 ***
## lnH           1.11712    0.20444   5.464 7.81e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared:  0.9777, Adjusted R-squared:  0.9761 
## F-statistic: 613.2 on 2 and 28 DF,  p-value: < 2.2e-16
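For reference, this kind of fit can also be produced more compactly by storing the formulas in a named list and looping with lapply(); the sketch below uses only a few illustrative formulas and object names, not the mod_1 … mod_16 objects above.

formulas <- list(mod_a = V ~ D,                 # illustrative names
                 mod_b = V ~ D + H,
                 mod_c = V ~ D2H - 1)
ajustes <- lapply(formulas, lm, data = Datos)   # fit each formula with the same data
lapply(ajustes, summary)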

Selecting the best model

# performance statistics for all the models
library(performance)
Seleccion<-compare_performance(mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7,
                    mod_8, mod_9, mod_10, mod_11, mod_12, mod_13, mod_14,
                    mod_15, mod_16,
                    rank = TRUE, 
                    verbose = TRUE, metrics = "all");
## Warning: When comparing models, please note that probably not all models were fit
##   from same data.
Seleccion
## # Comparison of Model Performance Indices
## 
## Name   | Model |    R2 | R2 (adj.) |  RMSE | Sigma | AIC weights | BIC weights | Performance-Score
## --------------------------------------------------------------------------------------------------
## mod_8  |    lm | 0.995 |     0.995 | 0.068 | 0.070 |       0.915 |       0.984 |            99.98%
## mod_6  |    lm | 0.978 |     0.975 | 0.068 | 0.073 |       0.052 |       0.007 |            66.62%
## mod_4  |    lm | 0.976 |     0.973 | 0.072 | 0.077 |       0.011 |       0.001 |            65.29%
## mod_13 |    lm | 0.977 |     0.977 | 0.078 | 0.080 |       0.006 |       0.003 |            64.88%
## mod_5  |    lm | 0.973 |     0.971 | 0.075 | 0.079 |       0.008 |       0.002 |            64.87%
## mod_15 |    lm | 0.978 |     0.976 | 0.077 | 0.081 |       0.003 |    7.00e-04 |            64.74%
## mod_16 |    lm | 0.978 |     0.976 | 0.077 | 0.081 |       0.003 |    7.00e-04 |            64.74%
## mod_9  |    lm | 0.969 |     0.968 | 0.081 | 0.083 |       0.002 |       0.001 |            64.03%
## mod_12 |    lm | 0.968 |     0.967 | 0.092 | 0.095 |    3.23e-05 |    1.70e-05 |            62.77%
## mod_14 |    lm | 0.968 |     0.967 | 0.092 | 0.095 |    3.23e-05 |    1.70e-05 |            62.77%
## mod_7  |    lm | 0.954 |     0.952 | 0.098 | 0.102 |    4.55e-06 |    2.39e-06 |            61.40%
## mod_3  |    lm | 0.948 |     0.944 | 0.104 | 0.110 |    2.45e-07 |    6.29e-08 |            60.30%
## mod_10 |    lm | 0.954 |     0.952 | 0.111 | 0.115 |    9.65e-08 |    5.07e-08 |            60.07%
## mod_1  |    lm | 0.935 |     0.933 | 0.116 | 0.120 |    2.30e-08 |    1.21e-08 |            58.56%
## mod_11 |    lm | 0.421 |     0.401 | 0.394 | 0.407 |    8.93e-25 |    4.69e-25 |             3.29%
## mod_2  |    lm | 0.358 |     0.336 | 0.367 | 0.379 |    8.14e-24 |    4.27e-24 |             2.77%
library(DT)
datatable(Seleccion, extensions = "Buttons", 
          options = list(dom = "Bfrtip",
          buttons = c("copy", "csv", "excel", "pdf")))

Visualizing all the models

library(see)
plot(compare_performance(mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7, 
                    mod_8, mod_9, mod_10, mod_11, mod_12, mod_13, mod_14,
                    mod_15, mod_16, rank = TRUE))
## Warning: When comparing models, please note that probably not all models were fit
##   from same data.

IMPORTANT

Checking whether the assumptions hold

############ FROM HERE ON, THIS IS THE ANALYSIS OF WHICHEVER MODEL YOU CHOOSE
# check the model assumptions
library(performance) 
library(car)
modelo<-mod_3                            # name of the model (object) to evaluate
summary(modelo)
## 
## Call:
## lm(formula = V ~ D + H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181411 -0.075021 -0.008143  0.062306  0.240259 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.642025   0.244607  -6.713 2.75e-07 ***
## D            0.052488   0.002946  17.816  < 2e-16 ***
## H            0.031517   0.012091   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
confint(modelo)
##                    2.5 %      97.5 %
## (Intercept) -2.143079490 -1.14097039
## D            0.046453365  0.05852304
## H            0.006749267  0.05628557
check_model(modelo) 

outlierTest(modelo, cutoff=Inf, n.max=5)  # Bonferroni test to detect outliers
##     rstudent unadjusted p-value Bonferroni p
## 31  2.765603           0.010122      0.31377
## 18 -1.859901           0.073827           NA
## 2   1.651668           0.110190           NA
## 3   1.567740           0.128590           NA
## 1   1.532069           0.137140           NA
check_heteroskedasticity (modelo)         # the errors should be homoscedastic
## OK: Error variance appears to be homoscedastic (p = 0.206).
check_autocorrelation (modelo)            # residuals should not be autocorrelated
## Warning: Autocorrelated residuals detected (p = 0.022).
check_collinearity (modelo)               # predictor (x) variables should not be strongly correlated
## # Check for Multicollinearity
## 
## Low Correlation
## 
##  Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
##     D 1.37 [1.09, 2.47]         1.17      0.73     [0.41, 0.91]
##     H 1.37 [1.09, 2.47]         1.17      0.73     [0.41, 0.91]
check_normality (modelo)                  # normality of the residuals (errors)
## OK: residuals appear as normally distributed (p = 0.639).
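The same assumptions can also be inspected visually with base R's classic diagnostic plots (a minimal sketch):

par(mfrow = c(2, 2))   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
plot(modelo)
par(mfrow = c(1, 1))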

Visualizing the effects of the predictor variables

library("effects")
plot(predictorEffects(modelo))

Visualizing the model estimates

library(sjPlot)
plot_model(modelo, type = "pred", show.data = TRUE)
## $D

## 
## $H

Detecting influential observations

library(car)
Anova<-Anova(modelo); Anova    # the ANOVA table of the model
## Anova Table (Type II tests)
## 
## Response: V
##           Sum Sq Df  F value  Pr(>F)    
## D         3.8352  1 317.4129 < 2e-16 ***
## H         0.0821  1   6.7943 0.01449 *  
## Residuals 0.3383 28                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(modelo)
## 
## Call:
## lm(formula = V ~ D + H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181411 -0.075021 -0.008143  0.062306  0.240259 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.642025   0.244607  -6.713 2.75e-07 ***
## D            0.052488   0.002946  17.816  < 2e-16 ***
## H            0.031517   0.012091   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
# Analysis of influential observations
library(car)
summary(influence.measures(modelo))
## Potentially influential observations of
##   lm(formula = V ~ D + H, data = Datos) :
## 
##    dfb.1_ dfb.D dfb.H dffit   cov.r cook.d hat  
## 31 -0.74   0.97  0.34  1.50_*  0.68  0.61   0.23
influencePlot(modelo)

##      StudRes       Hat     CookD
## 18 -1.859901 0.1434615 0.1775359
## 20 -1.105744 0.2112366 0.1082855
## 31  2.765603 0.2270585 0.6052326
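Since observation 31 is flagged as potentially influential, a quick sensitivity check is to refit the model without that row and compare the coefficients (a sketch only; the object name modelo_sin_31 is illustrative, and dropping points should always be justified):

modelo_sin_31 <- update(modelo, data = Datos[-31, ])   # refit excluding row 31
cbind(completo = coef(modelo), sin_31 = coef(modelo_sin_31))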

A very elegant way to view the model

library(splines)
library(visreg)
summary(modelo)
## 
## Call:
## lm(formula = V ~ D + H, data = Datos)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181411 -0.075021 -0.008143  0.062306  0.240259 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.642025   0.244607  -6.713 2.75e-07 ***
## D            0.052488   0.002946  17.816  < 2e-16 ***
## H            0.031517   0.012091   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
# an elegant way to view the model
visreg2d(modelo, x="D", y="H", plot.type="image")    # which variables?

# visreg2d(modelo, x="D", y="H", plot.type="persp")    
# visreg2d(modelo, "D", "H", plot.type="rgl")

Another elegant way to view the model

library("plot3D")
colnames(Datos)
##  [1] "Girth"  "Height" "Volume" "D"      "H"      "V"      "DH"     "D2"    
##  [9] "H2"     "D2H"    "D2H2"   "lnDH"   "lnD2H"  "lnD2H2" "lnD"    "lnH"   
## [17] "lnV"
# x, y and z coordinates
x <- D     # which is the x variable of your model?
y <- H     # which is the y variable of your model?
z <- V     # which variable are you predicting?

scatter3D(x, y, z, clab = c("Volumen", "(m3)"))

# Compute the linear regression 
fit <- lm(z ~ x + y); summary(fit)
## 
## Call:
## lm(formula = z ~ x + y)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181411 -0.075021 -0.008143  0.062306  0.240259 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.642025   0.244607  -6.713 2.75e-07 ***
## x            0.052488   0.002946  17.816  < 2e-16 ***
## y            0.031517   0.012091   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
# predict values on regular xy grid
grid.lines = 100
x.pred <- seq(min(x), max(x), length.out = grid.lines)
y.pred <- seq(min(y), max(y), length.out = grid.lines)
xy <- expand.grid( x = x.pred, y = y.pred)
z.pred <- matrix(predict(fit, newdata = xy), 
                 nrow = grid.lines, ncol = grid.lines)

# fitted points for droplines to surface
fitpoints <- predict(fit)
# scatter plot with regression plane
scatter3D(x, y, z, pch = 18, cex = 2, 
          theta = 20, phi = 20, ticktype = "detailed",
          xlab = "Diametro (cm)", ylab = "Altura (m)", zlab = "Volumen (m3)",  
          surf = list(x = x.pred, y = y.pred, z = z.pred,  
                      facets = NA, fit = fitpoints), main = "")

library("plot3Drgl")
plotrgl()

Model report and table

library(report)
report(modelo)
## We fitted a linear model (estimated using OLS) to predict V with D (formula: V
## ~ D + H). The model explains a statistically significant and substantial
## proportion of variance (R2 = 0.95, F(2, 28) = 254.97, p < .001, adj. R2 =
## 0.94). The model's intercept, corresponding to D = 0, is at -1.64 (95% CI
## [-2.14, -1.14], t(28) = -6.71, p < .001). Within this model:
## 
##   - The effect of D is statistically significant and positive (beta = 0.05, 95%
## CI [0.05, 0.06], t(28) = 17.82, p < .001; Std. beta = 0.90, 95% CI [0.80,
## 1.00])
##   - The effect of H is statistically significant and positive (beta = 0.03, 95%
## CI [6.75e-03, 0.06], t(28) = 2.61, p = 0.014; Std. beta = 0.13, 95% CI [0.03,
## 0.23])
## 
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation. and We fitted a linear
## model (estimated using OLS) to predict V with H (formula: V ~ D + H). The model
## explains a statistically significant and substantial proportion of variance (R2
## = 0.95, F(2, 28) = 254.97, p < .001, adj. R2 = 0.94). The model's intercept,
## corresponding to H = 0, is at -1.64 (95% CI [-2.14, -1.14], t(28) = -6.71, p <
## .001). Within this model:
## 
##   - The effect of D is statistically significant and positive (beta = 0.05, 95%
## CI [0.05, 0.06], t(28) = 17.82, p < .001; Std. beta = 0.90, 95% CI [0.80,
## 1.00])
##   - The effect of H is statistically significant and positive (beta = 0.03, 95%
## CI [6.75e-03, 0.06], t(28) = 2.61, p = 0.014; Std. beta = 0.13, 95% CI [0.03,
## 0.23])
## 
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
tabla<-report_table(modelo)

library(DT)
# datatable(tabla)
datatable(tabla, extensions = "Buttons", 
          options = list(dom = "Bfrtip",
          buttons = c("copy", "csv", "excel", "pdf")))

What would the final model look like?

library(equatiomatic)
extract_eq(modelo)                            # the final model

\[ \operatorname{V} = \alpha + \beta_{1}(\operatorname{D}) + \beta_{2}(\operatorname{H}) + \epsilon \]

extract_eq(modelo, wrap = TRUE, use_coefs = TRUE) # the final model with coefficients

\[ \begin{aligned} \operatorname{\widehat{V}} &= -1.64 + 0.05(\operatorname{D}) + 0.03(\operatorname{H}) \end{aligned} \]
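With the fitted equation in hand, predictions for new trees are straightforward; a minimal sketch using hypothetical values D = 30 cm and H = 20 m:

nuevo <- data.frame(D = 30, H = 20)    # hypothetical tree
predict(modelo, newdata = nuevo, interval = "prediction", level = 0.95)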

Follow the full course here

library(vembedr)
embed_url("https://www.youtube.com/watch?v=Ke3vfYpKmtU") # lecccion 1
embed_url("https://www.youtube.com/watch?v=Yk8QOjhUvBY") # lecccion 2
embed_url("https://www.youtube.com/watch?v=qFIXTihSZ2w") # lecccion 3
embed_url("https://www.youtube.com/watch?v=KBoTaKgGtg8") # lecccion 4
embed_url("https://www.youtube.com/watch?v=5cDtzvCDzy0") # lecccion 5