Class 27 - Correlation and regression

2. Simple linear regression

PAIS            GASTO_US  EXPVIDA_ANOS
Australia       4492.554  82.5
Austria         5100.015  81.3
Belgium         4778.452  81.1
Chile           1877.359  79.1
Czech Republic  2466.027  78.7
Denmark         5057.861  80.8

## 
## Call:
## lm(formula = EXPVIDA_ANOS ~ GASTO_US, data = datos)
## 
## Coefficients:
## (Intercept)     GASTO_US  
##   7.798e+01    6.637e-04
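A call of the following shape would print output like the one above. This is only a minimal sketch: it assumes a data frame `datos` built from the six rows shown in the table, whereas the lecture's fit uses the full data set, so the coefficients it returns will differ from the printed ones.

```r
# Assumption: only the six rows visible in the table are used here,
# so the estimates will not match the full-data output above.
datos <- data.frame(
  PAIS = c("Australia", "Austria", "Belgium", "Chile",
           "Czech Republic", "Denmark"),
  GASTO_US = c(4492.554, 5100.015, 4778.452, 1877.359, 2466.027, 5057.861),
  EXPVIDA_ANOS = c(82.5, 81.3, 81.1, 79.1, 78.7, 80.8)
)

# Simple linear regression of life expectancy on spending
modelo <- lm(EXPVIDA_ANOS ~ GASTO_US, data = datos)
print(modelo)
```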

## 
## Call:
## lm(formula = EXPVIDA_ANOS ~ GASTO_US, data = datos)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4864 -0.7812  0.0651  1.4460  2.9796 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.798e+01  8.012e-01  97.323  < 2e-16 ***
## GASTO_US    6.637e-04  1.870e-04   3.549  0.00122 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.097 on 32 degrees of freedom
## Multiple R-squared:  0.2824, Adjusted R-squared:   0.26 
## F-statistic:  12.6 on 1 and 32 DF,  p-value: 0.001219

##                   2.5 %       97.5 %
## (Intercept) 76.34447058 79.608509537
## GASTO_US     0.00028278  0.001044611
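The t value, p-value, confidence interval and F statistic in this output can be recomputed from the printed estimate, standard error and degrees of freedom. The sketch below copies those rounded values by hand (the variable names are illustrative), so small rounding differences are expected.

```r
# Values copied from the GASTO_US row of the summary above
est <- 6.637e-04   # slope estimate
se  <- 1.870e-04   # standard error of the slope
gl  <- 32          # residual degrees of freedom

t_val <- est / se                                   # t statistic
p_val <- 2 * pt(-abs(t_val), df = gl)               # two-sided p-value
ic    <- est + c(-1, 1) * qt(0.975, df = gl) * se   # 95% confidence interval

# With a single predictor the F statistic equals t^2, and it can also be
# recovered from the multiple R-squared.
r2    <- 0.2824
f_val <- r2 / (1 - r2) * gl

round(c(t = t_val, p = p_val, F = f_val), 5)
ic
```

Read this way, the slope corresponds to roughly 0.66 additional years of life expectancy per 1,000 units of `GASTO_US` (6.637e-04 × 1000), assuming the column names reflect US dollars and years.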

3. Exponential regression

\[ \begin{equation} y_i = a e^{-b x_i} = 4 e^{-0.01 x_i} \end{equation} \]

## 
## Formula: yprima ~ a * exp(-b * x)
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a 3.8364052  0.0510798   75.11   <2e-16 ***
## b 0.0100053  0.0001877   53.30   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2525 on 498 degrees of freedom
## 
## Number of iterations to convergence: 3 
## Achieved convergence tolerance: 1.242e-07

\[ \begin{equation} y_i = a e^{-b x_i} = 4 e^{-0.01 x_i} \end{equation} \]

\[ \begin{equation} y'_i = a e^{-b x_i} \approx 3.84 e^{-0.01 x_i} \end{equation} \]
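A fit of this kind can be sketched with `nls()`. The design points `x <- 1:500`, the noise level, the seed and the starting values are all assumptions chosen for illustration (the output only tells us there are 500 observations and a residual standard error near 0.25), so the estimates will not reproduce the printed ones exactly.

```r
set.seed(27)
x <- 1:500                                   # assumed design points
y <- 4 * exp(-0.01 * x)                      # generating model from the equation
yprima <- y + rnorm(length(x), sd = 0.25)    # assumed additive Gaussian noise

# Nonlinear least squares; nls() needs starting values for a and b
ajuste <- nls(yprima ~ a * exp(-b * x), start = list(a = 3, b = 0.015))
coef(ajuste)
```

Note that the formula is written with `-b`, so a positive estimate of `b` corresponds to exponential decay.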

4. Power regression

\[ \begin{equation} y_i = a x_i^{b} = 2 x_i^{1.5} \end{equation} \]

## 
## Formula: yprima ~ a * x^b
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a  1.99819    0.96997    2.06   0.0399 *  
## b  1.48415    0.08133   18.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4666 on 498 degrees of freedom
## 
## Number of iterations to convergence: 9 
## Achieved convergence tolerance: 4.881e-08

\[ \begin{equation} y_i = a x_i^{b} = 2 x_i^{1.5} \end{equation} \]

\[ \begin{equation} y'_i = a x_i^{b} \approx 2.00 x_i^{1.48} \end{equation} \]
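An alternative to `nls()` for a power model, and not the method behind the output above, is to take logarithms on both sides and fit a straight line with `lm()`. The sketch below assumes multiplicative log-normal noise so that every simulated value stays positive; the output above instead suggests additive noise, for which the log transform would change the error assumptions. All names and the noise level are illustrative.

```r
set.seed(27)
x <- 1:500                                        # assumed design points
y <- 2 * x^1.5                                    # generating model
# Assumption: multiplicative noise, so log(yprima) is always defined
yprima <- y * exp(rnorm(length(x), sd = 0.1))

# log(y) = log(a) + b * log(x) is linear in log(a) and b
ajuste_log <- lm(log(yprima) ~ log(x))
a_hat <- exp(coef(ajuste_log)[[1]])   # back-transform the intercept to get a
b_hat <- coef(ajuste_log)[[2]]        # the slope is the exponent b
c(a = a_hat, b = b_hat)
```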

5. Logarithmic regression

\[ \begin{equation} y_i = a + b \log(x_i) = 10 + 20 \log(x_i) \end{equation} \]

## 
## Formula: yprima ~ a + b * log(x)
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a  10.0901     1.4681   6.873 1.89e-11 ***
## b  19.7425     0.2763  71.445  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.024 on 498 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 3.948e-09

\[ \begin{equation} y_i = a + b \log(x_i) = 10 + 20 \log(x_i) \end{equation} \]

\[ \begin{equation} y'_i = a + b \log(x_i) \approx 10.09 + 19.74 \log(x_i) \end{equation} \]
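Because this model is linear in `a` and `b`, `nls()` converges in a single iteration and ordinary `lm()` on `log(x)` gives the same estimates. The sketch below illustrates this on simulated data; the design points, noise level and starting values are assumptions.

```r
set.seed(27)
x <- 1:500                                 # assumed design points (must be > 0)
y <- 10 + 20 * log(x)                      # generating model
yprima <- y + rnorm(length(x), sd = 6)     # assumed noise level

ajuste_nls <- nls(yprima ~ a + b * log(x), start = list(a = 1, b = 1))
ajuste_lm  <- lm(yprima ~ log(x))

# Both rows should agree up to numerical tolerance
rbind(nls = coef(ajuste_nls), lm = unname(coef(ajuste_lm)))
```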

6. Polynomial regression

\[ \begin{equation} y_i = a + b x_i + c x_i^2 = 3 + 4 x_i + 5 x_i^2 \end{equation} \]

## 
## Formula: yprima ~ a + b * x + c * x^2
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a -432.68666 3947.22654  -0.110    0.913    
## b   10.86011   36.38562   0.298    0.765    
## c    4.92818    0.07033  70.074   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29300 on 497 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 1.631e-06

\[ \begin{equation} y_i = a + b x_i + c x_i^2 = 3 + 4 x_i + 5 x_i^2 \end{equation} \]

\[ \begin{equation} y'_i = a + b x_i + c x_i^2 \approx -432.69 + 10.86 x_i + 4.93 x_i^2 \approx 5 x_i^2 \end{equation} \]
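A quadratic model is also linear in its coefficients, so it can be fitted with `lm()` using `I(x^2)`. The sketch below uses assumed design points and a noise level close to the residual standard error reported above; it suggests why only the quadratic coefficient is estimated precisely when the noise is large relative to the intercept and the linear term.

```r
set.seed(27)
x <- 1:500                                    # assumed design points
y <- 3 + 4 * x + 5 * x^2                      # generating model
yprima <- y + rnorm(length(x), sd = 29300)    # assumed noise level

# I() protects x^2 so that it is read as arithmetic, not formula syntax
ajuste <- lm(yprima ~ x + I(x^2))
summary(ajuste)$coefficients
```

With noise of this size, the standard errors of the intercept and of the linear coefficient are far larger than their true values of 3 and 4, so those two estimates carry little information.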

7. Multiple linear regression

Nhospital  RRHH      BSERV     Egresos  DCO     DE
H1         19779705  10982608  15196    91433   93952
H2         27881106  15394204  15960    114528  136033
H3         29969438  15065061  17785    139584  141724
H4         11358890  3408115   7782     32531   32629

## 
## Call:
## lm(formula = datos$RRHH ~ Egresos + DCO + DE, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8556957 -1018261  -174886   483225 16580483 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 348070.83  253002.28   1.376    0.171    
## Egresos        773.11      81.53   9.483   <2e-16 ***
## DCO             69.96      60.22   1.162    0.247    
## DE              23.05      61.23   0.377    0.707    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2738000 on 177 degrees of freedom
## Multiple R-squared:  0.9436, Adjusted R-squared:  0.9427 
## F-statistic: 987.5 on 3 and 177 DF,  p-value: < 2.2e-16

## Analysis of Variance Table
## 
## Model 1: datos$RRHH ~ Egresos + DCO + DE
## Model 2: datos$RRHH ~ Egresos
## Model 3: datos$RRHH ~ DCO
## Model 4: datos$RRHH ~ DE
##   Res.Df        RSS Df   Sum of Sq      F    Pr(>F)    
## 1    177 1.3273e+15                                    
## 2    179 1.8085e+15 -2 -4.8119e+14 32.084 1.289e-12 ***
## 3    179 2.0087e+15  0 -2.0021e+14                     
## 4    179 2.0419e+15  0 -3.3169e+13                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
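The only F statistic in the analysis of variance table compares Model 1 with Model 2; Models 3 and 4 have the same residual degrees of freedom as Model 2, so no F test is reported for them. That F value can be recomputed from the printed residual sums of squares, as in this sketch (the inputs are rounded values copied from the table and the variable names are illustrative).

```r
# Values copied from rows 1 and 2 of the analysis of variance table
rss_completo <- 1.3273e15   # Model 1: RRHH ~ Egresos + DCO + DE
rss_reducido <- 1.8085e15   # Model 2: RRHH ~ Egresos
gl_completo  <- 177         # residual df of Model 1
gl_reducido  <- 179         # residual df of Model 2

gl_num <- gl_reducido - gl_completo   # number of dropped predictors (2)

# Partial F test: extra sum of squares per dropped term,
# divided by the residual mean square of the full model
f_val <- ((rss_reducido - rss_completo) / gl_num) /
  (rss_completo / gl_completo)
p_val <- pf(f_val, gl_num, gl_completo, lower.tail = FALSE)

c(F = f_val, p = p_val)
```

The test suggests that `DCO` and `DE` jointly improve the fit even though neither is individually significant in the summary, which would be consistent with the two predictors being highly correlated, as the sample rows hint.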

8. Multivariate regression

## 
## Call:
## lm(formula = cbind(RRHH, BSERV) ~ Egresos + DCO + DE, data = datos)
## 
## Coefficients:
##              RRHH        BSERV     
## (Intercept)   348070.83  -567950.81
## Egresos          773.11      459.51
## DCO               69.96       94.25
## DE                23.05      -10.87

## Response RRHH :
## 
## Call:
## lm(formula = RRHH ~ Egresos + DCO + DE, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8556957 -1018261  -174886   483225 16580483 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 348070.83  253002.28   1.376    0.171    
## Egresos        773.11      81.53   9.483   <2e-16 ***
## DCO             69.96      60.22   1.162    0.247    
## DE              23.05      61.23   0.377    0.707    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2738000 on 177 degrees of freedom
## Multiple R-squared:  0.9436, Adjusted R-squared:  0.9427 
## F-statistic: 987.5 on 3 and 177 DF,  p-value: < 2.2e-16
## 
## 
## Response BSERV :
## 
## Call:
## lm(formula = BSERV ~ Egresos + DCO + DE, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8400687 -1132474     2251   465098 21709324 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -567950.81  297613.19  -1.908    0.058 .  
## Egresos         459.51      95.90   4.792 3.49e-06 ***
## DCO              94.25      70.83   1.331    0.185    
## DE              -10.87      72.03  -0.151    0.880    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3221000 on 177 degrees of freedom
## Multiple R-squared:  0.8677, Adjusted R-squared:  0.8654 
## F-statistic: 386.9 on 3 and 177 DF,  p-value: < 2.2e-16
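As the output shows, the `RRHH` block of the multivariate fit matches the multiple regression from the previous section: with `cbind()` on the left-hand side, `lm()` returns the same coefficients as fitting each response separately. The sketch below checks this on simulated data; the data frame `datos_sim`, its sample size and all generating coefficients are hypothetical.

```r
set.seed(27)
n <- 50
# Hypothetical predictors with ranges loosely inspired by the sample rows
datos_sim <- data.frame(
  Egresos = runif(n, 5000, 20000),
  DCO     = runif(n, 30000, 140000),
  DE      = runif(n, 30000, 140000)
)
# Hypothetical generating models for the two responses
datos_sim$RRHH  <- 350000 + 770 * datos_sim$Egresos + 70 * datos_sim$DCO +
  rnorm(n, sd = 1e6)
datos_sim$BSERV <- -570000 + 460 * datos_sim$Egresos + 90 * datos_sim$DCO +
  rnorm(n, sd = 1e6)

# One multivariate fit versus two separate fits
ajuste_mv    <- lm(cbind(RRHH, BSERV) ~ Egresos + DCO + DE, data = datos_sim)
ajuste_rrhh  <- lm(RRHH ~ Egresos + DCO + DE, data = datos_sim)
ajuste_bserv <- lm(BSERV ~ Egresos + DCO + DE, data = datos_sim)

all.equal(coef(ajuste_mv)[, "RRHH"],  coef(ajuste_rrhh))
all.equal(coef(ajuste_mv)[, "BSERV"], coef(ajuste_bserv))
```

The multivariate object becomes useful when the responses are tested jointly, since it keeps the residuals of both responses together.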

9. Logistic regression

## 
## Call:
## glm(formula = Survived ~ Age, data = datos)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4811  -0.4158  -0.3662   0.5789   0.7252  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.483753   0.041788  11.576   <2e-16 ***
## Age         -0.002613   0.001264  -2.067   0.0391 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.2404297)
## 
##     Null deviance: 172.21  on 713  degrees of freedom
## Residual deviance: 171.19  on 712  degrees of freedom
##   (177 observations deleted due to missingness)
## AIC: 1012.6
## 
## Number of Fisher Scoring iterations: 2

##         1 
## 0.1967612

## 
## Call:
## glm(formula = Sobrevida ~ ., data = datos_proc)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7786  -0.2115  -0.1931   0.2471   0.8401  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.7804879  0.0394345  19.792   <2e-16 ***
## Edad        -0.0009206  0.0010730  -0.858    0.391    
## Femenino    -0.5469036  0.0323428 -16.910   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1717122)
## 
##     Null deviance: 172.21  on 713  degrees of freedom
## Residual deviance: 122.09  on 711  degrees of freedom
##   (177 observations deleted due to missingness)
## AIC: 773.22
## 
## Number of Fisher Scoring iterations: 2
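Both outputs above report `gaussian family`: `glm()` uses `family = gaussian` when no family is given, so these fits are linear models for a 0/1 response. A logistic regression needs `family = binomial`. The sketch below shows that call on simulated data; the data frame `datos_sim` and the generating coefficients are hypothetical and are not meant to reproduce the results above.

```r
set.seed(27)
n <- 500
edad     <- runif(n, 1, 80)       # hypothetical ages
femenino <- rbinom(n, 1, 0.4)     # hypothetical 0/1 indicator

# Assumed generating model on the log-odds scale
eta <- 1 - 0.01 * edad - 2.5 * femenino
sobrevida <- rbinom(n, 1, plogis(eta))

datos_sim <- data.frame(Sobrevida = sobrevida, Edad = edad,
                        Femenino = femenino)

# family = binomial selects the logit link by default
ajuste <- glm(Sobrevida ~ Edad + Femenino, family = binomial,
              data = datos_sim)

exp(coef(ajuste))   # coefficients expressed as odds ratios

# Predicted probability for a new observation; type = "response"
# returns a probability instead of log-odds
nuevo <- data.frame(Edad = 30, Femenino = 1)
p_nuevo <- predict(ajuste, newdata = nuevo, type = "response")
p_nuevo
```

With the binomial family, predictions with `type = "response"` always lie between 0 and 1, which a Gaussian fit does not guarantee.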