DIFFERENCE BETWEEN CORRELATION AND REGRESSION…

#MINIMUM DISTANCE FROM THE VALUES TO THE LINE: LEAST SQUARES. THE LINE WITH THE SMALLEST TOTAL SQUARED DISTANCE.
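As a minimal sketch of the least-squares idea, the code below compares the closed-form intercept and slope with what `lm()` returns. It assumes R's built-in `cars` data set as a stand-in (it is not part of these notes), and the helper `ssr()` is a hypothetical name introduced only for illustration.

```r
# Built-in example data (assumption: `cars` stands in for any x/y pair).
x <- cars$speed
y <- cars$dist

# Closed-form least-squares estimates for a simple regression.
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Sum of squared vertical distances from the points to a candidate line.
ssr <- function(a, b) sum((y - (a + b * x))^2)

fit <- lm(dist ~ speed, data = cars)

c(intercept = intercept, slope = slope)   # closed form
coef(fit)                                 # same values from lm()

# Moving the line away from the least-squares fit increases the total.
ssr(intercept, slope) < ssr(intercept, slope + 0.1)
```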

RESIDUAL/ERROR: DISTANCE BETWEEN WHAT I EXPECT (THE VALUE ON THE LINE) AND WHAT I HAVE (THE OBSERVED VALUE).
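A short sketch of that definition, again assuming the built-in `cars` data set purely for illustration: the residual is the observed value minus the value the line predicts.

```r
fit <- lm(dist ~ speed, data = cars)   # `cars` is an illustrative stand-in

observed <- cars$dist          # what I have
expected <- fitted(fit)        # what the line predicts
res      <- observed - expected

# residuals() returns the same differences.
all.equal(unname(res), unname(residuals(fit)))

# With an intercept in the model, the residuals sum to (numerically) zero.
sum(res)
```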
IN THE FORMULA, ~ READS AS "ACCORDING TO" OR "IS EXPLAINED BY": DEPENDENT.VAR~INDEPENDENT.VAR, FOLLOWED BY THE NAME OF THE DATA SET
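The formula syntax can be sketched as follows. The data sets `cars` and `mtcars` ship with R and are used here only as assumptions for illustration; the object names `fit_simple` and `fit_multi` are hypothetical.

```r
# General shape: lm(dependent ~ independent, data = data_frame)
# Here `dist` is explained by `speed` in the built-in `cars` data set.
fit_simple <- lm(dist ~ speed, data = cars)

# Several independent variables are joined with `+`
# (`mtcars` is also a built-in data set used only for illustration).
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit_simple)
coef(fit_multi)
```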

p-value: THE PROBABILITY OF OBSERVING A RESULT AT LEAST AS EXTREME AS THE ONE IN THE SAMPLE, GIVEN THAT H0 IS TRUE

WHY DO WE NEED IT? TO KNOW WHETHER THE SAMPLE SAYS SOMETHING SIGNIFICANT ABOUT THE POPULATION.
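One way to read a p-value off a fitted model, sketched on the built-in `cars` data (an assumption for illustration). The second half recomputes the slope's p-value by hand from its t statistic; the 0.05 threshold is the conventional choice.

```r
fit <- lm(dist ~ speed, data = cars)   # illustrative data, not from these notes

# Coefficient table: estimate, standard error, t value and p-value per term.
coefs   <- summary(fit)$coefficients
p_slope <- coefs["speed", "Pr(>|t|)"]

# Same p-value by hand: two-sided tail area of the t distribution under H0.
t_slope  <- coefs["speed", "t value"]
p_manual <- 2 * pt(abs(t_slope), df = df.residual(fit), lower.tail = FALSE)

alpha <- 0.05
p_slope < alpha   # TRUE means H0 (slope = 0) is rejected at the 5% level
```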

FOR EACH PERCENTAGE POINT THAT X INCREASES, Y IS EXPECTED TO INCREASE BY 6.78: Y = 58.32 + 6.78*x
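The slope interpretation can be checked with simple arithmetic. The function name `predict_y` and the x values 10 and 11 are hypothetical, chosen only to illustrate the equation in these notes.

```r
# Equation from the notes: Y = 58.32 + 6.78 * x
predict_y <- function(x) 58.32 + 6.78 * x

# Raising x by one unit changes the prediction by the slope.
predict_y(11) - predict_y(10)

# At x = 0 the prediction is the intercept.
predict_y(0)
```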

r2: percentage of the variance of Y that is explained by the model (that is, by the independent variables)
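A minimal sketch of how r2 can be computed by hand, assuming the built-in `cars` data set as a stand-in: one minus the unexplained variation divided by the total variation of Y.

```r
fit <- lm(dist ~ speed, data = cars)   # illustrative data

y      <- cars$dist
ss_res <- sum(residuals(fit)^2)        # variation left unexplained
ss_tot <- sum((y - mean(y))^2)         # total variation of Y
r2_manual <- 1 - ss_res / ss_tot

r2_manual
summary(fit)$r.squared                 # same value reported by summary()
```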

load(url("https://www.dropbox.com/s/fyobx9uswy3qgp3/dataWorld_q.rda?dl=1"))
names(dataWorld_q)
##  [1] "country"    "quinq"      "tfr"        "yearSchF"   "contracep" 
##  [6] "age1mar"    "sanitat"    "water"      "birthSkill" "childMort" 
## [11] "deathRate"  "extPov"     "famWorkFem" "femWork"    "incomePp"  
## [16] "income10p"  "gini"       "lifExpFem"  "lifExpTot"  "maleWork"  
## [21] "materMort"  "vaccMeas"   "schGenEq"   "doctor"     "teenFert"
model1<- lm(lifExpFem~femWork,dataWorld_q)
model1
## 
## Call:
## lm(formula = lifExpFem ~ femWork, data = dataWorld_q)
## 
## Coefficients:
## (Intercept)      femWork  
##     78.5885      -0.1623

Y = 78.5885 - 0.1623*x (Y = lifExpFem, x = femWork)
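To see what this equation predicts, the coefficients can be plugged in directly. The `femWork` values below are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the model1 output above.
b0 <- 78.5885
b1 <- -0.1623

# Hypothetical femWork values chosen only for illustration.
femWork_values <- c(20, 50, 80)
predicted <- b0 + b1 * femWork_values
predicted
```

With the fitted object available, `predict(model1, newdata = data.frame(femWork = femWork_values))` should give the same numbers up to rounding of the printed coefficients.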

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.8     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
ggplot(dataWorld_q, aes(x = femWork, y = lifExpFem)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1642 rows containing non-finite values (stat_smooth).
## Warning: Removed 1642 rows containing missing values (geom_point).
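The warnings above mean that rows with missing values in either plotted variable were dropped. A small sketch of how such rows can be counted, using a toy data frame with made-up values (an assumption, not the real data):

```r
# Toy data frame (hypothetical values) with gaps in both columns.
toy <- data.frame(
  femWork   = c(40, NA, 55, 60, NA),
  lifExpFem = c(70, 72, NA, 78, 80)
)

# Rows where both plotted variables are present.
keep <- complete.cases(toy[, c("femWork", "lifExpFem")])

sum(!keep)    # rows that the plot (or lm) would drop
toy[keep, ]   # rows that remain
```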

There is no relationship among the independent variables: this is an assumption of multiple regression (no multicollinearity).
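One hedged way to check this assumption is to look at the correlations among the predictors and at a variance inflation factor (VIF) computed by hand. The sketch uses the built-in `mtcars` data set as a stand-in; the variables chosen are illustrative only.

```r
# Illustrative predictors from the built-in `mtcars` data set.
predictors <- mtcars[, c("wt", "hp", "disp")]

# Pairwise correlations: values near 1 or -1 signal overlapping predictors.
cor_matrix <- cor(predictors)
round(cor_matrix, 2)

# Variance inflation factor for one predictor, computed by hand:
# regress it on the other predictors and use 1 / (1 - R^2).
r2_wt  <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
vif_wt
```

For data with missing values, `cor()` accepts `use = "pairwise.complete.obs"` so that incomplete rows do not turn the whole matrix into `NA`.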

model2<- lm(lifExpFem~femWork+sanitat+income10p,dataWorld_q)
summary(model2)
## 
## Call:
## lm(formula = lifExpFem ~ femWork + sanitat + income10p, data = dataWorld_q)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7446  -2.2400   0.4262   2.8351  12.4650 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 53.986169   1.603971  33.658  < 2e-16 ***
## femWork      0.039648   0.015579   2.545   0.0113 *  
## sanitat      0.282769   0.008148  34.702  < 2e-16 ***
## income10p   -0.125383   0.030522  -4.108 4.76e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.567 on 440 degrees of freedom
##   (2272 observations deleted due to missingness)
## Multiple R-squared:  0.785,  Adjusted R-squared:  0.7835 
## F-statistic: 535.4 on 3 and 440 DF,  p-value: < 2.2e-16

Y = 53.98617 + 0.03965*femWork + 0.28277*sanitat - 0.12538*income10p
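As a sketch of how the model2 equation is used, each coefficient is multiplied by the value of its variable and the results are added to the intercept. The country profile below is hypothetical.

```r
# Coefficients copied from the summary(model2) output above.
coefs <- c(intercept = 53.986169, femWork = 0.039648,
           sanitat = 0.282769, income10p = -0.125383)

# Hypothetical country profile, chosen only for illustration.
profile <- c(femWork = 50, sanitat = 80, income10p = 30)

# Intercept plus each slope times the value of its variable.
predicted <- coefs[["intercept"]] + sum(coefs[names(profile)] * profile)
predicted
```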

THE P-VALUE OF THE F-TEST (< 2.2e-16) IS LESS THAN 0.05, SO H0 (ALL SLOPES EQUAL ZERO) IS REJECTED. The model as a whole is therefore statistically significant.
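The overall p-value can be recovered from the F-statistic and its degrees of freedom. This sketch copies the three numbers from the model2 output and uses the upper tail of the F distribution.

```r
# Values copied from the summary(model2) output above.
f_stat <- 535.4
df1    <- 3      # number of predictors
df2    <- 440    # residual degrees of freedom

# Probability of an F at least this large if all slopes were zero (H0).
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_value < 0.05
```

R prints `< 2.2e-16` in the summary because the value falls below the precision it reports by default, not because that is the exact p-value.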

Multiple R-squared: 0.785 -> this model explains 78.5% of the variance in lifExpFem.
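The adjusted R-squared in the same output penalises the number of predictors. A sketch of the standard formula, using only numbers copied from the model2 summary:

```r
r2     <- 0.785          # Multiple R-squared from the output above
p      <- 3              # number of predictors
df_res <- 440            # residual degrees of freedom
n      <- df_res + p + 1 # observations actually used in the fit

# Adjusted R-squared: shrinks R-squared according to n and p.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```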

model3<- lm(lifExpFem~sanitat+famWorkFem+materMort,dataWorld_q)
summary(model3)
## 
## Call:
## lm(formula = lifExpFem ~ sanitat + famWorkFem + materMort, data = dataWorld_q)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.1452  -1.9861   0.3949   2.4655  11.7319 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 64.441940   1.277691  50.436   <2e-16 ***
## sanitat      0.155055   0.013230  11.720   <2e-16 ***
## famWorkFem  -0.003754   0.013634  -0.275    0.783    
## materMort   -0.016338   0.001360 -12.016   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.63 on 363 degrees of freedom
##   (2349 observations deleted due to missingness)
## Multiple R-squared:  0.8275, Adjusted R-squared:  0.8261 
## F-statistic: 580.5 on 3 and 363 DF,  p-value: < 2.2e-16

r2: 0.8275 -> this model explains 82.75% of the variance in lifExpFem.

model4<- lm(lifExpFem~sanitat+water+materMort,dataWorld_q)
summary(model4)
## 
## Call:
## lm(formula = lifExpFem ~ sanitat + water + materMort, data = dataWorld_q)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.6314  -2.2501   0.3332   2.6298  12.0004 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 62.372577   1.450420  43.003  < 2e-16 ***
## sanitat      0.119066   0.011640  10.229  < 2e-16 ***
## water        0.048571   0.018545   2.619  0.00907 ** 
## materMort   -0.016279   0.001093 -14.891  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.907 on 533 degrees of freedom
##   (2179 observations deleted due to missingness)
## Multiple R-squared:  0.8438, Adjusted R-squared:  0.8429 
## F-statistic: 959.6 on 3 and 533 DF,  p-value: < 2.2e-16

Multiple R-squared: 0.8438 -> r2: 84.4%

# New name so the previous model4 is not overwritten
model5<- lm(lifExpFem~sanitat+doctor+materMort,dataWorld_q)
summary(model5)
## 
## Call:
## lm(formula = lifExpFem ~ sanitat + doctor + materMort, data = dataWorld_q)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.9754  -2.2057   0.5363   2.5090  12.0869 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 65.6627469  0.8827673  74.383  < 2e-16 ***
## sanitat      0.1086868  0.0109531   9.923  < 2e-16 ***
## doctor       1.0421621  0.1665371   6.258 8.59e-10 ***
## materMort   -0.0168825  0.0009975 -16.924  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.721 on 485 degrees of freedom
##   (2227 observations deleted due to missingness)
## Multiple R-squared:  0.858,  Adjusted R-squared:  0.8572 
## F-statistic: 977.1 on 3 and 485 DF,  p-value: < 2.2e-16

r2: Multiple R-squared: 0.858 -> 85.8%
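To compare the four multiple-regression models side by side, the reported figures can be collected in a small table. The data frame `comparison` is a hypothetical helper; every number in it is copied from the summaries above.

```r
# Figures copied from the four multiple-regression summaries above.
comparison <- data.frame(
  predictors = c("femWork + sanitat + income10p",
                 "sanitat + famWorkFem + materMort",
                 "sanitat + water + materMort",
                 "sanitat + doctor + materMort"),
  r2     = c(0.785, 0.8275, 0.8438, 0.858),
  adj_r2 = c(0.7835, 0.8261, 0.8429, 0.8572),
  df_res = c(440, 363, 533, 485)
)

# Order from highest to lowest adjusted R-squared.
ranked <- comparison[order(-comparison$adj_r2), ]
ranked
```

The residual degrees of freedom differ because each model drops a different set of rows with missing values, so the fits use different samples and the ranking is only indicative.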