
The data x, y

Basic calculations for a simple linear regression. Suppose we have two variables: x = diameter (cm) and y = height (m).

x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)
plot(x, y)                             # Draw the scatter plot
abline(lm(y ~ x), col = "red")         # Add the trend line

Preliminary calculations

First, let us define some values we will need.

n <- length(x)                         # Sample size n
p <- 2                                 # Number of coefficients in the model (b0 and b1)

# Build a data frame from the vectors (to carry out the calculations and view them as columns)
df1 <- data.frame(x, y); df1

##    x  y
## 1  5  9
## 2  7 12
## 3  6 10
## 4 35 22
## 5 15 20
## 6 20 18

To carry out a simple linear regression step by step, some preliminary calculations are needed. Let us start with the deviations of x from its mean: \[ (x_{i}-\bar{x})\]

# Compute x_xmed and add it to the data frame
x_xmed <- x - mean(x)
df2 <- data.frame(x, y, x_xmed); df2

##    x  y     x_xmed
## 1  5  9 -9.6666667
## 2  7 12 -7.6666667
## 3  6 10 -8.6666667
## 4 35 22 20.3333333
## 5 15 20  0.3333333
## 6 20 18  5.3333333

\[ (y_{i}-\bar{y})\]

# Compute y_ymed and add it to the data frame
y_ymed <- y - mean(y)
df3 <- data.frame(x, y, x_xmed, y_ymed); df3

##    x  y     x_xmed    y_ymed
## 1  5  9 -9.6666667 -6.166667
## 2  7 12 -7.6666667 -3.166667
## 3  6 10 -8.6666667 -5.166667
## 4 35 22 20.3333333  6.833333
## 5 15 20  0.3333333  4.833333
## 6 20 18  5.3333333  2.833333

\[ (x_{i}-\bar{x}) * (y_{i}-\bar{y}) \]

# Compute the products of the deviations and add them to the data frame
# (this vector is named cov; it is not the base R function cov())
cov <- x_xmed * y_ymed
df4 <- data.frame(x, y, x_xmed, y_ymed, cov); df4

##    x  y     x_xmed    y_ymed        cov
## 1  5  9 -9.6666667 -6.166667  59.611111
## 2  7 12 -7.6666667 -3.166667  24.277778
## 3  6 10 -8.6666667 -5.166667  44.777778
## 4 35 22 20.3333333  6.833333 138.944444
## 5 15 20  0.3333333  4.833333   1.611111
## 6 20 18  5.3333333  2.833333  15.111111

\[ (x_{i}-\bar{x})^2 \]

# Compute x_xmed^2 and add it to the data frame
x_xmed2 <- x_xmed^2
df5 <- data.frame(x, y, x_xmed, y_ymed, cov, x_xmed2); df5

##    x  y     x_xmed    y_ymed        cov     x_xmed2
## 1  5  9 -9.6666667 -6.166667  59.611111  93.4444444
## 2  7 12 -7.6666667 -3.166667  24.277778  58.7777778
## 3  6 10 -8.6666667 -5.166667  44.777778  75.1111111
## 4 35 22 20.3333333  6.833333 138.944444 413.4444444
## 5 15 20  0.3333333  4.833333   1.611111   0.1111111
## 6 20 18  5.3333333  2.833333  15.111111  28.4444444

\[ Cov_{xy}=\frac{\sum_{i}^{n}(x_{i}-\bar{x}) * (y_{i}-\bar{y})}{n-1} \]

# Compute the sample covariance
Covarianza <- sum(cov) / (n - 1); Covarianza

## [1] 56.86667

\[ V(x)=\frac{\sum_{i}^{n}(x_{i}-\bar{x}) ^2}{n-1} \]

# Compute the sample variance of x
Varianza_de_x <- sum(x_xmed2) / (n - 1); Varianza_de_x

## [1] 133.8667
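
As a quick check, the two hand-computed quantities can be compared with R's built-in functions. The following is a minimal sketch that redefines the data so it runs on its own. It calls `stats::cov()` and `stats::var()` with the namespace prefix because the walkthrough stores a vector under the name `cov`. The names `cov_manual` and `var_manual` are illustrative and do not appear in the original.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)
n <- length(x)

# Hand-computed sample covariance and variance (divisor n - 1)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
var_manual <- sum((x - mean(x))^2) / (n - 1)

# Built-in equivalents; both comparisons should return TRUE
all.equal(cov_manual, stats::cov(x, y))
all.equal(var_manual, stats::var(x))
```

Both built-ins use the same n - 1 divisor as the formulas above, which is why the results agree.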

The regression coefficients

Now, with the quantities above, we can compute the regression coefficients. The slope is the covariance divided by the variance of x: \[ \beta_{1}=\frac{\sum_{i}^{n}(x_{i}-\bar{x}) * (y_{i}-\bar{y})}{n-1}/\frac{\sum_{i}^{n}(x_{i}-\bar{x}) ^2}{n-1}=\frac{Cov_{xy}}{V(x)} \]

# Compute b1 (slope)
b1 <- Covarianza / Varianza_de_x; b1

## [1] 0.4248008

\[\beta _{0}=\bar{y}-\beta _{1}(\bar{x})\]

# Compute b0 (intercept)
b0 <- mean(y) - (b1 * mean(x)); b0

## [1] 8.936255
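
The hand-computed coefficients can be checked against `lm()`, which the walkthrough already uses for the trend line. This is a minimal, self-contained sketch. The names `fit`, `b1_manual` and `b0_manual` are illustrative, not from the original.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)

# Slope = covariance / variance of x; intercept from the means
b1_manual <- stats::cov(x, y) / stats::var(x)
b0_manual <- mean(y) - b1_manual * mean(x)

# Least-squares fit; coef() returns the intercept first, then the slope
fit <- lm(y ~ x)
coef(fit)

# Should return TRUE
all.equal(unname(coef(fit)), c(b0_manual, b1_manual))
```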

Computing the fitted values

\[ \hat{y}=\beta _{0}+\beta _{1}(x) \]

# Compute the fitted values and add them to the data frame
Y_Est <- b0 + (b1 * x)
df6 <- data.frame(x, y, x_xmed, y_ymed, cov, x_xmed2, Y_Est); df6

##    x  y     x_xmed    y_ymed        cov     x_xmed2    Y_Est
## 1  5  9 -9.6666667 -6.166667  59.611111  93.4444444 11.06026
## 2  7 12 -7.6666667 -3.166667  24.277778  58.7777778 11.90986
## 3  6 10 -8.6666667 -5.166667  44.777778  75.1111111 11.48506
## 4 35 22 20.3333333  6.833333 138.944444 413.4444444 23.80428
## 5 15 20  0.3333333  4.833333   1.611111   0.1111111 15.30827
## 6 20 18  5.3333333  2.833333  15.111111  28.4444444 17.43227
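
The fitted values can also be obtained from a fitted `lm` object with `fitted()`. The sketch below is self-contained and also checks two well-known properties of least-squares residuals when the model includes an intercept: they sum to zero and are orthogonal to x, up to floating-point rounding. The names `fit`, `y_hat` and `res` are illustrative.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)

fit   <- lm(y ~ x)
y_hat <- unname(fitted(fit))   # same quantity as b0 + b1 * x
res   <- y - y_hat             # residuals

y_hat
sum(res)       # numerically zero
sum(res * x)   # numerically zero
```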

\[ CM_{od}=(\hat{y}-\bar{y})^2 \]

# Compute the squared deviations of the model (CMod) and add them to the previous data frame
CMod <- (Y_Est - mean(y))^2
df7 <- data.frame(df6, CMod); df7

##    x  y     x_xmed    y_ymed        cov     x_xmed2    Y_Est        CMod
## 1  5  9 -9.6666667 -6.166667  59.611111  93.4444444 11.06026 16.86258422
## 2  7 12 -7.6666667 -3.166667  24.277778  58.7777778 11.90986 10.60678603
## 3  6 10 -8.6666667 -5.166667  44.777778  75.1111111 11.48506 13.55422941
## 4 35 22 20.3333333  6.833333 138.944444 413.4444444 23.80428 74.60841365
## 5 15 20  0.3333333  4.833333   1.611111   0.1111111 15.30827  0.02005064
## 6 20 18  5.3333333  2.833333  15.111111  28.4444444 17.43227  5.13296262

\[ CE_{rr}=(y_{i}-\hat{y})^2 \]

# Compute the squared errors (CErr) and add them to the data frame
CErr <- (y - Y_Est)^2
df8 <- data.frame(df7, CErr); df8

##    x  y     x_xmed    y_ymed        cov     x_xmed2    Y_Est        CMod
## 1  5  9 -9.6666667 -6.166667  59.611111  93.4444444 11.06026 16.86258422
## 2  7 12 -7.6666667 -3.166667  24.277778  58.7777778 11.90986 10.60678603
## 3  6 10 -8.6666667 -5.166667  44.777778  75.1111111 11.48506 13.55422941
## 4 35 22 20.3333333  6.833333 138.944444 413.4444444 23.80428 74.60841365
## 5 15 20  0.3333333  4.833333   1.611111   0.1111111 15.30827  0.02005064
## 6 20 18  5.3333333  2.833333  15.111111  28.4444444 17.43227  5.13296262
##           CErr
## 1  4.244666999
## 2  0.008125119
## 3  2.205402494
## 4  3.255436670
## 5 22.012359179
## 6  0.322316312

\[ CT_{ot}=(y_{i}-\bar{y})^2 \]

# Compute the squared total deviations (CTot) and add them to the data frame
CTot <- (y - mean(y))^2
df9 <- data.frame(df8, CTot); df9

##    x  y     x_xmed    y_ymed        cov     x_xmed2    Y_Est        CMod
## 1  5  9 -9.6666667 -6.166667  59.611111  93.4444444 11.06026 16.86258422
## 2  7 12 -7.6666667 -3.166667  24.277778  58.7777778 11.90986 10.60678603
## 3  6 10 -8.6666667 -5.166667  44.777778  75.1111111 11.48506 13.55422941
## 4 35 22 20.3333333  6.833333 138.944444 413.4444444 23.80428 74.60841365
## 5 15 20  0.3333333  4.833333   1.611111   0.1111111 15.30827  0.02005064
## 6 20 18  5.3333333  2.833333  15.111111  28.4444444 17.43227  5.13296262
##           CErr      CTot
## 1  4.244666999 38.027778
## 2  0.008125119 10.027778
## 3  2.205402494 26.694444
## 4  3.255436670 46.694444
## 5 22.012359179 23.361111
## 6  0.322316312  8.027778

The ANOVA

Now let us compute the degrees of freedom for each source of variation: \[ GLM_{od}=1\] \[ GLE_{rr}=n-2 \] \[ GLT_{ot}=n-1\]

# Compute the degrees of freedom
GLMod <- p - 1; GLMod                 # Degrees of freedom of the model

## [1] 1

GLErr <- n - 2; GLErr                 # Degrees of freedom of the error

## [1] 4

GLTot <- n - 1; GLTot                 # Degrees of freedom of the total

## [1] 5

Now let us compute the sum of squares for each source of variation; for this we simply add up the columns computed above.

\[ SCM_{od}=\sum_{i}^{n} (\hat{y}-\bar{y})^2 \] \[ SCE_{rr}=\sum_{i}^{n} (y_{i} - \hat{y})^2 \] \[ SCT_{ot}=\sum_{i}^{n} (y_{i} - \bar{y})^2 \]

SCMod <- sum(CMod); SCMod             # Sum of squares of the model

## [1] 120.785

SCErr <- sum(CErr); SCErr             # Sum of squares of the error

## [1] 32.04831

SCTot <- sum(CTot); SCTot             # Total sum of squares

## [1] 152.8333
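
For a least-squares fit with an intercept, the three sums of squares satisfy SCTot = SCMod + SCErr, which gives a useful arithmetic check (120.785 + 32.04831 = 152.8333 with the values above). A minimal self-contained sketch follows. The names `ss_model`, `ss_error` and `ss_total` are illustrative.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)

y_hat <- fitted(lm(y ~ x))

ss_model <- sum((y_hat - mean(y))^2)   # SCMod
ss_error <- sum((y - y_hat)^2)         # SCErr
ss_total <- sum((y - mean(y))^2)       # SCTot

# Should return TRUE: total = model + error
all.equal(ss_total, ss_model + ss_error)
```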

Now let us compute the mean squares for each source of variation \[ CMM_{od}=\frac{SCMod}{GLMod} \] \[ CME_{rr}=\frac{SCErr}{GLErr} \] \[ CMT_{ot}=\frac{SCTot}{GLTot} \]

The calculated F statistic is the ratio of the model mean square to the error mean square: \[ Fcal=\frac{CMMod}{CMErr} \]

CMMod <- SCMod / GLMod; CMMod         # Mean square of the model

## [1] 120.785

CMEer <- SCErr / GLErr; CMEer         # Mean square of the error

## [1] 8.012077

CMTot <- SCTot / GLTot; CMTot         # Mean square of the total

## [1] 30.56667

FCAL <- CMMod / CMEer; FCAL           # Calculated F statistic

## [1] 15.07537
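
To decide whether the calculated F is significant, it can be compared with a critical value from the F distribution with 1 and n - 2 degrees of freedom. The sketch below is self-contained and assumes a significance level of 0.05, which the original does not state. It reads the F statistic from `anova()` as a cross-check of the manual table. The names `tab`, `f_cal`, `f_crit` and `p_value` are illustrative.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)
n <- length(x)

# ANOVA table computed by R for the same model
tab   <- anova(lm(y ~ x))
f_cal <- tab[["F value"]][1]

# Critical value at the assumed alpha = 0.05, and the p-value
f_crit  <- qf(0.95, df1 = 1, df2 = n - 2)
p_value <- pf(f_cal, df1 = 1, df2 = n - 2, lower.tail = FALSE)

tab
f_cal > f_crit   # TRUE means the regression is significant at alpha = 0.05
```

With 1 and 4 degrees of freedom the critical value is about 7.71, so the calculated F of 15.07537 exceeds it.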

Model fit

Now let us compute the goodness-of-fit statistics \[ R^{2}=\frac{\sum_{i}^{n} (\hat{y}-\bar{y})^2}{\sum_{i}^{n} (y_{i} - \bar{y})^2}=\frac{SCMod}{SCTot} \]

\[ R=\sqrt{ \frac{\sum_{i}^{n} (\hat{y}-\bar{y})^2}{\sum_{i}^{n} (y_{i} - \bar{y})^2}}=\sqrt{\frac{SCMod}{SCTot}}=\sqrt{R^2} \] \[ R^2Aj=1-(\frac{\sum_{i}^{n} (y_{i} - \hat{y})^2}{n-2})/(\frac{\sum_{i}^{n} (y_{i} - \bar{y})^2}{n-1})=1-(\frac{CMErr}{CMTot}) \]

\[ S_{xy}=\sqrt{\frac{\sum_{i}^{n} (y_{i} - \hat{y})^2}{n-2}}=\sqrt{CMErr} \]

\[ CV=\frac{\sqrt{\frac{\sum_{i}^{n} (y_{i} - \hat{y})^2}{n-2}}}{\bar{y}}*100=(\frac{S_{xy}}{\bar{y}})*100 \]

R2 <- SCMod / SCTot; R2               # Coefficient of determination R^2

## [1] 0.7903055

R <- sqrt(R2); R                      # Multiple correlation coefficient

## [1] 0.8889913

R2AJ <- (1 - (CMEer / CMTot)); R2AJ   # Adjusted R^2

## [1] 0.7378819

Sxy <- sqrt(CMEer); Sxy               # Standard error of the regression model

## [1] 2.830561

CV <- ((Sxy / mean(y)) * 100); CV     # Coefficient of variation (%)

## [1] 18.66304
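
The same fit statistics are available from `summary()` of an `lm` object, which gives a way to confirm the manual results. This self-contained sketch uses the standard components `r.squared`, `adj.r.squared` and `sigma`. The names `s` and `cv` are illustrative. Because the slope here is positive, R also coincides with the Pearson correlation `cor(x, y)`.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)

s <- summary(lm(y ~ x))

s$r.squared        # R^2
s$adj.r.squared    # adjusted R^2
s$sigma            # residual standard error (Sxy in the walkthrough)
cor(x, y)          # equals sqrt(R^2) when the slope is positive

# Coefficient of variation in percent
cv <- s$sigma / mean(y) * 100
cv
```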

Finally, let us compute the t statistic to test the null hypothesis about the slope \[ Tcal=\frac{\beta _{1}}{\sqrt{\frac{CMErr}{\sum_{i}^{n}(x_{i}-\bar{x})^2}}} \] with the following hypotheses for b1:

\[ Ho: \beta_{1} =0\\ Ha: \beta_{1} \neq 0 \]

# Now let us test the hypothesis about B1
# The test statistic follows a Student's t distribution with n-2 degrees of freedom.
# Ho: B1 = 0, Ha: B1 different from zero

Tcal <- b1 / (sqrt(CMEer / sum(x_xmed2))); Tcal

## [1] 3.882701

# If the absolute value of Tcal is greater than the tabulated t value (n-2 degrees of freedom), Ho is rejected
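
Instead of looking up the t table, the critical value and the p-value can be computed in R. The sketch below is self-contained and assumes a two-sided test at a significance level of 0.05, which the original does not state. It reads the t statistic from the coefficient table of `summary(lm())`. The names `coefs`, `t_cal`, `t_crit` and `p_value` are illustrative.

```r
# Same data as in the walkthrough
x <- c(5, 7, 6, 35, 15, 20)
y <- c(9, 12, 10, 22, 20, 18)
n <- length(x)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients

t_cal   <- coefs["x", "t value"]      # t statistic for the slope
p_value <- coefs["x", "Pr(>|t|)"]     # two-sided p-value
t_crit  <- qt(0.975, df = n - 2)      # two-sided critical value, assumed alpha = 0.05

abs(t_cal) > t_crit                   # TRUE means Ho: B1 = 0 is rejected

# In simple regression the squared t statistic equals the ANOVA F statistic
all.equal(t_cal^2, anova(fit)[["F value"]][1])
```

With 4 degrees of freedom the two-sided critical value is about 2.776, so the calculated t of 3.882701 leads to rejecting Ho at that level.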

Step-by-step linear regression