Data

# Growth ~ tannin  
reg <- read.table("Crawley/regression.txt", header=TRUE)
names(reg)
[1] "growth" "tannin"
summary(reg)
     growth           tannin 
 Min.   : 2.000   Min.   :0  
 1st Qu.: 3.000   1st Qu.:2  
 Median : 7.000   Median :4  
 Mean   : 6.889   Mean   :4  
 3rd Qu.:10.000   3rd Qu.:6  
 Max.   :12.000   Max.   :8  
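If Crawley/regression.txt is not at hand, the nine observations can be typed in directly. This is a reconstruction, not the original file. The growth values are copied from the printed reg$growth vector. tannin = 0:8 is an assumption, inferred from the summary above (minimum 0, maximum 8, mean 4) and from the evenly spaced fitted values printed later.

```r
# Re-entered data (assumption: tannin takes the integer values 0, 1, ..., 8)
reg <- data.frame(
  growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
  tannin = 0:8
)
summary(reg)   # should reproduce the summary printed above
```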

Plots

In the figure above we see the differences ȳ - y between the mean of growth and each observation. Squared and summed, they give the total sum of squares (SST).
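The definition of SST translates directly into code. The following is a minimal sketch on the re-entered data (growth copied from the printed output, tannin assumed to be 0:8). It also checks the identity SST = (n - 1) * var(y), which is used later in the aov() section.

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)

y <- reg$growth
dev_total <- y - mean(y)     # vertical distance from each point to the horizontal line at the mean
SST <- sum(dev_total^2)      # total sum of squares
SST                          # 108.8889

# Same quantity via the sample variance: SST = (n - 1) * var(y)
all.equal(SST, (length(y) - 1) * var(y))
```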

The linear model

The lm() function

rl <- lm(growth ~ tannin , data=reg)
coef(rl)
(Intercept)      tannin 
  11.755556   -1.216667 
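The two coefficients returned by coef(rl) can be reproduced with the ordinary least-squares formulas: slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x). Here is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)

x <- reg$tannin
y <- reg$growth
Sxx <- sum((x - mean(x))^2)                 # 60
Sxy <- sum((x - mean(x)) * (y - mean(y)))   # -73
b <- Sxy / Sxx                              # slope: -1.216667
a <- mean(y) - b * mean(x)                  # intercept: 11.755556
c(a, b)

# Compare with lm()
rl <- lm(growth ~ tannin, data = reg)
all.equal(unname(coef(rl)), c(a, b))
```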

SSE (error [residual] sum of squares)

In the figure below, the green segments show the differences between the observed and fitted values, ŷ - y (not the same as ȳ - y).
The sum of the squares of these differences is the SSE.
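The same sum can be computed from the residuals. For an lm fit, deviance() returns the residual sum of squares, so it provides an independent check. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)

res <- reg$growth - fitted(rl)   # observed minus fitted (the green segments)
SSE <- sum(res^2)                # residual (error) sum of squares: 20.07222
SSE
all.equal(SSE, deviance(rl))     # deviance() of an lm fit is its residual sum of squares
```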

SSR = SST - SSE (SSE in green, SSR in black). SSR is the part of the variation in Y that is explained by the linear relationship with X.
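SSR can also be computed directly as the squared distances between the fitted line and the mean, sum((ŷ - ȳ)^2). Comparing it with SST - SSE confirms the decomposition SST = SSR + SSE. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)
y <- reg$growth

SST <- sum((y - mean(y))^2)
SSE <- sum(residuals(rl)^2)
SSR <- sum((fitted(rl) - mean(y))^2)   # variation of the fitted line around the mean
SSR                                    # 88.81667
all.equal(SSR, SST - SSE)              # SST = SSR + SSE
```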

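Regression assumptions

- Independence of the observations (this applies to any test)
- Linearity: the relationship between Y and X is assumed to be linear.
- Normality of the residuals
- Homogeneity of residual variance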

Linearity [Residuals vs. Fitted values] (graphical diagnostic)

rl$fitted.values
        1         2         3         4         5         6         7 
11.755556 10.538889  9.322222  8.105556  6.888889  5.672222  4.455556 
        8         9 
 3.238889  2.022222 
reg$growth
[1] 12 10  8 11  6  7  2  3  3
reg$growth - rl$fitted.values           # residuals computed "by hand"
         1          2          3          4          5          6          7 
 0.2444444 -0.5388889 -1.3222222  2.8944444 -0.8888889  1.3277778 -2.4555556 
         8          9 
-0.2388889  0.9777778 
rl$residuals
         1          2          3          4          5          6          7 
 0.2444444 -0.5388889 -1.3222222  2.8944444 -0.8888889  1.3277778 -2.4555556 
         8          9 
-0.2388889  0.9777778 
resid <- rl$residuals                   # residuals taken directly from the linear model

plot(resid ~ rl$fitted.values, pch=20, cex=2)
abline(h=0, col="gray", lty="dashed", lwd=2)

# built-in option in R: diagnostic plot 1 (Residuals vs Fitted)
plot(rl, which = 1, pch = 20, cex = 2)

Above: the plot labels the three most extreme observations in the data set, i.e. those with the largest absolute residuals.
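With its default settings, plot.lm() labels id.n = 3 points, chosen by largest absolute residual. The sketch below shows which observations those are, using the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)

r <- residuals(rl)
top3 <- order(abs(r), decreasing = TRUE)[1:3]   # indices of the three largest |residuals|
top3                                            # 4 7 6
r[top3]
```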

Normality

shapiro.test(reg$growth)          # normality of the response variable

    Shapiro-Wilk normality test

data:  reg$growth
W = 0.9291, p-value = 0.4728

hist(resid)

The most important assumption of linear regression: normality of the residuals (not of the response variable).

shapiro.test( rl$residuals)

    Shapiro-Wilk normality test

data:  rl$residuals
W = 0.98794, p-value = 0.9926
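The test result can also be used programmatically, and a normal Q-Q plot is a useful graphical complement. With only n = 9 observations the Shapiro-Wilk test has little power, so a large p-value is weak evidence of normality on its own. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)

sw <- shapiro.test(residuals(rl))
sw$statistic          # W = 0.98794
sw$p.value            # 0.9926
sw$p.value > 0.05     # TRUE: no evidence against normality at the 5% level

# Graphical check: normal Q-Q plot of the residuals
qqnorm(residuals(rl), pch = 20)
qqline(residuals(rl), col = "gray", lty = "dashed")
```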

Homogeneity of residual variance (graphical diagnostic)

plot(rl, 3)       # Scale-Location plot: we always look for the flattest (most horizontal) trend
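plot(rl, 3) is the Scale-Location plot. It shows sqrt(|standardized residuals|) against the fitted values. The sketch below computes those values with the usual formula r_i / (s * sqrt(1 - h_i)), where h_i are the leverages from hatvalues(), and checks the result against rstandard(). It uses the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)

h <- hatvalues(rl)                                     # leverages
std_res <- residuals(rl) / (sigma(rl) * sqrt(1 - h))   # standardized residuals
all.equal(std_res, rstandard(rl))                      # matches the built-in

sl <- sqrt(abs(std_res))                               # y-axis of the Scale-Location plot
plot(fitted(rl), sl, pch = 20, cex = 2,
     xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")
```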

The linear regression analysis

summary(rl)

Call:
lm(formula = growth ~ tannin, data = reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.4556 -0.8889 -0.2389  0.9778  2.8944 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.7556     1.0408  11.295 9.54e-06 ***
tannin       -1.2167     0.2186  -5.565 0.000846 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.693 on 7 degrees of freedom
Multiple R-squared:  0.8157,    Adjusted R-squared:  0.7893 
F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461
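The slope row of this table can be reproduced by hand. The residual standard error is s = sqrt(SSE / (n - 2)), and the slope's standard error is s / sqrt(Sxx). The t value is the estimate divided by that standard error. With a single predictor, the F-statistic equals t squared. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)
x <- reg$tannin
n <- nrow(reg)

s <- sqrt(sum(residuals(rl)^2) / (n - 2))                 # residual standard error: 1.693
se_b <- s / sqrt(sum((x - mean(x))^2))                    # std. error of the slope: 0.2186
t_b <- coef(rl)[["tannin"]] / se_b                        # t value: -5.565
p_b <- 2 * pt(abs(t_b), df = n - 2, lower.tail = FALSE)   # two-sided p-value

c(s = s, se = se_b, t = t_b, p = p_b)
t_b^2                                                     # equals the F-statistic (30.97)
```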

Approached as an “analysis of variance”: aov()

SSR = SST - SSE
SST <- var(reg$growth) * (nrow(reg) - 1); SST   # SST = (n - 1) * var(y)
[1] 108.8889
SSE <- sum( (reg$growth - 
               rl$fitted.values)^2 ) ; SSE      # SSE
[1] 20.07222
SSR <- SST - SSE ; SSR                          # SSR
[1] 88.81667
av <- aov(growth ~ tannin, data = reg)   # named "av" to avoid masking the aov() function
summary(av)
            Df Sum Sq Mean Sq F value   Pr(>F)    
tannin       1  88.82   88.82   30.97 0.000846 ***
Residuals    7  20.07    2.87                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
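The same table can be obtained from the lm fit with anova(), without refitting through aov(). Its Sum Sq column holds SSR and SSE, and F is the ratio of the two mean squares. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)

tab <- anova(rl)              # ANOVA table computed from the lm fit
tab
tab[["Sum Sq"]]               # SSR and SSE: 88.81667 20.07222
F <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]   # F = MSR / MSE
F                             # 30.97
```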

The coefficient of determination (R²) obtained from the aov() sums of squares

SSR/SST       # R-squared
[1] 0.8156633
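The adjusted R² reported by summary(rl) uses the same sums of squares, each divided by its degrees of freedom: 1 - (SSE / (n - 2)) / (SST / (n - 1)). In simple regression, R² also equals the squared correlation between X and Y. This is a sketch on the re-entered data (tannin assumed to be 0:8):

```r
reg <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3), tannin = 0:8)  # re-entered data (tannin assumed 0:8)
rl <- lm(growth ~ tannin, data = reg)
y <- reg$growth
n <- length(y)
p <- 1                                                  # one predictor

SST <- sum((y - mean(y))^2)
SSE <- sum(residuals(rl)^2)
SSR <- SST - SSE

R2 <- SSR / SST                                         # 0.8156633
R2_adj <- 1 - (SSE / (n - p - 1)) / (SST / (n - 1))     # 0.7893
c(R2 = R2, adj = R2_adj)
all.equal(c(R2, R2_adj), c(summary(rl)$r.squared, summary(rl)$adj.r.squared))
all.equal(R2, cor(reg$tannin, y)^2)                     # R² = r² for a single predictor
```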