First, the data are plotted:
data("Orange") # load the orange trees dataset
plot(circumference~age,data=Orange)
Next, the lm function is used to fit a linear model:
lm.fit=lm(circumference~age,data=Orange)
lm.fit
##
## Call:
## lm(formula = circumference ~ age, data = Orange)
##
## Coefficients:
## (Intercept) age
## 17.3997 0.1068
summary(lm.fit)
##
## Call:
## lm(formula = circumference ~ age, data = Orange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.310 -14.946 -0.076 19.697 45.111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.399650 8.622660 2.018 0.0518 .
## age 0.106770 0.008277 12.900 1.93e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.74 on 33 degrees of freedom
## Multiple R-squared: 0.8345, Adjusted R-squared: 0.8295
## F-statistic: 166.4 on 1 and 33 DF, p-value: 1.931e-14
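The values printed by summary() can also be read directly from the fitted object, which avoids copying rounded numbers by hand. The following is a minimal sketch; the names `fit`, `r2` and `ci` are illustrative and not part of the original analysis, and the model is refit on a fresh copy of the dataset so the block is self-contained.

```r
# Refit the same model on an untouched copy of the dataset
# (`fit` is a hypothetical name used only in this sketch)
fit <- lm(circumference ~ age, data = datasets::Orange)

# Named vector with the estimated intercept and slope
coef(fit)

# Multiple R-squared, as reported by summary()
r2 <- summary(fit)$r.squared
r2

# 95% confidence intervals for both coefficients
ci <- confint(fit, level = 0.95)
ci
```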
plot(circumference~age,data=Orange)
# abline() adds a straight line to an existing plot. It takes the intercept (a) and the slope (b), in that order, and can also take an lm object directly
abline(lm.fit,col='orange')
To predict the circumference for 5 new records, the predict.lm function is used:
ag<-c(300,750,1250,1400,1650)
predict.lm(lm.fit,data.frame(age=ag))
## 1 2 3 4 5
## 49.43075 97.47739 150.86256 166.87811 193.57069
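Point predictions alone do not show how uncertain each value is. As a sketch, the generic predict() can be asked for prediction intervals through its `interval` argument; the names `fit`, `new_ages` and `pred` are illustrative, and the 95% level is an assumption rather than something the original analysis specifies.

```r
# Same model as above, refit on an untouched copy of the dataset
fit <- lm(circumference ~ age, data = datasets::Orange)

# The five new ages, wrapped in a data frame whose column name matches the predictor
new_ages <- data.frame(age = c(300, 750, 1250, 1400, 1650))

# Matrix with the point prediction (fit) and the interval bounds (lwr, upr)
pred <- predict(fit, newdata = new_ages, interval = "prediction", level = 0.95)
pred
```

The `fit` column reproduces the point predictions shown above. The last age, 1650, lies beyond the largest age in the dataset, so that prediction is an extrapolation.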
# Append the 5 predicted records to the original data and refit the model
circ = c(49.43075, 97.47739, 150.86256, 166.87811, 193.57069)
nAge = c(Orange$age, ag)
nCircumf = c(Orange$circumference, circ)
plot(nCircumf~nAge)
lm.fit=lm(nCircumf~nAge)
lm.fit
##
## Call:
## lm(formula = nCircumf ~ nAge)
##
## Coefficients:
## (Intercept) nAge
## 17.3996 0.1068
summary(lm.fit)
##
## Call:
## lm(formula = nCircumf ~ nAge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.31 -11.63 0.00 11.76 45.11
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.399650 7.604912 2.288 0.0278 *
## nAge 0.106770 0.007179 14.872 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.12 on 38 degrees of freedom
## Multiple R-squared: 0.8534, Adjusted R-squared: 0.8495
## F-statistic: 221.2 on 1 and 38 DF, p-value: < 2.2e-16
plot(nCircumf~nAge)
abline(lm.fit,col='orange')
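Neither the intercept nor the slope changes. The five added points lie exactly on the original fitted line, so their residuals are zero and the least-squares estimates stay at 17.3997 and 0.1068. The standard errors do shrink (from 0.008277 to 0.007179 for the slope) and the multiple R-squared rises from 0.8345 to 0.8534, because the new points add observations without adding residual error.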
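This comparison can be checked numerically instead of by reading the printed output. The sketch below rebuilds both fits from an untouched copy of the dataset; all object names (`orange`, `fit_orig`, `augmented`, `fit_aug`, `se_orig`, `se_aug`) are illustrative.

```r
orange <- datasets::Orange
fit_orig <- lm(circumference ~ age, data = orange)

# Predict the five new records with the original model
new_ages <- c(300, 750, 1250, 1400, 1650)
new_circ <- unname(predict(fit_orig, newdata = data.frame(age = new_ages)))

# Append the predicted records to the original observations and refit
augmented <- data.frame(
  age = c(orange$age, new_ages),
  circumference = c(orange$circumference, new_circ)
)
fit_aug <- lm(circumference ~ age, data = augmented)

# TRUE when both fits give the same coefficients up to numerical tolerance
all.equal(coef(fit_orig), coef(fit_aug))

# Standard errors before and after adding the predicted points
se_orig <- summary(fit_orig)$coefficients[, "Std. Error"]
se_aug <- summary(fit_aug)$coefficients[, "Std. Error"]
rbind(original = se_orig, augmented = se_aug)
```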
# Multiply the circumference and age of observation 15 by 100 to create an extreme point
Orange$circumference[15] = Orange$circumference[15] * 100
Orange$age[15] = Orange$age[15] * 100
plot(circumference~age,data=Orange)
lm.fit=lm(circumference~age,data=Orange)
lm.fit
##
## Call:
## lm(formula = circumference ~ age, data = Orange)
##
## Coefficients:
## (Intercept) age
## -120.3597 0.2556
summary(lm.fit)
##
## Call:
## lm(formula = circumference ~ age, data = Orange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -144.08 -63.23 -11.31 56.62 123.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.204e+02 1.575e+01 -7.642 8.5e-09 ***
## age 2.556e-01 6.999e-03 36.527 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 77.31 on 33 degrees of freedom
## Multiple R-squared: 0.9759, Adjusted R-squared: 0.9751
## F-statistic: 1334 on 1 and 33 DF, p-value: < 2.2e-16
plot(circumference~age,data=Orange)
abline(lm.fit,col='orange')
In this case both coefficients change. The intercept shows the larger change, going from 17.3997 to -120.3597, while the slope goes from 0.1068 to 0.2556.
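One way to see why a single modified record moves the line this much is to compare the two fits side by side and look at the leverage of each observation. This is only a sketch: the object names are illustrative, and the use of hatvalues() as a diagnostic is an addition, not part of the original analysis.

```r
# Fit on the untouched dataset
orange_clean <- datasets::Orange
fit_clean <- lm(circumference ~ age, data = orange_clean)

# Copy of the data with observation 15 scaled by 100, as in the step above
orange_out <- orange_clean
orange_out$circumference[15] <- orange_out$circumference[15] * 100
orange_out$age[15] <- orange_out$age[15] * 100
fit_out <- lm(circumference ~ age, data = orange_out)

# Coefficients of both fits side by side
cbind(original = coef(fit_clean), modified = coef(fit_out))

# Leverage of each observation in the modified fit; values close to 1
# mean the fitted line is pulled almost entirely toward that point
lev <- hatvalues(fit_out)
which.max(lev)
max(lev)
```

Because the age of the modified record is far from all the others, it should show up with the highest leverage, which is consistent with the fitted line shifting toward it.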