Fit Regression models
## Null model: no relationship between how black the nose is and true age
lion.null <- lm(age.years ~ 1,  # What does "~ 1" tell R?
                data = lions)

## Alt model: age can be predicted by the portion of the nose that is black
lion.alt <- lm(age.years ~ portion.black,
               data = lions)
Can you draw the hypotheses?
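The long output below is consistent with calling summary() on the alternative model (a sketch; the original call isn't shown):

summary(lion.alt)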
##
## Call:
## lm(formula = age.years ~ portion.black, data = lions)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5457 -1.1457 -0.3384 0.9245 4.3426
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9262 0.5591 1.657 0.108
## portion.black 10.5827 1.4884 7.110 6.59e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.66 on 30 degrees of freedom
## Multiple R-squared: 0.6276, Adjusted R-squared: 0.6152
## F-statistic: 50.55 on 1 and 30 DF, p-value: 6.59e-08
The summary() call gives a lot of somewhat cluttered output.
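One way to pull out just the coefficient table (a sketch; this exact call is assumed, not shown in the original):

round(coef(summary(lion.alt)), 3)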
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.926 0.559 1.657 0.108
## portion.black 10.583 1.488 7.110 0.000
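The intercept and slope alone can be extracted with coef(); the vector below is consistent with this (assumed) call:

coef(lion.alt)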
## (Intercept) portion.black
## 0.9262292 10.5826652
Say we take a picture of a lion and determine that the portion of its nose that is black is 0.65 (65%).
What is the predicted age of the lion?
## (Intercept) portion.black
## 0.9262292 10.5826652
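Using the fitted coefficients, the prediction is just intercept + slope * x. A quick check by hand:

# predicted age (years) at portion.black = 0.65
0.9262292 + 10.5826652 * 0.65  # about 7.80 years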
Points along the regression line can be thought of as predictions: for each value of x, the line gives the age we would predict for a new lion with that value of x.
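For example, here is a sketch of predictions along the line for a grid of hypothetical nose portions (the grid values are made up for illustration):

new.noses <- data.frame(portion.black = seq(0.2, 0.8, by = 0.1))
predict(lion.alt, newdata = new.noses)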
easy.predict <- function(model, datum){
  # Name of the predictor variable (second coefficient, after the intercept)
  x.var.name <- names(model$coefficients)[[2]]
  # Build a one-row data frame whose column name matches the predictor
  new.df <- data.frame(datum)
  names(new.df) <- x.var.name
  # Return the single predicted value
  predict(model, newdata = new.df)[[1]]
}
easy.predict(lion.alt, datum = 0.65)
## [1] 7.804962
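The same prediction can be made directly with predict(), as long as the new data frame's column is named after the predictor (a sketch):

predict(lion.alt, newdata = data.frame(portion.black = 0.65))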
Do we violate the assumptions of the regression model?
Shorthand versions (You MUST know these from memory):
* 1) Random sampling
* 2) Variability around the regression line is approximately normal
* 3) Variance in y does not change as x changes
You get the residuals in R with the resid() command:

# Get the residuals
model.residuals <- resid(lion.alt)
# Look at the first 12
model.residuals[1:12]
## 1 2 3 4 5 6
## -2.04858891 -0.90780235 -0.19032239 -0.10197570 0.40385095 0.89802430
## 7 8 9 10 11 12
## 1.00385095 0.06889104 -0.96024222 -1.15441556 -1.14276226 -0.08449574
Note that they can be both positive AND negative. Why is that?
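Residuals are observed minus fitted values, so lions above the line get positive residuals and lions below it get negative ones; with an intercept in the model they sum to essentially zero. A quick check (a sketch, assuming the lions data frame used above has no missing values):

# observed - fitted reproduces resid() (names dropped for the comparison)
all.equal(as.numeric(model.residuals),
          as.numeric(lions$age.years - fitted(lion.alt)))
# residuals from a model with an intercept sum to ~0
round(sum(model.residuals), 10)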
The normality assumption is frequently violated, especially with regression…
Use the hist() command on the residuals to check this:
hist(model.residuals)
“qq” = “quantile-quantile”: a QQ plot compares the distribution of the residuals to a normal distribution.
plot(lion.alt, which = 2)  # normal QQ plot of the residuals
plot(lion.alt, which = 1)  # residuals vs. fitted values
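To see R's standard diagnostic panels together in one figure (a sketch):

par(mfrow = c(2, 2))  # 2 x 2 grid of plots
plot(lion.alt)
par(mfrow = c(1, 1))  # reset the plotting layout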