Updates to this file will be announced via email and on teh course Facebook page https://www.facebook.com/groups/930301587096169/
Example analysis of the lion data using a two-sample t-test.
#The following sets up the data fzo the analysis
#Set working directory
setwd("C:/Users/lisanjie2/Desktop/TEACHING/1_STATS_CalU/1_STAT_CalU_2016_by_NLB/Lecture/Unit3_regression/last_week")
Load data
dat <- read.csv("lion_age_by_pop_and_sex.csv")
par(mfrow = c(1,1))
plot(age.years ~ portion.black, data = dat,
main = "Lion pigment data split into 3 groups",
ylim = c(0,19))
Make two groups, low pigmentation and high pigmentation, by converting the numeric pigmentation variable.
par(mfrow = c(1,1))
pigment.mean <- mean(dat$portion.black)
plot(age.years ~ portion.black, data = dat,
main = "Lion pigment data split into 2 groups",
ylim = c(0,19))
abline(v = pigment.mean, col = 2, lwd = 3)
text(x = 0.125, y = 17, "Low\npigment", cex =1.1)
text(x = 0.895, y = 17, "High\npigment", cex = 1.1)
pigment.mean <- mean(dat$portion.black)
dat$pigment.groups.2 <- "high.pigment"
dat$pigment.groups.2[which(dat$portion.black <= pigment.mean)] <- "low.pigment"
dat$pigment.groups.2 <- factor(dat$pigment.groups.2)
summary(dat)
## age.years portion.black population sex
## Min. : 1.056 Min. :0.04609 Ngorogoro:10 female:61
## 1st Qu.: 2.800 1st Qu.:0.21000 Serengeti:83 male :32
## Median : 5.378 Median :0.43000
## Mean : 5.589 Mean :0.46446
## 3rd Qu.: 7.800 3rd Qu.:0.72000
## Max. :15.178 Max. :1.00000
## pigment.groups.2
## high.pigment:43
## low.pigment :50
##
##
##
##
dat$pigment.groups.2 <- factor(dat$pigment.groups.2,
levels =c("low.pigment",
"high.pigment"))
table(dat$pigment.groups.2, dat$population)
##
## Ngorogoro Serengeti
## low.pigment 6 44
## high.pigment 4 39
table(dat$pigment.groups.2, dat$sex)
##
## female male
## low.pigment 25 25
## high.pigment 36 7
NOTE: Splitting up a continous variable like this into a categorical variable and doing an ANOVA is generally a BAD idea. Regression is the proper way to analyze the raw data. I am making these groups for illustration purposes only!
par(mfrow = c(1,1))
plot(age.years ~ pigment.groups.2,data = dat,
xlab = "Pigmentation group",
ylab = "Known Lion Age")
Figure 1a: Boxplots of lion ages in two pigmentation groups from the Serengeti and Ngorogoro Crater, Tanzania, east Africa.
library(sciplot)
ci.fun. <- function(x) c(mean(x)-2*se(x), mean(x)+3*se(x))
lineplot.CI(x.factor = pigment.groups.2,
response = age.years,
data = dat,
ci.fun = ci.fun.
)
Figure 2a: Mean age of lions in two pigmentation groups from the Serengeti and Ngorogoro crater, Tanzania, east Africa. Error bars are approximate 95% confidence intervals.
Test whether the two pigmentation groups have sig. different mean ages.
t.test(age.years ~ pigment.groups.2, data = dat)
##
## Welch Two Sample t-test
##
## data: age.years by pigment.groups.2
## t = -9.6798, df = 69.782, p-value = 1.567e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.808789 -3.823945
## sample estimates:
## mean in group low.pigment mean in group high.pigment
## 3.362444 8.178811
You can set up basically the same test/model using the function lm() (though note that Welch’s correction for unequal variances won’t be applied). Doing this is a useful way to look at the residuals of the t-test.
lm.t.test <- lm(age.years ~ pigment.groups.2, data = dat)
Get the residuals
resids <- resid(lm.t.test)
Plot diagnostic plots
par(mfrow = c(2,2))
hist(resids)
plot(lm.t.test, which = 2)
plot(lm.t.test, which = 1)
plot(lm.t.test, which = 5)
Check if log transformation improves thigns
lm.t.test.log <- lm(log(age.years) ~ pigment.groups.2, data = dat)
Get the residuals
resids.log <- resid(lm.t.test.log)
Plot diagnostic plots
par(mfrow = c(2,2))
hist(resids.log)
plot(lm.t.test.log, which = 2)
plot(lm.t.test.log, which = 1)
plot(lm.t.test.log, which = 5)