Example analysis of the lion data using a two-sample t-test

```
#The following sets up the data fro the analysis
#Set working directory
setwd("C:/Users/lisanjie2/Desktop/TEACHING/1_STATS_CalU/1_STAT_CalU_2016_by_NLB/Lecture/Unit3_regression/last_week")
```

Load data

`dat <- read.csv("lion_age_by_pop_and_sex.csv")`

```
par(mfrow = c(1,1))
plot(age.years ~ portion.black, data = dat,
main = "Lion pigment data split into 3 groups",
ylim = c(0,19))
```

Make two groups, low pigmentation and high pigmentation, by converting the numeric pigmentation variable.

```
par(mfrow = c(1,1))
pigment.mean <- mean(dat$portion.black)
plot(age.years ~ portion.black, data = dat,
main = "Lion pigment data split into 2 groups",
ylim = c(0,19))
abline(v = pigment.mean, col = 2, lwd = 3)
text(x = 0.125, y = 17, "Low\npigment", cex =1.1)
text(x = 0.895, y = 17, "High\npigment", cex = 1.1)
```

```
pigment.mean <- mean(dat$portion.black)
dat$pigment.groups.2 <- "high.pigment"
dat$pigment.groups.2[which(dat$portion.black <= pigment.mean)] <- "low.pigment"
dat$pigment.groups.2 <- factor(dat$pigment.groups.2)
summary(dat)
```

```
## age.years portion.black population sex
## Min. : 1.056 Min. :0.04609 Ngorogoro:10 female:61
## 1st Qu.: 2.800 1st Qu.:0.21000 Serengeti:83 male :32
## Median : 5.378 Median :0.43000
## Mean : 5.589 Mean :0.46446
## 3rd Qu.: 7.800 3rd Qu.:0.72000
## Max. :15.178 Max. :1.00000
## pigment.groups.2
## high.pigment:43
## low.pigment :50
##
##
##
##
```

```
dat$pigment.groups.2 <- factor(dat$pigment.groups.2,
levels =c("low.pigment",
"high.pigment"))
table(dat$pigment.groups.2, dat$population)
```

```
##
## Ngorogoro Serengeti
## low.pigment 6 44
## high.pigment 4 39
```

`table(dat$pigment.groups.2, dat$sex)`

```
##
## female male
## low.pigment 25 25
## high.pigment 36 7
```

**NOTE:** Splitting up a continous variable like this into a categorical variable and doing an ANOVA is **generally a BAD idea**. Regression is the proper way to analyze the raw data. I am making these groups for illustration purposes only!

```
par(mfrow = c(1,1))
plot(age.years ~ pigment.groups.2,data = dat,
xlab = "Pigmentation group",
ylab = "Known Lion Age")
```

**Figure 1a:** Boxplots of lion ages in two pigmentation groups from the Serengeti and Ngorogoro Crater, Tanzania, east Africa.

```
library(sciplot)
ci.fun. <- function(x) c(mean(x)-2*se(x), mean(x)+3*se(x))
lineplot.CI(x.factor = pigment.groups.2,
response = age.years,
data = dat,
ci.fun = ci.fun.
)
```

**Figure 2a:** Mean age of lions in two pigmentation groups from the Serengeti and Ngorogoro crater, Tanzania, east Africa. Error bars are approximate 95% confidence intervals.

Test whether the two pigmentation groups have sig. different mean ages.

`t.test(age.years ~ pigment.groups.2, data = dat)`

```
##
## Welch Two Sample t-test
##
## data: age.years by pigment.groups.2
## t = -9.6798, df = 69.782, p-value = 1.567e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.808789 -3.823945
## sample estimates:
## mean in group low.pigment mean in group high.pigment
## 3.362444 8.178811
```

You can set up basically the same test/model using the function lm() (though note that Welch’s correction for unequal variances won’t be applied). Doing this is a useful way to look at the residuals of the t-test.

`lm.t.test <- lm(age.years ~ pigment.groups.2, data = dat)`

Get the residuals

`resids <- resid(lm.t.test)`

Plot diagnostic plots

```
par(mfrow = c(2,2))
hist(resids)
plot(lm.t.test, which = 2)
plot(lm.t.test, which = 1)
plot(lm.t.test, which = 5)
```

- Normal QQ plot is a little off
- Residual’s vs. fitted indicates some difference in variance between the groups

Check if log transformation improves thigns

`lm.t.test.log <- lm(log(age.years) ~ pigment.groups.2, data = dat)`

Get the residuals

`resids.log <- resid(lm.t.test.log)`

Plot diagnostic plots

```
par(mfrow = c(2,2))
hist(resids.log)
plot(lm.t.test.log, which = 2)
plot(lm.t.test.log, which = 1)
plot(lm.t.test.log, which = 5)
```

- QQ is better
- Residvuals vs. fit is perhaps a little better.