Updates to this file will be announced via email and on the course Facebook page: https://www.facebook.com/groups/930301587096169/

1 Lion data set as a Paired t-test

To illustrate how to do a full analysis of data with a paired t-test, I take the lion data we’ve been using, make some changes, simulate some new observations, and turn it into a “repeated measures” study where the same lion has its picture taken twice. These data can then be analyzed with a paired t-test, where the pairs are observation 1 and observation 2 on each lion.

1.1 Preliminaries

#The following sets up the data for the analysis
#Set working directory

setwd("C:/Users/lisanjie2/Desktop/TEACHING/1_STATS_CalU/1_STAT_CalU_2016_by_NLB/Lecture/Unit3_regression/last_week")

Load original lion data

dat <- read.csv("lion_age_by_pop_and_sex.csv")

1.2 Plot Lion data

par(mfrow = c(1,1))
plot(age.years ~ portion.black, data = dat,
     main = "Lion pigment data split into 3 groups",
     ylim = c(0,19))


The original study took a single data point from each lion. I’m going to turn this into a repeated measures data set.


I am going to make up a hypothetical data set where I assume that the lions were all photographed and their pigment quantified in one year, and then 5 years later they were located and their pigment was quantified again. The hypothesis is that pigment changes over time. The null hypothesis is that there is no change in mean pigment over time.


2 Make hypothetical repeated measures data

Run this code to make the data.

NOTE: I ignore the fact that these data are bounded by 1 (100% pigmented). A logit transformation would ideally be applied.
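To make the logit idea concrete, here is a minimal sketch using base R's qlogis() (the logit) and plogis() (its inverse). The proportions and the shift of 0.5 are made-up numbers, not the lion data; the point is only that changes applied on the logit scale can never push a proportion past 1.

```r
# Hypothetical proportions (NOT the lion data), all strictly between 0 and 1
p <- c(0.10, 0.35, 0.50, 0.80, 0.95)

# qlogis() is the logit transformation: log(p / (1 - p))
logit.p <- qlogis(p)

# Add a made-up change on the logit scale, then back-transform with plogis()
p.later <- plogis(logit.p + 0.5)

# The back-transformed values always stay inside (0, 1)
range(p.later)
```

With this approach the cap at 0.99999 used in the code that follows would not be needed, though values of exactly 0 or 1 would have to be handled first because their logit is infinite.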

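# Regress pigmentation on age to estimate how much pigment changes per year
mod <- lm(portion.black ~ age.years, data = dat)
n <- nrow(dat)

# Standard error of the slope (hard-coded), converted back to a standard deviation
slope.se <- 0.00438
slope.sd <- slope.se*sqrt(n)

# Give each lion its own yearly rate of change: the fitted slope plus random noise
slope <- coef(mod)[2]
slope.noise <- slope + rnorm(n,mean = 0, sd = slope.sd)

# Simulated pigmentation 5 years later
dat$portion.black.5yrs <- dat$portion.black + 5*slope.noise

# Cap values above 1, since a proportion cannot exceed 1
dat$portion.black.5yrs  <- ifelse(dat$portion.black.5yrs > 1, 0.99999, dat$portion.black.5yrs )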

plot(portion.black.5yrs ~ portion.black, data = dat, xlab = "original black pigment",
     ylab = "black pigment in 5 years")
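Because rnorm() draws fresh random noise every time, the simulated 5-year values (and every number computed from them) will change from run to run. If you want repeatable results, one option is to call set.seed() before simulating. The sketch below shows the idea on its own; the seed value 42 is an arbitrary choice of mine.

```r
# Setting the same seed before each draw gives the same "random" numbers
set.seed(42)
first.draw <- rnorm(5, mean = 0, sd = 1)

set.seed(42)
second.draw <- rnorm(5, mean = 0, sd = 1)

# Check that the two draws match exactly
identical(first.draw, second.draw)
```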

3 Plot means of each time point

NOTE: This is a general summary of the data, but plotting the data this way (especially with confidence intervals!) can give a very inaccurate impression of what is going on because it ignores the paired nature of the data. Below we’ll plot the DIFFERENCES between the time points, which is what we are really interested in.
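To see why ignoring the pairing matters, here is a small sketch with made-up data (not the lion data; the sample size, ranges, and seed are all my assumptions). It compares the standard error you get when the two time points are wrongly treated as independent samples with the standard error of the within-animal differences.

```r
set.seed(10)  # arbitrary seed so the sketch is repeatable

# Made-up paired data (NOT the lion data): 50 animals measured twice
first  <- runif(50, min = 0.1, max = 0.6)
second <- first + rnorm(50, mean = 0.1, sd = 0.03)

# SE of the difference in means if the samples are (wrongly) treated as independent
se.unpaired <- sqrt(var(first)/50 + var(second)/50)

# SE of the mean within-animal difference (what a paired analysis uses)
diffs <- second - first
se.paired <- sd(diffs)/sqrt(50)

c(unpaired = se.unpaired, paired = se.paired)
```

In this sketch animals differ a lot from each other but change by similar amounts, so the paired standard error comes out much smaller. Error bars drawn around the two separate means hide that.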

3.1 Calculate means, sd, and se

mean.portion.black <- mean(dat$portion.black)
mean.portion.black.5yrs <- mean(dat$portion.black.5yrs)


sd.portion.black <- sd(dat$portion.black)
sd.portion.black.5yrs <- sd(dat$portion.black.5yrs)

n.portion.black <- length(dat$portion.black)
n.portion.black.5yrs <- length(dat$portion.black.5yrs)

se.portion.black <- sd.portion.black/sqrt(n.portion.black)
se.portion.black.5yrs <- sd.portion.black.5yrs/sqrt(n.portion.black.5yrs)



Bundle the means and SEs into “vectors”

my.means <- c(mean.portion.black,mean.portion.black.5yrs)
my.se <- c(se.portion.black,se.portion.black.5yrs)

3.2 Plot means w/ error bars

NOTE: As noted above, this plot is just for looking at the data overall and is potentially VERY misleading (especially the error bars). See the section below for plotting the DIFFERENCES between time points.

plot.means(means = my.means,
           SEs = my.se,
           categories = c("Measurement year 1","Measurement year 5"),
           y.axis.label = "Portion black")

Figure 2a: Mean pigmentation of lions during the first measurement period and 5 years later. Lions are from the Serengeti and Ngorongoro Crater populations, Tanzania, East Africa. Error bars are approximate 95% confidence intervals that assume independence of observations. Note that the same lions occur in both samples.
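Note that plot.means() is not part of base R and is not defined in this file, so it has to be loaded from elsewhere before the call above will run. If you don't have it, the function below is a rough base-R stand-in. It is my own sketch, not the course function: the name plot.means.sketch is hypothetical, and the 1.96 multiplier is an assumption based on the caption's "approximate 95% confidence intervals". The example numbers are made up.

```r
# Hypothetical stand-in for plot.means(): plots means with approximate 95% CIs
plot.means.sketch <- function(means, SEs, categories, y.axis.label = "") {
  x <- seq_along(means)
  lower <- means - 1.96 * SEs   # assumed multiplier for an approximate 95% CI
  upper <- means + 1.96 * SEs

  # Plot the means as points, leaving room for the error bars
  plot(x, means, pch = 16,
       xlim = c(0.5, length(means) + 0.5),
       ylim = range(c(lower, upper)),
       xaxt = "n", xlab = "", ylab = y.axis.label)
  axis(1, at = x, labels = categories)

  # Draw the error bars as double-headed arrows with flat ends
  arrows(x0 = x, y0 = lower, x1 = x, y1 = upper,
         angle = 90, code = 3, length = 0.1)

  # Return the interval limits invisibly so they can be inspected
  invisible(data.frame(lower = lower, upper = upper))
}

# Example call with made-up means and SEs
ci <- plot.means.sketch(means = c(0.30, 0.60), SEs = c(0.02, 0.03),
                        categories = c("Time 1", "Time 2"),
                        y.axis.label = "Portion black")
```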

3.3 Plot differences between time points

3.3.1 Calculate differences

dat$pigment.diffs <- dat$portion.black.5yrs - dat$portion.black

3.3.2 Plot raw differences

boxplot(dat$pigment.diffs)

3.3.3 Summary stats of differences

The difference between paired measurements is the focus of this study and what we really need to look at.

mean.diff <- mean(dat$pigment.diffs)
sd.diff <- sd(dat$pigment.diffs)
n.diff <- length(dat$pigment.diffs)
se.diff <- sd.diff/sqrt(n.diff)

plot.means(means = mean.diff, SEs = se.diff,categories = "",y.axis.label = "5 year diff. in pigmentation")

abline(h = 0, col = 2)
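The mean difference and its SE are all you need to build a 95% confidence interval by hand. The sketch below uses made-up differences (not the lion data; the seed, sample size, mean, and SD are my assumptions) and the t distribution with n - 1 degrees of freedom.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Made-up paired differences (NOT the lion data)
diffs <- rnorm(30, mean = 0.3, sd = 0.2)

mean.diff <- mean(diffs)
n.diff    <- length(diffs)
se.diff   <- sd(diffs)/sqrt(n.diff)

# Critical t value for a two-sided 95% interval with n - 1 degrees of freedom
t.crit <- qt(0.975, df = n.diff - 1)

# 95% confidence interval: mean difference plus or minus t.crit * SE
ci.diff <- mean.diff + c(-1, 1) * t.crit * se.diff
ci.diff
```

If the interval does not include 0 (the red line in the plot), that points to a real change between time points.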

3.4 Paired t-test

t.test(dat$portion.black.5yrs,dat$portion.black, paired = TRUE)
## 
##  Paired t-test
## 
## data:  dat$portion.black.5yrs and dat$portion.black
## t = 16.172, df = 92, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2749990 0.3520023
## sample estimates:
## mean of the differences 
##               0.3135006
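A paired t-test is the same thing as a one-sample t-test on the differences, with t equal to the mean difference divided by its standard error. Here is a quick check of that using made-up data (not the lion data; the seed and all simulation settings are my assumptions).

```r
set.seed(2)  # arbitrary seed so the sketch is repeatable

# Made-up paired measurements (NOT the lion data)
t1 <- runif(20, min = 0.1, max = 0.6)
t2 <- t1 + rnorm(20, mean = 0.1, sd = 0.05)
d  <- t2 - t1

# Three routes to the same t statistic
paired.test     <- t.test(t2, t1, paired = TRUE)   # paired t-test
one.sample.test <- t.test(d, mu = 0)               # one-sample test on differences
t.by.hand       <- mean(d) / (sd(d)/sqrt(length(d)))

c(paired     = unname(paired.test$statistic),
  one.sample = unname(one.sample.test$statistic),
  by.hand    = t.by.hand)
```

All three values should agree, which is why the summary statistics of the differences are the right thing to plot and report.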

3.5 Write up results

“The portion of individual lion snouts that was pigmented black increased significantly over five years (mean difference = 0.31, t = 16.17, df = 92, p < 0.00001).”

3.6 Paired t-test diagnostics

Look at whether the differences are approximately normally distributed

hist(dat$pigment.diffs)
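A histogram can be hard to judge by eye, so a normal Q-Q plot is a common companion. The sketch below uses made-up differences (not the lion data; the seed, mean, and SD are my assumptions, and 93 observations is chosen only to match df = 92 above). shapiro.test() is included as an optional formal check; how much weight to give it is a judgment call.

```r
set.seed(3)  # arbitrary seed so the sketch is repeatable

# Made-up paired differences (NOT the lion data)
diffs <- rnorm(93, mean = 0.3, sd = 0.18)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(diffs)
qqline(diffs, col = 2)

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(diffs)
sw$p.value
```

With the real data you would replace diffs with dat$pigment.diffs.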