2-sample T-test example

Whitlock & Schluter, Analysis of Biological Data, 2nd Ed.

Chapter 12

Question 18


These data are the eye spans (the distance between the eyes, in mm) of male stalk-eyed flies fed one of two diets: corn or cotton.

Outline

  • Enter the data
  • Re-create the summary table included in the book
  • Re-create the histograms
  • t-test on data entered as separate vectors
  • t-test on data entered as a dataframe
  • Variations on t-tests on data entered as a dataframe



Functions used

  • summary()
  • mean()
  • var()
  • length()
  • rbind()
  • colnames()
  • round()
  • par()
  • hist()
  • t.test()
  • data.frame()
  • rep()
  • head()
  • tail()

The data, entered as vectors

Corn data

corn <- c(2.15,2.14,2.13,2.13,2.12,2.11,2.10,2.08
,2.08,2.08,2.04,2.05,2.03,2.02,2.01,2.00,1.99,1.96
,1.95,1.93,1.89)

Check the data I’ve entered against the information in the book

summary(corn)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.890   2.000   2.050   2.047   2.110   2.150
var(corn)
## [1] 0.005581429
length(corn)
## [1] 21

Cotton data

cotton <- c(2.12,2.07,2.01,1.93,1.77,1.68,1.64,1.61
,1.59,1.58,1.59,1.55,1.54,1.49,1.45,1.43,1.39,1.34
,1.33,1.29,1.26,1.24,1.11,1.05)

Check the data I’ve entered against the information in the book

summary(cotton)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.050   1.338   1.545   1.544   1.650   2.120
var(cotton)
## [1] 0.08126884
length(cotton)
## [1] 24

[Advanced] Re-create summary table

I’ll use the mean(), var(), and length() functions to extract the information needed for the summary table.

First, I’ll make a separate vector of summary statistics for each diet

corn.summary <- c(mean(corn),var(corn),length(corn))
cotton.summary <- c(mean(cotton),var(cotton),length(cotton))

Now I’ll combine them as the rows of a matrix using the rbind() function, for “row bind”

summaries <- rbind(corn.summary, cotton.summary)

Now I’ll add column labels using the colnames() function

colnames(summaries) <- c("Mean","Var", "N")
summaries
##                    Mean         Var  N
## corn.summary   2.047143 0.005581429 21
## cotton.summary 1.544167 0.081268841 24

The number of digits is excessive, so I’ll round to 3 decimal places

summaries <-  round(summaries,3)
summaries
##                 Mean   Var  N
## corn.summary   2.047 0.006 21
## cotton.summary 1.544 0.081 24
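The same table can also be built in one step with sapply(), which applies a function to every element of a list. This is a sketch of an alternative, not the approach used above; the helper summ() and the object summaries2 are names made up for this example.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)

# Hypothetical helper: a named vector of the three summary statistics
summ <- function(x) c(Mean = mean(x), Var = var(x), N = length(x))

# sapply() returns one column per diet; t() flips it so each diet is a row
summaries2 <- round(t(sapply(list(corn = corn, cotton = cotton), summ)), 3)
summaries2
```

Because the list elements are named, the row labels (corn, cotton) and column labels (Mean, Var, N) come along automatically, so no separate colnames() step is needed.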

[Advanced] Re-create histograms

# Stack two plots vertically (2 rows, 1 column)
par(mfrow = c(2,1))

# Adjust margins (bottom, left, top, right) for the upper plot
par(mar = c(2,4,2.5,1))
# Find min and max values for setting the x-axis limits
x.min <- min(c(corn, cotton))
x.max <- max(c(corn, cotton))

# Set the bin edges: bars 0.1 mm wide, spanning the range of both samples
breaks.use <- seq(1.05,2.15,0.1)
hist(corn, xlim = c(x.min,x.max), 
     breaks = breaks.use,
     main = "",
     xlab = "")
# Larger bottom margin for the lower plot, to make room for the x-axis label
par(mar = c(4,4,0.5,1))
hist(cotton, xlim = c(x.min,x.max),
     breaks = breaks.use,
     main = "",
     xlab = "Eyespan (mm)")

T-test on raw vectors

For simple situations like this one, we can analyze data entered directly into R as vectors. A t-test on these two vectors is as simple as this

t.test(corn, cotton)
## 
##  Welch Two Sample t-test
## 
## data:  corn and cotton
## t = 8.3231, df = 26.564, p-value = 7.067e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3788854 0.6270670
## sample estimates:
## mean of x mean of y 
##  2.047143  1.544167
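To see where t and df in this output come from, here is a sketch that computes Welch’s t statistic and the Welch-Satterthwaite degrees of freedom by hand, then compares them with the values stored in the object that t.test() returns (statistic, parameter, p.value, and conf.int are standard components of that result). The object names se2.corn, t.hand, df.hand, p.hand, and res are made up for this example.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)

# Squared standard error of each group mean (variance / n)
se2.corn   <- var(corn) / length(corn)
se2.cotton <- var(cotton) / length(cotton)

# Welch's t: difference in means over the unpooled standard error
t.hand <- (mean(corn) - mean(cotton)) / sqrt(se2.corn + se2.cotton)

# Welch-Satterthwaite approximation to the degrees of freedom
df.hand <- (se2.corn + se2.cotton)^2 /
  (se2.corn^2 / (length(corn) - 1) + se2.cotton^2 / (length(cotton) - 1))

# Two-tailed p-value from the t distribution
p.hand <- 2 * pt(-abs(t.hand), df = df.hand)

# Pull the same quantities out of the t.test() result
res <- t.test(corn, cotton)
res$statistic   # t
res$parameter   # df
res$p.value     # exact p-value, for reporting
res$conf.int    # 95% confidence interval
c(t = t.hand, df = df.hand, p = p.hand)
```

Extracting res$p.value this way is also a convenient way to report the exact p-value rather than copying it from the printed output.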

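R assumes that we want:

  • a 2-sample test of independent (unpaired) samples (paired = FALSE)
  • a 2-tailed test (alternative = "two.sided")
  • alpha set at 0.05, in the sense that it reports a 95% confidence interval (conf.level = 0.95)
  • Welch’s correction for unequal variances (var.equal = FALSE)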

Only the first item is changed with any regularity. We almost always want 2-tailed tests (1-tailed tests are rarely, if ever, justified); alpha of 0.05 is conventional in most areas of research (and you should report your exact p-value anyway); and there is little reason to assume equal variances, since Welch’s test performs well whether or not the variances are equal.
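Writing the defaults out explicitly is a quick way to confirm what R is assuming. This sketch spells out the relevant t.test() arguments (paired, alternative, conf.level, var.equal) and should reproduce the default result exactly. For comparison only, it also runs the classic Student’s t-test with a pooled variance (var.equal = TRUE). The object names are made up for this example.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)

# Default call
default.res <- t.test(corn, cotton)

# The same test with the default arguments written out
explicit.res <- t.test(corn, cotton,
                       paired = FALSE,            # independent (unpaired) samples
                       alternative = "two.sided", # 2-tailed test
                       conf.level = 0.95,         # 95% CI, i.e. alpha = 0.05
                       var.equal = FALSE)         # Welch's correction

# Student's t-test with pooled variance, for comparison only
student.res <- t.test(corn, cotton, var.equal = TRUE)

c(welch.df = unname(explicit.res$parameter),
  student.df = unname(student.res$parameter))
```

The pooled test uses df = 21 + 24 - 2 = 43, while Welch’s df is about 26.6; the Welch value is lower here because the cotton group’s variance is roughly 15 times that of the corn group.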

T-test on data in a dataframe

Often data are entered into a spreadsheet first and then loaded into R from a .csv file. The code to run a t-test on data in this form is slightly different from what we used above.

First, I’ll put these data into a dataframe, then show two slight variations on the analysis. The first uses the t.test() function with the argument “data = …”; the second selects the columns used in the analysis with dollar signs ($). To be pedantic, I’ll also show two other minor variations on this last approach.

[Advanced] Make dataframe

Because the sample sizes are unequal, the data would most likely be stored with all the numeric data in a single column and a second column indicating the diet.

n.corn <- length(corn)
n.cotton <- length(cotton)
diet.df <- data.frame(eye.span = c(corn,cotton),
                      diet =c(rep("corn", n.corn),
                              rep("cotton",n.cotton)))
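Base R’s stack() function offers a shorter way to build the same single-column layout from a named list of vectors. This is an alternative sketch, not the method used above. stack() names its columns values and ind (and makes ind a factor), so they are renamed and converted here to match diet.df; the name diet.stacked is made up for this example.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)

# stack() returns columns "values" and "ind" (ind is a factor of the list names)
diet.stacked <- stack(list(corn = corn, cotton = cotton))
names(diet.stacked) <- c("eye.span", "diet")

# Convert the factor to character, as data.frame() stores text by default in R >= 4.0
diet.stacked$diet <- as.character(diet.stacked$diet)
str(diet.stacked)
```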



The spreadsheet would look like this (head() shows the first six rows, tail() the last six)

head(diet.df)
##   eye.span diet
## 1     2.15 corn
## 2     2.14 corn
## 3     2.13 corn
## 4     2.13 corn
## 5     2.12 corn
## 6     2.11 corn
tail(diet.df)
##    eye.span   diet
## 40     1.33 cotton
## 41     1.29 cotton
## 42     1.26 cotton
## 43     1.24 cotton
## 44     1.11 cotton
## 45     1.05 cotton
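Since data like these would normally arrive as a .csv file, here is a sketch of that round trip: write the dataframe to a temporary .csv file with write.csv(), then read it back with read.csv(). The path from tempfile() stands in for a real file name, and the names csv.file and diet.csv are made up for this example; in practice you would call read.csv() on your own spreadsheet export.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = c(rep("corn", length(corn)),
                               rep("cotton", length(cotton))))

# Temporary file standing in for a real spreadsheet export
csv.file <- tempfile(fileext = ".csv")
write.csv(diet.df, csv.file, row.names = FALSE)

# Read it back, as you would with data saved from a spreadsheet
diet.csv <- read.csv(csv.file)
head(diet.csv)

# Group means as a quick check that nothing changed in transit
tapply(diet.csv$eye.span, diet.csv$diet, mean)
```

Setting row.names = FALSE keeps write.csv() from adding an extra column of row numbers, so the file has the same two columns as the spreadsheet shown above.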



T-test on dataframe using the “data =” argument

t.test(eye.span ~ diet, data = diet.df)
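With the formula interface, the groups are compared in the order of their factor levels, which for text defaults to alphabetical order, so this call tests mean(corn) minus mean(cotton). The sketch below reverses that order with factor(..., levels = ...), assuming you want cotton minus corn instead. It works on a copy (the hypothetical diet.rev) so diet.df itself is left unchanged. Only the signs of t and of the confidence limits change; the p-value is identical.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = c(rep("corn", length(corn)),
                               rep("cotton", length(cotton))))

res.corn.first <- t.test(eye.span ~ diet, data = diet.df)

# Copy the data and put cotton first, so the difference is cotton - corn
diet.rev <- diet.df
diet.rev$diet <- factor(diet.rev$diet, levels = c("cotton", "corn"))
res.cotton.first <- t.test(eye.span ~ diet, data = diet.rev)

c(corn.first = unname(res.corn.first$statistic),
  cotton.first = unname(res.cotton.first$statistic))
```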



T-test on dataframe using dollar signs to select columns

t.test(diet.df$eye.span ~ diet.df$diet)



T-test on dataframe using other ways of selecting columns

# Use the columns named eye.span and diet
t.test(diet.df[ ,"eye.span"] ~ diet.df[ ,"diet"])
## 
##  Welch Two Sample t-test
## 
## data:  diet.df[, "eye.span"] by diet.df[, "diet"]
## t = 8.3231, df = 26.564, p-value = 7.067e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3788854 0.6270670
## sample estimates:
##   mean in group corn mean in group cotton 
##             2.047143             1.544167



# Use the 1st and 2nd columns
t.test(diet.df[ ,1] ~ diet.df[ ,2])



All methods compared

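Note that all of these methods give the same result (t = 8.3231, df = 26.564, p-value = 7.067e-09). The first method requires each group’s data to be in its own vector, as when the data are entered directly as vectors. The subsequent methods all assume the data are in a single dataframe, with one column for the measurements and one for the diet.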



# data in vectors
t.test(corn, cotton)

#data in dataframe
t.test(eye.span ~ diet, data = diet.df)
t.test(diet.df$eye.span ~ diet.df$diet)
t.test(diet.df[ ,"eye.span"] ~ diet.df[ ,"diet"])
t.test(diet.df[ ,1] ~ diet.df[ ,2])
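Rather than comparing the printed output by eye, a short sketch can collect t, df, and the p-value from each call into one table and confirm that they agree. Using sapply() over a list of results is just one way to do this; the names results and comparison are made up for this example.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08,
          2.08, 2.08, 2.04, 2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96,
          1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61,
            1.59, 1.58, 1.59, 1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34,
            1.33, 1.29, 1.26, 1.24, 1.11, 1.05)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = c(rep("corn", length(corn)),
                               rep("cotton", length(cotton))))

# Run every variation and keep the full results
results <- list(
  vectors    = t.test(corn, cotton),
  data.arg   = t.test(eye.span ~ diet, data = diet.df),
  dollar     = t.test(diet.df$eye.span ~ diet.df$diet),
  col.names  = t.test(diet.df[, "eye.span"] ~ diet.df[, "diet"]),
  col.number = t.test(diet.df[, 1] ~ diet.df[, 2])
)

# One row per method, with columns for t, df, and p
comparison <- t(sapply(results, function(r)
  c(t = unname(r$statistic), df = unname(r$parameter), p = r$p.value)))
comparison
```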