These data represent the distance between the eyes of male stalk-eyed flies that have been fed two different diets.
corn <- c(2.15,2.14,2.13,2.13,2.12,2.11,2.10,2.08
,2.08,2.08,2.04,2.05,2.03,2.02,2.01,2.00,1.99,1.96
,1.95,1.93,1.89)
Check the data I’ve entered against the information in the book.
summary(corn)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.890 2.000 2.050 2.047 2.110 2.150
var(corn)
## [1] 0.005581429
length(corn)
## [1] 21
cotton <- c(2.12,2.07,2.01,1.93,1.77,1.68,1.64,1.61
,1.59,1.58,1.59,1.55,1.54,1.49,1.45,1.43,1.39,1.34
,1.33,1.29,1.26,1.24,1.11,1.05)
Check the data I’ve entered against the information in the book.
summary(cotton)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.050 1.338 1.545 1.544 1.650 2.120
var(cotton)
## [1] 0.08126884
length(cotton)
## [1] 24
I’ll use the mean(), var(), and length() commands to extract the information needed for the summary table.
First, I’ll make a separate vector of summary stats for each diet.
corn.summary <- c(mean(corn),var(corn),length(corn))
cotton.summary <- c(mean(cotton),var(cotton),length(cotton))
Now I’ll combine them using the rbind() command (for “row bind”).
summaries <- rbind(corn.summary, cotton.summary)
Now I’ll add column labels using the colnames() command.
colnames(summaries) <- c("Mean","Var", "N")
summaries
## Mean Var N
## corn.summary 2.047143 0.005581429 21
## cotton.summary 1.544167 0.081268841 24
The number of digits is excessive, so I’ll round to three decimal places.
summaries <- round(summaries,3)
summaries
## Mean Var N
## corn.summary 2.047 0.006 21
## cotton.summary 1.544 0.081 24
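Here is an alternative sketch, not the approach used above. The same table can be built in one step: put both diets in a named list and apply the three summary functions to each with sapply(). The names diets and summaries.alt are only illustrative. The data vectors are re-entered so the block runs on its own.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)

# Put both diets in a named list so they can be processed together
diets <- list(corn = corn, cotton = cotton)

# sapply() returns one column per diet; t() flips it to one row per diet
summaries.alt <- t(sapply(diets, function(x) {
  c(Mean = mean(x), Var = var(x), N = length(x))
}))
round(summaries.alt, 3)
```

The row names now come from the list names (corn, cotton) rather than from the vector names used with rbind().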
# Stack two plots vertically (2 rows, 1 column)
par(mfrow = c(2,1))
# Adjust margins (bottom, left, top, right) for the upper panel
par(mar = c(2,4,2.5,1))
#Find min and max values for setting axes
x.min <- min(c(corn, cotton))
x.max <- max(c(corn, cotton))
# Set shared bin edges (0.1 mm wide) so both histograms use the same bars
breaks.use <- seq(1.05,2.15,0.1)
hist(corn, xlim = c(x.min,x.max),
breaks = breaks.use,
main = "",
xlab = "")
# Lower panel: small top margin, room at the bottom for the x-axis label
par(mar = c(4,4,0.5,1))
hist(cotton, xlim = c(x.min,x.max),
     breaks = breaks.use,
     main = "",
     xlab = "Eyespan (mm)")
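To see the numbers behind each histogram, call hist() with plot = FALSE. It then returns the bin counts without drawing anything. This sketch reuses the same bin edges as the plotting code and lines the counts up against them. The object names corn.counts and cotton.counts are illustrative.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)

# Same bin edges as the plotting code: 0.1 mm wide, from 1.05 to 2.15
breaks.use <- seq(1.05, 2.15, 0.1)

# plot = FALSE returns the histogram object without drawing it
corn.counts <- hist(corn, breaks = breaks.use, plot = FALSE)$counts
cotton.counts <- hist(cotton, breaks = breaks.use, plot = FALSE)$counts

# One row per bin: lower edge, upper edge, and the count for each diet
data.frame(lower = head(breaks.use, -1),
           upper = tail(breaks.use, -1),
           corn = corn.counts,
           cotton = cotton.counts)
```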
For simple situations like this one, we can analyze data entered directly into R as vectors. A t-test on these data is easy:
t.test(corn, cotton)
##
## Welch Two Sample t-test
##
## data: corn and cotton
## t = 8.3231, df = 26.564, p-value = 7.067e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3788854 0.6270670
## sample estimates:
## mean of x mean of y
## 2.047143 1.544167
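The Welch output can be reproduced from the summary statistics alone. Doing so shows what Welch’s correction does. Each group keeps its own variance in the standard error, t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2). The degrees of freedom come from the Welch-Satterthwaite approximation, which is why df is not a whole number. This is a sketch for illustration; the variable names are not from the original.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)

# Summary statistics for each diet
m1 <- mean(corn);   m2 <- mean(cotton)
v1 <- var(corn);    v2 <- var(cotton)
n1 <- length(corn); n2 <- length(cotton)

# Welch standard error: each group keeps its own variance (no pooling)
se <- sqrt(v1 / n1 + v2 / n2)
t.stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Two-tailed p-value and 95% confidence interval for the difference in means
p.value <- 2 * pt(-abs(t.stat), df.welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df.welch) * se

c(t = t.stat, df = df.welch, p = p.value)
ci
```

These hand-computed values should agree with the t, df, p-value and confidence interval printed by t.test(corn, cotton).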
By default, R assumes that we want:
* a 2-sample test
* a 2-tailed test
* alpha set at 0.05 (a 95% confidence interval)
* Welch’s correction for unequal variances
Usually only the first item is ever changed (for example, to a paired or one-sample test). We almost always want 2-tailed tests (1-tailed tests are rarely, if ever, justified). Alpha of 0.05 is conventional in most areas of research, and you should report your exact p-value anyway. And there is no reason not to assume that your variances are unequal.
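To make those defaults visible, this sketch spells them out as arguments to t.test() and checks that the result matches the default call. All of these are real t.test() arguments, shown at their default values. For contrast, it also fits the classic Student’s t-test with var.equal = TRUE. That version pools the two variances and uses n1 + n2 - 2 = 43 degrees of freedom. The res.* names are illustrative.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)

# The default call
res.default <- t.test(corn, cotton)

# The same test with R's defaults written out
res.explicit <- t.test(corn, cotton,
                       alternative = "two.sided", # 2-tailed test
                       mu = 0,                    # null: no difference in means
                       paired = FALSE,            # independent (2-sample) test
                       var.equal = FALSE,         # Welch's correction
                       conf.level = 0.95)         # 95% CI, matching alpha = 0.05

# Student's t-test (pooled variance), for comparison only
res.student <- t.test(corn, cotton, var.equal = TRUE)
res.student$parameter
```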
Often data are entered into a spreadsheet first and then loaded into R from a .csv file. The code to analyze data in that form with a t-test is different from what we used above.
I’ll put these data into a dataframe, then show two slight variations on the analysis. First, I’ll use the t.test command with the argument “data = …”. Then I’ll use t.test with the columns selected using dollar signs ($). To be pedantic, I’ll show two other minor variations on this last approach.
Because the sample sizes are unequal, it’s most likely that all the eyespan measurements would be in a single column, with another column indicating the diet.
n.corn <- length(corn)
n.cotton <- length(cotton)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = c(rep("corn", n.corn),
                               rep("cotton", n.cotton)))
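Here is an alternative sketch of a slightly more compact way to build the diet column. If rep() is given a vector for its times argument, it repeats each label a different number of times. The names diet.alt and diet.df.alt are illustrative.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)
n.corn <- length(corn)
n.cotton <- length(cotton)

# "corn" repeated n.corn times, then "cotton" repeated n.cotton times
diet.alt <- rep(c("corn", "cotton"), times = c(n.corn, n.cotton))
diet.df.alt <- data.frame(eye.span = c(corn, cotton), diet = diet.alt)

# Quick check of how many rows fall in each diet
table(diet.df.alt$diet)
```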
The spreadsheet would look like this (first and last six rows):
head(diet.df)
## eye.span diet
## 1 2.15 corn
## 2 2.14 corn
## 3 2.13 corn
## 4 2.13 corn
## 5 2.12 corn
## 6 2.11 corn
tail(diet.df)
## eye.span diet
## 40 1.33 cotton
## 41 1.29 cotton
## 42 1.26 cotton
## 43 1.24 cotton
## 44 1.11 cotton
## 45 1.05 cotton
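You can mimic the spreadsheet workflow without leaving R. This sketch writes the dataframe to a temporary .csv file with write.csv() and reads it back with read.csv(). With real data, you would pass the path to your own exported file instead; the temporary file is only a stand-in so the block runs anywhere. The dataframe is rebuilt at the top so the block is self-contained.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = rep(c("corn", "cotton"),
                                 times = c(length(corn), length(cotton))))

# Write to a temporary .csv file (stand-in for a real spreadsheet export)
csv.file <- tempfile(fileext = ".csv")
write.csv(diet.df, csv.file, row.names = FALSE)

# Read it back, as you would a file saved from a spreadsheet
diet.csv <- read.csv(csv.file)
str(diet.csv)

# The formula interface works the same on the re-loaded data
res.csv <- t.test(eye.span ~ diet, data = diet.csv)
res.csv
```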
t.test(eye.span ~ diet, data = diet.df)
t.test(diet.df$eye.span ~ diet.df$diet)
# Use columns called eye.span and diet
t.test(diet.df[ ,"eye.span"] ~ diet.df[ ,"diet"])
##
## Welch Two Sample t-test
##
## data: diet.df[, "eye.span"] by diet.df[, "diet"]
## t = 8.3231, df = 26.564, p-value = 7.067e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3788854 0.6270670
## sample estimates:
## mean in group corn mean in group cotton
## 2.047143 1.544167
#use first and 2nd columns
t.test(diet.df[ ,1] ~ diet.df[ ,2])
All of these methods give the same results. The first method works with data stored as separate vectors; the others all assume the data are in a dataframe.
# data in vectors
t.test(corn, cotton)
#data in dataframe
t.test(eye.span ~ diet, data = diet.df)
t.test(diet.df$eye.span ~ diet.df$diet)
t.test(diet.df[ ,"eye.span"] ~ diet.df[ ,"diet"])
t.test(diet.df[ ,1] ~ diet.df[ ,2])
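To confirm that these calls agree, this sketch runs several of them and collects the t statistic, degrees of freedom and p-value side by side. It also shows that the vector form can still be used when the data live in a dataframe: with() plus logical subsetting pulls the two groups out as vectors. Object names such as fits and comparison are illustrative.

```r
corn <- c(2.15, 2.14, 2.13, 2.13, 2.12, 2.11, 2.10, 2.08, 2.08, 2.08, 2.04,
          2.05, 2.03, 2.02, 2.01, 2.00, 1.99, 1.96, 1.95, 1.93, 1.89)
cotton <- c(2.12, 2.07, 2.01, 1.93, 1.77, 1.68, 1.64, 1.61, 1.59, 1.58, 1.59,
            1.55, 1.54, 1.49, 1.45, 1.43, 1.39, 1.34, 1.33, 1.29, 1.26, 1.24,
            1.11, 1.05)
diet.df <- data.frame(eye.span = c(corn, cotton),
                      diet = rep(c("corn", "cotton"),
                                 times = c(length(corn), length(cotton))))

# Vector form and two of the dataframe forms
res.vec <- t.test(corn, cotton)
res.formula <- t.test(eye.span ~ diet, data = diet.df)
res.cols <- t.test(diet.df[, 1] ~ diet.df[, 2])

# Vector form again, with the vectors pulled out of the dataframe
res.subset <- with(diet.df, t.test(eye.span[diet == "corn"],
                                   eye.span[diet == "cotton"]))

# One column per call; rows are t, df and p
fits <- list(res.vec, res.formula, res.cols, res.subset)
comparison <- sapply(fits, function(r) c(r$statistic, r$parameter, p = r$p.value))
comparison
```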