# Lesson 2

The questions

This quiz covers the questions in the notes for week 2 of the R Intro Statistics Dot Com class. If you have questions, please discuss them on the online forum there.

What gets plotted by:

plot(values ~ ind, data=stacked)


(Here stacked is stacked data, so values is numeric and ind is a factor.)
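For readers unsure what "stacked data" looks like, here is a minimal sketch. It assumes stacked was produced with stack() from a named list; the source does not show how it was built, and the names a and b below are placeholders.

```r
# Hypothetical example data; the names a and b are placeholders,
# not taken from the lesson.
x <- list(a = c(1, 2, 3), b = c(4, 5))

# stack() returns a data frame with two columns:
#   values - all the numbers, one per row
#   ind    - a factor recording which list element each number came from
stacked <- stack(x)

str(stacked)
```

The two columns are what the formula values ~ ind refers to.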

Two versions of a test are given within a class. Suppose the students were selected at random to take the first or second test. The data are:

test 1 scores  75 85 78  82  65  85
--------------------------------------
test 2 scores  90 95 87  92  94  95


If we view the test 1 scores as a random sample from a population, describe the population:

Let us enter the data on tests with:

t1 <- c(75, 85, 78, 82, 65, 85)
t2 <- c(90, 95, 87, 92, 94, 95)


Which command will do a two-sample $$t$$-test with an assumption of equal variances?

The ToothGrowth data set has tooth measurements for various dosages of two supplements. This command compares the two supplements for the smallest dose:

t.test(len ~ supp, ToothGrowth, subset = dose == 0.5)

##
##  Welch Two Sample t-test
##
## data:  len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC
##            13.23             7.98


Repeat the above with the dose value 2.0. What is the p-value?

This shows the useful subset argument, which restricts the cases (rows) considered when a model formula is used to specify a problem.
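As a quick check of what subset does, the sketch below runs the dose 0.5 comparison twice: once with the subset argument, and once after filtering the rows by hand. The variable names a, b, and small are illustrative only.

```r
# Formula interface with the subset argument, as in the lesson
a <- t.test(len ~ supp, data = ToothGrowth, subset = dose == 0.5)

# The same comparison after filtering the rows manually
small <- ToothGrowth[ToothGrowth$dose == 0.5, ]
b <- t.test(len ~ supp, data = small)

# The two approaches should give the same p-value
all.equal(a$p.value, b$p.value)
```

Both calls see the same rows, so the results should agree.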

This command computes a subset of the morley data set for experiments 1 and 5.

d <- subset(morley, subset = Expt == 1 | Expt == 5)


A $$t$$-test is then done with:

t.test(Speed ~ Expt, data=d)


If instead of 1 and 5 you used 2 and 4, what would be the $$p$$-value?

In the above we use a logical operator to subset (it is in 1 *or* it is in 5). It can also be done using the %in% operator, as with Expt %in% c(1,5).
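A minimal sketch comparing the two ways of selecting rows; d1 and d2 are names chosen here for illustration.

```r
# Logical "or": keep rows where Expt is 1 or Expt is 5
d1 <- subset(morley, subset = Expt == 1 | Expt == 5)

# Set membership: keep rows where Expt is one of the listed values
d2 <- subset(morley, subset = Expt %in% c(1, 5))

# Both should select exactly the same rows
identical(d1, d2)
```

The %in% form tends to be easier to read and extend once more than two values are involved.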

You were asked the following: For the home data set from UsingR, make side-by-side boxplots of the two variables. Which of the following values is the best estimate of Q3 for the new variable?

Does this graphic show similar distributions?

require("UsingR")
with(homedata, qqplot(y1970, y2000))


The twins data set from UsingR has IQ scores for identical twins raised under different circumstances. The assumption of independent samples is clearly wrong for such data, and the idea of pairing should come naturally. You are asked to perform a two-sided paired $$t$$-test for equivalence of means for the Foster and Biological data.

What is the $$p$$-value?
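To illustrate what pairing means, the sketch below uses made-up numbers (not the twins data) to show that a paired $$t$$-test is the same as a one-sample $$t$$-test on the within-pair differences. The vectors x and y are hypothetical.

```r
# Made-up paired measurements, for illustration only (not the twins data)
x <- c(10, 12, 9, 14, 11)
y <- c(11, 15, 9, 13, 14)

# Paired two-sided t-test on the two vectors
paired <- t.test(x, y, paired = TRUE)

# One-sample t-test on the within-pair differences
one_sample <- t.test(x - y)

# The two tests should report the same p-value
all.equal(paired$p.value, one_sample$p.value)
```

The pairing matters because each difference removes the variation shared within a pair, which an independent-samples test would not account for.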

You have this question:

For more than a century, the three species of large fish – gumpies, sticklebarbs, and spotheads – that are native to a certain river have been observed to co-exist in equal proportions of one-third each. But now a random sample of 300 large fish drawn from a standard fish-sampling location has turned up numbers and proportions suggesting that something has occurred to upset the natural ecology of the river. If the three fish species still inhabited the river in equal proportions, we would expect to find about 100 instances of each in a sample of size N=300; whereas what we actually observe are 89 gumpies, 120 sticklebarbs, and 91 spotheads.

Taken from http://faculty.vassar.edu/lowry/ch8pt1.html
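The expected counts quoted in the passage follow directly from the equal-proportions assumption. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```r
# Observed counts from the passage
observed <- c(gumpies = 89, sticklebarbs = 120, spotheads = 91)

# Total sample size
n <- sum(observed)

# Under equal proportions, each species is expected one third of the time
expected <- rep(n / 3, 3)

# Deviation of each observed count from its expected count
observed - expected
```

The deviations always sum to zero, so what matters is how large they are relative to the expected counts.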

If we set the data to be f:

f <- c(89, 120, 91)


Which of these is the appropriate command to use to do the test?