Here, we use the AMCP package that accompanies the book (Version 0.0.4 or greater) to load the data into the working directory with the `data()`

function. Note that the data sets are organized using the chapter number and table number, each separated with an underscore (e.g., *chapter_1_table_1*; *chapter_11_table_20*, *chapter_13_table_5*, etc.). Similarly, for end-of-chapter exercises, the organization is such that “table” is replaced with “exercise” (e.g., *chapter_1_exercise_21*; *chapter_3_table_9*, *chapter_16_exercise_5*, etc.). Additionally, abbreviated names can be used where “C” replaces “chapter” and “T” or “E” replaces “table” or “exercise,” respectively. For example, *C1T1* can be used to reference *chapter_1_table_1* or C1E21 can be used to reference *chapter_1_exercise_21*, et cetera. There are a few special cases as well, such as tutorials, raw data (e.g., when summary information is provided in the book), and univariate format (long) vs. multivariate format (wide) is used.

For a list of data sets use `data()`

with `package="AMCP"`

specified, specifically as `data(package="AMCP")`

. All of the details needed for each data set can be found via `data(package="AMCP")`

Note also that the data sets are available as comma space delimited (CSV) files (as well as SPSS and SAS) on the DesigningExperiments.Com web site, which is our web site that accompanies the book.

To begin, we install the AMCP package (this only needs to happen once per system unless you update R or you want to install a newer version of AMCP (if applicable) into the R session using the `library()`

function. The `install.packages()`

function can be used as shown below. Note that you may need to select a mirror (select one near your location) to download the package. Note that number sign (also called a hash character or pound sign) is how to denote a comment within R code, as R does not reed what is to the right of a number sign.

```
# Only necessary once on current system for the current version of a package. Uncomment to install.
# install.packages("AMCP")
```

Next, we load the AMCP package into the current R session using the `library()`

function. The AMCP package contains all of the data from the book.

`library(AMCP)`

This code depends on version 0.0.4 of AMCP. Ensure that version (or newer) is loaded.

`if((getNamespaceVersion("AMCP") >= "0.0.4")==FALSE) print("Please install the most recent version of the R package 'AMCP'.")`

Next, we load the relevant data set of interest via the `data()`

function. For the Chapter 1, Table 1 data, we do the following.

`data(chapter_1_table_1)`

To return the data (i.e., see what it looks like) you can simply call upon it by typing (or copying code and then pasting it into the console) before hitting return. Notice here and elsewhere that R output is preceded by a two number signs (also called a hash character or pound sign). Note that hitting the up arrow a previously used command will be recalled and ready to be entered again.

`chapter_1_table_1`

```
## treat control week
## 1 28 32 1
## 2 31 25 1
## 3 25 15 1
## 4 23 25 1
## 5 28 16 1
## 6 26 30 2
## 7 36 24 2
## 8 23 13 2
## 9 23 25 2
## 10 24 16 2
```

For larger data sets, you may only want to see the first part (using `head()`

) or the last part (using `tail()`

).

`head(chapter_1_table_1)`

```
## treat control week
## 1 28 32 1
## 2 31 25 1
## 3 25 15 1
## 4 23 25 1
## 5 28 16 1
## 6 26 30 2
```

`tail(chapter_1_table_1)`

```
## treat control week
## 5 28 16 1
## 6 26 30 2
## 7 36 24 2
## 8 23 13 2
## 9 23 25 2
## 10 24 16 2
```

This data set is structured by *Condition* and by *Week* in a paired samples design. Often useful is a variable that is the difference between the two (paired) conditions. Although there are other ways in which this can be done within R, here we extract the *Treatment* and *Control* values from the data frame, take the difference, and then create a new variable called *Difference* that is augmented to the data frame. We use the `$`

operator, which extracts the variable on the right from the data frame on the left. For example, `Data$Score`

would extract the variable `Score`

from the data frame `Data`

. In addition to extracting variables, the `$`

operator can be used to assign a new variable to a data frame, as is done by placing `$`

after the data frame (i.e., the *chapter 1, table 1* object) but before the (new) variable that is being created (i.e., *Difference*) using the assignment operator (i.e., `<-`

) as shown below. The assignment operator assigns what is on the right (whether it is the result of a calculation as below or another object) to the object on the left.

```
chapter_1_table_1$Difference <- chapter_1_table_1$treat - chapter_1_table_1$control
chapter_1_table_1
```

```
## treat control week Difference
## 1 28 32 1 -4
## 2 31 25 1 6
## 3 25 15 1 10
## 4 23 25 1 -2
## 5 28 16 1 12
## 6 26 30 2 -4
## 7 36 24 2 12
## 8 23 13 2 10
## 9 23 25 2 -2
## 10 24 16 2 8
```

There are many ways to summarize data in R. Using only “base R” (i.e., the functions that come with R and not any add-on packages) we first extract the particular subsets of interest (e.g., week 1 for the treatment group) and then apply a particular function to the subsets of interest. Using subsets of the data is important here because of the way the data are structured (by *Condition* and by *Week*), and recalling that this is a paired samples design there is a linkage among the scores that cannot be ignored. First, for illustration, we extract the week 1 scores for the *Treatment* group. We do this with square brackets to subset the data frame into the identified rows and the identified column (variable) of interest. The specific set of rows (i.e., what is to the left of the comma below) is when *week* from the data set is equal to 1: `week==1`

. Note that we obtain `TRUE`

or `FALSE`

values for the *week* variable from the data `chapter_1_table_1`

and then R will subset the data to those in which the row values is `TRUE`

(thus ignoring the rows in which *week* does not equal 1). To the right of the comma but still within the square brackets is the variable(s) of interest, here only `treat`

, which we encase in double quotes. Thus, we have subsetted the data for *Week=1* for the *Treatment* group as follows:

`chapter_1_table_1[chapter_1_table_1$week==1, "treat"]`

`## [1] 28 31 25 23 28`

Now, taking the subsetting ideas above and applying a few functions, we can obtain some descriptive statistics for *Week=1* for the *Treatment* group (other descriptive statistics can be requested as well).

`sum(chapter_1_table_1[chapter_1_table_1$week==1, "treat"])`

`## [1] 135`

`mean(chapter_1_table_1[chapter_1_table_1$week==1, "treat"])`

`## [1] 27`

`sd(chapter_1_table_1[chapter_1_table_1$week==1, "treat"])`

`## [1] 3.082207`

`summary(chapter_1_table_1[chapter_1_table_1$week==1, "treat"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23 25 28 27 28 31
```

We also obtain some descriptive statistics for *Week=1* for the *Control* group,

`sum(chapter_1_table_1[chapter_1_table_1$week==1, "control"])`

`## [1] 113`

`mean(chapter_1_table_1[chapter_1_table_1$week==1, "control"])`

`## [1] 22.6`

`sd(chapter_1_table_1[chapter_1_table_1$week==1, "control"])`

`## [1] 7.092249`

`summary(chapter_1_table_1[chapter_1_table_1$week==1, "control"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.0 16.0 25.0 22.6 25.0 32.0
```

We also obtain some descriptive statistics for *Week=2* for the *Treatment* group,

`sum(chapter_1_table_1[chapter_1_table_1$week==2, "treat"])`

`## [1] 132`

`mean(chapter_1_table_1[chapter_1_table_1$week==2, "treat"])`

`## [1] 26.4`

`sd(chapter_1_table_1[chapter_1_table_1$week==2, "treat"])`

`## [1] 5.504544`

`summary(chapter_1_table_1[chapter_1_table_1$week==2, "treat"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23.0 23.0 24.0 26.4 26.0 36.0
```

We also obtain some descriptive statistics for *Week=2* for the *Control* group,

`sum(chapter_1_table_1[chapter_1_table_1$week==2, "control"])`

`## [1] 108`

`mean(chapter_1_table_1[chapter_1_table_1$week==2, "control"])`

`## [1] 21.6`

`sd(chapter_1_table_1[chapter_1_table_1$week==2, "control"])`

`## [1] 6.94982`

`summary(chapter_1_table_1[chapter_1_table_1$week==2, "control"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 16.0 24.0 21.6 25.0 30.0
```

We emphasize that the method above is only one way of obtaining descriptive statistics for group week by week.

Further, suppose one would like to obtain descriptive statistics for the difference between the treatment and control groups (i.e., the *Difference* variable). Again, these are values that we consider by week.

`sum(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])`

`## [1] 22`

`mean(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])`

`## [1] 4.4`

`sd(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])`

`## [1] 7.127412`

`summary(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.0 -2.0 6.0 4.4 10.0 12.0
```

`sum(chapter_1_table_1[chapter_1_table_1$week==2, "Difference"])`

`## [1] 24`

`mean(chapter_1_table_1[chapter_1_table_1$week==2, "Difference"])`

`## [1] 4.8`

`sd(chapter_1_table_1[chapter_1_table_1$week==2, "Difference"])`

`## [1] 7.293833`

`summary(chapter_1_table_1[chapter_1_table_1$week==2, "Difference"])`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.0 -2.0 8.0 4.8 10.0 12.0
```

If we want to calculate, for example, the sum and mean, of the scores across week for each of the variables (*Treatment*, *Control*, & *Difference*), we can leave blank the value to the left of the comma within the square brackets. Leaving the rows (or columns) blank returns all rows (or columns). Keep in mind that the code below ignores *Week*.

`sum(chapter_1_table_1[ , "treat"])`

`## [1] 267`

`sum(chapter_1_table_1[ , "control"])`

`## [1] 221`

`sum(chapter_1_table_1[ , "Difference"])`

`## [1] 46`

`mean(chapter_1_table_1[ , "treat"])`

`## [1] 26.7`

`mean(chapter_1_table_1[ , "control"])`

`## [1] 22.1`

`mean(chapter_1_table_1[ , "Difference"])`

`## [1] 4.6`

Chapter 1 discusses counting rules. There are multiple ways of addressing such questions within R, from specialized packages, use of the factorial function (which is `factorial()`

) with definitional formulas, and use of the `choose()`

and `combn()`

functions, among others. Recall that combinations are *unordered* choices (whereas permutations are *ordered* choices). For combinations, R has a function that allows a user to easily count and enumerate all combinations. Recalling Chapter 1, the number of ways in which 2 objects can be chosen from a set of 8 [in which order does not matter, meaning that (A, B) is the same as (B, A)] can be addressed as follows

`choose(n=8, k=2)`

`## [1] 28`

If one wants to know the number of ways that “eight things can be taken four at a time,” the `choose()`

function would be used as

`choose(n=8, k=4)`

`## [1] 70`

To actually enumerate (list) the various ways in which one can choose 2 from 8, the `combn()`

function can be used, which returns a matrix of the various ways in which the objects can be selected (i.e., it is an enumerated listing of the varoius ways). Note that the number of rows in the returned matrix will be `m`

with the number of columns being the number of possible ways (which is equivalent in number to the above use of the `choose()`

function).

`combn(x=8, m=2)`

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 1 1 1 1 1 1 1 2 2 2 2 2 2
## [2,] 2 3 4 5 6 7 8 3 4 5 6 7 8
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 3 3 3 3 3 4 4 4 4 5 5
## [2,] 4 5 6 7 8 5 6 7 8 6 7
## [,25] [,26] [,27] [,28]
## [1,] 5 6 6 7
## [2,] 8 7 8 8
```

Furthermore, the factorial function can be used to address questions such as “how many ways can one choose 2 cups out of eight, where order does not matter.”

`factorial(8)/(factorial(2)*factorial(6))`

`## [1] 28`

Further, note that one can compute the number of permutations of some number things taken some amout at a time, where order does matter (i.e., allowing the same cups but with different orders). For example, in the book we discuss the number of ways of choosing 4 cups from 8 cups in a particular order. This can be computed as follows (do note, however, that these are not the distinct sets of cups).

`factorial(8)/(factorial(4))`

`## [1] 1680`

Notice that, directly above, there is only one term in the denominator [i.e., `factorial(8-4)`

]. Thus, each set of four particular cups would appear 4 × 3 × 2 × 1, or 24, times in a listing of the 1680 orderings. Thus, one can divide the number of ways in which order matters, here 1680, by `(8-4)!=4*3*2*1`

to find the number of distinct orders (or, simply find the permutation of 8 things taken 4 at a time). This will yield a smaller answer because it will consider cups selected as number 1, 2, 3, and 4 the same as cups selected as number 4, 3, 2, and 1.

`factorial(8)/(factorial(4)*factorial(8-4))`

`## [1] 70`

Unlike the `choose()`

function for finding the number of combinations, R does not have a base function for dealing with permutations directly. However, see the `permutations`

function in the gtools package).

In the context of the paired differences as discussed, for the 10 pairs in which we are interested in the number of possible combinations of the signed differences that could have occurred, we compute this in R using the “hat” symbol that denotes an exponent, with 2 being the number (for positive or negative sign) for which the exponent 10 (because we have 10 pairs) is used

`2^10`

`## [1] 1024`

Of course, considering only the first week with the 5 pairs, we have:

`2^5`

`## [1] 32`

Given the above background and returning to the data from Table 1.1, we walk, step-by-step, through a procedure that can be used for a permutation test as discussed. First, it is useful to know about the `expand.grid()`

function. This function provides *all combinations* of the supplied vectors (or factors). For us, consider that the value of -4 observed for pair 1 could have been 4 if the (paired) scores had been sampled in opposite fashion and in reality there was no difference between the treatment and control. Similarly, the observed value of 6 could have been -6, et cetera. By supplying vectors of the positive and negative values of those observed in separate vectors, `expand.grid`

will supply all combinations of the scores.

```
Possible.Assignments <- expand.grid(c(4, -4), c(6, -6), c(10, -10), c(2, -2), c(12, -12))
Possible.Assignments
```

```
## Var1 Var2 Var3 Var4 Var5
## 1 4 6 10 2 12
## 2 -4 6 10 2 12
## 3 4 -6 10 2 12
## 4 -4 -6 10 2 12
## 5 4 6 -10 2 12
## 6 -4 6 -10 2 12
## 7 4 -6 -10 2 12
## 8 -4 -6 -10 2 12
## 9 4 6 10 -2 12
## 10 -4 6 10 -2 12
## 11 4 -6 10 -2 12
## 12 -4 -6 10 -2 12
## 13 4 6 -10 -2 12
## 14 -4 6 -10 -2 12
## 15 4 -6 -10 -2 12
## 16 -4 -6 -10 -2 12
## 17 4 6 10 2 -12
## 18 -4 6 10 2 -12
## 19 4 -6 10 2 -12
## 20 -4 -6 10 2 -12
## 21 4 6 -10 2 -12
## 22 -4 6 -10 2 -12
## 23 4 -6 -10 2 -12
## 24 -4 -6 -10 2 -12
## 25 4 6 10 -2 -12
## 26 -4 6 10 -2 -12
## 27 4 -6 10 -2 -12
## 28 -4 -6 10 -2 -12
## 29 4 6 -10 -2 -12
## 30 -4 6 -10 -2 -12
## 31 4 -6 -10 -2 -12
## 32 -4 -6 -10 -2 -12
```

Now, we want the sum of the various combinations of differences based on all possible assignments. We use the `apply()`

function, which takes a function (in this case `sum()`

) and applies the function row-by-row (when `MARGIN=1`

) to the specified object (in this case `Possible.Assignments`

). Then, we use the `sort()`

function (to match the book we specify `decreasing=TRUE`

). Finally, we identify the combinations that lead to a sum that is greater than or equal to 22, and then find the proportion of such cases.

```
Sums <- apply(Possible.Assignments, 1, sum)
Sums
```

```
## [1] 34 26 22 14 14 6 2 -6 30 22 18 10 10 2 -2 -10 10
## [18] 2 -2 -10 -10 -18 -22 -30 6 -2 -6 -14 -14 -22 -26 -34
```

`sort(Sums, decreasing=TRUE)`

```
## [1] 34 30 26 22 22 18 14 14 10 10 10 6 6 2 2 2 -2
## [18] -2 -2 -6 -6 -10 -10 -10 -14 -14 -18 -22 -22 -26 -30 -34
```

```
# See which are greater than or equal to 22 (the original)
Sums >= 22
```

```
## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

```
# The mean of TRUE/FALSE data is the proportion of TRUE.
# Note that TRUE is treated numerically as 1 and FALSE is treated numerically as 0).
mean(Sums >= 22)
```

`## [1] 0.15625`

Or, more succintly

```
# Observed sum of the differences.
Value <- sum(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])
# Select the observed and the opposite sign differences.
Diffs <- cbind(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"],
-chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])
# All combinations of differences.
Possible.Assignments <- expand.grid(lapply(split(Diffs, row(Diffs)), matrix, ncol=2))
# First, find the sum of the combinations. Then, find the proportion greater than the
# observed value.
mean(apply(Possible.Assignments, 1, sum) >= Value)
```

`## [1] 0.15625`

Had we wanted to perform a *t*-test, there are two ways in which we can procede First, we already have the differences. A single sample *t*-test on the difference scores is equivalant to a paired samples *t*-test. The `t.test()`

function in R performs, based on the specifications, a single sample *t*-test, a paired-samples *t*-test, or an independent-samples *t*-test (with or without assuming homogeneity of variance). If only a single variable is specified, a single-sample *t*-test is performed (with the null value set to 0, which can be modified using `mu=X`

, where *X* is the null value of interest). For a single sample *t*-test we can thus use the differences for *week=1* and use the following for a two-sided *p*-value (we show a single-sided *p*-value momentarily):

`t.test(chapter_1_table_1[chapter_1_table_1$week==1, "Difference"])`

```
##
## One Sample t-test
##
## data: chapter_1_table_1[chapter_1_table_1$week == 1, "Difference"]
## t = 1.3804, df = 4, p-value = 0.2396
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -4.449851 13.249851
## sample estimates:
## mean of x
## 4.4
```

Alternatively, we could have specified the *Treatment* and *Control* scores and identified the data as a paired design:

```
t.test(chapter_1_table_1[chapter_1_table_1$week==1, "treat"],
chapter_1_table_1[chapter_1_table_1$week==1, "control"], paired=TRUE)
```

```
##
## Paired t-test
##
## data: chapter_1_table_1[chapter_1_table_1$week == 1, "treat"] and chapter_1_table_1[chapter_1_table_1$week == 1, "control"]
## t = 1.3804, df = 4, p-value = 0.2396
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.449851 13.249851
## sample estimates:
## mean of the differences
## 4.4
```

Doing the above across the two weeks (i.e., using the 10 differences), we could do the following:

`t.test(chapter_1_table_1[, "treat"], chapter_1_table_1[, "control"], paired=TRUE)`

```
##
## Paired t-test
##
## data: chapter_1_table_1[, "treat"] and chapter_1_table_1[, "control"]
## t = 2.1386, df = 9, p-value = 0.06116
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2658294 9.4658294
## sample estimates:
## mean of the differences
## 4.6
```

Or, using the `$`

operator to extract a variable (as we are not subsetting rows) could be done as follows:

`t.test(chapter_1_table_1$treat, chapter_1_table_1$control, paired=TRUE)`

```
##
## Paired t-test
##
## data: chapter_1_table_1$treat and chapter_1_table_1$control
## t = 2.1386, df = 9, p-value = 0.06116
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2658294 9.4658294
## sample estimates:
## mean of the differences
## 4.6
```

However, recalling from Chapter 1 that a single-sided (one-sided) test is performed, we need the *p*-value that corresponds to the alternative hypothesis of interest. The three options for the alternative hypothesis can be specified using the `alternative`

argument (with `two.sided`

(the default), `less`

, or `greater`

), depending on the specified alternative hypothesis of interest. Here we use a *greater than* alternative hypothesis (to complement the *less than or equal to* null hypothesis)

`t.test(chapter_1_table_1$treat, chapter_1_table_1$control, paired=TRUE, alternative="greater")`

```
##
## Paired t-test
##
## data: chapter_1_table_1$treat and chapter_1_table_1$control
## t = 2.1386, df = 9, p-value = 0.03058
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.6570313 Inf
## sample estimates:
## mean of the differences
## 4.6
```

Chapter 1 also discusses Fisher’s Exact test. The R function for implementing Fisher’s Exact test is `fisher.test()`

. The R help files demonstrates the use of the function for the tea problem as is discussed (this is a famous data set). We modify the code given in the helpfile for our purposes here. First, note that a matrix is created that summarizes the results. One dimension (rows) identifies the guesses, whereas the other dimension (columns) identifies the actual classification. Use of the ‘matrix()’ function can make inputting tabular data easier.

```
TeaTasting <- matrix(c(3, 1, 1, 3),
nrow = 2,
dimnames = list(Guess = c("Milk-first", "Tea-first"),
Actual = c("Milk-first", "Tea-first")))
# An alternative way of entering the data could be
rbind(MF=c(3, 1), TF=c(1, 3))
```

```
## [,1] [,2]
## MF 3 1
## TF 1 3
```

```
# Note that this is one interpretation of the type of test. Here the estimate is a
# conditional maximum likelihood estimate and differs from the sample odds ration
# (see the help file [?fisher.test] for how R addresses this 2-by-2 table, but
# do note that there are other approaches).
fisher.test(TeaTasting, alternative = "greater")
```

```
##
## Fisher's Exact Test for Count Data
##
## data: TeaTasting
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 0.3135693 Inf
## sample estimates:
## odds ratio
## 6.408309
```