Section 2.5: Two-Way Tables

Example: HELP

The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomized to receive either a multidisciplinary assessment and a brief motivational intervention or usual care, with the goal of linking them to primary medical care. The data from this study are available as the data set HELPrct in the mosaicData package.

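First, load the mosaic package and save the data set HELPrct under the shorter name help.

require(mosaic)  # provides tally() and bargraph(); also loads mosaicData, which contains HELPrct
help <- HELPrct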

Let’s create a two-way table with substance as the row variable and sex as the column variable.

tally(~substance|sex, data=help)
##          sex
## substance female male
##   alcohol     36  141
##   cocaine     41  111
##   heroin      30   94
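As a cross-check that does not need mosaic, the same two-way table can be rebuilt in base R. This is a minimal sketch that types in the counts from the output above instead of reading HELPrct; the name counts is our own choice.

```r
# Counts copied from the tally() output above. matrix() fills column by column,
# so the first three numbers form the female column and the last three the male column.
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
counts
colSums(counts)  # number of patients of each sex
```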

We can also display the entries as percents by adding the option format="percent":

tally(~substance|sex, data=help, format="percent")
##          sex
## substance   female     male
##   alcohol 33.64486 40.75145
##   cocaine 38.31776 32.08092
##   heroin  28.03738 27.16763

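Notice that the percentages in each column add to 100%. Because of the | in the formula, each count is divided by its column total, so each column shows the distribution of substance within one sex (a conditional distribution).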

Or we can display the proportions by adding the option format="proportion":

tally(~substance|sex, data=help, format="proportion")
##          sex
## substance    female      male
##   alcohol 0.3364486 0.4075145
##   cocaine 0.3831776 0.3208092
##   heroin  0.2803738 0.2716763

Notice that the proportions in each column add to 1.
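In base R, the same column proportions can be sketched with prop.table(), where margin = 2 means "divide each entry by the total of its column". As before, this assumes the counts are typed in from the first table; col_prop is our own name.

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
# margin = 2: divide each count by the total of its column (its sex)
col_prop <- prop.table(counts, margin = 2)
col_prop           # compare with format="proportion"
100 * col_prop     # compare with format="percent"
colSums(col_prop)  # each column adds to 1
```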

We can ask R to add the sum of the counts in each column by adding the option margins=T (T is short for TRUE):

tally(~substance|sex, data=help, margins=T)
##          sex
## substance female male
##   alcohol     36  141
##   cocaine     41  111
##   heroin      30   94
##   Total      107  346
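A base R sketch of the same Total row uses addmargins() with margin = 1, which adds one extra row holding the sum of each column. Naming the summary function (FUN = list(Total = sum)) labels that row Total, the way tally() does; the counts are again typed in from the first table.

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
# margin = 1 sums over the rows, producing a new row of column totals
with_total <- addmargins(counts, margin = 1, FUN = list(Total = sum), quiet = TRUE)
with_total
```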

If we want to display both the column sums and the row sums (that is, the marginal distributions of the two variables), then we have to change the way we enter the variables inside the tally command: the | gets replaced by &.

tally(~substance&sex, data=help, margins=T)
##          sex
## substance female male Total
##   alcohol     36  141   177
##   cocaine     41  111   152
##   heroin      30   94   124
##   Total      107  346   453
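With its default margin (every dimension), addmargins() adds both a Total row and a Total column, which sketches the same table in base R. The counts are typed in from the first table, and both is our own name.

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
# Default margin: sum over rows and over columns, so the grand total lands in the corner
both <- addmargins(counts, FUN = list(Total = sum), quiet = TRUE)
both
```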

We can also do this when we display proportions (or percents), but in this case the proportions in all six cells together (rather than in each column) add to 1 (or 100%). So format="proportion" with this command gives the joint distribution of the two variables, and the Total row and Total column give the marginal distributions of sex and substance.

tally(~substance&sex, data=help, format="proportion",margins=T)
##          sex
## substance     female       male      Total
##   alcohol 0.07947020 0.31125828 0.39072848
##   cocaine 0.09050773 0.24503311 0.33554084
##   heroin  0.06622517 0.20750552 0.27373068
##   Total   0.23620309 0.76379691 1.00000000
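In base R, prop.table() with no margin argument divides every cell by the grand total (453), which gives the joint distribution. Summing the joint proportions across rows or columns then gives the marginal distributions. This is a sketch with the counts typed in from the first table; joint is our own name.

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
joint <- prop.table(counts)  # every count divided by sum(counts)
joint
sum(joint)       # all six cells together add to 1
rowSums(joint)   # marginal distribution of substance
colSums(joint)   # marginal distribution of sex
```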

If we only want to look at a certain subset of the data (say, only the observations for which the variable sex has the value female), then we can add the optional argument subset inside tally:

tally(~substance, data=help, format="proportion",margins=T,subset=sex=="female")
## 
##   alcohol   cocaine    heroin     Total 
## 0.3364486 0.3831776 0.2803738 1.0000000
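These are the same numbers as the female column of the column-proportion table earlier: restricting to females and then dividing by their total is exactly how that column was computed. A base R sketch of this check, with the counts typed in from the first table (female is our own name):

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
female <- counts[, "female"]   # substance counts for the 107 female patients
female / sum(female)           # distribution of substance among females
prop.table(counts, margin = 2)[, "female"]  # the female column of the conditional table
```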

We can use a different criterion to define the subset we want to look at (say, the observations for which age is less than 35) and create the two-way table for only these observations:

tally(~substance&sex, data=help, margins=T, subset=age<35)
##          sex
## substance female male Total
##   alcohol     12   46    58
##   cocaine     22   63    85
##   heroin      16   58    74
##   Total       50  167   217

Let’s display the distribution of substance with bar graphs, using a separate panel for each sex:

bargraph(~substance|sex, data=help)

The following command puts both sexes on the same graph and groups the bars by the categories of the variable sex (each sex gets its own color).

bargraph(~substance, groups=sex, data=help)

For these bar graphs, it is very useful to provide a key to what each color means. We can do this by adding the option auto.key=T:

bargraph(~substance, groups=sex, data=help, auto.key=T)

As before, this command also works with subsets, so we can look at just the observations for which the value of age is less than 35:

bargraph(~substance, groups=sex, data=help, auto.key=T, subset=age<35)
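For comparison, a similar grouped bar graph can be drawn with base R's barplot(). This sketch uses the counts typed in from the first table, so it shows all patients rather than the age subset. Transposing with t() makes substance the groups along the horizontal axis and sex the bars within each group, and legend.text = TRUE plays roughly the role of auto.key=T.

```r
counts <- as.table(matrix(c(36, 41, 30, 141, 111, 94), nrow = 3,
                          dimnames = list(substance = c("alcohol", "cocaine", "heroin"),
                                          sex = c("female", "male"))))
# t(counts): rows = sex (bars within a group), columns = substance (the groups)
mids <- barplot(t(counts), beside = TRUE, legend.text = TRUE,
                xlab = "substance", ylab = "count")
```

barplot() returns the bar midpoints, which is why the result is saved in mids; they are handy if you later want to add text labels above the bars.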

Example: Binge Drinking and Gender

Load the following data from the textbook:

binge<-read.file("/home/emesekennedy/Data/Ch2/bingegender.txt")
## Reading data with read.table()

Notice that this data set contains only the counts for each combination of the two categorical variables Frequent and Gender (one row per combination, with the counts stored in the variable Count), so it is not yet in the format of a two-way table.

We can use the function dcast from the package reshape2 to format the data into a two-way table:

require(reshape2)
## Loading required package: reshape2
dcast(binge, Frequent~Gender)
## Using Count as value column: use value.var to override.
##   Frequent  Men Women
## 1       No 5550  8232
## 2      Yes 1630  1684
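Base R can build the same two-way table without reshape2, using xtabs() with Count on the left-hand side of the formula so that the counts are added up within each cell instead of the rows being counted. The data frame below is a hypothetical stand-in for bingegender.txt, assuming the file holds the columns Frequent, Gender and Count (the names used above) with the counts from the dcast output.

```r
# Hypothetical stand-in for the file: one row per combination, with its count
binge <- data.frame(Frequent = c("No", "No", "Yes", "Yes"),
                    Gender   = c("Men", "Women", "Men", "Women"),
                    Count    = c(5550, 8232, 1630, 1684))
# Count ~ ... : sum Count within each Frequent-by-Gender cell
tab <- xtabs(Count ~ Frequent + Gender, data = binge)
tab
```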

The “d” in dcast stands for data frame: the output is a data frame, so we can save it as a new data set:

binge2<-dcast(binge, Frequent~Gender)
## Using Count as value column: use value.var to override.

With a little more work, we can find the total number of observations:

sum(~Men, data=binge2)
## [1] 7180
sum(~Women, data=binge2)
## [1] 9916
7180+9916
## [1] 17096
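Rather than adding the two column totals by hand, the totals can also be computed directly from the Count column. A base R sketch, again using a hypothetical stand-in for the file with the counts shown above:

```r
# Hypothetical stand-in for bingegender.txt (columns Frequent, Gender, Count)
binge <- data.frame(Frequent = c("No", "No", "Yes", "Yes"),
                    Gender   = c("Men", "Women", "Men", "Women"),
                    Count    = c(5550, 8232, 1630, 1684))
tapply(binge$Count, binge$Gender, sum)  # total for each gender: Men 7180, Women 9916
sum(binge$Count)                        # grand total: 17096
```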

Now, if we want to look at the proportions instead of the counts, we can divide each entry in the two-way table by the total number of observations. However, we cannot do this easily with binge2, because its first column (Frequent) contains text rather than numbers. Instead, we will use the function acast to create the two-way table. The “a” in acast stands for array: the command outputs an array of numbers (here, a matrix with the categories of Frequent as row names), and we can do arithmetic on it.

acast(binge, Frequent~Gender)
## Using Count as value column: use value.var to override.
##      Men Women
## No  5550  8232
## Yes 1630  1684
acast(binge, Frequent~Gender)/17096
## Using Count as value column: use value.var to override.
##            Men      Women
## No  0.32463734 0.48151614
## Yes 0.09534394 0.09850257
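The division above types in 17096 by hand. A sketch that avoids retyping the total divides the table by its own sum, which is also what prop.table() does when no margin is given; the same idea works on the array returned by acast. The data frame is a hypothetical stand-in for the file, as before.

```r
# Hypothetical stand-in for bingegender.txt (columns Frequent, Gender, Count)
binge <- data.frame(Frequent = c("No", "No", "Yes", "Yes"),
                    Gender   = c("Men", "Women", "Men", "Women"),
                    Count    = c(5550, 8232, 1630, 1684))
tab <- xtabs(Count ~ Frequent + Gender, data = binge)
tab / sum(tab)   # joint proportions; equivalently prop.table(tab)
```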

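Let’s create a bar graph to represent the data graphically. Since binge already contains the counts, we use barchart (from the lattice package, which mosaic loads) with Count as the height of the bars: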

barchart(Count~Frequent, groups=Gender, data=binge, auto.key=T)

More R Commands

We have worked with stemplots before to look at the distribution of a quantitative variable. If we want to compare the distributions of a quantitative variable in two groups, then we can create a back-to-back stemplot. We’ll do this with the data sets vitdboys and vitdgirls, each of which has one variable called VitaminD. First, let’s load each data set and create a stemplot of VitaminD for each.

vitB<-read.file("/home/emesekennedy/Data/Ch1/vitdboys.txt")
## Reading data with read.table()
stem(vitB$VitaminD)
## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   0 | 8
##   1 | 28
##   2 | 134447788899
##   3 | 11237
vitG<-read.file("/home/emesekennedy/Data/Ch1/vitdgirls.txt")
## Reading data with read.table()
stem(vitG$VitaminD)
## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   1 | 6
##   2 | 3568
##   3 | 3455678
##   4 | 0122338
##   5 | 1

A back-to-back stemplot allows us to look at the two stemplots at the same time. We can create a back-to-back stemplot using the stem.leaf.backback command from the package aplpack.

require(aplpack)
## Loading required package: aplpack
## Loading required package: tcltk
stem.leaf.backback(vitB$VitaminD,vitG$VitaminD,style="bare", depths=F)
## ___________________________
##   1 | 2: represents 12, leaf unit: 1 
## vitB$VitaminD
##                 vitG$VitaminD
## ___________________________
##            | 0 |           
##           8| 0 |           
##           2| 1 |           
##           8| 1 |6          
##       44431| 2 |3          
##     9988877| 2 |568        
##        3211| 3 |34         
##           7| 3 |55678      
##            | 4 |012233     
##            | 4 |8          
##            | 5 |1          
##            | 5 |           
##            | 6 |           
## ___________________________
## n:       20     20     
## ___________________________
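To see how a value lands on a particular line of this display, here is a small sketch for whole-number values with leaf unit 1 (the helper stem_leaf is our own, hypothetical name). The stem is the tens digit and the leaf is the units digit, and each stem is split over two lines: one for leaves 0 to 4 and one for leaves 5 to 9. For example, 12 goes on the first line for stem 1, while 8 goes on the second line for stem 0.

```r
# Hypothetical helper: locate whole-number values in a split-stem display (leaf unit 1)
stem_leaf <- function(x) {
  data.frame(value = x,
             stem  = x %/% 10,                            # tens digit
             leaf  = x %% 10,                             # units digit
             line  = ifelse(x %% 10 <= 4, "0-4", "5-9"))  # which half of the split stem
}
stem_leaf(c(8, 12, 37))
```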

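If we leave out the option depths=F, the display also shows the depths on each side. The depth of a line is the number of observations on that line plus all the lines between it and the nearer end of the distribution; the line containing the median shows its own number of leaves in parentheses instead: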

stem.leaf.backback(vitB$VitaminD,vitG$VitaminD,style="bare")
## _________________________________
##   1 | 2: represents 12, leaf unit: 1 
##     vitB$VitaminD     vitG$VitaminD
## _________________________________
##               | 0 |              
##    1         8| 0 |              
##    2         2| 1 |              
##    3         8| 1 |6         1   
##    8     44431| 2 |3         2   
##   (7)  9988877| 2 |568       5   
##    5      3211| 3 |34        7   
##    1         7| 3 |55678    (5)  
##               | 4 |012233    8   
##               | 4 |8         2   
##               | 5 |1         1   
##               | 5 |              
##               | 6 |              
## _________________________________
## n:          20     20        
## _________________________________
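The depth columns can be reproduced from the number of leaves on each line. In this sketch the leaf counts are read off the display above (top to bottom, including the empty lines), and stem_depths is our own, hypothetical helper. It is simplified: it assumes the median falls inside a single line, as it does for both samples here.

```r
# Leaves per line for the boys' side, read off the display (13 lines, stems 0 to 6)
leaf_counts <- c(0, 1, 1, 1, 5, 7, 4, 1, 0, 0, 0, 0, 0)

stem_depths <- function(counts) {
  n <- sum(counts)
  from_top <- cumsum(counts)                # observations from the top down to each line
  from_bottom <- rev(cumsum(rev(counts)))   # observations from the bottom up to each line
  depth <- as.character(pmin(from_top, from_bottom))
  # The line holding the median (position (n + 1) / 2) shows its own count in parentheses
  med_line <- which(from_top >= (n + 1) / 2)[1]
  depth[med_line] <- paste0("(", counts[med_line], ")")
  depth[counts == 0] <- ""                  # empty lines get no depth
  depth
}
stem_depths(leaf_counts)
```

Applying the same function to the girls' leaf counts, c(0, 0, 0, 1, 1, 3, 2, 5, 6, 1, 1, 0, 0), reproduces the right-hand depth column.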