Sameer Mathur
Here we use the Arthritis dataset from the vcd package.
The data come from a double-blind clinical trial of new treatments for rheumatoid arthritis.
library(vcd)
head(Arthritis) # top few observations from the data
  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked
str(Arthritis) # structure of the data frame
'data.frame': 84 obs. of 5 variables:
 $ ID       : int  57 46 77 17 36 23 75 39 33 55 ...
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 2 2 2 2 2 2 2 2 2 2 ...
 $ Sex      : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age      : int  27 29 30 32 46 58 59 59 63 63 ...
 $ Improved : Ord.factor w/ 3 levels "None"<"Some"<..: 2 1 1 3 3 3 1 3 1 1 ...
Treatment (Placebo, Treated), Sex (Female, Male), and Improved (None, Some, Marked) are all categorical factors; Improved is an ordered factor (None < Some < Marked).
Simple frequency counts using the table() function.
mytable <- with(Arthritis, table(Improved))
mytable # frequencies
Improved
  None   Some Marked
    42     14     28
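The counts come out in the order None, Some, Marked because Improved is an ordered factor. The minimal base-R sketch below runs without vcd: it rebuilds a hypothetical stand-in for Arthritis$Improved (the name `improved` is only for illustration) from the counts printed above, and shows that table() follows the factor's level order rather than alphabetical order.

```r
# Hypothetical stand-in for Arthritis$Improved, rebuilt from the
# counts printed above (42 None, 14 Some, 28 Marked)
improved <- factor(rep(c("None", "Some", "Marked"), times = c(42, 14, 28)),
                   levels = c("None", "Some", "Marked"), ordered = TRUE)

counts <- table(improved)   # one-way frequency table
print(counts)

# Levels follow the factor definition, not alphabetical order;
# without 'levels =', factor() would sort them as Marked, None, Some
print(names(counts))
print(sum(counts))          # total number of patients
```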
Convert frequencies into proportions and percentages with the prop.table() function.
prop.table(mytable) # proportions
Improved
     None      Some    Marked
0.5000000 0.1666667 0.3333333
prop.table(mytable)*100 # percentages
Improved
    None     Some   Marked
50.00000 16.66667 33.33333
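Without a margin argument, prop.table() simply divides every count by the grand total. A small sketch, assuming the one-way counts are copied from the table() output above so it runs without vcd, checks that equivalence and rounds the percentages to one decimal place for easier reading.

```r
# One-way counts copied from the table() output above
counts <- as.table(c(None = 42, Some = 14, Marked = 28))

props <- prop.table(counts)              # each count divided by the total
print(all.equal(props, counts / sum(counts)))

# Percentages rounded to one decimal place are easier to read
pct <- round(100 * props, 1)
print(pct)
```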
A two-way frequency table of Treatment by Improved can be built with xtabs().
mytable <- xtabs(~ Treatment + Improved, data = Arthritis)
mytable # frequencies
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
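The one-sided formula above counts rows. xtabs() also accepts a count variable on the left-hand side, which is handy when data are already aggregated. In this sketch `cell_counts` and `patients` are hypothetical data frames built from the counts printed above (not the real Arthritis rows); both routes give the same 2 x 3 table.

```r
# Hypothetical aggregated data: one row per Treatment x Improved cell,
# with counts copied from the xtabs() output above
cell_counts <- data.frame(
  Treatment = factor(rep(c("Placebo", "Treated"), times = 3)),
  Improved  = factor(rep(c("None", "Some", "Marked"), each = 2),
                     levels = c("None", "Some", "Marked"), ordered = TRUE),
  Freq      = c(29, 13, 7, 7, 7, 21)
)

# A count variable on the left-hand side makes xtabs() sum Freq per cell
tab_from_counts <- xtabs(Freq ~ Treatment + Improved, data = cell_counts)

# Expand to one row per patient, then use a one-sided formula as above
patients <- cell_counts[rep(seq_len(nrow(cell_counts)), cell_counts$Freq),
                        c("Treatment", "Improved")]
tab_from_rows <- xtabs(~ Treatment + Improved, data = patients)

print(tab_from_rows)
print(all(tab_from_counts == tab_from_rows))   # both give the same cells
```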
Generate row marginal frequencies and proportions using the margin.table() and prop.table() functions.
margin.table(mytable, 1) # row sums
Treatment
Placebo Treated
     43      41
prop.table(mytable, 1) # row proportions
         Improved
Treatment      None      Some    Marked
  Placebo 0.6744186 0.1627907 0.1627907
  Treated 0.3170732 0.1707317 0.5121951
The index 1 refers to the first variable in the xtabs() formula (Treatment), so totals and proportions are computed within each row.
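Row proportions are each cell divided by its row total. The sketch below, which rebuilds `mytable` from the counts printed above so it runs without vcd, reproduces prop.table(mytable, 1) with sweep() and confirms that every row sums to 1.

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

row_totals <- margin.table(mytable, 1)   # same values as rowSums(mytable)
row_props  <- prop.table(mytable, 1)     # each cell divided by its row total

# Dividing each row by its total with sweep() gives the same result
print(all.equal(row_props, sweep(mytable, 1, row_totals, "/")))
print(rowSums(row_props))                # every row sums to 1
```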
Column marginal frequencies and proportions use the same functions with index 2.
margin.table(mytable, 2) # column sums
Improved
  None   Some Marked
    42     14     28
prop.table(mytable, 2) # column proportions
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
The index 2 refers to the second variable in the xtabs() formula (Improved), so proportions are computed within each column and each column sums to 1.
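The column version works the same way: each cell is divided by its column total. Again `mytable` is rebuilt from the printed counts (an assumption made so the sketch runs without vcd).

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

col_totals <- margin.table(mytable, 2)   # same values as colSums(mytable)
col_props  <- prop.table(mytable, 2)     # each cell divided by its column total

# Dividing each column by its total with sweep() gives the same result
print(all.equal(col_props, sweep(mytable, 2, col_totals, "/")))
print(colSums(col_props))                # every column sums to 1
```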
Cell proportions are obtained by calling prop.table() without a margin index, so every cell is divided by the grand total.
prop.table(mytable) # cell proportions
         Improved
Treatment       None       Some     Marked
  Placebo 0.34523810 0.08333333 0.08333333
  Treated 0.15476190 0.08333333 0.25000000
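Cell proportions divide every count by the grand total (84), so the whole table sums to 1, and summing them over a margin recovers each group's share of patients. This sketch rebuilds `mytable` from the printed counts so it runs without vcd.

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

cell_props <- prop.table(mytable)        # no margin: divide by the grand total
print(all.equal(cell_props, mytable / sum(mytable)))
print(sum(cell_props))                   # all cells together sum to 1

# Summing cell proportions over a margin gives that margin's share of patients
print(margin.table(cell_props, 1))       # Placebo 43/84, Treated 41/84
```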
Add row and column sums using the addmargins() function.
addmargins(mytable) # add row and column sums to table
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
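With its default arguments, addmargins() appends a Sum column of row totals and a Sum row of column totals, with the grand total in the corner. The sketch below builds that bordered table by hand (`manual` is a hypothetical name) from counts copied from the output above and compares it with addmargins().

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

# Build the same bordered table by hand: a Sum column of row totals,
# then a Sum row of column totals plus the grand total
manual <- cbind(mytable, Sum = rowSums(mytable))
manual <- rbind(manual, Sum = c(colSums(mytable), sum(mytable)))

print(manual)
print(all(addmargins(mytable) == manual))
```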
Add sum margins for both variables to the table of cell proportions.
addmargins(prop.table(mytable))
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
  Sum     0.50000000 0.16666667 0.33333333 1.00000000
Add a Sum column to the row proportions (margin 2 sums over Improved); each row totals 1.
addmargins(prop.table(mytable, 1), 2) # column sum
         Improved
Treatment      None      Some    Marked       Sum
  Placebo 0.6744186 0.1627907 0.1627907 1.0000000
  Treated 0.3170732 0.1707317 0.5121951 1.0000000
Add a Sum row to the column proportions (margin 1 sums over Treatment); each column totals 1.
addmargins(prop.table(mytable, 2), 1) # row sum
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
  Sum     1.0000000 1.0000000 1.0000000
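The second argument of addmargins() names the dimension to sum over: margin 2 collapses Improved and appends a Sum column, while margin 1 collapses Treatment and appends a Sum row. This sketch checks both on a `mytable` rebuilt from the printed counts (so it runs without vcd); the names `with_sum_col` and `with_sum_row` are only illustrative.

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

# margin = 2 sums over Improved, appending a "Sum" column
with_sum_col <- addmargins(prop.table(mytable, 1), 2)
# margin = 1 sums over Treatment, appending a "Sum" row
with_sum_row <- addmargins(prop.table(mytable, 2), 1)

print(with_sum_col[, "Sum"])   # row proportions: each row totals 1
print(with_sum_row["Sum", ])   # column proportions: each column totals 1
```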
Create a two-way table using the CrossTable() function in the gmodels package.
library(gmodels)
CrossTable(Arthritis$Treatment, Arthritis$Improved)
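CrossTable() needs gmodels installed, and its output is not reproduced here. According to the gmodels documentation, with default settings each cell lists the count N, its chi-square contribution, and N divided by the row, column and table totals. As a hedged base-R sketch (not gmodels' own code), those quantities can be computed from the counts in the xtabs() output above. The chi-square contribution of a cell is (observed - expected)^2 / expected, where the expected count under independence is row total x column total / N.

```r
# Treatment x Improved counts copied from the xtabs() output above
mytable <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

n <- sum(mytable)
# Expected counts if Treatment and Improved were independent
expected <- outer(rowSums(mytable), colSums(mytable)) / n
# Per-cell chi-square contribution
chisq_contrib <- (mytable - expected)^2 / expected

print(round(chisq_contrib, 3))
print(round(prop.table(mytable, 1), 3))   # N / Row Total
print(round(prop.table(mytable, 2), 3))   # N / Col Total
print(round(prop.table(mytable), 3))      # N / Table Total

# The contributions add up to Pearson's chi-square statistic
print(sum(chisq_contrib))
print(chisq.test(mytable)$statistic)
```

The last two lines should agree, which is a quick way to check the per-cell contributions.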