library(vcd)
library(vcdExtra)
library(dplyr)
The data set DanishWelfare in vcd gives a 4-way, 3 × 4 × 3 × 5 table as a data frame in frequency form, containing the variable Freq and four factors: Alcohol, Income, Status, and Urban. Alcohol can be considered the response variable and the others possible predictors.
Read the data and find the total number of cases represented in this table.
data("DanishWelfare")
sum(DanishWelfare$Freq)
## [1] 5144
There are a total of 5144 cases represented in this table.
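Summing Freq counts individuals because each row of a frequency-form data frame stands for Freq identical cases. This is a minimal base-R sketch that assumes only the first five rows of DanishWelfare (Freq = 1, 4, 1, 8, 6, all with Alcohol "<1", Income "0-50", Status "Widow"). It expands those rows to case form, with one row per individual. The names freq_df and case_df are illustrative.

```r
# Illustrative subset: first five rows of DanishWelfare in frequency form
# (Freq values as listed by str()); only Urban varies within these rows
urban_lv <- c("Copenhagen", "SubCopenhagen", "LargeCity", "City", "Country")
freq_df <- data.frame(Urban = factor(urban_lv, levels = urban_lv),
                      Freq  = c(1, 4, 1, 8, 6))

# Case form: repeat row i Freq[i] times, giving one row per individual
case_df <- freq_df[rep(seq_len(nrow(freq_df)), times = freq_df$Freq), "Urban", drop = FALSE]

nrow(case_df)         # 20 individuals
sum(freq_df$Freq)     # the same total, 20
table(case_df$Urban)  # tabulating the case form recovers the Freq column
```

vcdExtra provides expand.dft() for the same conversion on a whole frequency-form data frame.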
Inspect the structure of the data, then convert Alcohol and Income to ordered factors.
str(DanishWelfare)
## 'data.frame': 180 obs. of 5 variables:
## $ Freq : num 1 4 1 8 6 14 8 41 100 175 ...
## $ Alcohol: Factor w/ 3 levels "<1","1-2",">2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Income : Factor w/ 4 levels "0-50","50-100",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
## $ Urban : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...
DanishWelfare$Alcohol <- as.ordered(DanishWelfare$Alcohol)
DanishWelfare$Income <- as.ordered(DanishWelfare$Income)
str(DanishWelfare)
## 'data.frame': 180 obs. of 5 variables:
## $ Freq : num 1 4 1 8 6 14 8 41 100 175 ...
## $ Alcohol: Ord.factor w/ 3 levels "<1"<"1-2"<">2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Income : Ord.factor w/ 4 levels "0-50"<"50-100"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
## $ Urban : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...
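Ordered factors support comparisons and ordinal methods that respect the level order. This small sketch uses a hypothetical vector of Alcohol values, not taken from the data. It shows that as.ordered() keeps the existing level order and that comparisons then work. On an unordered factor, the same comparison returns NA with a warning.

```r
# Hypothetical Alcohol values; levels listed in the same order as in DanishWelfare
alc <- factor(c(">2", "<1", "1-2", "<1"), levels = c("<1", "1-2", ">2"))
alc_ord <- as.ordered(alc)

levels(alc_ord)  # "<1" "1-2" ">2": as.ordered() keeps the level order
alc_ord > "<1"   # TRUE FALSE TRUE FALSE
sort(alc_ord)    # <1 <1 1-2 >2, sorted by level order rather than alphabetically
```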
Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names (hint: review xtabs()).
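# xtabs() returns the 3 x 4 x 3 x 5 array (class "xtabs", "table") with named dimnames
DanishWelfare.tab <- xtabs(Freq ~ Alcohol + Income + Status + Urban, data = DanishWelfare)
# ftable() is used only to print the 4-way table compactly
ftable(DanishWelfare.tab)
##                           Urban Copenhagen SubCopenhagen LargeCity City Country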
## Alcohol Income  Status
## <1      0-50    Widow                    1             4         1    8       6
##                 Married                 14             8        41  100     175
##                 Unmarried                6             1         2    6       9
##         50-100  Widow                    8             2         7   14       5
##                 Married                 42            51        62  234     255
##                 Unmarried                7             5         9   20      27
##         100-150 Widow                    2             3         1    5       2
##                 Married                 21            30        23   87      77
##                 Unmarried                3             2         1   12       4
##         >150    Widow                   42            29        17   95      46
##                 Married                 24            30        50  167     232
##                 Unmarried               33            24        15   64      68
## 1-2     0-50    Widow                    3             0         1    4       2
##                 Married                 15             7        15   25      48
##                 Unmarried                2             3         9    9       7
##         50-100  Widow                    1             1         3    8       4
##                 Married                 39            59        68  172     143
##                 Unmarried               12             3        11   20      23
##         100-150 Widow                    5             4         1    9       4
##                 Married                 32            68        43  128      86
##                 Unmarried                6            10         5   21      15
##         >150    Widow                   26            34        14   48      24
##                 Married                 43            76        70  198     136
##                 Unmarried               36            23        48   89      64
## >2      0-50    Widow                    2             0         2    1       0
##                 Married                  1             2         2    7       7
##                 Unmarried                3             0         1    5       1
##         50-100  Widow                    3             0         2    1       3
##                 Married                 14            21        14   38      35
##                 Unmarried                2             0         3   12      13
##         100-150 Widow                    2             1         1    1       0
##                 Married                 20            31        10   36      21
##                 Unmarried                0             2         3    9       7
##         >150    Widow                   21            13         5   20       8
##                 Married                 23            47        21   53      36
##                 Unmarried               38            20        13   39      26
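The distinction matters because xtabs() already returns the 4-way array the exercise asks for, with dimnames named after the factors. ftable() instead returns a flat "ftable" object meant for printing. This reduced sketch assumes only the 15 rows with Alcohol "<1" and Income "0-50", with counts copied from the table above. It shows that the xtabs() result has a proper dim() and can be indexed by level names. A 4-way xtabs() of the full data would have dim() 3 4 3 5. The names sub_df, tab and flat are illustrative.

```r
# Rows of DanishWelfare with Alcohol "<1" and Income "0-50": Urban varies fastest,
# then Status, matching the row order reported by str()
urban_lv  <- c("Copenhagen", "SubCopenhagen", "LargeCity", "City", "Country")
status_lv <- c("Widow", "Married", "Unmarried")
sub_df <- expand.grid(Urban  = factor(urban_lv,  levels = urban_lv),
                      Status = factor(status_lv, levels = status_lv))
sub_df$Freq <- c( 1, 4,  1,   8,   6,
                 14, 8, 41, 100, 175,
                  6, 1,  2,   6,   9)

tab  <- xtabs(Freq ~ Status + Urban, data = sub_df)  # 3 x 5 array, class c("xtabs", "table")
flat <- ftable(tab)                                  # flat display object, class "ftable"

dim(tab)                   # 3 5
names(dimnames(tab))       # "Status" "Urban"
tab["Married", "Country"]  # 175, indexed by level names
class(flat)                # "ftable"
```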
The variable Urban has 5 categories. Find the total frequencies in each of these.
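Besides a dplyr pipeline, base R can collapse frequency-form data over the other factors. This sketch is only an illustration on a reduced subset: the 15 rows with Alcohol "<1" and Income "0-50". The totals it prints are therefore not the answer for the full table. It shows that xtabs() with a one-variable formula and aggregate() both sum Freq within each Urban level. Applied to the whole data, xtabs(Freq ~ Urban, data = DanishWelfare) should give the same totals as the pipeline.

```r
# Illustrative subset: rows of DanishWelfare with Alcohol "<1" and Income "0-50" only
urban_lv  <- c("Copenhagen", "SubCopenhagen", "LargeCity", "City", "Country")
status_lv <- c("Widow", "Married", "Unmarried")
sub_df <- expand.grid(Urban  = factor(urban_lv,  levels = urban_lv),
                      Status = factor(status_lv, levels = status_lv))
sub_df$Freq <- c(1, 4, 1, 8, 6, 14, 8, 41, 100, 175, 6, 1, 2, 6, 9)

# xtabs(): a one-way table of frequencies summed within each Urban level
urban_tab <- xtabs(Freq ~ Urban, data = sub_df)

# aggregate(): the same totals as a data frame
urban_df <- aggregate(Freq ~ Urban, data = sub_df, FUN = sum)

urban_tab
urban_df
```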
DanishWelfare %>% group_by(Urban) %>% summarize(Freq = sum(Freq))
The data set UKSoccer in vcd gives the distribution of the number of goals scored by the home and away teams in matches among the 20 teams in the 1995/96 season of the Premier League of the UK Football Association.
data("UKSoccer")
ftable(UKSoccer)
##      Away  0  1  2  3  4
## Home
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0
Verify that the total number of games represented in this table is 380.
sum(UKSoccer)
## [1] 380
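The total of 380 follows from the league format. This assumes the usual double round-robin, in which each of the 20 teams plays every other team once at home and once away:

```r
n_teams <- 20

# Ordered (home, away) pairings: each team hosts each of the other n - 1 teams once
n_games <- n_teams * (n_teams - 1)
n_games                 # 380

# Equivalently, every unordered pair of teams meets twice
2 * choose(n_teams, 2)  # 380
```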
Express each of the marginal totals as proportions.
addmargins(UKSoccer)
##      Away
## Home    0   1   2   3   4 Sum
##   0    27  29  10   8   2  76
##   1    59  53  14  12   4 142
##   2    28  32  14  12   4  90
##   3    19  14   7   4   1  45
##   4     7   8  10   2   0  27
##   Sum 140 136  55  38  11 380
prop.table(margin.table(UKSoccer, 1))
## Home
##          0          1          2          3          4
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263
prop.table(margin.table(UKSoccer, 2))
## Away
##          0          1          2          3          4
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737
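prop.table(margin.table(x, k)) is each marginal total divided by the grand total. This sketch re-enters the UKSoccer counts printed above as a base-R table; the name goals is illustrative. It checks that rowSums() and colSums() divided by the grand total reproduce both sets of proportions.

```r
# UKSoccer counts as printed above: rows = Home goals, columns = Away goals
goals <- as.table(matrix(c(27, 29, 10,  8, 2,
                           59, 53, 14, 12, 4,
                           28, 32, 14, 12, 4,
                           19, 14,  7,  4, 1,
                            7,  8, 10,  2, 0),
                         nrow = 5, byrow = TRUE,
                         dimnames = list(Home = as.character(0:4),
                                         Away = as.character(0:4))))

home_prop <- prop.table(margin.table(goals, 1))  # Home marginal proportions
away_prop <- prop.table(margin.table(goals, 2))  # Away marginal proportions

# The same numbers by hand: marginal totals divided by the grand total
rowSums(goals) / sum(goals)
colSums(goals) / sum(goals)
```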
The one-way frequency table Saxony in vcd records the frequencies of families with 0, 1, 2, …, 12 male children among 6115 families with 12 children.
data("Saxony")
Saxony
## nMales
##    0    1    2    3    4    5    6    7    8    9   10   11   12
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
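This sketch is a quick consistency check on the stated total of 6115 families. It re-enters the Saxony counts printed above as a base-R one-way table; sax is an illustrative name. It then converts the table to frequency form with as.data.frame(), the same layout used by DanishWelfare and Geissler.

```r
# Saxony counts as printed above, indexed by number of male children
counts <- c(3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7)
sax <- as.table(array(counts, dim = 13,
                      dimnames = list(nMales = as.character(0:12))))

sum(sax)                      # 6115 families
sax_df <- as.data.frame(sax)  # frequency form: columns nMales and Freq
head(sax_df, 3)
```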
Another data set, Geissler, in the vcdExtra package, gives the complete tabulation of all combinations of boys and girls in families with a given total number of children (size). The task here is to create an equivalent table, Saxony12, from the Geissler data.
data("Geissler")
str(Geissler)
## 'data.frame': 90 obs. of 4 variables:
## $ boys : int 0 0 0 0 0 0 0 0 0 0 ...
## $ girls: num 1 2 3 4 5 6 7 8 9 10 ...
## $ size : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq : int 108719 42860 17395 7004 2839 1096 436 161 66 30 ...
Use subset() to create a data frame, sax12, containing the Geissler observations for families with size == 12.
sax12 <- subset(Geissler, size==12)
sax12
Select the columns for boys and Freq.
select(sax12, boys, Freq)
Use xtabs() with a formula, Freq ~ boys, to create the one-way table.
Saxony12 <- xtabs(Freq ~ boys, data = sax12)
Saxony12
## boys
##    0    1    2    3    4    5    6    7    8    9   10   11   12
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
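The same three steps can be written without dplyr. This sketch does not load vcdExtra. Instead it builds a small hypothetical stand-in for Geissler from values shown in this section: the boys = 0 rows for sizes 1 to 10 listed by str(), and the size-12 counts from the xtabs() output. It then filters with `[ ]` and tabulates with xtabs(). The names geiss_mini, sax12_mini and Saxony12_mini are illustrative.

```r
# Hypothetical stand-in for Geissler, built only from values shown above
geiss_mini <- rbind(
  # boys = 0 rows for sizes 1-10, as listed by str(Geissler)
  data.frame(boys = 0L, girls = 1:10, size = 1:10,
             Freq = c(108719, 42860, 17395, 7004, 2839, 1096, 436, 161, 66, 30)),
  # size-12 rows: boys + girls = 12, Freq from the xtabs() output above
  data.frame(boys = 0:12, girls = 12:0, size = 12,
             Freq = c(3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7))
)

# Base-R equivalents of subset() and select(): size-12 rows, boys and Freq columns
sax12_mini <- geiss_mini[geiss_mini$size == 12, c("boys", "Freq")]

# One-way table of family counts by number of boys
Saxony12_mini <- xtabs(Freq ~ boys, data = sax12_mini)
Saxony12_mini
sum(Saxony12_mini)  # 6115, matching the Saxony total
```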