library(vcd)
library(vcdExtra)
library(dplyr)

1 Exercise 2.4

The data set DanishWelfare in vcd gives a 4-way, 3 × 4 × 3 × 5 table as a data frame in frequency form, containing the variable Freq and four factors, Alcohol, Income, Status, and Urban. The variable Alcohol can be considered as the response variable, and the others as possible predictors.
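As a quick check on the stated dimensions: a table in frequency form has one row per combination of factor levels, so a 3 × 4 × 3 × 5 table should have 180 rows. The sketch below rebuilds only the grid of level combinations in base R. The level names are copied from the output shown later in this exercise; the object names levs, grid, and n_cells are illustrative, and no actual frequencies are used.

```r
# Level names copied from the DanishWelfare output shown in this exercise.
levs <- list(
  Urban   = c("Copenhagen", "SubCopenhagen", "LargeCity", "City", "Country"),
  Status  = c("Widow", "Married", "Unmarried"),
  Income  = c("0-50", "50-100", "100-150", ">150"),
  Alcohol = c("<1", "1-2", ">2")
)

# One row per cell of the 4-way table; the first factor varies fastest.
grid <- expand.grid(levs)
n_cells <- nrow(grid)
n_cells
```

The row count matches the number of observations that str(DanishWelfare) reports in Part B.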

1.1 Part A.

Read the data and find the total number of cases represented in this table.

data("DanishWelfare")
sum(DanishWelfare$Freq)
## [1] 5144

There are a total of 5144 cases represented in this table.

1.2 Part B.

Inspect the structure of the data and convert the variables Alcohol and Income to ordered factors.

str(DanishWelfare)
## 'data.frame':    180 obs. of  5 variables:
##  $ Freq   : num  1 4 1 8 6 14 8 41 100 175 ...
##  $ Alcohol: Factor w/ 3 levels "<1","1-2",">2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Income : Factor w/ 4 levels "0-50","50-100",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Urban  : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...
DanishWelfare$Alcohol <- as.ordered(DanishWelfare$Alcohol)
DanishWelfare$Income <- as.ordered(DanishWelfare$Income)
str(DanishWelfare)
## 'data.frame':    180 obs. of  5 variables:
##  $ Freq   : num  1 4 1 8 6 14 8 41 100 175 ...
##  $ Alcohol: Ord.factor w/ 3 levels "<1"<"1-2"<">2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Income : Ord.factor w/ 4 levels "0-50"<"50-100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Urban  : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...
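To illustrate what the conversion changes, here is a minimal base R sketch on a small made-up vector that uses the same Alcohol level names. The vector contents and the names alcohol and alcohol_ord are illustrative assumptions, not part of the data set.

```r
# A plain factor with the Alcohol level names in their natural order.
alcohol <- factor(c("<1", ">2", "1-2", "<1"), levels = c("<1", "1-2", ">2"))

# as.ordered() keeps the existing level order and marks the factor as ordered.
alcohol_ord <- as.ordered(alcohol)

is.ordered(alcohol)        # plain factor
is.ordered(alcohol_ord)    # ordered factor
alcohol_ord > "<1"         # relational comparisons are defined for ordered factors
max(alcohol_ord)           # highest category present
```

Because as.ordered() takes the order from the existing levels, it is worth confirming with levels() that they are already sorted sensibly, as they are for Alcohol and Income in the str() output.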

1.3 Part C.

Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names (hint: review xtabs()).

DanishWelfare.tab <- xtabs(Freq ~ Alcohol + Income + Status + Urban, data = DanishWelfare)
ftable(DanishWelfare.tab)
##                           Urban Copenhagen SubCopenhagen LargeCity City Country
## Alcohol Income  Status                                                         
## <1      0-50    Widow                    1             4         1    8       6
##                 Married                 14             8        41  100     175
##                 Unmarried                6             1         2    6       9
##         50-100  Widow                    8             2         7   14       5
##                 Married                 42            51        62  234     255
##                 Unmarried                7             5         9   20      27
##         100-150 Widow                    2             3         1    5       2
##                 Married                 21            30        23   87      77
##                 Unmarried                3             2         1   12       4
##         >150    Widow                   42            29        17   95      46
##                 Married                 24            30        50  167     232
##                 Unmarried               33            24        15   64      68
## 1-2     0-50    Widow                    3             0         1    4       2
##                 Married                 15             7        15   25      48
##                 Unmarried                2             3         9    9       7
##         50-100  Widow                    1             1         3    8       4
##                 Married                 39            59        68  172     143
##                 Unmarried               12             3        11   20      23
##         100-150 Widow                    5             4         1    9       4
##                 Married                 32            68        43  128      86
##                 Unmarried                6            10         5   21      15
##         >150    Widow                   26            34        14   48      24
##                 Married                 43            76        70  198     136
##                 Unmarried               36            23        48   89      64
## >2      0-50    Widow                    2             0         2    1       0
##                 Married                  1             2         2    7       7
##                 Unmarried                3             0         1    5       1
##         50-100  Widow                    3             0         2    1       3
##                 Married                 14            21        14   38      35
##                 Unmarried                2             0         3   12      13
##         100-150 Widow                    2             1         1    1       0
##                 Married                 20            31        10   36      21
##                 Unmarried                0             2         3    9       7
##         >150    Widow                   21            13         5   20       8
##                 Married                 23            47        21   53      36
##                 Unmarried               38            20        13   39      26
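The difference between the object returned by xtabs() and the one returned by ftable() matters here, because the exercise asks for a 4-way array. The sketch below shows the distinction on a made-up 2 × 2 × 2 frequency-form data frame. The names toy, A, B, C and the counts 1 to 8 are illustrative assumptions, not taken from DanishWelfare.

```r
# Made-up frequency-form data: one row per combination of three factors.
toy <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))
toy$Freq <- 1:8

toy_tab  <- xtabs(Freq ~ A + B + C, data = toy)  # 3-way array (class "xtabs", "table")
toy_flat <- ftable(toy_tab)                      # flattened display (class "ftable")

dim(toy_tab)    # one extent per factor
dim(toy_flat)   # rows are A x B combinations, columns are the levels of C
class(toy_tab)
class(toy_flat)
```

Keeping the xtabs() result and calling ftable() only for printing preserves the array structure for later use with functions such as margin.table().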

1.4 Part D.

The variable Urban has 5 categories. Find the total frequencies in each of these.

DanishWelfare %>% group_by(Urban) %>% summarize(Freq = sum(Freq))
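The same kind of marginal total can be computed in base R with xtabs() or aggregate(). The sketch below runs on a small slice of the data copied from the printed table in Part C (Alcohol "<1" and Income "0-50" only), so the totals it produces are for that slice and are not the answer to Part D. The object names slice, by_xtabs, and by_aggregate are illustrative.

```r
# Slice copied from the printed table: Alcohol "<1", Income "0-50".
urban  <- c("Copenhagen", "SubCopenhagen", "LargeCity", "City", "Country")
status <- c("Widow", "Married", "Unmarried")
slice <- expand.grid(Urban  = factor(urban, levels = urban),
                     Status = factor(status, levels = status))
slice$Freq <- c( 1, 4,  1,   8,   6,   # Widow
                14, 8, 41, 100, 175,   # Married
                 6, 1,  2,   6,   9)   # Unmarried

# Two base R ways to sum Freq within each Urban category.
by_xtabs     <- xtabs(Freq ~ Urban, data = slice)
by_aggregate <- aggregate(Freq ~ Urban, data = slice, FUN = sum)

by_xtabs
by_aggregate
```

On the full data frame the corresponding call would be xtabs(Freq ~ Urban, data = DanishWelfare), which should agree with the dplyr result.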

2 Exercise 2.5

The data set UKSoccer in vcd gives the distributions of the number of goals scored by the 20 teams in the 1995/96 season of the Premier League of the UK Football Association.

data("UKSoccer")
ftable(UKSoccer)
##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0

2.1 Part A.

Verify that the total number of games represented in this table is 380.

sum(UKSoccer)
## [1] 380

2.2 Part B.

Express each of the marginal totals as proportions.

addmargins(UKSoccer)
##      Away
## Home    0   1   2   3   4 Sum
##   0    27  29  10   8   2  76
##   1    59  53  14  12   4 142
##   2    28  32  14  12   4  90
##   3    19  14   7   4   1  45
##   4     7   8  10   2   0  27
##   Sum 140 136  55  38  11 380
prop.table(margin.table(UKSoccer,1))
## Home
##          0          1          2          3          4 
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263
prop.table(margin.table(UKSoccer,2))
## Away
##          0          1          2          3          4 
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737
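The marginal proportions can be cross-checked without vcd by rebuilding the table from the printed counts. This is a minimal sketch in base R; the names goals, soccer, home_prop, and away_prop are illustrative, and the counts are copied from the ftable(UKSoccer) output above.

```r
# Counts copied from the printed table (rows = Home goals, columns = Away goals).
goals <- as.character(0:4)
soccer <- matrix(c(27, 29, 10,  8, 2,
                   59, 53, 14, 12, 4,
                   28, 32, 14, 12, 4,
                   19, 14,  7,  4, 1,
                    7,  8, 10,  2, 0),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(Home = goals, Away = goals))

# Marginal totals divided by the grand total.
home_prop <- rowSums(soccer) / sum(soccer)
away_prop <- colSums(soccer) / sum(soccer)

sum(soccer)           # total number of games
round(home_prop, 3)
round(away_prop, 3)
sum(home_prop)        # each set of marginal proportions sums to 1
sum(away_prop)
```

Dividing the row and column sums by the grand total is the same computation that prop.table(margin.table(UKSoccer, 1)) and prop.table(margin.table(UKSoccer, 2)) perform.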

3 Exercise 2.6

The one-way frequency table Saxony in vcd records the frequencies of families with 0, 1, 2, ..., 12 male children, among 6115 families with 12 children.

data("Saxony")
Saxony
## nMales
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7

Another data set, Geissler, in the vcdExtra package, gives the complete tabulation of all combinations of boys and girls in families with a given total number of children (size). The task here is to create an equivalent table, Saxony12, from the Geissler data.

data("Geissler")
str(Geissler)
## 'data.frame':    90 obs. of  4 variables:
##  $ boys : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ girls: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ size : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Freq : int  108719 42860 17395 7004 2839 1096 436 161 66 30 ...

3.1 Part A.

Use subset() to create a data frame, sax12, containing the Geissler observations in families with size==12.

sax12 <- subset(Geissler, size==12)
sax12

3.2 Part B.

Select the columns for boys and Freq.

select(sax12, boys, Freq)

3.3 Part C.

Use xtabs() with a formula, Freq ~ boys, to create the one-way table.

Saxony12 <- xtabs(Freq ~ boys, data = sax12)
Saxony12
## boys
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
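The counts match the Saxony table printed earlier, but the dimension is named boys rather than nMales. The sketch below shows, on a data frame rebuilt from the printed counts, how the total can be checked and the dimension renamed so the two tables line up. The names sax12_demo and saxony12_demo are illustrative, and the data frame stands in for the real subset of Geissler.

```r
# Counts copied from the printed output for families with 12 children.
sax12_demo <- data.frame(
  boys = 0:12,
  Freq = c(3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7)
)

# One-way table of family counts by number of boys.
saxony12_demo <- xtabs(Freq ~ boys, data = sax12_demo)

sum(saxony12_demo)                 # number of families
names(dimnames(saxony12_demo))     # dimension is called "boys"

# Rename the dimension to match the name used by the Saxony table.
names(dimnames(saxony12_demo)) <- "nMales"
saxony12_demo
```

With vcd loaded, a comparison such as all.equal(as.vector(saxony12_demo), as.vector(Saxony)) would confirm that the counts agree cell by cell.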