The data frame Geissler in the vcdExtra package contains the complete data from Geissler’s (1889) tabulation of family sex composition in Saxony. Subset the data that gives the number of boys in families of size 11.

  1. Read these data into R.
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.5.3
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.5.3
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.5.3
data("Saxony", package = "vcd")
data("Geissler", package = "vcdExtra")
Saxony
## nMales
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
str(Geissler)
## 'data.frame':    90 obs. of  4 variables:
##  $ boys : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ girls: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ size : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Freq : int  108719 42860 17395 7004 2839 1096 436 161 66 30 ...
  1. Use goodfit () to fit the binomial model and plot the results. Is there an indication that the binomial does not fit these data?
good.fit <- goodfit(Saxony, type = "binomial")
## Warning in goodfit(Saxony, type = "binomial"): size was not given, taken as
## maximum count
summary(good.fit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16

There is a warning that the size is not given. Hence we will need to take the subset of Geissle

sax11 <- subset(Geissler, size==11)
sax11_boys <- subset(sax11, select=c("boys","Freq"))
sax11.boys.tab <- xtabs(Freq ~ boys, data=sax11)
good.fit <- goodfit(sax11.boys.tab, par=list(size=11), type="binomial")
good.fit
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed      fitted pearson residual
##      0        8    3.561571        2.3518439
##      1       72   41.947913        4.6400157
##      2      275  224.572436        3.3650364
##      3      837  721.362873        4.3054683
##      4     1540 1544.755904       -0.1210049
##      5     2161 2315.602347       -3.2128030
##      6     2310 2479.362698       -3.4013219
##      7     1801 1896.217320       -2.1866129
##      8     1077 1015.159294        1.9409187
##      9      492  362.317259        6.8129887
##     10       93   77.588097        1.7496804
##     11       24    7.552287        5.9850291
summary(good.fit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Likelihood Ratio 148.0892 10 9.212554e-27

Noticed that the binomial distribution does not fit perfectly. It is noticable on the ends of 0-4 and 9-11, the values are under fitted; while the values are overfitted in the middle range 4-8.

  1. Diagnose the form of the distribution
plot(sax11.boys.tab)

Ord_plot(sax11.boys.tab, legend = TRUE)

When ‘ord_plot’ function is used, it suggests that the distribution is close to being a binomial distribution. The probability is 0.412.

  1. Try fitting the negative binomial distribution, and use distplot () to diagnose whether the negative binomial is a reasonable fit.
distplot(sax11.boys.tab, type="binomial", size=11)

The binomial distribution is negative, which does not fit at all.