The data frame Geissler in the vcdExtra package contains the complete data from Geissler’s (1889) tabulation of family sex composition in Saxony. Subset the data that gives the number of boys in families of size 11.
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.5.3
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.5.3
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.5.3
data("Saxony", package = "vcd")
data("Geissler", package = "vcdExtra")
Saxony
## nMales
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 3 24 104 286 670 1033 1343 1112 829 478 181 45 7
str(Geissler)
## 'data.frame': 90 obs. of 4 variables:
## $ boys : int 0 0 0 0 0 0 0 0 0 0 ...
## $ girls: num 1 2 3 4 5 6 7 8 9 10 ...
## $ size : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq : int 108719 42860 17395 7004 2839 1096 436 161 66 30 ...
good.fit <- goodfit(Saxony, type = "binomial")
## Warning in goodfit(Saxony, type = "binomial"): size was not given, taken as
## maximum count
summary(good.fit)
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16
There is a warning that the size is not given. Hence we will need to take the subset of Geissle
sax11 <- subset(Geissler, size==11)
sax11_boys <- subset(sax11, select=c("boys","Freq"))
sax11.boys.tab <- xtabs(Freq ~ boys, data=sax11)
good.fit <- goodfit(sax11.boys.tab, par=list(size=11), type="binomial")
good.fit
##
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML'
##
## count observed fitted pearson residual
## 0 8 3.561571 2.3518439
## 1 72 41.947913 4.6400157
## 2 275 224.572436 3.3650364
## 3 837 721.362873 4.3054683
## 4 1540 1544.755904 -0.1210049
## 5 2161 2315.602347 -3.2128030
## 6 2310 2479.362698 -3.4013219
## 7 1801 1896.217320 -2.1866129
## 8 1077 1015.159294 1.9409187
## 9 492 362.317259 6.8129887
## 10 93 77.588097 1.7496804
## 11 24 7.552287 5.9850291
summary(good.fit)
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 148.0892 10 9.212554e-27
Noticed that the binomial distribution does not fit perfectly. It is noticable on the ends of 0-4 and 9-11, the values are under fitted; while the values are overfitted in the middle range 4-8.
plot(sax11.boys.tab)
Ord_plot(sax11.boys.tab, legend = TRUE)
When ‘ord_plot’ function is used, it suggests that the distribution is close to being a binomial distribution. The probability is 0.412.
distplot(sax11.boys.tab, type="binomial", size=11)
The binomial distribution is negative, which does not fit at all.