Exercise 3.1 The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births. (a) Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?

library(HistData)
library(ggplot2)
data('Arbuthnot')
str(Arbuthnot)
## 'data.frame':    82 obs. of  7 variables:
##  $ Year     : int  1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 ...
##  $ Males    : int  5218 4858 4422 4994 5158 5035 5106 4917 4703 5359 ...
##  $ Females  : int  4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 ...
##  $ Plague   : int  0 1317 274 8 0 1 0 10400 3082 363 ...
##  $ Mortality: int  8771 10554 8562 9535 8393 10400 10651 23359 11763 13624 ...
##  $ Ratio    : num  1.11 1.09 1.08 1.09 1.07 ...
##  $ Total    : num  9.9 9.31 8.52 9.58 10 ...
with(Arbuthnot, plot(Year,Ratio, type='h', ylab="Sex Ratio: Male vs. Female"))
abline(lm(Arbuthnot$Ratio ~ Arbuthnot$Year), lwd=5,col="blue")

with(Arbuthnot, plot(Year,Males, type='h', ylab="Males Over Year"))
abline(lm(Arbuthnot$Males ~ Arbuthnot$Year), lwd=5,col="red")

with(Arbuthnot, plot(Year,Females, type='h', ylab="Females Over Year"))
abline(lm(Arbuthnot$Females ~ Arbuthnot$Year), lwd=5,col="red")

Based on plots above, the ratio feature stands out. I prefer to display the Male over Female ratio to display the tendency of more male births over years.

  1. Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?
qplot(Year, Total, data = Arbuthnot, geom = c("point", "smooth"), ylab="Total", xlab= "Year")
## `geom_smooth()` using method = 'loess'

Based on plots above, the total number of christenings tend to fluctuate before 1680, and the total number tend to smooth after 1680.

Exercise 3.3 Use the data set WomenQueue to: (c) Make a reasonable plot showing departure from the binomial distribution.

data("WomenQueue", package="vcd")
library(vcd)
## Loading required package: grid
wm.fit <- goodfit(WomenQueue, type="binomial")
## Warning in goodfit(WomenQueue, type = "binomial"): size was not given,
## taken as maximum count
plot(wm.fit,type="standing", xlab="Number of Women")

  1. Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).
wm_fit <- goodfit(WomenQueue, type="binomial", par = list(size = 10))
summary(wm_fit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df  P(> X^2)
## Likelihood Ratio 8.650999  8 0.3725869

the p value is 0.37, which is higher than 0.05 and not significant

Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, 𝑖𝑛(𝑛 = 12, 𝑝 = 1) , specifying equal probability for boys and girls. [Hint: 2 you need to specify both size and prob values for goodfit ().]

  1. Carry out the GOF test for this fixed binomial distribution. What is the ratio of χ2/df? What do you conclude?
Sax.fit = goodfit(Saxony, type = "binomial", par = list(size = 12))
unlist(Sax.fit$par)
##      prob      size 
##  0.519215 12.000000
Sax.fit
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664
  1. Test the additional lack of fit for the model 𝐵𝑖𝑛(𝑛 = 12, 𝑝 = 1) compared to the 2 model 𝐵𝑖𝑛(𝑛 = 12, 𝑝 = 𝑝̂) where 𝑝̂ is estimated from the data.
sax.fit2 = goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
sax.fit2
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617
sax.fit3 = goodfit(Saxony, type = "binomial", par = list(prob = 1, size = 12))
sax.fit3
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed fitted pearson residual
##      0        3      0              Inf
##      1       24      0              Inf
##      2      104      0              Inf
##      3      286      0              Inf
##      4      670      0              Inf
##      5     1033      0              Inf
##      6     1343      0              Inf
##      7     1112      0              Inf
##      8      829      0              Inf
##      9      478      0              Inf
##     10      181      0              Inf
##     11       45      0              Inf
##     12        7   6115        -78.10895
  1. Use the plot.goodfit () method to visualize these two models.
plot(goodfit(Saxony, type="binomial", par=list(prob = 0.5, size = 12)))

plot(goodfit(Saxony, type="binomial", par=list(prob = 1, size = 12)))