Exercise 3.1 The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births. (a) Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births? From plot below, we can see that there is a big difference between male and female birth ratios.

library(HistData)
min_ratio <- min(Arbuthnot$Ratio)-0.02
max_ratio <- max(Arbuthnot$Ratio)+0.02
data("Arbuthnot")
with(Arbuthnot, plot(Year,Ratio, type='h', ylim=c(min_ratio,max_ratio), ylab="Sex Ratio: Male vs. Female"))
abline(h=1, col="yellow", lwd=5, lty=3)
abline(lm(Arbuthnot$Ratio ~ Arbuthnot$Year), lwd=5,col="red")

(b) Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see? From plot below, we can see that between 1640 and 1660, only a little chirstenings and then they grew greatly till 1680.

min_total <- min(Arbuthnot$Total)-0.05
max_total <- max(Arbuthnot$Total)+0.05
with(Arbuthnot, plot(Year,Total,type='h',ylim=c(min_total,max_total),ylab="Total Number of Christenings (in 000s")) 
abline(h=5.5, col="blue", lwd=5, lty=3)
abline(h=9, col="blue", lwd=5, lty=3)
abline(h=11.5, col="blue", lwd=5, lty=3)
abline(lm(Arbuthnot$Total ~ Arbuthnot$Year), lwd=5,col="yellow")

Exercise 3.3 Use the data set WomenQueue to: (c) Make a reasonable plot showing departure from the binomial distribution.

library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
library(grid)
goodFit= goodfit(WomenQueue, type = "binomial", par = list(size = 10))
plot(goodFit,type = "hanging", shade = TRUE)

(d) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

goodFitP= goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(goodFitP,type = "hanging", shade = TRUE)
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

plot(goodFitP)

distplot(WomenQueue, type = c("binomial"), conf_level = 0.95)
## Warning in distplot(WomenQueue, type = c("binomial"), conf_level = 0.95):
## size was not given, taken as maximum count

Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, 𝑖𝑛(𝑛 = 12, 𝑝 = 1) , specifying equal probability for boys and girls. [Hint: 2 y need to specify both size and prob values for goodfit ().] (a) Carry out the GOF test for this fixed binomial distribution. What is the ratio of χ2/df? What do you conclude?

gfS1=goodfit(Saxony, type = "binomial", par = list(size = 12))
gfS1
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664
  1. Test the additional lack of fit for the model 𝐵𝑖𝑛(𝑛 = 12, 𝑝 = 1) compared to the 2 model 𝐵𝑖𝑛(𝑛 = 12, 𝑝 = 𝑝̂) where 𝑝̂ is estimated from the data.
gfS=goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
gfS
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617
  1. Use the plot.goodfit () method to visualize these two models.
plot(gfS1,main = "Families in Saxony" ,xlab= "Number of Males")

plot(gfS)