library(vcd)
## Warning: package 'vcd' was built under R version 3.3.3
## Loading required package: grid
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.3.3
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.3.3
library (HistData)
## Warning: package 'HistData' was built under R version 3.3.3
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
library(lattice)
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:gnm':
## 
##     barley

Exercise 3.1

str(Arbuthnot)
## 'data.frame':    82 obs. of  7 variables:
##  $ Year     : int  1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 ...
##  $ Males    : int  5218 4858 4422 4994 5158 5035 5106 4917 4703 5359 ...
##  $ Females  : int  4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 ...
##  $ Plague   : int  0 1317 274 8 0 1 0 10400 3082 363 ...
##  $ Mortality: int  8771 10554 8562 9535 8393 10400 10651 23359 11763 13624 ...
##  $ Ratio    : num  1.11 1.09 1.08 1.09 1.07 ...
##  $ Total    : num  9.9 9.31 8.52 9.58 10 ...

(a) Make a plot of Ratio over Year

data("Arbuthnot", package="HistData")

with(Arbuthnot,   Ratio = Males/Females,
plot(x=Year, y=Ratio, type = "b", ylim=c(0.95, 1.20), ylab="Ratio"), 
abline(h=1.0, col = "red", lwd=2),
abline(h=mean(Ratio),col="green"),
text(x=1620, y=1.0, expression(H[0]: "Ratio=1.0")),
lines(loess.smooth(Year, Ratio),col="green", lwd=2) )

Which features stand out? In the plot above, we can see that each ratio is greater than one; hence, in each ratio male is greater than female, and male/female > 1.

The plot above clearly shows that each data point represents the ratio, Male/Female, and each data point has a y-value that is greater than one. This display shows the tendency for mor male births.

(b) Plot the total number of christenings, Males + Females or Total (in thousands) over time. What unusual features do you see?

with(Arbuthnot,
     plot(x=Year, y=Total, ylim=c(0.0, 25), ylab="Total Number of Christenings(000s)"), abline(h=mean(Total),col="green"))

Based on the plot above, during the period from 1640 to 1660 there was a drop in the number of christenings. There seems to have been a gradual, but steady, increase of christenings from 1660 to circa 1710. Moreover, there was an outlier around 1710.

Also, it seems the number of christenings, over the years, are found in the 5 to 15 number band.

str(WomenQueue)
##  table [1:11(1d)] 1 3 4 23 25 19 18 5 1 1 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ nWomen: chr [1:11] "0" "1" "2" "3" ...

Exercise 3.3

(c) Use the dataset, WomenQueue, to make a reasonable plot showing departure from the binomial distribution.

plot(WomenQueue)

(d) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

Let us calculate the binomial probabilities below to give us some insight:

k<-0:10
bprob<-dbinom(k, 10, 1/2)
b<-barplot(bprob, names.arg = k, xlab="number of Women in queues", 
           ylab="Frequency")
lines(x=b, y=bprob, col="blue")

We will keep in mind that when p is small (below 0.5), the binomial distribution will be nearly symmetrical. Otherwise, it will skew to the left or the right. Since p=0.5 we get a nearly symmetrical shape (above).

However, we do notice that the number of women in queues of length 10 is basically zero. Queues of lenght zero and of length 10 are on the fringes of the distribution and will not be considered. I am assuming this is a departure from a binomial distribution.

Exercise 3.4

Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, n(n=12, p=1/2), specifying equal probability for boys and girls. [Hint: you need to specify both size and probalility values for goodfit().]

The follwing sex ratio data should follow a binomial distribution:

str(Saxony)
##  table [1:13(1d)] 3 24 104 286 670 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ nMales: chr [1:13] "0" "1" "2" "3" ...
plot(Saxony)

It seems families with either no males or 12 males indicated a departure from the binomial distribution.

(a) Carry out the GOF test for this fixed binomial distribution. What is the ratio of (Chi-Square)/(Degrees of Freedom)? What do you conclude?

k<-0:12
SX<-dbinom(k, 12, 1/2)
bp<-barplot(Saxony, names.arg = k, xlab = "Number of Males", 
            ylab = "Number of Families", 
            main = "Number of Males in Saxony families of size 12 (Model 1)")

Let’s carry out the GOF test for the binomial distribution, n(n = 12, p = p-cap). Then find the ratio, X^2/df:

Sax.fit <-goodfit(Saxony, type="binomial")
## Warning in goodfit(Saxony, type = "binomial"): size was not given, taken as
## maximum count
unlist(Sax.fit$par)    # estimated parameters
##      prob      size 
##  0.519215 12.000000
names(Sax.fit)
## [1] "observed" "count"    "fitted"   "type"     "method"   "df"      
## [7] "par"
Sax.fit
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664

Note that the observed and the fitted values are close.

Let’s find the summary and the calculation of the likelihood ratio Chi-square GOF test when the ML estimation method is used:

summary(Sax.fit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16

The ratio of (Chi-square)/(degrees of Freedom) is the following:

97.0065/11
## [1] 8.818773

Note: When calculating the likelihood ratio Chi-squared GOF test (when the ML estimation method is used), note the following:

(1) A very good fit occurs when the p-value for the test Chi-square statistic is so high, as to suggest that something unreasonable under random sampling might have occured.(Footnote, M. Friendly), and

(2) A handy rule of thumb is to think of the ratio of X^2/df, because, under the null hypothesis of acceptable fit, epsilon(X^2/df) = 1, so ratios exceeding 2.5 are troubling. In our case, the ratio, 97.0065/11 = 8.81877, so the lack of fit is substantial.

(b) Test the lack of fit for the model,Bin(n=12, p =1/2) (Model 1)

Now, compare Model 2 with the additional lack of fit for the model, Bin(n=12, p=1/2): (Model 1).

Please note that I was not able to perform this part. Sorry. However, I do believe the ratio will be about the same: 8.8.

From 3.4 (a) above, where we used “SX <- dbinom(k,12, 1/2)”, we saw the graph of Model 1 to be very close to being symmetrical. Also, we saw that p-cap = 0.51922. The estimated p-cap value is very close to p = 1/2.

(c) Use the plot(goodfit()) method to visualize the two models.

plot(Sax.fit, main ="Binomial--Model 2 with goodfit Method", shade = TRUE, type ="standing")

Note that the graph of Model 2 is very close to the graph of Model 1. They are both symmetrical.