545 HW2

Exercise 3.1

See the Arbuthnot data in HistData library. The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the christening records for children born in London for every year from 1629 to 1710. It also contains the variable Ratio, giving the ratio of male to female births.

Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births? Red line represents a ratio of 1 where the number of males christening equals that of female. Since all of the dots are above the line, there are more newborn boys than girls.

library("HistData", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")
plot(Arbuthnot$Ratio ~ Arbuthnot$Year, ylim=c(0.95,1.15))
abline(h=1, col="red")

Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?

Between the 1640s and 1660s, the number of christenings dramatically decreased.

plot(Arbuthnot$Males + Arbuthnot$Females ~ Arbuthnot$Year)

Exercise 3.3

Use the data set WomenQueue to:

Make a reasonable plot showing departure from the binomial distribution.

library(vcd)

## Loading required package: grid

data(WomenQueue, package ="vcd")
WQ.fit <- goodfit(WomenQueue, type="binomial")

## Warning in goodfit(WomenQueue, type = "binomial"): size was not given,
## taken as maximum count

plot(WQ.fit, type="deviation", xlab="Counts")

Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

Perhaps there are less females than males and therefore the chance of a queue with 10 women is lower.

Exercise 3.4

Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, 𝑛(𝑛 = 12, 𝑝 = 1) , specifying equal probability for boys and girls. [Hint: you need to specify both size and prob values for goodfit ().]

Carry out the GOF test for this fixed binomial distribution. What is the ratio of χ2/df? What do you conclude?

The model is a pretty good fit. There’s only slight deviations.

gf1=goodfit(Saxony, type = "binomial", par = list(size = 12))
gf1

## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664

Test the additional lack of fit for the model Bin(n=12, p=1/2) compared to the model Bin(n=12, p=p hat) where p hat is estimated from the data

gf2=goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
gf2

## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617

Use the plot(goodfit ()) method to visualize these two models.

plot(gf1,xlab= "Number of Males")

plot(gf2,xlab= "Number of Males")