Exercise 3.1 See the Arbuthnot data in HistData library. The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the christening records for children born in London for every year from 1629 to 1710. It also contains the variable Ratio, giving the ratio of male to female births.

library("HistData")
## Warning: package 'HistData' was built under R version 3.4.4
library("plotly")
## Warning: package 'plotly' was built under R version 3.4.4
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.4
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
Arbuthnot
##    Year Males Females Plague Mortality    Ratio  Total
## 1  1629  5218    4683      0      8771 1.114243  9.901
## 2  1630  4858    4457   1317     10554 1.089971  9.315
## 3  1631  4422    4102    274      8562 1.078011  8.524
## 4  1632  4994    4590      8      9535 1.088017  9.584
## 5  1633  5158    4839      0      8393 1.065923  9.997
## 6  1634  5035    4820      1     10400 1.044606  9.855
## 7  1635  5106    4928      0     10651 1.036120 10.034
## 8  1636  4917    4605  10400     23359 1.067752  9.522
## 9  1637  4703    4457   3082     11763 1.055194  9.160
## 10 1638  5359    4952    363     13624 1.082189 10.311
## 11 1639  5366    4784    314      9862 1.121656 10.150
## 12 1640  5518    5332   1450     12771 1.034884 10.850
## 13 1641  5470    5200   1375     13142 1.051923 10.670
## 14 1642  5460    4910   1274     13273 1.112016 10.370
## 15 1643  4793    4617    996     13212 1.038120  9.410
## 16 1644  4107    3997   1492     10933 1.027521  8.104
## 17 1645  4047    3919   1871     11479 1.032661  7.966
## 18 1646  3768    3395   2365     12780 1.109867  7.163
## 19 1647  3796    3536   3597     14059 1.073529  7.332
## 20 1648  3363    3181    611      9894 1.057215  6.544
## 21 1649  3079    2746     67     10566 1.121267  5.825
## 22 1650  2890    2722     15      8754 1.061719  5.612
## 23 1651  3231    2840     23     10827 1.137676  6.071
## 24 1652  3220    2908     16     12569 1.107290  6.128
## 25 1653  3196    2959      6     10087 1.080095  6.155
## 26 1654  3441    3179     16     13247 1.082416  6.620
## 27 1655  3655    3349      9     11357 1.091371  7.004
## 28 1656  3668    3382      6     13921 1.084565  7.050
## 29 1657  3396    3289      4     12434 1.032533  6.685
## 30 1658  3157    3013     14     14993 1.047793  6.170
## 31 1659  3209    2781     36     14756 1.153901  5.990
## 32 1660  3724    3247     13     12681 1.146905  6.971
## 33 1661  4748    4107     20     16665 1.156075  8.855
## 34 1662  5216    4803     12     13664 1.085988 10.019
## 35 1663  5411    4881      9     12741 1.108584 10.292
## 36 1664  6041    5681      5     15453 1.063369 11.722
## 37 1665  5114    4858  68596     97306 1.052697  9.972
## 38 1666  4678    4319   1998     12738 1.083121  8.997
## 39 1667  5616    5322     35     15842 1.055242 10.938
## 40 1668  6073    5560     14     17278 1.092266 11.633
## 41 1669  6506    5829      3     19432 1.116143 12.335
## 42 1670  6278    5719      0     20198 1.097744 11.997
## 43 1671  6449    6061      5     15729 1.064016 12.510
## 44 1672  6443    6120      5     18230 1.052778 12.563
## 45 1673  6073    5822      5     17504 1.043112 11.895
## 46 1674  6113    5738      3     21201 1.065354 11.851
## 47 1675  6058    5717      1     17244 1.059647 11.775
## 48 1676  6552    5847      2     18732 1.120575 12.399
## 49 1677  6423    6203      2     19076 1.035467 12.626
## 50 1678  6568    6033      5     20678 1.088679 12.601
## 51 1679  6247    6041      2     21730 1.034100 12.288
## 52 1680  6548    6299      0     21053 1.039530 12.847
## 53 1681  6822    6533      0     23951 1.044237 13.355
## 54 1682  6909    6744      0     20691 1.024466 13.653
## 55 1683  7577    7158      0     20587 1.058536 14.735
## 56 1684  7575    7127      0     23202 1.062860 14.702
## 57 1685  7484    7246      0     23222 1.032846 14.730
## 58 1686  7575    7119      0     22609 1.064054 14.694
## 59 1687  7737    7214      0     21460 1.072498 14.951
## 60 1688  7487    7101      0     22921 1.054359 14.588
## 61 1689  7604    7167      0     23502 1.060974 14.771
## 62 1690  7909    7302      0     21461 1.083128 15.211
## 63 1691  7662    7392      0     22691 1.036526 15.054
## 64 1692  7602    7316      0     20874 1.039092 14.918
## 65 1693  7676    7483      0     20959 1.025792 15.159
## 66 1694  6985    6647      0     24109 1.050850 13.632
## 67 1695  7263    6713      0     19047 1.081931 13.976
## 68 1696  7632    7229      0     18638 1.055748 14.861
## 69 1697  8062    7767      0     20292 1.037981 15.829
## 70 1698  8426    7626      0     20183 1.104904 16.052
## 71 1699  7911    7452      0     20795 1.061594 15.363
## 72 1700  7578    7061      0     19443 1.073219 14.639
## 73 1701  8102    7514      0     20471 1.078254 15.616
## 74 1702  8031    7656      0     19481 1.048981 15.687
## 75 1703  7765    7683      0     20720 1.010673 15.448
## 76 1704  6113    5738      0     22684 1.065354 11.851
## 77 1705  8366    7779      0     22097 1.075460 16.145
## 78 1706  7952    7417      0     19847 1.072132 15.369
## 79 1707  8379    7687      0     21600 1.090022 16.066
## 80 1708  8239    7623      0     21291 1.080808 15.862
## 81 1709  7840    7380      0     21800 1.062331 15.220
## 82 1710  7640    7288      0     24620 1.048299 14.928
summary(Arbuthnot)
##       Year          Males         Females         Plague        
##  Min.   :1629   Min.   :2890   Min.   :2722   Min.   :    0.00  
##  1st Qu.:1649   1st Qu.:4759   1st Qu.:4457   1st Qu.:    0.00  
##  Median :1670   Median :6073   Median :5718   Median :    3.00  
##  Mean   :1670   Mean   :5907   Mean   :5535   Mean   : 1240.70  
##  3rd Qu.:1690   3rd Qu.:7576   3rd Qu.:7150   3rd Qu.:   22.25  
##  Max.   :1710   Max.   :8426   Max.   :7779   Max.   :68596.00  
##    Mortality         Ratio           Total       
##  Min.   : 8393   Min.   :1.011   Min.   : 5.612  
##  1st Qu.:12739   1st Qu.:1.048   1st Qu.: 9.199  
##  Median :17867   Median :1.065   Median :11.813  
##  Mean   :17816   Mean   :1.071   Mean   :11.442  
##  3rd Qu.:21030   3rd Qu.:1.088   3rd Qu.:14.723  
##  Max.   :97306   Max.   :1.156   Max.   :16.145
  1. Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?
maxRatio<-max(Arbuthnot$Ratio)
minRatio<-min(Arbuthnot$Ratio)
plot_ly(Arbuthnot, x= ~Year, y= ~Ratio, name="Male vs Female Birth Ratio", type='scatter', mode='lines',line = list(color = 'black', width = 1) ) %>%
  add_trace(y = ~maxRatio, name = 'Maximum', mode = 'lines', line = list(color = 'red', width = 1)) %>%
  add_trace(y = ~minRatio, name="Minimum", mode ='lines', line = list(color= 'green', width=1))   %>%
  add_trace(y=1, name="Reference Line", mode='lines', line = list(color='blue', width=1))
## Warning: package 'bindrcpp' was built under R version 3.4.4

The above time series plot took the male-female ratios’ during the period 1629 to 1710. With reference blue line at 1, the ratio is higher than 1 and fluctuated during the entire sample period.The maximum ratio registed is 1.15 and minimum value is 1.01.This type of plot helps to see the change in the ratio over the time series.

  1. Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?
maxT <- max(Arbuthnot$Total) 
minT <- min(Arbuthnot$Total)
plot_ly(Arbuthnot, x = ~Year, y = ~Total, name = 'Total Number of Christenings(Unit:1,000)', type = 'scatter', mode = 'line', line = list(color = 'black', width = 1)) %>%
  add_trace(y = ~maxT, name = 'Max Christening', line = list(color = 'red', width = 1)) %>%
  add_trace(y = ~minT, name = 'Min Christening', line = list(color = 'blue', width = 1))
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...

The above graph showed how the #christenings varied during the time period. From the time series plot, there is sharp decline from 1640 to 1650. Similarly, the volume of christening hightened from 1650 to 1697. The christening number fell down to 11851 in 1704. The reasons for such fluctuations could be internal and external political/social conditions during the period accordingly.

Exercise 3.3 Use the data set WomenQueue to:

  1. Make a reasonable plot showing departure from the binomial distribution.
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
data(WomenQueue)
str(WomenQueue)
##  table [1:11(1d)] 1 3 4 23 25 19 18 5 1 1 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ nWomen: chr [1:11] "0" "1" "2" "3" ...
summary(WomenQueue)
## Number of cases in table: 100 
## Number of factors: 1
fit = goodfit(WomenQueue, type = "binomial", par = list(size = 10))
plot(fit, type = "standing", shade = TRUE)

d. Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

fit2 = goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(fit2, type = "deviation", shade = TRUE)
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

The deviation from this binomial expansion could be due to small sample size.

Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, ????(????=12,????=12) , specifying equal probability for boys and girls. [Hint: you need to specify both size and prob values for goodfit ().]

  1. Carry out the GOF test for this fixed binomial distribution. What is the ratio of ??2/df? What do you conclude?
data("Saxony")
Saxony
## nMales
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
SaxonyFit = goodfit(Saxony, type = "binomial", par = list(size = 12))
summary(SaxonyFit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16

The X^2 is very significant, therefore a less sigificant fit due to low p-value

  1. Test the additional lack of fit for the model Bin(n=12,p=1/2) compared to the model Bin(n=12,p ^) where p hat is estimated from the data
SaxonyFit2 = goodfit(Saxony, type = "binomial", par = list(prob=0.5,size = 12))
summary(SaxonyFit2)
## Warning in summary.goodfit(SaxonyFit2): Chi-squared approximation may be
## incorrect
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Pearson          249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
  1. Use the plot(goodfit ()) method to visualize these two models.
plot(SaxonyFit)

plot(SaxonyFit2)