Exercise 3.1 See the Arbuthnot data in HistData library. The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the christening records for children born in London for every year from 1629 to 1710. It also contains the variable Ratio, giving the ratio of male to female births.
library("HistData")
## Warning: package 'HistData' was built under R version 3.4.4
library("plotly")
## Warning: package 'plotly' was built under R version 3.4.4
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.4
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Arbuthnot
## Year Males Females Plague Mortality Ratio Total
## 1 1629 5218 4683 0 8771 1.114243 9.901
## 2 1630 4858 4457 1317 10554 1.089971 9.315
## 3 1631 4422 4102 274 8562 1.078011 8.524
## 4 1632 4994 4590 8 9535 1.088017 9.584
## 5 1633 5158 4839 0 8393 1.065923 9.997
## 6 1634 5035 4820 1 10400 1.044606 9.855
## 7 1635 5106 4928 0 10651 1.036120 10.034
## 8 1636 4917 4605 10400 23359 1.067752 9.522
## 9 1637 4703 4457 3082 11763 1.055194 9.160
## 10 1638 5359 4952 363 13624 1.082189 10.311
## 11 1639 5366 4784 314 9862 1.121656 10.150
## 12 1640 5518 5332 1450 12771 1.034884 10.850
## 13 1641 5470 5200 1375 13142 1.051923 10.670
## 14 1642 5460 4910 1274 13273 1.112016 10.370
## 15 1643 4793 4617 996 13212 1.038120 9.410
## 16 1644 4107 3997 1492 10933 1.027521 8.104
## 17 1645 4047 3919 1871 11479 1.032661 7.966
## 18 1646 3768 3395 2365 12780 1.109867 7.163
## 19 1647 3796 3536 3597 14059 1.073529 7.332
## 20 1648 3363 3181 611 9894 1.057215 6.544
## 21 1649 3079 2746 67 10566 1.121267 5.825
## 22 1650 2890 2722 15 8754 1.061719 5.612
## 23 1651 3231 2840 23 10827 1.137676 6.071
## 24 1652 3220 2908 16 12569 1.107290 6.128
## 25 1653 3196 2959 6 10087 1.080095 6.155
## 26 1654 3441 3179 16 13247 1.082416 6.620
## 27 1655 3655 3349 9 11357 1.091371 7.004
## 28 1656 3668 3382 6 13921 1.084565 7.050
## 29 1657 3396 3289 4 12434 1.032533 6.685
## 30 1658 3157 3013 14 14993 1.047793 6.170
## 31 1659 3209 2781 36 14756 1.153901 5.990
## 32 1660 3724 3247 13 12681 1.146905 6.971
## 33 1661 4748 4107 20 16665 1.156075 8.855
## 34 1662 5216 4803 12 13664 1.085988 10.019
## 35 1663 5411 4881 9 12741 1.108584 10.292
## 36 1664 6041 5681 5 15453 1.063369 11.722
## 37 1665 5114 4858 68596 97306 1.052697 9.972
## 38 1666 4678 4319 1998 12738 1.083121 8.997
## 39 1667 5616 5322 35 15842 1.055242 10.938
## 40 1668 6073 5560 14 17278 1.092266 11.633
## 41 1669 6506 5829 3 19432 1.116143 12.335
## 42 1670 6278 5719 0 20198 1.097744 11.997
## 43 1671 6449 6061 5 15729 1.064016 12.510
## 44 1672 6443 6120 5 18230 1.052778 12.563
## 45 1673 6073 5822 5 17504 1.043112 11.895
## 46 1674 6113 5738 3 21201 1.065354 11.851
## 47 1675 6058 5717 1 17244 1.059647 11.775
## 48 1676 6552 5847 2 18732 1.120575 12.399
## 49 1677 6423 6203 2 19076 1.035467 12.626
## 50 1678 6568 6033 5 20678 1.088679 12.601
## 51 1679 6247 6041 2 21730 1.034100 12.288
## 52 1680 6548 6299 0 21053 1.039530 12.847
## 53 1681 6822 6533 0 23951 1.044237 13.355
## 54 1682 6909 6744 0 20691 1.024466 13.653
## 55 1683 7577 7158 0 20587 1.058536 14.735
## 56 1684 7575 7127 0 23202 1.062860 14.702
## 57 1685 7484 7246 0 23222 1.032846 14.730
## 58 1686 7575 7119 0 22609 1.064054 14.694
## 59 1687 7737 7214 0 21460 1.072498 14.951
## 60 1688 7487 7101 0 22921 1.054359 14.588
## 61 1689 7604 7167 0 23502 1.060974 14.771
## 62 1690 7909 7302 0 21461 1.083128 15.211
## 63 1691 7662 7392 0 22691 1.036526 15.054
## 64 1692 7602 7316 0 20874 1.039092 14.918
## 65 1693 7676 7483 0 20959 1.025792 15.159
## 66 1694 6985 6647 0 24109 1.050850 13.632
## 67 1695 7263 6713 0 19047 1.081931 13.976
## 68 1696 7632 7229 0 18638 1.055748 14.861
## 69 1697 8062 7767 0 20292 1.037981 15.829
## 70 1698 8426 7626 0 20183 1.104904 16.052
## 71 1699 7911 7452 0 20795 1.061594 15.363
## 72 1700 7578 7061 0 19443 1.073219 14.639
## 73 1701 8102 7514 0 20471 1.078254 15.616
## 74 1702 8031 7656 0 19481 1.048981 15.687
## 75 1703 7765 7683 0 20720 1.010673 15.448
## 76 1704 6113 5738 0 22684 1.065354 11.851
## 77 1705 8366 7779 0 22097 1.075460 16.145
## 78 1706 7952 7417 0 19847 1.072132 15.369
## 79 1707 8379 7687 0 21600 1.090022 16.066
## 80 1708 8239 7623 0 21291 1.080808 15.862
## 81 1709 7840 7380 0 21800 1.062331 15.220
## 82 1710 7640 7288 0 24620 1.048299 14.928
summary(Arbuthnot)
## Year Males Females Plague
## Min. :1629 Min. :2890 Min. :2722 Min. : 0.00
## 1st Qu.:1649 1st Qu.:4759 1st Qu.:4457 1st Qu.: 0.00
## Median :1670 Median :6073 Median :5718 Median : 3.00
## Mean :1670 Mean :5907 Mean :5535 Mean : 1240.70
## 3rd Qu.:1690 3rd Qu.:7576 3rd Qu.:7150 3rd Qu.: 22.25
## Max. :1710 Max. :8426 Max. :7779 Max. :68596.00
## Mortality Ratio Total
## Min. : 8393 Min. :1.011 Min. : 5.612
## 1st Qu.:12739 1st Qu.:1.048 1st Qu.: 9.199
## Median :17867 Median :1.065 Median :11.813
## Mean :17816 Mean :1.071 Mean :11.442
## 3rd Qu.:21030 3rd Qu.:1.088 3rd Qu.:14.723
## Max. :97306 Max. :1.156 Max. :16.145
maxRatio<-max(Arbuthnot$Ratio)
minRatio<-min(Arbuthnot$Ratio)
plot_ly(Arbuthnot, x= ~Year, y= ~Ratio, name="Male vs Female Birth Ratio", type='scatter', mode='lines',line = list(color = 'black', width = 1) ) %>%
add_trace(y = ~maxRatio, name = 'Maximum', mode = 'lines', line = list(color = 'red', width = 1)) %>%
add_trace(y = ~minRatio, name="Minimum", mode ='lines', line = list(color= 'green', width=1)) %>%
add_trace(y=1, name="Reference Line", mode='lines', line = list(color='blue', width=1))
## Warning: package 'bindrcpp' was built under R version 3.4.4
The above time series plot took the male-female ratios’ during the period 1629 to 1710. With reference blue line at 1, the ratio is higher than 1 and fluctuated during the entire sample period.The maximum ratio registed is 1.15 and minimum value is 1.01.This type of plot helps to see the change in the ratio over the time series.
maxT <- max(Arbuthnot$Total)
minT <- min(Arbuthnot$Total)
plot_ly(Arbuthnot, x = ~Year, y = ~Total, name = 'Total Number of Christenings(Unit:1,000)', type = 'scatter', mode = 'line', line = list(color = 'black', width = 1)) %>%
add_trace(y = ~maxT, name = 'Max Christening', line = list(color = 'red', width = 1)) %>%
add_trace(y = ~minT, name = 'Min Christening', line = list(color = 'blue', width = 1))
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...
## A line object has been specified, but lines is not in the mode
## Adding lines to the mode...
The above graph showed how the #christenings varied during the time period. From the time series plot, there is sharp decline from 1640 to 1650. Similarly, the volume of christening hightened from 1650 to 1697. The christening number fell down to 11851 in 1704. The reasons for such fluctuations could be internal and external political/social conditions during the period accordingly.
Exercise 3.3 Use the data set WomenQueue to:
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
data(WomenQueue)
str(WomenQueue)
## table [1:11(1d)] 1 3 4 23 25 19 18 5 1 1 ...
## - attr(*, "dimnames")=List of 1
## ..$ nWomen: chr [1:11] "0" "1" "2" "3" ...
summary(WomenQueue)
## Number of cases in table: 100
## Number of factors: 1
fit = goodfit(WomenQueue, type = "binomial", par = list(size = 10))
plot(fit, type = "standing", shade = TRUE)
d. Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).
fit2 = goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(fit2, type = "deviation", shade = TRUE)
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length
The deviation from this binomial expansion could be due to small sample size.
Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, ????(????=12,????=12) , specifying equal probability for boys and girls. [Hint: you need to specify both size and prob values for goodfit ().]
data("Saxony")
Saxony
## nMales
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 3 24 104 286 670 1033 1343 1112 829 478 181 45 7
SaxonyFit = goodfit(Saxony, type = "binomial", par = list(size = 12))
summary(SaxonyFit)
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16
The X^2 is very significant, therefore a less sigificant fit due to low p-value
SaxonyFit2 = goodfit(Saxony, type = "binomial", par = list(prob=0.5,size = 12))
summary(SaxonyFit2)
## Warning in summary.goodfit(SaxonyFit2): Chi-squared approximation may be
## incorrect
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Pearson 249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
plot(SaxonyFit)
plot(SaxonyFit2)