Exercise 3.1 The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births. (a) Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births? From plot below, we can see that there is a big difference between male and female birth ratios.
library(HistData)
min_ratio <- min(Arbuthnot$Ratio)-0.02
max_ratio <- max(Arbuthnot$Ratio)+0.02
data("Arbuthnot")
with(Arbuthnot, plot(Year,Ratio, type='h', ylim=c(min_ratio,max_ratio), ylab="Sex Ratio: Male vs. Female"))
abline(h=1, col="yellow", lwd=5, lty=3)
abline(lm(Arbuthnot$Ratio ~ Arbuthnot$Year), lwd=5,col="red")
(b) Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see? From plot below, we can see that between 1640 and 1660, only a little chirstenings and then they grew greatly till 1680.
min_total <- min(Arbuthnot$Total)-0.05
max_total <- max(Arbuthnot$Total)+0.05
with(Arbuthnot, plot(Year,Total,type='h',ylim=c(min_total,max_total),ylab="Total Number of Christenings (in 000s"))
abline(h=5.5, col="blue", lwd=5, lty=3)
abline(h=9, col="blue", lwd=5, lty=3)
abline(h=11.5, col="blue", lwd=5, lty=3)
abline(lm(Arbuthnot$Total ~ Arbuthnot$Year), lwd=5,col="yellow")
Exercise 3.3 Use the data set WomenQueue to: (c) Make a reasonable plot showing departure from the binomial distribution.
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
library(grid)
goodFit= goodfit(WomenQueue, type = "binomial", par = list(size = 10))
plot(goodFit,type = "hanging", shade = TRUE)
(d) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).
goodFitP= goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(goodFitP,type = "hanging", shade = TRUE)
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length
## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length
plot(goodFitP)
distplot(WomenQueue, type = c("binomial"), conf_level = 0.95)
## Warning in distplot(WomenQueue, type = c("binomial"), conf_level = 0.95):
## size was not given, taken as maximum count
Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, 𝑖𝑛(𝑛 = 12, 𝑝 = 1) , specifying equal probability for boys and girls. [Hint: 2 y need to specify both size and prob values for goodfit ().] (a) Carry out the GOF test for this fixed binomial distribution. What is the ratio of χ2/df? What do you conclude?
gfS1=goodfit(Saxony, type = "binomial", par = list(size = 12))
gfS1
##
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML'
##
## count observed fitted pearson residual
## 0 3 0.9328394 2.1402809
## 1 24 12.0888374 3.4257991
## 2 104 71.8031709 3.7996298
## 3 286 258.4751335 1.7120476
## 4 670 628.0550119 1.6737139
## 5 1033 1085.2107008 -1.5849023
## 6 1343 1367.2793552 -0.6566116
## 7 1112 1265.6303069 -4.3184059
## 8 829 854.2466464 -0.8637977
## 9 478 410.0125627 3.3576088
## 10 181 132.8357027 4.1789562
## 11 45 26.0824586 3.7041659
## 12 7 2.3472734 3.0368664
gfS=goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
gfS
##
## Observed and fitted values for binomial distribution
## with fixed parameters
##
## count observed fitted pearson residual
## 0 3 1.49292 1.2334401
## 1 24 17.91504 1.4376359
## 2 104 98.53271 0.5507842
## 3 286 328.44238 -2.3419098
## 4 670 738.99536 -2.5380434
## 5 1033 1182.39258 -4.3445838
## 6 1343 1379.45801 -0.9816094
## 7 1112 1182.39258 -2.0471328
## 8 829 738.99536 3.3108845
## 9 478 328.44238 8.2523747
## 10 181 98.53271 8.3079041
## 11 45 17.91504 6.3991064
## 12 7 1.49292 4.5071617
plot(gfS1,main = "Families in Saxony" ,xlab= "Number of Males")
plot(gfS)