Exercise 3.1 The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births.

  1. Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?
library(HistData)
## Warning: package 'HistData' was built under R version 3.4.4
library(ggplot2)
maxRatio<-max(Arbuthnot$Ratio)
maxRatio
## [1] 1.156075
minRatio <- min(Arbuthnot$Ratio)
minRatio
## [1] 1.010673
# Refer to columns by bare name inside aes(); Arbuthnot$Year bypasses the data argument
ggplot(Arbuthnot, aes(x = Year, y = Ratio)) +
  geom_line() +
  geom_point(aes(color = Ratio)) +
  labs(title = "Male to Female Birth Ratio by Year", x = "Year", y = "Male to Female Birth Ratio") +
  theme_classic() +
  ylim(1, maxRatio) +
  # Dashed reference lines at the largest (blue) and smallest (red) observed ratios
  geom_hline(yintercept = maxRatio, color = "blue", linetype = "dashed") +
  geom_hline(yintercept = minRatio, color = "red", linetype = "dashed") +
  geom_smooth(method = "lm")
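
To put the ratio on a more familiar scale, a male-to-female ratio R corresponds to a proportion of male births of R / (1 + R). The following is a small sketch using the minimum and maximum ratios printed above; the helper ratio_to_prop is hypothetical and introduced only for illustration.

```r
# Convert a male-to-female ratio into the proportion of male births
ratio_to_prop <- function(r) r / (1 + r)

min_ratio <- 1.010673   # minRatio, copied from the output above
max_ratio <- 1.156075   # maxRatio, copied from the output above

# Range of the male proportion implied by the observed ratios
prop_range <- ratio_to_prop(c(min = min_ratio, max = max_ratio))
round(prop_range, 4)
```

Because the smallest ratio is above 1, the implied proportion of male births stays above one half in every year of the series.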

  2. Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?
# Total is recorded in thousands, so the axis label states the unit
ggplot(Arbuthnot, aes(x = Year, y = Total)) +
  geom_line() +
  geom_point(aes(color = Total)) +
  labs(title = "Total Christenings by Year", x = "Year", y = "Total Christenings (000s)") +
  theme_classic() +
  geom_smooth(method = "lm")

Exercise 3.3 Use the data set WomenQueue to:

  1. Make a reasonable plot showing departure from the binomial distribution
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
data('WomenQueue', package = 'vcd')
barplot(WomenQueue, main = "Women Queue Departure Distribution", xlab = "Counts", ylab = "Frequency")

WQ.fit <- goodfit(WomenQueue, type="binomial")
## Warning in goodfit(WomenQueue, type = "binomial"): size was not given,
## taken as maximum count
unlist(WQ.fit$par)
##   prob   size 
##  0.435 10.000
plot(WQ.fit, xlab="Counts",main = 'Rootogram')

  2. Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).
WQ.fit1 <- goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(WQ.fit1,xlab="Counts",main = 'Rootogram - Bin(n=10, p=1/2)')

distplot(WomenQueue, type = "binomial", conf_level = 0.95)
## Warning in distplot(WomenQueue, type = "binomial", conf_level = 0.95): size
## was not given, taken as maximum count
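
As a reference point for the departure discussed in this exercise, the probabilities implied by Bin(n = 10, p = 1/2) can be computed directly in base R and set beside those for the fitted value prob = 0.435 reported by goodfit(). This is only a sketch; the variable names are illustrative and not part of vcd.

```r
n <- 10
k <- 0:n

# Probabilities under the hypothesized Bin(10, 1/2) and the fitted Bin(10, 0.435)
p_half <- dbinom(k, size = n, prob = 0.5)
p_fit  <- dbinom(k, size = n, prob = 0.435)   # prob copied from the goodfit() output

# Implied mean number of women per queue (equals n * p)
mean_half <- sum(k * p_half)
mean_fit  <- sum(k * p_fit)

round(data.frame(k, p_half, p_fit), 4)
```

The fitted mean of 4.35 women per queue is below the 5 expected under p = 1/2, which is one visible way the data depart from the equal-probability model.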

Exercise 3.4 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, Bin(n = 12, p = 1/2), specifying equal probability for boys and girls. [Hint: you need to specify both size and prob values for goodfit().]

  1. Carry out the GOF test for this fixed binomial distribution. What is the ratio of χ²/df? What do you conclude?
data('Saxony', package = 'vcd')
Sa.fit <- goodfit(Saxony, type = "binomial", par = list(prob = .5,size = 12))
summary(Sa.fit)
## Warning in summary.goodfit(Sa.fit): Chi-squared approximation may be
## incorrect
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Pearson          249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
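
The ratio asked for in this part can be read off the summary above. Here is a minimal sketch with the statistics copied by hand from the printed output; the variable names are illustrative.

```r
# Values copied from the summary(Sa.fit) output above
pearson_x2 <- 249.1954   # Pearson chi-squared
lr_g2      <- 205.4060   # likelihood-ratio chi-squared
df_fixed   <- 12         # degrees of freedom for the fixed-p model

# A well-fitting model should give a ratio close to 1
pearson_ratio <- pearson_x2 / df_fixed
lr_ratio      <- lr_g2 / df_fixed

round(c(Pearson = pearson_ratio, LR = lr_ratio), 2)
```

Both ratios are far above 1 and both p-values are essentially zero, so the data are not consistent with Bin(n = 12, p = 1/2).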
  2. Test the additional lack of fit for the model Bin(n = 12, p = 1/2) compared to the model Bin(n = 12, p = p̂), where p̂ is estimated from the data.
Sa.fit1 <- goodfit(Saxony, type = "binomial", par = list(size = 12))
summary(Sa.fit1)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16
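
One way to test the additional lack of fit is to difference the two likelihood-ratio statistics, since the fixed-p model is nested in the estimated-p model. The sketch below assumes that approach and uses the values printed in the two summaries above; the variable names are illustrative.

```r
# Likelihood-ratio statistics copied from the two summaries above
g2_fixed <- 205.4060; df_fixed <- 12   # Bin(12, 1/2), p fixed
g2_est   <- 97.0065;  df_est   <- 11   # Bin(12, p-hat), p estimated

# The difference in G^2 measures the extra lack of fit from fixing p = 1/2
delta_g2 <- g2_fixed - g2_est
delta_df <- df_fixed - df_est
p_value  <- pchisq(delta_g2, df = delta_df, lower.tail = FALSE)

c(delta_g2 = delta_g2, delta_df = delta_df, p_value = p_value)
```

A difference of about 108.4 on 1 degree of freedom is very large, so estimating p improves the fit considerably. The remaining statistic of 97.0 on 11 df still indicates that the binomial with estimated p does not fit these data well.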
  3. Use the plot.goodfit() method to visualize these two models.
plot(Sa.fit, xlab="Number of Males", main = 'Visualization - Probability of 0.5')

plot(Sa.fit1, xlab="Number of Males", main = 'Visualization - Probability of p')