Discrete Data Analysis with R

The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births.

install.packages("HistData", repos="http://R-Forge.R-project.org")

## Installing package into 'C:/Users/PinakKumar/Documents/R/win-library/3.4'
## (as 'lib' is unspecified)

## package 'HistData' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\PinakKumar\AppData\Local\Temp\Rtmpg1xp7M\downloaded_packages

library(HistData)
min_ratio <- min(Arbuthnot$Ratio)-0.02
max_ratio <- max(Arbuthnot$Ratio)+0.02

Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?

data("Arbuthnot")
with(Arbuthnot, plot(Year,Ratio, type='h', ylim=c(min_ratio,max_ratio), ylab="Sex Ratio: Male vs. Female"))
abline(h=1, col="yellow", lwd=5, lty=3)
abline(lm(Arbuthnot$Ratio ~ Arbuthnot$Year), lwd=5,col="red")

Here, the yellow dotted line represent a 1:1 birth ratio relationship. However, the male birth ratios were not even close to the red dotted line as time passes. The advantage of using type h is the audience could visually feel the big difference between men’s birth ratio and women’s birth ratio.

Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?

min_total <- min(Arbuthnot$Total)-0.05
max_total <- max(Arbuthnot$Total)+0.05
with(Arbuthnot, plot(Year,Total,type='h',ylim=c(min_total,max_total),ylab="Total Number of Christenings (in 000s")) 
abline(h=5.5, col="blue", lwd=5, lty=3)
abline(h=9, col="blue", lwd=5, lty=3)
abline(h=11.5, col="blue", lwd=5, lty=3)
abline(lm(Arbuthnot$Total ~ Arbuthnot$Year), lwd=5,col="yellow")

From the above plot, we can notice below three unusual conclusions: 1) During 1643 and 1661, there’s a big gap of christenings compared to rest of years 2) In 1666, the total number of christening dropped compared to previous 5 years 3) In 1704, there’s a significant drop of christenings.

Use the data set WomenQueue to:

library("vcd"); library("grid")

## Warning: package 'vcd' was built under R version 3.4.2

## Loading required package: grid

data(WomenQueue)

Make a reasonable plot showing departure from the binomial distribution.

goodFit= goodfit(WomenQueue, type = "binomial", par = list(size = 10))
goodFitP= goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))

plot(goodFit,type = "hanging", shade = TRUE)

Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin (n=10, p = 1/2)

plot(goodFitP,type = "hanging", shade = TRUE)

## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

## Warning in max.c * pmax(pmin(interpolate(abs(res)), 1), 0): longer object
## length is not a multiple of shorter object length

plot(goodFitP)

distplot(WomenQueue, type = c("binomial"), conf_level = 0.95)

## Warning in distplot(WomenQueue, type = c("binomial"), conf_level = 0.95):
## size was not given, taken as maximum count

Work on the distribution of male children in families in Saxony by fitting a binomial distribution, Bin(n=12,prob.=1/2). Specifying equal probabililty or boys and girls [Hint: you need to specify both size and prob values for goodfit ().]

Carry out the GOF test for this fixed binomial distribution. What is the ratio of ??2/df? What do you conclude?
Test the additional lack of fit for the model bin(n=12,p=1/2) compared to the model bin(n=12,p hat) where p hat is estimated from the data.
Use the plot.gootfit () method to visualize these two models.

gfS1=goodfit(Saxony, type = "binomial", par = list(size = 12))
gfS=goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))

plot(gfS1,main = "Families in Saxony" ,xlab= "Number of Males")

plot(gfS)

Discrete Data Analysis with R

AG

25 November 2017