0.1 ANLY 545- HW2

0.2 - analysis on Arbuthonot

0.2.0.1 1. The Arbuthnot data in HistData also contains the variable Ratio, giving the ratio of male to female births.

data(Arbuthnot)
summary(Arbuthnot)
##       Year          Males         Females         Plague        
##  Min.   :1629   Min.   :2890   Min.   :2722   Min.   :    0.00  
##  1st Qu.:1649   1st Qu.:4759   1st Qu.:4457   1st Qu.:    0.00  
##  Median :1670   Median :6073   Median :5718   Median :    3.00  
##  Mean   :1670   Mean   :5907   Mean   :5535   Mean   : 1240.70  
##  3rd Qu.:1690   3rd Qu.:7576   3rd Qu.:7150   3rd Qu.:   22.25  
##  Max.   :1710   Max.   :8426   Max.   :7779   Max.   :68596.00  
##    Mortality         Ratio           Total       
##  Min.   : 8393   Min.   :1.011   Min.   : 5.612  
##  1st Qu.:12739   1st Qu.:1.048   1st Qu.: 9.199  
##  Median :17867   Median :1.065   Median :11.813  
##  Mean   :17816   Mean   :1.071   Mean   :11.442  
##  3rd Qu.:21030   3rd Qu.:1.088   3rd Qu.:14.723  
##  Max.   :97306   Max.   :1.156   Max.   :16.145

0.2.0.2 (a) Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?

ggplot(data=Arbuthnot, aes(x=Year, y=Ratio))+ geom_line(color="blue")+geom_point(color="purple") +labs(title="Birth Ratio vs Year")

p1<- ggplot(data=Arbuthnot, aes(x=Year)) +geom_line(aes(y=Females), color="coral") +geom_line(aes(y=Males),color="cyan")+ labs(title="Females birth vs Male birth") +ylab("birth number") +xlab("birth year")

p1

As we can see from 1629 to 1710, the birth ratios of (male/female) are all greater than 1, so there are more male populations in general. around 1660, there’s a peak of baby boys, the ratio is approximately 1.156. Line plot is easy to see the tendency for more baby boys. blue line means the number of baby boys and coral line means the number of baby girls in a year. We can see the blue line is almost always higher than the coral line.

0.2.0.3 (b) Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?

p1<- ggplot(data=Arbuthnot, aes(x=Year, y=Total))+ geom_line(color="pink")+geom_point(color="dark green") +labs(title="Total number of births (in 000s) vs Year")
p1

From the line plot we observed the total population is growing gradually over the years. However, when the baby boy ratio is the highest, the total population is the lowest, which is kind of bizzare.

0.3 - Analysis on WomenQueue:

data(WomenQueue)
dimnames(WomenQueue)
## $nWomen
##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

0.3.0.1 (c) Make a reasonable plot showing departure from the binomial distribution.

barplot(WomenQueue, main="Women Queue Distribution Graph", xlab="Number of Women in Line", ylab= "Frequency" , col="lightgreen")

women.fit <-goodfit(WomenQueue, type="binomial")
unlist(women.fit$par)
##   prob   size 
##  0.435 10.000
plot(women.fit,type="hanging", shade=TRUE)

distplot(WomenQueue, type="binomial" ,xlab="Number of Women in Line")

As we can see, the graph shows a bell-shaped, therefore we can say it is a binomial distribution.

0.3.0.2 (d) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

women.fit10 <-goodfit(WomenQueue, type="binomial", par=list(size=10,prob=1/2))
unlist(women.fit$par)
##   prob   size 
##  0.435 10.000
plot(women.fit10,type="hanging", shade=TRUE)

distplot(WomenQueue, type="binomial" , size=10,xlab="Number of Women in Line")

Ord_plot(WomenQueue, main="Women Queue distribution")

From the previous graph we can see the Women Queue is a bell shape and supported by the Ord_plot gave us the type=“Binomial”

0.4 - Saxony

0.4.0.1 Work on the distribution of male children in families in “Saxony” by fitting a binomial distribution, specifying equal probability for boys and girls.

0.4.0.2 (a) Carry out the GOF test for this fixed binomial distribution. What is the ratio? What do you conclude?

data(Saxony)
sax.goodfit <- goodfit(Saxony,type="binomial", par=list(size=12, prob=0.5))
unlist(sax.goodfit$par)
## prob size 
##  0.5 12.0
summary(sax.goodfit)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Pearson          249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37

the ratio of Chi-squre is approximately 205. we have 9 degree of freedom, and p < 0.05,therefore we can conclude it is statistically signifcant, therefore we reject the null-hypotheses, there is a large deviation of the perfect binomial form. So we conclude it is a good fit!

  1. Test the additional lack of fit for the model B(n=12, p=0.5) compare to the model B(n=12, p=p) p is estimated from the data.
sax.goodfit.true <- goodfit(Saxony,type="binomial", par=list(size=12))
unlist(sax.goodfit.true$par)
##      prob      size 
##  0.519215 12.000000
unlist(sax.goodfit$par)
## prob size 
##  0.5 12.0
summary(sax.goodfit.true)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16

the estimated probability is 0.52

0.4.0.3 (c) Use the plot.goodfit () method to visualize these two models.

plot(goodfit(Saxony, type="binomial", par=list(size=12,prob=0.52)))

plot(goodfit(Saxony, type="binomial", par=list(size=12,prob=0.5)))