data(Arbuthnot)
summary(Arbuthnot)## Year Males Females Plague
## Min. :1629 Min. :2890 Min. :2722 Min. : 0.00
## 1st Qu.:1649 1st Qu.:4759 1st Qu.:4457 1st Qu.: 0.00
## Median :1670 Median :6073 Median :5718 Median : 3.00
## Mean :1670 Mean :5907 Mean :5535 Mean : 1240.70
## 3rd Qu.:1690 3rd Qu.:7576 3rd Qu.:7150 3rd Qu.: 22.25
## Max. :1710 Max. :8426 Max. :7779 Max. :68596.00
## Mortality Ratio Total
## Min. : 8393 Min. :1.011 Min. : 5.612
## 1st Qu.:12739 1st Qu.:1.048 1st Qu.: 9.199
## Median :17867 Median :1.065 Median :11.813
## Mean :17816 Mean :1.071 Mean :11.442
## 3rd Qu.:21030 3rd Qu.:1.088 3rd Qu.:14.723
## Max. :97306 Max. :1.156 Max. :16.145
ggplot(data=Arbuthnot, aes(x=Year, y=Ratio))+ geom_line(color="blue")+geom_point(color="purple") +labs(title="Birth Ratio vs Year")p1<- ggplot(data=Arbuthnot, aes(x=Year)) +geom_line(aes(y=Females), color="coral") +geom_line(aes(y=Males),color="cyan")+ labs(title="Females birth vs Male birth") +ylab("birth number") +xlab("birth year")
p1 As we can see from 1629 to 1710, the birth ratios of (male/female) are all greater than 1, so there are more male populations in general. around 1660, there’s a peak of baby boys, the ratio is approximately 1.156. Line plot is easy to see the tendency for more baby boys. blue line means the number of baby boys and coral line means the number of baby girls in a year. We can see the blue line is almost always higher than the coral line.
p1<- ggplot(data=Arbuthnot, aes(x=Year, y=Total))+ geom_line(color="pink")+geom_point(color="dark green") +labs(title="Total number of births (in 000s) vs Year")
p1 From the line plot we observed the total population is growing gradually over the years. However, when the baby boy ratio is the highest, the total population is the lowest, which is kind of bizzare.
data(WomenQueue)
dimnames(WomenQueue)## $nWomen
## [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
barplot(WomenQueue, main="Women Queue Distribution Graph", xlab="Number of Women in Line", ylab= "Frequency" , col="lightgreen")women.fit <-goodfit(WomenQueue, type="binomial")
unlist(women.fit$par)## prob size
## 0.435 10.000
plot(women.fit,type="hanging", shade=TRUE)distplot(WomenQueue, type="binomial" ,xlab="Number of Women in Line")As we can see, the graph shows a bell-shaped, therefore we can say it is a binomial distribution.
women.fit10 <-goodfit(WomenQueue, type="binomial", par=list(size=10,prob=1/2))
unlist(women.fit$par)## prob size
## 0.435 10.000
plot(women.fit10,type="hanging", shade=TRUE)distplot(WomenQueue, type="binomial" , size=10,xlab="Number of Women in Line")Ord_plot(WomenQueue, main="Women Queue distribution")From the previous graph we can see the Women Queue is a bell shape and supported by the Ord_plot gave us the type=“Binomial”
data(Saxony)
sax.goodfit <- goodfit(Saxony,type="binomial", par=list(size=12, prob=0.5))
unlist(sax.goodfit$par)## prob size
## 0.5 12.0
summary(sax.goodfit)##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Pearson 249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
the ratio of Chi-squre is approximately 205. we have 9 degree of freedom, and p < 0.05,therefore we can conclude it is statistically signifcant, therefore we reject the null-hypotheses, there is a large deviation of the perfect binomial form. So we conclude it is a good fit!
sax.goodfit.true <- goodfit(Saxony,type="binomial", par=list(size=12))
unlist(sax.goodfit.true$par)## prob size
## 0.519215 12.000000
unlist(sax.goodfit$par)## prob size
## 0.5 12.0
summary(sax.goodfit.true)##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16
the estimated probability is 0.52
plot(goodfit(Saxony, type="binomial", par=list(size=12,prob=0.52)))plot(goodfit(Saxony, type="binomial", par=list(size=12,prob=0.5)))