library(HistData)
library(ggplot2)
data('Arbuthnot', package = 'HistData')
maxRatio <- max(Arbuthnot$Ratio)
maxRatio
## [1] 1.156075
minRatio <- min(Arbuthnot$Ratio)
minRatio
## [1] 1.010673
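# Line-and-point plot of the male-to-female ratio by year, with dashed
# reference lines at the observed maximum (blue) and minimum (red)
ggplot(Arbuthnot, aes(x = Year, y = Ratio)) +
  geom_line() +
  geom_point(aes(color = Ratio)) +
  labs(title = "Male to Female Birth Ratio Over Time", x = "Year", y = "Male to Female Birth Ratio") +
  theme_classic() +
  ylim(1, maxRatio) +                      # start the y axis at parity (ratio = 1)
  geom_hline(yintercept = maxRatio, color = "blue", linetype = "dashed") +
  geom_hline(yintercept = minRatio, color = "red", linetype = "dashed") +
  geom_smooth(method = "lm", formula = y ~ x)   # least-squares trend line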
First, the minimum ratio is above 1.0 (1.0107), meaning that male births outnumbered female births in every year covered by the data. Second, the trend line slopes downward, meaning that the male-to-female birth ratio declined over the period covered by the data.
I used a line plot with points to show the tendency, since the ratio changes over time and has always stayed above 1.0.
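The downward trend above is read off the `geom_smooth` line. A minimal sketch of checking it numerically, assuming the same simple linear model that `geom_smooth(method = "lm")` fits (Ratio regressed on Year); the names `ratio_fit` and `ratio_slope` are illustrative.

```r
library(HistData)
data("Arbuthnot", package = "HistData")

# Same model as the trend line in the plot: Ratio on Year
ratio_fit <- lm(Ratio ~ Year, data = Arbuthnot)

# A negative Year coefficient corresponds to a downward-sloping trend line
ratio_slope <- coef(ratio_fit)[["Year"]]
ratio_slope

# A minimum above 1 means more males than females in every year
range(Arbuthnot$Ratio)
```

`summary(ratio_fit)` would additionally report whether the slope is distinguishable from zero.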
# Total christenings by year with a least-squares trend line
ggplot(Arbuthnot, aes(x = Year, y = Total)) +
  geom_line() +
  geom_point(aes(color = Total)) +
  labs(title = "Total Christenings Over Time", x = "Year", y = "Total Christenings") +
  theme_classic() +
  geom_smooth(method = "lm", formula = y ~ x)
From the upward-sloping trend line we can see that total christenings increased over time, but there is a clear departure from this trend: the number started dropping rapidly around 1640 and did not begin to recover until about 1660.
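To pin down the dip described above, a minimal sketch that lists the yearly totals around it and finds the lowest year; the 1635 to 1665 window is an arbitrary choice for illustration, not taken from the analysis.

```r
library(HistData)
data("Arbuthnot", package = "HistData")

# Yearly totals in a window around the apparent dip (window chosen for illustration)
dip <- subset(Arbuthnot, Year >= 1635 & Year <= 1665, select = c(Year, Total))
dip

# Year in which total christenings reach their lowest point within that window
dip_low_year <- dip$Year[which.min(dip$Total)]
dip_low_year
```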
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
data('WomenQueue', package = 'vcd')
barplot(WomenQueue, main = "Women Queue Departure Distribution", xlab = "Counts", ylab = "Frequency")
WQ.fit <- goodfit(WomenQueue, type="binomial")
## Warning in goodfit(WomenQueue, type = "binomial"): size was not given,
## taken as maximum count
unlist(WQ.fit$par)
## prob size
## 0.435 10.000
plot(WQ.fit, xlab="Counts",main = 'Rootogram')
From the rootogram, we can see substantial deviations between the fitted and the observed frequencies.
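The `prob = 0.435` reported by `goodfit` is the maximum-likelihood estimate with `size` taken as the maximum count (10). For a binomial with known size, that estimate is the sample mean divided by the size, so it can be reproduced by hand. A minimal sketch; the names `k`, `freq`, and `p_hat` are illustrative.

```r
library(vcd)
data("WomenQueue", package = "vcd")

# WomenQueue is a one-way table: names are the counts k, values are frequencies
k <- as.numeric(names(WomenQueue))
freq <- as.vector(WomenQueue)

# size taken as the maximum count, as goodfit did; ML estimate is mean(k) / size
size <- max(k)
p_hat <- sum(k * freq) / sum(freq) / size
p_hat
```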
WQ.fit1 <- goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(WQ.fit1,xlab="Counts",main = 'Rootogram - Bin(n=10, p=1/2)')
distplot(WomenQueue, type = "binomial", conf_level = 0.95)
## Warning in distplot(WomenQueue, type = "binomial", conf_level = 0.95): size
## was not given, taken as maximum count
Again, the rootogram shows that the fitted values deviate from the observed values, which means the data depart from the binomial distribution. The intercept and slope in the Distplot (binomialness plot) also suggest a departure from the binomial distribution.
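The binomialness plot rests on the fact that, under Bin(n, p), the metameter log(n_k / (N * choose(n, k))) is linear in k with slope log(p / (1 - p)) and intercept n * log(1 - p). A minimal sketch computing that textbook metameter by hand; `distplot` may apply additional adjustments (for example to small counts), so this is an approximation of what it plots, not its exact internals. If the data were binomial, the two estimates of p recovered from the slope and from the intercept should roughly agree.

```r
library(vcd)
data("WomenQueue", package = "vcd")

k <- as.numeric(names(WomenQueue))   # counts 0, 1, ..., size
freq <- as.vector(WomenQueue)        # observed frequencies n_k
N <- sum(freq)
size <- max(k)                       # size taken as the maximum count

keep <- freq > 0                     # log(0) is undefined, so drop empty cells
kk <- k[keep]
phi <- log(freq[keep] / (N * choose(size, kk)))

# Straight-line fit of the metameter against k
meta_fit <- lm(phi ~ kk)
slope <- unname(coef(meta_fit)["kk"])
intercept <- unname(coef(meta_fit)["(Intercept)"])

# slope = log(p / (1 - p)) and intercept = size * log(1 - p) each imply a p
p_from_slope <- plogis(slope)
p_from_intercept <- 1 - exp(intercept / size)
c(slope = p_from_slope, intercept = p_from_intercept)

# Large residuals from the line indicate departure from the binomial form
residuals(meta_fit)
```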
data('Saxony', package = 'vcd')
Sa.fit <- goodfit(Saxony, type = "binomial", par = list(prob = .5,size = 12))
summary(Sa.fit)
## Warning in summary.goodfit(Sa.fit): Chi-squared approximation may be
## incorrect
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Pearson 249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
The ratio of the likelihood-ratio χ² to its degrees of freedom is 205.4/12 ≈ 17.1, far above 1. The p-value is also far below the 0.05 significance level, so we reject the null hypothesis that the data follow a binomial distribution with p = 0.5, that is, Bin(12, 1/2).
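The arithmetic behind this conclusion can be checked directly from the statistics printed by `summary(Sa.fit)`. A minimal sketch in base R; the variable names are illustrative, and the rule of thumb that χ²/df near 1 indicates adequate fit follows from the χ² distribution having mean equal to its degrees of freedom.

```r
# Statistics copied from summary(Sa.fit) (fixed Bin(12, 0.5))
x2_pearson <- 249.1954
g2 <- 205.4060
dof <- 12

# Ratio of statistic to degrees of freedom; roughly 1 under a good fit
g2 / dof          # about 17.1
x2_pearson / dof  # about 20.8

# Upper-tail p-value, and the 5% critical value for comparison
p_value <- pchisq(g2, df = dof, lower.tail = FALSE)
crit <- qchisq(0.95, df = dof)
p_value < 0.05
g2 > crit
```

The p-value is compared with the significance level (0.05); equivalently, the statistic is compared with the critical value `crit`.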
Sa.fit1 <- goodfit(Saxony, type = "binomial", par = list(size = 12))
summary(Sa.fit1)
##
## Goodness-of-fit test for binomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16
The ratio χ²/df is 97.0/11 ≈ 8.8, and the p-value is still less than 0.05, so we again reject the null hypothesis that the data follow a binomial distribution. The χ²/df ratio is smaller than in part (a), so estimating p improves the fit, but not enough to make it adequate.
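The write-up does not report the estimated p itself. A minimal sketch that extracts it from the fitted object (the same `par` component used earlier via `unlist(WQ.fit$par)`) and shows where the 11 degrees of freedom come from: 13 cells (0 to 12 males), minus 1, minus 1 for the estimated parameter.

```r
library(vcd)
data("Saxony", package = "vcd")

# size fixed at 12, prob estimated by maximum likelihood
Sa.fit1 <- goodfit(Saxony, type = "binomial", par = list(size = 12))
p_est <- Sa.fit1$par$prob   # estimated probability of a male birth
p_est

# df = 13 cells - 1 - 1 estimated parameter = 11
97.0065 / 11   # about 8.8, versus 205.4060 / 12 (about 17.1) for the fixed p = 0.5 fit
```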
plot(Sa.fit, xlab = "Number of Males", main = "Rootogram - Bin(n=12, p=1/2)")
plot(Sa.fit1, xlab = "Number of Males", main = "Rootogram - Bin(n=12, estimated p)")
Both rootograms show deviations between the fitted and the observed frequencies.
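To see which counts drive those deviations, a minimal sketch that computes Pearson residuals, (observed - fitted) / sqrt(fitted), for both fits. It assumes the `goodfit` object stores `observed`, `fitted`, and `count` components; the helper `pearson_resid` is hypothetical, written for this illustration.

```r
library(vcd)
data("Saxony", package = "vcd")

Sa.fit  <- goodfit(Saxony, type = "binomial", par = list(prob = .5, size = 12))
Sa.fit1 <- goodfit(Saxony, type = "binomial", par = list(size = 12))

# Hypothetical helper: Pearson residual for each number of males
pearson_resid <- function(fit) (fit$observed - fit$fitted) / sqrt(fit$fitted)

resid_table <- data.frame(
  males  = Sa.fit$count,
  p_half = round(pearson_resid(Sa.fit), 2),
  p_hat  = round(pearson_resid(Sa.fit1), 2)
)
resid_table

# The sum of squared Pearson residuals is the Pearson X^2 statistic
sum(pearson_resid(Sa.fit)^2)
```

Residuals well beyond about ±2 mark the counts where the fitted bars in the rootogram hang furthest from zero.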