Exercise 3.1

The dataset “Arbuthnot” from the package “HistData” contains christening records for children born in London for every year from 1629 to 1710.

A) Make a plot of ratio over years. What stands out?

data("Arbuthnot",package="HistData")
plot(Arbuthnot$Year,Arbuthnot$Ratio,main = "Male to Female christenings Ratio per year",xlab = "Years",ylab = "Ratio",col="purple",pch=1, lty="dashed", lwd=2)
abline(lm(Arbuthnot$Ratio~Arbuthnot$Year),col="green",lwd=2)

The fitted line slopes downward, so the ratio of male to female christenings decreased gradually over the years.

B) Plot the total number of christenings in 1,000s over time. What features do you see?

# Total is already recorded in thousands, so it is plotted without rescaling
plot(Arbuthnot$Year,Arbuthnot$Total,main="Total Christenings per year",xlab="Year",ylab="Christenings (1,000s)",col="purple", lty="dashed", lwd=2)
abline(lm(Arbuthnot$Total~Arbuthnot$Year),col="green",lwd=2)

Overall, the number of christenings increased substantially over the period.

Extra: Plot of christenings per Gender per year

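# Draw both series on shared axes so the two genders are directly comparable
plot(Arbuthnot$Year,Arbuthnot$Females,main="Total christenings per Gender over time",xlab="Year",ylab="Christenings",col="purple",lwd=2,ylim=range(Arbuthnot$Males,Arbuthnot$Females))
points(Arbuthnot$Year,Arbuthnot$Males,col="green",lwd=2)
legend("topleft",legend=c("Females","Males"),col=c("purple","green"),pch=1)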

Christenings of both genders grew in a very similar way over time.


Exercise 3.3

C) Make a reasonable plot showing departure from binomial distribution.

data("WomenQueue",package="vcd")
barplot(WomenQueue,main="Number of Women in Queues of Length 10",ylab="Frequency",xlab="Number of women",col="lightpink")

Most queues contain between 3 and 6 women.
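
As a reference point for that observation, the probabilities under Bin(n = 10, p = 1/2) can be computed directly in base R. This is only a sketch of the baseline model named in the exercise; the variable names are illustrative and not part of the original answer.

```r
# Probabilities for 0..10 women in a queue of 10 under Bin(10, 1/2)
binom_probs <- dbinom(0:10, size = 10, prob = 0.5)
names(binom_probs) <- 0:10

# Probability that a queue holds between 3 and 6 women under this model
p_3_to_6 <- sum(binom_probs[as.character(3:6)])

round(binom_probs, 4)
p_3_to_6
```

Comparing these probabilities with the observed proportions is one simple way to judge how far the data sit from the reference model.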

D) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

library("vcd")
WomenQueue_Fit= goodfit(WomenQueue, type = "binomial", par = list(prob = .5,size = 10))
plot(WomenQueue_Fit,xlab="Counts",ylab="Frequency",main="GOF - Women Queue")

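The fitted values differ from the observed values, which indicates a possible departure from the binomial distribution. Plausible reasons include a lack of independence (people often queue with companions) and a proportion of women that differs from 1/2 or varies from queue to queue.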
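
One reason a count like this might depart from a binomial is a lack of independence between neighbours in a queue. The simulation below is a purely hypothetical sketch of that idea: it assumes people arrive as same-gender pairs, which is an invented scenario and not a claim about how WomenQueue was generated. The seed and sample size are arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n_queues <- 10000

# Independent case: each of the 10 people is a woman with probability 1/2
independent <- rbinom(n_queues, size = 10, prob = 0.5)

# Hypothetical clustered case: 5 same-gender pairs per queue,
# each pair consisting of two women with probability 1/2
clustered <- 2 * rbinom(n_queues, size = 5, prob = 0.5)

# Both cases have mean 5, but clustering inflates the variance
# (theoretical values: 2.5 when independent, 5 when paired)
c(mean_independent = mean(independent), mean_clustered = mean(clustered))
c(var_independent = var(independent), var_clustered = var(clustered))
```

Under this assumption the counts are more spread out than Bin(10, 1/2) allows, even though the average number of women per queue is unchanged.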


Exercise 3.4

A) Test goodness of fit for fixed binomial distribution (n=12,p=.5)

data("Saxony",package="vcd")
# prob is not supplied here, so p is estimated by ML (size fixed at 12)
GOF=goodfit(Saxony,type="binomial",par=list(size=12))
GOF
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664
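
The fitted values and Pearson residuals printed above can be reproduced in base R, which makes explicit what the ML fit is doing. This is a sketch, not the internals of goodfit(): the observed counts are copied from the output above, and the variable names are illustrative.

```r
# Observed frequencies copied from the goodfit() output (counts 0..12)
observed <- c(3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7)
counts <- 0:12
n_families <- sum(observed)

# ML estimate of p for a binomial with known size: mean count / size
p_hat <- sum(counts * observed) / (12 * n_families)

# Expected frequencies and Pearson residuals under Bin(12, p_hat)
fitted_ml <- n_families * dbinom(counts, size = 12, prob = p_hat)
pearson <- (observed - fitted_ml) / sqrt(fitted_ml)

p_hat
round(cbind(counts, observed, fitted_ml, pearson), 4)
```

The Pearson residual is (observed - fitted) / sqrt(fitted), so values well beyond 2 in absolute size mark the counts where the model fits worst.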

B) Test the additional lack of fit for the model Bin (n=12,p=.5) compared to Bin (n=12,p=probability estimated from data).

# prob is fixed at .5 here, so no parameter is estimated from the data
GOF2=goodfit(Saxony,type="binomial",par=list(prob=.5,size=12))
GOF2
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617
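
The additional lack of fit of the fixed model relative to the estimated one can be sketched as a likelihood-ratio comparison in base R. This is an assumed approach rather than the original answer: the two models are nested and differ by one estimated parameter, so the difference in their G² statistics is compared with a chi-squared distribution on 1 degree of freedom. The observed counts are copied from the output above, and the helper g2() is a hypothetical name introduced here.

```r
# Observed frequencies copied from the goodfit() output (counts 0..12)
observed <- c(3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7)
counts <- 0:12
n_families <- sum(observed)
p_hat <- sum(counts * observed) / (12 * n_families)

# Expected frequencies under the fixed and the estimated model
fit_fixed <- n_families * dbinom(counts, size = 12, prob = 0.5)
fit_ml <- n_families * dbinom(counts, size = 12, prob = p_hat)

# Likelihood-ratio statistic G^2 (all observed counts are positive here)
g2 <- function(obs, fit) 2 * sum(obs * log(obs / fit))
g2_fixed <- g2(observed, fit_fixed)
g2_ml <- g2(observed, fit_ml)

# Additional lack of fit from fixing p = .5, tested on 1 degree of freedom
lr_diff <- g2_fixed - g2_ml
p_value <- pchisq(lr_diff, df = 1, lower.tail = FALSE)
c(g2_fixed = g2_fixed, g2_ml = g2_ml, lr_diff = lr_diff, p_value = p_value)
```

A large difference with a small p-value would indicate that fixing p at .5 fits noticeably worse than estimating it from the data.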

C) Use plot(goodfit()) to visualize both GOF models:

plot(GOF,xlab="Occurrences",ylab="Frequency",main="Families in Saxony: GOF (size=12)")

plot(GOF2,xlab="Occurrences",ylab="Frequency",main="Families in Saxony: GOF (size=12,p=.5)")

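Compared with the fit using estimated p, the fit with p=.5 gives slightly higher fitted frequencies for the lower half of the counts and lower fitted frequencies for the upper half. As a result, the model with p=.5 underestimates the high counts, where the Pearson residuals for counts 9 to 11 exceed 6.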