ANLY 545- Homework #2

Source: “Discrete Data Analysis with R” By: Michael Friendly

Exercise 3.1(a) Make a plot of Ratio over Year. What features stand out? Which plot do you prefer to display the tendency for more male births?

library(HistData)
data(Arbuthnot) 
View(Arbuthnot)
library(VGAMdata)
library(GGally)
## Loading required package: ggplot2
data(Arbuthnot, package = "VGAMdata")
## Warning in data(Arbuthnot, package = "VGAMdata"): data set 'Arbuthnot' not
## found
names(Arbuthnot) <- gsub("*Rate", "", names(Arbuthnot))
names(Arbuthnot)[2:3] <-  c("Males", "Females")
GGally::ggpairs(Arbuthnot[, c(2,3, 6,1)],
                title = "Birth Ratio",
                diag = list(continuous = 'density'), axisLabels='none')
## Warning in check_and_set_ggpairs_defaults("diag", diag, continuous =
## "densityDiag", : Changing diag$continuous from 'density' to 'densityDiag'

OR

library(ggplot2)
ggplot(Arbuthnot, aes(Year,Ratio) )+
  geom_point() +
  geom_smooth(span = 0.3)
## `geom_smooth()` using method = 'loess'

OR

library(GGally)
ggpairs(Arbuthnot, 
        columns = c("Ratio", "Year"), 
        upper = list(continuous = wrap("cor", 
                                       size = 5)), 
        lower = list(continuous = "smooth"))

Or

library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
my_data <- Arbuthnot[, c(1,6)]
chart.Correlation(my_data, histogram=TRUE, pch=19)

Ratio stands out. It seems something was going on because it had droped between 1660 to 1680 and then started to raise again.

I prefer the cor plot to see the comparison the birth of males and females and the ratio between them over years. In this plot by looking at numbers and density plots and simple dot plots can get an assumption of how the rate of birth(males and females) is changed over the time.

Exercise 3.1(b) Plot the total number of christenings, Males + Females or Total (in 000s) over time. What unusual features do you see?

Arbuthnot$Total=Arbuthnot$Males+Arbuthnot$Females

plot(Arbuthnot$Year,Arbuthnot$Total,type="h", col="blue", xlab="Years",ylab="Frequency", ylim=c(2700,16200))
a=(lm(formula=Total~Arbuthnot$Year,data =Arbuthnot ))
abline(a)
abline(h=6000, col="red", lwd=5, lty=3)
abline(h=10000, col="red", lwd=5, lty=3)
abline(h=14000, col="red", lwd=5, lty=3)

There is decreasing of toatal birth in some years over the time and rate of mortality is different from total birth.

Exercise 3.3(c) Make a reasonable plot showing departure from the binomial distribution.

library(vcd)
## Loading required package: grid
## 
## Attaching package: 'vcd'
## The following object is masked from 'package:PerformanceAnalytics':
## 
##     Kappa
data(WomenQueue)
View(WomenQueue)

women.fit <- goodfit(WomenQueue, type="binomial")
## Warning in goodfit(WomenQueue, type = "binomial"): size was not given,
## taken as maximum count
women.fit
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed     fitted pearson residual
##      0        1  0.3315007        1.1610708
##      1        3  2.5522622        0.2802600
##      2        4  8.8425721       -1.6284964
##      3       23 18.1546613        1.1371822
##      4       25 24.4605946        0.1090641
##      5       19 22.5989918       -0.7570705
##      6       18 14.4993531        0.9193354
##      7        5  6.3789822       -0.5459878
##      8        1  1.8417194       -0.6202341
##      9        1  0.3151024        1.2201121
##     10        0  0.0242601       -0.1557565
plot(women.fit,type="standing", xlab="Number of women", shade = TRUE)

Exercise 3.3(d) Suggest some reasons why the number of women in queues of length 10 might depart from a binomial distribution, Bin(n = 10, p = 1/2).

par(mfrow=c(2,2), mar=c(2,2,2,2), oma=c(0,0,0,0))
plot(WomenQueue, type = "binomial", par = list(size = 10))
## Warning in plot.window(...): "par" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): plot type 'binomial' will be truncated
## to first character
## Warning in plot.xy(xy, type, ...): "par" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "par" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "par" is not a
## graphical parameter
## Warning in box(...): "par" is not a graphical parameter
## Warning in title(...): "par" is not a graphical parameter
## Warning in axis(...): "par" is not a graphical parameter
plot(WomenQueue, type = "binomial", par = list(prob = 0.5, size = 10))
## Warning in plot.window(...): "par" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): plot type 'binomial' will be truncated
## to first character
## Warning in plot.xy(xy, type, ...): "par" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "par" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "par" is not a
## graphical parameter
## Warning in box(...): "par" is not a graphical parameter
## Warning in title(...): "par" is not a graphical parameter
## Warning in axis(...): "par" is not a graphical parameter
par(mfrow=c(2,2), mar=c(2,2,2,2), oma=c(0,0,0,0))

plot(goodfit(WomenQueue, type="binomial", par = list(size = 10)))

plot(goodfit(WomenQueue, type = "binomial", par = list(prob = 0.5, size = 10)))

The sample size is relatively small which might not be able to fully represent the real distribution.

Exercise 3.4(a) Carry out the GOF test for this fixed binomial distribution. What is the ratio of x2/df? What do you conclude?

data("Saxony", package="vcd")
View(Saxony)
fit = goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
fit
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617
summary(fit)
## Warning in summary.goodfit(fit): Chi-squared approximation may be incorrect
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Pearson          249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
Ratio = 205.4060 / 12
Ratio
## [1] 17.11717

Because pvalue is less than chi-square so I fail to reject null hypothesi so I conclude my variables are independent.

Exercise 3.4(b) Test the additional lack of fit for the model Bin(n=12,p=1/2) compared to the model Bin(n=12, p=p^) where p^ is estimated from the data.

fit = goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12))
fit
## 
## Observed and fitted values for binomial distribution
## with fixed parameters 
## 
##  count observed     fitted pearson residual
##      0        3    1.49292        1.2334401
##      1       24   17.91504        1.4376359
##      2      104   98.53271        0.5507842
##      3      286  328.44238       -2.3419098
##      4      670  738.99536       -2.5380434
##      5     1033 1182.39258       -4.3445838
##      6     1343 1379.45801       -0.9816094
##      7     1112 1182.39258       -2.0471328
##      8      829  738.99536        3.3108845
##      9      478  328.44238        8.2523747
##     10      181   98.53271        8.3079041
##     11       45   17.91504        6.3991064
##     12        7    1.49292        4.5071617
unlist(fit$par)
## prob size 
##  0.5 12.0
summary(fit)
## Warning in summary.goodfit(fit): Chi-squared approximation may be incorrect
## 
##   Goodness-of-fit test for binomial distribution
## 
##                       X^2 df     P(> X^2)
## Pearson          249.1954 12 2.013281e-46
## Likelihood Ratio 205.4060 12 2.493625e-37
fit1=goodfit(Saxony, type = "binomial", par = list(size = 12))
fit1
## 
## Observed and fitted values for binomial distribution
## with parameters estimated by `ML' 
## 
##  count observed       fitted pearson residual
##      0        3    0.9328394        2.1402809
##      1       24   12.0888374        3.4257991
##      2      104   71.8031709        3.7996298
##      3      286  258.4751335        1.7120476
##      4      670  628.0550119        1.6737139
##      5     1033 1085.2107008       -1.5849023
##      6     1343 1367.2793552       -0.6566116
##      7     1112 1265.6303069       -4.3184059
##      8      829  854.2466464       -0.8637977
##      9      478  410.0125627        3.3576088
##     10      181  132.8357027        4.1789562
##     11       45   26.0824586        3.7041659
##     12        7    2.3472734        3.0368664
unlist(fit$par)
## prob size 
##  0.5 12.0
summary(fit1)
## 
##   Goodness-of-fit test for binomial distribution
## 
##                      X^2 df     P(> X^2)
## Likelihood Ratio 97.0065 11 6.978187e-16

Exercise 3.4(c) Use the plot(goodfit ()) method to visualize these two models.

par(mfrow=c(2,2), mar=c(2,2,2,2), oma=c(0,0,0,0))
plot( goodfit(Saxony, type = "binomial", par = list(prob = 0.5, size = 12)),xlab="Number of Males", main = 'Probability = 0.5', ylab = 'SquareRoot of Frequency')

plot(goodfit(Saxony, type = "binomial", par = list(size = 12)),main='Probability =0.519215 (estimated from data)', xlab ='Number of Males', ylab ='SquareRoot of Frequency')