set.seed(234)
prod <- expand.grid(sex = c("M", "F"), prod = c("A","B","C"))
#prod <- expand.grid(sex = c("M", "F"), prod = LETTERS[1:3]) same as above line of code
poin<-rpois(n=6, lambda=4)
poin
## [1] 5 5 1 5 1 5
prod$price <- poin
prod
## sex prod price
## 1 M A 5
## 2 F A 5
## 3 M B 1
## 4 F B 5
## 5 M C 1
## 6 F C 5
library(vcd)
## Loading required package: grid
library(grid)
library(gnm)
library(vcdExtra)
WomenQueue
## nWomen
## 0 1 2 3 4 5 6 7 8 9 10
## 1 3 4 23 25 19 18 5 1 1 0
#lecture#3
Ord_plot(WomenQueue, main = "Queue", gp=gpar(cex=1), pch=16)
Using Ord_plot command from R, we found that the best distribution for our data is Binomal distribution with no. of frequencies=10 and estimated probability =0.53.
data("Hospital", package = "vcd")
str(Hospital)
## table [1:3, 1:3] 43 6 9 16 11 18 3 10 16
## - attr(*, "dimnames")=List of 2
## ..$ Visit frequency: chr [1:3] "Regular" "Less than monthly" "Never"
## ..$ Length of stay : chr [1:3] "2-9" "10-19" "20+"
prop.table (Hospital,1)
## Length of stay
## Visit frequency 2-9 10-19 20+
## Regular 0.6935484 0.2580645 0.0483871
## Less than monthly 0.2222222 0.4074074 0.3703704
## Never 0.2093023 0.4186047 0.3720930
Out results indicate that the frequency of visits is reduced over the length of stay: For people staying 2-9 years, people will get more frequent visitors (69.35%) For people staying 10-19 years, the frequency of regular visits will be very low (25.8%), they may however get visitors in less than a month approx 40% of the time For people staying 20+ years, the frequency of regular visits is reduced greatly (only 4.8%).
chisq.test(Hospital)
##
## Pearson's Chi-squared test
##
## data: Hospital
## X-squared = 35.171, df = 4, p-value = 4.284e-07
Our results indicate that since p <0.05 there is a an association between the frequency of visits and the length of stay.
chisq.test(Hospital, simulate=TRUE)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: Hospital
## X-squared = 35.171, df = NA, p-value = 0.0004998
Our results indicate that since p value is still below 0.05, there is no change in the results.
library(MASS)
H_lm <- loglm(~1+2, data=Hospital, fitted=TRUE)
H_lm
## Call:
## loglm(formula = ~1 + 2, data = Hospital, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 38.35297 4 9.475535e-08
## Pearson 35.17109 4 4.284198e-07
Since p-value for both of the chi square statistics in our model are <0.05 this indicates an association between both the variables.
assocstats(Hospital)
## X^2 df P(> X^2)
## Likelihood Ratio 38.353 4 9.4755e-08
## Pearson 35.171 4 4.2842e-07
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.459
## Cramer's V : 0.365
Cramer’s V is positively related with Chi-square statistics. However the value is only 0.3 which mean there is a weak association. If the Cramer’s V was close to 1 it would indicate a strong association between the variables.
CMHtest(Hospital)
## Cochran-Mantel-Haenszel Statistics for Visit frequency by Length of stay
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 29.138 1 6.7393e-08
## rmeans Row mean scores differ 34.391 2 3.4044e-08
## cmeans Col mean scores differ 29.607 2 3.7233e-07
## general General association 34.905 4 4.8596e-07
Our results indicate that the p-value is still less than 0.05 and both row and column factors are ordered factors so the association between the two variables i.e. frequency of visits and length of stay still holds true.
library(vcd)
mosaic(Hospital, shade=TRUE, labeling=labeling_residuals())
From Mosaic Plot, we find association between the visit and length of stay, especially about the corner effect.
If one stays for a very short time, he can expect to get more frequent visits which is also confirmed by the negative association with short stay and never visited. Similarly, if one stays very long (i.e. 20+), there is a higher higher likelihood of no visitors which is also confirmed by the negative association with long stay and regular visitors.
library(ca)
hosp <- ca(Hospital)
str(hosp)
## List of 16
## $ sv : num [1:2] 0.51615 0.00633
## $ nd : logi NA
## $ rownames : chr [1:3] "Regular" "Less than monthly" "Never"
## $ rowmass : num [1:3] 0.47 0.205 0.326
## $ rowdist : num [1:3] 0.548 0.473 0.494
## $ rowinertia: num [1:3] 0.1412 0.0457 0.0795
## $ rowcoord : num [1:3, 1:2] -1.0624 0.916 0.9568 0.0154 -1.7464 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "Regular" "Less than monthly" "Never"
## .. ..$ : chr [1:2] "Dim1" "Dim2"
## $ rowsup : logi(0)
## $ colnames : chr [1:3] "2-9" "10-19" "20+"
## $ colmass : num [1:3] 0.439 0.341 0.22
## $ coldist : num [1:3] 0.544 0.229 0.734
## $ colinertia: num [1:3] 0.1303 0.0179 0.1183
## $ colcoord : num [1:3, 1:2] -1.055 0.443 1.422 -0.404 1.318 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "2-9" "10-19" "20+"
## .. ..$ : chr [1:2] "Dim1" "Dim2"
## $ colsup : logi(0)
## $ N : num [1:3, 1:3] 43 6 9 16 11 18 3 10 16
## $ call : language ca.matrix(obj = obj)
## - attr(*, "class")= chr "ca"
plot(hosp)
Our correspondence analysis confirms that the patients with shorter stay are more likely to get regular visitors as compared to patients with longer stay of more than 20+ years
Titanic
## , , Age = Child, Survived = No
##
## Sex
## Class Male Female
## 1st 0 0
## 2nd 0 0
## 3rd 35 17
## Crew 0 0
##
## , , Age = Adult, Survived = No
##
## Sex
## Class Male Female
## 1st 118 4
## 2nd 154 13
## 3rd 387 89
## Crew 670 3
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Male Female
## 1st 5 1
## 2nd 11 13
## 3rd 13 14
## Crew 0 0
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Male Female
## 1st 57 140
## 2nd 14 80
## 3rd 75 76
## Crew 192 20
Titanic <- Titanic+0.5
Titanic
## , , Age = Child, Survived = No
##
## Sex
## Class Male Female
## 1st 0.5 0.5
## 2nd 0.5 0.5
## 3rd 35.5 17.5
## Crew 0.5 0.5
##
## , , Age = Adult, Survived = No
##
## Sex
## Class Male Female
## 1st 118.5 4.5
## 2nd 154.5 13.5
## 3rd 387.5 89.5
## Crew 670.5 3.5
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Male Female
## 1st 5.5 1.5
## 2nd 11.5 13.5
## 3rd 13.5 14.5
## Crew 0.5 0.5
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Male Female
## 1st 57.5 140.5
## 2nd 14.5 80.5
## 3rd 75.5 76.5
## Crew 192.5 20.5
#Finding Fit
titanic.loglm1 <- loglm(Freq~ (Age * Sex * Class) + Survived, data= Titanic)
summary(titanic.loglm1)
## Formula:
## Freq ~ (Age * Sex * Class) + Survived
## attr(,"variables")
## list(Freq, Age, Sex, Class, Survived)
## attr(,"factors")
## Age Sex Class Survived Age:Sex Age:Class Sex:Class Age:Sex:Class
## Freq 0 0 0 0 0 0 0 0
## Age 1 0 0 0 1 1 0 1
## Sex 0 1 0 0 1 0 1 1
## Class 0 0 1 0 0 1 1 1
## Survived 0 0 0 1 0 0 0 0
## attr(,"term.labels")
## [1] "Age" "Sex" "Class" "Survived"
## [5] "Age:Sex" "Age:Class" "Sex:Class" "Age:Sex:Class"
## attr(,"order")
## [1] 1 1 1 1 2 2 2 3
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 659.3162 15 0
## Pearson 643.1600 15 0
#Obtaining a Mosaic Plot and interpreting the pattern of residuals
mosaic(titanic.loglm1, main ="model [AGC][S]",labeling = labeling_residuals())
Our mosiac plot shows that overall 1st and 2nd class adult females are most likely to survive, however except for class 1st, males are less likely to survive. Female child in class 2nd is more likely to survive than male child in the same class.
#Finding Fit
(titanic.loglm2 <- loglm(~ (Class * Age * Sex) + Survived*(Class + Age + Sex), data=Titanic))
## Call:
## loglm(formula = ~(Class * Age * Sex) + Survived * (Class + Age +
## Sex), data = Titanic)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 105.5531 10 0
## Pearson 101.4396 10 0
#Obtaining a Mosaic Plot and interpreting the pattern of residuals
mosaic(titanic.loglm2, main ="model[AGC][AS][GS][CS]", labeling = labeling_residuals())
our mosaic plot shows that for class 2nd male child is more likely to survive while in the model above this same was true for female child. Overall, Class 3rd male adult has a higher survival rate and female child has a higher non survival rate. However in this analysis, overall the number of associations have significantly reduced compared the model above without interactions of survival with class, Age and Sex.
#Finding Fit
(titanic.loglm3 <- loglm(~ (Class * Age * Sex) + Survived*(Class + Age * Sex), data=Titanic))
## Call:
## loglm(formula = ~(Class * Age * Sex) + Survived * (Class + Age *
## Sex), data = Titanic)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 85.51508 9 1.287859e-14
## Pearson 78.83273 9 2.755574e-13
summary(titanic.loglm3)
## Formula:
## ~(Class * Age * Sex) + Survived * (Class + Age * Sex)
## attr(,"variables")
## list(Class, Age, Sex, Survived)
## attr(,"factors")
## Class Age Sex Survived Class:Age Class:Sex Age:Sex Class:Survived
## Class 1 0 0 0 1 1 0 1
## Age 0 1 0 0 1 0 1 0
## Sex 0 0 1 0 0 1 1 0
## Survived 0 0 0 1 0 0 0 1
## Age:Survived Sex:Survived Class:Age:Sex Age:Sex:Survived
## Class 0 0 1 0
## Age 1 0 1 1
## Sex 0 1 1 1
## Survived 1 1 0 1
## attr(,"term.labels")
## [1] "Class" "Age" "Sex"
## [4] "Survived" "Class:Age" "Class:Sex"
## [7] "Age:Sex" "Class:Survived" "Age:Survived"
## [10] "Sex:Survived" "Class:Age:Sex" "Age:Sex:Survived"
## attr(,"order")
## [1] 1 1 1 1 2 2 2 2 2 2 3 3
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 0
## attr(,".Environment")
## <environment: R_GlobalEnv>
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 85.51508 9 1.287859e-14
## Pearson 78.83273 9 2.755574e-13
#Obtaining a Mosaic Plot and interpreting the pattern of residuals
mosaic(titanic.loglm3, main="Model [AGC][AS][GS][CS][AGS]")
# compare the models using Anova
anova(titanic.loglm1,titanic.loglm2, titanic.loglm3, test= "Chisq")
## LR tests for hierarchical log-linear models
##
## Model 1:
## Freq ~ (Age * Sex * Class) + Survived
## Model 2:
## ~(Class * Age * Sex) + Survived * (Class + Age + Sex)
## Model 3:
## ~(Class * Age * Sex) + Survived * (Class + Age * Sex)
##
## Deviance df Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1 659.31623 15
## Model 2 105.55308 10 553.76314 5 0e+00
## Model 3 85.51508 9 20.03800 1 1e-05
## Saturated 0.00000 0 85.51508 9 0e+00
After we allowed more interactions between the variables, mosaic plot showed that remaining association reduced a lot.This plot indicates that overall male in 3rd class has higher survival rate overall compared to females. Our model comparison using ANOVA shows that the association between variables reduced as we increased the no. of interactions between different variables.