Exercises 5.1 & 5.9, “Discrete Data Analysis” by Michael Friendly

Exercise 5.1:

The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15-19 charged with a criminal case for whom charges were dropped in Denmark from 1955-1958.

data("criminal",package = "logmult")
str(criminal)
##  table [1:4, 1:5] 141 144 196 212 285 292 380 424 320 342 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Year: chr [1:4] "1955" "1956" "1957" "1958"
##   ..$ Age : chr [1:5] "15" "16" "17" "18" ...

The density of the 20 cell frequencies appears to be skewed to the right, so it is not well described by a normal distribution.

criminal2=as.data.frame(criminal)
plot(density(criminal2$Freq))

A) Use loglm() to test whether there is an association between Year and Age.

Is there evidence that dropping of charges in relation to age changed over the years recorded here?

library(MASS)
loglm(Freq~Age+Year, data = criminal) # fits the log-linear model of independence
## Call:
## loglm(formula = Freq ~ Age + Year, data = criminal)
## 
## Statistics:
##                       X^2 df     P(> X^2)
## Likelihood Ratio 38.24466 12 0.0001400372
## Pearson          38.41033 12 0.0001315495

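Both statistics are highly significant (likelihood ratio = 38.24, Pearson = 38.41, df = 12, p ≈ 0.0001), so the independence model is rejected. There is strong evidence of an association between Year and Age: the age distribution of the men whose charges were dropped changed over the years recorded here.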
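To make the two rows of the loglm() output more concrete, here is a minimal sketch of how the Pearson and likelihood-ratio statistics are computed for an independence model. The table `tab` is made up for illustration and is not the criminal data; the variable names are my own.

```r
# Hypothetical 2 x 3 table of counts (NOT the criminal data).
tab <- matrix(c(20, 30, 50,
                40, 30, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(Row = c("a", "b"), Col = c("x", "y", "z")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square and likelihood-ratio statistics
# (the log form assumes there are no zero cells).
X2 <- sum((tab - expected)^2 / expected)
G2 <- 2 * sum(tab * log(tab / expected))

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Upper-tail p-values from the chi-square distribution.
p_X2 <- pchisq(X2, df, lower.tail = FALSE)
p_G2 <- pchisq(G2, df, lower.tail = FALSE)

print(c(X2 = X2, G2 = G2, df = df, p_X2 = p_X2, p_G2 = p_G2))
```

The same df formula applied to the 4 × 5 criminal table gives (4 - 1) × (5 - 1) = 12, which matches the loglm() output. A small p-value counts as evidence against independence.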

B) Use mosaic() with the option shade=TRUE to display the pattern of signs and magnitudes of the residuals. Compare this with the result of mosaic() using “Friendly shading,” from the option gp=shading_Friendly. Describe verbally what you see in each regarding the pattern of association in this table.

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
mosaic(criminal, shade = TRUE, labeling = labeling_residuals, suppress = 0)

The first mosaic shows that the largest residuals occur at ages 16 and 19 in 1955 and 1958. Let’s look at the second mosaic, which displays the same residuals with Friendly shading.

mosaic(criminal, gp = shading_Friendly,labeling = labeling_residuals,suppress=0)

Residuals are small for the middle years and ages, and the residual for age 17 in 1956 is approximately 0.
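The numbers printed in the mosaic cells are Pearson residuals from the independence model. Below is a minimal sketch of how they are computed, again on a made-up table rather than the criminal data. The cutoff of 2 used to flag cells is an assumption on my part about the usual shading threshold, not something taken from the output above.

```r
# Hypothetical 2 x 3 table of counts (NOT the criminal data).
tab <- matrix(c(10, 30, 60,
                40, 30, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(Row = c("a", "b"), Col = c("x", "y", "z")))

# Expected counts under independence.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residual for each cell: (observed - expected) / sqrt(expected).
# Positive means more cases than independence predicts, negative means fewer.
pearson_res <- (tab - expected) / sqrt(expected)

# Flag cells whose residual exceeds the assumed cutoff in absolute value.
cutoff <- 2  # assumed threshold for a "large" residual
flagged <- abs(pearson_res) > cutoff

print(round(pearson_res, 2))
print(flagged)
```

The squared residuals add up to the Pearson chi-square statistic, so the shaded cells are the ones that contribute most to the lack of fit.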


Exercise 5.9:

Bertin (1983, pp. 30-31) used a 4-way table of frequencies of traffic accident victims in France in 1958 to illustrate his scheme for classifying data sets by numerous variables, each of which could have various types and could be assigned to various visual attributes. His data are contained in Accident in vcdExtra, a frequency data frame representing his 5 × 2 × 4 × 2 table of the variables age, result (died or injured), mode of transportation, and gender.

data("Accident",package = "vcdExtra")
str(Accident)  
## 'data.frame':    80 obs. of  5 variables:
##  $ age   : Ord.factor w/ 5 levels "0-9"<"10-19"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ result: Factor w/ 2 levels "Died","Injured": 1 1 1 1 1 1 1 1 2 2 ...
##  $ mode  : Factor w/ 4 levels "4-Wheeled","Bicycle",..: 4 4 2 2 3 3 1 1 4 4 ...
##  $ gender: Factor w/ 2 levels "Female","Male": 2 1 2 1 2 1 2 1 2 1 ...
##  $ Freq  : int  704 378 396 56 742 78 513 253 5206 5449 ...

The cell frequencies are highly skewed to the right, with the majority of the observations between 0 and 5,000.

plot(density(Accident$Freq))

A) Use loglm() to fit the model of mutual independence, Freq~age+mode+gender+result for this data set.

loglm(Freq~ age+result+mode+gender,data = Accident)
## Call:
## loglm(formula = Freq ~ age + result + mode + gender, data = Accident)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 60320.05 70        0
## Pearson          76865.31 70        0

The mutual independence model fits very poorly: both statistics are enormous relative to their 70 degrees of freedom (p ≈ 0). The model is strongly rejected, which indicates substantial associations among age, result, mode, and gender.
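As a check on how to read this output, the sketch below reproduces the 70 degrees of freedom by counting parameters and recomputes the p-value from the reported likelihood-ratio statistic. The level counts come from the str(Accident) output and the statistic from the loglm() output; the variable names are my own.

```r
# Numbers of levels, taken from the str(Accident) output above.
levels_n <- c(age = 5, result = 2, mode = 4, gender = 2)

n_cells   <- prod(levels_n)         # cells in the 4-way table
n_params  <- 1 + sum(levels_n - 1)  # intercept plus main effects only
df_mutual <- n_cells - n_params     # residual df under mutual independence

# Likelihood-ratio statistic reported by loglm() above.
G2_mutual <- 60320.05

# Upper-tail p-value; it is too small to represent and is returned as 0.
p_value <- pchisq(G2_mutual, df_mutual, lower.tail = FALSE)

print(c(cells = n_cells, df = df_mutual,
        G2_per_df = G2_mutual / df_mutual, p = p_value))
```

A well-fitting model would have a statistic close to its degrees of freedom, so a ratio in the hundreds points to a very poor fit.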

B) Use mosaic() to produce an interpretable mosaic plot of the associations among all variables under the model of mutual independence. Try different orders of the variables in the mosaic. (Hint: the abbreviate component of the labeling_args argument to mosaic() will be useful to avoid some overlap of the category labels.)

mosaic(Freq ~ age + mode + gender + result, data = Accident, shade = TRUE, labeling_args = list(clip = c(result = TRUE)))

mosaic(Freq ~ mode + gender + result + age, data = Accident, shade = TRUE, labeling_args = list(clip = c(result = TRUE)))

mosaic(Freq ~ gender + result + age + mode, data= Accident, shade = TRUE, labeling_args = list(clip = c(result = TRUE)))

All three orderings show that the largest residuals are concentrated among injured motorcycle riders aged 20 to 50. The second ordering highlights the largest residuals most clearly because they are grouped together.

C) Treat result (“Died” vs. “Injured”) as the response variable, and fit the model Freq ~ age*mode*gender + result that asserts independence of result.

Model_Result = loglm(Freq~age * mode * gender + result,data = Accident )
Model_Result
## Call:
## loglm(formula = Freq ~ age * mode * gender + result, data = Accident)
## 
## Statistics:
##                      X^2 df P(> X^2)
## Likelihood Ratio 2217.72 39        0
## Pearson          2347.60 39        0

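The model does not fit the data well: both the likelihood ratio (2217.72) and Pearson (2347.60) statistics are very large for 39 degrees of freedom (p ≈ 0), so result is not independent of age, mode, and gender. It is still a large improvement over the mutual independence model (likelihood ratio 60320.05 on 70 df).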
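Because the mutual independence model is nested in this one, the two fits can be compared by differencing their likelihood-ratio statistics. This is a minimal sketch using only the values reported in the two loglm() outputs; the variable names are my own.

```r
# Likelihood-ratio statistics and df reported by loglm() above.
G2_mutual <- 60320.05; df_mutual <- 70  # Freq ~ age + result + mode + gender
G2_joint  <- 2217.72;  df_joint  <- 39  # Freq ~ age * mode * gender + result

# Residual df of the joint model by counting parameters:
# intercept + all age/mode/gender terms (5 * 4 * 2 - 1) + one result effect.
df_check <- 5 * 2 * 4 * 2 - (1 + (5 * 4 * 2 - 1) + 1)

# Drop in the statistic attributable to the age, mode, and gender associations.
delta_G2 <- G2_mutual - G2_joint
delta_df <- df_mutual - df_joint
p_delta  <- pchisq(delta_G2, delta_df, lower.tail = FALSE)

print(c(df_check = df_check, delta_G2 = delta_G2,
        delta_df = delta_df, p = p_delta))
```

Most of the lack of fit under mutual independence comes from associations among the three predictors. What remains in the 39 df is the association between result and the predictors, which is what part D looks at.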

D) Construct a mosaic display for the residual associations in this model. Which combinations of the predictor factors are more likely to result in death?

mosaic(Model_Result, shade = TRUE, labeling = labeling_residuals, labeling_args = list(clip = c(result = TRUE)))

Male motorcycle riders between 20 and 49 years old appear to be the combination most strongly associated with death.