There are 76 datasets in total across the two packages: vcd contains 33 and vcdExtra contains 43.
#Prep
require(vcdExtra)
## Loading required package: vcdExtra
## Loading required package: vcd
## Loading required package: grid
## Loading required package: gnm
require(vcd)
ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds, vec.len = 2)
## 'data.frame': 76 obs. of 5 variables:
## $ Package: chr "vcd" "vcd" ...
## $ Item : chr "Arthritis" "Baseball" ...
## $ class : chr "data.frame" "data.frame" ...
## $ dim : chr "84x5" "322x25" ...
## $ Title : chr "Arthritis Treatment Data" "Baseball Data" ...
ds <- data.frame(ds)
##Part A.
# Total datasets = 76
length(ds$Title)
## [1] 76
# vcd datasets = 33
#ds[which(ds$Package == "vcd"),]
# vcdExtra datasets = 43
#ds[which(ds$Package == "vcdExtra"),]
The table below breaks the datasets down by package and class.
table(ds$Package,ds$class)
##
## array data.frame matrix table
## vcd 1 17 0 15
## vcdExtra 3 24 1 15
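As a quick consistency check on the counts in the opening sentence, the sketch below re-enters the package-by-class counts printed above by hand (so it runs without loading either package) and totals them by package. The object names `classes` and `pkg_totals` are illustrative.

```r
# Counts re-entered from the table(ds$Package, ds$class) output above
classes <- as.table(matrix(c(1, 17, 0, 15,
                             3, 24, 1, 15),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Package = c("vcd", "vcdExtra"),
                                           class = c("array", "data.frame",
                                                     "matrix", "table"))))
# Row totals give the number of datasets in each package
pkg_totals <- margin.table(classes, 1)
pkg_totals
# Grand total across both packages
sum(classes)
```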
Of the two datasets I looked at (BrokenMarriage and Abortion), the Abortion dataset seemed a bit more interesting. The group most supportive of abortion appears to be females with low socioeconomic status: 171 of 250 (about 68%) answered Yes.
?BrokenMarriage
example("BrokenMarriage")
##
## BrknMr> data("BrokenMarriage")
##
## BrknMr> structable(~ ., data = BrokenMarriage)
## rank I II III IV V
## gender broken
## male yes 14 39 42 79 66
## no 102 151 292 293 261
## female yes 12 23 37 102 58
## no 25 79 151 557 321
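The structable above is easier to compare across groups as proportions. A minimal sketch, re-entering the printed counts by hand rather than loading vcdExtra (the names `yes`, `no` and `prop_broken` are illustrative), computes the share of marriages in each gender and rank cell that were broken.

```r
# Counts re-entered from the structable(~ ., data = BrokenMarriage) output above
rank_levels <- c("I", "II", "III", "IV", "V")
yes <- rbind(male   = c(14, 39, 42, 79, 66),
             female = c(12, 23, 37, 102, 58))
no  <- rbind(male   = c(102, 151, 292, 293, 261),
             female = c(25, 79, 151, 557, 321))
colnames(yes) <- colnames(no) <- rank_levels
# Proportion of marriages that were broken within each gender x rank cell
prop_broken <- yes / (yes + no)
round(prop_broken, 3)
```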
?Abortion
example("Abortion")
##
## Abortn> data(Abortion)
##
## Abortn> # example goes here
## Abortn> ftable(Abortion)
## Support_Abortion Yes No
## Sex Status
## Female Lo 171 79
## Hi 138 112
## Male Lo 152 148
## Hi 167 133
##
## Abortn> mosaic(Abortion, shade=TRUE)
##
## Abortn> # stratified by Sex
## Abortn> fourfold(aperm(Abortion, 3:1))
##
## Abortn> # stratified by Status
## Abortn> fourfold(aperm(Abortion, c(3,1,2)))
Abortion Data
Response Variable: Support_Abortion
Explanatory Variables: Sex, Status
Unordered - Sex, Support_Abortion
Ordered - Status (although with only two categories in this data, treating it as ordered may be unnecessary)
Some questions I have about the Abortion dataset: Which groups of individuals support abortion? How does wealth (socioeconomic status) tie into support? Age, which is not recorded here, would be an interesting additional variable for drawing further conclusions.
str(Abortion)
## 'table' num [1:2, 1:2, 1:2] 171 152 138 167 79 148 112 133
## - attr(*, "dimnames")=List of 3
## ..$ Sex : chr [1:2] "Female" "Male"
## ..$ Status : chr [1:2] "Lo" "Hi"
## ..$ Support_Abortion: chr [1:2] "Yes" "No"
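To answer the first question directly, the sketch below rebuilds the Abortion table from the counts printed above, in the same element order that `str(Abortion)` shows (Sex varies fastest, then Status, then Support_Abortion), and computes the proportion answering Yes within each Sex x Status group. The names `abortion` and `support` are illustrative.

```r
# Values in the order shown by str(Abortion): Sex fastest, then Status, then Support
abortion <- as.table(array(c(171, 152, 138, 167,   # Support_Abortion = Yes
                             79, 148, 112, 133),   # Support_Abortion = No
                           dim = c(2, 2, 2),
                           dimnames = list(Sex = c("Female", "Male"),
                                           Status = c("Lo", "Hi"),
                                           Support_Abortion = c("Yes", "No"))))
# Divide each cell by its Sex x Status total, then keep the "Yes" slice
support <- prop.table(abortion, margin = c(1, 2))[, , "Yes"]
round(support, 3)
```

With these counts the Female/Lo group has the highest share of Yes responses (171 of 250, 0.684), which matches the observation made earlier about this dataset.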
Caesarian Births
Response Variable: Infection
Explanatory Variables: Risk, Antibiotics, Planned
Unordered - Risk, Antibiotics, Planned, Infection
Ordered - None
Some questions I have about this dataset: How many planned C-sections result in an infection? Could the risk information help predict infection, or whether an unplanned (emergency) C-section was needed?
str(Caesar)
## 'table' num [1:3, 1:2, 1:2, 1:2] 0 1 17 0 1 1 11 17 30 4 ...
## - attr(*, "dimnames")=List of 4
## ..$ Infection : chr [1:3] "Type 1" "Type 2" "None"
## ..$ Risk : chr [1:2] "Yes" "No"
## ..$ Antibiotics: chr [1:2] "Yes" "No"
## ..$ Planned : chr [1:2] "Yes" "No"
DaytonSurvey
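Response Variables: Cigarette, Alcohol, Marijuana
Explanatory Variables: Sex, Race
Unordered - Cigarette, Alcohol, Marijuana, Sex, Race
Ordered - None (Freq is not a variable: the data frame is in frequency form, and Freq counts the students in each combination of the five factors)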
One question I have about this dataset is the validity of the answers: I wonder whether the students feared retaliation for reporting substance use. I would also ask how race and sex affect substance use. Additionally, I would like to know how often the students used these substances, which the Yes/No responses do not capture.
str(DaytonSurvey)
## 'data.frame': 32 obs. of 6 variables:
## $ cigarette: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 2 1 2 ...
## $ alcohol : Factor w/ 2 levels "Yes","No": 1 1 2 2 1 1 2 2 1 1 ...
## $ marijuana: Factor w/ 2 levels "Yes","No": 1 1 1 1 2 2 2 2 1 1 ...
## $ sex : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 2 2 ...
## $ race : Factor w/ 2 levels "white","other": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq : num 405 13 1 1 268 218 17 117 453 28 ...
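Because DaytonSurvey is stored in frequency form, `xtabs()` can use `Freq` as the cell weight to build an ordinary contingency table. The sketch below assumes vcdExtra is installed (it is loaded at the top of this document); choosing marijuana, sex and race as the three margins is only one illustration, and `dayton_tab` is an illustrative name.

```r
# Assumes vcdExtra is installed; DaytonSurvey is a frequency-form data frame
data("DaytonSurvey", package = "vcdExtra")
# Freq is the number of students in each row's combination of factors,
# so it is supplied as the weight on the left-hand side of the formula
dayton_tab <- xtabs(Freq ~ marijuana + sex + race, data = DaytonSurvey)
ftable(dayton_tab)
# Proportion answering "Yes" to marijuana within each sex x race group
prop.table(dayton_tab, margin = c(2, 3))["Yes", , ]
```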
Minnesota High School Graduates
Response Variable: Status
Explanatory Variables: Rank, Occupation, Sex
Unordered - Sex, Status
Ordered - Occupation, Rank
I would be interested to see whether class rank predicts status well. I don't fully understand the Occupation variable (its levels are coded 1 through 7, so what do high and low values mean?), but that would be an interesting analysis as well.
str(Hoyt)
## 'table' num [1:4, 1:3, 1:7, 1:2] 87 3 17 105 216 4 14 118 256 2 ...
## - attr(*, "dimnames")=List of 4
## ..$ Status : chr [1:4] "College" "School" "Job" "Other"
## ..$ Rank : chr [1:3] "Low" "Middle" "High"
## ..$ Occupation: chr [1:7] "1" "2" "3" "4" ...
## ..$ Sex : chr [1:2] "Male" "Female"
head(UCBAdmissions)
## [1] 512 313 89 19 353 207
There are 4526 cases contained in this table.
sum(UCBAdmissions)
## [1] 4526
Number of applicants by department:
Department A - 933
Department B - 585
Department C - 918
Department D - 792
Department E - 584
Department F - 714
margin.table(UCBAdmissions,margin=3)
## Dept
## A B C D E F
## 933 585 918 792 584 714
The admission rates by department are as follows:
Department A - 64%
Department B - 63%
Department C - 35%
Department D - 34%
Department E - 25%
Department F - 6%
# Joint proportions of all applicants by admission status and department (each cell / 4526)
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept"))
## Dept A B C D E F
## Admit
## Admitted 0.13278833 0.08174989 0.07114450 0.05943438 0.03247901 0.01016350
## Rejected 0.07335395 0.04750331 0.13168361 0.11555457 0.09655325 0.14759169
#Department A
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,1])
## [1] 0.6441586 0.3558414
#Department B
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,2])
## [1] 0.6324786 0.3675214
#Department C
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,3])
## [1] 0.3507625 0.6492375
#Department D
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,4])
## [1] 0.3396465 0.6603535
#Department E
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,5])
## [1] 0.2517123 0.7482877
#Department F
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,6])
## [1] 0.06442577 0.93557423
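The six per-department calls above can be collapsed into one. A minimal sketch using base R only (UCBAdmissions ships with the datasets package): collapse over Gender with `margin.table()`, then take column-wise proportions with `prop.table(..., margin = 2)`. The names `admit_by_dept` and `dept_rates` are illustrative.

```r
# UCBAdmissions dimensions are Admit x Gender x Dept
admit_by_dept <- margin.table(UCBAdmissions, margin = c(1, 3))  # sum over Gender
# Proportions within each department (each column sums to 1)
dept_rates <- prop.table(admit_by_dept, margin = 2)
round(dept_rates["Admitted", ], 2)
```

The rounded "Admitted" row reproduces the department percentages listed above (A 0.64 through F 0.06).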
The proportions of male and female applicants admitted by each department are shown in the table below.
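# Reorder dimensions to Dept x Gender x Admit
table_prep <- aperm(UCBAdmissions, c(3, 2, 1))
table_admit <- table_prep[, , "Admitted"]
table_rej <- table_prep[, , "Rejected"]
# Admission rate = admitted / (admitted + rejected) for each Dept x Gender cell
table_prop <- table_admit / (table_admit + table_rej)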
table_prop
## Gender
## Dept Male Female
## A 0.62060606 0.82407407
## B 0.63035714 0.68000000
## C 0.36923077 0.34064081
## D 0.33093525 0.34933333
## E 0.27748691 0.23918575
## F 0.05898123 0.07038123
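The same Dept x Gender admission rates can be obtained in a single step by asking `prop.table()` for proportions within each Gender x Dept combination and keeping the "Admitted" slice. This is a sketch in base R; `gender_dept` is an illustrative name, and the transpose only puts departments in the rows to match `table_prop`.

```r
# Proportions over Admit within each Gender x Dept cell, then keep "Admitted"
gender_dept <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]
# Transpose so rows are departments and columns are genders
round(t(gender_dept), 4)
```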
?UKSoccer
data("UKSoccer",package="vcd")
ftable(UKSoccer)
## Away 0 1 2 3 4
## Home
## 0 27 29 10 8 2
## 1 59 53 14 12 4
## 2 28 32 14 12 4
## 3 19 14 7 4 1
## 4 7 8 10 2 0
The total number of games is 380.
sum(UKSoccer)
## [1] 380
The table below adds marginal totals: the number of games in which the home team (rows) or the away team (columns) scored each number of goals.
addmargins(UKSoccer)
## Away
## Home 0 1 2 3 4 Sum
## 0 27 29 10 8 2 76
## 1 59 53 14 12 4 142
## 2 28 32 14 12 4 90
## 3 19 14 7 4 1 45
## 4 7 8 10 2 0 27
## Sum 140 136 55 38 11 380
Below, the cell proportions (out of the 380 games) are listed along with their marginal totals. Proportions are taken before adding the margins so that the Sum row and column are not counted in the denominator.
round(addmargins(prop.table(UKSoccer)), 3)
## Away
## Home 0 1 2 3 4 Sum
## 0 0.071 0.076 0.026 0.021 0.005 0.200
## 1 0.155 0.139 0.037 0.032 0.011 0.374
## 2 0.074 0.084 0.037 0.032 0.011 0.237
## 3 0.050 0.037 0.018 0.011 0.003 0.118
## 4 0.018 0.021 0.026 0.005 0.000 0.071
## Sum 0.368 0.358 0.145 0.100 0.029 1.000
From the mosaic plot, it looks like home teams score more goals, and win more games, than away teams.
The marginal totals point the same way: for each goal count from 1 to 4, more games had the home team scoring exactly that many goals than the away team, whereas the away team was held scoreless in 140 games compared with 76 for the home team.
plot(prop.table(UKSoccer))
mosaic(UKSoccer, gp = shading_max, main = "UK Soccer Scores")
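The claim that home teams win more often can be checked directly from the cell counts. The sketch below re-enters the `ftable(UKSoccer)` counts printed earlier (so it runs without vcd), classifies each cell as a home win, draw or away win by comparing home and away goals, and computes mean goals per game for each side from the marginal totals. The names `soccer`, `results` and `goal_means` are illustrative.

```r
# Counts re-entered from ftable(UKSoccer): rows = Home goals, columns = Away goals
goals <- 0:4
soccer <- as.table(matrix(c(27, 29, 10,  8, 2,
                            59, 53, 14, 12, 4,
                            28, 32, 14, 12, 4,
                            19, 14,  7,  4, 1,
                             7,  8, 10,  2, 0),
                          nrow = 5, byrow = TRUE,
                          dimnames = list(Home = as.character(goals),
                                          Away = as.character(goals))))
# Goals scored by each side in every cell (row/column index minus 1)
home_goals <- row(soccer) - 1
away_goals <- col(soccer) - 1
results <- c(home_win = sum(soccer[home_goals > away_goals]),
             draw     = sum(soccer[home_goals == away_goals]),
             away_win = sum(soccer[home_goals < away_goals]))
results
# Mean goals per game for each side, weighted by the marginal totals
goal_means <- c(home = sum(goals * margin.table(soccer, 1)),
                away = sum(goals * margin.table(soccer, 2))) / sum(soccer)
round(goal_means, 3)
```

With these counts the home side wins 186 of the 380 games, 98 are draws, and the away side wins 96.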