STAT ID HW 2

Exercise 2.1

Part A.

The two packages contain 76 datasets in total: vcd contains 33 and vcdExtra contains 43.

#Prep
require(vcdExtra)

## Loading required package: vcdExtra

## Loading required package: vcd

## Loading required package: grid

## Loading required package: gnm

require(vcd)
ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds, vec.len = 2)

## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" ...
##  $ class  : chr  "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" ...

ds <- data.frame(ds)

# Part A

# Total datasets = 76
length(ds$Title)

## [1] 76

# vcd datasets = 33
# ds[which(ds$Package == "vcd"), ]

# vcdExtra datasets = 43
# ds[which(ds$Package == "vcdExtra"), ]

Part B.

The table is shown below.

table(ds$Package,ds$class)

##           
##            array data.frame matrix table
##   vcd          1         17      0    15
##   vcdExtra     3         24      1    15
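
As a cross-check, the row sums of this table should reproduce the per-package counts from Part A. The following is a minimal base-R sketch that re-enters the printed counts by hand; the object names `class_counts` and `pkg_totals` are my own and do not come from either package.

```r
# Counts copied from the table(ds$Package, ds$class) output above
class_counts <- matrix(
  c(1, 17, 0, 15,
    3, 24, 1, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(Package = c("vcd", "vcdExtra"),
                  class = c("array", "data.frame", "matrix", "table"))
)

pkg_totals <- rowSums(class_counts)  # datasets per package
pkg_totals
sum(pkg_totals)                      # datasets in both packages combined
```

With the packages loaded, `rowSums(table(ds$Package, ds$class))` should give the same totals without retyping the counts.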

Part C.

Of the two datasets I looked at, the Abortion dataset seemed a bit more interesting. The largest group supporting abortion appears to be females with lower socioeconomic status (171 of 250, about 68%).

?BrokenMarriage
example("BrokenMarriage")

## 
## BrknMr> data("BrokenMarriage")
## 
## BrknMr> structable(~ ., data = BrokenMarriage)
##               rank   I  II III  IV   V
## gender broken                         
## male   yes          14  39  42  79  66
##        no          102 151 292 293 261
## female yes          12  23  37 102  58
##        no           25  79 151 557 321

?Abortion
example("Abortion")

## 
## Abortn> data(Abortion)
## 
## Abortn> # example goes here
## Abortn> ftable(Abortion)
##               Support_Abortion Yes  No
## Sex    Status                         
## Female Lo                      171  79
##        Hi                      138 112
## Male   Lo                      152 148
##        Hi                      167 133
## 
## Abortn> mosaic(Abortion, shade=TRUE)

## 
## Abortn> # stratified by Sex
## Abortn> fourfold(aperm(Abortion, 3:1))

## 
## Abortn> # stratified by Status
## Abortn> fourfold(aperm(Abortion, c(3,1,2)))
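
To compare groups on the same footing, the counts can be converted to support rates within each Sex by Status group. This is a minimal base-R sketch that rebuilds the table from the `ftable(Abortion)` output above; the names `abortion` and `support_rate` are my own.

```r
# Counts copied from the ftable(Abortion) output above
abortion <- array(
  c(171, 152, 138, 167,   # Support_Abortion = Yes
     79, 148, 112, 133),  # Support_Abortion = No
  dim = c(2, 2, 2),
  dimnames = list(Sex = c("Female", "Male"),
                  Status = c("Lo", "Hi"),
                  Support_Abortion = c("Yes", "No"))
)

# Proportions within each Sex x Status group, then keep the "Yes" slice
support_rate <- prop.table(abortion, margin = c(1, 2))[, , "Yes"]
round(support_rate, 3)
```

With vcdExtra loaded, the same call should work directly on `Abortion` in place of the hand-built array.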

Exercise 2.2

Part A.

Abortion Data

Response Variable: Support_Abortion
Explanatory Variables: Sex, Status

Unordered - Sex, Support_Abortion
Ordered - Status (though there are only two categories in this data, so the ordering may be unnecessary)

Some questions I have about the Abortion dataset: Which groups of individuals support abortion? How does wealth relate to support? Age would also be an interesting variable to have, since it could support further conclusions.

str(Abortion)

##  'table' num [1:2, 1:2, 1:2] 171 152 138 167 79 148 112 133
##  - attr(*, "dimnames")=List of 3
##   ..$ Sex             : chr [1:2] "Female" "Male"
##   ..$ Status          : chr [1:2] "Lo" "Hi"
##   ..$ Support_Abortion: chr [1:2] "Yes" "No"

Part B.

Caesarian Births

Response Variable: Infection
Explanatory Variables: Risk, Antibiotics, Planned

Unordered - Risk, Antibiotics, Planned, Infection
Ordered - None

Some questions I have about this dataset: How many planned C-sections result in infections? Could information about risk help predict infection, or whether an emergency C-section was needed?

str(Caesar)

##  'table' num [1:3, 1:2, 1:2, 1:2] 0 1 17 0 1 1 11 17 30 4 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Infection  : chr [1:3] "Type 1" "Type 2" "None"
##   ..$ Risk       : chr [1:2] "Yes" "No"
##   ..$ Antibiotics: chr [1:2] "Yes" "No"
##   ..$ Planned    : chr [1:2] "Yes" "No"

Part C.

DaytonSurvey

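Response Variables: cigarette, alcohol, marijuana
Explanatory Variables: sex, race

Unordered - cigarette, alcohol, marijuana, sex, race
Ordered - None (Freq holds the count of students in each combination of the other variables, so it is not a categorical variable)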

One question I have about this dataset is the validity of the answers; I wonder whether the students feared retaliation. I would also ask about the impact of race and sex on substance use, and about how often the students have used these substances.

str(DaytonSurvey)

## 'data.frame':    32 obs. of  6 variables:
##  $ cigarette: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 2 1 2 ...
##  $ alcohol  : Factor w/ 2 levels "Yes","No": 1 1 2 2 1 1 2 2 1 1 ...
##  $ marijuana: Factor w/ 2 levels "Yes","No": 1 1 1 1 2 2 2 2 1 1 ...
##  $ sex      : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 2 2 ...
##  $ race     : Factor w/ 2 levels "white","other": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq     : num  405 13 1 1 268 218 17 117 453 28 ...
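
The `str` output shows that this data frame is in frequency form: 32 rows, one per combination of the five two-level factors, with the number of students in `Freq`. The sketch below shows how such data can be collapsed into a table with `xtabs`. It rebuilds only the first eight rows (sex = female, race = white), which are the rows fully visible in the `str` output; the name `dayton_fw` is my own.

```r
# First eight rows of DaytonSurvey (sex = female, race = white),
# rebuilt from the factor codes and Freq values shown by str() above
dayton_fw <- expand.grid(
  cigarette = c("Yes", "No"),
  alcohol   = c("Yes", "No"),
  marijuana = c("Yes", "No")
)
dayton_fw$Freq <- c(405, 13, 1, 1, 268, 218, 17, 117)

# Freq is used as the cell count; the factors define the table dimensions
cig_by_mj <- xtabs(Freq ~ cigarette + marijuana, data = dayton_fw)
cig_by_mj
sum(cig_by_mj)  # number of students represented by these eight rows
```

With vcdExtra loaded, `xtabs(Freq ~ cigarette + marijuana, data = DaytonSurvey)` should give the corresponding table for all 32 rows.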

Part D.

Minnesota High School Graduates

Response Variable: Status
Explanatory Variables: Rank, Occupation, Sex

Unordered - Sex, Status
Ordered - Occupation, Rank

I would be interested to see if class rank predicts status well. I don’t really understand what the occupation variable is (what does high vs low mean?), but that would be an interesting analysis as well.

str(Hoyt)

##  'table' num [1:4, 1:3, 1:7, 1:2] 87 3 17 105 216 4 14 118 256 2 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Status    : chr [1:4] "College" "School" "Job" "Other"
##   ..$ Rank      : chr [1:3] "Low" "Middle" "High"
##   ..$ Occupation: chr [1:7] "1" "2" "3" "4" ...
##   ..$ Sex       : chr [1:2] "Male" "Female"

Exercise 2.3

head(UCBAdmissions)

## [1] 512 313  89  19 353 207

Part A.

There are 4526 cases contained in this table.

sum(UCBAdmissions)

## [1] 4526

Part B.

Number of applicants by department:

Department A - 933
Department B - 585
Department C - 918
Department D - 792
Department E - 584
Department F - 714

margin.table(UCBAdmissions,margin=3)

## Dept
##   A   B   C   D   E   F 
## 933 585 918 792 584 714

Part C.

The admission rates by department are as follows:

Department A - 64%
Department B - 63%
Department C - 35%
Department D - 34%
Department E - 25%
Department F - 6%

# Joint proportions of all applicants by admission status and department
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept"))

##          Dept          A          B          C          D          E          F
## Admit                                                                          
## Admitted      0.13278833 0.08174989 0.07114450 0.05943438 0.03247901 0.01016350
## Rejected      0.07335395 0.04750331 0.13168361 0.11555457 0.09655325 0.14759169

#Department A
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,1])

## [1] 0.6441586 0.3558414

#Department B
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,2])

## [1] 0.6324786 0.3675214

#Department C
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,3])

## [1] 0.3507625 0.6492375

#Department D
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,4])

## [1] 0.3396465 0.6603535

#Department E
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,5])

## [1] 0.2517123 0.7482877

#Department F
prop.table(ftable(UCBAdmissions,row.vars = "Admit",col.vars = "Dept")[,6])

## [1] 0.06442577 0.93557423
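
The six per-department calls above can also be collapsed into a single step. This is a minimal sketch using the built-in `UCBAdmissions` table: sum over Gender to get an Admit by Dept table, then take proportions within each department column. The names `dept_admit` and `admit_rate` are my own.

```r
# Collapse over Gender: keep dimensions 1 (Admit) and 3 (Dept)
dept_admit <- margin.table(UCBAdmissions, margin = c(1, 3))

# Proportions within each department (each column sums to 1)
admit_rate <- prop.table(dept_admit, margin = 2)
round(admit_rate["Admitted", ], 2)
```

Using `margin = 2` is what turns the joint proportions into per-department admission rates.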

Part D.

The proportions of male and female applicants admitted by department are shown in the table below.

table_prep<- aperm(UCBAdmissions,c(3,2,1))
table_admit <- table_prep[,,"Admitted"]
table_rej <- table_prep[,,"Rejected"]
table_prop <- table_admit/(table_admit+table_rej)
table_prop

##     Gender
## Dept       Male     Female
##    A 0.62060606 0.82407407
##    B 0.63035714 0.68000000
##    C 0.36923077 0.34064081
##    D 0.33093525 0.34933333
##    E 0.27748691 0.23918575
##    F 0.05898123 0.07038123
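
An alternative route to the same table is to let `prop.table` do the division within each Gender by Dept group. This is a minimal sketch on the built-in `UCBAdmissions` table; the names `within_group` and `admit_by_gender` are my own.

```r
# Proportions within each Gender x Dept group (dimensions 2 and 3)
within_group <- prop.table(UCBAdmissions, margin = c(2, 3))

# Keep the "Admitted" slice and transpose to Dept x Gender
admit_by_gender <- t(within_group["Admitted", , ])
round(admit_by_gender, 3)
```

This avoids building separate admitted and rejected tables, at the cost of a slightly less explicit calculation.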

Exercise 2.5

?UKSoccer
data("UKSoccer",package="vcd")
ftable(UKSoccer)

##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0

Part A.

The total number of games is 380.

sum(UKSoccer)

## [1] 380

Part B.

The marginal totals of the goals scored are shown in the table below.

addmargins(UKSoccer)

##      Away
## Home    0   1   2   3   4 Sum
##   0    27  29  10   8   2  76
##   1    59  53  14  12   4 142
##   2    28  32  14  12   4  90
##   3    19  14   7   4   1  45
##   4     7   8  10   2   0  27
##   Sum 140 136  55  38  11 380

Part C.

Below, all of the marginal totals are listed as proportions of the 380 games (the Sum row and the Sum column).

addmargins(prop.table(UKSoccer))

##      Away
## Home            0           1           2           3           4         Sum
##   0   0.071052632 0.076315789 0.026315789 0.021052632 0.005263158 0.200000000
##   1   0.155263158 0.139473684 0.036842105 0.031578947 0.010526316 0.373684211
##   2   0.073684211 0.084210526 0.036842105 0.031578947 0.010526316 0.236842105
##   3   0.050000000 0.036842105 0.018421053 0.010526316 0.002631579 0.118421053
##   4   0.018421053 0.021052632 0.026315789 0.005263158 0.000000000 0.071052632
##   Sum 0.368421053 0.357894737 0.144736842 0.100000000 0.028947368 1.000000000
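
Note that the order of `prop.table()` and `addmargins()` matters. Adding the margins first counts every game four times (cell, row sum, column sum, grand total), so normalizing afterwards divides by 1520 rather than 380. This is a minimal base-R sketch that rebuilds the table from the counts printed earlier; the names `uk`, `margins_first`, and `props_first` are my own.

```r
# Counts copied from the ftable(UKSoccer) output above
uk <- matrix(
  c(27, 29, 10,  8, 2,
    59, 53, 14, 12, 4,
    28, 32, 14, 12, 4,
    19, 14,  7,  4, 1,
     7,  8, 10,  2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(Home = as.character(0:4), Away = as.character(0:4))
)

margins_first <- prop.table(addmargins(uk))  # divides by 4 * 380
props_first   <- addmargins(prop.table(uk))  # divides by 380, then adds margins

margins_first["Sum", "Sum"]  # grand-total cell when margins are added first
props_first["Sum", "Sum"]    # grand-total cell when proportions are taken first

props_first[, "Sum"]  # marginal proportions of home goals
props_first["Sum", ]  # marginal proportions of away goals
```

Only the second ordering gives marginal proportions that sum to 1 along each margin.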

Part D.

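From the mosaic plot, it looks like home teams generally score more than away teams: every goal count from 1 to 4 occurs more often for home teams than for away teams, with the exception of 0 goals.

The marginal totals show the same pattern. The home totals are higher than the away totals for 1 to 4 goals (142 vs. 136, 90 vs. 55, 45 vs. 38, 27 vs. 11), while 0 goals is more common for away teams (140 vs. 76).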

plot(prop.table(UKSoccer))

mosaic(UKSoccer, gp = shading_max, main = "UK Soccer Scores")
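
The table records goals scored rather than results, but outcomes can be tallied from it: cells below the diagonal (home goals exceed away goals) are home wins, the diagonal holds draws, and cells above it are away wins. This is a minimal base-R sketch that rebuilds the table from the counts printed earlier; the names `uk`, `home_wins`, `draws`, and `away_wins` are my own.

```r
# Counts copied from the ftable(UKSoccer) output above
uk <- matrix(
  c(27, 29, 10,  8, 2,
    59, 53, 14, 12, 4,
    28, 32, 14, 12, 4,
    19, 14,  7,  4, 1,
     7,  8, 10,  2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(Home = as.character(0:4), Away = as.character(0:4))
)

# Rows index home goals and columns index away goals
home_wins <- sum(uk[row(uk) > col(uk)])   # home goals > away goals
draws     <- sum(diag(uk))                # equal goals
away_wins <- sum(uk[row(uk) < col(uk)])   # home goals < away goals

c(home = home_wins, draw = draws, away = away_wins)
```

The three tallies should add up to the 380 games in the table.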

Kajal Chokshi

5/27/2019
