For this assignment, you will work through questions 2.1-2.6 and 2.8 from Chapter 2 (pp. 60-63). Use the template below to answer the questions. If a problem requires R code, place the code in the code chunks below. For responses requiring text-based answers, type your answer below each code chunk.

This assignment is worth 100 points. There are 25 problems, each worth 4 points.

Submission Instructions: Save the HTML file as ‘R_Cat_Lab_yourlastname.HTML’ and upload it to the assignment entitled ‘R Categorical Lab’ on Moodle on or before Monday, May 20, 2019, by 11:55 p.m. EST. Late assignments are not accepted.

2.1 pp. 60-61

Run the code chunk below.

library(vcdExtra)

## Warning: package 'vcdExtra' was built under R version 3.5.3

## Loading required package: vcd

## Warning: package 'vcd' was built under R version 3.5.3

## Loading required package: grid

## Loading required package: gnm

## Warning: package 'gnm' was built under R version 3.5.3

ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds, vec.len=2)

## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" ...
##  $ class  : chr  "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" ...

How many data sets are there altogether? How many are there in each package?

ds=datasets(package=c("vcd", "vcdExtra"))
nrow(ds)

## [1] 76

table(ds$Package)

## 
##      vcd vcdExtra 
##       33       43

Make a tabular display of the frequencies by Package and class.

table(ds$Package, ds$class)

##           
##            array data.frame matrix table
##   vcd          1         17      0    15
##   vcdExtra     3         24      1    15

Choose one or two data sets from this list, and examine their help files (e.g., help(Arthritis) or ?Arthritis). You can use, e.g., example(Arthritis) to run the R code for a given example.

?Arthritis

## starting httpd help server ... done

?BrokenMarriage
# example(Arthritis)
# example(BrokenMarriage)

2.2 p. 61

Place your written responses below each code chunk.

Abortion opinion data: Abortion

library(vcdExtra)
data(Abortion, package="vcdExtra")
str(Abortion)

##  'table' num [1:2, 1:2, 1:2] 171 152 138 167 79 148 112 133
##  - attr(*, "dimnames")=List of 3
##   ..$ Sex             : chr [1:2] "Female" "Male"
##   ..$ Status          : chr [1:2] "Lo" "Hi"
##   ..$ Support_Abortion: chr [1:2] "Yes" "No"

Caesarian Births: Caesar

library(vcdExtra)
data(Caesar, package="vcdExtra")
str(Caesar)

##  'table' num [1:3, 1:2, 1:2, 1:2] 0 1 17 0 1 1 11 17 30 4 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Infection  : chr [1:3] "Type 1" "Type 2" "None"
##   ..$ Risk       : chr [1:2] "Yes" "No"
##   ..$ Antibiotics: chr [1:2] "Yes" "No"
##   ..$ Planned    : chr [1:2] "Yes" "No"

Dayton Survey: DaytonSurvey

library(vcdExtra)
data(DaytonSurvey, package="vcdExtra")
str(DaytonSurvey)

## 'data.frame':    32 obs. of  6 variables:
##  $ cigarette: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 2 1 2 ...
##  $ alcohol  : Factor w/ 2 levels "Yes","No": 1 1 2 2 1 1 2 2 1 1 ...
##  $ marijuana: Factor w/ 2 levels "Yes","No": 1 1 1 1 2 2 2 2 1 1 ...
##  $ sex      : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 2 2 ...
##  $ race     : Factor w/ 2 levels "white","other": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq     : num  405 13 1 1 268 218 17 117 453 28 ...

Minnesota High School Graduates: Hoyt

library(vcdExtra)
data(Hoyt, package="vcdExtra")
str(Hoyt)

##  'table' num [1:4, 1:3, 1:7, 1:2] 87 3 17 105 216 4 14 118 256 2 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Status    : chr [1:4] "College" "School" "Job" "Other"
##   ..$ Rank      : chr [1:3] "Low" "Middle" "High"
##   ..$ Occupation: chr [1:7] "1" "2" "3" "4" ...
##   ..$ Sex       : chr [1:2] "Male" "Female"

2.3 p. 61

Find the total number of cases contained in this table.

a=UCBAdmissions

summary(a)

## Number of cases in table: 4526 
## Number of factors: 3 
## Test for independence of all factors:
##  Chisq = 2000.3, df = 16, p-value = 0

For each department, find the total number of applicants.

colSums(a, dims=2)

##   A   B   C   D   E   F 
## 933 585 918 792 584 714
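
The same department totals can be cross-checked with margin.table(), which sums over every dimension except the one named. This is a minimal sketch using the built-in UCBAdmissions table; the name dept_totals is introduced here for illustration only.

```r
# Sum over Admit and Gender, keeping only the third dimension (Dept)
dept_totals <- margin.table(UCBAdmissions, 3)
dept_totals

# The grand total should agree with the case count reported by summary()
sum(UCBAdmissions)
```

Both approaches collapse the same two dimensions, so they should agree with the colSums(a, dims=2) result above.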

For each department, find the overall proportion of applicants who were admitted.

admit_by_dept <- prop.table(margin.table(a, c(1, 3)), 2)
admit_by_dept

##           Dept
## Admit               A          B          C          D          E
##   Admitted 0.64415863 0.63247863 0.35076253 0.33964646 0.25171233
##   Rejected 0.35584137 0.36752137 0.64923747 0.66035354 0.74828767
##           Dept
## Admit               F
##   Admitted 0.06442577
##   Rejected 0.93557423

Construct a tabular display of department (rows) and gender (columns), showing the proportion of applicants in each cell who were admitted relative to the total applicants in that cell.

d=ftable(prop.table(a,c(2,3)),row.vars="Dept",col.vars=c("Gender","Admit"))
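
The display asked for here needs only the Admitted proportions, with Dept in rows and Gender in columns. One possible way to get it is sketched below, assuming the built-in UCBAdmissions table; the names p and admitted are introduced for illustration only.

```r
# Proportions within each Gender-by-Dept cell (margins 2 and 3),
# so Admitted + Rejected sum to 1 for every gender/department pair
p <- prop.table(UCBAdmissions, margin = c(2, 3))

# Keep only the Admitted slice (Gender x Dept) and transpose it
# so that departments are rows and genders are columns
admitted <- t(p["Admitted", , ])
round(admitted, 3)
```

Each entry is the number admitted divided by the number of applicants of that gender in that department.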

2.4 p. 60

Find the total number of cases represented in this table.

data("DanishWelfare", package="vcd")
sum(DanishWelfare$Freq)

## [1] 5144

In this form, the variables Alcohol and Income should arguably be considered ordered factors. Change them to make them ordered.

levels(DanishWelfare$Alcohol)

## [1] "<1"  "1-2" ">2"

DanishWelfare$Alcohol <- as.ordered(DanishWelfare$Alcohol)
DanishWelfare$Income <- as.ordered(DanishWelfare$Income)
str(DanishWelfare)

## 'data.frame':    180 obs. of  5 variables:
##  $ Freq   : num  1 4 1 8 6 14 8 41 100 175 ...
##  $ Alcohol: Ord.factor w/ 3 levels "<1"<"1-2"<">2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Income : Ord.factor w/ 4 levels "0-50"<"50-100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Urban  : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...
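
Note that as.ordered() keeps whatever level order the factor already has, which works here because the stored levels are already in increasing order. When that is not guaranteed, the order can be stated explicitly. The sketch below uses a small made-up vector (alcohol is a hypothetical name, not part of the data set) to show the idea.

```r
# Hypothetical values, deliberately not in sorted order
alcohol <- factor(c("1-2", "<1", ">2", "<1"),
                  levels = c("<1", "1-2", ">2"),  # explicit low-to-high order
                  ordered = TRUE)

is.ordered(alcohol)
levels(alcohol)

# Comparisons on an ordered factor follow the level order
alcohol < ">2"
```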

Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names.

DanishWelfare.tab <-xtabs(Freq ~ ., data = DanishWelfare)
str(DanishWelfare.tab)

##  'xtabs' num [1:3, 1:4, 1:3, 1:5] 1 3 2 8 1 3 2 5 2 42 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Alcohol: chr [1:3] "<1" "1-2" ">2"
##   ..$ Income : chr [1:4] "0-50" "50-100" "100-150" ">150"
##   ..$ Status : chr [1:3] "Widow" "Married" "Unmarried"
##   ..$ Urban  : chr [1:5] "Copenhagen" "SubCopenhagen" "LargeCity" "City" ...
##  - attr(*, "call")= language xtabs(formula = Freq ~ ., data = DanishWelfare)

d. The variable Urban has 5 categories. Find the total frequencies in each of these. How would you collapse the table to have only two categories, City, Non-city?

Place your text response below the code chunk.

margin.table(DanishWelfare.tab, 4)

## Urban
##    Copenhagen SubCopenhagen     LargeCity          City       Country 
##           552           614           594          1765          1619

DW=vcdExtra::collapse.table(DanishWelfare.tab, Urban=c("City","NonCity","City","City","NonCity"))
head(ftable(DW))

##                                                          
##                                  "Urban" "City" "NonCity"
##  "Alcohol" "Income"  "Status"                            
##  "<1"      "0-50"    "Widow"                 10        10
##                      "Married"              155       183
##                      "Unmarried"             14        10
##            "50-100"  "Widow"                 29         7
##                      "Married"              338       306
##                      "Unmarried"             36        32
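
As a sanity check on the grouping, the same collapse can be applied to the Urban margin alone. This is a sketch in base R that re-enters the five totals printed above; urban_totals, urban_group, and collapsed are names introduced for illustration, and the City/NonCity assignment simply mirrors the one passed to collapse.table().

```r
# Urban totals copied from the margin.table() output above
urban_totals <- c(Copenhagen = 552, SubCopenhagen = 614, LargeCity = 594,
                  City = 1765, Country = 1619)

# Same mapping as in the collapse.table() call, in the same level order
urban_group <- c("City", "NonCity", "City", "City", "NonCity")

# Sum the totals within each new category
collapsed <- tapply(urban_totals, urban_group, sum)
collapsed
```

The two collapsed totals should add up to the 5144 cases found earlier.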

2.5 p. 62

Verify that the total number of games represented in this table is 380.

a=UKSoccer

sum(a)

## [1] 380

ftable(a)

##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0

Find the marginal total of the number of goals scored by each of the home and away teams.

b=addmargins(a)
b

##      Away
## Home    0   1   2   3   4 Sum
##   0    27  29  10   8   2  76
##   1    59  53  14  12   4 142
##   2    28  32  14  12   4  90
##   3    19  14   7   4   1  45
##   4     7   8  10   2   0  27
##   Sum 140 136  55  38  11 380

Express each of the marginal totals as proportions.

prop.table(margin.table(a, 1))

## Home
##          0          1          2          3          4 
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263

prop.table(margin.table(a, 2))

## Away
##          0          1          2          3          4 
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737

Comment on the distribution of the numbers of home-team and away-team goals. Is there any evidence that home teams score more goals on average?

Place your text response below the code chunk.

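# Home teams won more games than away teams in this dataset (186 home wins, 96 away wins, 98 draws). Home teams also scored more goals on average: about 1.49 per game (565/380) versus about 1.06 for away teams (404/380). The home distribution is shifted toward higher counts: 20% of home teams scored no goals, compared with about 37% of away teams. This is evidence of a home advantage, but a formal test would be needed to confirm it.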
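
The comparison of home and away scoring can be checked directly from the table. The sketch below re-enters the counts printed above as a plain matrix so it runs without the vcd package; the names soccer, goals, home_mean, away_mean, home_wins, away_wins, and draws are introduced for illustration only.

```r
# Counts copied from the table above (rows = Home goals, columns = Away goals)
soccer <- matrix(
  c(27, 29, 10,  8, 2,
    59, 53, 14, 12, 4,
    28, 32, 14, 12, 4,
    19, 14,  7,  4, 1,
     7,  8, 10,  2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(Home = 0:4, Away = 0:4)
)

goals <- 0:4

# Average goals per game, weighting each goal count by its marginal frequency
home_mean <- sum(goals * rowSums(soccer)) / sum(soccer)
away_mean <- sum(goals * colSums(soccer)) / sum(soccer)

# Game outcomes: below the diagonal Home > Away, above it Away > Home
home_wins <- sum(soccer[lower.tri(soccer)])
away_wins <- sum(soccer[upper.tri(soccer)])
draws     <- sum(diag(soccer))

c(home_mean = home_mean, away_mean = away_mean)
c(home_wins = home_wins, away_wins = away_wins, draws = draws)
```

This is only a descriptive comparison; it does not test whether the difference is statistically significant.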

2.6 p. 62

Use subset() to create a data frame, sax12, containing the Geissler observations in families with size==12.

data("Saxony", package="vcd")
data("Geissler", package="vcdExtra")
sax12 <- subset(Geissler, size==12)
sax12

##    boys girls size Freq
## 12    0    12   12    3
## 24    1    11   12   24
## 35    2    10   12  104
## 45    3     9   12  286
## 54    4     8   12  670
## 62    5     7   12 1033
## 69    6     6   12 1343
## 75    7     5   12 1112
## 80    8     4   12  829
## 84    9     3   12  478
## 87   10     2   12  181
## 89   11     1   12   45
## 90   12     0   12    7

Select the columns for boys and Freq.

sax12 <- subset(sax12, select=c("boys","Freq"))

Use xtabs() with a formula, Freq ~ boys, to create the one-way table.

Saxony12<-xtabs(Freq~boys, data=sax12)
Saxony12

## boys
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7

Do the same steps again to create a one-way table, Saxony11, containing similar frequencies for families of size==11.

sax11 <- subset(Geissler, size==11, select = c("boys","Freq"))
Saxony11 <- xtabs(Freq~boys, data=sax11)
Saxony11

## boys
##    0    1    2    3    4    5    6    7    8    9   10   11 
##    8   72  275  837 1540 2161 2310 1801 1077  492   93   24

2.8 p. 63

From this, use xtabs() to create two 4 x 4 frequency tables, one for each gender.

data("VisualAcuity", package = "vcd")
str(VisualAcuity)

## 'data.frame':    32 obs. of  4 variables:
##  $ Freq  : num  1520 234 117 36 266 ...
##  $ right : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
##  $ left  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
##  $ gender: Factor w/ 2 levels "male","female": 2 2 2 2 2 2 2 2 2 2 ...

data("VisualAcuity", package="vcd")
va.tabm <- xtabs(Freq ~ right+left, data = VisualAcuity, subset=gender=="male")
va.tabm

##      left
## right   1   2   3   4
##     1 821 112  85  35
##     2 116 494 145  27
##     3  72 151 583  87
##     4  43  34 106 331

va.tab <- xtabs(Freq ~ ., data = VisualAcuity)
va.tabm <- va.tab[,,"male"]
va.tabf <- va.tab[,,"female"]
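
To see both gender tables side by side without relying on the vcd package, the sketch below re-enters the 32 frequencies shown in this section into a data frame laid out like VisualAcuity and repeats the slicing step. The data frame name va is an assumption introduced for illustration; the counts are copied from the output in this document.

```r
# right varies fastest, then left, then gender (female rows first, as in str() above)
va <- expand.grid(right = factor(1:4), left = factor(1:4),
                  gender = c("female", "male"))
va$Freq <- c(1520, 234, 117, 36,  266, 1512, 362, 82,
             124, 432, 1772, 179,  66, 78, 205, 492,    # female
             821, 116, 72, 43,     112, 494, 151, 34,
             85, 145, 583, 106,    35, 27, 87, 331)     # male

# One 3-way table, then one 4 x 4 slice per gender
va.tab  <- xtabs(Freq ~ right + left + gender, data = va)
va.tabm <- va.tab[, , "male"]
va.tabf <- va.tab[, , "female"]

va.tabm
va.tabf
```

Slicing the 3-way table keeps the right and left dimension names, so each result is a labeled 4 x 4 table.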

Bonus: +4 points b. Use structable() to create a nicely organized tabular display.

structable(right ~ left + gender, data = va.tab)

##             right    1    2    3    4
## left gender                          
## 1    male          821  116   72   43
##      female       1520  234  117   36
## 2    male          112  494  151   34
##      female        266 1512  362   82
## 3    male           85  145  583  106
##      female        124  432 1772  179
## 4    male           35   27   87  331
##      female         66   78  205  492

R Categorical Lab

Steven Infanti

2019-09-03