Homework #1 is worth 100 points, and each question is worth 6.5 points.

Submission Instructions: Save the .HTML file as ‘HW1_yourlastname.HTML’ and upload it to the assignment entitled ‘Homework #1’ on Moodle by 11:55 p.m. EST on Tuesday, July 23, 2019. No late assignments are accepted.

  1. #2.1, pp. 60-61

Run the code chunk below.

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.5.3
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.5.3
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.5.3
ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds, vec.len=2)
## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" ...
##  $ class  : chr  "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" ...
  a. How many data sets are there altogether? How many are there in each package?

There are 76 data sets altogether: 33 in package vcd and 43 in package vcdExtra.

nrow(ds)
## [1] 76
ds_1 <- datasets(package = "vcd")
nrow(ds_1)
## [1] 33
ds_2 <- datasets(package = "vcdExtra")
nrow(ds_2)
## [1] 43
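The same per-package counts can also be read off in a single call by tabulating the Package column. This is a minimal sketch that assumes vcdExtra is installed; the name pkg_counts is introduced here for illustration, and the counts depend on the installed package versions.

```r
library(vcdExtra)  # also attaches vcd, as in the chunk at the top

ds <- datasets(package = c("vcd", "vcdExtra"))

# One-way frequency table of the Package column: data sets per package
pkg_counts <- table(ds$Package)
pkg_counts

# The per-package counts add up to the overall total
sum(pkg_counts) == nrow(ds)
```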
  b. Make a tabular display of the frequencies by Package and class.
table(ds$Package, ds$class)
##           
##            array data.frame matrix table
##   vcd          1         17      0    15
##   vcdExtra     3         24      1    15
  c. Choose one or two data sets from this list, and examine their help files (e.g., help(Arthritis) or ?Arthritis). You can use, e.g., example(Arthritis) to run the R code for a given example.
help(Arthritis)
## starting httpd help server ... done
example(Arthritis)
## 
## Arthrt> data("Arthritis")
## 
## Arthrt> art <- xtabs(~ Treatment + Improved, data = Arthritis, subset = Sex == "Female")
## 
## Arthrt> art
##          Improved
## Treatment None Some Marked
##   Placebo   19    7      6
##   Treated    6    5     16
## 
## Arthrt> mosaic(art, gp = shading_Friendly)

## 
## Arthrt> mosaic(art, gp = shading_max)

help(Baseball)
example(Baseball)
## 
## Basbll> data("Baseball")
  2. p. 61 #2.3
  a. Find the total number of cases contained in this table. There are 4526 cases in total in this table.
sum(UCBAdmissions)
## [1] 4526
  b. For each department, find the total number of applicants.
margin.table(UCBAdmissions,3)
## Dept
##   A   B   C   D   E   F 
## 933 585 918 792 584 714
  c. For each department, find the overall proportion of applicants who were admitted.
ucb.df <- as.data.frame(UCBAdmissions)
UCB_cont <- xtabs(Freq ~ Dept + Admit, data = ucb.df)
prop.table(UCB_cont)
##     Admit
## Dept   Admitted   Rejected
##    A 0.13278833 0.07335395
##    B 0.08174989 0.04750331
##    C 0.07114450 0.13168361
##    D 0.05943438 0.11555457
##    E 0.03247901 0.09655325
##    F 0.01016350 0.14759169
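Note that prop.table() without a margin divides every cell by the grand total, so the values above are shares of all 4526 applicants. The question asks for the proportion admitted within each department, which would need the rows to sum to 1. A minimal sketch of that variant follows; the name dept_prop is introduced here for illustration.

```r
# UCBAdmissions ships with base R (datasets package)
ucb.df <- as.data.frame(UCBAdmissions)
UCB_cont <- xtabs(Freq ~ Dept + Admit, data = ucb.df)  # Dept x Admit, summed over Gender

# margin = 1 divides each row by its row total, so every department sums to 1
dept_prop <- prop.table(UCB_cont, margin = 1)
round(dept_prop, 3)
```

The Admitted column of dept_prop is then the admission rate for each department.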
  d. Construct a tabular display of department (rows) and gender (columns), showing the proportion of applicants in each cell who were admitted relative to the total applicants in that cell.
# total number of students
sum(UCBAdmissions)
## [1] 4526
#flat table display
flat_ucb <- ftable(Gender ~ Admit + Dept, data = UCBAdmissions)
flat_ucb
##               Gender Male Female
## Admit    Dept                   
## Admitted A            512     89
##          B            353     17
##          C            120    202
##          D            138    131
##          E             53     94
##          F             22     24
## Rejected A            313     19
##          B            207      8
##          C            205    391
##          D            279    244
##          E            138    299
##          F            351    317
#proportion of applicants
prop_ucb <- prop.table(flat_ucb)
prop_ucb
##               Gender        Male      Female
## Admit    Dept                               
## Admitted A           0.113124171 0.019664163
##          B           0.077993814 0.003756076
##          C           0.026513478 0.044631021
##          D           0.030490499 0.028943880
##          E           0.011710119 0.020768891
##          F           0.004860804 0.005302696
## Rejected A           0.069155988 0.004197967
##          B           0.045735749 0.001767565
##          C           0.045293858 0.086389748
##          D           0.061643836 0.053910738
##          E           0.030490499 0.066062749
##          F           0.077551922 0.070039770
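The proportions above are again relative to all 4526 applicants, and the layout keeps Admit and Dept together on the rows. One way to get the display the question describes, department by gender with the admitted share of each cell, is sketched below. The names cell_prop and admit_rate are introduced here for illustration.

```r
# UCBAdmissions (base R) has dimensions Admit x Gender x Dept.
# margin = c(2, 3) divides within each Gender x Dept cell,
# so Admitted + Rejected sums to 1 in every cell.
cell_prop <- prop.table(UCBAdmissions, margin = c(2, 3))

# Keep the "Admitted" slice (Gender x Dept) and transpose to Dept x Gender
admit_rate <- t(cell_prop["Admitted", , ])
round(admit_rate, 3)
```

For example, the Dept A entry for males is 512 / (512 + 313), using the counts from the flat table above.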
  3. p. 61 #2.4 a, c, e
  a. Find the total number of cases represented in this table. There are 5144 cases in total represented in this table.
sum(DanishWelfare$Freq)
## [1] 5144
  c. Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names.
DanishWelfare.tab <- xtabs(Freq ~., data = DanishWelfare)
str(DanishWelfare.tab)
##  'xtabs' num [1:3, 1:4, 1:3, 1:5] 1 3 2 8 1 3 2 5 2 42 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Alcohol: chr [1:3] "<1" "1-2" ">2"
##   ..$ Income : chr [1:4] "0-50" "50-100" "100-150" ">150"
##   ..$ Status : chr [1:3] "Widow" "Married" "Unmarried"
##   ..$ Urban  : chr [1:5] "Copenhagen" "SubCopenhagen" "LargeCity" "City" ...
##  - attr(*, "call")= language xtabs(formula = Freq ~ ., data = DanishWelfare)
  e. Use structable() or ftable() to produce a pleasing flattened display of the frequencies in the 4-way table. Choose the variables used as row and column variables to make it easier to compare levels of Alcohol across the other factors.
ftable(xtabs(Freq ~., data = DanishWelfare))
##                           Urban Copenhagen SubCopenhagen LargeCity City Country
## Alcohol Income  Status                                                         
## <1      0-50    Widow                    1             4         1    8       6
##                 Married                 14             8        41  100     175
##                 Unmarried                6             1         2    6       9
##         50-100  Widow                    8             2         7   14       5
##                 Married                 42            51        62  234     255
##                 Unmarried                7             5         9   20      27
##         100-150 Widow                    2             3         1    5       2
##                 Married                 21            30        23   87      77
##                 Unmarried                3             2         1   12       4
##         >150    Widow                   42            29        17   95      46
##                 Married                 24            30        50  167     232
##                 Unmarried               33            24        15   64      68
## 1-2     0-50    Widow                    3             0         1    4       2
##                 Married                 15             7        15   25      48
##                 Unmarried                2             3         9    9       7
##         50-100  Widow                    1             1         3    8       4
##                 Married                 39            59        68  172     143
##                 Unmarried               12             3        11   20      23
##         100-150 Widow                    5             4         1    9       4
##                 Married                 32            68        43  128      86
##                 Unmarried                6            10         5   21      15
##         >150    Widow                   26            34        14   48      24
##                 Married                 43            76        70  198     136
##                 Unmarried               36            23        48   89      64
## >2      0-50    Widow                    2             0         2    1       0
##                 Married                  1             2         2    7       7
##                 Unmarried                3             0         1    5       1
##         50-100  Widow                    3             0         2    1       3
##                 Married                 14            21        14   38      35
##                 Unmarried                2             0         3   12      13
##         100-150 Widow                    2             1         1    1       0
##                 Married                 20            31        10   36      21
##                 Unmarried                0             2         3    9       7
##         >150    Widow                   21            13         5   20       8
##                 Married                 23            47        21   53      36
##                 Unmarried               38            20        13   39      26
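The default layout above puts Urban on the columns and Alcohol on the rows, so the three Alcohol levels sit far apart. To compare them side by side, Alcohol can be made the column variable instead. A minimal sketch, assuming vcd is installed; the name alcohol_flat is introduced here for illustration.

```r
library(vcd)  # provides the DanishWelfare data frame

DanishWelfare.tab <- xtabs(Freq ~ ., data = DanishWelfare)

# Alcohol as the only column variable; Income, Status and Urban index the rows
alcohol_flat <- ftable(DanishWelfare.tab, col.vars = "Alcohol")
alcohol_flat
```

Each row is then one Income, Status and Urban combination, with the three Alcohol levels next to each other.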
  4. p. 62 #2.5 a, b, c
#code from text
data("UKSoccer", package = "vcd") 
ftable(UKSoccer)
##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0
  a. Verify that the total number of games represented in this table is 380.
sum(UKSoccer)
## [1] 380
# Or use
margin.table(UKSoccer)
## [1] 380
  b. Find the marginal total of the number of goals scored by each of the home and away teams.
prop.table(UKSoccer,1)
##     Away
## Home          0          1          2          3          4
##    0 0.35526316 0.38157895 0.13157895 0.10526316 0.02631579
##    1 0.41549296 0.37323944 0.09859155 0.08450704 0.02816901
##    2 0.31111111 0.35555556 0.15555556 0.13333333 0.04444444
##    3 0.42222222 0.31111111 0.15555556 0.08888889 0.02222222
##    4 0.25925926 0.29629630 0.37037037 0.07407407 0.00000000
prop.table(UKSoccer,2)
##     Away
## Home          0          1          2          3          4
##    0 0.19285714 0.21323529 0.18181818 0.21052632 0.18181818
##    1 0.42142857 0.38970588 0.25454545 0.31578947 0.36363636
##    2 0.20000000 0.23529412 0.25454545 0.31578947 0.36363636
##    3 0.13571429 0.10294118 0.12727273 0.10526316 0.09090909
##    4 0.05000000 0.05882353 0.18181818 0.05263158 0.00000000
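The two prop.table() calls above give conditional proportions within each row or column, not marginal totals. The marginal totals themselves come from margin.table(), as sketched below. This assumes vcd is installed; the names home_totals and away_totals are introduced here for illustration.

```r
data("UKSoccer", package = "vcd")

# Number of games by goals scored by the home team (sums over Away)
home_totals <- margin.table(UKSoccer, 1)
home_totals

# Number of games by goals scored by the away team (sums over Home)
away_totals <- margin.table(UKSoccer, 2)
away_totals
```

Both sets of totals add up to the 380 games in the table.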
  c. Express each of the marginal totals as proportions.
prop.table(margin.table(UKSoccer,1))
## Home
##          0          1          2          3          4 
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263
prop.table(margin.table(UKSoccer,2))
## Away
##          0          1          2          3          4 
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737
  5. Run the code below and notice there is a data frame entitled SpaceShuttle. Using the R help, read about the details of this data frame. That is, familiarize yourself with the context and understand the meaning of the different rows.
library(vcd)
library(vcdExtra)

ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds)
## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" "BrokenMarriage" "Bundesliga" ...
##  $ class  : chr  "data.frame" "data.frame" "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" "20x4" "14018x7" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" "Broken Marriage Data" "Ergebnisse der Fussball-Bundesliga" ...
View(ds)

SpaceShuttle
##    FlightNumber Temperature Pressure Fail nFailures Damage
## 1             1          66       50   no         0      0
## 2             2          70       50  yes         1      4
## 3             3          69       50   no         0      0
## 4             4          80       50 <NA>        NA     NA
## 5             5          68       50   no         0      0
## 6             6          67       50   no         0      0
## 7             7          72       50   no         0      0
## 8             8          73       50   no         0      0
## 9             9          70      100   no         0      0
## 10          41B          57      100  yes         1      4
## 11          41C          63      200  yes         1      2
## 12          41D          70      200  yes         1      4
## 13          41G          78      200   no         0      0
## 14          51A          67      200   no         0      0
## 15          51C          53      200  yes         2     11
## 16          51D          67      200   no         0      0
## 17          51B          75      200   no         0      0
## 18          51G          70      200   no         0      0
## 19          51F          81      200   no         0      0
## 20          51I          76      200   no         0      0
## 21          51J          79      200   no         0      0
## 22          61A          75      200  yes         2      4
## 23          61C          58      200  yes         1      4
## 24          61I          76      200   no         0      4
  a. Using the structable() function, create a “flat” table that has the Damage Index on the columns and whether the O-ring failed and how many failures on the rows.
structable(Damage ~ Fail + nFailures, data = SpaceShuttle)
##                Damage  0  2  4 11
## Fail nFailures                   
## no   0                15  0  1  0
##      1                 0  0  0  0
##      2                 0  0  0  0
## yes  0                 0  0  0  0
##      1                 0  1  4  0
##      2                 0  0  1  1
  b. Construct the same formatted table that you did in part a, but now use the xtabs() and ftable() functions.
ftable(Damage ~ Fail + nFailures, data = SpaceShuttle)
##                Damage  0  2  4 11
## Fail nFailures                   
## no   0                15  0  1  0
##      1                 0  0  0  0
##      2                 0  0  0  0
## yes  0                 0  0  0  0
##      1                 0  1  4  0
##      2                 0  0  1  1
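The call above goes straight through the formula interface of ftable() and never calls xtabs(), which the question asks for. A minimal sketch of the two-step version follows, assuming vcd is installed and R's default na.action (na.omit), under which the flight with missing values is dropped. The names shuttle_tab and shuttle_flat are introduced here for illustration.

```r
library(vcd)  # provides the SpaceShuttle data frame

# Step 1: xtabs() builds the 3-way contingency table of counts
shuttle_tab <- xtabs(~ Fail + nFailures + Damage, data = SpaceShuttle)

# Step 2: ftable() flattens it, with Damage on the columns
# and Fail and nFailures on the rows
shuttle_flat <- ftable(shuttle_tab, col.vars = "Damage")
shuttle_flat
```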