Homework #1 is worth 100 points, and each question is worth 6.5 points.

Submission Instructions: Save the .HTML file as ‘Familiar_ Categorical_Data_Assignmentyourlastname.HTML’ and upload it to the Canvas assignment entitled ‘Getting Familiar with Categorical Data in R’ by 11:59 p.m. EST on Wednesday, November 13, 2019. Late assignments are not accepted.

  1. #2.1 pp. 60-61

Run the code chunk below.

library(vcd)
## Warning: package 'vcd' was built under R version 4.1.2
## Loading required package: grid
library(grid)

library(gnm)
## Warning: package 'gnm' was built under R version 4.1.2
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 4.1.2
ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds, vec.len=2)
## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" ...
##  $ class  : chr  "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" ...
View(ds)

View(UCBAdmissions)
str(UCBAdmissions)
##  'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Admit : chr [1:2] "Admitted" "Rejected"
##   ..$ Gender: chr [1:2] "Male" "Female"
##   ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...
  1. How many data sets are there altogether? How many are there in each package?
nrow(ds)
## [1] 76
ds_1=datasets(package = "vcd")
nrow(ds_1)
## [1] 33
ds_2=datasets(package = "vcdExtra")
nrow(ds_2)
## [1] 43
  1. Make a tabular display of the frequencies by Package and class.
table(ds$Package,ds$class)
##           
##            array data.frame table
##   vcd          1         17    15
##   vcdExtra     4         24    15
  1. Choose one or two data sets from this list, and examine their help files (e.g., help(Arthritis) or ?Arthritis). You can use, e.g., example(Arthritis) to run the R code for a given example.
help(Baseball)
example(Baseball)
## 
## Basbll> data("Baseball")
help(Butterfly)
example(Butterfly)
## 
## Bttrfl> data("Butterfly")
## 
## Bttrfl> Ord_plot(Butterfly)

  1. p. 61 #2.3
  1. Find the total number of cases contained in this table.
sum(UCBAdmissions)
## [1] 4526
  1. For each department, find the total number of applicants.
margin.table(UCBAdmissions,3)
## Dept
##   A   B   C   D   E   F 
## 933 585 918 792 584 714
  1. For each department, find the overall proportion of applicants who were admitted.
UCBA=as.data.frame(UCBAdmissions)
Overall=xtabs(Freq~Dept+Admit,data = UCBA)
prop.table(Overall,1)
##     Admit
## Dept   Admitted   Rejected
##    A 0.64415863 0.35584137
##    B 0.63247863 0.36752137
##    C 0.35076253 0.64923747
##    D 0.33964646 0.66035354
##    E 0.25171233 0.74828767
##    F 0.06442577 0.93557423
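The same per-department admission rates can also be read directly off the table object, without first converting it to a data frame. A minimal sketch in base R, assuming only the built-in UCBAdmissions table (the names dept_by_admit and admit_rate are illustrative):

```r
# UCBAdmissions ships with base R (package 'datasets');
# its dimensions are Admit (1), Gender (2), Dept (3).
# Collapse over Gender: keep Dept as rows and Admit as columns.
dept_by_admit <- margin.table(UCBAdmissions, c(3, 1))

# Row proportions: each department's row sums to 1.
admit_rate <- prop.table(dept_by_admit, 1)[, "Admitted"]
round(admit_rate, 3)
```

Keeping only the Admitted column gives one number per department, which is all the question asks for.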
  1. Construct a tabular display of department (rows) and gender (columns), showing the proportion of applicants in each cell who were admitted relative to the total applicants in that cell.
UCBA2=xtabs(Freq~Dept+Gender+Admit,data=UCBA)
# Margins 1 and 2 condition on Dept and Gender, so Admitted + Rejected = 1 within each cell
prop.table(UCBA2,c(1,2))
## , , Admit = Admitted
## 
##     Gender
## Dept       Male     Female
##    A 0.62060606 0.82407407
##    B 0.63035714 0.68000000
##    C 0.36923077 0.34064081
##    D 0.33093525 0.34933333
##    E 0.27748691 0.23918575
##    F 0.05898123 0.07038123
## 
## , , Admit = Rejected
## 
##     Gender
## Dept       Male     Female
##    A 0.37939394 0.17592593
##    B 0.36964286 0.32000000
##    C 0.63076923 0.65935919
##    D 0.66906475 0.65066667
##    E 0.72251309 0.76081425
##    F 0.94101877 0.92961877
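Since the question asks for department on the rows and gender on the columns, the admitted proportions can also be pulled out as a single two-way table, with each rate computed relative to the applicants in that same Dept-by-Gender cell. A minimal sketch in base R, assuming the built-in UCBAdmissions table (the names cond and cell_rate are illustrative):

```r
# Condition on Gender (dimension 2) and Dept (dimension 3):
# within each Gender-by-Dept cell, Admitted + Rejected sum to 1.
cond <- prop.table(UCBAdmissions, c(2, 3))

# Keep only the Admitted slice, then transpose so that Dept is on
# the rows and Gender is on the columns.
cell_rate <- t(cond["Admitted", , ])
round(cell_rate, 3)
```

For example, the Dept A entries are 512/825 for men and 89/108 for women.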
  1. p. 61 #2.4 a, c, e
  1. Find the total number of cases represented in this table.
sum(DanishWelfare$Freq)
## [1] 5144
  1. Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names.
DanishWelfare.tab=structable(Freq~Urban+Status+Income+Alcohol,data = DanishWelfare)
DanishWelfare.tab
##                       Status  Widow         Married         Unmarried        
##                       Alcohol    <1 1-2  >2      <1 1-2  >2        <1 1-2  >2
## Urban         Income                                                         
## Copenhagen    0-50                1   3   2      14  15   1         6   2   3
##               50-100              8   1   3      42  39  14         7  12   2
##               100-150             2   5   2      21  32  20         3   6   0
##               >150               42  26  21      24  43  23        33  36  38
## SubCopenhagen 0-50                4   0   0       8   7   2         1   3   0
##               50-100              2   1   0      51  59  21         5   3   0
##               100-150             3   4   1      30  68  31         2  10   2
##               >150               29  34  13      30  76  47        24  23  20
## LargeCity     0-50                1   1   2      41  15   2         2   9   1
##               50-100              7   3   2      62  68  14         9  11   3
##               100-150             1   1   1      23  43  10         1   5   3
##               >150               17  14   5      50  70  21        15  48  13
## City          0-50                8   4   1     100  25   7         6   9   5
##               50-100             14   8   1     234 172  38        20  20  12
##               100-150             5   9   1      87 128  36        12  21   9
##               >150               95  48  20     167 198  53        64  89  39
## Country       0-50                6   2   0     175  48   7         9   7   1
##               50-100              5   4   3     255 143  35        27  23  13
##               100-150             2   4   0      77  86  21         4  15   7
##               >150               46  24   8     232 136  36        68  64  26
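structable() returns a flattened display object. If a plain 4-way array of class table is wanted, xtabs() with Freq on the left-hand side is one option. A minimal sketch on a small hypothetical frequency-form data frame (toy and its factors A to D are invented for illustration and are not part of DanishWelfare):

```r
# Hypothetical frequency-form data: four two-level factors plus a Freq column.
toy <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                   C = c("c1", "c2"), D = c("d1", "d2"))
toy$Freq <- 1:16

# Freq ~ . sums Freq over every combination of the remaining variables,
# giving a 4-way array of class "xtabs"/"table" with named dimnames.
toy.tab <- xtabs(Freq ~ ., data = toy)
dim(toy.tab)
names(dimnames(toy.tab))
```

With vcd loaded, the same pattern, xtabs(Freq ~ ., data = DanishWelfare), should give an array with one dimension per factor (Alcohol, Income, Status, Urban).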
  1. Use structable() or ftable() to produce a pleasing flattened display of the frequencies in the 4-way table. Choose the variables used as row and column variables to make it easier to compare levels of Alcohol across the other factors.
ftable(xtabs(Freq~.,data = DanishWelfare))
##                           Urban Copenhagen SubCopenhagen LargeCity City Country
## Alcohol Income  Status                                                         
## <1      0-50    Widow                    1             4         1    8       6
##                 Married                 14             8        41  100     175
##                 Unmarried                6             1         2    6       9
##         50-100  Widow                    8             2         7   14       5
##                 Married                 42            51        62  234     255
##                 Unmarried                7             5         9   20      27
##         100-150 Widow                    2             3         1    5       2
##                 Married                 21            30        23   87      77
##                 Unmarried                3             2         1   12       4
##         >150    Widow                   42            29        17   95      46
##                 Married                 24            30        50  167     232
##                 Unmarried               33            24        15   64      68
## 1-2     0-50    Widow                    3             0         1    4       2
##                 Married                 15             7        15   25      48
##                 Unmarried                2             3         9    9       7
##         50-100  Widow                    1             1         3    8       4
##                 Married                 39            59        68  172     143
##                 Unmarried               12             3        11   20      23
##         100-150 Widow                    5             4         1    9       4
##                 Married                 32            68        43  128      86
##                 Unmarried                6            10         5   21      15
##         >150    Widow                   26            34        14   48      24
##                 Married                 43            76        70  198     136
##                 Unmarried               36            23        48   89      64
## >2      0-50    Widow                    2             0         2    1       0
##                 Married                  1             2         2    7       7
##                 Unmarried                3             0         1    5       1
##         50-100  Widow                    3             0         2    1       3
##                 Married                 14            21        14   38      35
##                 Unmarried                2             0         3   12      13
##         100-150 Widow                    2             1         1    1       0
##                 Married                 20            31        10   36      21
##                 Unmarried                0             2         3    9       7
##         >150    Widow                   21            13         5   20       8
##                 Married                 23            47        21   53      36
##                 Unmarried               38            20        13   39      26
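To compare the levels of one factor side by side, it may help to name the column variable explicitly rather than rely on ftable()'s default of placing the last variable in the columns. A minimal sketch of the row.vars and col.vars arguments, shown on the built-in UCBAdmissions table so that it runs without vcd (the name ft is illustrative):

```r
# By default ftable() puts the last dimension in the columns.
# row.vars / col.vars choose the layout explicitly.
ft <- ftable(UCBAdmissions,
             row.vars = c("Dept", "Gender"),
             col.vars = "Admit")
ft

# Assumed analogue for this exercise (not run here; requires vcd):
# ftable(xtabs(Freq ~ ., data = DanishWelfare), col.vars = "Alcohol")
```

Putting the factor of interest in the columns places its levels next to each other on every row, which makes row-wise comparison easier.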
  1. p. 62 #2.5 a, b, c
#code from text
data("UKSoccer", package = "vcd") 
ftable(UKSoccer)
##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0
  1. Verify that the total number of games represented in this table is 380.
sum(UKSoccer)
## [1] 380
  1. Find the marginal total of the number of goals scored by each of the home and away teams.
margin.table(UKSoccer,1)
## Home
##   0   1   2   3   4 
##  76 142  90  45  27
margin.table(UKSoccer,2)
## Away
##   0   1   2   3   4 
## 140 136  55  38  11
  1. Express each of the marginal totals as proportions.
prop.table(margin.table(UKSoccer,1))
## Home
##          0          1          2          3          4 
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263
prop.table(margin.table(UKSoccer,2))
## Away
##          0          1          2          3          4 
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737
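The marginal counts and proportions can also be shown in a single display. A minimal sketch that rebuilds the table from the counts printed in the ftable(UKSoccer) output (so it runs without vcd) and uses base R's addmargins(); the object name soccer is illustrative:

```r
# Counts copied from the ftable(UKSoccer) output;
# rows = Home goals, columns = Away goals.
soccer <- as.table(matrix(
  c(27, 29, 10,  8, 2,
    59, 53, 14, 12, 4,
    28, 32, 14, 12, 4,
    19, 14,  7,  4, 1,
     7,  8, 10,  2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(Home = as.character(0:4), Away = as.character(0:4))))

# Joint counts with row and column totals appended.
addmargins(soccer)

# Joint proportions; the appended "Sum" row and column hold the
# marginal proportions for Away and Home goals respectively.
round(addmargins(prop.table(soccer)), 3)
```

The "Sum" column of the second display reproduces the Home proportions, and the "Sum" row reproduces the Away proportions.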
  1. Run the code below and notice there is a data frame entitled SpaceShuttle. Using the R help, read about the details of this data frame. That is, familiarize yourself with the context and understand the meaning of the different rows.
library(vcd)
library(vcdExtra)

ds <- datasets(package = c("vcd", "vcdExtra"))
str(ds)
## 'data.frame':    76 obs. of  5 variables:
##  $ Package: chr  "vcd" "vcd" "vcd" "vcd" ...
##  $ Item   : chr  "Arthritis" "Baseball" "BrokenMarriage" "Bundesliga" ...
##  $ class  : chr  "data.frame" "data.frame" "data.frame" "data.frame" ...
##  $ dim    : chr  "84x5" "322x25" "20x4" "14018x7" ...
##  $ Title  : chr  "Arthritis Treatment Data" "Baseball Data" "Broken Marriage Data" "Ergebnisse der Fussball-Bundesliga" ...
View(ds)
head(ds)
##   Package           Item      class     dim
## 1     vcd      Arthritis data.frame    84x5
## 2     vcd       Baseball data.frame  322x25
## 3     vcd BrokenMarriage data.frame    20x4
## 4     vcd     Bundesliga data.frame 14018x7
## 5     vcd  Bundestag2005      table    16x5
## 6     vcd      Butterfly      table      24
##                                     Title
## 1                Arthritis Treatment Data
## 2                           Baseball Data
## 3                    Broken Marriage Data
## 4      Ergebnisse der Fussball-Bundesliga
## 5 Votes in German Bundestag Election 2005
## 6             Butterfly Species in Malaya
  1. Using the structable() function, create a “flat” table that has the Damage Index on the columns and whether the O-ring failed and how many failures on the rows.
DamageTable=structable(Damage~Fail+nFailures,data=SpaceShuttle)
DamageTable
##                Damage  0  2  4 11
## Fail nFailures                   
## no   0                15  0  1  0
##      1                 0  0  0  0
##      2                 0  0  0  0
## yes  0                 0  0  0  0
##      1                 0  1  4  0
##      2                 0  0  1  1
  1. Construct the same formatted table that you did in the previous part (part a), but now use the xtabs() and ftable() functions.
ftable(xtabs(~Fail+nFailures+Damage,data = SpaceShuttle))
##                Damage  0  2  4 11
## Fail nFailures                   
## no   0                15  0  1  0
##      1                 0  0  0  0
##      2                 0  0  0  0
## yes  0                 0  0  0  0
##      1                 0  1  4  0
##      2                 0  0  1  1