ANLY 545 HW1

2.4

Read the data and find the total number of cases represented in this table.

data("DanishWelfare", package = "vcd")
sum(DanishWelfare$Freq)

## [1] 5144

Examine the structure of the data and convert the variables Alcohol and Income to ordered factors.

str(DanishWelfare)

## 'data.frame':    180 obs. of  5 variables:
##  $ Freq   : num  1 4 1 8 6 14 8 41 100 175 ...
##  $ Alcohol: Factor w/ 3 levels "<1","1-2",">2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Income : Factor w/ 4 levels "0-50","50-100",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Status : Factor w/ 3 levels "Widow","Married",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Urban  : Factor w/ 5 levels "Copenhagen","SubCopenhagen",..: 1 2 3 4 5 1 2 3 4 5 ...

levels(DanishWelfare$Alcohol)

## [1] "<1"  "1-2" ">2"

levels(DanishWelfare$Income)

## [1] "0-50"    "50-100"  "100-150" ">150"

DanishWelfare$Alcohol = as.ordered(DanishWelfare$Alcohol)
DanishWelfare$Income = as.ordered(DanishWelfare$Income)
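as.ordered() keeps the existing level order rather than re-sorting it, so it is worth confirming, as the levels() calls above do, that the levels already run from lowest to highest. A minimal sketch on a made-up factor, not the DanishWelfare data; the vector `income` is hypothetical:

```r
# Hypothetical income bands; levels are given explicitly in increasing order
income <- factor(c(">150", "0-50", "50-100", "0-50"),
                 levels = c("0-50", "50-100", "100-150", ">150"))

# as.ordered() marks the factor as ordered and keeps the level order above
income_ord <- as.ordered(income)
print(levels(income_ord))

# Ordered factors support comparisons against a level label
above_50_100 <- income_ord > "50-100"
print(above_50_100)
```

If the stored level order were wrong, `factor(x, levels = ..., ordered = TRUE)` sets the order explicitly instead.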

Convert this data frame to table form, DanishWelfare.tab, a 4-way array containing the frequencies with appropriate variable names and level names (hint: review xtabs()).

DanishWelfare.tab = xtabs(Freq ~ Alcohol + Urban + Status + Income, data = DanishWelfare)

xtabs() returns the 4-way array itself; ftable() is used below only to print it in a flattened two-way layout.

str(ftable(DanishWelfare.tab))

##  'ftable' num [1:45, 1:4] 1 14 6 4 8 1 1 41 2 8 ...
##  - attr(*, "row.vars")=List of 3
##   ..$ Alcohol: chr [1:3] "<1" "1-2" ">2"
##   ..$ Urban  : chr [1:5] "Copenhagen" "SubCopenhagen" "LargeCity" "City" ...
##   ..$ Status : chr [1:3] "Widow" "Married" "Unmarried"
##  - attr(*, "col.vars")=List of 1
##   ..$ Income: chr [1:4] "0-50" "50-100" "100-150" ">150"

head(ftable(DanishWelfare.tab))

##
##                                        "Income" "0-50" "50-100" "100-150" ">150"
##  "Alcohol" "Urban"         "Status"
##  "<1"      "Copenhagen"    "Widow"                   1        8         2     42
##                            "Married"                14       42        21     24
##                            "Unmarried"               6        7         3     33
##            "SubCopenhagen" "Widow"                   4        2         3     29
##                            "Married"                 8       51        30     30
##                            "Unmarried"               1        5         2     24
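The difference between an xtabs() array and its ftable() view can be seen on a small example. This is a sketch with a made-up three-factor data frame (`df`, its factors `A`, `B`, `C` and the counts 1 to 12 are all hypothetical): xtabs() keeps one dimension per factor, while ftable() rearranges the same counts into a two-way layout for printing.

```r
# Hypothetical frequency data frame: every combination of three factors
df <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"), C = c("c1", "c2"))
df$Freq <- seq_len(nrow(df))  # made-up counts 1, 2, ..., 12

# xtabs() builds a 3-way array with named dimensions
tab <- xtabs(Freq ~ A + B + C, data = df)
print(dim(tab))
print(names(dimnames(tab)))

# Cells of the array can be indexed by level name, one index per dimension
print(tab["a1", , "c2"])

# ftable() flattens it: rows are A x B combinations, columns are levels of C
flat <- ftable(tab)
print(flat)
print(dim(flat))
```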

The variable Urban has 5 categories. Find the total frequencies in each of these.

aggregate(Freq ~ Urban, data = DanishWelfare, FUN = sum)

##           Urban Freq
## 1    Copenhagen  552
## 2 SubCopenhagen  614
## 3     LargeCity  594
## 4          City 1765
## 5       Country 1619
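The same marginal totals can also be obtained with xtabs(), which returns a named one-way table instead of a data frame. A sketch on hypothetical counts (the data frame `df` below is made up, not DanishWelfare):

```r
# Hypothetical counts by area type (not the DanishWelfare data)
df <- data.frame(
  Urban = factor(c("City", "Country", "City", "Country", "City")),
  Freq  = c(10, 5, 7, 3, 1)
)

# Marginal totals as a data frame, one row per level of Urban
agg <- aggregate(Freq ~ Urban, data = df, FUN = sum)
print(agg)

# The same totals as a one-way table
tab <- xtabs(Freq ~ Urban, data = df)
print(tab)
```

aggregate() is convenient when the result feeds further data-frame operations; xtabs() keeps the table class, so functions such as prop.table() apply directly.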

2.5

data("UKSoccer", package = "vcd")

ftable(UKSoccer)

##      Away  0  1  2  3  4
## Home                    
## 0         27 29 10  8  2
## 1         59 53 14 12  4
## 2         28 32 14 12  4
## 3         19 14  7  4  1
## 4          7  8 10  2  0

Verify that the total number of games represented in this table is 380.

summary(UKSoccer)

## Number of cases in table: 380 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 18.699, df = 16, p-value = 0.2846
##  Chi-squared approximation may be incorrect
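summary() reports the grand total alongside an independence test; sum() over the cells of a table returns it directly, and addmargins() shows it in the bottom-right corner. A sketch with a hypothetical 2 x 2 table of scores (the counts in `scores` are made up):

```r
# Hypothetical home/away goal counts (made-up data)
scores <- as.table(matrix(c(3, 1, 2, 6), nrow = 2,
                          dimnames = list(Home = c("0", "1"), Away = c("0", "1"))))

# Grand total: the sum of all cells
total <- sum(scores)
print(total)

# addmargins() adds row and column sums; the "Sum"/"Sum" cell is the grand total
with_margins <- addmargins(scores)
print(with_margins)
```

For the real data, `sum(UKSoccer) == 380` is a one-line version of the same check.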

Express each of the marginal totals as proportions.

prop_table = addmargins(prop.table(UKSoccer))

# Home
prop_table[1:5, 'Sum']

##          0          1          2          3          4 
## 0.20000000 0.37368421 0.23684211 0.11842105 0.07105263

# Away
prop_table['Sum', 1:5]

##          0          1          2          3          4 
## 0.36842105 0.35789474 0.14473684 0.10000000 0.02894737
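An alternative to slicing the "Sum" row and column is to compute each one-way margin with margin.table() and then normalise it with prop.table(). The sketch below uses a hypothetical 2 x 2 table (made-up counts) and checks that both routes give the same proportions:

```r
# Hypothetical home/away goal counts (made-up data)
scores <- as.table(matrix(c(3, 1, 2, 6), nrow = 2,
                          dimnames = list(Home = c("0", "1"), Away = c("0", "1"))))

# Marginal proportions: sum over the other dimension, then divide by the total
home_prop <- prop.table(margin.table(scores, 1))
away_prop <- prop.table(margin.table(scores, 2))
print(home_prop)
print(away_prop)

# Same values as the "Sum" column and row of addmargins(prop.table(...))
via_margins <- addmargins(prop.table(scores))
stopifnot(isTRUE(all.equal(as.vector(home_prop), as.vector(via_margins[1:2, "Sum"]))))
stopifnot(isTRUE(all.equal(as.vector(away_prop), as.vector(via_margins["Sum", 1:2]))))
```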

2.6

data("Saxony", package = "vcd")
Saxony

## nMales
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7

str(Saxony)

##  'table' num [1:13(1d)] 3 24 104 286 670 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ nMales: chr [1:13] "0" "1" "2" "3" ...

data("Geissler", package = "vcdExtra")
str(Geissler)

## 'data.frame':    90 obs. of  4 variables:
##  $ boys : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ girls: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ size : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Freq : int  108719 42860 17395 7004 2839 1096 436 161 66 30 ...

Use subset() to create a data frame, sax12, containing the Geissler observations for families with size == 12.

sax12 = subset(Geissler, size == 12)

sax12

##    boys girls size Freq
## 12    0    12   12    3
## 24    1    11   12   24
## 35    2    10   12  104
## 45    3     9   12  286
## 54    4     8   12  670
## 62    5     7   12 1033
## 69    6     6   12 1343
## 75    7     5   12 1112
## 80    8     4   12  829
## 84    9     3   12  478
## 87   10     2   12  181
## 89   11     1   12   45
## 90   12     0   12    7

Select the columns for boys and Freq.

subset(sax12, select = c(boys,Freq))

##    boys Freq
## 12    0    3
## 24    1   24
## 35    2  104
## 45    3  286
## 54    4  670
## 62    5 1033
## 69    6 1343
## 75    7 1112
## 80    8  829
## 84    9  478
## 87   10  181
## 89   11   45
## 90   12    7
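subset() can filter rows and pick columns in a single call, and the result matches ordinary bracket indexing. A sketch on a made-up data frame laid out like Geissler (`fam` and its counts are hypothetical):

```r
# Hypothetical family data in the same layout as Geissler (made-up counts)
fam <- data.frame(
  boys  = c(0, 1, 0, 1, 2),
  girls = c(1, 0, 2, 1, 0),
  size  = c(1, 1, 2, 2, 2),
  Freq  = c(50, 52, 20, 45, 22)
)

# Filter rows and keep two columns in one subset() call
fam2 <- subset(fam, size == 2, select = c(boys, Freq))
print(fam2)

# Equivalent base-R indexing with a logical row index and column names
fam2_idx <- fam[fam$size == 2, c("boys", "Freq")]
stopifnot(isTRUE(all.equal(fam2, fam2_idx)))
```

For the exercise, the one-call form would be `subset(Geissler, size == 12, select = c(boys, Freq))`.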

Use xtabs() with a formula, Freq ~ boys, to create the one-way table.

xtabs(Freq ~ boys, data = sax12)

## boys
##    0    1    2    3    4    5    6    7    8    9   10   11   12 
##    3   24  104  286  670 1033 1343 1112  829  478  181   45    7
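The printed counts are the same as the Saxony table shown earlier. Because the two tables label their dimension differently (`boys` vs `nMales`), a programmatic check is easiest on the level labels and plain count vectors. A sketch with made-up counts (`fam` and `ref` are hypothetical):

```r
# Hypothetical counts of boys in families of one size (made-up numbers)
fam <- data.frame(boys = 0:3, Freq = c(4, 15, 16, 5))

# xtabs() sums Freq within each value of boys, giving a one-way table
tab <- xtabs(Freq ~ boys, data = fam)
print(tab)

# A hypothetical reference table whose single dimension is named nMales
ref <- table(nMales = rep(0:3, times = c(4, 15, 16, 5)))

# Compare level labels and counts, ignoring the dimension names
same_levels <- identical(dimnames(tab)[[1]], dimnames(ref)[[1]])
same_counts <- all(as.vector(tab) == as.vector(ref))
print(same_levels && same_counts)
```

On the real data, `all(as.vector(xtabs(Freq ~ boys, data = sax12)) == as.vector(Saxony))` performs the same count comparison.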

Raavi Anvesh

9/16/2018