WFED 540 Contingency Table

The following report is an analysis of data from the NLS database.
To begin, I downloaded the appropriate variables from the NLS dataset and loaded the required packages. I also renamed the variables to make them easier to understand.

##    PUBID_1997        SEX             RACE          VOCATION      
##  Min.   :   1   Min.   :1.000   Min.   :1.000   Min.   :-8.0000  
##  1st Qu.:2249   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 0.0000  
##  Median :4502   Median :1.000   Median :4.000   Median : 0.0000  
##  Mean   :4504   Mean   :1.488   Mean   :2.788   Mean   : 0.1171  
##  3rd Qu.:6758   3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.: 1.0000  
##  Max.   :9022   Max.   :2.000   Max.   :4.000   Max.   : 1.0000  
##                                                 NA's   :2752

## Loading required package: ggvis
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Loading required package: magrittr

## Source: local data frame [8,984 x 4]
## 
##    PUBID_1997 SEX RACE VOCATION
## 1           1   2    4        0
## 2           2   1    2        1
## 3           3   2    2       NA
## 4           4   2    2        1
## 5           5   1    2        0
## 6           6   2    2        0
## 7           7   1    2       NA
## 8           8   2    4        0
## 9           9   1    4        0
## 10         10   1    4        0
## ..        ... ...  ...      ...

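The variables chosen for the chi-square test were sex, race, and vocational concentration.
The summary of the data shows that VOCATION contains negative values (its minimum is -8) as well as 2,752 NAs. I recoded the negative values to NA and then removed every row containing an NA from the dataset.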

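# Recode negative VOCATION values to NA
contingency$VOCATION <- ifelse(contingency$VOCATION < 0, NA, contingency$VOCATION)

# Drop every row that has an NA in any column, then check the result
contingency <- na.omit(contingency)
summary(contingency)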

##    PUBID_1997        SEX             RACE          VOCATION     
##  Min.   :   1   Min.   :1.000   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:2328   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :4554   Median :1.000   Median :4.000   Median :0.0000  
##  Mean   :4481   Mean   :1.499   Mean   :2.857   Mean   :0.3216  
##  3rd Qu.:6577   3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.:1.0000  
##  Max.   :9021   Max.   :2.000   Max.   :4.000   Max.   :1.0000
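
The recoding step can be tried on a small example before running it on the full dataset. The sketch below is only an illustration: the data frame `toy` and its values are made up, and only the column names match the report. It shows that `ifelse()` leaves existing NAs alone and that `na.omit()` removes a row if any column in it is NA.

```r
# Toy data frame standing in for `contingency` (all values are invented).
toy <- data.frame(
  PUBID_1997 = 1:6,
  SEX        = c(1, 2, 2, 1, 2, 1),
  RACE       = c(4, 2, 2, 1, 4, 3),
  VOCATION   = c(0, 1, NA, -8, 1, 0)
)

# Recode negative values to NA; NA_real_ keeps the column numeric.
toy$VOCATION <- ifelse(toy$VOCATION < 0, NA_real_, toy$VOCATION)

# Count missing values per column before dropping anything.
colSums(is.na(toy))

# na.omit() drops every row that has an NA in any column.
toy_complete <- na.omit(toy)
nrow(toy_complete)
```

In the report's data, the same two steps reduce the 8,984 downloaded rows to the 6,058 cases used in the contingency table.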

Hypothesis (null): There is no relationship among sex, race, and vocational concentration.

Hypothesis (alternative): There is a relationship among sex, race, and vocational concentration.

To test the null hypothesis, I created a contingency table and ran a chi-square test.

SEX_RACE_VOCATION <- xtabs(~ contingency$SEX + contingency$RACE + contingency$VOCATION)
ftable(SEX_RACE_VOCATION)

##                                  contingency$VOCATION    0    1
## contingency$SEX contingency$RACE                               
## 1               1                                      485  220
##                 2                                      412  193
##                 3                                       13   10
##                 4                                      996  704
## 2               1                                      543  253
##                 2                                      458  124
##                 3                                       23    2
##                 4                                     1180  442

summary(SEX_RACE_VOCATION)

## Call: xtabs(formula = ~contingency$SEX + contingency$RACE + contingency$VOCATION)
## Number of cases in table: 6058 
## Number of factors: 3 
## Test for independence of all factors:
##  Chisq = 133.05, df = 10, p-value = 1.113e-23
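
The statistic and degrees of freedom reported by `summary()` can be reproduced by hand, which makes clear what "independence of all factors" means here. The sketch below rebuilds the table from the cell counts printed by `ftable()` and assumes a Pearson chi-square test of mutual independence, where each expected count is the product of the three marginal proportions times the sample size. The object names (`tab`, `expected`, `chisq`, `df`, `p_val`) are my own and do not come from the original analysis.

```r
# Cell counts copied from the ftable() output, in R's column-major order:
# SEX varies fastest, then RACE, then VOCATION.
counts <- c(485, 543, 412, 458, 13, 23, 996, 1180,   # VOCATION = 0
            220, 253, 193, 124, 10,  2, 704,  442)   # VOCATION = 1
tab <- array(counts, dim = c(2, 4, 2),
             dimnames = list(SEX      = c("1", "2"),
                             RACE     = c("1", "2", "3", "4"),
                             VOCATION = c("0", "1")))

n      <- sum(tab)                  # 6058 cases
p_sex  <- apply(tab, 1, sum) / n    # marginal proportions for SEX
p_race <- apply(tab, 2, sum) / n    # marginal proportions for RACE
p_voc  <- apply(tab, 3, sum) / n    # marginal proportions for VOCATION

# Expected counts under mutual independence: n * p_sex * p_race * p_voc.
expected <- n * outer(outer(p_sex, p_race), p_voc)

# Pearson chi-square statistic, summed over all 16 cells.
chisq <- sum((tab - expected)^2 / expected)

# df = number of cells - 1 - sum over factors of (levels - 1) = 16 - 1 - 5 = 10.
df <- prod(dim(tab)) - 1 - sum(dim(tab) - 1)

p_val <- pchisq(chisq, df, lower.tail = FALSE)
round(chisq, 2)   # about 133.05, as in the summary() output

# Cross-check against R's own test on the rebuilt table.
all.equal(chisq, summary(as.table(tab))$statistic)
```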

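The chi-square test rejects the null hypothesis at the .05 level (chi-square = 133.05, df = 10, p < .001).
This result indicates that sex, race, and vocational concentration are not mutually independent. Because the test considers all three factors together, it does not by itself show which pairs of variables are related.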
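
The test above looks at all three factors at once. One possible follow-up, which is not part of the original analysis, is to collapse the table over race and test sex against vocational concentration alone. The sketch below builds that 2 x 2 table by hand from the cell counts printed by `ftable()`; the object names `sex_voc` and `res` are made up for illustration.

```r
# SEX x VOCATION counts, summed over the four RACE categories.
# matrix() fills column by column, so the first pair is VOCATION = 0.
sex_voc <- matrix(
  c(485 + 412 + 13 + 996,  543 + 458 + 23 + 1180,   # VOCATION = 0 (SEX 1, SEX 2)
    220 + 193 + 10 + 704,  253 + 124 +  2 +  442),  # VOCATION = 1 (SEX 1, SEX 2)
  nrow = 2,
  dimnames = list(SEX = c("1", "2"), VOCATION = c("0", "1"))
)

# Pearson chi-square test of independence. correct = FALSE turns off the
# Yates continuity correction that chisq.test() applies to 2 x 2 tables.
res <- chisq.test(sex_voc, correct = FALSE)
res$statistic
res$p.value

# Share of vocational concentrators within each sex (rows sum to 1).
prop.table(sex_voc, margin = 1)
```

The same approach could be repeated for race and vocational concentration. Running several follow-up tests on the same data raises the chance of a false positive, so the results would need to be interpreted with that in mind.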

Meg Handley

November 5