Chi-Squared Test with R

To calculate the chi-squared test for independence:

row1 = c(data from row 1 separated by commas)

row2 = c(data from row 2 separated by commas)

Keep going until you have typed in all of your rows.

data.table = rbind(row1, row2, …) – makes the data into a table. You can call it whatever you want; it does not have to be data.table.

data.table – use if you want to look at the table

chisq.test(data.table) – calculates the chi-squared test for independence

chisq.test(data.table)$expected – lets you see the expected values

Example:

Breast Feed and Autism Data

                      Breast Feeding Data
Autism         None   Less than 2 months   2 to 6 months   More than 6 months   Row Total
Yes            241    198                  164             215                  818
No             20     25                   27              44                   116
Column Total   261    223                  191             259                  934

Test if autism is independent of breast feeding timelines.

Process for typing data in:

row1 = c(241, 198, 164, 215)
row2 = c(20, 25, 27, 44)
data.table = rbind(row1, row2)
data.table
##      [,1] [,2] [,3] [,4]
## row1  241  198  164  215
## row2   20   25   27   44
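
The printed table above has generic headers such as [,1] and row1. As a minimal sketch, labels can be attached when the table is built. The name labeled.table and the shortened column labels are illustrative choices, not part of the original example.

```r
# Counts from the autism example; the labels are illustrative abbreviations
row1 <- c(241, 198, 164, 215)
row2 <- c(20, 25, 27, 44)

# Naming the arguments of rbind() sets the row names
labeled.table <- rbind(Yes = row1, No = row2)
colnames(labeled.table) <- c("None", "<2 months", "2-6 months", ">6 months")
labeled.table

# Row and column totals, useful for checking the typed data against the source table
rowSums(labeled.table)
colSums(labeled.table)
```

chisq.test(labeled.table) gives the same test result as the unlabeled table, since the labels do not change the counts.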

Process for conducting analysis:

chisq.test(data.table)
## 
##  Pearson's Chi-squared test
## 
## data:  data.table
## X-squared = 11.217, df = 3, p-value = 0.01061

To find the expected values:

chisq.test(data.table)$expected
##           [,1]      [,2]      [,3]      [,4]
## row1 228.58458 195.30407 167.27837 226.83298
## row2  32.41542  27.69593  23.72163  32.16702
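
To see where these numbers come from, the sketch below rebuilds the expected values and the test statistic by hand from the same counts. The names expected, x.squared, df, and p.value are illustrative. This is only a check on the arithmetic, and chisq.test remains the simpler way to run the test.

```r
row1 <- c(241, 198, 164, 215)
row2 <- c(20, 25, 27, 44)
data.table <- rbind(row1, row2)

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(data.table), colSums(data.table)) / sum(data.table)
expected

# Test statistic: sum over all cells of (observed - expected)^2 / expected
x.squared <- sum((data.table - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(data.table) - 1) * (ncol(data.table) - 1)

# The p-value is the upper-tail area of the chi-squared distribution
p.value <- pchisq(x.squared, df = df, lower.tail = FALSE)
c(x.squared = x.squared, df = df, p.value = p.value)
```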

To calculate the chi-squared test for goodness of fit:

Type in the observed frequencies. Call it something like observed.

observed <- c(type in data with commas in between)

Type in the probabilities that you are comparing to the observed frequencies. Call it something like null.probs.

null.probs <- c(type in probabilities with commas in between)

chisq.test(observed, p=null.probs) – the command for the hypothesis test

Example:

Suppose you want to know whether a die is fair. If it is fair, then each side should come up with the same proportion. To find the observed frequencies, you roll the die 500 times and count how often each side comes up. The data are in the following table.

Observed frequencies on die
Die side 1 2 3 4 5 6 Total
Observed 78 87 87 76 85 87 500

Do the data show that the die is fair? Test at the 5% level.

observed<-c(78, 87, 87, 76, 85, 87)
null.probs<-c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
chisq.test(observed, p=null.probs)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 1.504, df = 5, p-value = 0.9126
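
One way to finish the example is to save the test result and compare its p-value to the 5% level. This is a minimal sketch; the names fit and alpha are illustrative.

```r
observed <- c(78, 87, 87, 76, 85, 87)
null.probs <- rep(1/6, 6)   # same as typing 1/6 six times

# Save the result so its pieces can be inspected
fit <- chisq.test(observed, p = null.probs)

# Expected count for each side if the die is fair: 500 * 1/6
fit$expected

# Decision at the 5% level: reject the null hypothesis only if p-value < alpha
alpha <- 0.05
fit$p.value < alpha
```

The comparison returns FALSE because the p-value of 0.9126 is larger than 0.05, so the null hypothesis is not rejected. The data do not give enough evidence to show that the die is unfair.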

To run an ANOVA in R:

The data frame must have a factor variable that identifies the groups and a quantitative variable that contains the measurements.

gf_boxplot(variable ~ factor, data = data_frame) – creates a box plot for each level of the factor. gf_boxplot comes from the ggformula package, so run library(ggformula) first.

results=aov(variable ~ factor, data = data_frame) – runs the ANOVA analysis and saves it in results, though you can call it whatever you wish.

summary(results) – displays results of the ANOVA analysis
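
The sketch below shows the required data layout on a small made-up data set. The data frame toy and its values are hypothetical and only illustrate the structure: one quantitative column and one factor column. If the groups are stored as numeric codes, convert them with factor() first. Otherwise aov treats the codes as a single numeric predictor instead of separate groups.

```r
# Hypothetical data: 'score' is quantitative, 'group' holds numeric group codes
toy <- data.frame(
  score = c(12, 15, 11, 14, 22, 25, 21, 24, 31, 35, 30, 34),
  group = rep(c(1, 2, 3), each = 4)
)

# Convert the numeric codes to a factor so each code is treated as its own group
toy$group <- factor(toy$group)
str(toy)

# Run the ANOVA and display the table
results <- aov(score ~ group, data = toy)
summary(results)
```

With three groups and twelve observations, the Df column of the summary shows 2 for group and 9 for Residuals.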

Example:

Cancer is a terrible disease. Surviving may depend on the type of cancer the person has. To see if the mean survival times for several types of cancer are different, data were collected on the survival time in days of patients with one of these cancers in an advanced stage. The head of the data is

head(Cancer)
##   survival   organ
## 1      124 Stomach
## 2       42 Stomach
## 3       25 Stomach
## 4       45 Stomach
## 5      412 Stomach
## 6       51 Stomach

(“Cancer survival story,” 2013). (Please realize that this data is from 1978. There have been many advances in cancer treatment, so do not use this data as an indication of survival rates from these cancers.)

Do the data indicate that at least two of the mean survival times for these types of cancer are different? Test at the 1% level.

gf_boxplot(survival~organ, data=Cancer)

gf_density(~survival, data=Cancer, fill = ~organ, title="Survival time for different cancers",  xlab="Survival Time (days)" )

results=aov(survival~organ, data=Cancer)
summary(results)
##             Df   Sum Sq Mean Sq F value   Pr(>F)    
## organ        4 11535761 2883940   6.433 0.000229 ***
## Residuals   59 26448144  448274                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
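
To connect the columns of this table, the sketch below recomputes the F value and the p-value from the sums of squares and degrees of freedom printed above, then compares the p-value to the 1% level. The variable names are illustrative, and the inputs are the rounded values from the output, so the results match only to the digits shown.

```r
# Values read from the summary(results) output above
ss.organ <- 11535761; df.organ <- 4
ss.resid <- 26448144; df.resid <- 59

# Mean square = sum of squares / degrees of freedom
ms.organ <- ss.organ / df.organ
ms.resid <- ss.resid / df.resid

# F value = mean square between groups / mean square within groups
f.value <- ms.organ / ms.resid

# The p-value is the upper-tail area of the F distribution
p.value <- pf(f.value, df.organ, df.resid, lower.tail = FALSE)
c(f.value = f.value, p.value = p.value)

# Decision at the 1% level: reject the null hypothesis only if p-value < alpha
alpha <- 0.01
p.value < alpha
```

The comparison returns TRUE because the p-value of 0.000229 is smaller than 0.01, so the null hypothesis that all of the mean survival times are equal is rejected. The data give evidence that at least two of the means differ.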