Chi-Squared Test with R

To calculate the chi-squared test for independence:

row1 = c(data from row 1 separated by commas)

row2 = c(data from row 2 separated by commas)

Keep going until you have typed in all of your rows.

data.table = rbind(row1, row2, …) – combines the rows into a table. You can call it whatever you want. It does not have to be data.table.

data.table – use if you want to look at the table

chisq.test(data.table) – calculates the chi-squared test for independence

chisq.test(data.table)$expected – lets you see the expected values

Example:

Breastfeeding and Autism Data

Breast Feeding Data
Autism	None	Less than 2 months	2 to 6 months	More than 6 months	Row Total
Yes	241	198	164	215	818
No	20	25	27	44	116
Column Total	261	223	191	259	934

Test whether autism is independent of breastfeeding duration.

Process for typing data in:

row1 = c(241, 198, 164, 215)
row2 = c(20, 25, 27, 44)
data.table = rbind(row1, row2)
data.table

##      [,1] [,2] [,3] [,4]
## row1  241  198  164  215
## row2   20   25   27   44
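The printed table shows only [,1] to [,4] for the columns, which can be hard to read. As a minimal sketch, the rows and columns can be labeled with dimnames() and the totals checked with addmargins(). The labels come from the table above; the second column label is assumed to be "Less than 2 months", and the names Autism and BreastFeeding are chosen here only for illustration.

```r
# Same data as above
row1 <- c(241, 198, 164, 215)
row2 <- c(20, 25, 27, 44)
data.table <- rbind(row1, row2)

# Attach labels to the rows and columns (names are illustrative)
dimnames(data.table) <- list(
  Autism = c("Yes", "No"),
  BreastFeeding = c("None", "Less than 2 months",
                    "2 to 6 months", "More than 6 months")
)

# Add row and column totals to compare against the printed table
totals <- addmargins(data.table)
totals
```

The labels do not change the result of chisq.test(data.table). They only make the printed table and the expected values easier to read.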

Process for conducting analysis:

chisq.test(data.table)

## 
##  Pearson's Chi-squared test
## 
## data:  data.table
## X-squared = 11.217, df = 3, p-value = 0.01061

To find the expected values:

chisq.test(data.table)$expected

##           [,1]      [,2]      [,3]      [,4]
## row1 228.58458 195.30407 167.27837 226.83298
## row2  32.41542  27.69593  23.72163  32.16702
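To see where these numbers come from, the expected values and the test statistic can be reproduced by hand. This is only a sketch of the standard calculation, not a replacement for chisq.test(). The names observed, expected, x_squared, and p_value are chosen here for illustration, and the check that every expected count is at least 5 is a common rule of thumb rather than something stated above.

```r
# Observed counts from the breastfeeding example
row1 <- c(241, 198, 164, 215)
row2 <- c(20, 25, 27, 44)
observed <- rbind(row1, row2)

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic: sum over all cells of (observed - expected)^2 / expected
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: area to the right of the statistic under the chi-squared curve
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Rule-of-thumb check (an assumption, see above): all expected counts >= 5
all(expected >= 5)
```

The values of x_squared, df, and p_value should agree with the chisq.test() output above, up to rounding.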

To calculate the chi-squared test for goodness of fit:

Type in the observed frequencies. Call it something like observed.

observed <- c(type in data with commas in between)

Type in the probabilities that you are comparing to the observed frequencies. Call it something like null.probs.

null.probs <- c(type in probabilities with commas in between)

chisq.test(observed, p=null.probs) – the command for the hypothesis test

Example:

Suppose you have a die and you are curious whether it is fair. If it is fair, then each side should come up with the same proportion. To find the observed frequencies, you roll the die 500 times and count how often each side comes up. The data are in the table below.

Observed frequencies on die
Die side	1	2	3	4	5	6	Total
Observed	78	87	87	76	85	87	500

Do the data show that the die is fair? Test at the 5% level.

observed<-c(78, 87, 87, 76, 85, 87)
null.probs<-c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
chisq.test(observed, p=null.probs)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 1.504, df = 5, p-value = 0.9126
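The output can be checked by hand and compared to the 5% level stated in the question. The sketch below assumes the usual goodness-of-fit calculation (expected count = total rolls times the null probability). The names expected, x_squared, p_value, and alpha are chosen here for illustration, and rep(1/6, 6) is just a shorter way of typing 1/6 six times.

```r
observed <- c(78, 87, 87, 76, 85, 87)
null.probs <- rep(1/6, 6)   # same as c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

# Expected count for each side under the null hypothesis of a fair die
expected <- sum(observed) * null.probs

# Test statistic and degrees of freedom (number of categories - 1)
x_squared <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# p-value: area to the right of the statistic under the chi-squared curve
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Decision at the 5% level
alpha <- 0.05
p_value < alpha   # FALSE, so fail to reject the null hypothesis
```

Since the p-value is much larger than 0.05, the data do not give enough evidence to show that the die is unfair.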

To run an ANOVA in R:

The data frame must contain a factor variable (the groups) and a quantitative variable (the measurements).

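gf_boxplot(variable ~ factor, data = data_frame) – creates a box plot for each level of the factor. gf_boxplot comes from the ggformula package, so load it first with library(ggformula).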

results=aov(variable ~ factor, data = data_frame) – runs the ANOVA analysis and saves it in results, though you can call it whatever you wish.

summary(results) – displays results of the ANOVA analysis
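The requirement that the grouping variable be a factor matters when the groups are stored as numbers. As a rough illustration, the sketch below uses the mtcars data set that ships with R (not part of these notes), where cyl is numeric. The names cars, numeric_fit, and factor_fit are chosen here for illustration.

```r
# mtcars ships with R; cyl (number of cylinders) is stored as a number
cars <- mtcars
is.factor(cars$cyl)   # FALSE

# Without conversion, aov() treats cyl as one numeric predictor (1 df)
numeric_fit <- aov(mpg ~ cyl, data = cars)

# Convert to a factor so each cylinder count is its own group
cars$cyl <- factor(cars$cyl)
levels(cars$cyl)
factor_fit <- aov(mpg ~ cyl, data = cars)

# Degrees of freedom for the grouping term before and after conversion
summary(numeric_fit)[[1]]$Df[1]
summary(factor_fit)[[1]]$Df[1]
```

After conversion, the degrees of freedom for the grouping term equal the number of groups minus one, which is what a one-way ANOVA expects.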

Example:

Cancer is a terrible disease. Surviving may depend on the type of cancer the person has. To see if the mean survival times for several types of cancer differ, data were collected on the survival time in days of patients with one of these cancers in an advanced stage. The head of the data is

head(Cancer)

##   survival   organ
## 1      124 Stomach
## 2       42 Stomach
## 3       25 Stomach
## 4       45 Stomach
## 5      412 Stomach
## 6       51 Stomach

(“Cancer survival story,” 2013). (Please realize that this data is from 1978. There have been many advances in cancer treatment, so do not use this data as an indication of survival rates from these cancers.)

Do the data indicate that at least two of the mean survival times for these types of cancer differ? Test at the 1% level.

gf_boxplot(survival~organ, data=Cancer)

gf_density(~survival, data=Cancer, fill = ~organ, title="Survival time for different cancers",  xlab="Survival Time (days)" )

results=aov(survival~organ, data=Cancer)
summary(results)

##             Df   Sum Sq Mean Sq F value   Pr(>F)    
## organ        4 11535761 2883940   6.433 0.000229 ***
## Residuals   59 26448144  448274                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
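The F value and p-value in this table can be rebuilt from the other columns, which also makes the decision at the 1% level explicit. Since the full Cancer data frame is not reproduced here, the sketch below works only from the numbers printed in the summary table. The variable names are chosen here for illustration.

```r
# Values copied from the summary(results) table above
df_organ <- 4
df_resid <- 59
ss_organ <- 11535761
ss_resid <- 26448144

# Mean square = sum of squares / degrees of freedom
ms_organ <- ss_organ / df_organ
ms_resid <- ss_resid / df_resid

# F statistic = mean square between groups / mean square within groups
f_stat <- ms_organ / ms_resid

# p-value: area to the right of the F statistic
p_value <- pf(f_stat, df_organ, df_resid, lower.tail = FALSE)

# Decision at the 1% level
alpha <- 0.01
p_value < alpha   # TRUE, so reject the null hypothesis
```

Since the p-value is below 0.01, there is enough evidence to show that at least two of the mean survival times differ.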
