row1 = c(data from row 1 separated by commas)
row2 = c(data from row 2 separated by commas)
Keep going until you have typed in all of your rows.
data.table = rbind(row1, row2, …) – combines the rows into a table. You can
call it whatever you want. It does not have to be data.table.
data.table – use if you want to look at the table
chisq.test(data.table) – calculates the chi-squared test for independence
chisq.test(data.table)$expected – lets you see the expected values
Breast Feed and Autism Data
| Autism | None | Less than 2 months | 2 to 6 months | More than 6 months | Row Total |
|---|---|---|---|---|---|
| Yes | 241 | 198 | 164 | 215 | 818 |
| No | 20 | 25 | 27 | 44 | 116 |
| Column Total | 261 | 223 | 191 | 259 | 934 |
Test whether autism is independent of how long the child was breastfed.
Process for typing data in:
row1 = c(241, 198, 164, 215)
row2 = c(20, 25, 27, 44)
data.table = rbind(row1, row2)
data.table
## [,1] [,2] [,3] [,4]
## row1 241 198 164 215
## row2 20 25 27 44
Process for conducting analysis:
chisq.test(data.table)
##
## Pearson's Chi-squared test
##
## data: data.table
## X-squared = 11.217, df = 3, p-value = 0.01061
To find the expected values:
chisq.test(data.table)$expected
## [,1] [,2] [,3] [,4]
## row1 228.58458 195.30407 167.27837 226.83298
## row2 32.41542 27.69593 23.72163 32.16702
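To see where these numbers come from, the same test can be reproduced by hand. The following is a minimal sketch, not part of the original commands: each expected count is (row total × column total) / grand total, the test statistic is the sum of (observed − expected)² / expected, and the degrees of freedom are (rows − 1)(columns − 1). The names observed, expected, statistic, deg_free, and p_value are chosen here for illustration only.

```r
# Observed counts from the breastfeeding and autism table (rows: Yes, No)
observed <- rbind(c(241, 198, 164, 215),
                  c(20, 25, 27, 44))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over all cells of (O - E)^2 / E
statistic <- sum((observed - expected)^2 / expected)

# Degrees of freedom = (number of rows - 1) * (number of columns - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value is the upper-tail area of the chi-squared distribution
p_value <- pchisq(statistic, df = deg_free, lower.tail = FALSE)

expected
statistic
deg_free
p_value
```

The values printed should agree with the chisq.test output above, up to rounding.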
Type in the observed frequencies. Call it something like observed.
observed<- c(type in data with commas in between)
Type in the probabilities that you are comparing to the observed frequencies. Call it something like null.probs.
null.probs <- c(type in probabilities with commas in between)
chisq.test(observed, p=null.probs) – the command for the hypothesis test
Suppose you want to know whether a die is fair. If it is fair, then the proportion for each value should be the same. To find the observed frequencies, you roll the die 500 times and count how often each side comes up. The data are in the table below.
| Die side | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| Observed | 78 | 87 | 87 | 76 | 85 | 87 | 500 |
Do the data show that the die is fair? Test at the 5% level.
observed<-c(78, 87, 87, 76, 85, 87)
null.probs<-c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
chisq.test(observed, p=null.probs)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 1.504, df = 5, p-value = 0.9126
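As a check on the output above, the goodness-of-fit statistic can be computed directly. This is a sketch under the assumption that the null hypothesis is a fair die (each side has probability 1/6). The names expected, statistic, deg_free, p_value, alpha, and reject_null are introduced here for illustration.

```r
# Observed counts for die sides 1 through 6 (500 rolls in total)
observed <- c(78, 87, 87, 76, 85, 87)

# Null hypothesis: every side is equally likely
null.probs <- rep(1/6, 6)

# Expected count for each side = total rolls * null probability
expected <- sum(observed) * null.probs

# Chi-squared statistic: sum of (O - E)^2 / E
statistic <- sum((observed - expected)^2 / expected)

# Degrees of freedom = number of categories - 1
deg_free <- length(observed) - 1

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(statistic, df = deg_free, lower.tail = FALSE)

# Decision at the 5% level: reject the null only if p_value < alpha
alpha <- 0.05
reject_null <- p_value < alpha

statistic
p_value
reject_null
```

Since the p-value is much larger than 0.05, the null hypothesis is not rejected: the data do not provide evidence that the die is unfair.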
The data frame must contain a factor variable (the groups) and a quantitative variable (the response).
gf_boxplot(variable ~ factor, data = data_frame) – creates a box plot for each level of the factor (gf_boxplot comes from the ggformula package, which must be loaded first)
results=aov(variable ~ factor, data = data_frame) – runs the ANOVA analysis and saves it in results, though you can call it whatever you wish.
summary(results) – displays results of the ANOVA analysis
Cancer is a terrible disease. Surviving may depend on the type of cancer the person has. To see whether the mean survival times for several types of cancer differ, data were collected on the survival time in days of patients with one of these cancers in an advanced stage. The head of the data is
head(Cancer)
## survival organ
## 1 124 Stomach
## 2 42 Stomach
## 3 25 Stomach
## 4 45 Stomach
## 5 412 Stomach
## 6 51 Stomach
(“Cancer survival story,” 2013). (Please realize that this data is from 1978. There have been many advances in cancer treatment, so do not use this data as an indication of survival rates from these cancers.)
Do the data indicate that at least two of the mean survival times for these types of cancer differ? Test at the 1% level.
gf_boxplot(survival~organ, data=Cancer)
gf_density(~survival, data=Cancer, fill = ~organ, title="Survival time for different cancers", xlab="Survival Time (days)" )
results=aov(survival~organ, data=Cancer)
summary(results)
## Df Sum Sq Mean Sq F value Pr(>F)
## organ 4 11535761 2883940 6.433 0.000229 ***
## Residuals 59 26448144 448274
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
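To read the ANOVA table, it can help to rebuild the F value and p-value from the sums of squares. The sketch below uses only the numbers printed in the summary above and does not need the Cancer data frame. The variable names are chosen here for illustration. Because the printed sums of squares are rounded, the results match the table only up to rounding.

```r
# Values copied from the ANOVA summary table
ss_organ <- 11535761   # Sum Sq for organ (between groups)
df_organ <- 4          # Df for organ
ss_resid <- 26448144   # Sum Sq for Residuals (within groups)
df_resid <- 59         # Df for Residuals

# Mean square = sum of squares / degrees of freedom
ms_organ <- ss_organ / df_organ
ms_resid <- ss_resid / df_resid

# F value = between-group mean square / within-group mean square
f_value <- ms_organ / ms_resid

# p-value is the upper-tail area of the F distribution
p_value <- pf(f_value, df_organ, df_resid, lower.tail = FALSE)

# Decision at the 1% level: reject the null only if p_value < alpha
alpha <- 0.01
reject_null <- p_value < alpha

f_value
p_value
reject_null
```

Since the p-value is below 0.01, the null hypothesis that all mean survival times are equal is rejected at the 1% level.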