In the US, 40% of individuals are type A, 11% are type B, 4% are type AB, and 45% are type O.
Null Hypothesis: The proportions of individuals with blood types A, B, AB, and O in the sampled area are the same as those in the US. Alternative Hypothesis: At least one of the proportions of blood types A, B, AB, or O in the sampled area differs from the corresponding US proportion. The test statistic is 573.7.
GoF <- chisq.test(BTable, p = c(.40,.04,.11,.45))
print(GoF)
##
## Chi-squared test for given probabilities
##
## data: BTable
## X-squared = 573.7, df = 3, p-value < 2.2e-16
The p-value is less than 2.2e-16. If the null hypothesis is true, the probability of observing a difference between the sample blood type proportions and the US proportions at least as large as the one in our sample is less than 2.2e-14%.
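R prints the p-value only as a bound because it cuts off the display at machine precision. As a quick check, the tail probability can be computed directly from the reported statistic and degrees of freedom. This is a minimal sketch that uses only the values printed in the output above (573.7 and df = 3); the variable names are illustrative.

```r
# Upper-tail probability of a chi-squared statistic of 573.7 with 3 df,
# using the rounded statistic from the printed output
stat <- 573.7
df <- 3
p_exact <- pchisq(stat, df = df, lower.tail = FALSE)
print(p_exact)

# Confirm that it falls below the bound R reports
print(p_exact < 2.2e-16)
```

The unrounded p-value is also stored in the test object as `GoF$p.value`.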
If α = 0.01, then the p-value is less than α, so we reject the null hypothesis. At the 1% significance level, there is sufficient evidence that the blood type proportions in the sampled area differ from those of the US population.
##
## A AB B O
## 9.255448 2.327069 14.437825 -16.558170
Types A, AB, and B have more people in the sample than expected under the US proportions. Type O has fewer people in the sample than expected.
Type O contributed the most to the test statistic, as its residual has the largest absolute value (-16.558).
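Each category's contribution to the chi-squared statistic is the square of its Pearson residual, so the sign of the residual does not matter when ranking contributions. The sketch below re-enters the residuals from the printed output by hand (the vector name `res` is illustrative) and checks that their squares add up to the reported statistic of about 573.7.

```r
# Pearson residuals copied from the printed output above
res <- c(A = 9.255448, AB = 2.327069, B = 14.437825, O = -16.558170)

# Contribution of each blood type to the test statistic
contrib <- res^2
print(round(contrib, 2))

# The contributions sum to the chi-squared statistic (about 573.7)
print(sum(contrib))

# Blood type with the largest contribution
print(names(which.max(contrib)))
```

With the fitted object available, the same quantities come from `GoF$residuals^2`.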
Null Hypothesis: Blood type and disease status are independent. Alternative Hypothesis: Blood type and disease status are dependent. The test statistic is 19.663.
##
## Pearson's Chi-squared test
##
## data: BT2
## X-squared = 19.663, df = 3, p-value = 0.0001993
The p-value is 0.0001993. If the null hypothesis is true, the probability of getting results at least as extreme as ours is 0.01993%.
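The p-value and its percentage form can be reproduced from the printed statistic. This sketch assumes the table has 4 blood types and 2 disease categories, as shown in the residual output, which gives (4 - 1)(2 - 1) = 3 degrees of freedom; the variable names are illustrative.

```r
# Degrees of freedom for a 4 x 2 contingency table
n_rows <- 4
n_cols <- 2
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail probability for the reported statistic of 19.663
p_val <- pchisq(19.663, df = df, lower.tail = FALSE)
print(p_val)

# Express the p-value as a percentage (multiply by 100)
print(p_val * 100)
```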
If α = 0.01, then the p-value is less than α, so we reject the null hypothesis. At the 1% significance level, there is sufficient evidence that blood type and disease status are dependent.
print(Indep$residuals)
##
## No Yes
## A -0.6656645 1.3518765
## AB 0.2458748 -0.4993393
## B -0.3298545 0.6698909
## O 1.7957577 -3.6469465
Type A individuals had the disease more often than expected if blood type and disease status were independent (residual 1.352). Type O individuals had the disease less often than expected under independence (residual -3.647).
Type O contributed the most to the test statistic, as its Yes cell has the residual with the largest absolute value (-3.647).
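As with the goodness-of-fit test, each cell contributes the square of its Pearson residual to the statistic. The sketch below re-enters the residual table from the printed output by hand (the matrix name `res` is illustrative) and checks that the squared residuals add up to about 19.663.

```r
# Pearson residuals copied from the printed output above
res <- matrix(
  c(-0.6656645,  1.3518765,
     0.2458748, -0.4993393,
    -0.3298545,  0.6698909,
     1.7957577, -3.6469465),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "AB", "B", "O"), c("No", "Yes"))
)

# Contribution of each cell to the test statistic
contrib <- res^2
print(round(contrib, 3))

# The contributions sum to the chi-squared statistic (about 19.663)
print(sum(contrib))

# Row and column position of the cell with the largest contribution
print(which(contrib == max(contrib), arr.ind = TRUE))
```

With the fitted object available, the same table comes from `Indep$residuals^2`.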
## Code Appendix
blod <- read.csv("C:/Users/mason/Downloads/blood.csv")
BTable <- table(blod$Type)
GoF <- chisq.test(BTable, p = c(.40,.04,.11,.45))
print(GoF)
print(GoF$residuals)
BT2 <- table(blod$Type, blod$Disease)
Indep <- chisq.test(BT2, correct=FALSE)
print(Indep)
print(Indep$residuals)