HIV = read.csv('https://wimr-genomics.vip.sydney.edu.au/AMED3002/data/HIV.csv')
head(HIV)
## state sex diag death status T.categ age
## 1 NSW M 10905 11081 D hs 35
## 2 NSW M 11029 11096 D hs 53
## 3 NSW M 9551 9983 D hs 42
## 4 NSW M 9577 9654 D haem 44
## 5 NSW M 10015 10290 D hs 39
## 6 NSW M 9971 10344 D hs 36
dim(HIV)
## [1] 2843 7
sum(is.na(HIV))
## [1] 0
tab <- table(HIV$sex, HIV$state)
tab
##
## NSW Other QLD VIC
## F 54 13 9 13
## M 1726 236 217 575
chisq.test(tab)
##
## Pearson's Chi-squared test
##
## data: tab
## X-squared = 5.8235, df = 3, p-value = 0.1205
test = chisq.test(tab)
test$expected >= 5
##
## NSW Other QLD VIC
## F TRUE TRUE TRUE TRUE
## M TRUE TRUE TRUE TRUE
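# Odds ratio for the NSW vs Other sub-table: odds of NSW (rather than Other) for females relative to males
OR <- (tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2])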
OR
## [1] 0.5679651
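The odds ratio above can be reproduced from the four printed counts, and a rough confidence interval attached to it. The following is a minimal sketch, assuming a Wald interval on the log odds ratio is adequate here; the names `counts`, `or`, `log_or`, `se`, and `ci` are illustrative and not part of the original analysis.

```r
# Counts copied from the printed table (rows: F, M; columns: NSW, Other)
counts <- matrix(c(54, 1726, 13, 236), nrow = 2,
                 dimnames = list(sex = c("F", "M"), state = c("NSW", "Other")))

# Cross-product odds ratio, same formula as in the analysis
or <- (counts["F", "NSW"] * counts["M", "Other"]) /
      (counts["M", "NSW"] * counts["F", "Other"])

# Wald 95% interval on the log scale (an assumption; exact methods also exist)
log_or <- log(or)
se <- sqrt(sum(1 / counts))
ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se)

or
ci
```

If the interval includes 1, the data are compatible with no difference in these odds between the sexes. Note that this covers only the NSW and Other columns, whereas the chi-squared test used all four state groups.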
# Keep only patients who died, then compute survival time as death minus diagnosis
HIV_died <- HIV[HIV$status == "D", ]
HIV_died$SurviveTime <- HIV_died$death - HIV_died$diag
head(HIV_died)
## state sex diag death status T.categ age SurviveTime
## 1 NSW M 10905 11081 D hs 35 176
## 2 NSW M 11029 11096 D hs 53 67
## 3 NSW M 9551 9983 D hs 42 432
## 4 NSW M 9577 9654 D haem 44 77
## 5 NSW M 10015 10290 D hs 39 275
## 6 NSW M 9971 10344 D hs 36 373
library(Cairo)
library(ggplot2)
library(tidyr)
# Boxplot of survival time by state; named `p` to avoid masking base R's plot()
p <- ggplot(HIV_died, aes(x = state, y = SurviveTime, colour = state)) + geom_boxplot()
print(p)
Main-effects ANOVA of survival time on state, age, and sex (no interaction terms):
fit = aov(HIV_died$SurviveTime ~ HIV_died$state + HIV_died$age + HIV_died$sex)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## HIV_died$state 3 1477348 492449 5.22 0.00137 **
## HIV_died$age 1 3279006 3279006 34.76 4.47e-09 ***
## HIV_died$sex 1 17921 17921 0.19 0.66301
## Residuals 1755 165573121 94344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One-way ANOVA: A one-way ANOVA analyzes the effect of a single categorical factor on a continuous dependent variable. It tests whether the group means across the levels of that factor differ significantly. The model fitted above is not strictly a one-way ANOVA: it includes three predictors (state, age, and sex) as main effects only, with no interaction terms. Because age is numeric (it uses a single degree of freedom in the table), the model is better described as an additive model with a covariate.
Two-way ANOVA: A two-way ANOVA extends the one-way case to two categorical factors analyzed simultaneously. It can also include their interaction, which assesses whether the effect of one factor depends on the level of the other.
In summary, a one-way ANOVA examines a single factor, while a two-way ANOVA examines two factors and their potential interaction. The model below goes one step further: the `*` operator crosses state, age, and sex, so it estimates the three main effects, the three two-way interactions, and the three-way interaction on SurviveTime.
fit <- aov(HIV_died$SurviveTime ~ HIV_died$state * HIV_died$age * HIV_died$sex)
summary(fit)
## Df Sum Sq Mean Sq F value
## HIV_died$state 3 1477348 492449 5.236
## HIV_died$age 1 3279006 3279006 34.866
## HIV_died$sex 1 17921 17921 0.191
## HIV_died$state:HIV_died$age 3 998398 332799 3.539
## HIV_died$state:HIV_died$sex 3 220808 73603 0.783
## HIV_died$age:HIV_died$sex 1 46132 46132 0.491
## HIV_died$state:HIV_died$age:HIV_died$sex 3 195668 65223 0.694
## Residuals 1745 164112116 94047
## Pr(>F)
## HIV_died$state 0.00134 **
## HIV_died$age 4.24e-09 ***
## HIV_died$sex 0.66251
## HIV_died$state:HIV_died$age 0.01418 *
## HIV_died$state:HIV_died$sex 0.50359
## HIV_died$age:HIV_died$sex 0.48379
## HIV_died$state:HIV_died$age:HIV_died$sex 0.55599
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
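To make the distinction between these model types concrete, the sketch below fits a one-way model, an additive two-factor model, and a two-factor model with interaction to simulated data, then compares the last two with an F-test. The data frame `sim`, its columns, and the effect sizes are invented for illustration and are unrelated to the HIV data.

```r
set.seed(1)

# Simulated data: two categorical factors and a continuous response
sim <- expand.grid(group = c("a", "b", "c"), arm = c("x", "y"), rep = 1:20)
sim$y <- rnorm(nrow(sim), mean = 10) +
  ifelse(sim$group == "c", 2, 0) +                  # main effect of group
  ifelse(sim$group == "c" & sim$arm == "y", 3, 0)   # interaction effect

fit_one <- aov(y ~ group, data = sim)        # one-way: a single factor
fit_add <- aov(y ~ group + arm, data = sim)  # two factors, main effects only
fit_int <- aov(y ~ group * arm, data = sim)  # main effects plus interaction

summary(fit_int)

# F-test of whether adding the interaction improves on the additive model
cmp <- anova(fit_add, fit_int)
cmp
```

Passing `data = sim` rather than writing `sim$y ~ sim$group` keeps the term names short in the summary table. The same nested comparison could in principle be applied to the additive and fully crossed HIV models, provided both are fitted to the same rows.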