Matthias has created a very good figure showing the various mutations, deletions or amplifications together with several clinical variables and HPV status of HNSCC.
It has been suggested by the reviewers though that there are no stats describing the relation between variables and Matthias, Chris, Stephan etc want to address this by adding p-values.
The main points re stats which Chris mentioned were:
‘we can see what could be tested by statistical analysis, e.g. p53 mutation and smoking status; FBX7 and HPV status; PIK3CA/PTEN combined and HPV status; ? p valuesCyclin D1, p16 (CDKN2A) and p15 (CDKN2B) and HPV status, p value
to reviewer’s comment:
If the paper is proposing that these mutations and pathways are significant, especially in a clinical context they need at least basic statistical analysis to demonstrate that the frequency of alterations are worth pursuing. No analysis seems to have been attempted with regard to the basic patient demographics relative to genetic alterations. They simply reported and not analyzed. With a small sample size the likelihood that it is significant is low, however, if the authors would like to draw any conclusions based upon their findings, some attempt must be made.)
‘We should add p values’
Matthias however suggests using the FDR as in a similar Fig in a recent Science paper, The Mutational Landscape of Head and Neck Squamous Cell Carcinoma.
OK The data looks like this. I have added WT for wild-type where there were many blank spaces. Otherwise the software would see it as NA (missing or not applicable) and would exclude it from any stats - which is incorrect.
load("~/Google Drive/Matthias/genm.RData")
head(genm)
## ID Gender Age Site Grade T N M Survival Death
## 1 P6_pos F 57 base_of_tongue poorly diff 2 2 0 1337 1
## 2 P8_pos M 63 tonsil poorly diff 3 2 0 2711 0
## 3 P13_pos F 56 tonsil mod diff 2 1 0 792 1
## 4 P19_pos F 63 tonsil poorly diff 3 3 0 2586 0
## 5 P26_pos M 53 tonsil poorly diff NA NA NA 1277 1
## 6 P28_pos M 57 tonsil poorly diff 4 3 0 734 1
## Smoking Alcohol TP53 CCND1 CCND3 CDKN2A CDKN2B EGFR MDM2
## 1 no_tobacco no_alcohol P WT WT WT WT WT WT
## 2 no_tobacco no_alcohol WT WT WT WT WT WT WT
## 3 no_tobacco no_alcohol WT WT WT WT WT WT WT
## 4 ex_smoker <NA> WT WT WT WT WT WT WT
## 5 <NA> <NA> WT WT WT WT WT WT WT
## 6 smoker occasional_alcohol WT WT WT WT WT WT WT
## CCND3.1 TET2 SUFU PKHD1 LRP1B KDM6A FGFR1 FGFR3 KDM6A.1 MYC MCL1 RICTOR
## 1 WT WT WT WT WT WT WT WT WT WT WT WT
## 2 WT WT WT WT WT WT WT WT WT WT WT WT
## 3 WT WT WT WT WT WT WT WT WT WT WT WT
## 4 WT WT WT WT WT WT WT WT WT WT WT WT
## 5 WT WT WT WT WT WT WT WT WT WT WT WT
## 6 WT WT WT WT WT WT WT WT WT WT WT WT
## STK11 RB1 NF1 KRAS BCL2L1 SOX2 PTEN FBXW7 PIK3CA HPVstatus
## 1 P WT WT WT WT G WT P PG HPVpos
## 2 WT WT WT WT WT WT WT P WT HPVpos
## 3 WT WT WT WT WT WT L p WT HPVpos
## 4 WT WT WT WT WT WT WT WT P HPVpos
## 5 WT WT WT WT WT WT WT WT WT HPVpos
## 6 WT WT WT WT WT WT WT WT WT HPVpos
# I need to reorder some variables so that they are correctly handled by R
# - by default factors are not ordered - and when ordered you have to make
# sure they are in the correct order not alphabetical. reorder Grade
gr = genm$Grade
gr2 = as.vector(gr)
gr2[gr == "poorly diff"] <- 3
gr2[gr == "mod diff"] <- 2
gr2[gr == "well diff"] <- 1
gr2 = as.numeric(gr2)
# genm$Grade= ordered(gr2, labels=c('well diff', 'mod diff', 'poorly
# diff'))
rm(gr, gr2)
# reorder Smoking
sm = genm$Smoking
sm2 = as.vector(sm)
sm2[sm == "no tobacco exposure"] <- 1
sm2[sm == "ex_smoker"] <- 2
sm2[sm == "active"] <- 3
sm2[sm == "smoker"] <- 3
sm2[is.na(sm)] <- NA
sm2 = as.numeric(sm2)
## Warning: NAs introduced by coercion
# genm$Smoking = ordered(sm2, labels=c('no_tobacco', 'ex_smoker',
# 'smoker'))
rm(sm, sm2)
# reorder Smoking
al = genm$Alcohol
al2 = as.vector(al)
al2[al == "no alcohol intake"] <- 1
al2[al == "occasional alcohol intake"] <- 2
al2[al == "Heavy drinker"] <- 3
al2[is.na(al)] <- NA
al2 = as.numeric(al2)
## Warning: NAs introduced by coercion
# genm$Alcohol = ordered(al2, labels=c('no_alcohol', 'occasional_alcohol',
# 'heavy_alcohol'))
rm(al, al2)
# The histological factors are in the right order but are numeric whereas
# they should be ordinal factors
genm$T = ordered(genm$T)
genm$N = ordered(genm$N)
genm$M = ordered(genm$M)
Essentially the analysis requested would simply be a list of Chi-square tests (HPVstatus vs ManyOtherFactors), plus a couple of other tests (Smoking vs TP53) and then the more complex (PIK3CA/PTEN vs HPVstatus). Then I would choose some adjustment (p or q-value) and he would reorder the fig to follow the pvals like in the Science Fig above.
However note that all the factors/genes tested in the Science paper are binomial MUTANT or WT. Here we have a lot of factors that are ordinal (e.g notobacco < exsmoker < smoker). We can use kendalls tau and test the significance of correlation on these - as using the ordinal information should be more sensitive (powerful).
This is long winded though and there is another way: Log-Linear Analysis (LLA). The problem is though that the long winded way (lots of chisq.test) is simple but LLA - whilst technically more correct for lots of comparisons and ostensibly quicker (a few lines of code) is far more complex to interpret. It would look for interactions between factors (of which there would be hundreds) (e.g. such as PIK3CA/PTEN vs HPVstatus) and the you would try and cut this down usign a step down procedure. Unwieldy and judging by initial tests unlikely to work.
… We'll have a look at the chisq.tests first:
colnames(genm)
## [1] "ID" "Gender" "Age" "Site" "Grade"
## [6] "T" "N" "M" "Survival" "Death"
## [11] "Smoking" "Alcohol" "TP53" "CCND1" "CCND3"
## [16] "CDKN2A" "CDKN2B" "EGFR" "MDM2" "CCND3.1"
## [21] "TET2" "SUFU" "PKHD1" "LRP1B" "KDM6A"
## [26] "FGFR1" "FGFR3" "KDM6A.1" "MYC" "MCL1"
## [31] "RICTOR" "STK11" "RB1" "NF1" "KRAS"
## [36] "BCL2L1" "SOX2" "PTEN" "FBXW7" "PIK3CA"
## [41] "HPVstatus"
# example of what the table command is doing
with(genm, table(HPVstatus, Gender))
## Gender
## HPVstatus F M
## HPVneg 4 12
## HPVpos 5 13
# the chisq.test takes this table format as input
geneP = list()
genePsim = list()
geneP$tp53 = chisq.test(with(genm, table(HPVstatus, TP53)))$p.value
geneP$ccnd1 = chisq.test(with(genm, table(HPVstatus, CCND1)))$p.value
geneP$ccnd3 = chisq.test(with(genm, table(HPVstatus, CCND3)))$p.value
geneP$cdkn2a = chisq.test(with(genm, table(HPVstatus, CDKN2A)))$p.value
geneP$cdkn2b = chisq.test(with(genm, table(HPVstatus, CDKN2B)))$p.value
geneP$egfr = chisq.test(with(genm, table(HPVstatus, EGFR)))$p.value
geneP$mdm2 = chisq.test(with(genm, table(HPVstatus, MDM2)))$p.value
geneP$ccnd3.1 = chisq.test(with(genm, table(HPVstatus, CCND3.1)))$p.value
geneP$tet2 = chisq.test(with(genm, table(HPVstatus, TET2)))$p.value
geneP$sufu = chisq.test(with(genm, table(HPVstatus, SUFU)))$p.value
geneP$pkhd1 = chisq.test(with(genm, table(HPVstatus, PKHD1)))$p.value
geneP$lrp1b = chisq.test(with(genm, table(HPVstatus, LRP1B)))$p.value
geneP$kdm6a = chisq.test(with(genm, table(HPVstatus, KDM6A)))$p.value
geneP$fgfr1 = chisq.test(with(genm, table(HPVstatus, FGFR1)))$p.value
geneP$fgfr3 = chisq.test(with(genm, table(HPVstatus, FGFR3)))$p.value
geneP$kdm6a.1 = chisq.test(with(genm, table(HPVstatus, KDM6A.1)))$p.value
geneP$myc = chisq.test(with(genm, table(HPVstatus, MYC)))$p.value
geneP$mcl1 = chisq.test(with(genm, table(HPVstatus, MCL1)))$p.value
geneP$rictor = chisq.test(with(genm, table(HPVstatus, RICTOR)))$p.value
geneP$stk11 = chisq.test(with(genm, table(HPVstatus, STK11)))$p.value
geneP$rb1 = chisq.test(with(genm, table(HPVstatus, RB1)))$p.value
geneP$nf1 = chisq.test(with(genm, table(HPVstatus, NF1)))$p.value
geneP$kras = chisq.test(with(genm, table(HPVstatus, KRAS)))$p.value
geneP$bcl2l1 = chisq.test(with(genm, table(HPVstatus, BCL2L1)))$p.value
geneP$sox2 = chisq.test(with(genm, table(HPVstatus, SOX2)))$p.value
geneP$pten = chisq.test(with(genm, table(HPVstatus, PTEN)))$p.value
geneP$fbxw7 = chisq.test(with(genm, table(HPVstatus, FBXW7)))$p.value
geneP$pik3ca = chisq.test(with(genm, table(HPVstatus, PIK3CA)))$p.value
genePsim$tp53 = chisq.test(with(genm, table(HPVstatus, TP53)), simulate.p.value = T)$p.value
genePsim$ccnd1 = chisq.test(with(genm, table(HPVstatus, CCND1)), simulate.p.value = T)$p.value
genePsim$ccnd3 = chisq.test(with(genm, table(HPVstatus, CCND3)), simulate.p.value = T)$p.value
genePsim$cdkn2a = chisq.test(with(genm, table(HPVstatus, CDKN2A)), simulate.p.value = T)$p.value
genePsim$cdkn2b = chisq.test(with(genm, table(HPVstatus, CDKN2B)), simulate.p.value = T)$p.value
genePsim$egfr = chisq.test(with(genm, table(HPVstatus, EGFR)), simulate.p.value = T)$p.value
genePsim$mdm2 = chisq.test(with(genm, table(HPVstatus, MDM2)), simulate.p.value = T)$p.value
genePsim$ccnd3.1 = chisq.test(with(genm, table(HPVstatus, CCND3.1)), simulate.p.value = T)$p.value
genePsim$tet2 = chisq.test(with(genm, table(HPVstatus, TET2)), simulate.p.value = T)$p.value
genePsim$sufu = chisq.test(with(genm, table(HPVstatus, SUFU)), simulate.p.value = T)$p.value
genePsim$pkhd1 = chisq.test(with(genm, table(HPVstatus, PKHD1)), simulate.p.value = T)$p.value
genePsim$lrp1b = chisq.test(with(genm, table(HPVstatus, LRP1B)), simulate.p.value = T)$p.value
genePsim$kdm6a = chisq.test(with(genm, table(HPVstatus, KDM6A)), simulate.p.value = T)$p.value
genePsim$fgfr1 = chisq.test(with(genm, table(HPVstatus, FGFR1)), simulate.p.value = T)$p.value
genePsim$fgfr3 = chisq.test(with(genm, table(HPVstatus, FGFR3)), simulate.p.value = T)$p.value
genePsim$kdm6a.1 = chisq.test(with(genm, table(HPVstatus, KDM6A.1)), simulate.p.value = T)$p.value
genePsim$myc = chisq.test(with(genm, table(HPVstatus, MYC)), simulate.p.value = T)$p.value
genePsim$mcl1 = chisq.test(with(genm, table(HPVstatus, MCL1)), simulate.p.value = T)$p.value
genePsim$rictor = chisq.test(with(genm, table(HPVstatus, RICTOR)), simulate.p.value = T)$p.value
genePsim$stk11 = chisq.test(with(genm, table(HPVstatus, STK11)), simulate.p.value = T)$p.value
genePsim$rb1 = chisq.test(with(genm, table(HPVstatus, RB1)), simulate.p.value = T)$p.value
genePsim$nf1 = chisq.test(with(genm, table(HPVstatus, NF1)), simulate.p.value = T)$p.value
genePsim$kras = chisq.test(with(genm, table(HPVstatus, KRAS)), simulate.p.value = T)$p.value
genePsim$bcl2l1 = chisq.test(with(genm, table(HPVstatus, BCL2L1)), simulate.p.value = T)$p.value
genePsim$sox2 = chisq.test(with(genm, table(HPVstatus, SOX2)), simulate.p.value = T)$p.value
genePsim$pten = chisq.test(with(genm, table(HPVstatus, PTEN)), simulate.p.value = T)$p.value
genePsim$fbxw7 = chisq.test(with(genm, table(HPVstatus, FBXW7)), simulate.p.value = T)$p.value
genePsim$pik3ca = chisq.test(with(genm, table(HPVstatus, PIK3CA)), simulate.p.value = T)$p.value
plot(unlist(geneP), unlist(genePsim))
From these two sets of chisq.test the simulated values are probably more correct. There is a rule of thumb that if there are n<5 in any cell of the contingency table then you should think about the Fisher exact test - or simulate the p-values. However looking at the plot you can see that it really isn't going to make any difference. The significant HPV-gene associations are very significant and there are no marginal calls that would be altered. So I use the simulated p.values and now adjust them.
# I reorder the list of p.values so that the most significant are at the
# top
genePsim = genePsim[order(unlist(genePsim))]
# this would be the most conservative p-value adjustment when
p.bon = p.adjust(unlist(genePsim), method = "bonferroni")
p.fdr = p.adjust(unlist(genePsim), method = "fdr")
genePsim.tab = data.frame(unlist(genePsim), bonferroni = p.bon, fdr = p.fdr)
rm(p.bon, p.fdr)
genePsim.tab
## unlist.genePsim. bonferroni fdr
## tp53 0.0004998 0.01399 0.006997
## ccnd1 0.0004998 0.01399 0.006997
## cdkn2b 0.0014993 0.04198 0.013993
## cdkn2a 0.0019990 0.05597 0.013993
## egfr 0.2238881 1.00000 0.605958
## fbxw7 0.2368816 1.00000 0.605958
## pten 0.3018491 1.00000 0.605958
## sox2 0.3463268 1.00000 0.605958
## lrp1b 0.4517741 1.00000 0.605958
## tet2 0.4557721 1.00000 0.605958
## rictor 0.4557721 1.00000 0.605958
## sufu 0.4622689 1.00000 0.605958
## mcl1 0.4622689 1.00000 0.605958
## kdm6a.1 0.4642679 1.00000 0.605958
## pkhd1 0.4672664 1.00000 0.605958
## fgfr1 0.4712644 1.00000 0.605958
## bcl2l1 0.4717641 1.00000 0.605958
## mdm2 0.4747626 1.00000 0.605958
## fgfr3 0.4747626 1.00000 0.605958
## ccnd3 0.4837581 1.00000 0.605958
## myc 0.4892554 1.00000 0.605958
## kras 0.4897551 1.00000 0.605958
## nf1 0.4977511 1.00000 0.605958
## pik3ca 0.7876062 1.00000 0.918874
## ccnd3.1 0.8535732 1.00000 0.919233
## kdm6a 0.8535732 1.00000 0.919233
## stk11 1.0000000 1.00000 1.000000
## rb1 1.0000000 1.00000 1.000000
If we wanted to follow the Science Fig exactly then we would use this Fig ordering and the false discovery rate (q-value, fdr). Personally I think that the bonferroni is best as it is the most conservative and there are not so many tests as in the Science paper. The need for a bar as the difference betwen significant and not is absolute so asterisks (*) to mark significant should be fine. We focus on 28 genes that have mutations, they seem to have sequenced and aligned the whole exome???
Now for the other clinical variable comparisons… firstly I should say that I think there is no point in comparison of Smoking to TP53. TP53 is highly associated with HPV- status which is associated with Non-smoking. They are confounded and there are too few samples to disentangle - So you should be consistent and just report the HPV to Smoking relation. I don't think there are enough samples for multivariate logistic regression (though that would be quicker to type) so I'm just going to do every variable singly as before
genm.var = genm[, c(2:7, 10:12, 41)]
head(genm.var)
## Gender Age Site Grade T N Death Smoking
## 1 F 57 base_of_tongue poorly diff 2 2 1 no_tobacco
## 2 M 63 tonsil poorly diff 3 2 0 no_tobacco
## 3 F 56 tonsil mod diff 2 1 1 no_tobacco
## 4 F 63 tonsil poorly diff 3 3 0 ex_smoker
## 5 M 53 tonsil poorly diff <NA> <NA> 1 <NA>
## 6 M 57 tonsil poorly diff 4 3 1 smoker
## Alcohol HPVstatus
## 1 no_alcohol HPVpos
## 2 no_alcohol HPVpos
## 3 no_alcohol HPVpos
## 4 <NA> HPVpos
## 5 <NA> HPVpos
## 6 occasional_alcohol HPVpos
# after all that messing about with the ordering of factors it turns out R
# (or at least the wilcoxon.test) doesn't handle the data how I expected
# so I'm changing them back to numeric order
genm.var$Site = as.numeric(genm$Site)
genm.var$Grade = as.numeric(genm$Grade)
genm.var$T = as.numeric(genm$T)
genm.var$N = as.numeric(genm$N)
genm.var$Smoking = as.numeric(genm$Smoking)
genm.var$Alcohol = as.numeric(genm$Alcohol)
head(genm.var)
## Gender Age Site Grade T N Death Smoking Alcohol HPVstatus
## 1 F 57 1 3 2 3 1 1 1 HPVpos
## 2 M 63 2 3 3 3 0 1 1 HPVpos
## 3 F 56 2 2 2 2 1 1 1 HPVpos
## 4 F 63 2 3 3 4 0 2 NA HPVpos
## 5 M 53 2 3 NA NA 1 NA NA HPVpos
## 6 M 57 2 3 4 4 1 3 2 HPVpos
clinP = list()
clinP$Site = with(genm.var, wilcox.test(Site ~ HPVstatus))
clinP$Grade = with(genm.var, wilcox.test(Grade ~ HPVstatus))
clinP$T = with(genm.var, wilcox.test(T ~ HPVstatus))
clinP$N = with(genm.var, wilcox.test(N ~ HPVstatus))
clinP$Smoking = with(genm.var, wilcox.test(Smoking ~ HPVstatus))
clinP$Alcohol = with(genm.var, wilcox.test(Alcohol ~ HPVstatus))
# NB I slip in a chisq test for Death and Gender plus a (single) logistic
# regression for Age and lastly a quick comparison of the survival curves
clinP$Death = chisq.test(with(genm.var, table(HPVstatus, Death)))
clinP$Gender = chisq.test(with(genm.var, table(HPVstatus, Gender)))
summary(glm(HPVstatus ~ Age, data = genm.var, family = "binomial", na.action = na.omit))
##
## Call:
## glm(formula = HPVstatus ~ Age, family = "binomial", data = genm.var,
## na.action = na.omit)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.380 -1.210 0.962 1.101 1.388
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.6469 2.2109 0.74 0.46
## Age -0.0263 0.0375 -0.70 0.48
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 47.016 on 33 degrees of freedom
## Residual deviance: 46.517 on 32 degrees of freedom
## AIC: 50.52
##
## Number of Fisher Scoring iterations: 4
##
clinP$Age$p.value = 0.484
require(survival)
## Loading required package: survival
## Loading required package: splines
with(genm, survdiff(Surv(Survival, Death) ~ HPVstatus))
## Call:
## survdiff(formula = Surv(Survival, Death) ~ HPVstatus)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## HPVstatus=HPVneg 16 9 7.06 0.534 0.892
## HPVstatus=HPVpos 18 9 10.94 0.345 0.892
##
## Chisq= 0.9 on 1 degrees of freedom, p= 0.345
clinP$Survival$p.value = 0.345
# I'm adjusting all tests together and main a tidy table to view this
p.all = c(unlist(genePsim), sapply(clinP, function(x) x$p.value))
p.all.bon = p.adjust(p.all, method = "bonferroni")
p.all.fdr = p.adjust(p.all, method = "fdr")
test = c(rep("chisq", 28), rep("wilcox", 6), rep("chisq", 2), "log-reg", "survival")
p.all.tab = data.frame(unadjustedP = p.all, bonferroni = p.all.bon, fdr = p.all.fdr,
test = test)
p.all.tab
## unadjustedP bonferroni fdr test
## tp53 0.0004998 0.01899 0.009495 chisq
## ccnd1 0.0004998 0.01899 0.009495 chisq
## cdkn2b 0.0014993 0.05697 0.018991 chisq
## cdkn2a 0.0019990 0.07596 0.018991 chisq
## egfr 0.2238881 1.00000 0.610147 chisq
## fbxw7 0.2368816 1.00000 0.610147 chisq
## pten 0.3018491 1.00000 0.610147 chisq
## sox2 0.3463268 1.00000 0.610147 chisq
## lrp1b 0.4517741 1.00000 0.610147 chisq
## tet2 0.4557721 1.00000 0.610147 chisq
## rictor 0.4557721 1.00000 0.610147 chisq
## sufu 0.4622689 1.00000 0.610147 chisq
## mcl1 0.4622689 1.00000 0.610147 chisq
## kdm6a.1 0.4642679 1.00000 0.610147 chisq
## pkhd1 0.4672664 1.00000 0.610147 chisq
## fgfr1 0.4712644 1.00000 0.610147 chisq
## bcl2l1 0.4717641 1.00000 0.610147 chisq
## mdm2 0.4747626 1.00000 0.610147 chisq
## fgfr3 0.4747626 1.00000 0.610147 chisq
## ccnd3 0.4837581 1.00000 0.610147 chisq
## myc 0.4892554 1.00000 0.610147 chisq
## kras 0.4897551 1.00000 0.610147 chisq
## nf1 0.4977511 1.00000 0.610147 chisq
## pik3ca 0.7876062 1.00000 0.935282 chisq
## ccnd3.1 0.8535732 1.00000 0.953994 chisq
## kdm6a 0.8535732 1.00000 0.953994 chisq
## stk11 1.0000000 1.00000 1.000000 chisq
## rb1 1.0000000 1.00000 1.000000 chisq
## Site 0.1830444 1.00000 0.610147 wilcox
## Grade 0.1368043 1.00000 0.577618 wilcox
## T 0.0320464 1.00000 0.152220 wilcox
## N 0.0176130 0.66929 0.095613 wilcox
## Smoking 0.0173409 0.65895 0.095613 wilcox
## Alcohol 0.0025951 0.09861 0.019722 wilcox
## Death 0.9838468 1.00000 1.000000 chisq
## Gender 1.0000000 1.00000 1.000000 chisq
## Age 0.4840000 1.00000 0.610147 log-reg
## Survival 0.3450000 1.00000 0.610147 survival
OK when I add in all the clinical variables each tested separately and singly using the appropriate test then do adjustment on all tests (genes and clinical) together, then using the FDR makes more sense. I presume that Matthias can take these values and alter the figure appropriately.
Lastly there was a question about a possible interaction between PIK3CA, PTEN and HPVStatus. this is a specific test selection (rather than dredging of all possible interactions) so no adjustment is required.
chisq.test(with(genm, table(PIK3CA, PTEN)), simulate.p.value = T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: with(genm, table(PIK3CA, PTEN))
## X-squared = 2.804, df = NA, p-value = 0.7031
##
chisq.test(with(genm, table(PIK3CA, HPVstatus)), simulate.p.value = T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: with(genm, table(PIK3CA, HPVstatus))
## X-squared = 2.076, df = NA, p-value = 0.7851
##
chisq.test(with(genm, table(HPVstatus, PTEN)), simulate.p.value = T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: with(genm, table(HPVstatus, PTEN))
## X-squared = 2.927, df = NA, p-value = 0.2989
##
with(genm, table(PIK3CA, PTEN, HPVstatus))
## , , HPVstatus = HPVneg
##
## PTEN
## PIK3CA L P WT
## G 0 0 1
## P 0 0 3
## PG 0 0 0
## WT 0 1 11
##
## , , HPVstatus = HPVpos
##
## PTEN
## PIK3CA L P WT
## G 0 0 1
## P 0 0 4
## PG 0 0 2
## WT 3 1 7
##
chisq.test(with(genm, table(PIK3CA, PTEN, HPVstatus)), simulate.p.value = T)
##
## Chi-squared test for given probabilities with simulated p-value
## (based on 2000 replicates)
##
## data: with(genm, table(PIK3CA, PTEN, HPVstatus))
## X-squared = 115.6, df = NA, p-value = 0.0004998
##
There is a simple logic to these results. You maybe suspicious that there is a relation between PIK3CA and PTEN based on prior evidence but it's not significant on its own. However whilst the relation of PIK3CA or PTEN separately to HPVstatus is not signifcant (NB if the numbers are slightly different to the table that is simulation error), their joint relationship is significant. So you can make a statement in the text that given prior knowledge they are on the same path there is a significant joint relation between PIK3CA, PTEN and HPVstatus.
As I said before I'm not doing TP53 to Smoking - it doesn't make sense to me when you have too few samples and a strongly confounded comparison to HPVstatus.
# this was the code to create the report in HTML Tumblr format# knit('MatthiasGenm.Rmd'); markdownToHTML('MatthiasGenm.md',# 'Test-Post.html', fragment.only = TRUE)