1 OVC tumor-only mode

1.1 Load data

script_dir = "~/data2/PureCN_manuscript"
results_dir = file.path(script_dir, "Results/purity_ploidy")
purecn_puri_ploi = readRDS(file.path(results_dir, "data/ABS_w_tumor_only.rds"))

source("~/Documents/github/PureCN_manuscript/Figures/Final_Figures/non_dup.R")
# Keep only the samples listed in ovc_236 (presumably defined by the sourced non_dup.R)
purecn_puri_ploi = purecn_puri_ploi[which(purecn_puri_ploi$SampleId %in% ovc_236),]

1.2 Ploidy outliers

Ploidy-concordant samples, i.e. those with an absolute ploidy difference of less than 0.5, are marked in red; all other samples are colored cyan. The Pearson correlation shown in the upper-left corner is calculated using only the concordant samples.
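The concordance rule described above can be sketched on toy data as follows. This is only an illustration, not the code that produced the figure: the values are made up, and `Ploidy_reference` is a hypothetical name for the ploidy estimate that the tumor-only estimate is compared against.

```r
# Toy data; `Ploidy_reference` is a hypothetical column name (assumption)
toy = data.frame(
  Ploidy_tumor_only = c(2.0, 2.1, 3.6, 1.9, 4.0),
  Ploidy_reference  = c(2.1, 2.0, 2.0, 2.3, 3.8)
)

# An absolute ploidy difference below 0.5 counts as concordant
toy$absdiff = ifelse(abs(toy$Ploidy_tumor_only - toy$Ploidy_reference) < 0.5,
                     "concordant", "discordant")

# Pearson correlation restricted to the concordant samples
con = toy[toy$absdiff == "concordant", ]
r_con = cor(con$Ploidy_tumor_only, con$Ploidy_reference, method = "pearson")
```

Restricting the correlation to concordant samples means the coefficient describes agreement among samples that already passed the 0.5 threshold, so it should be read alongside the number of discordant samples rather than on its own.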

2 LUAD tumor-only mode

2.1 Load data

script_dir = "~/data2/PureCN_manuscript"
results_dir = file.path(script_dir, "luad/Results/purity_ploidy")
purecn_puri_ploi = readRDS(file.path(results_dir, "data/luad_ABS_w_tumor_only.rds"))

# 442 observations that have matched normal samples
# paired = readRDS(file.path(results_dir, "data/luad_ABS_w_matching_normal.rds"))
# write.csv(paired$fullname, "~/data2/PureCN_manuscript/luad/Results/duplication/luad_442.csv")
# Column 1 holds the row names written by write.csv; column 2 holds the sample names
luad_442 = read.csv("~/data2/PureCN_manuscript/luad/Results/duplication/luad_442.csv")[,2]

purecn_puri_ploi = purecn_puri_ploi[which(purecn_puri_ploi$fullname %in% luad_442),]

2.2 Ploidy outliers

Ploidy-concordant samples, i.e. those with an absolute ploidy difference of less than 0.5, are marked in red; all other samples are colored cyan. The Pearson correlation shown in the upper-left corner is calculated using only the concordant samples.

3 Purity and Coverage

3.1 OVC

3.1.1 Concordant

selectedCol = c("Purity_tumor_only", "Ploidy_tumor_only", "mean.coverage.ontarget", "mean.duplication.ontarget")
summary(OV_all[which(OV_all$absdiff == "concordant"),][selectedCol])
##  Purity_tumor_only Ploidy_tumor_only mean.coverage.ontarget
##  Min.   :0.2600    Min.   :1.641     Min.   : 25.77        
##  1st Qu.:0.6400    1st Qu.:1.898     1st Qu.: 90.81        
##  Median :0.7300    Median :2.764     Median :110.13        
##  Mean   :0.7267    Mean   :2.696     Mean   :127.89        
##  3rd Qu.:0.8500    3rd Qu.:3.289     3rd Qu.:178.43        
##  Max.   :0.9500    Max.   :5.499     Max.   :345.37        
##  mean.duplication.ontarget
##  Min.   :0.06474          
##  1st Qu.:0.13050          
##  Median :0.17198          
##  Mean   :0.22055          
##  3rd Qu.:0.27727          
##  Max.   :0.74399

3.1.2 Discordant

selectedCol = c("Purity_tumor_only", "Ploidy_tumor_only", "mean.coverage.ontarget", "mean.duplication.ontarget")
summary(OV_all[which(OV_all$absdiff == "discordant"),][selectedCol])
##  Purity_tumor_only Ploidy_tumor_only mean.coverage.ontarget
##  Min.   :0.3700    Min.   :2.429     Min.   : 74.66        
##  1st Qu.:0.4875    1st Qu.:3.024     1st Qu.: 93.46        
##  Median :0.6100    Median :3.427     Median :119.89        
##  Mean   :0.5783    Mean   :3.432     Mean   :135.49        
##  3rd Qu.:0.6650    3rd Qu.:3.707     3rd Qu.:181.71        
##  Max.   :0.8400    Max.   :4.491     Max.   :219.48        
##  mean.duplication.ontarget
##  Min.   :0.1088           
##  1st Qu.:0.1398           
##  Median :0.2103           
##  Mean   :0.2389           
##  3rd Qu.:0.3087           
##  Max.   :0.5819

3.2 LUAD

3.2.1 Concordant

selectedCol = c("Purity_tumor_only", "Ploidy_tumor_only", "mean.coverage.ontarget", "mean.duplication.ontarget")
summary(LUAD_all[which(LUAD_all$absdiff == "concordant"),][selectedCol])
##  Purity_tumor_only Ploidy_tumor_only mean.coverage.ontarget
##  Min.   :0.1500    Min.   :1.557     Min.   : 54.12        
##  1st Qu.:0.3000    1st Qu.:1.997     1st Qu.: 75.66        
##  Median :0.4100    Median :2.470     Median : 90.10        
##  Mean   :0.4338    Mean   :2.628     Mean   : 95.00        
##  3rd Qu.:0.5600    3rd Qu.:3.167     3rd Qu.:108.96        
##  Max.   :0.8400    Max.   :4.973     Max.   :197.86        
##  mean.duplication.ontarget
##  Min.   :0.07165          
##  1st Qu.:0.10901          
##  Median :0.12781          
##  Mean   :0.14184          
##  3rd Qu.:0.17170          
##  Max.   :0.34886

3.2.2 Discordant

selectedCol = c("Purity_tumor_only", "Ploidy_tumor_only", "mean.coverage.ontarget", "mean.duplication.ontarget")
summary(LUAD_all[which(LUAD_all$absdiff == "discordant"),][selectedCol])
##  Purity_tumor_only Ploidy_tumor_only mean.coverage.ontarget
##  Min.   :0.1500    Min.   :1.640     Min.   : 52.20        
##  1st Qu.:0.2400    1st Qu.:1.989     1st Qu.: 74.22        
##  Median :0.3100    Median :2.320     Median : 86.41        
##  Mean   :0.3365    Mean   :2.539     Mean   : 89.83        
##  3rd Qu.:0.4225    3rd Qu.:2.760     3rd Qu.: 97.22        
##  Max.   :0.6900    Max.   :4.754     Max.   :251.29        
##  mean.duplication.ontarget
##  Min.   :0.05161          
##  1st Qu.:0.10421          
##  Median :0.11991          
##  Mean   :0.14390          
##  3rd Qu.:0.16475          
##  Max.   :0.44079

4 P-value

Significance of the differences in purity and coverage between concordant and discordant samples, where ‘discordant’ means an absolute ploidy difference >= 0.5. Each comparison is tested with both a Wilcoxon rank sum test and a Welch two-sample t-test.
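The comparisons in this section all follow the same pattern: split one column by the `absdiff` label, then run both tests on the two groups. A minimal sketch of that pattern is given here; `compare_groups` is a hypothetical helper that is not part of the original analysis, and the toy data are simulated.

```r
# Hypothetical helper (assumption, not from the original analysis):
# runs both tests on one column, split by the `absdiff` label
compare_groups = function(df, column) {
  con = df[[column]][which(df$absdiff == "concordant")]
  dis = df[[column]][which(df$absdiff == "discordant")]
  list(wilcoxon = wilcox.test(con, dis),  # rank-based, no normality assumption
       welch    = t.test(con, dis))       # Welch t-test (unequal variances, the default)
}

# Simulated toy data, only to show the call
set.seed(1)
toy = data.frame(
  absdiff = rep(c("concordant", "discordant"), times = c(30, 10)),
  Purity_tumor_only = c(rnorm(30, mean = 0.7, sd = 0.1),
                        rnorm(10, mean = 0.5, sd = 0.1))
)

res = compare_groups(toy, "Purity_tumor_only")
res$wilcoxon$p.value
res$welch$p.value
```

With the real data, a call such as `compare_groups(OV_all, "Purity_tumor_only")` would run the same two tests that are written out explicitly in the subsections of this section.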

4.1 Purity

4.1.1 OVC

con_ind = which(OV_all$absdiff == "concordant")
dis_ind = which(OV_all$absdiff == "discordant")

wilcox.test(OV_all$Purity_tumor_only[con_ind],
            OV_all$Purity_tumor_only[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  OV_all$Purity_tumor_only[con_ind] and OV_all$Purity_tumor_only[dis_ind]
## W = 4868.5, p-value = 1.21e-07
## alternative hypothesis: true location shift is not equal to 0
t.test(OV_all$Purity_tumor_only[con_ind], OV_all$Purity_tumor_only[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  OV_all$Purity_tumor_only[con_ind] and OV_all$Purity_tumor_only[dis_ind]
## t = 6.4014, df = 41.447, p-value = 1.11e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1015743 0.1951581
## sample estimates:
## mean of x mean of y 
## 0.7266995 0.5783333

4.1.2 LUAD

con_ind = which(LUAD_all$absdiff == "concordant")
dis_ind = which(LUAD_all$absdiff == "discordant")

wilcox.test(LUAD_all$Purity_tumor_only[con_ind],
            LUAD_all$Purity_tumor_only[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  LUAD_all$Purity_tumor_only[con_ind] and LUAD_all$Purity_tumor_only[dis_ind]
## W = 22794, p-value = 4.016e-07
## alternative hypothesis: true location shift is not equal to 0
t.test(LUAD_all$Purity_tumor_only[con_ind], LUAD_all$Purity_tumor_only[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  LUAD_all$Purity_tumor_only[con_ind] and LUAD_all$Purity_tumor_only[dis_ind]
## t = 6.3041, df = 221.37, p-value = 1.551e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.06690376 0.12775706
## sample estimates:
## mean of x mean of y 
## 0.4338304 0.3365000

4.1.3 Both

all = rbind(OV_all, LUAD_all)
con_ind = which(all$absdiff == "concordant")
dis_ind = which(all$absdiff == "discordant")

wilcox.test(all$Purity_tumor_only[con_ind],
            all$Purity_tumor_only[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  all$Purity_tumor_only[con_ind] and all$Purity_tumor_only[dis_ind]
## W = 50030, p-value = 2.649e-13
## alternative hypothesis: true location shift is not equal to 0
t.test(all$Purity_tumor_only[con_ind], all$Purity_tumor_only[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  all$Purity_tumor_only[con_ind] and all$Purity_tumor_only[dis_ind]
## t = 9.0539, df = 252.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1178489 0.1833706
## sample estimates:
## mean of x mean of y 
## 0.5429174 0.3923077

4.2 Coverage

4.2.1 OVC

con_ind = which(OV_all$absdiff == "concordant")
dis_ind = which(OV_all$absdiff == "discordant")

wilcox.test(OV_all$mean.coverage.ontarget[con_ind],
            OV_all$mean.coverage.ontarget[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  OV_all$mean.coverage.ontarget[con_ind] and OV_all$mean.coverage.ontarget[dis_ind]
## W = 2759, p-value = 0.4074
## alternative hypothesis: true location shift is not equal to 0
t.test(OV_all$mean.coverage.ontarget[con_ind], OV_all$mean.coverage.ontarget[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  OV_all$mean.coverage.ontarget[con_ind] and OV_all$mean.coverage.ontarget[dis_ind]
## t = -0.80112, df = 39.972, p-value = 0.4278
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -26.77187  11.57298
## sample estimates:
## mean of x mean of y 
##  127.8872  135.4866

4.2.2 LUAD

con_ind = which(LUAD_all$absdiff == "concordant")
dis_ind = which(LUAD_all$absdiff == "discordant")

wilcox.test(LUAD_all$mean.coverage.ontarget[con_ind],
            LUAD_all$mean.coverage.ontarget[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  LUAD_all$mean.coverage.ontarget[con_ind] and LUAD_all$mean.coverage.ontarget[dis_ind]
## W = 19306, p-value = 0.04967
## alternative hypothesis: true location shift is not equal to 0
t.test(LUAD_all$mean.coverage.ontarget[con_ind], LUAD_all$mean.coverage.ontarget[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  LUAD_all$mean.coverage.ontarget[con_ind] and LUAD_all$mean.coverage.ontarget[dis_ind]
## t = 1.7848, df = 161.68, p-value = 0.07617
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5495838 10.8760481
## sample estimates:
## mean of x mean of y 
##  94.99557  89.83234

4.2.3 Both

all = rbind(OV_all, LUAD_all)
con_ind = which(all$absdiff == "concordant")
dis_ind = which(all$absdiff == "discordant")

wilcox.test(all$mean.coverage.ontarget[con_ind],
            all$mean.coverage.ontarget[dis_ind])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  all$mean.coverage.ontarget[con_ind] and all$mean.coverage.ontarget[dis_ind]
## W = 39700, p-value = 0.03239
## alternative hypothesis: true location shift is not equal to 0
t.test(all$mean.coverage.ontarget[con_ind], all$mean.coverage.ontarget[dis_ind])
## 
##  Welch Two Sample t-test
## 
## data:  all$mean.coverage.ontarget[con_ind] and all$mean.coverage.ontarget[dis_ind]
## t = 1.8561, df = 210.23, p-value = 0.06483
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4268748 14.1848662
## sample estimates:
## mean of x mean of y 
##  107.2469  100.3679