Three patients did not consent and were removed.
Note: Of 2362 total survey respondents, 1632 had complete data on 12 key variables (consent, received germline genetic testing, genetic diagnosis, age at diagnosis (germline mutation), genetic test result, sex at birth, age range, cancer diagnosis, family history of stomach cancer, any gastric lesion, race/ethnicity, anti-acid medication/proton pump inhibitors). We proceed with the analysis using n=1632 patients.
Self-reported cancer diagnosis. Note that patients may indicate more than one cancer.
Self-reported race/ethnicity. Note that patients may report more than one race/ethnicity.
Self-reported genes. Note that patients may report more than one gene.
Self-reported genes among stomach cancer patients. Note that patients may report more than one gene.
Self-reported gastric lesions. Note that patients may report more than one lesion.
| Characteristic | N = 1,632¹ |
|---|---|
| Age | |
| Less than 29 | 43 (2.6%) |
| 30-39 | 173 (11%) |
| 40-49 | 373 (23%) |
| 50-59 | 394 (24%) |
| 60-69 | 410 (25%) |
| 70 and over | 239 (15%) |
| Sex | |
| Female | 1,552 (95%) |
| Male | 80 (4.9%) |
| Race_Ethnicity | |
| American Indian, Alaska Native, or First Nations | 1 (<0.1%) |
| Asian or Asian American | 24 (1.5%) |
| Black or African-American | 10 (0.6%) |
| Hispanic White | 19 (1.2%) |
| Hispanic, Latino, or Latinx | 38 (2.3%) |
| Mixed race/ethnicity | 22 (1.3%) |
| Non-Hispanic White | 1,502 (92%) |
| Other | 16 (1.0%) |
| Any_Gastric_Lesion | 798 (49%) |
| H_pylori | 100 (6.1%) |
| Fam_Hist_Stomach_Cancer | |
| Don't know | 272 (17%) |
| No | 957 (59%) |
| Yes | 403 (25%) |
| Stomach_Cancer | 50 (3.1%) |
| ¹ n (%) | |
Any gastric lesion includes any of the following:
Helicobacter pylori (H. pylori)
Gastritis
Gastric Ulcer
Gastroesophageal reflux disease (GERD)
Intestinal Metaplasia
Barrett’s Esophagus
Pernicious Anemia
HR (homologous recombination) genes: ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD51C, RAD51D, NBN
Lynch genes: MLH1, MSH2, MSH6, PMS2, EPCAM
Hereditary Diffuse Gastric Cancer (HDGC) genes: CDH1, CTNNA1
Juvenile Polyposis Syndrome genes: SMAD4, BMPR1A
Peutz-Jeghers Syndrome gene: STK11 (also known as LKB1)
Li-Fraumeni Syndrome gene: TP53
Recessive cancer predisposition syndrome genes: MUTYH, NTHL1
All other genes not included: CDKN2A, ENG, NF1, PTEN
Note: n=272 who responded “Don’t know” are not included in table 4.
| Characteristic | Overall N = 782¹ | No N = 732¹ | Yes N = 50¹ | p-value |
|---|---|---|---|---|
| Fam_Hist_Stomach_Cancer | | | | <0.001² |
| Don't know | 117 (15%) | 111 (15%) | 6 (12%) | |
| No | 463 (59%) | 445 (61%) | 18 (36%) | |
| Yes | 202 (26%) | 176 (24%) | 26 (52%) | |
| Any_Gastric_Lesion | | | | 0.024² |
| No | 402 (51%) | 384 (52%) | 18 (36%) | |
| Yes | 380 (49%) | 348 (48%) | 32 (64%) | |
| H_pylori | | | | 0.009³ |
| No | 743 (95%) | 700 (96%) | 43 (86%) | |
| Yes | 39 (5.0%) | 32 (4.4%) | 7 (14%) | |
| HR | | | | <0.001² |
| No | 178 (23%) | 148 (20%) | 30 (60%) | |
| Yes | 604 (77%) | 584 (80%) | 20 (40%) | |
| Lynch | | | | >0.9² |
| No | 668 (85%) | 625 (85%) | 43 (86%) | |
| Yes | 114 (15%) | 107 (15%) | 7 (14%) | |
| HDGC | | | | <0.001³ |
| No | 731 (93%) | 701 (96%) | 30 (60%) | |
| Yes | 51 (6.5%) | 31 (4.2%) | 20 (40%) | |
| Anti_acid_ever | | | | >0.9² |
| No | 326 (42%) | 305 (42%) | 21 (42%) | |
| Yes | 456 (58%) | 427 (58%) | 29 (58%) | |
| Anti_acid_time | | | | <0.001³ |
| No | 326 (42%) | 305 (42%) | 21 (43%) | |
| Daily | 166 (21%) | 146 (20%) | 20 (41%) | |
| 1-4 times weekly | 115 (15%) | 115 (16%) | 0 (0%) | |
| 1-4 times monthly | 44 (5.6%) | 42 (5.7%) | 2 (4.1%) | |
| 1-4 times in past 6 months | 102 (13%) | 99 (14%) | 3 (6.1%) | |
| Other/As needed | 28 (3.6%) | 25 (3.4%) | 3 (6.1%) | |
| Missing | 1 | 0 | 1 | |
| ¹ n (%) | | | | |
| ² Pearson’s Chi-squared test | | | | |
| ³ Fisher’s exact test | | | | |
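As a check on how the Pearson p-values in this table arise, the Any_Gastric_Lesion row can be recomputed by hand from its four cell counts. The following is a minimal Python sketch, not the code used to produce the report; the function name `pearson_chi2_2x2` is hypothetical, and it assumes that no continuity correction was applied.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 df) for the table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Any_Gastric_Lesion No / Yes; columns: stomach cancer No / Yes.
stat, p_value = pearson_chi2_2x2(384, 18, 348, 32)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Under that assumption the statistic is about 5.08 and the p-value rounds to 0.024, which agrees with the value reported for Any_Gastric_Lesion.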
Let’s look at cell counts for gastric cancer vs. no cancer, H. pylori, and HR.
## HR No Yes
## Stomach_ORNO_Cancer H_pylori
## No No 144 556
## Yes 4 28
## Yes No 27 16
## Yes 3 4
Cases: Gastric cancer, or any gastric lesion
Controls: (Non-gastric cancer and no family history of gastric cancer) or no cancer
Note we are assuming comparability of groups.
| Characteristic | Overall N = 1,200¹ | 0 N = 384¹ | 1 N = 816¹ | p-value |
|---|---|---|---|---|
| H_pylori | | | | <0.001² |
| No | 1,100 (92%) | 384 (100%) | 716 (88%) | |
| Yes | 100 (8.3%) | 0 (0%) | 100 (12%) | |
| Anti_acid_ever | | | | <0.001² |
| No | 417 (35%) | 226 (59%) | 191 (23%) | |
| Yes | 783 (65%) | 158 (41%) | 625 (77%) | |
| HR | | | | <0.001² |
| No | 298 (25%) | 65 (17%) | 233 (29%) | |
| Yes | 902 (75%) | 319 (83%) | 583 (71%) | |
| Lynch | | | | 0.002² |
| No | 976 (81%) | 332 (86%) | 644 (79%) | |
| Yes | 224 (19%) | 52 (14%) | 172 (21%) | |
| HDGC | | | | 0.011² |
| No | 1,137 (95%) | 373 (97%) | 764 (94%) | |
| Yes | 63 (5.3%) | 11 (2.9%) | 52 (6.4%) | |
| FAP | | | | 0.7² |
| No | 1,178 (98%) | 376 (98%) | 802 (98%) | |
| Yes | 22 (1.8%) | 8 (2.1%) | 14 (1.7%) | |
| Juvenile_Polyposis_Syndrome | | | | >0.9³ |
| No | 1,199 (100%) | 384 (100%) | 815 (100%) | |
| Yes | 1 (<0.1%) | 0 (0%) | 1 (0.1%) | |
| Peutz_Jeghers_Syndrome | | | | 0.6³ |
| No | 1,197 (100%) | 384 (100%) | 813 (100%) | |
| Yes | 3 (0.3%) | 0 (0%) | 3 (0.4%) | |
| Li_Fraumeni_Syndrome | | | | 0.2³ |
| No | 1,195 (100%) | 384 (100%) | 811 (99%) | |
| Yes | 5 (0.4%) | 0 (0%) | 5 (0.6%) | |
| Recessive_Cancer_Predisposition | | | | 0.2³ |
| No | 1,186 (99%) | 382 (99%) | 804 (99%) | |
| Yes | 14 (1.2%) | 2 (0.5%) | 12 (1.5%) | |
| Stomach_Genes | | | | 0.018² |
| No | 36 (3.0%) | 5 (1.3%) | 31 (3.8%) | |
| Yes | 1,164 (97%) | 379 (99%) | 785 (96%) | |
| Other_genes | | | | |
| No | 1,200 (100%) | 384 (100%) | 816 (100%) | |
| ¹ n (%) | | | | |
| ² Pearson’s Chi-squared test | | | | |
| ³ Fisher’s exact test | | | | |
Models are adjusted for sex and current age category. 95% CI uses the profile likelihood method.
| Gene | No cancer/Non-gastric cancer & No family history n=384 | Gastric/Lesion n=816 | OR (95% CI) | P-Value |
|---|---|---|---|---|
| HRYes | 319 (83.07) | 583 (71.45) | 0.42 (0.3-0.58) | 0.0000002 |
| LynchYes | 52 (13.54) | 172 (21.08) | 1.83 (1.29-2.62) | 0.0008615 |
| HDGCYes | 11 (2.86) | 52 (6.37) | 2.79 (1.46-5.79) | 0.0031809 |
| FAPYes | 8 (2.08) | 14 (1.72) | 1.07 (0.44-2.76) | 0.8871420 |
| Recessive_Cancer_PredispositionYes | 2 (0.52) | 12 (1.47) | 2.69 (0.69-17.95) | 0.2124594 |
| Stomach_GenesYes | 379 (98.7) | 785 (96.2) | 0.25 (0.08-0.6) | 0.0047796 |
| BRCA1Yes | 81 (21.09) | 134 (16.42) | 0.69 (0.5-0.96) | 0.0247509 |
| BRCA2Yes | 138 (35.94) | 193 (23.65) | 0.54 (0.41-0.71) | 0.0000118 |
| CDH1Yes | 11 (2.86) | 50 (6.13) | 2.67 (1.4-5.57) | 0.0048704 |
| H_pyloriYes | 0 (0) | 100 (12.25) | 20163339.96 (83.72-2.26938371154776e+70) | 0.9645141 |
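The H_pylori row above has an extremely large odds ratio and interval because no control reported H. pylori. H. pylori is one of the conditions counted under "any gastric lesion", so every respondent reporting it is a case by definition. The following minimal Python sketch (not the report's own code) computes unadjusted odds ratios from the counts in the table. They will differ from the adjusted estimates shown, and the Wald interval used here is not the profile likelihood interval used in the report. Adding 0.5 to each cell (the Haldane-Anscombe correction) is shown only as one common workaround, not as the authors' method.

```python
import math

def odds_ratio(case_yes, case_no, ctrl_yes, ctrl_no, correction=0.0):
    """Unadjusted odds ratio for a 2x2 table, optionally adding a constant to every cell."""
    a, b, c, d = (x + correction for x in (case_yes, case_no, ctrl_yes, ctrl_no))
    if b == 0 or c == 0:
        return math.inf  # zero in the denominator: the estimate is unbounded
    return (a * d) / (b * c)

def wald_ci(case_yes, case_no, ctrl_yes, ctrl_no, z=1.96):
    """Wald 95% interval on the log-odds scale; requires all four cells to be > 0."""
    log_or = math.log(odds_ratio(case_yes, case_no, ctrl_yes, ctrl_no))
    se = math.sqrt(1 / case_yes + 1 / case_no + 1 / ctrl_yes + 1 / ctrl_no)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# HR carriers: 583 of 816 cases, 319 of 384 controls.
hr_or = odds_ratio(583, 233, 319, 65)
hr_ci = wald_ci(583, 233, 319, 65)

# H. pylori: 100 of 816 cases, 0 of 384 controls.
hp_or = odds_ratio(100, 716, 0, 384)
hp_or_corrected = odds_ratio(100, 716, 0, 384, correction=0.5)

print(f"HR crude OR = {hr_or:.2f} ({hr_ci[0]:.2f}-{hr_ci[1]:.2f})")
print(f"H. pylori crude OR = {hp_or}, with 0.5 added to each cell = {hp_or_corrected:.1f}")
```

The crude HR odds ratio is in the same direction as the adjusted estimate. The uncorrected H. pylori estimate is infinite, which is why the fitted model returns an unstable coefficient for that row.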
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HR (Yes) | 0.38 | 0.06 | (0.28, 0.53) | -5.68 | < .001 |
| H pylori (Yes) | 9.66e+06 | 8.93e+09 | (0.00, Inf) | 0.02 | 0.986 |
| HR (Yes) × H pylori (Yes) | 2.65 | 2682.28 | (0.00, Inf) | 9.65e-04 | > .999 |
Models adjusted for sex and entry age category. 95% CI uses the profile likelihood method. Note that interaction analysis is likely underpowered due to low n=50 stomach cases.
## Stomach_ORNO_Cancer No Yes
## HR H_pylori
## No No 144 27
## Yes 4 3
## Yes No 556 16
## Yes 28 4
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HR (Yes) | 0.14 | 0.05 | (0.07, 0.26) | -5.88 | < .001 |
| H pylori (Yes) | 3.63 | 2.98 | (0.65, 18.30) | 1.58 | 0.115 |
| HR (Yes) × H pylori (Yes) | 1.44 | 1.46 | (0.19, 10.95) | 0.36 | 0.720 |
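The interaction term can be read as a ratio of stratum-specific odds ratios. In a model with no other covariates, the exponentiated interaction coefficient equals the odds ratio for H. pylori among HR carriers divided by the odds ratio among non-carriers. The following minimal Python sketch computes this crude version from the cross-tabulation above. The function name `stratum_or` is hypothetical, and the results are unadjusted, so they will not match the sex- and age-adjusted estimates exactly.

```python
def stratum_or(exposed_case, exposed_noncase, unexposed_case, unexposed_noncase):
    """Unadjusted odds ratio of the outcome for exposed vs. unexposed within one stratum."""
    return (exposed_case * unexposed_noncase) / (exposed_noncase * unexposed_case)

# Counts from the cross-tabulation above (stomach cancer No / Yes).
# HR = No:  H. pylori No -> 144 / 27,  H. pylori Yes -> 4 / 3
# HR = Yes: H. pylori No -> 556 / 16,  H. pylori Yes -> 28 / 4
or_hp_hr_no = stratum_or(3, 4, 27, 144)
or_hp_hr_yes = stratum_or(4, 28, 16, 556)

# Ratio of stratum-specific odds ratios = interaction on the multiplicative scale.
interaction_ratio = or_hp_hr_yes / or_hp_hr_no

print(f"OR for H. pylori, HR = No:  {or_hp_hr_no:.2f}")
print(f"OR for H. pylori, HR = Yes: {or_hp_hr_yes:.2f}")
print(f"Ratio (crude interaction):  {interaction_ratio:.2f}")
```

With only 3 and 4 H. pylori-positive cancer cases in the two strata, the ratio depends on a handful of respondents, which is consistent with the wide interval reported for the interaction term.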
## Stomach_ORNO_Cancer No Yes
## HR Any_Gastric_Lesion
## No No 65 13
## Yes 83 17
## Yes No 319 5
## Yes 265 15
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HR (Yes) | 0.07 | 0.04 | (0.02, 0.20) | -4.73 | < .001 |
| Any Gastric Lesion (Yes) | 0.95 | 0.40 | (0.42, 2.18) | -0.13 | 0.900 |
| HR (Yes) × Any Gastric Lesion (Yes) | 3.35 | 2.26 | (0.92, 13.38) | 1.79 | 0.073 |
## Stomach_ORNO_Cancer No Yes
## HDGC H_pylori
## No No 669 24
## Yes 32 6
## Yes No 31 19
## Yes 0 1
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HDGC (Yes) | 18.95 | 7.29 | (8.94, 40.71) | 7.65 | < .001 |
| H pylori (Yes) | 5.48 | 2.78 | (1.88, 14.10) | 3.36 | < .001 |
| HDGC (Yes) × H pylori (Yes) | 6.50e+05 | 5.74e+08 | (7.58e-73, ) | 0.02 | 0.988 |
## Stomach_ORNO_Cancer No Yes
## HDGC Any_Gastric_Lesion
## No No 373 6
## Yes 328 24
## Yes No 11 12
## Yes 20 8
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HDGC (Yes) | 76.55 | 47.73 | (23.61, 279.09) | 6.96 | < .001 |
| Any Gastric Lesion (Yes) | 4.20 | 1.96 | (1.79, 11.50) | 3.08 | 0.002 |
| HDGC (Yes) × Any Gastric Lesion (Yes) | 0.08 | 0.06 | (0.02, 0.35) | -3.23 | 0.001 |
Models adjusted for sex and entry age category. 95% CI uses the profile likelihood method. Note that interaction analysis is likely underpowered due to low n=72 colorectal cases. Some cells have a count of 0.
## , , = No
##
##
## No Yes
## No 308 14
## Yes 1161 77
##
## , , = Yes
##
##
## No Yes
## No 46 3
## Yes 17 6
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| HR (Yes) | 0.08 | 0.02 | (0.04, 0.14) | -8.41 | < .001 |
| H pylori (Yes) | 1.09 | 0.74 | (0.23, 3.72) | 0.13 | 0.900 |
| HR (Yes) × H pylori (Yes) | 4.82 | 4.08 | (0.96, 28.50) | 1.86 | 0.063 |
## , , = No
##
##
## No Yes
## No 1241 82
## Yes 228 9
##
## , , = Yes
##
##
## No Yes
## No 19 2
## Yes 44 7
| Parameter | Odds Ratio | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| Lynch (Yes) | 13.74 | 3.99 | (7.89, 24.81) | 9.01 | < .001 |
| H pylori (Yes) | 1.50 | 1.13 | (0.23, 5.35) | 0.54 | 0.592 |
| Lynch (Yes) × H pylori (Yes) | 2.52 | 2.38 | (0.45, 20.67) | 0.98 | 0.325 |
## H_pylori No Yes
## Race_Ethnicity
## American Indian, Alaska Native, or First Nations 1 0
## Asian or Asian American 19 5
## Black or African-American 10 0
## Hispanic White 19 0
## Hispanic, Latino, or Latinx 29 9
## Mixed race/ethnicity 18 4
## Non-Hispanic White 1421 81
## Other 15 1
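Raw counts are hard to compare across groups of very different size, so it can help to convert the table above to within-group proportions. The following is a minimal Python sketch using the counts exactly as printed. Several groups are very small, so their proportions are imprecise.

```python
# (No, Yes) counts of self-reported H. pylori by race/ethnicity, copied from the table above.
counts = {
    "American Indian, Alaska Native, or First Nations": (1, 0),
    "Asian or Asian American": (19, 5),
    "Black or African-American": (10, 0),
    "Hispanic White": (19, 0),
    "Hispanic, Latino, or Latinx": (29, 9),
    "Mixed race/ethnicity": (18, 4),
    "Non-Hispanic White": (1421, 81),
    "Other": (15, 1),
}

# Share of each group reporting H. pylori.
prevalence = {group: yes / (no + yes) for group, (no, yes) in counts.items()}

# Print groups from highest to lowest share, with the underlying counts.
for group, share in sorted(prevalence.items(), key=lambda item: item[1], reverse=True):
    no, yes = counts[group]
    print(f"{group}: {yes}/{no + yes} = {share:.1%}")
```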
Locations of the 81 non-Hispanic White (NHW) respondents who reported H. pylori
##
## California Colorado Connecticut Florida
## 7 2 2 5
## Georgia Iowa Isla Wake Kansas
## 1 1 1 2
## Kentucky Maine Maryland Minnesota
## 1 2 1 2
## Missouri Montana New Hampshire New Jersey
## 1 1 1 3
## New York North Carolina North Dakota Ohio
## 4 11 1 3
## Oregon Prefer not to share South Carolina Texas
## 2 8 1 6
## Utah Virginia Washington West Virginia
## 1 1 6 1
## Wisconsin
## 3
## [1] 0.008249971
## 23 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 1.3298938
## Q4 -1.1754926
## Q5 -3.1477437
## Q6 -1.2546994
## Q7 .
## Q8 -3.3529903
## Q10 3.2172535
## Q12 .
## Q13 -2.1302120
## Q15 .
## Q16 .
## Q17 -0.3752814
## Q18 -1.7291057
## Q19 -2.1982152
## Q20 .
## Q21 -2.3113357
## Q22 -4.6991217
## Q23 .
## Q24 .
## Q25 3.2175091
## Q26 .
## Q27 .
## Q28 0.9061810
## 23 x 1 Matrix of class "dgeMatrix"
## s1
## (Intercept) 3.780641719
## Q4 0.308666905
## Q5 0.042948925
## Q6 0.285161566
## Q7 1.000000000
## Q8 0.034979597
## Q10 24.959474354
## Q12 1.000000000
## Q13 0.118812107
## Q15 1.000000000
## Q16 1.000000000
## Q17 0.687095927
## Q18 0.177443022
## Q19 0.111001101
## Q20 1.000000000
## Q21 0.099128758
## Q22 0.009103269
## Q23 1.000000000
## Q24 1.000000000
## Q25 24.965855270
## Q26 1.000000000
## Q27 1.000000000
## Q28 2.474853062
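The second matrix is the first one exponentiated: each penalized log-odds coefficient becomes an odds ratio, and every "." entry (a coefficient shrunk exactly to zero) becomes 1. The following minimal Python sketch reproduces this for a few of the printed values. The dictionary and function names are hypothetical, and only a subset of the coefficients is copied.

```python
import math

# A few penalized log-odds coefficients from the sparse matrix above; None marks a "."
# entry, i.e. a coefficient shrunk exactly to zero and dropped from the model.
lasso_coefs = {"Q4": -1.1754926, "Q7": None, "Q10": 3.2172535, "Q22": -4.6991217}

def to_odds_ratio(coef):
    """exp(coefficient); a dropped term contributes exp(0) = 1."""
    return math.exp(0.0 if coef is None else coef)

odds_ratios = {name: to_odds_ratio(coef) for name, coef in lasso_coefs.items()}
selected = [name for name, coef in lasso_coefs.items() if coef is not None]

for name, value in odds_ratios.items():
    print(f"{name}: {value:.4f}")
print("selected:", selected)
```

Penalized coefficients are shrunk toward zero, so these odds ratios are better read as a ranking of selected variables than as effect estimates.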
## 0.8677273
##
## Call:
## glm(formula = gc_outcome ~ ., family = "binomial", data = df2[,
## c("gc_outcome", c(selected_vars, "age", "Sex"))])
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -16.90047 5594.88585 -0.003 0.997590
## Q4No 1.76954 1.24400 1.422 0.154894
## Q5No 4.02067 0.93945 4.280 1.87e-05 ***
## Q6No 15.21325 3956.18051 0.004 0.996932
## Q8No 4.12965 0.81485 5.068 4.02e-07 ***
## Q10No -19.22340 3956.18051 -0.005 0.996123
## Q13No 3.05290 0.82703 3.691 0.000223 ***
## Q17No 1.73746 1.51635 1.146 0.251871
## Q18No 2.73895 0.94947 2.885 0.003918 **
## Q19No 3.56111 0.98727 3.607 0.000310 ***
## Q21PALB2 -4.12984 1.32272 -3.122 0.001795 **
## Q22PMS2 -19.40782 1368.06309 -0.014 0.988681
## Q25RAD51D 19.79616 3956.18051 0.005 0.996008
## Q28TP53 14.72877 3956.18048 0.004 0.997030
## age 0.04091 0.02353 1.739 0.082100 .
## SexMale 1.18301 0.92644 1.277 0.201619
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 180.71 on 137 degrees of freedom
## Residual deviance: 104.72 on 122 degrees of freedom
## AIC: 136.72
##
## Number of Fisher Scoring iterations: 16
## (Intercept) Q4No Q5No Q6No Q8No Q10No
## 4.573173e-08 5.868136e+00 5.573857e+01 4.046030e+06 6.215628e+01 4.481101e-09
## Q13No Q17No Q18No Q19No Q21PALB2 Q22PMS2
## 2.117661e+01 5.682874e+00 1.547077e+01 3.520226e+01 1.608542e-02 3.726410e-09
## Q25RAD51D Q28TP53 age SexMale
## 3.956958e+08 2.492430e+06 1.041760e+00 3.264197e+00
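Several terms in the summary above combine a very large estimate with a standard error in the thousands, and the fit needed 16 Fisher scoring iterations. This pattern is commonly a sign of complete or quasi-complete separation, in which case the corresponding odds ratios are not interpretable. The following minimal Python sketch flags such terms and exponentiates the rest. The cut-off `SE_THRESHOLD` is a hypothetical value chosen for illustration, not a rule taken from the report.

```python
import math

# (estimate, standard error) pairs copied from the glm summary above.
terms = {
    "(Intercept)": (-16.90047, 5594.88585),
    "Q4No": (1.76954, 1.24400),
    "Q5No": (4.02067, 0.93945),
    "Q6No": (15.21325, 3956.18051),
    "Q8No": (4.12965, 0.81485),
    "Q10No": (-19.22340, 3956.18051),
    "Q13No": (3.05290, 0.82703),
    "Q17No": (1.73746, 1.51635),
    "Q18No": (2.73895, 0.94947),
    "Q19No": (3.56111, 0.98727),
    "Q21PALB2": (-4.12984, 1.32272),
    "Q22PMS2": (-19.40782, 1368.06309),
    "Q25RAD51D": (19.79616, 3956.18051),
    "Q28TP53": (14.72877, 3956.18048),
    "age": (0.04091, 0.02353),
    "SexMale": (1.18301, 0.92644),
}

SE_THRESHOLD = 100.0  # hypothetical cut-off for "implausibly large" standard errors

# Terms whose standard error suggests the estimate has diverged.
flagged = [name for name, (_, se) in terms.items() if se > SE_THRESHOLD]

# Odds ratios only for the terms that look numerically stable.
stable_or = {
    name: math.exp(est) for name, (est, se) in terms.items() if se <= SE_THRESHOLD
}

print("possible separation:", flagged)
for name, value in stable_or.items():
    print(f"{name}: OR = {value:.2f}")
```

The flagged terms are the same ones that appear with odds ratios of 0.00 or in the millions in the summary table of this model.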
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Q4 | |||
| APC | — | — | |
| No | 5.87 | 0.51, 78.8 | 0.2 |
| Q5 | |||
| ATM | — | — | |
| No | 55.7 | 10.7, 472 | <0.001 |
| Q6 | |||
| BARD1 | — | — | |
| No | 4,046,030 | 0.00, | >0.9 |
| Q8 | |||
| BRCA1 | — | — | |
| No | 62.2 | 14.5, 374 | <0.001 |
| Q10 | |||
| BRIP1 | — | — | |
| No | 0.00 | | >0.9 |
| Q13 | |||
| CHEK2 | — | — | |
| No | 21.2 | 4.64, 124 | <0.001 |
| Q17 | |||
| MLH1 | — | — | |
| No | 5.68 | 0.19, 118 | 0.3 |
| Q18 | |||
| MSH2 | — | — | |
| No | 15.5 | 2.63, 117 | 0.004 |
| Q19 | |||
| MSH6 | — | — | |
| No | 35.2 | 5.99, 309 | <0.001 |
| Q21 | |||
| No | — | — | |
| PALB2 | 0.02 | 0.00, 0.16 | 0.002 |
| Q22 | |||
| No | — | — | |
| PMS2 | 0.00 | | >0.9 |
| Q25 | |||
| No | — | — | |
| RAD51D | 395,695,778 | 0.00, | >0.9 |
| Q28 | |||
| No | — | — | |
| TP53 | 2,492,430 | 0.00, | >0.9 |
| age | 1.04 | 1.00, 1.09 | 0.082 |
| Sex | |||
| Female | — | — | |
| Male | 3.26 | 0.52, 21.2 | 0.2 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
## y
## clusters 0 1
## 1 62 47
## 2 26 3
Cancer cases do not separate cleanly and are scattered across multiple regions. There is some mild grouping but no distinct clusters. Mutation profiles do not strongly separate gastric cancer patients from non-cancer patients in a linear low-dimensional projection (PCA). Some clustering structure exists, but it is weak and not cleanly distinct in PCA space.
Principal component analysis of mutation profiles did not reveal clear separation between gastric cancer and non-cancer patients. Similarly, hierarchical clustering showed only weak group structure, suggesting that gastric cancer risk is driven by specific genetic alterations rather than by global mutation patterns.
In summary: PCA, used to detect global variance structure, shows weak or no separation. Clustering, used to identify patient subgroups, shows only weak structure. LASSO, used to detect sparse predictive features, shows some strong signals.
Our data suggest that there are no clear patient groups but that a few genes are important. This pattern is common in biomedical data, where disease is often driven by specific mutations (feature-driven) rather than by cluster membership.