Three patients did not consent and were removed.

Note: Of 2362 total survey respondents, 1632 had complete data on 12 key variables (consent, receipt of germline genetic testing, genetic diagnosis, age at diagnosis (germline mutation), genetic test result, sex at birth, age range, cancer diagnosis, family history of stomach cancer, any gastric lesion, race/ethnicity, and anti-acid medication/proton pump inhibitor use). The analysis proceeds with these n=1632 patients.

Figures

Self-reported cancer diagnosis. Note patients may indicate more than one cancer.

Self-reported race/ethnicity. Note patients may report more than one race/ethnicity.

Self-reported genes. Note patients may report more than one gene.

Self-reported genes among stomach cancer patients. Note patients may report more than one gene.

Self-reported gastric lesions. Note patients may report more than one lesion.

Table 1. Overall demographics and health characteristics

Characteristic                                        N = 1,632¹
Age
    Less than 29                                      43 (2.6%)
    30-39                                             173 (11%)
    40-49                                             373 (23%)
    50-59                                             394 (24%)
    60-69                                             410 (25%)
    70 and over                                       239 (15%)
Sex
    Female                                            1,552 (95%)
    Male                                              80 (4.9%)
Race_Ethnicity
    American Indian, Alaska Native, or First Nations  1 (<0.1%)
    Asian or Asian American                           24 (1.5%)
    Black or African-American                         10 (0.6%)
    Hispanic White                                    19 (1.2%)
    Hispanic, Latino, or Latinx                       38 (2.3%)
    Mixed race/ethnicity                              22 (1.3%)
    Non-Hispanic White                                1,502 (92%)
    Other                                             16 (1.0%)
Any_Gastric_Lesion                                    798 (49%)
H_pylori                                              100 (6.1%)
Fam_Hist_Stomach_Cancer
    Don't know                                        272 (17%)
    No                                                957 (59%)
    Yes                                               403 (25%)
Stomach_Cancer                                        50 (3.1%)
¹ n (%)

Table 2. Risk factors by any gastric lesion

Any gastric lesion includes any of the following:

  1. Helicobacter pylori (H. pylori)

  2. Gastritis

  3. Gastric Ulcer

  4. Gastroesophageal reflux disease (GERD)

  5. Intestinal Metaplasia

  6. Barrett’s Esophagus

  7. Pernicious Anemia

HR genes: ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD51C, RAD51D, NBN

Lynch genes: MLH1, MSH2, MSH6, PMS2, EPCAM

Hereditary Diffuse Gastric Cancer (HDGC) genes: CDH1, CTNNA1

Juvenile Polyposis Syndrome genes: SMAD4, BMPR1A

Peutz-Jeghers Syndrome gene: STK11 (also known as LKB1)

Li-Fraumeni Syndrome gene: TP53

Recessive cancer predisposition syndrome genes: MUTYH, NTHL1

All other genes not included: CDKN2A, ENG, NF1, PTEN

Table 3. Risk factors by H. pylori status

Table 4. Cancers by family history of gastric cancer

Note: The n=272 respondents who answered “Don’t know” are not included in Table 4.

Table 5. Risk factors by gastric cancer status

Characteristic                  Overall, N = 782¹   No, N = 732¹    Yes, N = 50¹    p-value
Fam_Hist_Stomach_Cancer                                                             <0.001²
    Don't know                  117 (15%)           111 (15%)       6 (12%)
    No                          463 (59%)           445 (61%)       18 (36%)
    Yes                         202 (26%)           176 (24%)       26 (52%)
Any_Gastric_Lesion                                                                  0.024²
    No                          402 (51%)           384 (52%)       18 (36%)
    Yes                         380 (49%)           348 (48%)       32 (64%)
H_pylori                                                                            0.009³
    No                          743 (95%)           700 (96%)       43 (86%)
    Yes                         39 (5.0%)           32 (4.4%)       7 (14%)
HR                                                                                  <0.001²
    No                          178 (23%)           148 (20%)       30 (60%)
    Yes                         604 (77%)           584 (80%)       20 (40%)
Lynch                                                                               >0.9²
    No                          668 (85%)           625 (85%)       43 (86%)
    Yes                         114 (15%)           107 (15%)       7 (14%)
HDGC                                                                                <0.001³
    No                          731 (93%)           701 (96%)       30 (60%)
    Yes                         51 (6.5%)           31 (4.2%)       20 (40%)
Anti_acid_ever                                                                      >0.9²
    No                          326 (42%)           305 (42%)       21 (42%)
    Yes                         456 (58%)           427 (58%)       29 (58%)
Anti_acid_time                                                                      <0.001³
    No                          326 (42%)           305 (42%)       21 (43%)
    Daily                       166 (21%)           146 (20%)       20 (41%)
    1-4 times weekly            115 (15%)           115 (16%)       0 (0%)
    1-4 times monthly           44 (5.6%)           42 (5.7%)       2 (4.1%)
    1-4 times in past 6 months  102 (13%)           99 (14%)        3 (6.1%)
    Other/As needed             28 (3.6%)           25 (3.4%)       3 (6.1%)
    Missing                     1                   0               1
¹ n (%)
² Pearson’s Chi-squared test
³ Fisher’s exact test
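As a rough check on the two tests named in the footnotes, the sketch below recomputes one p-value of each kind from the counts in Table 5, using only the Python standard library. It is not the code that produced the table. It assumes the Pearson test was run without a continuity correction, an assumption that reproduces the reported 0.024 for Any_Gastric_Lesion. The function names are illustrative.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))

# Any_Gastric_Lesion by gastric cancer: rows = lesion (No, Yes), columns = cancer (No, Yes).
chi2, p_chi2 = pearson_chi2_2x2(384, 18, 348, 32)

# H_pylori by gastric cancer: rows = H. pylori (Yes, No), columns = cancer (Yes, No).
p_fisher = fisher_exact_2x2(7, 32, 43, 700)

print(f"Any_Gastric_Lesion: chi2 = {chi2:.2f}, p = {p_chi2:.3f}")
print(f"H_pylori: Fisher p = {p_fisher:.3f}")
```

Both values should land close to the p-values reported in the table for these two rows.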

Let’s look at cell counts for gastric cancer (yes/no), H. pylori, and HR.

##                              HR  No Yes
## Stomach_ORNO_Cancer H_pylori           
## No                  No          144 556
##                     Yes           4  28
## Yes                 No           27  16
##                     Yes           3   4

Table 6. Risk factors by (gastric cancer or any gastric lesion)

Cases: Gastric cancer, or any gastric lesion

Controls: (Non-gastric cancer and no family history of gastric cancer) or no cancer

Note: we assume the case and control groups are comparable.

Characteristic                    Overall, N = 1,200¹   Controls (0), N = 384¹   Cases (1), N = 816¹   p-value
H_pylori                                                                                               <0.001²
    No                            1,100 (92%)           384 (100%)               716 (88%)
    Yes                           100 (8.3%)            0 (0%)                   100 (12%)
Anti_acid_ever                                                                                         <0.001²
    No                            417 (35%)             226 (59%)                191 (23%)
    Yes                           783 (65%)             158 (41%)                625 (77%)
HR                                                                                                     <0.001²
    No                            298 (25%)             65 (17%)                 233 (29%)
    Yes                           902 (75%)             319 (83%)                583 (71%)
Lynch                                                                                                  0.002²
    No                            976 (81%)             332 (86%)                644 (79%)
    Yes                           224 (19%)             52 (14%)                 172 (21%)
HDGC                                                                                                   0.011²
    No                            1,137 (95%)           373 (97%)                764 (94%)
    Yes                           63 (5.3%)             11 (2.9%)                52 (6.4%)
FAP                                                                                                    0.7²
    No                            1,178 (98%)           376 (98%)                802 (98%)
    Yes                           22 (1.8%)             8 (2.1%)                 14 (1.7%)
Juvenile_Polyposis_Syndrome                                                                            >0.9³
    No                            1,199 (100%)          384 (100%)               815 (100%)
    Yes                           1 (<0.1%)             0 (0%)                   1 (0.1%)
Peutz_Jeghers_Syndrome                                                                                 0.6³
    No                            1,197 (100%)          384 (100%)               813 (100%)
    Yes                           3 (0.3%)              0 (0%)                   3 (0.4%)
Li_Fraumeni_Syndrome                                                                                   0.2³
    No                            1,195 (100%)          384 (100%)               811 (99%)
    Yes                           5 (0.4%)              0 (0%)                   5 (0.6%)
Recessive_Cancer_Predisposition                                                                        0.2³
    No                            1,186 (99%)           382 (99%)                804 (99%)
    Yes                           14 (1.2%)             2 (0.5%)                 12 (1.5%)
Stomach_Genes                                                                                          0.018²
    No                            36 (3.0%)             5 (1.3%)                 31 (3.8%)
    Yes                           1,164 (97%)           379 (99%)                785 (96%)
Other_genes
    No                            1,200 (100%)          384 (100%)               816 (100%)
¹ n (%)
² Pearson’s Chi-squared test
³ Fisher’s exact test

Table 7. Associations between cancer-predisposing gene groups and risk for (gastric cancer or any gastric lesion).

Models are adjusted for sex and current age category. 95% CI uses the profile likelihood method.

Gene                                No cancer/Non-gastric cancer   Gastric/Lesion   OR (95% CI)                                 P-Value
                                    & No family history, n=384     n=816
HRYes                               319 (83.07)                    583 (71.45)      0.42 (0.3-0.58)                             0.0000002
LynchYes                            52 (13.54)                     172 (21.08)      1.83 (1.29-2.62)                            0.0008615
HDGCYes                             11 (2.86)                      52 (6.37)        2.79 (1.46-5.79)                            0.0031809
FAPYes                              8 (2.08)                       14 (1.72)        1.07 (0.44-2.76)                            0.8871420
Recessive_Cancer_PredispositionYes  2 (0.52)                       12 (1.47)        2.69 (0.69-17.95)                           0.2124594
Stomach_GenesYes                    379 (98.7)                     785 (96.2)       0.25 (0.08-0.6)                             0.0047796
BRCA1Yes                            81 (21.09)                     134 (16.42)      0.69 (0.5-0.96)                             0.0247509
BRCA2Yes                            138 (35.94)                    193 (23.65)      0.54 (0.41-0.71)                            0.0000118
CDH1Yes                             11 (2.86)                      50 (6.13)        2.67 (1.4-5.57)                             0.0048704
H_pyloriYes                         0 (0)                          100 (12.25)      20163339.96 (83.72-2.26938371154776e+70)    0.9645141

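Interaction effects (HR × H. pylori): not estimable. No control reported H. pylori (a zero cell), so the H. pylori and interaction estimates below are unstable and should not be interpreted.

Parameter                               Odds Ratio   SE         95% CI            z          p
HR (Yes)                                0.38         0.06       (0.28, 0.53)      -5.68      < .001
H pylori (Yes)                          9.66e+06     8.93e+09   (0.00, Inf)       0.02       0.986
HR (Yes) × H pylori (Yes)               2.65         2682.28    (0.00, Inf)       9.65e-04   > .999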
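To see why the H. pylori estimate diverges, the sketch below computes the crude (unadjusted) odds ratio from the Table 6 counts. With zero exposed controls the denominator is zero, so the ratio is infinite. The 0.5 continuity correction shown (often called the Haldane-Anscombe correction) is one simple workaround. It is an illustrative alternative that this analysis does not use, and the function name is hypothetical.

```python
import math

def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, correction=0.0):
    """Crude odds ratio from a 2x2 table, with an optional per-cell correction."""
    a = exposed_cases + correction
    b = unexposed_cases + correction
    c = exposed_controls + correction
    d = unexposed_controls + correction
    denominator = b * c
    # A zero cell in the denominator makes the odds ratio infinite.
    return math.inf if denominator == 0 else (a * d) / denominator

# H. pylori counts from Table 6: cases 100 yes / 716 no, controls 0 yes / 384 no.
or_raw = crude_odds_ratio(100, 716, 0, 384)
# Assumption: add 0.5 to every cell (not part of the original analysis).
or_corrected = crude_odds_ratio(100, 716, 0, 384, correction=0.5)

print(or_raw)                   # inf
print(round(or_corrected, 2))
```

The corrected value is still very large and imprecise, which fits the note above that the interaction could not be estimated.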

Table 8. Stomach Cancer: Interaction effects of HR and HDGC genes with H. pylori and any gastric lesion

Models are adjusted for sex and entry age category. 95% CI uses the profile likelihood method. Note that the interaction analysis is likely underpowered because there are only n=50 stomach cancer cases.

8a. HR x H. pylori

##              Stomach_ORNO_Cancer  No Yes
## HR  H_pylori                            
## No  No                           144  27
##     Yes                            4   3
## Yes No                           556  16
##     Yes                           28   4
Parameter                               Odds Ratio   SE         95% CI            z          p
HR (Yes)                                0.14         0.05       (0.07, 0.26)      -5.88      < .001
H pylori (Yes)                          3.63         2.98       (0.65, 18.30)     1.58       0.115
HR (Yes) × H pylori (Yes)               1.44         1.46       (0.19, 10.95)     0.36       0.720

8b. HR x any gastric lesion

##                        Stomach_ORNO_Cancer  No Yes
## HR  Any_Gastric_Lesion                            
## No  No                                      65  13
##     Yes                                     83  17
## Yes No                                     319   5
##     Yes                                    265  15
Parameter                               Odds Ratio   SE         95% CI            z          p
HR (Yes)                                0.07         0.04       (0.02, 0.20)      -4.73      < .001
Any Gastric Lesion (Yes)                0.95         0.40       (0.42, 2.18)      -0.13      0.900
HR (Yes) × Any Gastric Lesion (Yes)     3.35         2.26       (0.92, 13.38)     1.79       0.073

8c. HDGC x H. pylori

##               Stomach_ORNO_Cancer  No Yes
## HDGC H_pylori                            
## No   No                           669  24
##      Yes                           32   6
## Yes  No                            31  19
##      Yes                            0   1
Parameter                               Odds Ratio   SE         95% CI            z          p
HDGC (Yes)                              18.95        7.29       (8.94, 40.71)     7.65       < .001
H pylori (Yes)                          5.48         2.78       (1.88, 14.10)     3.36       < .001
HDGC (Yes) × H pylori (Yes)             6.50e+05     5.74e+08   (7.58e-73, )      0.02       0.988

8d. HDGC x any gastric lesion

##                         Stomach_ORNO_Cancer  No Yes
## HDGC Any_Gastric_Lesion                            
## No   No                                     373   6
##      Yes                                    328  24
## Yes  No                                      11  12
##      Yes                                     20   8
Parameter                               Odds Ratio   SE         95% CI            z          p
HDGC (Yes)                              76.55        47.73      (23.61, 279.09)   6.96       < .001
Any Gastric Lesion (Yes)                4.20         1.96       (1.79, 11.50)     3.08       0.002
HDGC (Yes) × Any Gastric Lesion (Yes)   0.08         0.06       (0.02, 0.35)      -3.23      0.001
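The interaction odds ratio can be read as a ratio of stratum-specific odds ratios. The sketch below computes the crude (unadjusted) version from the 8d cell counts. It ignores the sex and age adjustment used in the fitted model, so the numbers are only expected to be close to the adjusted estimates. The helper name is illustrative.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Crude odds ratio of the outcome for exposed vs. unexposed."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# counts[(HDGC, Any_Gastric_Lesion)] = (no stomach cancer, stomach cancer), from 8d.
counts = {
    ("No", "No"): (373, 6),
    ("No", "Yes"): (328, 24),
    ("Yes", "No"): (11, 12),
    ("Yes", "Yes"): (20, 8),
}

def lesion_or(hdgc):
    """Odds ratio for any gastric lesion within one HDGC stratum."""
    noncases_lesion, cases_lesion = counts[(hdgc, "Yes")]
    noncases_none, cases_none = counts[(hdgc, "No")]
    return odds_ratio(cases_lesion, noncases_lesion, cases_none, noncases_none)

or_lesion_without_hdgc = lesion_or("No")   # lesion effect among HDGC = No
or_lesion_with_hdgc = lesion_or("Yes")     # lesion effect among HDGC = Yes
interaction_ratio = or_lesion_with_hdgc / or_lesion_without_hdgc

print(round(or_lesion_without_hdgc, 2))
print(round(or_lesion_with_hdgc, 2))
print(round(interaction_ratio, 2))
```

In these crude counts, a gastric lesion is associated with higher odds of stomach cancer among patients without an HDGC gene and with lower odds among patients with one, which is the same direction as the fitted interaction term.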

Table 9. Colorectal Cancer: Interaction effects of HR and Lynch genes with H. pylori

Models are adjusted for sex and entry age category. 95% CI uses the profile likelihood method. Note that the interaction analysis is likely underpowered because there are only n=72 colorectal cancer cases. Some cells have a count of 0.

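9a. HR x H. pylori

In the unlabeled output below, rows are HR (No/Yes), columns are H. pylori (No/Yes), and the two layers are colorectal cancer (No, then Yes). The layer and column totals match the 72 colorectal cases and 100 H. pylori cases.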

## , ,  = No
## 
##      
##         No  Yes
##   No   308   14
##   Yes 1161   77
## 
## , ,  = Yes
## 
##      
##         No  Yes
##   No    46    3
##   Yes   17    6
Parameter                               Odds Ratio   SE         95% CI            z          p
HR (Yes)                                0.08         0.02       (0.04, 0.14)      -8.41      < .001
H pylori (Yes)                          1.09         0.74       (0.23, 3.72)      0.13       0.900
HR (Yes) × H pylori (Yes)               4.82         4.08       (0.96, 28.50)     1.86       0.063

9b. Lynch x H. pylori

## , ,  = No
## 
##      
##         No  Yes
##   No  1241   82
##   Yes  228    9
## 
## , ,  = Yes
## 
##      
##         No  Yes
##   No    19    2
##   Yes   44    7
Parameter                               Odds Ratio   SE         95% CI            z          p
Lynch (Yes)                             13.74        3.99       (7.89, 24.81)     9.01       < .001
H pylori (Yes)                          1.50         1.13       (0.23, 5.35)      0.54       0.592
Lynch (Yes) × H pylori (Yes)            2.52         2.38       (0.45, 20.67)     0.98       0.325
##                                                  H_pylori   No  Yes
## Race_Ethnicity                                                     
## American Indian, Alaska Native, or First Nations             1    0
## Asian or Asian American                                     19    5
## Black or African-American                                   10    0
## Hispanic White                                              19    0
## Hispanic, Latino, or Latinx                                 29    9
## Mixed race/ethnicity                                        18    4
## Non-Hispanic White                                        1421   81
## Other                                                       15    1

Locations of the 81 non-Hispanic White (NHW) respondents who have H. pylori

## 
##          California            Colorado         Connecticut             Florida 
##                   7                   2                   2                   5 
##             Georgia                Iowa           Isla Wake              Kansas 
##                   1                   1                   1                   2 
##            Kentucky               Maine            Maryland           Minnesota 
##                   1                   2                   1                   2 
##            Missouri             Montana       New Hampshire          New Jersey 
##                   1                   1                   1                   3 
##            New York      North Carolina        North Dakota                Ohio 
##                   4                  11                   1                   3 
##              Oregon Prefer not to share      South Carolina               Texas 
##                   2                   8                   1                   6 
##                Utah            Virginia          Washington       West Virginia 
##                   1                   1                   6                   1 
##           Wisconsin 
##                   3

Exploring the outcome and exposures

Considering Dr. Nuno’s supervised learning with penalized logistic regression (LASSO)

## [1] 0.008249971
## 23 x 1 sparse Matrix of class "dgCMatrix"
##                     s1
## (Intercept)  1.3298938
## Q4          -1.1754926
## Q5          -3.1477437
## Q6          -1.2546994
## Q7           .        
## Q8          -3.3529903
## Q10          3.2172535
## Q12          .        
## Q13         -2.1302120
## Q15          .        
## Q16          .        
## Q17         -0.3752814
## Q18         -1.7291057
## Q19         -2.1982152
## Q20          .        
## Q21         -2.3113357
## Q22         -4.6991217
## Q23          .        
## Q24          .        
## Q25          3.2175091
## Q26          .        
## Q27          .        
## Q28          0.9061810
## 23 x 1 Matrix of class "dgeMatrix"
##                       s1
## (Intercept)  3.780641719
## Q4           0.308666905
## Q5           0.042948925
## Q6           0.285161566
## Q7           1.000000000
## Q8           0.034979597
## Q10         24.959474354
## Q12          1.000000000
## Q13          0.118812107
## Q15          1.000000000
## Q16          1.000000000
## Q17          0.687095927
## Q18          0.177443022
## Q19          0.111001101
## Q20          1.000000000
## Q21          0.099128758
## Q22          0.009103269
## Q23          1.000000000
## Q24          1.000000000
## Q25         24.965855270
## Q26          1.000000000
## Q27          1.000000000
## Q28          2.474853062
## 0.8677273
## 
## Call:
## glm(formula = gc_outcome ~ ., family = "binomial", data = df2[, 
##     c("gc_outcome", c(selected_vars, "age", "Sex"))])
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -16.90047 5594.88585  -0.003 0.997590    
## Q4No           1.76954    1.24400   1.422 0.154894    
## Q5No           4.02067    0.93945   4.280 1.87e-05 ***
## Q6No          15.21325 3956.18051   0.004 0.996932    
## Q8No           4.12965    0.81485   5.068 4.02e-07 ***
## Q10No        -19.22340 3956.18051  -0.005 0.996123    
## Q13No          3.05290    0.82703   3.691 0.000223 ***
## Q17No          1.73746    1.51635   1.146 0.251871    
## Q18No          2.73895    0.94947   2.885 0.003918 ** 
## Q19No          3.56111    0.98727   3.607 0.000310 ***
## Q21PALB2      -4.12984    1.32272  -3.122 0.001795 ** 
## Q22PMS2      -19.40782 1368.06309  -0.014 0.988681    
## Q25RAD51D     19.79616 3956.18051   0.005 0.996008    
## Q28TP53       14.72877 3956.18048   0.004 0.997030    
## age            0.04091    0.02353   1.739 0.082100 .  
## SexMale        1.18301    0.92644   1.277 0.201619    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 180.71  on 137  degrees of freedom
## Residual deviance: 104.72  on 122  degrees of freedom
## AIC: 136.72
## 
## Number of Fisher Scoring iterations: 16
##  (Intercept)         Q4No         Q5No         Q6No         Q8No        Q10No 
## 4.573173e-08 5.868136e+00 5.573857e+01 4.046030e+06 6.215628e+01 4.481101e-09 
##        Q13No        Q17No        Q18No        Q19No     Q21PALB2      Q22PMS2 
## 2.117661e+01 5.682874e+00 1.547077e+01 3.520226e+01 1.608542e-02 3.726410e-09 
##    Q25RAD51D      Q28TP53          age      SexMale 
## 3.956958e+08 2.492430e+06 1.041760e+00 3.264197e+00
Characteristic    OR             95% CI         p-value
Q4
    APC
    No            5.87           0.51, 78.8     0.2
Q5
    ATM
    No            55.7           10.7, 472      <0.001
Q6
    BARD1
    No            4,046,030      0.00,          >0.9
Q8
    BRCA1
    No            62.2           14.5, 374      <0.001
Q10
    BRIP1
    No            0.00                          >0.9
Q13
    CHEK2
    No            21.2           4.64, 124      <0.001
Q17
    MLH1
    No            5.68           0.19, 118      0.3
Q18
    MSH2
    No            15.5           2.63, 117      0.004
Q19
    MSH6
    No            35.2           5.99, 309      <0.001
Q21
    No
    PALB2         0.02           0.00, 0.16     0.002
Q22
    No
    PMS2          0.00                          >0.9
Q25
    No
    RAD51D        395,695,778    0.00,          >0.9
Q28
    No
    TP53          2,492,430      0.00,          >0.9
age               1.04           1.00, 1.09     0.082
Sex
    Female
    Male          3.26           0.52, 21.2     0.2
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
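The second matrix in the LASSO output above appears to be the first one exponentiated, and the variables with nonzero coefficients are the ones carried into the refitted `glm`. The sketch below reproduces that step in plain Python from the printed coefficients. It is an illustration, not the original R code. The name `lasso_coefs` is hypothetical, and a `.` in the printout is treated as an exact zero.

```python
import math

# LASSO coefficients (log-odds scale) copied from the sparse matrix above;
# a "." in the printout means the coefficient was shrunk to exactly zero.
lasso_coefs = {
    "Q4": -1.1754926, "Q5": -3.1477437, "Q6": -1.2546994, "Q7": 0.0,
    "Q8": -3.3529903, "Q10": 3.2172535, "Q12": 0.0, "Q13": -2.1302120,
    "Q15": 0.0, "Q16": 0.0, "Q17": -0.3752814, "Q18": -1.7291057,
    "Q19": -2.1982152, "Q20": 0.0, "Q21": -2.3113357, "Q22": -4.6991217,
    "Q23": 0.0, "Q24": 0.0, "Q25": 3.2175091, "Q26": 0.0, "Q27": 0.0,
    "Q28": 0.9061810,
}

# Variables kept by the penalty: every nonzero coefficient.
selected_vars = [name for name, beta in lasso_coefs.items() if beta != 0.0]

# Exponentiating a log-odds coefficient gives an odds ratio;
# dropped variables map to exp(0) = 1.
odds_ratios = {name: math.exp(beta) for name, beta in lasso_coefs.items()}

print(len(selected_vars), "variables selected:", ", ".join(selected_vars))
print(f"Q10 odds ratio: {odds_ratios['Q10']:.3f}")
```

The 13 selected variables match the 13 gene terms in the refitted model. Several of those refitted terms have standard errors in the thousands, so their odds ratios should be read with caution.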

Unsupervised

##         y
## clusters  0  1
##        1 62 47
##        2 26  3

Cancer cases do not separate cleanly and are scattered across multiple regions of PCA space. There is some mild grouping but no distinct clusters. Mutation profiles do not strongly separate gastric cancer patients from non-cancer patients in a linear low-dimensional structure (PCA). Some clustering structure exists, but it is weak and not cleanly distinct in PCA space.

Principal component analysis of mutation profiles did not reveal clear separation between gastric cancer and non-cancer patients. Similarly, hierarchical clustering showed only weak group structure, suggesting that gastric cancer risk is not driven by global mutation patterns but rather by specific genetic alterations.

PCA, used to detect global variance structure, shows weak or no structure. Clustering, used to find patient subgroups, shows only weak structure. LASSO, used to detect sparse predictive features, shows some strong signals.

Our data suggest that there are no clear patient groups, but that a few genes are important. This pattern is common in biomedical data: diseases are often driven by specific features, such as particular mutations, rather than by clusters.
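One way to quantify how weakly the two clusters track the outcome is to compare cluster purity with the majority-class baseline. The sketch below uses the cluster-by-`y` table from the Unsupervised section. It assumes `y = 1` marks the gastric cancer outcome. Purity is an illustrative summary that the original analysis does not report.

```python
# Rows are clusters, columns are y (0 or 1), copied from the clusters-by-y table.
cluster_table = {1: {0: 62, 1: 47}, 2: {0: 26, 1: 3}}

total = sum(sum(row.values()) for row in cluster_table.values())

# Share of y = 1 within each cluster.
case_rate = {c: row[1] / (row[0] + row[1]) for c, row in cluster_table.items()}

# Purity: label each cluster with its most common class, then count agreement.
purity = sum(max(row.values()) for row in cluster_table.values()) / total

# Baseline: label everyone with the overall most common class.
majority_baseline = max(
    sum(row[y] for row in cluster_table.values()) for y in (0, 1)
) / total

print({c: round(rate, 3) for c, rate in case_rate.items()})
print(round(purity, 3), round(majority_baseline, 3))
```

The share of `y = 1` differs between the clusters, but the majority class is `y = 0` in both, so purity equals the baseline. This fits the reading that the clustering structure is weak with respect to the outcome.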