## Final analytical sample N = 27
## Race/Ethnicity:
## 
## Non-Hispanic White Non-Hispanic Black           Hispanic 
##                  9                  9                  9
## 
## Sex (male):
## 
## Female   Male 
##     18      9
## 
## Age Group:
## 
## Under 50    50-64      65+ 
##        9        9        9

Abstract

Pancreatic cancer is one of the deadliest cancers in the United States, with few early screening options and poor survival rates. Understanding demographic inequalities in incidence is crucial to focused prevention and resource allocation.

Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) Program. This study used a cross-sectional, population-level design with N = 18 stratum-level observations defined by race/ethnicity, age group, and sex. The outcome was the age-adjusted pancreatic cancer incidence rate per 100,000 persons. Multivariable linear regression was used to estimate the association between race/ethnicity and incidence while adjusting for age group and sex.

In the adjusted model, Non-Hispanic Black populations had higher incidence rates than Non-Hispanic White populations (β = 5.57; 95% CI: -1.30, 12.44; p = 0.103), while Hispanic populations had lower rates (β = -3.90; 95% CI: -10.77, 2.97; p = 0.240); neither difference was statistically significant. Age was the strongest predictor (p < 0.001), with higher rates in older groups. Males had higher incidence rates than females (β = 6.46; 95% CI: 0.85, 12.06; p = 0.0275). Age and sex were the key predictors of pancreatic cancer incidence in this analysis; differences by race/ethnicity were observed but were not statistically significant.

1 Introduction

Pancreatic cancer remains one of the most aggressive and deadly cancers worldwide, with a five-year survival rate of less than 15%. Despite developments in treatment, early detection is still limited, leading to poor outcomes. Pancreatic cancer is a significant public health concern in the United States, owing to its high incidence and mortality rate. Epidemiological studies have found that pancreatic cancer incidence varies by age, sex, and race/ethnicity. Older people have much higher incidence rates, and males are frequently reported to have a higher risk than females. Furthermore, several studies have shown that Non-Hispanic Black individuals may have higher incidence rates than other racial groups, possibly due to variations in socioeconomic status, access to healthcare, environmental exposures, and underlying comorbidities.

2 Methods

2.1 Data Source and Study Design

Data for this study were obtained from the Surveillance, Epidemiology, and End Results (SEER) Program, maintained by the National Cancer Institute. SEER provides population-based cancer incidence data across multiple U.S. regions. This study utilized a cross-sectional design based on aggregated incidence rates.

2.2 Study Population

The analytical sample consisted of N = 18 observations, representing all combinations of:

  • Race/Ethnicity: Non-Hispanic White, Non-Hispanic Black, Hispanic
  • Age Group: Under 50, 50–64, 65+
  • Sex: Female, Male
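As a quick check on the sample size, the full cross of the three factors listed above can be enumerated directly. This is a minimal sketch; the variable names are illustrative and do not come from the original analysis code.

```python
from itertools import product

# Category labels as listed in Section 2.2.
RACE = ["Non-Hispanic White", "Non-Hispanic Black", "Hispanic"]
AGE = ["Under 50", "50-64", "65+"]
SEX = ["Female", "Male"]

# One tuple per stratum: every race x age x sex combination.
strata = list(product(RACE, AGE, SEX))

print(len(strata))  # 3 * 3 * 2 combinations
print(strata[0])
```

Each stratum contributes one row (one incidence rate) to the regression, which is why a fully crossed design gives 18 observations.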

2.3 Variables

Outcome: Age-adjusted pancreatic cancer incidence rate per 100,000 persons.

Primary Exposure: Race/ethnicity (reference: Non-Hispanic White).

Covariates:

  • Age group (reference: Under 50)
  • Sex (reference: Female)

Age and sex were included as confounders due to their strong epidemiological associations with pancreatic cancer incidence.

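Table 1 lists 27 stratum-level rows, of which 9 (3 per race/ethnicity group) had no recorded incidence rate. Complete case analysis was used, leaving N = 18 observations for the regression models. Given the aggregated nature of the dataset, missingness was not expected to substantially bias results.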

2.4 Statistical Analysis

Ordinary least squares (OLS) linear regression was used to estimate differences in incidence rates.

Two models were fitted:

  • Unadjusted model (race only)
  • Adjusted model (race + age + sex)
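In equation form, the two models can be sketched as follows. The symbols are notation introduced here for illustration: NHB, Hisp, and Male are 0/1 indicators, and A_1, A_2 stand for whichever two columns encode the three-level age group.

```latex
\begin{aligned}
\text{Model 1:}\quad \text{Rate}_i &= \beta_0 + \beta_1\,\text{NHB}_i + \beta_2\,\text{Hisp}_i + \varepsilon_i \\
\text{Model 2:}\quad \text{Rate}_i &= \beta_0 + \beta_1\,\text{NHB}_i + \beta_2\,\text{Hisp}_i
  + \beta_3\,A_{1i} + \beta_4\,A_{2i} + \beta_5\,\text{Male}_i + \varepsilon_i
\end{aligned}
```

Here i indexes the 18 strata, so Model 1 estimates 3 coefficients and Model 2 estimates 6.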

All analyses were conducted in R using standard statistical packages. Statistical significance was defined as α = 0.05.

3 Results

3.1 Descriptive Statistics

The analytical sample included 18 demographic strata representing combinations of race, age group, and sex. Incidence rates were lowest in individuals under 50 and highest in the 65+ age group.

3.2 Regression Results

Unadjusted Model: No statistically significant differences were observed between racial groups. Non-Hispanic Black populations had higher rates than Non-Hispanic White populations (β = 5.57; 95% CI: -39.84, 50.97; p = 0.797), while Hispanic populations had lower rates (β = -3.90; 95% CI: -49.31, 41.51; p = 0.857); both confidence intervals were wide and included zero.

Adjusted Model: After adjusting for age and sex:

  • Non-Hispanic Black: β = 5.57 (95% CI: -1.30, 12.44; p = 0.103)
  • Hispanic: β = -3.90 (95% CI: -10.77, 2.97; p = 0.240)

Neither association reached statistical significance. Age was the strongest predictor (p < 0.001 for both age terms), with substantially higher incidence in older groups. Males also had significantly higher rates than females (β = 6.46; 95% CI: 0.85, 12.06; p = 0.0275).
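The race coefficients are identical in both models, but the adjusted intervals are far narrower. One way to see why is to back out the standard errors from the reported intervals. The sketch below assumes the intervals are t-based with residual degrees of freedom equal to N minus the number of coefficients; the critical values are hard-coded from standard t tables, and the function name is hypothetical.

```python
# Two-sided 97.5% quantiles of Student's t, hard-coded from standard tables.
T_CRIT = {15: 2.131, 12: 2.179}


def se_from_ci(lower, upper, df):
    """Back out the standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * T_CRIT[df])


# Non-Hispanic Black coefficient, interval limits copied from Table 2.
se_unadj = se_from_ci(-39.84, 50.97, df=15)  # ASSUMPTION: 18 obs - 3 coefficients
se_adj = se_from_ci(-1.30, 12.44, df=12)     # ASSUMPTION: 18 obs - 6 coefficients

print(round(se_unadj, 2), round(se_adj, 2))
print(round(5.57 / se_adj, 2))  # implied t statistic in the adjusted model
```

Under these assumptions the standard error falls from roughly 21 to roughly 3 per 100,000 once age and sex absorb most of the residual variance, which is what narrows the interval around an unchanged point estimate.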

3.3 Model Diagnostics

Model diagnostics indicated that key assumptions were met:

  • Residuals were approximately normally distributed
  • No strong heteroscedasticity observed
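  • No observation exceeded the conventional influence threshold (Cook’s D < 1 for all strata), although several 65+ strata exceeded the 4/N rule of thumb (Figure 4)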
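For reference, the usual textbook definition of Cook's distance for observation i is given below. The notation is standard and is not taken from the analysis code: e_i is the residual, h_ii the leverage (diagonal of the hat matrix), p the number of estimated coefficients, and σ̂² the residual mean square.

```latex
D_i = \frac{e_i^{2}}{p\,\hat{\sigma}^{2}} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}}
```

A stratum can therefore have a large D_i because of a large residual, a high leverage, or both.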

3.4 Visualizations

Figures demonstrated:

  • Overlapping confidence intervals across racial groups
  • Strong effect sizes for age
  • Positive association for male sex

4 Discussion

This study examined demographic differences in pancreatic cancer incidence using SEER registry data. The primary finding is that age and sex are the strongest predictors of pancreatic cancer incidence, while racial differences were not statistically significant after adjustment.

The strong association between age and incidence is consistent with established literature, as pancreatic cancer risk increases substantially with aging. Similarly, the higher incidence observed among males aligns with prior epidemiological findings, potentially reflecting behavioral, hormonal, or environmental differences.

Although Non-Hispanic Black populations exhibited higher incidence rates and Hispanic populations exhibited lower rates relative to Non-Hispanic White populations, these differences were not statistically significant. This may be due to limited statistical power given the small sample size (N = 18) or the use of aggregated data, which may mask within-group variability.

Several limitations should be considered. First, the small sample size reduces statistical power and limits the ability to detect significant differences. Second, the use of aggregated stratum-level data prevents adjustment for individual-level confounders such as smoking, obesity, and access to healthcare. Third, potential residual confounding and measurement limitations inherent to registry data may influence results.

Despite these limitations, this study highlights the dominant role of age and sex in pancreatic cancer incidence and suggests that observed racial differences may require further investigation using larger, individual-level datasets.

From a public health perspective, these findings support the need for age-targeted screening strategies and continued investigation into structural determinants of cancer disparities.

Table 1. Descriptive Statistics for the Analytical Sample, Stratified by Race/Ethnicity

| Characteristic | Overall (N = 27) | Non-Hispanic White (N = 9) | Non-Hispanic Black (N = 9) | Hispanic (N = 9) |
| --- | --- | --- | --- | --- |
| Age Group, n (%) | | | | |
|     Under 50 | 9 (33%) | 3 (33%) | 3 (33%) | 3 (33%) |
|     50-64 | 9 (33%) | 3 (33%) | 3 (33%) | 3 (33%) |
|     65+ | 9 (33%) | 3 (33%) | 3 (33%) | 3 (33%) |
| Sex, n (%) | | | | |
|     Female | 18 (67%) | 6 (67%) | 6 (67%) | 6 (67%) |
|     Male | 9 (33%) | 3 (33%) | 3 (33%) | 3 (33%) |
| Incidence Rate (per 100,000), mean (SD) | 34.47 (34.89) | 33.92 (37.33) | 39.48 (40.41) | 30.02 (32.52) |
|     Unknown | 9 | 3 | 3 | 3 |

Data source: SEER*Explorer, 2018-2022, 21 registries. Continuous variables: mean (SD). Categorical variables: n (%). Incidence rates are age-adjusted per 100,000 persons. Column Ns count all 27 rows; the incidence rate was unknown for 9 of them (3 per race/ethnicity group), which leaves 18 observations with a recorded rate.

Table 2. Linear Regression of Pancreatic Cancer Incidence Rate on Race/Ethnicity, Unadjusted (Model 1) and Adjusted for Age Group and Sex (Model 2)

| Characteristic | Model 1 β (Unadjusted) | Model 1 95% CI | Model 1 p-value | Model 2 β (Adjusted) | Model 2 95% CI | Model 2 p-value |
| --- | --- | --- | --- | --- | --- | --- |
| (Intercept) | 33.92 | 1.81, 66.02 | 0.040 | 30.69 | 25.08, 36.30 | <0.001 |
| Race/Ethnicity | | | | | | |
|     Non-Hispanic White | Reference | | | Reference | | |
|     Non-Hispanic Black | 5.57 | -39.84, 50.97 | 0.797 | 5.57 | -1.30, 12.44 | 0.103 |
|     Hispanic | -3.90 | -49.31, 41.51 | 0.857 | -3.90 | -10.77, 2.97 | 0.240 |
| Age Group | | | | | | |
|     age_group.L | | | | 55.51 | 50.65, 60.37 | <0.001 |
|     age_group.Q | | | | 15.21 | 10.36, 20.07 | <0.001 |
| Sex | | | | | | |
|     Female | | | | Reference | | |
|     Male | | | | 6.46 | 0.85, 12.06 | 0.028 |

Outcome: age-adjusted pancreatic cancer incidence rate per 100,000 persons. Reference categories: Non-Hispanic White (race/ethnicity); Female (sex). The labels age_group.L and age_group.Q follow R's naming for the linear and quadratic polynomial contrasts of an ordered factor, so these two coefficients are trend terms rather than differences from the Under 50 group. β = regression coefficient (cases per 100,000). CI = confidence interval. N = 18 stratum-level observations.
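The age_group.L and age_group.Q coefficients in Table 2 are not directly readable as differences between age groups. The sketch below converts them, assuming age group was fitted as a three-level ordered factor with R's default orthonormal polynomial contrasts (an assumption based on the .L/.Q labels, not confirmed by the text). Variable names are illustrative.

```python
import math

# ASSUMPTION: orthonormal polynomial contrasts for a 3-level ordered factor,
# i.e. the columns that R's contr.poly(3) produces.
LINEAR = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
QUADRATIC = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]
LEVELS = ["Under 50", "50-64", "65+"]

# Model 2 estimates copied from Table 2.
BETA_L, BETA_Q = 55.51, 15.21

# Contribution of the age terms to the fitted rate at each level.
age_effect = {
    lvl: BETA_L * lin + BETA_Q * quad
    for lvl, lin, quad in zip(LEVELS, LINEAR, QUADRATIC)
}

# Differences relative to the youngest group.
vs_under_50 = {lvl: age_effect[lvl] - age_effect["Under 50"] for lvl in LEVELS}

for lvl in LEVELS:
    print(f"{lvl:>8}: {vs_under_50[lvl]:6.2f}")
```

Under this coding, the fitted rate is about 20.6 per 100,000 higher in the 50-64 group and about 78.5 per 100,000 higher in the 65+ group than in the Under 50 group, holding race/ethnicity and sex fixed.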

Table 3. Sensitivity Analysis: Adjusted Model with Race × Age Group Interaction (Model 3)

| Characteristic | Beta | 95% CI | p-value |
| --- | --- | --- | --- |
| Race/Ethnicity | | | |
|     Non-Hispanic White | Reference | | |
|     Non-Hispanic Black | 5.57 | -0.95, 12.09 | 0.084 |
|     Hispanic | -3.90 | -10.42, 2.62 | 0.205 |
| Age Group | | | |
|     age_group.L | 55.58 | 47.59, 63.56 | <0.001 |
|     age_group.Q | 16.37 | 8.39, 24.35 | 0.001 |
| Sex | | | |
|     Female | Reference | | |
|     Male | 6.46 | 1.13, 11.78 | 0.023 |
| Race/Ethnicity * Age Group | | | |
|     Non-Hispanic Black * age_group.L | 6.26 | -5.03, 17.55 | 0.237 |
|     Hispanic * age_group.L | -6.47 | -17.76, 4.82 | 0.223 |
|     Non-Hispanic Black * age_group.Q | -1.82 | -13.11, 9.47 | 0.720 |
|     Hispanic * age_group.Q | -1.65 | -12.94, 9.64 | 0.744 |

Interaction terms test whether the racial disparity in incidence varies by age group. Reference categories: Non-Hispanic White; Under 50; Female. N = 18; interpret with caution (limited df). ANOVA F-test p-value for interaction terms reported in text. CI = confidence interval.
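The "limited df" caution can be made concrete by counting coefficients. This is a minimal sketch: the counts are read off the rows of Tables 2 and 3, an intercept is assumed for Model 3 even though Table 3 does not list one, and the function name is hypothetical.

```python
N_OBS = 18

# Coefficients per term, counted from the table rows.
MODEL_2 = {"intercept": 1, "race": 2, "age": 2, "sex": 1}
# ASSUMPTION: Model 3 keeps the same terms and adds four interaction columns.
MODEL_3 = {**MODEL_2, "race_x_age": 4}


def residual_df(terms, n_obs=N_OBS):
    """Residual degrees of freedom: observations minus estimated coefficients."""
    return n_obs - sum(terms.values())


print(residual_df(MODEL_2), residual_df(MODEL_3))
```

With 10 coefficients estimated from 18 observations, Model 3 leaves only 8 residual degrees of freedom, so the interaction estimates carry wide intervals.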

Figure 1. Adjusted Predicted Pancreatic Cancer Incidence Rates by Race/Ethnicity

Figure 1. Adjusted predicted pancreatic cancer incidence rates (per 100,000 persons) by race/ethnicity from the multivariable linear regression model (Model 2), holding age group at 50–64 and sex at Female. Points represent predicted values; error bars represent 95% confidence intervals.
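The point estimates in a plot of this kind can be approximated from the Model 2 coefficients in Table 2. The sketch below assumes age group was coded with R's default polynomial contrasts (inferred from the age_group.L and age_group.Q labels); it reproduces point predictions only, not the confidence intervals, and the names are illustrative.

```python
import math

# Model 2 estimates copied from Table 2.
INTERCEPT = 30.69
BETA_AGE_Q = 15.21
RACE_BETA = {
    "Non-Hispanic White": 0.0,  # reference group
    "Non-Hispanic Black": 5.57,
    "Hispanic": -3.90,
}

# ASSUMPTION: polynomial contrasts for a 3-level ordered factor. At the middle
# level (50-64) the linear contrast is 0 and the quadratic contrast is -2/sqrt(6).
QUAD_AT_50_64 = -2 / math.sqrt(6)

# Female is the reference level for sex, so it adds nothing to the prediction.
predicted = {
    race: INTERCEPT + BETA_AGE_Q * QUAD_AT_50_64 + beta
    for race, beta in RACE_BETA.items()
}

for race, rate in predicted.items():
    print(f"{race}: {rate:.2f} per 100,000")
```

Because the model has no interaction terms, the gaps between the three predictions equal the race coefficients regardless of which age group and sex are held fixed.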


Figure 2. Coefficient Forest Plot — Adjusted Model (Model 2)

Figure 2. Coefficient (forest) plot showing estimated differences in age-adjusted pancreatic cancer incidence rate (per 100,000 persons) for each predictor in the adjusted model (Model 2) relative to reference categories. Points represent point estimates; horizontal lines represent 95% confidence intervals. Dashed vertical line indicates the null (β = 0). Reference categories: Non-Hispanic White (race/ethnicity), Under 50 (age group), Female (sex).


Figure 3. Standard Diagnostic Plots — Adjusted Model (Model 2)

Figure 3. Standard linear regression diagnostic plots for the adjusted model (Model 2): Residuals vs. Fitted (linearity and homoscedasticity), Normal Q-Q (normality of residuals), Scale-Location (homoscedasticity), and Residuals vs. Leverage (influential observations). No observation exceeds the Cook's distance threshold of 1.


Figure 4. Cook’s Distance — Influential Observation Check

Figure 4. Cook's distance for each observation (stratum) in the adjusted model (Model 2). Red dashed line: Cook's D = 1 (conventional influential threshold). Orange dashed line: 4/N = 0.22 rule-of-thumb threshold. No observation exceeds Cook's D = 1; several 65+ strata exceed the 4/N threshold, reflecting their high-leverage incidence values.
