2025-03-30

Purpose

This presentation explores the relationship between esophageal cancer cases and three factors (age, alcohol consumption, and tobacco consumption) using the esoph dataset built into R. We visualize the data, summarize it, and fit a Poisson regression model.

The goal is to see how case counts vary across age, alcohol, and tobacco groups.

What is the esoph dataset?

The esoph dataset contains data from a case-control study of esophageal cancer in relation to age, alcohol consumption, and tobacco consumption. Each row is one combination of age, alcohol, and tobacco group. Variables include:

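  • agegp: Age group (ordered factor, six levels)
  • alcgp: Alcohol consumption group (ordered factor, four levels, g/day)
  • tobgp: Tobacco consumption group (ordered factor, four levels, g/day)
  • ncases: Number of cancer cases
  • ncontrols: Number of controls (participants without cancer)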

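This dataset is part of R’s built-in datasets and offers insights into how these factors relate to esophageal cancer. Because it is case-control data, the counts depend on how many people were sampled in each group, so they show associations rather than incidence.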
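As a quick orientation, the dataset's structure can be inspected directly. This is a minimal sketch using only base R; it is not taken from the original slides.

```r
# Load the built-in dataset and look at its structure.
data(esoph)
str(esoph)

# Check that the three grouping variables are ordered factors.
sapply(esoph[c("agegp", "alcgp", "tobgp")], is.ordered)

# List the levels of each grouping variable.
lapply(esoph[c("agegp", "alcgp", "tobgp")], levels)
```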

Age Group vs Cancer Cases

Alcohol Consumption vs Cancer Cases

Tobacco Consumption vs Cancer Cases
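The three one-variable views (age, alcohol, tobacco) could be drawn with a sketch like the one below. It uses base R bar charts of total cases per group, which is an assumption about the plot type; `plot_cases_by()` is a hypothetical helper introduced here for illustration.

```r
# Sum the case counts within each level of one grouping variable and
# draw them as a bar chart. plot_cases_by() is a hypothetical helper,
# not part of the original slides.
plot_cases_by <- function(group, label) {
  totals <- tapply(esoph$ncases, esoph[[group]], sum)
  barplot(totals, xlab = label, ylab = "Number of cases",
          main = paste("Cancer cases by", tolower(label)))
  invisible(totals)
}

age_totals <- plot_cases_by("agegp", "Age group (years)")
alc_totals <- plot_cases_by("alcgp", "Alcohol consumption (g/day)")
tob_totals <- plot_cases_by("tobgp", "Tobacco consumption (g/day)")
```

Each call returns the totals invisibly, so the numbers behind a chart can be printed when needed, for example with `print(age_totals)`.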

Cancer Cases by Alcohol Consumption and Age Group

This view breaks cancer cases down by both alcohol consumption and age group, so we can see how case counts vary across combinations of the two.
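A two-way view of this kind can be sketched with a cross-tabulation and grouped bars. The base R plot type and layout below are assumptions, not the original figure.

```r
# Cross-tabulate total cases by age group (rows) and alcohol group (columns).
cases_age_alc <- xtabs(ncases ~ agegp + alcgp, data = esoph)

# Grouped bars: one cluster per alcohol group, one bar per age group.
barplot(cases_age_alc, beside = TRUE,
        legend.text = rownames(cases_age_alc),
        args.legend = list(title = "Age group", x = "topright"),
        xlab = "Alcohol consumption (g/day)", ylab = "Number of cases",
        main = "Cancer cases by alcohol consumption and age group")
```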

Descriptive Statistics

In this slide, we calculate the five-number summary, mean, and standard deviation of ncases, the number of cancer cases in each age/alcohol/tobacco combination.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   2.273   4.000  17.000
## [1] 2.753169
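The output above is consistent with the following base R calls. The last line is an addition for context: it compares the mean with the variance, since a Poisson variable would have the two equal.

```r
# Five-number summary plus mean, then the sample standard deviation.
summary(esoph$ncases)
sd(esoph$ncases)

# A Poisson variable has variance equal to its mean, so a variance far
# above the mean signals extra spread across the rows of the dataset.
c(mean = mean(esoph$ncases), variance = var(esoph$ncases))
```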

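The standard deviation (2.75) is larger than the mean (2.27), and the median (1) sits well below the mean. The distribution is therefore right-skewed: most age/alcohol/tobacco combinations have few or no cases, while a few have many. These are raw counts per combination, so this spread alone does not show how strongly age, alcohol consumption, or tobacco use relate to cancer cases.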

Poisson Regression

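We fit a Poisson regression model of the number of cancer cases on alcohol consumption and age group. Because alcgp and agegp are ordered factors, R encodes them with orthogonal polynomial contrasts, which is why the terms are labeled .L (linear), .Q (quadratic), .C (cubic), ^4, and ^5. RateRatio is exp(Estimate).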
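A fit of this kind might look like the sketch below. The formula `ncases ~ alcgp + agegp` with no offset is an assumption inferred from the term order in the coefficient table, and the column names simply mirror that table.

```r
# Assumed model: case counts on alcohol and age group, log link, no offset.
fit <- glm(ncases ~ alcgp + agegp, family = poisson, data = esoph)

# Coefficient table with exponentiated estimates (rate ratios).
coefs <- summary(fit)$coefficients
results <- data.frame(
  Term      = rownames(coefs),
  Estimate  = coefs[, "Estimate"],
  StdError  = coefs[, "Std. Error"],
  zValue    = coefs[, "z value"],
  Pr_z      = coefs[, "Pr(>|z|)"],
  RateRatio = exp(coefs[, "Estimate"])
)
results
```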

## Term           Estimate  StdError     zValue         Pr_z  RateRatio
## (Intercept)  0.07715412 0.1876141  0.4112384 6.808978e-01  1.0802085
## alcgp.L      0.22439744 0.1649993  1.3599903 1.738330e-01  1.2515683
## alcgp.Q     -0.55904293 0.1498362 -3.7310277 1.907002e-04  0.5717560
## alcgp.C      0.38466834 0.1332445  2.8869362 3.890131e-03  1.4691270
## agegp.L      2.40937945 0.6338580  3.8011344 1.440351e-04 11.1270541
## agegp.Q     -2.64596597 0.5735464 -4.6133422 3.962452e-06  0.0709368
## agegp.C     -0.06552884 0.4336948 -0.1510943 8.799013e-01  0.9365720
## agegp^4      0.03398998 0.2917143  0.1165180 9.072420e-01  1.0345742
## agegp^5     -0.08963839 0.1760185 -0.5092554 6.105732e-01  0.9142617

Poisson Results

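Age shows a significant relationship with cancer cases. The linear term is positive (agegp.L, p ≈ 0.0001) and the quadratic term is negative (agegp.Q, p < 0.0001), so the expected number of cases rises with age and then levels off or falls in the oldest groups.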

Alcohol consumption also has a significant effect, and the pattern is non-linear: the quadratic (alcgp.Q, p ≈ 0.0002) and cubic (alcgp.C, p ≈ 0.004) terms are both significant.

The cubic and higher-order terms for age (agegp.C, agegp^4, agegp^5) are not statistically significant.

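The linear alcohol term (alcgp.L) is not significant at the 0.05 level (p ≈ 0.17). Note that the rate ratios for polynomial contrast terms describe trend components, not comparisons between adjacent groups.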
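Polynomial contrast terms are hard to read directly, so one way to see the age pattern is to predict the expected number of cases for each age group. The sketch below assumes the model formula `ncases ~ alcgp + agegp` and holds alcohol at its lowest level, which is an arbitrary choice for illustration.

```r
# Refit the assumed model (case counts on alcohol and age group, no offset).
fit <- glm(ncases ~ alcgp + agegp, family = poisson, data = esoph)

# One row per age group, with alcohol fixed at its lowest level.
grid <- data.frame(
  agegp = factor(levels(esoph$agegp), levels = levels(esoph$agegp),
                 ordered = TRUE),
  alcgp = factor(levels(esoph$alcgp)[1], levels = levels(esoph$alcgp),
                 ordered = TRUE)
)

# Expected case counts on the response scale.
grid$expected_cases <- predict(fit, newdata = grid, type = "response")
grid
```

Reading down the `expected_cases` column shows the shape implied by the linear and quadratic age terms together.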

Conclusion

The analysis shows:

  • Age: Cancer cases increase with age, then level off or decline in the oldest groups.

  • Alcohol: A significant non-linear relationship with cancer cases.

  • Poisson Regression: Both age and alcohol consumption are significantly associated with the number of cancer cases. Tobacco consumption was not included in the model, so this analysis does not test its effect.
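One way to examine tobacco alongside the other two factors is to add `tobgp` to the model and compare the two fits with a likelihood-ratio test. This is a sketch under the assumed base formula `ncases ~ alcgp + agegp`; it is not part of the original analysis, and no result is claimed here.

```r
# Model from the slides (assumed formula) and an extension that adds tobacco.
fit_base <- glm(ncases ~ alcgp + agegp, family = poisson, data = esoph)
fit_tob  <- update(fit_base, . ~ . + tobgp)

# Likelihood-ratio (chi-squared) test of the added tobacco terms.
lrt <- anova(fit_base, fit_tob, test = "Chisq")
lrt
```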

These findings highlight the importance of age and alcohol consumption in explaining esophageal cancer case counts in this dataset.