The data I used is the esoph data set that ships with R. It records the number of cases of esophageal cancer among individuals, along with details of their smoking and drinking habits. From this data I wanted to understand the relationships between alcohol consumption, smoking and esophageal cancer.
In the first graph below we see the aggregated number of cancer cases, grouped by the amount of tobacco smoked. It shows that more of the cases occur among individuals who smoke smaller quantities of tobacco.
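A minimal sketch of how such an aggregated view could be produced in base R is given here. It assumes the standard esoph columns `tobgp` (tobacco group) and `ncases` (number of cases); the object name `cases_by_tobacco` and the plot labels are illustrative choices, not necessarily those used for the original figure.

```r
# esoph ships with base R (package "datasets"), so no download is needed
data(esoph)

# Sum the case counts within each tobacco-consumption group
cases_by_tobacco <- aggregate(ncases ~ tobgp, data = esoph, FUN = sum)
print(cases_by_tobacco)

# Bar chart of the aggregated cases per tobacco group
barplot(
  cases_by_tobacco$ncases,
  names.arg = as.character(cases_by_tobacco$tobgp),
  xlab = "Tobacco consumption (g/day)",
  ylab = "Number of cases",
  main = "Esophageal cancer cases by tobacco consumption"
)
```

Note that this plots raw case counts only; it does not account for how many individuals fall into each tobacco group.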
Taken at face value, this would lead to the conclusion that individuals who smoke smaller amounts of tobacco are more likely to get esophageal cancer. However, the data set contains a confounding variable: the age of the individuals.
Including the age category significantly changes the conclusions drawn from Figure 1. A much larger number of cases is found in the age brackets above 45 years. This suggests that the cancer takes decades to develop, with only a small number of cases occurring in the 25-34 age bracket.
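The breakdown by age described above can be sketched in base R as follows. This is an illustrative example rather than the code behind the original figure: it assumes the standard esoph columns `agegp`, `tobgp` and `ncases`, and the object name `cases_by_age_tobacco` is a hypothetical choice.

```r
# esoph ships with base R (package "datasets")
data(esoph)

# Cross-tabulate case counts: rows are tobacco groups, columns are age groups
cases_by_age_tobacco <- xtabs(ncases ~ tobgp + agegp, data = esoph)
print(cases_by_age_tobacco)

# Grouped bar chart: one cluster per age group, one bar per tobacco group
barplot(
  cases_by_age_tobacco,
  beside = TRUE,
  legend.text = rownames(cases_by_age_tobacco),
  args.legend = list(title = "Tobacco (g/day)", x = "topleft"),
  xlab = "Age group (years)",
  ylab = "Number of cases",
  main = "Esophageal cancer cases by age and tobacco consumption"
)
```

Splitting the counts this way makes it possible to compare tobacco groups within the same age bracket instead of across the whole sample.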
Contrary to the original conclusion that “individuals smoking smaller amounts of tobacco are more likely to get esophageal cancer”, the amount smoked does not turn out to be the deciding factor. The better predictor of esophageal cancer is the age category that the smoker falls into.