The aim of this report was to investigate trends in lung cancer mortality in Australian males and females over time and to determine the relationship between age and lung cancer mortality in 2017.
Historically, males have had a substantially higher lung cancer mortality rate than females. From 1982 to 2017, the lung cancer mortality rate in males decreased in an approximately linear fashion, whereas over the same period the rate in females increased. The gap in lung cancer mortality between the sexes therefore appears to be narrowing, a trend attributed to the historical smoking behaviours of the two sexes. In 2017, males had higher lung cancer mortality than females in every age group from 40-49 years upward; below that age, rates were close to zero in both sexes. Consistent with the literature, older age was positively correlated with lung cancer mortality in both sexes.
The dataset was sourced from the Australian Government, Australian Institute of Health and Welfare.
This dataset is considered valid because it was obtained from a reliable source, the Australian Government, and was collected through a standardised reporting process. The dataset represents the Australian population, as all states and territories have legislation mandating the notification of cancer diagnoses and incidence rates (AIHW, 2019).
There are several issues that need to be taken into account when analysing this dataset. For example, missing data points in the morbidity and mortality statistics indicate that data were not collected for certain cancers in certain years. Furthermore, the dataset includes projected data (2018-2021), which may not be representative of the population and were therefore excluded from the analysis. Lastly, the dataset may omit some cancer patients, such as those who were treated overseas.
There are several Australian and global stakeholders that could benefit from analysis of this dataset. In particular, its availability to the Australian public can increase cancer awareness and promote prevention strategies. It could also be used by Australian healthcare organisations, such as NSW Health, to guide cancer prevention initiatives and allocate resources to cancer treatment. Regulatory bodies that implement policies can use the data to inform policy-making and produce future projections. The data would also be useful to the media for promoting prevention measures, such as anti-smoking campaigns. At state and federal government level, the data could guide the allocation of healthcare funding in the budget. Lastly, the data could be used by international organisations such as the WHO to monitor global trends in lung cancer.
There are 9 variables in the dataset: data type, cancer group, sex, year, age group, incidence count, mortality count, incidence rate and mortality rate, which correspond to the dataset columns (Table 1). Each row contains data entries corresponding to these variables.
| Variable | Type | Description |
|---|---|---|
| ï..Type | Character | Determines whether data is real or projected |
| Cancer.group.site | Character | Type of cancer (e.g. Lung cancer) |
| Year | Integer | Year data was recorded (1982-2017) |
| Sex | Character | Sex (Males, Females or Persons, i.e. both sexes combined) |
| Age.group..years. | Character | Age bracket of patient (increments of 10 years, or all ages combined) |
| Incidence.Count | Character | Number of individuals diagnosed with cancer |
| Mortality.Count | Character | Number of individuals who have died from cancer |
| Age.specific.incidence.rate..per.100.000. | Character | Number of new cases per 100,000 people at risk of cancer |
| Age.specific.mortality.rate..per.100.000. | Character (altered to numeric) | Number of deaths due to cancer per 100,000 population |
Table 1. Variables in the Australian lung cancer incidence and mortality rates dataset
## read in data
data = read.csv("data/cancer_incidence_mortality.csv", header = TRUE)
## show classification of variables
names(data)
## [1] "ï..Type"
## [2] "Cancer.group.site"
## [3] "Year"
## [4] "Sex"
## [5] "Age.group..years."
## [6] "Incidence.Count"
## [7] "Age.specific.incidence.rate..per.100.000."
## [8] "Mortality.Count"
## [9] "Age.specific.mortality.rate..per.100.000."
str(data)
## 'data.frame': 91146 obs. of 9 variables:
## $ ï..Type : chr "Actual" "Actual" "Actual" "Actual" ...
## $ Cancer.group.site : chr "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" ...
## $ Year : int 1982 1982 1982 1982 1982 1982 1982 1982 1982 1982 ...
## $ Sex : chr "Males" "Males" "Males" "Males" ...
## $ Age.group..years. : chr "00-09" "10-19" "20-29" "30-39" ...
## $ Incidence.Count : chr "67" "28" "7" "8" ...
## $ Age.specific.incidence.rate..per.100.000.: chr "5.5" "2.1" "0.5" "0.7" ...
## $ Mortality.Count : chr "19" "14" "16" "7" ...
## $ Age.specific.mortality.rate..per.100.000.: chr "1.6" "1" "1.2" "0.6" ...
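The `ï..` prefix on the first column name is the usual symptom of a UTF-8 byte-order mark (BOM) at the start of a CSV file being read as ordinary characters. The following is a minimal sketch, assuming that is the cause here; it builds a small temporary file (the file and the name `demo_bom` are illustrative only, not part of the analysis) and shows that passing `fileEncoding = "UTF-8-BOM"` to `read.csv` yields a clean `Type` column name.

```r
# Minimal sketch: write a tiny CSV that starts with a UTF-8 byte-order mark,
# then read it with a BOM-aware encoding. The file is a stand-in, not the AIHW data.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Type,Year\nActual,1982\n")), tmp)

# "UTF-8-BOM" tells R to discard the byte-order mark before parsing the header.
demo_bom <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
names(demo_bom)
```

If the same option were used when reading the real file, later references to `ï..Type` would need to be changed to `Type`.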
How has lung cancer mortality in all males and all females changed from 1982 to 2017 in Australia?
data_actual <- subset(data, data$ï..Type != "Projections")
#Excluding projected data from analysis
Year_Mortality <- subset(data_actual, select = c(2,3,4,5,9))
#Limiting dataset to columns for Cancer type, Year, Sex, Age group and Mortality
Lung_Mortality <- subset(Year_Mortality, Year_Mortality$Cancer.group.site == "Lung cancer")
#Limiting dataset to rows for “Lung cancer” only
Lung_Mortality$Age.specific.mortality.rate..per.100.000. <- as.numeric(Lung_Mortality$Age.specific.mortality.rate..per.100.000.)
#Changing mortality rate to be classified as a “numeric”
str(Lung_Mortality)
## 'data.frame': 1188 obs. of 5 variables:
## $ Cancer.group.site : chr "Lung cancer" "Lung cancer" "Lung cancer" "Lung cancer" ...
## $ Year : int 1982 1982 1982 1982 1982 1982 1982 1982 1982 1982 ...
## $ Sex : chr "Males" "Males" "Males" "Males" ...
## $ Age.group..years. : chr "00-09" "10-19" "20-29" "30-39" ...
## $ Age.specific.mortality.rate..per.100.000.: num 0 0 0.1 1.8 16.9 ...
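Because the rate column is stored as character, `as.numeric` silently turns any non-numeric entry into `NA` (with a warning). The sketch below shows one way to count and inspect such entries before relying on the converted column. The vector `raw_rate` and the placeholder `"n.a."` are hypothetical; the actual placeholder used in the AIHW file, if any, is not shown in this report.

```r
# Hypothetical raw values; "n.a." stands in for whatever placeholder the file uses.
raw_rate <- c("1.6", "1", "n.a.", "0.6")

# Convert to numeric; non-numeric strings become NA.
num_rate <- suppressWarnings(as.numeric(raw_rate))

# Count values that were present as text but lost in the conversion.
n_lost <- sum(is.na(num_rate) & !is.na(raw_rate))
n_lost

# Inspect the offending strings so they can be handled deliberately.
raw_rate[is.na(num_rate)]
```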
Lung_mortality_total <- subset(Lung_Mortality, Lung_Mortality$Age.group..years. == "All ages combined")
#Limiting dataset to rows for Mortality combined across all age groups
Males_mortality <-subset(Lung_mortality_total, Lung_mortality_total$Sex == "Males")
Females_mortality <-subset(Lung_mortality_total, Lung_mortality_total$Sex == "Females")
#Creating two subsets for Males and Females rates
Males_mortality_1982 <-subset(Males_mortality, Males_mortality$Year == "1982")
Males_mortality_2017 <-subset(Males_mortality, Males_mortality$Year == "2017")
#Creating two subsets for Male mortality for 1982 and 2017
Males_mortality_1982_2017 <- matrix(c(Males_mortality_1982$Age.specific.mortality.rate..per.100.000., Males_mortality_2017$Age.specific.mortality.rate..per.100.000.), ncol=2, byrow=TRUE)
colnames(Males_mortality_1982_2017) <- c("1982","2017")
rownames(Males_mortality_1982_2017) <- c("Male Mortality")
Males_mortality_1982_2017
## 1982 2017
## Male Mortality 55.8 41.1
#Values for male mortality for 1982 and 2017
Year = Males_mortality$Year
Male_mortality_rate = Males_mortality$Age.specific.mortality.rate..per.100.000.
Maximum_Minimum_Males <- matrix(c(max(Male_mortality_rate), min(Male_mortality_rate)), ncol=2, byrow=TRUE)
colnames(Maximum_Minimum_Males) <- c("Max","Min")
rownames(Maximum_Minimum_Males) <- c("Male Mortality")
Maximum_Minimum_Males
## Max Min
## Male Mortality 56.3 41.1
#Maximum and minimum values for male mortality rate
plot(Year,Male_mortality_rate, xlab = "Year", ylab = "Male mortality rate per 100,000", main = "Total male mortality rate due to Lung cancer 1982-2017", xlim=c(1982,2017))
abline(lm(Male_mortality_rate~Year), col="red")
#Graphing total male mortality rate over years with a regression line
model = lm(Male_mortality_rate~Year)
model
##
## Call:
## lm(formula = Male_mortality_rate ~ Year)
##
## Coefficients:
## (Intercept) Year
## 942.7499 -0.4468
legend("topright", legend = c("942.7499-0.4468x"))
#Determining the slope and intercept of the regression line for male mortality
res <- resid(model)
plot(Year, res, xlab = "Year", ylab = "Residuals", main = "Residuals for Total male mortality rate due to Lung cancer 1982-2017")
abline(h=0, col="red")
#Plotting residuals
To determine the relationship between year and lung cancer mortality in males, a scatter plot was created. The scatter plot reveals a decrease in the male lung cancer mortality rate from 55.8 per 100,000 in 1982 to 41.1 per 100,000 in 2017. The trend is approximately linear and is well described by the regression line y = 942.7499 - 0.4468x, where x is the year and y is the mortality rate per 100,000. In the residual plot, the points are scattered randomly around zero with roughly constant spread (homoscedasticity), which suggests that a linear model is appropriate and justifies fitting a linear regression line.
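As a worked example of the regression equation, the sketch below evaluates the fitted line at the first and last years of the series. It uses the coefficients exactly as printed in the `lm()` output, which are rounded to four decimal places, so the results are approximate. The helper name `predict_male_rate` is introduced here for illustration only.

```r
# Coefficients as printed in the lm() output (rounded to 4 decimal places).
male_intercept <- 942.7499
male_slope <- -0.4468

# Fitted male mortality rate per 100,000 for a given year.
predict_male_rate <- function(year) male_intercept + male_slope * year

fitted_1982 <- predict_male_rate(1982)
fitted_2017 <- predict_male_rate(2017)
round(c(fitted_1982, fitted_2017), 1)
```

The fitted values of roughly 57.2 and 41.6 per 100,000 sit close to the observed rates of 55.8 and 41.1, which is consistent with the linear description of the male trend.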
Year = Females_mortality$Year
Females_mortality_rate = Females_mortality$Age.specific.mortality.rate..per.100.000.
Females_mortality_1982 <-subset(Females_mortality, Females_mortality$Year == "1982")
Females_mortality_2017 <-subset(Females_mortality, Females_mortality$Year == "2017")
#Creating two subsets for Female mortality for 1982 and 2017
Females_mortality_1982_2017 <- matrix(c(Females_mortality_1982$Age.specific.mortality.rate..per.100.000., Females_mortality_2017$Age.specific.mortality.rate..per.100.000.), ncol=2, byrow=TRUE)
colnames(Females_mortality_1982_2017) <- c("1982","2017")
rownames(Females_mortality_1982_2017) <- c("Female Mortality")
Females_mortality_1982_2017
## 1982 2017
## Female Mortality 14 27.6
#Values for female mortality for 1982 and 2017
Maximum_Minimum_Females <- matrix(c(max(Females_mortality_rate), min(Females_mortality_rate)), ncol=2, byrow=TRUE)
colnames(Maximum_Minimum_Females) <- c("Max","Min")
rownames(Maximum_Minimum_Females) <- c("Female Mortality")
Maximum_Minimum_Females
## Max Min
## Female Mortality 29.1 14
plot(Year,Females_mortality_rate, xlab = "Year", ylab = "Female mortality rate per 100,000", main = "Total female mortality rate due to Lung cancer 1982-2017", xlim=c(1982,2017))
abline(lm(Females_mortality_rate~Year), col="red")
#Graphing total female mortality rate over years with a regression line
model = lm(Females_mortality_rate~Year)
model
##
## Call:
## lm(formula = Females_mortality_rate ~ Year)
##
## Coefficients:
## (Intercept) Year
## -835.9073 0.4296
legend("bottomright", legend = c("-835.9073+0.4296x"))
#Determining the slope and intercept of the regression line for female mortality
res <- resid(model)
plot(Year, res, xlab = "Year", ylab = "Residuals", main = "Residuals for total female mortality rate due to Lung cancer 1982-2017")
abline(h=0, col="red")
#Plotting residuals
In contrast to the data for males, the scatter plot for females reveals an increase in the lung cancer mortality rate from 14 per 100,000 in 1982 to 27.6 per 100,000 in 2017. Whilst the trend is initially linear, with the fitted regression line y = -835.9073 + 0.4296x, the mortality rate appears to plateau towards 2017. In line with this, the residual plot shows a non-random pattern, suggesting that a linear model is not fully adequate and that an alternative predictive function may need to be fitted. The slopes of the two regression lines are similar in magnitude (0.4468 for males and 0.4296 for females, per 100,000 per year) but opposite in sign, suggesting that the two rates are changing at a similar pace in opposite directions.
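One way to check whether a curved function describes a plateauing series better than a straight line is to fit both and compare them. The sketch below does this on a synthetic series that rises and then levels off; the data frame `toy` is invented purely for illustration and is not the AIHW female series, and the quadratic term is only one of several possible alternatives.

```r
# Synthetic series that rises and then levels off; NOT the AIHW data.
toy <- data.frame(Year = 1982:2017)
toy$Rate <- 10 + 20 * (1 - exp(-(toy$Year - 1982) / 12))

# Centre the year so the squared term is numerically well behaved.
toy$Year_c <- toy$Year - mean(toy$Year)

# Straight-line fit and a fit with an added quadratic term.
fit_linear <- lm(Rate ~ Year_c, data = toy)
fit_quadratic <- lm(Rate ~ Year_c + I(Year_c^2), data = toy)

# Lower AIC indicates the better trade-off between fit and complexity.
aic_values <- c(linear = AIC(fit_linear), quadratic = AIC(fit_quadratic))
aic_values

# F-test for whether the quadratic term improves on the linear model.
anova(fit_linear, fit_quadratic)
```

Applied to the real female series, the same comparison would replace `toy` with the `Year` and mortality rate columns of `Females_mortality`.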
Sex is a major factor in cancer mortality; hence, in this study, male and female mortality rates were examined separately to avoid its confounding effect. From 1982 to 2017, the lung cancer mortality rate in males decreased in an approximately linear fashion, whereas over the same period the rate in females increased. Consequently, it can be concluded that the gap in lung cancer mortality between the sexes is narrowing.
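The narrowing of the gap can be quantified directly from the figures already reported. The sketch below uses the observed all-ages rates for 1982 and 2017 and the two printed regression slopes; the object names (`rates`, `gap`, `slope_of_gap`) are introduced here for illustration.

```r
# Observed all-ages rates per 100,000, taken from the 1982/2017 tables in this report.
rates <- matrix(c(55.8, 41.1,
                  14.0, 27.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Males", "Females"), c("1982", "2017")))

# Male minus female rate in each year.
gap <- rates["Males", ] - rates["Females", ]
gap

# Average observed change in the gap per year between the two end points.
observed_change_per_year <- (gap[["2017"]] - gap[["1982"]]) / (2017 - 1982)

# Change in the gap implied by the two fitted slopes (male slope minus female slope).
slope_of_gap <- -0.4468 - 0.4296
c(observed = observed_change_per_year, fitted = slope_of_gap)
```

The gap falls from 41.8 to 13.5 per 100,000, an average narrowing of about 0.8 per 100,000 per year, in line with the roughly 0.88 implied by the two regression slopes.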
The differing directions of the lung cancer mortality rates in the two sexes have been attributed to historical differences in their smoking behaviours. Smoking has been determined to be a major contributing factor to lung cancer. In males, smoking rates have been steadily decreasing since the 1960s, following discoveries of the health implications of smoking. Conversely, following World War II, more than a quarter of the Australian smoking population were female, with female smoking rates not decreasing until the 1970s. Consequently, this lag has led to an increase in female deaths and a decrease in male deaths from lung cancer (Cancer Council, Victoria, 2019).
How did age correlate with lung cancer mortality in Australia in 2017?
Lung_mortality_agegroups <- subset(Lung_Mortality, Lung_Mortality$Sex != "Persons" & Lung_Mortality$Age.group..years. != "All ages combined" & Lung_Mortality$Year == "2017")
#Limiting mortality rates to “Males” and “Females”, excluding the “All ages combined” category and limiting year selection to 2017.
Males_mortality_2017 <- subset(Lung_mortality_agegroups, Lung_mortality_agegroups$Sex == "Males")
Females_mortality_2017 <- subset(Lung_mortality_agegroups, Lung_mortality_agegroups$Sex == "Females")
Age_2017 <- Males_mortality_2017$Age.group..years.
Males_2017 <- Males_mortality_2017$Age.specific.mortality.rate..per.100.000.
Females_2017 <- Females_mortality_2017$Age.specific.mortality.rate..per.100.000.
Combined_Agegroups <- matrix(c(Males_2017, Females_2017), ncol=10, byrow=TRUE)
colnames(Combined_Agegroups) <- c(Age_2017)
rownames(Combined_Agegroups) <- c("Males","Females")
Combined_Agegroups <- as.table(Combined_Agegroups)
#Creating table for male and female mortality rates across different agegroups
Combined_Agegroups
## 00-09 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90+
## Males 0.0 0.0 0.0 0.7 5.6 29.9 102.5 219.2 356.7 446.3
## Females 0.0 0.1 0.0 0.7 4.9 23.2 63.7 134.1 180.4 172.1
barplot(Combined_Agegroups, main = "Male and Female lung cancer mortality rates in 2017", xlab = "", ylab = "", col=c("cadetblue1", "pink"), legend = rownames(Combined_Agegroups), beside = TRUE, args.legend = list (x="top"), ylim=c(0,500), las=2)
title(xlab = "Age groups (years)", mgp=c(4,1,0))
title(ylab = "Mortality rate per 100,000", mgp=c(3,1,0))
To determine the relationship between age and lung cancer mortality, a bar plot was created to compare the mortality rate across 10 age groups. As in the analysis above, sex was recognised as a confounding variable, so the male and female cohorts were separated.
It is evident that in 2017 age and mortality were positively correlated. The male mortality rate is close to 0 for ages 0-39, which corresponds to the lower reported incidence of lung cancer in these age groups. There is a slight increase at ages 40-49 (5.6 per 100,000) and 50-59 (29.9 per 100,000), followed by a progressive rise to 102.5 per 100,000 at ages 60-69, 219.2 at 70-79, 356.7 at 80-89 and 446.3 at 90+. The female population shows a similar pattern at a lower level: 63.7 per 100,000 at ages 60-69, 134.1 at 70-79, 180.4 at 80-89 and 172.1 at 90+, so the female rate levels off after 80-89 rather than continuing to rise.
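The positive association described here can be summarised with a rank correlation. The sketch below is one possible check, not part of the original analysis: it copies the 2017 age-specific rates from the table in this report, treats the ordered age groups as ranks 1 to 10, and computes Spearman's correlation for each sex. The lowercase object names are introduced here for illustration.

```r
# Age-specific rates per 100,000 for 2017, copied from the table in this report.
age_groups <- c("00-09", "10-19", "20-29", "30-39", "40-49",
                "50-59", "60-69", "70-79", "80-89", "90+")
males_2017 <- c(0.0, 0.0, 0.0, 0.7, 5.6, 29.9, 102.5, 219.2, 356.7, 446.3)
females_2017 <- c(0.0, 0.1, 0.0, 0.7, 4.9, 23.2, 63.7, 134.1, 180.4, 172.1)

# Treat the ordered age groups as ranks 1..10.
age_rank <- seq_along(age_groups)

# Spearman's rank correlation between age group and mortality rate.
rho_males <- cor(age_rank, males_2017, method = "spearman")
rho_females <- cor(age_rank, females_2017, method = "spearman")
round(c(Males = rho_males, Females = rho_females), 2)

# Age groups in which the male rate exceeds the female rate.
age_groups[males_2017 > females_2017]
```

Both coefficients are close to 1, and the male rate exceeds the female rate in every age group from 40-49 upward. With only ten age groups and tied values among the youngest groups, this is a descriptive summary rather than a formal test.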
The relationship between age and mortality can be explained by several factors. Age has been recognised as a prognostic factor in lung cancer, as malignancy may develop gradually due to genetic and environmental factors and is hence more likely to manifest at an older age. Furthermore, research shows that aggressive lung cancer treatments such as chemotherapy and radiotherapy are less successful in the older population, leading to higher morbidity. This is compounded by difficult treatment decisions arising from the under-representation of those aged 65+ in clinical trials (Venuta et al., 2016). The side effects of these medical interventions are also more prominent in the elderly (Tas et al., 2013). Older individuals experience structural, physiological and immunological changes to the respiratory system, so fatality due to side effects is substantially greater. Additionally, susceptibility to other conditions such as respiratory tract infections increases with age; these may aggravate lung cancer because a weakened immune system can no longer defend the body, causing organ failure and hence death in many cases.
The overall higher mortality rate in males can be attributed in part to a higher proportion of smokers: among those aged 45-54 in 2017, 19.3% of males smoked compared with 14.7% of females. The effects of tobacco often present later in life and are strongly correlated with lung cancer incidence and mortality (Australian Bureau of Statistics, 2021).
Although a positive correlation between age and mortality can be interpreted from the barplot, the conclusion is only accurate for 2017. It would be entirely invalid to infer the general relationship for 1982-2017 from one particular year, due to potential confounding factors such as medical advancements and population changes, which can have profound effects on mortality rate outcomes.
Australian Bureau of Statistics. (2021). Smoking, 2017-18 financial year. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/smoking/2017-18
Australian Institute of Health and Welfare. (2019). Cancer Mortality. https://ncci.canceraustralia.gov.au/outcomes/cancer-mortality/cancer-mortality
Australian Institute of Health and Welfare. (2011). Lung cancer in Australia: an overview. https://www.aihw.gov.au/reports/cancer/lung-cancer-in-australia-overview/summary
Cancer Council. (2022). Smoking, reduce the harms caused by smoking. https://www.cancer.org.au/cancer-information/causes-and-prevention/smoking
Cancer Council, Victoria. (2019). A brief history of tobacco smoking in Australia. https://www.tobaccoinaustralia.org.au/chapter-1-prevalence/1-1-a-brief-history-of-tobacco-smoking-in-australia
Tas, F., Ciftci, R., Kilic, L. and Karabulut, S. (2013). Age is a prognostic factor affecting survival in lung cancer patients. Oncology Letters, pp.1507–1513. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813578/
Venuta, F., Diso, D., Onorati, I., Anile, M., Mantovani, S. and Rendina, E.A. (2016). Lung cancer in elderly patients. Journal of Thoracic Disease, pp.S908–S914. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5124601/