1 Executive Summary

  • The aim of this report is to investigate the trend in lung cancer mortality over time in Australia and to determine the relationship between age and lung cancer mortality in 2017.
  • The main findings are that, between 1982 (the first year of data) and 2017, lung cancer mortality fell in males but rose in females. In 2017, males had higher lung cancer mortality than females in every age group from 40-49 upward. Mortality increased with age in both sexes, although the female rate was slightly lower at 90+ than at 80-89.


2 Full Report

2.1 Initial Data Analysis (IDA)

  • The data were sourced from the Australian Institute of Health and Welfare (AIHW), an Australian Government agency.

  • This dataset is valid because it was obtained from a reliable source, the Australian Government, and has been rigorously collected. It represents the Australian population because all states and territories have legislation mandating the notification of cancer diagnoses and incidence (AIHW, 2019).

  • Several issues need to be taken into account when analysing this dataset. First, some incidence and mortality data points are missing, which indicates that data were not collected for certain cancers in certain years. Second, the dataset includes projected data (2018-2021), which are estimates rather than observations and may not be representative of the population. Lastly, the dataset may omit some cancer patients, such as those who were treated overseas.

  • Several Australian and global stakeholders could benefit from analysis of this dataset. Its availability to the general Australian public can increase cancer awareness and promote prevention strategies. Australian healthcare organisations, such as NSW Health, could use it to guide cancer prevention initiatives and to allocate resources to cancer treatment. Regulatory bodies could use it to inform policy-making and to produce future projections. The media could draw on it to promote prevention measures, such as anti-smoking campaigns. At state and federal government level, it could guide the allocation of healthcare funding in the budget. Lastly, international organisations such as the WHO could use it to establish global trends in lung cancer.

  • There are 9 variables in the dataset, one per column: type of data (actual or projected), cancer group, year, sex, age group, incidence count, age-specific incidence rate, mortality count and age-specific mortality rate.

  • The key variables for this report are type of data (actual or projected), cancer group, year, sex, age group and age-specific mortality rate, which are the columns used in the analysis.

  • Each row contains the data entries corresponding to these variables.
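
The missing data points mentioned above can be counted before analysis. The following is a minimal sketch on a small made-up data frame, not on the AIHW file: the `toy` object, its values and the `"n.p."` placeholder are assumptions for illustration, since this report does not show how missing entries are marked in the source file.

```r
# Hypothetical extract: "n.p." stands in for a non-numeric placeholder;
# the marker actually used in the AIHW file is not shown in this report.
toy <- data.frame(
  Year = c(1982, 1983, 1984, 1985),
  Mortality.Count = c("19", "14", "n.p.", "7"),
  stringsAsFactors = FALSE
)

# as.numeric() turns entries that are not numbers into NA (with a warning,
# which is suppressed here because the NAs are counted explicitly)
toy$Mortality.Count.num <- suppressWarnings(as.numeric(toy$Mortality.Count))

# Count the missing entries and list the years they fall in
n_missing <- sum(is.na(toy$Mortality.Count.num))
missing_years <- toy$Year[is.na(toy$Mortality.Count.num)]
n_missing
missing_years
```

Applied to the real dataset, the same pattern would show which cancers and years lack counts or rates, which could supply concrete examples for the limitation described above.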

## read in data
data = read.csv("data/cancer_incidence_mortality.csv", header = TRUE)
## show classification of variables
names(data)
## [1] "ï..Type"                                  
## [2] "Cancer.group.site"                        
## [3] "Year"                                     
## [4] "Sex"                                      
## [5] "Age.group..years."                        
## [6] "Incidence.Count"                          
## [7] "Age.specific.incidence.rate..per.100.000."
## [8] "Mortality.Count"                          
## [9] "Age.specific.mortality.rate..per.100.000."
str(data)
## 'data.frame':    91146 obs. of  9 variables:
##  $ ï..Type                                  : chr  "Actual" "Actual" "Actual" "Actual" ...
##  $ Cancer.group.site                        : chr  "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" "Acute lymphoblastic leukaemia" ...
##  $ Year                                     : int  1982 1982 1982 1982 1982 1982 1982 1982 1982 1982 ...
##  $ Sex                                      : chr  "Males" "Males" "Males" "Males" ...
##  $ Age.group..years.                        : chr  "00-09" "10-19" "20-29" "30-39" ...
##  $ Incidence.Count                          : chr  "67" "28" "7" "8" ...
##  $ Age.specific.incidence.rate..per.100.000.: chr  "5.5" "2.1" "0.5" "0.7" ...
##  $ Mortality.Count                          : chr  "19" "14" "16" "7" ...
##  $ Age.specific.mortality.rate..per.100.000.: chr  "1.6" "1" "1.2" "0.6" ...
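
The `ï..` prefix on the first column name is consistent with a UTF-8 byte-order mark (BOM) at the start of the CSV file being read as part of the header. The sketch below shows one way to avoid this, assuming the file is UTF-8 encoded with a BOM; it uses a temporary made-up file rather than the report's data.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (bytes EF BB BF)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Type,Year\nActual,1982\n")), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark while reading
toy <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
names(toy)

# Remove the temporary file
unlink(tmp)
```

If the report's file were read this way, the first column would be named `Type`, so any later reference to the prefixed name would need to change accordingly.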


2.2 Research Question 1

How has lung cancer mortality in all males and all females changed from 1982 to 2017 in Australia?

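A linear regression of the all-ages mortality rate on year gives a slope of about -0.45 for males and +0.43 for females. Between 1982 and 2017 the male rate therefore fell by roughly 0.45 deaths per 100,000 each year, while the female rate rose by roughly 0.43 deaths per 100,000 each year. The residual plots are used to check whether a straight line describes each trend adequately.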

# Exclude projected data from the analysis
data_actual <- subset(data, data$ï..Type != "Projections")
# Keep only the columns for cancer type, year, sex, age group and mortality rate
Year_Mortality <- subset(data_actual, select = c(Cancer.group.site, Year, Sex, Age.group..years., Age.specific.mortality.rate..per.100.000.))
# Keep only the rows for lung cancer
Lung_Mortality <- subset(Year_Mortality, Cancer.group.site == "Lung cancer")
# Convert the mortality rate from character to numeric
Lung_Mortality$Age.specific.mortality.rate..per.100.000. <- as.numeric(Lung_Mortality$Age.specific.mortality.rate..per.100.000.)
str(Lung_Mortality)
## 'data.frame':    1188 obs. of  5 variables:
##  $ Cancer.group.site                        : chr  "Lung cancer" "Lung cancer" "Lung cancer" "Lung cancer" ...
##  $ Year                                     : int  1982 1982 1982 1982 1982 1982 1982 1982 1982 1982 ...
##  $ Sex                                      : chr  "Males" "Males" "Males" "Males" ...
##  $ Age.group..years.                        : chr  "00-09" "10-19" "20-29" "30-39" ...
##  $ Age.specific.mortality.rate..per.100.000.: num  0 0 0.1 1.8 16.9 ...
# Keep only the rows for mortality combined across all age groups
Lung_mortality_total <- subset(Lung_Mortality, Age.group..years. == "All ages combined")
# Create separate subsets for male and female rates
Males_mortality <- subset(Lung_mortality_total, Sex == "Males")
Females_mortality <- subset(Lung_mortality_total, Sex == "Females")
Year <- Males_mortality$Year
Male_mortality_rate <- Males_mortality$Age.specific.mortality.rate..per.100.000.
# Fit a linear regression of the male mortality rate on year
model <- lm(Male_mortality_rate ~ Year)
# Plot the total male mortality rate by year with the fitted regression line
plot(Year, Male_mortality_rate, xlab = "Year", ylab = "Male mortality rate per 100,000", main = "Total male mortality rate due to Lung cancer 1982-2017", xlim = c(1982, 2017))
abline(model, col = "red")
# Report the intercept and slope of the regression line for male mortality
model
## 
## Call:
## lm(formula = Male_mortality_rate ~ Year)
## 
## Coefficients:
## (Intercept)         Year  
##    942.7499      -0.4468
# Annotate the plot with the fitted regression equation
legend("topright", legend = "y = 942.7499 - 0.4468x")

# Plot the residuals against year to check the linear fit
res <- resid(model)
plot(Year, res, xlab = "Year", ylab = "Residuals", main = "Residuals for total male mortality rate due to Lung cancer 1982-2017")
abline(h = 0, col = "red")
Year <- Females_mortality$Year
Females_mortality_rate <- Females_mortality$Age.specific.mortality.rate..per.100.000.
# Fit a linear regression of the female mortality rate on year
model <- lm(Females_mortality_rate ~ Year)
# Plot the total female mortality rate by year with the fitted regression line
plot(Year, Females_mortality_rate, xlab = "Year", ylab = "Female mortality rate per 100,000", main = "Total female mortality rate due to Lung cancer 1982-2017", xlim = c(1982, 2017))
abline(model, col = "red")
# Report the intercept and slope of the regression line for female mortality
model
## 
## Call:
## lm(formula = Females_mortality_rate ~ Year)
## 
## Coefficients:
## (Intercept)         Year  
##   -835.9073       0.4296
# Annotate the plot with the fitted regression equation
legend("bottomright", legend = "y = -835.9073 + 0.4296x")

# Plot the residuals against year to check the linear fit
res <- resid(model)
plot(Year, res, xlab = "Year", ylab = "Residuals", main = "Residuals for total female mortality rate due to Lung cancer 1982-2017")
abline(h = 0, col = "red")
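
To make the two regression equations easier to interpret, the sketch below evaluates each fitted line at the first and last year of data. It uses the rounded coefficients printed in the output above, so the values are approximate; the helper names `male_line`, `female_line` and `fitted_rates` are introduced here for illustration only.

```r
# Coefficients copied from the rounded lm() output in this section
male_line <- function(year) 942.7499 - 0.4468 * year
female_line <- function(year) -835.9073 + 0.4296 * year

# Fitted all-ages mortality rates (per 100,000) at the start and end of the data
years <- c(1982, 2017)
fitted_rates <- data.frame(
  Year = years,
  Males = male_line(years),
  Females = female_line(years)
)
round(fitted_rates, 1)

# Fitted change over the whole period (slope multiplied by 35 years)
male_change <- diff(fitted_rates$Males)
female_change <- diff(fitted_rates$Females)
```

Under the linear fit, the male rate falls by about 15.6 per 100,000 over the period and the female rate rises by about 15.0 per 100,000. These are properties of the fitted lines, not of the observed values in any single year.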


2.3 Research Question 2

How did age correlate with lung cancer mortality in Australia in 2017?

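In 2017, lung cancer mortality was close to zero below age 30 and rose steeply with age from 40-49 onward. The male rate kept increasing through to the oldest group, reaching 446.3 per 100,000 at 90+, whereas the female rate peaked at 180.4 per 100,000 at 80-89 and was slightly lower (172.1) at 90+. Male rates exceeded female rates in every age group from 40-49 upward.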

# Keep male and female rows for 2017, excluding "Persons" and the "All ages combined" category
Lung_mortality_agegroups <- subset(Lung_Mortality, Sex != "Persons" & Age.group..years. != "All ages combined" & Year == 2017)
Males_mortality_2017 <- subset(Lung_mortality_agegroups, Sex == "Males")
Females_mortality_2017 <- subset(Lung_mortality_agegroups, Sex == "Females")
Age_2017 <- Males_mortality_2017$Age.group..years.
Males_2017 <- Males_mortality_2017$Age.specific.mortality.rate..per.100.000.
Females_2017 <- Females_mortality_2017$Age.specific.mortality.rate..per.100.000.
# Build a table of male and female mortality rates by age group
Combined_Agegroups <- matrix(c(Males_2017, Females_2017), ncol = length(Age_2017), byrow = TRUE)
colnames(Combined_Agegroups) <- Age_2017
rownames(Combined_Agegroups) <- c("Males", "Females")
Combined_Agegroups <- as.table(Combined_Agegroups)
Combined_Agegroups
##         00-09 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89   90+
## Males     0.0   0.0   0.0   0.7   5.6  29.9 102.5 219.2 356.7 446.3
## Females   0.0   0.1   0.0   0.7   4.9  23.2  63.7 134.1 180.4 172.1
# Draw a grouped bar plot of the rates, then add axis titles with custom spacing
barplot(Combined_Agegroups, main = "Male and Female lung cancer mortality rates in 2017", xlab = "", ylab = "", col = c("cadetblue1", "pink"), legend = rownames(Combined_Agegroups), beside = TRUE, args.legend = list(x = "top"), ylim = c(0, 500), las = 2)
title(xlab = "Age groups (years)", mgp = c(4, 1, 0))
title(ylab = "Mortality rate per 100,000", mgp = c(3, 1, 0))
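
Because the research question asks how age correlates with mortality, it may help to quantify the pattern in the table. The sketch below re-enters the 2017 rates from the table above and computes a male-to-female rate ratio and a Spearman rank correlation between age-group order and rate. The choice of Spearman correlation and the names `ratio`, `rho_males` and `rho_females` are assumptions made for illustration, not part of the original analysis.

```r
# 2017 rates per 100,000, copied from the table in this section
age_groups <- c("00-09", "10-19", "20-29", "30-39", "40-49",
                "50-59", "60-69", "70-79", "80-89", "90+")
males <- c(0.0, 0.0, 0.0, 0.7, 5.6, 29.9, 102.5, 219.2, 356.7, 446.3)
females <- c(0.0, 0.1, 0.0, 0.7, 4.9, 23.2, 63.7, 134.1, 180.4, 172.1)

# Male-to-female rate ratio; left as NA where the female rate is zero
ratio <- ifelse(females > 0, males / females, NA)
names(ratio) <- age_groups
round(ratio, 2)

# Spearman rank correlation between age-group order and mortality rate
age_rank <- seq_along(age_groups)
rho_males <- cor(age_rank, males, method = "spearman")
rho_females <- cor(age_rank, females, method = "spearman")
```

A rank correlation is used here because the age groups are ordered categories and the rates rise non-linearly, so a measure of monotonic association may be a better fit than a linear one.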


3 References

Put references here in APA format.