According to the National Cancer Institute (NCI), we have a 0.6% lifetime risk of being diagnosed with cancer of the brain or other parts of the nervous system, hereafter referred to as CNS cancers. The NCI data on these malignancies was collected from 2011-2013 and is published by its Surveillance, Epidemiology, and End Results (SEER) Program. Based on their statistics, an average of 6.4 patients per 100,000 people were diagnosed annually between 2009 and 2013, men more frequently than women. Although cancers of the CNS account for only about 1.4% of all new cancer diagnoses, their overall prognosis is poor. The 5-year survival rate for all stages of these diseases is 33.8%, low when compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.
In this report, we are interested in observing trends in diagnosis, treatment, and death by state.
To begin our analysis, all applicable libraries were loaded.
library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)
#Five year survival data set
FiveYear <- c(1975, 1978, 1981, 1984, 1987, 1990, 1993, 1996, 1999, 2002, 2006)
FiveYearSurvival <- c(.224, .236, .241, .269, .289, .301, .322, .312, .338, .349, .35)
FiveYear <- as.Date(as.character(FiveYear),format="%Y")
SEERFiveYr <- data.frame(FiveYear, FiveYearSurvival)
#Create a dataframe with the above statistics.
Year <- c(1975:2013)
NewCasesper100K <-c(5.85, 5.82, 6.17,5.76, 6.12, 6.3, 6.51, 6.43, 6.31, 6.12, 6.94, 6.85, 7, 6.83, 6.86, 7.05, 6.96, 6.98, 6.77, 6.63, 6.5, 6.68, 6.76, 6.64, 6.92, 6.8, 6.64, 6.76, 6.66, 6.85, 6.76, 6.43, 6.59, 6.71, 6.82, 6.55, 6.61, 6.48, 6.41)
#This vector holds annual deaths per 100,000; its name doubles as the shared Y-axis label for the line chart.
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.11, 4.34, 4.4, 4.53, 4.26, 4.37, 4.36, 4.43, 4.39, 4.55, 4.57, 4.53, 4.71, 4.72, 4.73, 4.87, 4.95, 4.85, 4.79, 4.84, 4.67, 4.73, 4.68, 4.68, 4.64, 4.53, 4.45, 4.45, 4.4, 4.31, 4.34, 4.17, 4.21, 4.28, 4.35, 4.25, 4.25, 4.4, 4.34)
Annually <- data.frame(Year,NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year variable to date format.
Annually$Year <- as.Date(as.character(Annually$Year), format="%Y")
The improving survival rate can be observed below.
SEERFiveYr %>% ggvis(~FiveYear, ~FiveYearSurvival, stroke := "green") %>% layer_lines() %>% set_options(height = 280, width = 950)
According to the SEER data, rates of new CNS cancers have been declining by 0.2% per year over the past decade, and death rates have remained stable.
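As a rough cross-check of this trend, an annual percent change can be estimated from the incidence series used in this report. The sketch below fits a log-linear model to the last ten values of that series (2004-2013). The names `recent`, `fit`, and `apc` are illustrative, and the estimate need not match the SEER figure, which comes from NCI's own modelling of more detailed data.

```r
# Incidence per 100,000 for 2004-2013, copied from the series used in this report.
recent <- data.frame(
  Year = 2004:2013,
  NewCasesper100K = c(6.85, 6.76, 6.43, 6.59, 6.71, 6.82, 6.55, 6.61, 6.48, 6.41)
)

# Fit log(rate) = a + b * Year; b is the average relative change per year.
fit <- lm(log(NewCasesper100K) ~ Year, data = recent)

# Convert the slope to an annual percent change (APC).
apc <- (exp(coef(fit)[["Year"]]) - 1) * 100
round(apc, 2)
```

A negative value indicates a declining rate. With only ten points the estimate is sensitive to single years, so it should be read as a sanity check rather than a replacement for the published figure.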
The incidence of new cases and deaths over the past (nearly) four decades is shown in the time series graph below. This data can be accessed through another NCI link. The website allows for queries based on various demographics.
Annually %>% ggvis(~Year,~NewCasesper100K, stroke := "#FF00FF") %>% layer_paths() %>% layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE") %>% set_options(height = 480, width = 950)
#read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame and rename columns
MM <- MM %>% filter(Aver_Annual_Incid_Count<=5000) %>% mutate(RatioDeathtoDiag = Aver_Deaths_per_Yr/Aver_Annual_Incid_Count, TrendInDeath1 = ifelse(Recent_Death_Trend == "falling", 1, ifelse(Recent_Death_Trend=="stable", 2,3))) %>% rename(Diagnosed = Aver_Annual_Incid_Count, DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr, TrendInIncidence = Trend) %>% arrange(desc(Diagnosed)) %>% select(State, Incid_per_100000, Diagnosed, Death, Recent_Death_Trend, RatioDeathtoDiag)
#round ratio to 2 places
MM$RatioDeathtoDiag <- round(MM$RatioDeathtoDiag, 2)
We address these questions with a table and a density map. We read the NCI state data referenced earlier. Some states were missing values in the NCI database. For our purposes, we have elected to maintain the NA incidence rate for these states.
The Diagnosed column shows where the diseases are most prevalent. Incidence of new cases per 100,000 is in the Incid_per_100000 column. The other key variables are the number of deaths in 2013, the death trends, and the ratio of death to diagnosis. A sample follows. You will notice that the disease is not necessarily most prevalent where the incidence is highest. New diagnoses in 2013 were most prevalent in CA, yet the incidence there, 6.1 per 100,000, is among the lowest in the country.
kable(head(MM))
State | Incid_per_100000 | Diagnosed | Death | Recent_Death_Trend | RatioDeathtoDiag |
---|---|---|---|---|---|
California | 6.1 | 2298 | 1628 | falling | 0.71 |
Texas | 6.4 | 1597 | 986 | stable | 0.62 |
Florida | 6.6 | 1463 | 1001 | stable | 0.68 |
New York | 6.5 | 1355 | 823 | falling | 0.61 |
Pennsylvania | 7.2 | 1034 | 658 | stable | 0.64 |
Ohio | 6.9 | 878 | 591 | falling | 0.67 |
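The ratio column, and the contrast between case counts and rates, can be reproduced from the six sample rows above. This is a minimal base R sketch; `sample_states`, `most_cases`, and `highest_rate` are illustrative names and are not part of the original analysis.

```r
# The six rows shown in the sample table above.
sample_states <- data.frame(
  State = c("California", "Texas", "Florida", "New York", "Pennsylvania", "Ohio"),
  Incid_per_100000 = c(6.1, 6.4, 6.6, 6.5, 7.2, 6.9),
  Diagnosed = c(2298, 1597, 1463, 1355, 1034, 878),
  Death = c(1628, 986, 1001, 823, 658, 591),
  stringsAsFactors = FALSE
)

# Ratio of deaths to diagnoses, rounded to 2 places as in the table.
sample_states$RatioDeathtoDiag <- round(sample_states$Death / sample_states$Diagnosed, 2)

# State with the most new cases versus state with the highest rate per 100,000.
most_cases <- sample_states$State[which.max(sample_states$Diagnosed)]
highest_rate <- sample_states$State[which.max(sample_states$Incid_per_100000)]
```

Within this sample, `most_cases` and `highest_rate` name different states, which is the point made in the text: a large population can produce many cases even when the rate per 100,000 is low.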
We address the second question with a density map of the prevalence of the diagnoses, populated with treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of brain cancer in the United States. Treatment should be where the disease is most prevalent; therefore, we consider the actual number of new cases in each state. Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. Neurosurgery and neuro-oncology are highly competitive subspecialties and there are relatively few practitioners. It’s critical that they reach the most patients requiring their services. Ideally, the hospitals will be clustered in the areas with the most disease.
The process in developing this visualization is as follows:
Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <- Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889")%>% fitBounds(-119, 46, -72, 29)
#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
#Design popup label for the Diagnosed density map.
State_Popup <- paste(" In ", CAByState$NAME, "the number of people who were diagnosed with brain cancer in 2013 was ",CAByState$Diagnosed)
#Create color scheme for map.
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(CAByState$Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup)%>% fitBounds(-119, 46, -72, 28)
#Geocode
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address: </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters,2))
Hospital | Program | Address | Phone | Lon | Lat |
---|---|---|---|---|---|
Chao Family Comprehensive Cancer Center - University of California, Irvine | Comprehensive Brain Tumor Program | 101 The City Drive South Orange, CA 92868 | 714-456-8000 | -117.8903 | 33.78807 |
City of Hope Comprehensive Cancer Center | Brain Tumor Program | 1500 East Duarte Road Duarte, CA 91010 | 800-826-4673 | -117.9708 | 34.12964 |
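To compare hospital locations with state-level counts, it can help to tally centers by state. The sketch below pulls the two-letter state code out of the address strings shown in the sample rows. It assumes every address ends in ", ST 12345" (optionally with a ZIP+4 suffix), and the names `addresses`, `state_abbr`, and `centers_per_state` are illustrative.

```r
# Addresses as they appear in the two sample rows above.
addresses <- c(
  "101 The City Drive South Orange, CA 92868",
  "1500 East Duarte Road Duarte, CA 91010"
)

# Capture the two-letter state code between the last comma and the ZIP code.
# Assumption: every address follows the ", ST 12345" pattern.
state_abbr <- sub("^.*,\\s*([A-Z]{2})\\s+[0-9]{5}(-[0-9]{4})?$", "\\1",
                  addresses, perl = TRUE)

# Count treatment centers per state.
centers_per_state <- table(state_abbr)
```

An address that does not match the pattern is returned unchanged by `sub()`, so unexpected values in `state_abbr` are a quick way to spot rows that need manual cleanup.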
State_Popup <- paste(" In ", CAByState$NAME, "the number of people who were diagnosed with brain cancer in 2013 was ",CAByState$Diagnosed)
#Create color scheme for map.
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(CAByState$Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup)%>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-119, 46, -72, 28)
This map is an interactive graphic. Clicking on a state activates a popup with the number of new cases diagnosed in 2013 and clicking on a blue pointer provides contact information for the hospital’s specialty program.
CancerbyState1
In the density map above, the darker shaded areas represent the states with the most new diagnoses in 2013. At a glance, we can see that the best centers are appropriately located. Zooming in on areas where hospitals are highly concentrated makes the underlying shading easier to see.
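The shading comes from grouping states into four quantile bins by number of diagnoses. The base R sketch below mirrors that idea on the six sample states from the table earlier; it is an approximation for illustration, not leaflet's internal code, and the bins on the real map are computed over all states rather than this sample.

```r
# Diagnosed counts for the six states in the sample table.
diagnosed <- c(California = 2298, Texas = 1597, Florida = 1463,
               `New York` = 1355, Pennsylvania = 1034, Ohio = 878)

# Quartile break points (0%, 25%, 50%, 75%, 100%).
breaks <- quantile(diagnosed, probs = seq(0, 1, length.out = 5))

# Assign each state to a bin from 1 (fewest cases) to 4 (most cases).
bin <- cut(diagnosed, breaks = breaks, labels = FALSE, include.lowest = TRUE)
names(bin) <- names(diagnosed)
bin
```

Because the bins are quantiles, each shade covers roughly the same number of states, so a darker state has more cases than most others but not necessarily a proportionally larger count.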
From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. To speculate on a possible relationship with the available data, we use a CDC table that shows state trends in mortality over the five years from 2008 to 2013. The lightest color represents a falling death trend, the medium shade a stable one, and the darkest a rising one. No data is available for Nevada. We have left the pinned locations for reference.
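The order of the colors in the palette matters because the trend labels are matched to colors by position. The sketch below reproduces that matching in base R, assuming the labels are sorted alphabetically (falling, rising, stable), which is how we understand leaflet to order a character domain. The names `trend`, `trend_levels`, `trend_colors`, and `fill` are illustrative.

```r
# A few example trend labels, using the three values found in the state table.
trend <- c("stable", "falling", "rising", "falling")

# Alphabetical order of the distinct labels: falling, rising, stable.
trend_levels <- sort(unique(trend))

# Pair each level with the palette entry in the same position.
palette <- c("#DBEFED", "#8AD0C6", "#C0E4E2")
trend_colors <- setNames(palette, trend_levels)

# Look up the fill color for each observation.
fill <- unname(trend_colors[trend])
```

Under this assumption the lightest color goes to "falling", the darkest to "rising", and the medium shade to "stable", which is why the palette is not listed from light to dark.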
State_Popup6 <- paste("In ", CAByState$NAME, "the incidence of death from CNS cancers has been " ,CAByState$Recent_Death_Trend)
factpal <- colorFactor(c("#DBEFED","#8AD0C6","#C0E4E2"), MM$Recent_Death_Trend)
CancerByState7 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = 1, data = CAByState, fillColor = ~factpal(Recent_Death_Trend), color = "#D46A6A", popup = ~State_Popup6) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-119, 46, -72, 28)
CancerByState7
A visual pattern emerges: the hospitals are mostly clustered in areas where the death trend is falling or stable. Alaska, which can be seen by zooming out, is very far from these highly rated hospitals and shows a rising trend. Further research may be warranted. It might make sense to explore whether there is a connection to the manner in which these highly rated hospitals approach treatment and, if so, to find ways to reach out to local hospitals and make clinical trials more accessible. Additionally, local hospitals may need to connect with these research institutions for consultation and referral, for the benefit of the patient.
The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.