According to the National Cancer Institute (NCI), we have about a 0.6% chance of being diagnosed in our lifetime with cancer of the brain or other nervous system, hereafter referred to as CNS cancer. The NCI data on these malignancies were collected from 2011 to 2013 and are published by its Surveillance, Epidemiology, and End Results Program. Based on their statistics, 6.4 new cases per 100,000 people were diagnosed per year from 2009 to 2013, in men more frequently than in women.

The overall trend in the incidence of new cases over the past two decades is shown below. The diagnosis and death rate data, broken down by demographic group, come from the same institute and can be reviewed on their website. All libraries needed for the analysis were loaded first.

In this report, we are interested in observing trends in diagnosis, treatment, and death by state.

library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)
#Create a dataframe with the above statistics. 
Year <- c(1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013)
NewCasesper100K <- c(6.8,6.6,6.4,6.3,6.5,6.5,6.5,6.7,6.4,6.3,6.5,6.4,6.5,6.5,6.3,6.4,6.4,6.6,6.2,6.1,6.2,6.1)
#Name this vector after both series so the Y-axis label can describe the whole line chart
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.9,4.8,4.8,4.7,4.7,4.7,4.7,4.6,4.5,4.4,4.5,4.4,4.3,4.3,4.2,4.2,4.3,4.3,4.2,4.3,4.4,4.3)
Annually <- data.frame(Year,NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year variable to date format. 
Annually$Year <- as.Date(as.character(Annually$Year), format="%Y")

New cases (in pink) and deaths (in blue) of brain cancer per 100,000 per year

The 5-year survival rate for all stages of these diseases is 33.8%, which is low compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.

Annually %>%  ggvis(~Year,~NewCasesper100K, stroke := "#FF00FF") %>% layer_paths() %>% layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE")
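To put a number on the trend in the chart, one option is a simple linear fit over the same yearly rates. The sketch below is not part of the original analysis: it copies the two rate vectors from the data frame above under shorter, hypothetical names (`new_cases`, `deaths`) and assumes the years run consecutively from 1992 to 2013.

```r
# Yearly rates per 100,000, copied from the data frame built above.
year <- 1992:2013
new_cases <- c(6.8, 6.6, 6.4, 6.3, 6.5, 6.5, 6.5, 6.7, 6.4, 6.3, 6.5,
               6.4, 6.5, 6.5, 6.3, 6.4, 6.4, 6.6, 6.2, 6.1, 6.2, 6.1)
deaths <- c(4.9, 4.8, 4.8, 4.7, 4.7, 4.7, 4.7, 4.6, 4.5, 4.4, 4.5,
            4.4, 4.3, 4.3, 4.2, 4.2, 4.3, 4.3, 4.2, 4.3, 4.4, 4.3)

# Guard against a mismatch between the year and rate vectors.
stopifnot(length(year) == length(new_cases), length(year) == length(deaths))

# Least-squares slope: average change in the rate per calendar year.
incidence_slope <- unname(coef(lm(new_cases ~ year))[2])
mortality_slope <- unname(coef(lm(deaths ~ year))[2])

# Express the slopes as change per decade for readability.
round(10 * c(incidence = incidence_slope, mortality = mortality_slope), 2)
```

A negative slope means the rate fell on average over the period. A straight line is only a rough summary of a series this short.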

Where is the cancer?

The CDC collects data on the incidence of all cancers. We have chosen to look at the incidence of CNS malignancies by state in 2013. Some states were missing values. For our purposes, we have elected to maintain the NA incidence rate for these states. Because

Initially we plotted the annual incidence of new cases per 100,000 people on a density map but concluded that because treatment facilities strive to serve the most patients, it would be more meaningful to look at actual numbers. Below we show density maps of diagnosis and death data.
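The difference between rates and counts is easy to see in the six-state sample that appears later in this report. The sketch below, which uses hypothetical object names and only those six rows, ranks the states both ways.

```r
# Values copied from the six-state sample table in this report.
sample_states <- data.frame(
  State = c("California", "Texas", "Florida", "New York", "Pennsylvania", "Ohio"),
  Incidence = c(6.1, 6.4, 6.6, 6.5, 7.2, 6.9),   # new cases per 100,000
  Diagnosed = c(2298, 1597, 1463, 1355, 1034, 878),  # number of cases
  stringsAsFactors = FALSE
)

# Rank the same states by rate and by raw count, highest first.
by_rate <- sample_states$State[order(-sample_states$Incidence)]
by_count <- sample_states$State[order(-sample_states$Diagnosed)]

data.frame(RankByRate = by_rate, RankByCount = by_count)
```

Within this sample, Pennsylvania has the highest rate but only the fifth-largest count, while California has the lowest rate and the largest count.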

The steps involved in creating such a visualization include:

  1. Read in a shapefile of the US. The ShapeFile is from here.
#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889")%>% fitBounds(-120, 47, -72, 29)
  2. Read CDC data into R. Here we show the 6 states with the most cases diagnosed in 2013.
#read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame and rename columns
MM <- MM %>%
  rename(Incidence = Incid_per_100000, Diagnosed = Aver_Annual_Incid_Count,
         DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr,
         TrendInIncidence = Trend, TrendInDeath = Recent_Death_Trend) %>%
  filter(Diagnosed <= 5000) %>%
  mutate(RatioDeathtoDiag = Death / Diagnosed) %>%
  arrange(desc(Diagnosed)) %>%
  select(State, Incidence, Diagnosed, RatioDeathtoDiag, TrendInIncidence, Death, TrendInDeath)
#Show the first 6 rows, rounded to 2 decimal places
kable(head(MM), digits = 2)
| State | Incidence | Diagnosed | RatioDeathtoDiag | TrendInIncidence | Death | TrendInDeath |
|---|---|---|---|---|---|---|
| California | 6.1 | 2298 | 0.71 | stable | 1628 | falling |
| Texas | 6.4 | 1597 | 0.62 | stable | 986 | stable |
| Florida | 6.6 | 1463 | 0.68 | stable | 1001 | stable |
| New York | 6.5 | 1355 | 0.61 | stable | 823 | falling |
| Pennsylvania | 7.2 | 1034 | 0.64 | stable | 658 | stable |
| Ohio | 6.9 | 878 | 0.67 | stable | 591 | falling |
  3. Create a density map showing the number of 2013 diagnoses.
#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
#Design popup label for the Diagnosed density map.
State_Popup <- paste0("In ", CAByState$NAME, ", the number of people who were diagnosed with brain cancer in 2013 was ", CAByState$Diagnosed, ".")
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>% fitBounds(-120, 47, -72, 29)
CancerbyState1
  4. Create a density map of deaths from CNS cancer in 2013.
#Design popup label for the Deaths density map
State_Popup1 <- paste0("In ", CAByState$NAME, ", the number of people who died from brain cancer in 2013 was ", CAByState$Death, ".")
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Deaths density map
CancerbyState2 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Death), color = "#D46A6A", weight = 1, popup = ~State_Popup1) %>% fitBounds(-120, 47, -72, 29)
CancerbyState2
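With `n = 4`, `colorQuantile()` shades each state by the quartile its value falls into rather than by the value itself. The base R sketch below approximates that binning with `quantile()` and `cut()` on the six sample counts from the table above. It illustrates the idea only and is not leaflet's internal code. The names `diagnosed`, `breaks`, and `quartile` are hypothetical.

```r
# Diagnosis counts for the six sample states shown in the table above.
diagnosed <- c(California = 2298, Texas = 1597, Florida = 1463,
               "New York" = 1355, Pennsylvania = 1034, Ohio = 878)

# Five break points (0%, 25%, 50%, 75%, 100%) define four bins.
breaks <- quantile(diagnosed, probs = seq(0, 1, length.out = 5), na.rm = TRUE)

# Assign each state to bin 1 (lowest counts) through 4 (highest counts).
quartile <- cut(diagnosed, breaks = breaks, include.lowest = TRUE, labels = FALSE)
names(quartile) <- names(diagnosed)

quartile
```

Because the bins are defined by rank, a state's shade shows where it sits relative to the other states and says little about how far apart the underlying counts are. With all states included, the break points would differ from this six-state illustration.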

Where is the best treatment for cancers of the CNS?

Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of brain cancer in the United States. But are these centers spread evenly through the country?

The data was found at the above referenced link. It was heavily cleaned outside of R before importing.

  1. Read the U.S. News data into R.
Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <-  Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
  2. Geocode the treatment facilities and create a popup with the contact information. Below is a slice of two centers.
#Geocode
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address:  </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters,2))
| Hospital | Program | Address | Phone | Lon | Lat |
|---|---|---|---|---|---|
| Chao Family Comprehensive Cancer Center - University of California, Irvine | Comprehensive Brain Tumor Program | 101 The City Drive South Orange, CA 92868 | 714-456-8000 | -117.8903 | 33.78807 |
| City of Hope Comprehensive Cancer Center | Brain Tumor Program | 1500 East Duarte Road Duarte, CA 91010 | 800-826-4673 | -117.9708 | 34.12964 |
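The phone cleanup and popup formatting can be tried without geocoding anything. The sketch below rebuilds the two sample centers by hand. The `"Phone: "` prefix in the raw values is an assumption inferred from the `gsub()` call above, and `centers` and `label` are hypothetical names.

```r
# Two sample centers from the table above. The "Phone: " prefix is an
# assumption about the raw file, inferred from the gsub() call.
centers <- data.frame(
  Hospital = c("Chao Family Comprehensive Cancer Center - University of California, Irvine",
               "City of Hope Comprehensive Cancer Center"),
  Address = c("101 The City Drive South Orange, CA 92868",
              "1500 East Duarte Road Duarte, CA 91010"),
  Phone = c("Phone: 714-456-8000", "Phone: 800-826-4673"),
  stringsAsFactors = FALSE
)

# Remove the literal prefix, then trim the whitespace it leaves behind.
centers$Phone <- trimws(gsub("Phone:", "", centers$Phone, fixed = TRUE))

# paste0() joins the pieces without the separator spaces paste() would add.
label <- paste0("<strong>Hospital: </strong>", centers$Hospital,
                "<br><strong>Address: </strong>", centers$Address,
                "<br><strong>Phone Number: </strong>", centers$Phone)

label[2]
```

Each element of `label` is an HTML string, so the bold field names and line breaks render inside the leaflet popup.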

What is the relationship between diagnosis, treatment, and death?

From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. However, we have elected to look at the ratio of deaths to diagnoses in 2013 on a third map with the treatment centers plotted.
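As a check on the arithmetic, the ratio is deaths divided by diagnoses for the same state. The sketch below recomputes it for the six sample states in the earlier table, using hypothetical vector names. The rounded results match the RatioDeathtoDiag column shown there.

```r
# Counts copied from the six-state sample table earlier in this report.
diagnosed <- c(California = 2298, Texas = 1597, Florida = 1463,
               "New York" = 1355, Pennsylvania = 1034, Ohio = 878)
deaths <- c(California = 1628, Texas = 986, Florida = 1001,
            "New York" = 823, Pennsylvania = 658, Ohio = 591)

# Deaths per diagnosis, rounded to two decimal places as in the table.
ratio <- round(deaths / diagnosed, 2)

ratio
```

The people who died in a given period are not necessarily the people diagnosed in it, so the ratio is a rough comparison between states and not a survival rate.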

State_Popup2 <- paste0("In ", CAByState$NAME, ", the ratio of those dying of brain cancer to those diagnosed in 2013 was ", CAByState$RatioDeathtoDiag)
#Create color scheme for map
pal <- colorQuantile("PuRd", NULL, n = 4)
#Map death to diagnosis ratio
CancerbyState3 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(RatioDeathtoDiag), color = "#D46A6A", weight = 1, popup = ~State_Popup2) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-120, 47, -72, 29)
#Show map with treatment center popups
CancerbyState3

It is interesting that in the Northeast, where the treatment centers are highly concentrated, the death-to-diagnosis ratio is in the low 60% range. Far fewer of the states with a higher death-to-diagnosis ratio (roughly 75-80%) have so many top-rated treatment options.

To work on:

  1. Sizing maps so they’re wider

  2. Round to 2 decimal places for ratio

  3. Legends

  4. Less obtrusive location pins for the hospitals
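One possible way to approach these four items with leaflet is sketched below. It is an untested starting point, not the finished maps. It uses small stand-in objects (`ratio`, `centers`) copied from the sample tables in place of `CAByState` and `BrCACenters`, and the marker colour, radius, and legend title are arbitrary choices.

```r
library(leaflet)

# Stand-in data copied from the sample tables in this report. The real maps
# would use CAByState and BrCACenters instead.
ratio <- c(0.71, 0.62, 0.68, 0.61, 0.64, 0.67)
centers <- data.frame(
  Hospital = c("Chao Family Comprehensive Cancer Center - University of California, Irvine",
               "City of Hope Comprehensive Cancer Center"),
  Lon = c(-117.8903, -117.9708),
  Lat = c(33.78807, 34.12964),
  stringsAsFactors = FALSE
)

# Item 2: round the ratio before pasting it into a popup string.
popup_ratio <- format(round(1628 / 2298, 2), nsmall = 2)

pal <- colorQuantile("PuRd", NULL, n = 4)

m <- leaflet(width = "100%") %>%                   # Item 1: fill the page width
  addProviderTiles("CartoDB.PositronNoLabels") %>%
  addCircleMarkers(data = centers, lng = ~Lon, lat = ~Lat,
                   radius = 4, stroke = FALSE,     # Item 4: small dots, not pins
                   fillColor = "#436EEE", fillOpacity = 0.8,
                   popup = ~Hospital) %>%
  addLegend(position = "bottomright", pal = pal,   # Item 3: legend for the fill
            values = ratio, title = "Death-to-diagnosis ratio")

m
```

Because `pal` is a quantile palette, the legend is labelled by quantile range, not by the raw ratio values.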

The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.