Brain Cancer Incidence by State

According to the National Cancer Institute (NCI), we have about a 0.6% chance of being diagnosed with cancer of the brain or other parts of the nervous system, hereafter referred to as CNS cancer, in our lifetime. The NCI data on these malignancies were collected from 2011-2013 and are published by its Surveillance, Epidemiology, and End Results Program. Based on these statistics, 6.4 new cases per 100,000 people per year were diagnosed between 2009 and 2013, with men diagnosed more frequently than women.

The overall trend in the incidence of new cases for the past 2 decades is shown below. The diagnosis and death rate data broken down into various demographics comes from the same institute and can be reviewed on their website. All applicable libraries were loaded to complete the analysis.

In this report, we are interested in observing trends in diagnosis, treatment, and death by state.

library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)

#Create a data frame with the above statistics.
#One value per calendar year (the original vector listed 2001 twice and skipped 2002).
Year <- 1992:2013
NewCasesper100K <- c(6.8,6.6,6.4,6.3,6.5,6.5,6.5,6.7,6.4,6.3,6.5,6.4,6.5,6.5,6.3,6.4,6.4,6.6,6.2,6.1,6.2,6.1)
#These are deaths per 100,000. The name deliberately covers both series so it can label the y axis of the combined line chart.
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.9,4.8,4.8,4.7,4.7,4.7,4.7,4.6,4.5,4.4,4.5,4.4,4.3,4.3,4.2,4.2,4.3,4.3,4.2,4.3,4.4,4.3)
Annually <- data.frame(Year, NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year to a date (January 1 of each year). A bare "%Y" format would fill in today's month and day.
Annually$Year <- as.Date(paste0(Annually$Year, "-01-01"))

New cases (in pink) and deaths (in blue) of brain cancer per 100,000 per year

The 5-year survival rate for all stages of these diseases is 33.8%, which is low compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.

Annually %>%  ggvis(~Year,~NewCasesper100K, stroke := "#FF00FF") %>% layer_paths() %>% layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE")
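The direction of the two plotted series can also be summarized numerically. The following is a minimal sketch, not part of the original analysis: it fits a straight line to each series with base R's lm() and reads off the slope as the average change per year. The variable names are illustrative, and the assumption is one value per calendar year from 1992 through 2013.

```r
# Values copied from the report's data frame.
# Assumption: one observation per calendar year, 1992 through 2013.
year <- 1992:2013
new_cases <- c(6.8, 6.6, 6.4, 6.3, 6.5, 6.5, 6.5, 6.7, 6.4, 6.3, 6.5,
               6.4, 6.5, 6.5, 6.3, 6.4, 6.4, 6.6, 6.2, 6.1, 6.2, 6.1)
deaths <- c(4.9, 4.8, 4.8, 4.7, 4.7, 4.7, 4.7, 4.6, 4.5, 4.4, 4.5,
            4.4, 4.3, 4.3, 4.2, 4.2, 4.3, 4.3, 4.2, 4.3, 4.4, 4.3)

# Ordinary least squares fit; the coefficient on year is the
# average change in the rate (per 100,000) per year.
incidence_slope <- coef(lm(new_cases ~ year))[["year"]]
death_slope <- coef(lm(deaths ~ year))[["year"]]

round(c(incidence = incidence_slope, deaths = death_slope), 3)
```

Both slopes come out slightly negative, which is consistent with the gentle downward drift visible in the chart. A straight line is only a rough summary of a series this short.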

Where is the cancer?

The CDC collects data on the incidence of all cancers. We have chosen to look at the incidence of CNS malignancies by state in 2013. Some states were missing values; for our purposes, we have elected to keep the incidence rate as NA for these states.

Initially we plotted the annual incidence of new cases per 100,000 people on a density map, but we concluded that absolute case counts would be more meaningful because treatment facilities strive to serve the most patients. Below we show density maps of diagnosis and death counts.
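To illustrate why a map of rates and a map of counts can look very different, here is a small sketch with two hypothetical states. The populations and rates are invented for illustration and do not come from the CDC file.

```r
# Hypothetical figures, for illustration only (not CDC data).
states <- data.frame(
  State = c("Large state", "Small state"),
  Population = c(20e6, 1e6),
  Incidence = c(6.0, 7.0),  # new cases per 100,000 people per year
  stringsAsFactors = FALSE
)

# Convert a rate per 100,000 into an expected number of cases.
states$ExpectedCases <- states$Incidence * states$Population / 1e5

states[, c("State", "Incidence", "ExpectedCases")]
```

The small state has the higher rate, yet the large state has far more patients to treat, which is the quantity a treatment facility would care about.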

The steps involved in creating such a visualization include:

Read in a shapefile of the US. The ShapeFile is from here.

#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889")%>% fitBounds(-120, 47, -72, 29)

Read CDC data into R. Here we show the six states with the most cases diagnosed in 2013.

#read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame, rename columns, and add the death-to-diagnosis ratio
MM <- MM %>% rename(Incidence = Incid_per_100000, Diagnosed = Aver_Annual_Incid_Count, DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr, TrendInIncidence = Trend, TrendInDeath = Recent_Death_Trend) %>% filter(Diagnosed <= 5000) %>% mutate(RatioDeathtoDiag = Death/Diagnosed) %>% arrange(desc(Diagnosed)) %>% select(State, Incidence, Diagnosed, RatioDeathtoDiag, TrendInIncidence, Death, TrendInDeath)
#Show the six states with the most diagnoses
kable(head(MM), digits = 2)

State	Incidence	Diagnosed	RatioDeathtoDiag	TrendInIncidence	Death	TrendInDeath
California	6.1	2298	0.71	stable	1628	falling
Texas	6.4	1597	0.62	stable	986	stable
Florida	6.6	1463	0.68	stable	1001	stable
New York	6.5	1355	0.61	stable	823	falling
Pennsylvania	7.2	1034	0.64	stable	658	stable
Ohio	6.9	878	0.67	stable	591	falling
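As a check on the RatioDeathtoDiag column, the ratio can be recomputed from the counts in the table above. This is a self-contained base R sketch that re-enters the six rows by hand rather than reading the CDC file.

```r
# Counts copied from the table above.
top_states <- data.frame(
  State = c("California", "Texas", "Florida", "New York", "Pennsylvania", "Ohio"),
  Diagnosed = c(2298, 1597, 1463, 1355, 1034, 878),
  Death = c(1628, 986, 1001, 823, 658, 591),
  stringsAsFactors = FALSE
)

# Deaths divided by diagnoses, then sort by number of diagnoses (largest first).
top_states$RatioDeathtoDiag <- top_states$Death / top_states$Diagnosed
top_states <- top_states[order(-top_states$Diagnosed), ]

round(top_states$RatioDeathtoDiag, 2)
```

The rounded values match the table. Note that this ratio compares deaths and diagnoses counted over the same period, so it is not a survival rate: people who died in a given year may have been diagnosed earlier.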

Create density map showing number of 2013 diagnoses.

#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
#Design popup label for the Diagnosed density map.
State_Popup <- paste(" In ", CAByState$NAME, "the number of people who were diagnosed with brain cancer in 2013 was ",CAByState$Diagnosed)
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>% fitBounds(-120, 47, -72, 29)
CancerbyState1
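The quantile color scheme with n = 4 shades each state by the quartile its value falls into. The sketch below approximates that binning in base R with quantile() and cut(). It is not leaflet's internal code, and it uses only the six states from the table, so the bins differ from those on the full map.

```r
# Diagnosed counts for the six states shown in the table.
diagnosed <- c(California = 2298, Texas = 1597, Florida = 1463,
               `New York` = 1355, Pennsylvania = 1034, Ohio = 878)

# Five break points (0%, 25%, 50%, 75%, 100%) define four bins.
breaks <- quantile(diagnosed, probs = seq(0, 1, length.out = 5), na.rm = TRUE)

# Assign each state to a bin: 1 = lowest quartile, 4 = highest.
quartile <- cut(diagnosed, breaks = breaks, labels = FALSE, include.lowest = TRUE)
names(quartile) <- names(diagnosed)

quartile
```

Because the bins are based on ranks rather than on the values themselves, a state with twice as many cases as its neighbor can still share a color with it.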

Create density map of deaths from CNS cancer in 2013.

#Design popup label for the Deaths density map
State_Popup1 <- paste("In ", CAByState$NAME, "the number of people who died from brain cancer in 2013 was ",CAByState$Death)
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
CancerbyState2 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Death), color = "#D46A6A", weight = 1, popup = ~State_Popup1) %>% fitBounds(-120, 47, -72, 29)
CancerbyState2

Where is the best treatment for cancers of the CNS?

Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of brain cancer in the United States. But are these centers spread evenly through the country?

The data was found at the above referenced link. It was heavily cleaned outside of R before importing.

Read the U.S. News data into R.

Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <-  Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
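Removing the "Phone:" label with gsub() may leave stray whitespace around the number. A slightly more defensive variant is sketched below. The helper name clean_phone is hypothetical, and the input format is assumed from the gsub() pattern above.

```r
# Hypothetical helper: drop a leading "Phone:" label and surrounding whitespace.
clean_phone <- function(x) {
  trimws(sub("^\\s*Phone:\\s*", "", x))
}

clean_phone(c("Phone: 714-456-8000", "800-826-4673"))
```

Numbers that carry no label pass through unchanged.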

Geocode the treatment facilities and create a popup with the contact information. Below is a slice of two centers.

#Geocode
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address:  </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters,2))

Hospital	Program	Address	Phone	Lon	Lat
Chao Family Comprehensive Cancer Center - University of California, Irvine	Comprehensive Brain Tumor Program	101 The City Drive South Orange, CA 92868	714-456-8000	-117.8903	33.78807
City of Hope Comprehensive Cancer Center	Brain Tumor Program	1500 East Duarte Road Duarte, CA 91010	800-826-4673	-117.9708	34.12964
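Geocoding can fail for some addresses, in which case the coordinates come back as NA and the marker cannot be drawn. The sketch below shows one way to find such rows before mapping. The data frame is a hypothetical stand-in for the geocoded result, with one failed lookup added for illustration.

```r
# Hypothetical geocoding result; "Center B" represents a failed lookup.
centers <- data.frame(
  Hospital = c("Center A", "Center B", "Center C"),
  Lon = c(-117.8903, NA, -117.9708),
  Lat = c(33.78807, NA, 34.12964),
  stringsAsFactors = FALSE
)

# TRUE for rows that have both a longitude and a latitude.
located <- complete.cases(centers[, c("Lon", "Lat")])

failed_centers <- centers$Hospital[!located]  # addresses to fix by hand
mappable_centers <- centers[located, ]        # rows safe to pass to the map

failed_centers
```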

Relationship between diagnosis and treatment to death?

From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. However, we have elected to look at the ratio of deaths to diagnoses in 2013 on a third map with the treatment centers plotted.

State_Popup2 <- paste("In ", CAByState$NAME, "the ratio of those dying of brain cancer to those diagnosed in 2013 was " ,CAByState$RatioDeathtoDiag)
#Create color scheme for map
pal <- colorQuantile("PuRd", NULL, n = 4)
#Map death to diagnosis ratio
CancerbyState3 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(RatioDeathtoDiag), color = "#D46A6A", weight = 1, popup = ~State_Popup2) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-120, 47, -72, 29)
#Show map with treatment center popups
CancerbyState3

It is interesting that in the Northeast, where the treatment centers are highly concentrated, the death-to-diagnosis ratio is in the low 60% range. Far fewer of the states with a higher death-to-diagnosis ratio (about 75-80%) have as many top-rated treatment options.

To work on:

Sizing maps so they’re wider
Round to 2 decimal places for ratio
Legends
Less obtrusive location pins for the hospitals
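For the rounding item, one possible approach is to format the ratio before pasting it into the popup text. This is a minimal sketch using California's counts from the table. The variable names are illustrative.

```r
# California: deaths / diagnoses, using the counts from the table.
ratio <- 1628 / 2298

# sprintf() with "%.2f" rounds to two decimal places and keeps trailing zeros.
popup_text <- paste0(
  "In California, the ratio of those dying of brain cancer ",
  "to those diagnosed in 2013 was ", sprintf("%.2f", ratio)
)

popup_text
```

Unlike round(), sprintf() returns a string, so a value such as 0.7 is shown as 0.70 and all popups have the same width.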

The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.

Christine Iyer & Edrick Wittes

October 19, 2016
