Brain Cancer Incidence by State

According to the National Cancer Institute (NCI), we have about a 0.6% chance of being diagnosed with cancer of the brain or other parts of the nervous system (hereafter referred to as CNS cancer) in our lifetime. The NCI data on these malignancies was collected from 2011-2013 and is published through its Surveillance, Epidemiology, and End Results Program. Based on their statistics, 6.4 new cases per 100,000 people per year were diagnosed between 2009 and 2013 in the U.S., in men more frequently than in women.

The overall trend in the incidence of new cases over the past two decades is shown below. Though not featured in this report, the NCI also breaks down diagnosis and death rates by various demographics; this information can be reviewed on its website.

This is an observational report: we are interested in trends in diagnosis, treatment, and death by state.

All applicable libraries were loaded to complete the analysis.

library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)

#Create a dataframe with the above statistics. 
Year <-  c(1992:2013)
NewCasesper100K <- c(6.8,6.6,6.4,6.3,6.5,6.5,6.5,6.7,6.4,6.3,6.5,6.4,6.5,6.5,6.3,6.4,6.4,6.6,6.2,6.1,6.2,6.1)
#Annual deaths per 100,000. The name is a deliberate misnomer so that it can label a y-axis showing both deaths and diagnoses.
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.9,4.8,4.8,4.7,4.7,4.7,4.7,4.6,4.5,4.4,4.5,4.4,4.3,4.3,4.2,4.2,4.3,4.3,4.2,4.3,4.4,4.3)
Annually <- data.frame(Year,NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year variable to date format (January 1 of each year).
#Parsing with format="%Y" alone would fill in today's month and day.
Annually$Year <- as.Date(paste0(Annually$Year, "-01-01"))

United States’ trends in diagnosis (in pink) and deaths (in blue) of CNS malignancies per 100,000 per year

The 5-year survival rate for all stages of these diseases is 33.8%, which is low compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.

Annually %>%
  ggvis(~Year, ~NewCasesper100K, stroke := "#FF00FF") %>%
  layer_paths() %>%
  layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE") %>%
  add_axis("y", title = "Average Deaths per 100,000 (blue) and Average Diagnoses per 100,000 (pink)") %>%
  add_axis("x", title = "Year") %>%
  set_options(height = 480, width = 950)
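The drift in the incidence line can also be summarized with a single number. As a minimal sketch (not part of the original analysis), the block below fits an ordinary least-squares line to the NCI incidence values listed above using base R's lm(). The names years, new_cases, and trend_fit are introduced here for illustration only.

```r
# Incidence values copied from the NewCasesper100K vector defined earlier in this report.
years <- 1992:2013
new_cases <- c(6.8, 6.6, 6.4, 6.3, 6.5, 6.5, 6.5, 6.7, 6.4, 6.3, 6.5,
               6.4, 6.5, 6.5, 6.3, 6.4, 6.4, 6.6, 6.2, 6.1, 6.2, 6.1)

# Fit a straight line: new cases per 100,000 as a function of calendar year.
trend_fit <- lm(new_cases ~ years)

# The slope is the average change in new cases per 100,000 per year.
slope_per_year <- unname(coef(trend_fit)["years"])
slope_per_decade <- 10 * slope_per_year

print(round(slope_per_decade, 2))
```

A negative slope corresponds to a falling rate of new diagnoses. A straight-line fit is only a rough summary here, since it ignores year-to-year noise and any change in how cases were recorded.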

Where is the cancer?

The CDC collects data on the incidence of all cancers. We have chosen to look at the incidence of CNS malignancies by state in 2013. Some states were missing values; for our purposes, we have elected to keep the NA incidence rate for these states.

Initially we plotted the annual incidence of new cases per 100,000 people on a density map, but because treatment facilities strive to serve the most patients, we concluded that it would be more meaningful to look at actual case counts. Below we show density maps of diagnosis and death data.

The steps involved in creating such a visualization include:

Read in a shapefile of the US. The ShapeFile is from here.

#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889")%>% fitBounds(-119, 46, -72, 29)

Read CDC data into R and create a new variable which will be used in the density map.

#read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame and rename columns
MM <- MM %>%
  rename(IncidencePer100000 = Incid_per_100000, Diagnosed = Aver_Annual_Incid_Count,
         DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr,
         TrendInIncidence = Trend, TrendInDeath = Recent_Death_Trend) %>%
  mutate(Ratio = Death/Diagnosed) %>%
  filter(Diagnosed<=5000) %>%
  arrange(desc(IncidencePer100000)) %>%
  select(State, IncidencePer100000, DeathPer100K, Diagnosed, Death, TrendInIncidence, TrendInDeath, Ratio)
MM <- MM %>% mutate(Ratio = round(Ratio, digits = 2))
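To make the Ratio variable and the filter step concrete, here is a small base-R sketch on made-up numbers. The state labels and counts below are hypothetical, not CDC values, and the "Total" row is only an assumption about the kind of row a 5,000-case cutoff would exclude. The sketch also shows that a comparison against NA drops the row; subset() behaves this way, and dplyr's filter() is documented to do the same.

```r
# Hypothetical toy data; these are NOT CDC figures.
toy <- data.frame(
  State = c("A", "B", "C", "Total"),
  Diagnosed = c(200, NA, 1000, 20000),
  Death = c(130, 40, 700, 14000),
  stringsAsFactors = FALSE
)

# Ratio of deaths to new diagnoses, rounded to two digits as in the report.
toy$Ratio <- round(toy$Death / toy$Diagnosed, digits = 2)

# Keep rows with at most 5000 diagnoses; rows where the test is NA are dropped too.
toy_kept <- subset(toy, Diagnosed <= 5000)

print(toy_kept)
```

In this toy table only rows A and C survive: B is removed because its count is missing, and Total because it exceeds the cutoff.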

Where is the best treatment for cancers of the CNS and is it situated to capture the greatest number of patients affected?

Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of CNS cancers in the United States.

Neurosurgery and neuro-oncology are highly competitive subspecialties with relatively few practitioners, so it is critical that they reach the most patients requiring their services.

The data was found at the above-referenced link and was heavily cleaned outside of R before importing. The steps involved in addressing these questions are outlined below.

Read the U.S. News data into R.

Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <-  Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
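The phone clean-up above is a plain text substitution. As a minimal sketch, assuming the raw field looks like the hypothetical string below (the actual layout of the Phone column is not shown in this report), the prefix can be removed and any leftover whitespace trimmed in one step:

```r
# Hypothetical raw value; the real format of the Phone column is an assumption.
raw_phone <- "Phone: 714-456-8000"

# Remove the literal prefix, then strip leading and trailing whitespace.
clean_phone <- trimws(gsub("Phone:", "", raw_phone, fixed = TRUE))

print(clean_phone)
```

Adding trimws() is optional; it only matters if the prefix is followed by a space, which would otherwise remain at the start of the number.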

Geocode the treatment facilities and create a popup with the contact information. Below is a slice of two centers.

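#Geocode (current ggmap versions require a Google API key, set with register_google(), before geocode() will run)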
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address:  </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters,2))

Hospital	Program	Address	Phone	Lon	Lat
Chao Family Comprehensive Cancer Center - University of California, Irvine	Comprehensive Brain Tumor Program	101 The City Drive South Orange, CA 92868	714-456-8000	-117.8903	33.78807
City of Hope Comprehensive Cancer Center	Brain Tumor Program	1500 East Duarte Road Duarte, CA 91010	800-826-4673	-117.9708	34.12964

Plot the treatment facilities on a density map of the incidence of new cases.

#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
#Design popup label for the Diagnosed density map.
State_Popup <- paste(" In ", CAByState$NAME, "the number of people who were diagnosed with brain cancer in 2013 was ",CAByState$Diagnosed)

#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
#The border color needs a single "#"; the markers take their coordinates from BrCACenters via data =.
CancerbyState1 <- leaflet() %>%
  addProviderTiles("CartoDB.PositronNoLabels") %>%
  addPolygons(data = CAByState, fillColor = ~pal(Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>%
  addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>%
  fitBounds(-119, 46, -72, 28)
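The four-class palette shades each state by the quartile its value falls in. The sketch below illustrates that binning idea with base R's quantile() and cut() on hypothetical counts. It is an illustration of quartile binning in general, not a reproduction of leaflet's internal code, and the names counts, breaks, and bins are introduced here.

```r
# Hypothetical case counts for nine states; NA marks a state with no reported value.
counts <- c(40, 90, 150, 300, 600, 900, 1500, 2400, NA)

# Quartile cut points computed from the non-missing values.
breaks <- quantile(counts, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# Assign each state to a quartile; states with NA stay NA and get no shade class.
bins <- cut(counts, breaks = breaks, include.lowest = TRUE, labels = paste0("Q", 1:4))

print(table(bins, useNA = "ifany"))
```

Because the classes are quantiles, each shade covers roughly the same number of states, so a darker state means "in the top quarter" rather than any fixed number of cases.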

Number of cases diagnosed

This map is an interactive graphic. Clicking on a state activates a popup with the number of new cases diagnosed in 2013 and clicking on a blue pointer provides contact information for the hospital’s specialty program.

CancerbyState1

In the density map above, where the darker shaded areas represent the largest numbers of new cases in 2013, we can see at a glance that the best centers are appropriately located. Zooming in on areas where hospitals are highly concentrated makes the shading underneath the pins easier to see.

Is mortality lower near the best cancer centers?

From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. To observe whether there is a relationship in the available data, we use the ratio of the actual number of deaths to the number of new cases in 2013 as a proxy for outcomes. We have kept the hospital locations pinned for reference.

#Design popup label for the death-to-diagnosis ratio density map.
State_Popup7 <- paste(" In ", CAByState$NAME, "the ratio of death to diagnosis in 2013 was ",CAByState$Ratio)

#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build DeathRate density map
#The border color needs a single "#"; the markers take their coordinates from BrCACenters via data =.
CancerbyState4 <- leaflet() %>%
  addProviderTiles("CartoDB.PositronNoLabels") %>%
  addPolygons(data = CAByState, fillColor = ~pal(Ratio), color = "#D46A6A", weight = 1, popup = ~State_Popup7) %>%
  addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>%
  fitBounds(-119, 46, -72, 28)

The Ratio of death to newly diagnosed CNS cancers in 2013

This map is an interactive graphic. Clicking on a state activates a popup with the ratio of deaths to diagnoses in 2013 and clicking on a blue pointer provides contact information for the hospital’s specialty program. Zooming in on areas where hospitals are highly concentrated makes the ratio shading easier to see.

CancerbyState4

Again, at a crude glance, the results are striking. Many of the states with a large number of patients have a lower death-to-diagnosis ratio. NY, PA, VA, IL, CA, and TX look markedly lighter, and hospitals are not as clustered in the darkest states. Though TX features only one of the highly rated centers, MD Anderson Cancer Center in Houston is repeatedly ranked #1 and is regarded as a protocol setter for all cancers. Patients with all stages of diagnoses travel there for treatment and clinical trials.

Given that a visual pattern emerges, further research may be warranted. The differences in the death-to-diagnosis ratio may not be statistically significant, but if they are, it might make sense to explore whether there is a connection to the manner in which these highly rated hospitals approach treatment, and whether local hospitals could reach out to them more often for consultation and referral for the benefit of the patient.
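One way such a follow-up could start is with a rank-based comparison of the ratio between states that host a highly rated center and states that do not. The sketch below uses base R's wilcox.test() on invented numbers. The two vectors are hypothetical and do not come from the CDC data, and the choice of a Wilcoxon rank-sum test is our assumption, not something this report carried out.

```r
# Hypothetical death-to-diagnosis ratios; NOT taken from the CDC data.
ratio_with_center <- c(0.61, 0.66, 0.70, 0.73, 0.64)
ratio_without_center <- c(0.69, 0.72, 0.75, 0.68, 0.77, 0.71)

# Two-sided Wilcoxon rank-sum test: do the two groups tend to differ in ratio?
rank_test <- wilcox.test(ratio_with_center, ratio_without_center)

print(rank_test$p.value)
```

Even a small p-value from a test like this would only describe an association between states. It would not account for patients who travel across state lines for treatment or for differences in stage at diagnosis.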

To work on:

Legends
Less obtrusive location pins for the hospitals
The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.