According to the National Cancer Institute (NCI), we have a 0.6% lifetime risk of being diagnosed with cancer of the brain or other parts of the nervous system, hereafter referred to as CNS cancer. The NCI data on these malignancies was collected from 2011-2013 and appears in its publication Surveillance, Epidemiology, and End Results (SEER) Program. Based on these statistics, an average of 6.4 patients per 100,000 people were diagnosed annually between 2009 and 2013, men more frequently than women. Although cancers of the CNS account for only about 1.4% of all new cancer diagnoses, their overall prognosis is poor. The 5-year survival rate for all stages of these diseases is 33.8%, low when compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.

In this report, we are interested in observing trends in diagnosis, treatment, and death by state.

To begin our analysis, all applicable libraries were loaded.

library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)

#Five year survival data set
FiveYear <- c(1975, 1978, 1981, 1984, 1987, 1990, 1993, 1996, 1999, 2002, 2006)
FiveYearSurvival <- c(.224, .236, .241, .269, .289, .301, .322, .312, .338, .349, .35)
FiveYear <- as.Date(as.character(FiveYear),format="%Y")
SEERFiveYr <- data.frame(FiveYear, FiveYearSurvival)

#Create a dataframe with the above statistics. 
Year <-  c(1975:2013)
NewCasesper100K <-c(5.85, 5.82, 6.17,5.76, 6.12, 6.3, 6.51, 6.43, 6.31, 6.12, 6.94, 6.85, 7, 6.83, 6.86, 7.05, 6.96, 6.98, 6.77, 6.63, 6.5, 6.68, 6.76, 6.64, 6.92, 6.8, 6.64, 6.76, 6.66, 6.85, 6.76, 6.43, 6.59, 6.71, 6.82, 6.55, 6.61, 6.48, 6.41)
#Deaths per 100,000. The variable name is a deliberate misnomer so that the Y axis label can describe both series on the line chart
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.11, 4.34, 4.4, 4.53, 4.26, 4.37, 4.36, 4.43, 4.39, 4.55, 4.57, 4.53, 4.71, 4.72, 4.73, 4.87, 4.95, 4.85, 4.79, 4.84, 4.67, 4.73, 4.68, 4.68, 4.64, 4.53, 4.45, 4.45, 4.4, 4.31, 4.34, 4.17, 4.21, 4.28, 4.35, 4.25, 4.25, 4.4, 4.34)
Annually <- data.frame(Year,NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year variable to date format.
Annually$Year <- as.Date(as.character(Annually$Year), format="%Y")
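
One caveat with the conversion above: when only `%Y` is supplied, `as.Date()` fills in the current month and day, so the plotted dates shift depending on when the report is knit. A minimal sketch of a reproducible alternative pins every year to January 1. The names `years` and `year_start` are illustrative and not part of the original analysis.

```r
# Illustrative names; the report's own vector is called Year
years <- 1975:2013
# Build an ISO date string for January 1 of each year, then parse it
year_start <- as.Date(paste0(years, "-01-01"))
# Inspect the first few converted dates
head(format(year_start), 3)
```

The same approach would apply to the `FiveYear` vector in the survival data set.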

As noted above, the 5-year survival rate for all stages of these diseases is 33.8%. The improvement in that survival rate over time can be observed below.

#Add the line layer explicitly instead of letting ggvis guess it
SEERFiveYr %>% ggvis(~FiveYear, ~FiveYearSurvival, stroke := "green") %>% layer_lines() %>% set_options(height = 280, width = 950)

Annual Diagnosis and Mortality

According to the SEER data, rates of new CNS cancers have been declining by 0.2% per year over the past decade, while death rates have remained stable.
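
As a rough cross-check on the trend, the annual percent change can be approximated from the incidence figures entered earlier in this report by fitting a log-linear model to the most recent ten years, assumed here to be 2004-2013. This is only a sketch: SEER's published trend comes from its own modeling of the underlying registry data, so the two figures need not agree. The variable names are illustrative.

```r
# New cases per 100,000 for 2004-2013, copied from NewCasesper100K above
year <- 2004:2013
rate <- c(6.85, 6.76, 6.43, 6.59, 6.71, 6.82, 6.55, 6.61, 6.48, 6.41)
# Regress log(rate) on year; the slope is the average log change per year
fit <- lm(log(rate) ~ year)
slope <- unname(coef(fit)["year"])
# Convert the slope to an annual percent change
apc <- 100 * (exp(slope) - 1)
round(apc, 2)
```

A negative value of `apc` indicates a declining rate over the window.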

The incidence of new cases and deaths over the past (nearly) four decades is shown in the time series graph below. This data can be accessed through another NCI link. The website allows for queries based on various demographics.

New cases (in pink) and deaths (in blue) of brain cancer per 100,000 per year

Annually %>%  ggvis(~Year,~NewCasesper100K, stroke := "#FF00FF") %>% layer_paths() %>% layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE") %>% set_options(height = 480, width = 950)

Where is the cancer and where is the best treatment?

#read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame and rename columns
MM <- MM %>% filter(Aver_Annual_Incid_Count<=5000) %>% mutate(RatioDeathtoDiag = Aver_Deaths_per_Yr/Aver_Annual_Incid_Count, TrendInDeath1 = ifelse(Recent_Death_Trend == "falling", 1, ifelse(Recent_Death_Trend=="stable", 2,3))) %>% rename(Diagnosed = Aver_Annual_Incid_Count, DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr, TrendInIncidence = Trend) %>% arrange(desc(Diagnosed)) %>% select(State, Incid_per_100000, Diagnosed, Death,  Recent_Death_Trend, RatioDeathtoDiag)
#round ratio to 2 places
MM$RatioDeathtoDiag <- round(MM$RatioDeathtoDiag, 2)

We address these questions with a table and a density map, using the NCI state data referenced earlier. Some states were missing values in the NCI database. For our purposes, we have elected to keep the NA incidence rate for these states.

The Diagnosed column (third) shows where the diseases are most prevalent, and Incid_per_100000 (second) gives the incidence of new cases per 100,000. The other key variables are the number of deaths in 2013, the death trend, and the ratio of deaths to diagnoses. A sample follows. You will notice that the states with the most cases are not necessarily those with the highest incidence. New diagnoses in 2013 were most numerous in CA, yet the incidence there, 6.1 per 100,000, is among the lowest in the country.

kable(head(MM))

State	Incid_per_100000	Diagnosed	Death	Recent_Death_Trend	RatioDeathtoDiag
California	6.1	2298	1628	falling	0.71
Texas	6.4	1597	986	stable	0.62
Florida	6.6	1463	1001	stable	0.68
New York	6.5	1355	823	falling	0.61
Pennsylvania	7.2	1034	658	stable	0.64
Ohio	6.9	878	591	falling	0.67
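
The ratio column and the contrast between case counts and incidence rates can be reproduced from the six sample rows shown above. The sketch below retypes those rows by hand rather than reading the full NCI file. The name `sample_rows` is illustrative.

```r
# Sample rows copied from the table above
sample_rows <- data.frame(
  State = c("California", "Texas", "Florida", "New York", "Pennsylvania", "Ohio"),
  Incid_per_100000 = c(6.1, 6.4, 6.6, 6.5, 7.2, 6.9),
  Diagnosed = c(2298, 1597, 1463, 1355, 1034, 878),
  Death = c(1628, 986, 1001, 823, 658, 591),
  stringsAsFactors = FALSE
)
# Ratio of deaths to diagnoses, rounded to 2 places
sample_rows$RatioDeathtoDiag <- round(sample_rows$Death / sample_rows$Diagnosed, 2)
# State with the most new cases versus state with the lowest rate per 100,000
most_cases <- sample_rows$State[which.max(sample_rows$Diagnosed)]
lowest_rate <- sample_rows$State[which.min(sample_rows$Incid_per_100000)]
```

Within this sample, both `most_cases` and `lowest_rate` are California, which illustrates the point made before the table.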

Is the best treatment situated where the need is greatest?

We address the second question with a density map of the prevalence of the diagnoses, populated with treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of brain cancer in the United States. Treatment should be where the disease is most prevalent; therefore, we consider the actual number of new cases in each state. Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. Neurosurgery and neuro-oncology are highly competitive subspecialties, and there are relatively few practitioners. It’s critical that they reach the most patients requiring their services. Ideally, the hospitals will be clustered in the areas with the most disease.

The process in developing this visualization is as follows:

Read the U.S. News data into R. The data, found at the link referenced above, was heavily cleaned outside of R before importing.

Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <-  Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
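
The `gsub()` call above removes the literal prefix but can leave stray whitespace behind. A slightly more defensive variant, sketched here on made-up input strings in the format the raw file is assumed to use, anchors the pattern and trims the result.

```r
# Hypothetical raw values; the real file's exact formatting is not shown in this report
raw_phone <- c("Phone: 714-456-8000", "Phone:800-826-4673 ")
# Drop a leading "Phone:" label plus any spaces after it, then trim both ends
clean_phone <- trimws(sub("^Phone:\\s*", "", raw_phone))
clean_phone
```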

Read in a shapefile of the US. The shapefile is from here.

#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889")%>% fitBounds(-119, 46, -72, 29)

Merge the shapefile with the state disease data.

#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
#Design popup label for the Diagnosed density map.
State_Popup <- paste0("In ", CAByState$NAME, ", the number of people who were diagnosed with brain cancer in 2013 was ", CAByState$Diagnosed)
#Create color scheme for map.
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(CAByState$Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>% fitBounds(-119, 46, -72, 28)
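
For readers unfamiliar with quantile palettes: `colorQuantile()` with `n = 4` assigns each state to one of four classes according to where its value falls among the quartiles of the data. The base R sketch below approximates that binning on the six sample values from the table earlier in this report. It illustrates the idea only and is not leaflet's internal code; the variable names are hypothetical.

```r
# Diagnosed counts for CA, TX, FL, NY, PA, OH (from the sample table)
diagnosed <- c(2298, 1597, 1463, 1355, 1034, 878)
# Quartile break points: 0%, 25%, 50%, 75%, 100%
breaks <- quantile(diagnosed, probs = seq(0, 1, by = 0.25))
# Class 1 is the lowest quartile (lightest shade), class 4 the highest (darkest)
quartile_class <- cut(diagnosed, breaks = breaks, include.lowest = TRUE, labels = FALSE)
data.frame(diagnosed, quartile_class)
```

Because the classes are based on rank rather than magnitude, a state with far more cases than the rest still shares the top shade with the other states in its quartile.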

Geocode the treatment facilities and create a popup with the contact information. The first two hospitals are shown below.

#Geocode
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address:  </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters,2))

Hospital	Program	Address	Phone	Lon	Lat
Chao Family Comprehensive Cancer Center - University of California, Irvine	Comprehensive Brain Tumor Program	101 The City Drive South Orange, CA 92868	714-456-8000	-117.8903	33.78807
City of Hope Comprehensive Cancer Center	Brain Tumor Program	1500 East Duarte Road Duarte, CA 91010	800-826-4673	-117.9708	34.12964
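
Geocoding services can fail to resolve some addresses, in which case the coordinates typically come back as `NA` and the marker will not appear on the map. A small check along these lines could be run before plotting. The third row is a made-up failure added purely for illustration, and `centers` is a hypothetical stand-in for `BrCACenters`.

```r
# Two real rows from the table above plus one invented failed lookup
centers <- data.frame(
  Hospital = c("Chao Family Comprehensive Cancer Center - University of California, Irvine",
               "City of Hope Comprehensive Cancer Center",
               "Hypothetical Hospital (illustration only)"),
  Lon = c(-117.8903, -117.9708, NA),
  Lat = c(33.78807, 34.12964, NA),
  stringsAsFactors = FALSE
)
# Flag rows whose coordinates are missing
not_geocoded <- is.na(centers$Lon) | is.na(centers$Lat)
# Names that would need a manual address fix
centers$Hospital[not_geocoded]
# Keep only the rows that can be plotted
mappable <- centers[!not_geocoded, ]
```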

Prevalence of new cases in 2013 and where to get treated

State_Popup <- paste0("In ", CAByState$NAME, ", the number of people who were diagnosed with brain cancer in 2013 was ", CAByState$Diagnosed)
#Create color scheme for map.
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Diagnosed density map and add the treatment center markers
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(CAByState$Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-119, 46, -72, 28)

This map is an interactive graphic. Clicking on a state activates a popup with the number of new cases diagnosed in 2013, and clicking on a blue pointer provides contact information for the hospital’s specialty program.

CancerbyState1

In the density map above, where darker shading represents a higher number of new cases in 2013, we can see at a glance that the best centers are appropriately located. Zooming in on areas where hospitals are highly concentrated makes the shading beneath the markers easier to see.

Is mortality lower near the best cancer centers?

From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. To speculate on a possible relationship with the available data, we use a CDC table that shows state trends in mortality over 5 years, 2008-2013. The lightest color represents a falling death rate, the medium a stable one, and the darkest a rising one. No data is available for Nevada. We have left the pinned locations for reference.

#Design popup label for the death trend map
State_Popup6 <- paste0("In ", CAByState$NAME, ", the incidence of death from CNS cancers has been ", CAByState$Recent_Death_Trend)
#One color per death trend category
factpal <- colorFactor(c("#DBEFED","#8AD0C6","#C0E4E2"), MM$Recent_Death_Trend)

#Build the death trend map and add the treatment center markers
CancerByState7 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = 1, data = CAByState, fillColor = ~factpal(Recent_Death_Trend), color = "#D46A6A", popup = ~State_Popup6) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-119, 46, -72, 28)

CancerByState7
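
One detail worth checking in the palette above is the order of the colors. When the domain is a character vector, `colorFactor()` is understood to order the categories alphabetically, so the three colors would pair with falling, rising, and stable in that order. The base R sketch below only mimics that pairing (it does not call leaflet) and assumes those are the only three values present in `Recent_Death_Trend`.

```r
# Trend values as they might appear in Recent_Death_Trend (assumed set)
trend <- c("falling", "stable", "rising", "stable", "falling")
# Alphabetical category order: "falling", "rising", "stable"
trend_levels <- sort(unique(trend))
# Pair each category with the colors in the order they were supplied
trend_colors <- setNames(c("#DBEFED", "#8AD0C6", "#C0E4E2"), trend_levels)
trend_colors
```

Under that assumption, rising maps to the darkest shade (`#8AD0C6`) and stable to the middle one (`#C0E4E2`), which matches the shading described in the text.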

A visual pattern emerges: the hospitals are mostly clustered in areas where the death rate is falling or stable. Alaska, which can be seen by zooming out, is very far from these highly rated hospitals and shows a rising trend. Further research may be warranted. It might make sense to explore whether the pattern is connected to the way these highly rated hospitals approach treatment and, if so, to find ways to reach out to local hospitals and make clinical trials more accessible. Additionally, local hospitals may need to connect with these research institutions for consultation and referral, for the benefit of the patient.

Color scheme

The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.

Cancers of the Central Nervous System in the United States

Edrick Wittes, Christine Iyer

October 19, 2016