According to the National Cancer Institute (NCI), we have about a 0.6% chance of being diagnosed with cancer of the brain or other parts of the nervous system, hereafter referred to as CNS cancer, in our lifetime. The NCI data on these malignancies were collected from 2011-2013 and are published by its Surveillance, Epidemiology, and End Results (SEER) Program. Based on these statistics, 6.4 people per 100,000 were diagnosed per year between 2009 and 2013, men more frequently than women.
The overall trend in the incidence of new cases for the past 2 decades is shown below. The diagnosis and death rate data broken down into various demographics comes from the same institute and can be reviewed on their website. All applicable libraries were loaded to complete the analysis.
In this report, we are interested in observing trends in diagnosis, treatment, and death by state.
library(dplyr)
library(knitr)
library(tidyr)
library(ggmap)
library(ggvis)
library(lubridate)
library(tools)
library(rgdal)
library(leaflet)
#Create a data frame with the above statistics.
Year <- c(1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013)
NewCasesper100K <- c(6.8,6.6,6.4,6.3,6.5,6.5,6.5,6.7,6.4,6.3,6.5,6.4,6.5,6.5,6.3,6.4,6.4,6.6,6.2,6.1,6.2,6.1)
#Name the mortality series after both measures (a deliberate misnomer) so the y-axis label reflects the intention of the line chart
Annual_Mortality_and_Diagnosis_per_100000 <- c(4.9,4.8,4.8,4.7,4.7,4.7,4.7,4.6,4.5,4.4,4.5,4.4,4.3,4.3,4.2,4.2,4.3,4.3,4.2,4.3,4.4,4.3)
Annually <- data.frame(Year,NewCasesper100K, Annual_Mortality_and_Diagnosis_per_100000)
#Convert Year variable to date format. 
Annually$Year <- as.Date(as.character(Annually$Year), format="%Y")
The 5-year survival rate for all stages of these diseases is 33.8%, which is low compared with other cancers. For more information on relative 5-year survival rates, see the American Cancer Society’s brochure Cancer Facts and Figures 2014.
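A note on the year conversion used for the chart: when only `%Y` is supplied, `as.Date()` fills in the month and day from the date on which the code is run, so the plotted points shift depending on when the report is knitted. The sketch below shows one possible alternative that pins each year to January 1. The name `YearStart` is introduced here for illustration only and is not part of the original analysis.

```r
# Years covered by the chart, one entry per year from 1992 through 2013
Year <- 1992:2013

# Pin every year to January 1 so the result does not depend on the run date
YearStart <- as.Date(paste0(Year, "-01-01"))

# Sanity checks: no duplicated years and no failed conversions
stopifnot(!anyDuplicated(Year), !anyNA(YearStart))

head(YearStart, 3)
```

Building the years with `1992:2013` also guards against a repeated or skipped year when the vector is typed by hand.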
Annually %>% ggvis(~Year, ~NewCasesper100K, stroke := "#FF00FF") %>% layer_paths() %>% layer_paths(x = ~Year, y = ~Annual_Mortality_and_Diagnosis_per_100000, stroke := "#436EEE")
The CDC collects data on the incidence of all cancers. We have chosen to look at the incidence of CNS malignancies by state in 2013. Some states were missing values; for our purposes, we have elected to keep the incidence rate for these states as NA.
Initially, we plotted the annual incidence of new cases per 100,000 people on a density map, but we concluded that, because treatment facilities strive to serve the most patients, it would be more meaningful to look at actual numbers. Below we show density maps of diagnosis and death data.
The steps involved in creating such a visualization include:
#Load shapefile
States <- readOGR("./States", "States")
#Transform the shapefile
States <- spTransform(States, CRS("+proj=longlat +datum=WGS84"))
#Shape map code here, but not shown
#leaflet() %>% addProviderTiles("CartoDB.Positron") %>% addPolygons(data = States, popup = ~NAME, color = "#CD6889") %>% fitBounds(-120, 47, -72, 29)
#Read in diagnosis and death statistics
MM <- read.csv("MandM.csv", header = TRUE, stringsAsFactors = FALSE)
#Trim data frame and rename columns
MM <- MM %>%
  rename(Incidence = Incid_per_100000, Diagnosed = Aver_Annual_Incid_Count, DeathPer100K = Death_per_100000, Death = Aver_Deaths_per_Yr, TrendInIncidence = Trend, TrendInDeath = Recent_Death_Trend) %>%
  filter(Diagnosed <= 5000) %>%
  mutate(RatioDeathtoDiag = Death / Diagnosed) %>%
  arrange(desc(Diagnosed)) %>%
  select(State, Incidence, Diagnosed, RatioDeathtoDiag, TrendInIncidence, Death, TrendInDeath)
#Show the six states with the most diagnoses
kable(head(MM), digits = 2)
| State | Incidence | Diagnosed | RatioDeathtoDiag | TrendInIncidence | Death | TrendInDeath | 
|---|---|---|---|---|---|---|
| California | 6.1 | 2298 | 0.71 | stable | 1628 | falling | 
| Texas | 6.4 | 1597 | 0.62 | stable | 986 | stable | 
| Florida | 6.6 | 1463 | 0.68 | stable | 1001 | stable | 
| New York | 6.5 | 1355 | 0.61 | stable | 823 | falling | 
| Pennsylvania | 7.2 | 1034 | 0.64 | stable | 658 | stable | 
| Ohio | 6.9 | 878 | 0.67 | stable | 591 | falling | 
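As a quick check of the `RatioDeathtoDiag` column, the ratio can be recomputed in base R from the counts printed in the table above. This is only a sketch: the data frame `Top` is a hand-typed copy of those six rows, not the full `MM` data set.

```r
# First six rows of the table above (counts as printed in the report)
Top <- data.frame(
  State     = c("California", "Texas", "Florida", "New York", "Pennsylvania", "Ohio"),
  Diagnosed = c(2298, 1597, 1463, 1355, 1034, 878),
  Death     = c(1628, 986, 1001, 823, 658, 591),
  stringsAsFactors = FALSE
)

# Deaths per diagnosis, rounded to two decimals as in the printed table
Top$RatioDeathtoDiag <- round(Top$Death / Top$Diagnosed, 2)
Top
```

The ratio compares deaths and diagnoses counted over the same period, so it describes the balance between the two counts rather than the outcome of any particular group of patients.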
#Merging the shape and the incidence files.
CAByState <- merge(States, MM, by.x="NAME", by.y= "State")
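Because the merge matches on state names, any spelling or capitalization difference between the shapefile and the statistics file leaves a state without values. A minimal sketch of a pre-merge check is shown below. The two name vectors are hypothetical stand-ins for `States$NAME` and `MM$State`, and the mismatches in them are invented for illustration.

```r
# Hypothetical name vectors standing in for States$NAME and MM$State
shape_names <- c("California", "Texas", "Nevada", "District of Columbia")
data_names  <- c("California", "Texas", "District Of Columbia")

# Names in the shapefile that have no matching row in the statistics
missing_in_data <- setdiff(shape_names, data_names)

# Names in the statistics that match no polygon in the shapefile
missing_in_shape <- setdiff(data_names, shape_names)

missing_in_data
missing_in_shape
```

Running the same two `setdiff()` calls on the real columns would separate states that are genuinely missing from the source data from states that are dropped only because of a naming mismatch.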
#Design popup label for the Diagnosed density map.
State_Popup <- paste(" In ", CAByState$NAME, "the number of people who were diagnosed with brain cancer in 2013 was ",CAByState$Diagnosed)
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
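To make the quantile color scheme concrete, the sketch below reproduces the idea in base R: the values are split at their quartiles and each state is assigned to one of four bins, which is roughly what a quantile palette with `n = 4` does before mapping bins to colors. It uses only the six `Diagnosed` counts from the table above, and leaflet's own handling of ties and edge cases may differ.

```r
# Diagnosed counts for the six states shown in the table above
Diagnosed <- c(2298, 1597, 1463, 1355, 1034, 878)

# Quartile break points (0%, 25%, 50%, 75%, 100%)
breaks <- quantile(Diagnosed, probs = seq(0, 1, length.out = 5))

# Assign each state to a bin: 1 = lowest quartile, 4 = highest quartile
bin <- cut(Diagnosed, breaks = breaks, include.lowest = TRUE, labels = FALSE)

data.frame(Diagnosed, bin)
```

One consequence of quantile binning is that each color covers about the same number of states, so a large gap between two states' counts can still fall within a single color.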
#Build Diagnosed density map
CancerbyState1 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Diagnosed), color = "#D46A6A", weight = 1, popup = ~State_Popup) %>% fitBounds(-120, 47, -72, 29)
CancerbyState1
#Design popup label for the Deaths density map
State_Popup1 <- paste("In ", CAByState$NAME, "the number of people who died from brain cancer in 2013 was ",CAByState$Death)
#Create color scheme for map. 
pal <- colorQuantile("PuRd", NULL, n = 4)
#Build Deaths density map
CancerbyState2 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(Death), color = "#D46A6A", weight = 1, popup = ~State_Popup1) %>% fitBounds(-120, 47, -72, 29)
CancerbyState2
Because best oncological practices are necessary to improve outcomes, patients look to highly reputed treatment centers. In 2014, U.S. News released a list of the best hospitals specializing in the treatment of brain cancer in the United States. But are these centers spread evenly through the country?
The data was found at the above referenced link. It was heavily cleaned outside of R before importing.
Hosp <- read.csv("Treatment_Centers.csv", header = TRUE, stringsAsFactors = FALSE)
Hosp1 <-  Hosp %>% mutate(Address = paste(Street, City_St_Zip)) %>% select(Hospital, Program = Program_Name, Address, Phone)
Hosp2 <- Hosp1 %>% mutate(Phone = gsub("Phone:", "", Phone))
#Geocode
BrCACenters <- geocode(Hosp2$Address)
#Rename columns
BrCACenters <- BrCACenters %>% rename(Lat = lat, Lon = lon)
BrCACenters <- bind_cols(Hosp2, BrCACenters)
#Format labels
Label <- paste("<strong>Hospital: </strong>", BrCACenters$Hospital, "<br><strong>Address:  </strong>", BrCACenters$Address, "<br><strong>Phone Number: </strong>", BrCACenters$Phone)
kable(head(BrCACenters, 2))
| Hospital | Program | Address | Phone | Lon | Lat | 
|---|---|---|---|---|---|
| Chao Family Comprehensive Cancer Center - University of California, Irvine | Comprehensive Brain Tumor Program | 101 The City Drive South Orange, CA 92868 | 714-456-8000 | -117.8903 | 33.78807 | 
| City of Hope Comprehensive Cancer Center | Brain Tumor Program | 1500 East Duarte Road Duarte, CA 91010 | 800-826-4673 | -117.9708 | 34.12964 | 
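The phone cleanup applied before geocoding can be tried in isolation. The sketch below assumes the raw field looks like `Phone: 714-456-8000`. That raw format is a guess based on the `gsub()` call, since the uncleaned file is not shown, and `raw_phone` is a hypothetical vector.

```r
# Hypothetical raw values; the real pre-cleaning format is not shown in the report
raw_phone <- c("Phone: 714-456-8000", "Phone:800-826-4673")

# Remove the literal prefix, then trim any whitespace left behind
clean_phone <- trimws(gsub("Phone:", "", raw_phone, fixed = TRUE))
clean_phone
```

Adding `trimws()` avoids a leading space in the popup label when the prefix is followed by a blank.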
From the data given, we cannot measure whether outcomes are better in any particular state. Further, it would be presumptuous to say that a state’s population would live longer and stronger based on its proximity to a top cancer center. However, we have elected to look at the ratio of deaths to diagnoses in 2013 on a third map with the treatment centers plotted.
State_Popup2 <- paste("In ", CAByState$NAME, "the ratio of those dying of brain cancer to those diagnosed in 2013 was " ,CAByState$RatioDeathtoDiag)
#Create color scheme for map
pal <- colorQuantile("PuRd", NULL, n = 4)
#Map death to diagnosis ratio
CancerbyState3 <- leaflet() %>% addProviderTiles("CartoDB.PositronNoLabels") %>% addPolygons(data = CAByState, fillColor = ~pal(RatioDeathtoDiag), color = "#D46A6A", weight = 1, popup = ~State_Popup2) %>% addMarkers(data = BrCACenters, lng = ~Lon, lat = ~Lat, popup = Label) %>% fitBounds(-120, 47, -72, 29)
#Show map with treatment center popups
CancerbyState3
It is interesting that in the Northeast, where the treatment centers are highly concentrated, the death-to-diagnosis ratio is in the low 60% range. Far fewer states with a higher death-to-diagnosis ratio (75-80%) have so many top-rated treatment options.
To work on:
Sizing maps so they’re wider
Round to 2 decimal places for ratio
Legends
Less obtrusive location pins for the hospitals
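For the rounding item in the list above, one possible approach is to format the ratio before pasting it into the popup text. The sketch below uses California's counts from the earlier table. The label wording and the variable names are illustrative only.

```r
# Ratio with more decimals than the popup needs (California's counts from the table)
ratio <- 1628 / 2298

# sprintf() returns a fixed two-decimal string, convenient for label text
ratio_text <- sprintf("%.2f", ratio)

label <- paste0("In California, the ratio of deaths to diagnoses was ", ratio_text, ".")
label
```

Unlike `round()`, `sprintf("%.2f", ...)` keeps trailing zeros, so a ratio of 0.7 would display as 0.70 and all popups would have the same width.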
The map choices we were given in class didn’t suit our needs. The state labels were distracting and the information in the popups made them unnecessary. More map choices can be found here.