As a consultant to many different healthcare entities, I always find healthcare data analysis fascinating. For this assignment I found a data table with various information on home health agencies by state. My plan was to see how the states compare with one another on at least one metric. I started by loading the necessary packages.
library(dplyr)
library(rgdal)
library(leaflet)
I was able to locate a shapefile for the entire United States, and I brought this into R using the readOGR function.
us_states <- readOGR("./cb_2015_us_state_20m", "cb_2015_us_state_20m")
## OGR data source with driver: ESRI Shapefile
## Source: "./cb_2015_us_state_20m", layer: "cb_2015_us_state_20m"
## with 52 features
## It has 9 fields
us_states <- spTransform(us_states, CRS("+proj=longlat +datum=WGS84"))
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = us_states,
              popup = ~NAME)
I then brought in the CSV file containing data for different metrics on home health agencies by state.
home_health <- read.csv("Home_Health_Care_-_State_by_State_Data.csv", stringsAsFactors = FALSE)
I had to do a bit of cleanup to simplify the data frame and get rid of unnecessary data. The column I added corresponds to an existing column of percentages that were stored as characters. I looked for an easy way to convert percentages stored as characters to a numeric format, but I didn’t have much luck on that front. My solution was to create a column with manually entered numbers and convert it to numeric that way (if anyone knows a good way to convert a character percentage to a number, feel free to let me know!).
hh <- home_health[-c(12, 28),]
hh <- hh[,-c(3:15, 17:24)]
hh$Better.Walking.Patients <- c("55.6", "72.9", "71.1", "63.1", "66.1", "68.3", "63.5", "71.4", "67.8", "70.5", "69", "62", "68.1", "69.4", "67.5", "66.7", "67.4", "71.2", "67.2", "69.1", "69.9", "66.7", "67.2", "63.6", "68.9", "71.5", "72", "63.7", "67.7", "69.9", "67.5", "65.7", "69.8", "65.1", "64.8", "66.9", "67.9", "68.1", "61.9", "69.5", "70.7", "68.1", "69.5", "66", "69.3", "61.3", "71.8", "67", "65", "66.3", "64.5", "65.5", "62.6")
hh <- transform(hh, Better.Walking.Patients = as.numeric(Better.Walking.Patients))
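One possible way to avoid typing the values by hand is to strip the percent sign with a regular expression before converting. This is only a minimal sketch: the sample strings below are hypothetical, and I am assuming the original column looks something like "55.6%", possibly with stray spaces or a text placeholder for missing values.

```r
# Hypothetical sample values; the real formatting in the CSV is an assumption.
pct_chr <- c("55.6%", "72.9%", " 71.1 %", "Not Available")

# Remove percent signs and whitespace, then convert to numeric.
# Entries that are not numbers become NA; suppressWarnings() hides
# the coercion warning that as.numeric() would otherwise print.
pct_num <- suppressWarnings(as.numeric(gsub("[%[:space:]]", "", pct_chr)))

pct_num
```

If this holds for the real data, the same call could be applied directly to the original percentage column instead of the manually entered vector. The readr package's parse_number() function is another option for pulling a number out of a string.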
The column that I used to create the choropleth contains the percentage of patients who got better at walking or moving around while receiving home health care. Converting the column to numeric for mapping does not change how it is read: the higher the number, the better.
I then merged the home health data set with the SpatialPolygonsDataFrame (SPDF) by state abbreviation and used leaflet to map the data. I chose Reds as the color scheme because, in my opinion, it made the distinctions between states a bit easier to see.
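Before merging on a key like this, it can be worth checking which abbreviations appear on only one side, since those rows will not pair up. The sketch below uses small made-up vectors rather than the real data, so the codes shown are purely illustrative.

```r
# Toy stand-ins (hypothetical): state codes from the shapefile and from the CSV.
shape_codes <- c("ME", "NH", "VT", "DC", "PR")
data_codes  <- c("ME", "NH", "VT", "GU", "VI")

# Codes present on one side only; these would have no partner in the merge.
only_in_shape <- setdiff(shape_codes, data_codes)
only_in_data  <- setdiff(data_codes, shape_codes)

only_in_shape  # "DC" "PR"
only_in_data   # "GU" "VI"
```

With the objects in this post, the equivalent check would presumably be setdiff(us_states$STUSPS, hh$State) and the reverse.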
home_health_bystate <- merge(us_states, hh, by.x = "STUSPS", by.y = "State")
pal <- colorQuantile("Reds", NULL, n = 4)
leaflet() %>%
  addProviderTiles("Stamen.Toner") %>%
  addPolygons(data = home_health_bystate,
              fillColor = ~pal(Better.Walking.Patients),
              popup = ~NAME)
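To give a feel for what a four-bin quantile palette does with the numbers, here is a rough base R sketch of the same idea: find the quartile break points, then assign each value to a bin. It is not leaflet's internal code, the values are hypothetical stand-ins for the real column, and leaflet may treat values that fall exactly on a bin edge differently.

```r
# Hypothetical values standing in for the Better.Walking.Patients column.
x <- c(55.6, 61.3, 63.1, 65.0, 66.7, 67.5, 68.1, 69.5, 71.1, 72.9)

# Quartile break points at 0%, 25%, 50%, 75%, and 100%.
breaks <- quantile(x, probs = seq(0, 1, length.out = 5), na.rm = TRUE)

# Assign each value to one of four bins; bin 4 holds the highest values
# and would get the darkest shade in a sequential palette such as Reds.
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(bin)
```

Because the bins are based on rank rather than on equal-width ranges, each shade covers roughly a quarter of the states, even when the underlying percentages are close together.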
This map shades states with a higher percentage in darker red. The darker the state, the greater the percentage of individuals receiving home health care whose mobility improved. As you can see on the map, some of the highest-ranking states include Idaho, Ohio, Iowa, Oklahoma, and Colorado. Some of the lowest-ranking states are Alaska, Oregon, and Texas. Maine is about average in this regard. It is interesting to see the data shown in this format.