1. Introduction

The following project was designed to complement my prospective dissertation work, which focuses on the phenomenon of occupational burnout and related stress in peer support specialists working in the mental health service sector. Peer support specialists are people with mental health recovery experiences who, in a formal capacity, assist others with mental health conditions on their own recovery journeys. Peer support specialists differ from mental health professionals in that they are hired based on their personal experience in recovery rather than professional mental health training; the supportive services they provide are theoretically non-clinical and rooted in their own experiential learning as mental health service users. Occupational stress is shared by many peer support specialists; however, it is grossly understudied despite its notoriety within mental health service organizations.

Research questions will address:

To answer these research questions, data will be collected from approximately 500 California-based peer support specialists in the form of an online survey. Data collection is anticipated to be the most significant, resource-intensive challenge associated with this project. As such, I am brainstorming ways in which the project can easily be publicized and generate interest. One idea is to create a website to inform interested parties about the study, including an interactive map about progress on data collection efforts–actionable knowledge in the simplest sense. Prior to this course, I hadn’t realized the extent of the graphics capabilities offered by R software packages. I was pleasantly surprised to have the opportunity to work on a dissertation-related project in this class. Therefore, this project centers on the creation of a prototypical map linking California counties to data collection progress.

2. Methods/Concepts

The map in this incarnation serves a simple descriptive purpose: to graphically display current survey response rate in an interactive fashion. The eventual goal is to embed a version of this map in a webpage. Ease-of-use will facilitate the engagement of interested parties (supervisors, potential respondents) in the data collection process. The map serves as a convenient update on the collection effort.

The key package allowing for creation of this map is entitled Leaflet. Leaflet allows you to link geographical shapefiles to basemaps which are created by developers and free for public use. The advantage of using Leaflet in a project like this is that counties are more readily identifiable when cities are also visible. The map is also “zoomable”–as there are many small counties in California, it helps to have the option to zoom in on the map. Leaflet code is also highly customizable and many interactive features can be added.

The features I wanted to include in this map included:

To code for the prototypical map I created some dummy data in the format it will likely be stored in during the data collection process. The person-level variables included ID, County, and Response Date. There was also a separate file listing the response goal for each county. These needed to be summarized and merged in order to populate the map. The dataframe was manipulated in the following manner–see annotations for details:

Set Working Directory

setwd("C:/Users/Stephania/Google Drive/Applied Epi Using R/Project2/")
getwd()
## [1] "C:/Users/Stephania/Google Drive/Applied Epi Using R/Project2"

load packages and import data

library(maps)
## 
##  # ATTENTION: maps v3.0 has an updated 'world' map.        #
##  # Many country borders and names have changed since 1990. #
##  # Type '?world' or 'news(package="maps")'. See README_v3. #
library(leaflet)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data(countyMapEnv) #load maps package data

#load dummy data: this file includes unique entries from 400 hypothetical survey respondents from all over the state.  
dummyData <- read.csv("./DummyData.csv")

#show summary of dummyData
summary(dummyData)
##        ID              RespDate             County      CertStatus   
##  Min.   :  1.0   11/18/2016:  8   Alameda      : 21   Min.   :0.000  
##  1st Qu.:100.8   10/27/2016:  6   Los Angeles  : 21   1st Qu.:0.000  
##  Median :200.5   10/13/2016:  5   Riverside    : 21   Median :0.000  
##  Mean   :200.5   10/20/2016:  5   Sacramento   : 21   Mean   :0.455  
##  3rd Qu.:300.2   11/12/2016:  5   San Diego    : 21   3rd Qu.:1.000  
##  Max.   :400.0   11/5/2016 :  5   San Francisco: 21   Max.   :1.000  
##                  (Other)   :366   (Other)      :274

collapsing the data and merging in county response targets

#summarize by County:  to match with county reponse goals and prep for the map, responses will need to be summarized into county-level data
sumByCounty <- tapply(dummyData$ID, dummyData$County, length)
sumByCounty <- as.data.frame(sumByCounty)
sumByCounty$NAME <- row.names(sumByCounty)
sumByCounty
##                 sumByCounty            NAME
## Alameda                  21         Alameda
## Alpine                    6          Alpine
## Amador                    6          Amador
## Butte                     6           Butte
## Calaveras                 6       Calaveras
## Colusa                    6          Colusa
## Contra Costa              6    Contra Costa
## Del Norte                 6       Del Norte
## El Dorado                 6       El Dorado
## Fresno                    8          Fresno
## Glenn                     8           Glenn
## Humboldt                  8        Humboldt
## Imperial                  8        Imperial
## Inyo                      8            Inyo
## Kern                      8            Kern
## Kings                     8           Kings
## Lake                      8            Lake
## Lassen                    8          Lassen
## Los Angeles              21     Los Angeles
## Madera                    5          Madera
## Marin                     5           Marin
## Mariposa                  5        Mariposa
## Mendocino                 5       Mendocino
## Merced                    5          Merced
## Modoc                     5           Modoc
## Mono                      8            Mono
## Monterey                  8        Monterey
## Napa                      8            Napa
## Nevada                    8          Nevada
## Orange                    8          Orange
## Placer                    8          Placer
## Plumas                    8          Plumas
## Riverside                21       Riverside
## Sacramento               21      Sacramento
## San Benito                8      San Benito
## San Bernardino            8  San Bernardino
## San Diego                21       San Diego
## San Francisco            21   San Francisco
## San Joaquin               3     San Joaquin
## San Luis Obispo           3 San Luis Obispo
## San Mateo                 3       San Mateo
## Santa Barbara             3   Santa Barbara
## Santa Clara               3     Santa Clara
## Santa Cruz                3      Santa Cruz
## Shasta                    3          Shasta
## Sierra                    3          Sierra
## Siskiyou                  3        Siskiyou
## Solano                    3          Solano
## Sonoma                    3          Sonoma
## Stanislaus                3      Stanislaus
## Sutter                    2          Sutter
## Tehama                    2          Tehama
## Trinity                   2         Trinity
## Tulare                    2          Tulare
## Tuolumne                  2        Tuolumne
## Ventura                   2         Ventura
## Yolo                      2            Yolo
## Yuba                      2            Yuba
#Now that the dataframe is in the correct format, merge in the target numbers for each county--counties often have unique target numbers.  
goal <- read.csv("./CACountyGoal.csv")
names(goal) <- c("NAME", "Goal")
goal
##               NAME Goal
## 1          Alameda   25
## 2           Alpine   10
## 3           Amador   10
## 4            Butte   10
## 5        Calaveras   10
## 6           Colusa   10
## 7     Contra Costa   10
## 8        Del Norte   10
## 9        El Dorado   10
## 10          Fresno   10
## 11           Glenn   10
## 12        Humboldt   10
## 13        Imperial   10
## 14            Inyo   10
## 15            Kern   15
## 16           Kings   10
## 17            Lake   10
## 18          Lassen   10
## 19     Los Angeles   30
## 20          Madera   10
## 21           Marin   10
## 22        Mariposa   10
## 23       Mendocino   10
## 24          Merced   10
## 25           Modoc   10
## 26            Mono   10
## 27        Monterey   10
## 28            Napa   10
## 29          Nevada   10
## 30          Orange   10
## 31          Placer   10
## 32          Plumas   10
## 33       Riverside   30
## 34      Sacramento   25
## 35      San Benito   10
## 36  San Bernardino   15
## 37       San Diego   25
## 38   San Francisco   20
## 39     San Joaquin   10
## 40 San Luis Obispo   10
## 41       San Mateo   10
## 42   Santa Barbara   15
## 43     Santa Clara   10
## 44      Santa Cruz   10
## 45          Shasta   10
## 46          Sierra   10
## 47        Siskiyou   10
## 48          Solano   10
## 49          Sonoma   10
## 50      Stanislaus   15
## 51          Sutter   10
## 52          Tehama   10
## 53         Trinity   10
## 54          Tulare   10
## 55        Tuolumne   10
## 56         Ventura   10
## 57            Yolo   10
## 58            Yuba   10
sumByCounty <- merge(sumByCounty,goal,by="NAME")
sumByCounty
##               NAME sumByCounty Goal
## 1          Alameda          21   25
## 2           Alpine           6   10
## 3           Amador           6   10
## 4            Butte           6   10
## 5        Calaveras           6   10
## 6           Colusa           6   10
## 7     Contra Costa           6   10
## 8        Del Norte           6   10
## 9        El Dorado           6   10
## 10          Fresno           8   10
## 11           Glenn           8   10
## 12        Humboldt           8   10
## 13        Imperial           8   10
## 14            Inyo           8   10
## 15            Kern           8   15
## 16           Kings           8   10
## 17            Lake           8   10
## 18          Lassen           8   10
## 19     Los Angeles          21   30
## 20          Madera           5   10
## 21           Marin           5   10
## 22        Mariposa           5   10
## 23       Mendocino           5   10
## 24          Merced           5   10
## 25           Modoc           5   10
## 26            Mono           8   10
## 27        Monterey           8   10
## 28            Napa           8   10
## 29          Nevada           8   10
## 30          Orange           8   10
## 31          Placer           8   10
## 32          Plumas           8   10
## 33       Riverside          21   30
## 34      Sacramento          21   25
## 35      San Benito           8   10
## 36  San Bernardino           8   15
## 37       San Diego          21   25
## 38   San Francisco          21   20
## 39     San Joaquin           3   10
## 40 San Luis Obispo           3   10
## 41       San Mateo           3   10
## 42   Santa Barbara           3   15
## 43     Santa Clara           3   10
## 44      Santa Cruz           3   10
## 45          Shasta           3   10
## 46          Sierra           3   10
## 47        Siskiyou           3   10
## 48          Solano           3   10
## 49          Sonoma           3   10
## 50      Stanislaus           3   15
## 51          Sutter           2   10
## 52          Tehama           2   10
## 53         Trinity           2   10
## 54          Tulare           2   10
## 55        Tuolumne           2   10
## 56         Ventura           2   10
## 57            Yolo           2   10
## 58            Yuba           2   10
#Now calculate percentage of county goal complete--rounded to nearest integer for a cleaner map
sumByCounty$progress <- round(((sumByCounty$sumByCounty/sumByCounty$Goal)*100), digits=0)
sumByCounty
##               NAME sumByCounty Goal progress
## 1          Alameda          21   25       84
## 2           Alpine           6   10       60
## 3           Amador           6   10       60
## 4            Butte           6   10       60
## 5        Calaveras           6   10       60
## 6           Colusa           6   10       60
## 7     Contra Costa           6   10       60
## 8        Del Norte           6   10       60
## 9        El Dorado           6   10       60
## 10          Fresno           8   10       80
## 11           Glenn           8   10       80
## 12        Humboldt           8   10       80
## 13        Imperial           8   10       80
## 14            Inyo           8   10       80
## 15            Kern           8   15       53
## 16           Kings           8   10       80
## 17            Lake           8   10       80
## 18          Lassen           8   10       80
## 19     Los Angeles          21   30       70
## 20          Madera           5   10       50
## 21           Marin           5   10       50
## 22        Mariposa           5   10       50
## 23       Mendocino           5   10       50
## 24          Merced           5   10       50
## 25           Modoc           5   10       50
## 26            Mono           8   10       80
## 27        Monterey           8   10       80
## 28            Napa           8   10       80
## 29          Nevada           8   10       80
## 30          Orange           8   10       80
## 31          Placer           8   10       80
## 32          Plumas           8   10       80
## 33       Riverside          21   30       70
## 34      Sacramento          21   25       84
## 35      San Benito           8   10       80
## 36  San Bernardino           8   15       53
## 37       San Diego          21   25       84
## 38   San Francisco          21   20      105
## 39     San Joaquin           3   10       30
## 40 San Luis Obispo           3   10       30
## 41       San Mateo           3   10       30
## 42   Santa Barbara           3   15       20
## 43     Santa Clara           3   10       30
## 44      Santa Cruz           3   10       30
## 45          Shasta           3   10       30
## 46          Sierra           3   10       30
## 47        Siskiyou           3   10       30
## 48          Solano           3   10       30
## 49          Sonoma           3   10       30
## 50      Stanislaus           3   15       20
## 51          Sutter           2   10       20
## 52          Tehama           2   10       20
## 53         Trinity           2   10       20
## 54          Tulare           2   10       20
## 55        Tuolumne           2   10       20
## 56         Ventura           2   10       20
## 57            Yolo           2   10       20
## 58            Yuba           2   10       20

Setting up the map

The shapefile of US counties needs to be reduced to California counties and linked to 1. the dataframe of responses/goals created, and 2. the leaflet map.

#loading package needed for the leaflet map
  library(rgdal)
## Loading required package: sp
## rgdal: version: 1.1-1, (SVN revision 572)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
##  Path to GDAL shared files: C:/Users/Stephania/Documents/R/win-library/3.2/rgdal/gdal
##  GDAL does not use iconv for recoding strings.
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: C:/Users/Stephania/Documents/R/win-library/3.2/rgdal/proj
##  Linking to sp version: 1.2-1
#loading shapefile of all counties in the US
  counties <- readOGR("./shapefiles", layer="cb_2014_us_county_20m")
## OGR data source with driver: ESRI Shapefile 
## Source: "./shapefiles", layer: "cb_2014_us_county_20m"
## with 3220 features
## It has 9 fields
## Warning in readOGR("./shapefiles", layer = "cb_2014_us_county_20m"): Z-
## dimension discarded
#filtering for only california counties based on FIPS county codes, which uniquely identify counties and county equivalents in the United States.  CA = 06.
  counties <- subset(counties, counties@data$STATEFP=="06")
  
#making a leaflet map of california counties
  leaflet() %>% addTiles() %>% addPolygons(data=counties)

#merging in the data into this shapefile
  counties@data = data.frame(counties@data, sumByCounty[match(counties@data[,"NAME"], sumByCounty[,"NAME"]),])

3. Results

Now that the data and map objects have been prepped, the following code displays creation of the map using a shapefile of US counties (filtering for CA counties) and the Leaflet package. In summary, the basemap, ‘Stamen.TonerLite’, was among the more minimalist designs that could be found at https://leaflet-extras.github.io/leaflet-providers/preview/. Selecting the map style was simple using this catalog of providers–I simply inserted the name of the map in the code to preview how it would look in the finished product.

The county colors were selected based on recommendations from ColorBrewer (www.colorbrewer2.org). Opacity refers to the color within county lines, stroke refers to the county outlines, weight refers to the weight of those lines.

In formatting the legend, the number of bins is the number of percentage categories dictating the colors utilized in the map. The percentages are drawn from the response goal met in the dataframe, so do not include the character “%” –therefore these must be included in the suffix to display in the legend.

Making the map

#set color palette and base map
  colorRamp <- colorRamp(c("#2c7fb8","#7fcdbb","#edf8b1"), interpolate = "spline")
  palette <- colorNumeric(colorRamp, counties@data$progress)
  
  leaflet() %>% addProviderTiles("Stamen.TonerLite") %>%
    addPolygons(
      weight= 2,
      stroke = TRUE,
      fillOpacity = .65,
      data=counties,
      color = ~palette(progress),
       popup = ~paste("<strong>County:</strong>",NAME,
                       "<br>",
                       "<strong>Total Responses:</strong>",sumByCounty,
                      "<br>",
                       "<strong>Complete:</strong>",progress,"<strong>%</strong>")
      ) %>% addLegend(title = "Response <br> Goal Met", pal = palette, values = counties@data$progress, bins=5, opacity = 1, position="topright", labFormat = labelFormat(suffix = '%'))

4. Discussion

This map is a good start on what I had envisioned for the project website. It is intuitive, especially considering how similar it is to using a Google map. The colors show a clear gradient, are colorblind-safe and print friendly. The counties have clear borders and clicking on them yields straightforward information about the number of surveys completed and how much progress has been made toward individual county goals.

There are a few features I couldn’t figure out, and still a few more I’d like to add. Some features I researched included the possibility of constraining the visible area of the map to include only California, because bordering states are not relevant to the project and seem to clutter the graphic. When I consulted StackOverflow, a question and answer site for professional and enthusiast programmers, it seemed that this could be a difficult technical task (see http://stackoverflow.com/questions/33820299/restrict-the-viewable-part-of-a-leaflet-tile-to-a-polygon-area?noredirect=1#comment55409455_33820299). Further, “covering” other areas of the map (i.e., Nevada, Oregon, etc.) may result in clipping of some of the graphical data identifying cities in California, which would be aesthetically unappealing. I would like to add a feature where hovering over a county will result in the given county’s border being highlighted. This seems more feasible given my current skillset, but I want to incorporate this only after learning how to display on a website multiple maps reflecting the progress toward response goals over time. I am not certain whether I should use ShinyApps for this, as was demonstrated by our guest lecturer, or if this is something I could program in my own website.