For work I have been looking at some counties in Florida. I thought it would be cool to pull some population data and then map it by county.

library(tidyverse)
library(rvest)
I am going to scrape some data from a website that has the population for each county in Florida.
url <- "https://www.florida-demographics.com/counties_by_population"
h <- read_html(url)
tab <- h %>% html_nodes("table")
tab <- tab %>% html_table
FloridaDemographics <- as.data.frame(tab)
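Coercing the whole list with as.data.frame works here because the page yields a single table. If a page held several, the one wanted could be picked out of the list explicitly. Here is a minimal sketch on an inline HTML snippet; the snippet and its single row are made up for illustration, so nothing is fetched from the web.

```r
library(rvest)

# Hypothetical stand-in for the scraped page, so the example runs offline.
page <- read_html(
  "<html><body><table>
     <tr><th>Rank</th><th>County</th><th>Population</th></tr>
     <tr><td>1</td><td>Example County</td><td>1,234</td></tr>
   </table></body></html>"
)

# html_table() on a node set returns a list with one data frame per table.
tabs <- html_table(html_nodes(page, "table"))

# Pick the first table explicitly instead of coercing the whole list.
first_tab <- as.data.frame(tabs[[1]])
```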
Converting character columns to numeric
FloridaDemographics$Rank <- as.numeric(FloridaDemographics$Rank)
## Warning: NAs introduced by coercion
FloridaDemographics$Population <- parse_number(FloridaDemographics$Population)
## Warning: 1 parsing failure.
## row col expected actual
##  68  -- a number      .
Removing the last row (row 68, the one that failed to parse) because it is needless text, not a county
FloridaDemographics <- FloridaDemographics[-c(68),]
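Dropping row 68 by position works for this snapshot of the page, but it would remove the wrong row if the table ever changed length. A hedged alternative is to drop whichever rows failed to parse. The sketch below uses a small made-up data frame (toy is hypothetical) rather than the scraped table.

```r
library(readr)

# Hypothetical miniature of the scraped table: two counties plus a text row.
toy <- data.frame(
  Rank = c("1", "2", "Source: example footer"),
  County = c("A County", "B County", "."),
  Population = c("1,500", "900", "."),
  stringsAsFactors = FALSE
)

# parse_number() strips the thousands separator; rows without a number
# become NA (with a warning, silenced here).
toy$Population <- suppressWarnings(parse_number(toy$Population))

# Keep only rows where a population was parsed.
toy <- toy[!is.na(toy$Population), ]
```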
Loading the packages I need and getting the Florida Longitude and Latitude data
library(maps)
library(ggmap)
library(mapdata)
The map_data function in ggplot2 returns longitude and latitude data for map outlines.
states <- map_data("state")
FL_df <- subset(states, region == "florida")
counties <- map_data("county")
FL_counties <- subset(counties, region == "florida")
Eventually I’ll have to merge FloridaDemographics and FL_counties, but I need something to match them by. Let me see what they share.
head(FloridaDemographics)
##   Rank              County Population
## 1    1   Miami-Dade County    2715516
## 2    2      Broward County    1909151
## 3    3   Palm Beach County    1446277
## 4    4 Hillsborough County    1378883
## 5    5       Orange County    1321194
## 6    6     Pinellas County     957875
unique(FL_counties$subregion)
##  [1] "alachua"      "baker"        "bay"          "bradford"     "brevard"     
##  [6] "broward"      "calhoun"      "charlotte"    "citrus"       "clay"        
## [11] "collier"      "columbia"     "miami-dade"   "de soto"      "dixie"       
## [16] "duval"        "escambia"     "flagler"      "franklin"     "gadsden"     
## [21] "gilchrist"    "glades"       "gulf"         "hamilton"     "hardee"      
## [26] "hendry"       "hernando"     "highlands"    "hillsborough" "holmes"      
## [31] "indian river" "jackson"      "jefferson"    "lafayette"    "lake"        
## [36] "lee"          "leon"         "levy"         "liberty"      "madison"     
## [41] "manatee"      "marion"       "martin"       "monroe"       "nassau"      
## [46] "okaloosa"     "okeechobee"   "orange"       "osceola"      "palm beach"  
## [51] "pasco"        "pinellas"     "polk"         "putnam"       "st johns"    
## [56] "st lucie"     "santa rosa"   "sarasota"     "seminole"     "sumter"      
## [61] "suwannee"     "taylor"       "union"        "volusia"      "wakulla"     
## [66] "walton"       "washington"

The “subregion” column corresponds to the county name, but it is not capitalized and does not have “County” at the end like the names in FloridaDemographics. So, within the FL_counties dataframe, I’m going to create a copy of the subregion column and reformat it to match the other dataframe. I’ll call it “county” for now and rename it to “County” later. First I’ll paste “County” onto the end of each value in the new column.

FL_counties$county <- FL_counties$subregion
FL_counties$county <- paste(FL_counties$county, "County")
But the first letter is still not capitalized. I’ll create a function called “simpleCap” that capitalizes the first letter of each word, and then apply the function to each value. Finally, I’ll change the column name to “County” using the colnames function.
simpleCap <- function(x) {
  s <- strsplit(x, " ")[[1]]
  paste(toupper(substring(s, 1,1)), substring(s, 2),
        sep="", collapse=" ")
}

FL_counties$county <- sapply(FL_counties$county, simpleCap)
colnames(FL_counties)[colnames(FL_counties) == "county"] <- "County"
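The sapply call runs simpleCap once per row. As an alternative sketch, not the approach this post uses, a single vectorized gsub call can uppercase the first letter of every word. Because a hyphen counts as a word boundary, this version would also turn “miami-dade” into “Miami-Dade”. The name cap_words is made up for the example.

```r
# Uppercase the first lowercase letter after each word boundary.
# perl = TRUE is needed for the \\U (uppercase) replacement escape.
cap_words <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
}

capped <- cap_words(c("miami-dade county", "st johns county", "de soto county"))
```

It does not add the periods or close up “De Soto”, so names like St. Lucie and DeSoto would still need to be fixed by hand.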
Merging the Long and Lat data with the demographics by the “County” column in each dataframe.
Florida_merged <- inner_join(FloridaDemographics, FL_counties, by = "County")
Looking at the data, it appears Miami-Dade County did not merge. The FL_counties data had “Miami-dade County” while FloridaDemographics had “Miami-Dade County.” Let me see if there are any other counties that did not merge. I can check this by looking at the “Rank” variable in the new merged data.
unique(Florida_merged$Rank)
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 22 23 25 26 27 28
## [26] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50 51 52 53 54
## [51] 55 56 57 58 59 60 61 62 63 64 65 66 67
Counties with ranks 21, 24, and 48 were also not merged. What are their names?
FloridaDemographics
##    Rank              County Population
## 1     1   Miami-Dade County    2715516
## 2     2      Broward County    1909151
## 3     3   Palm Beach County    1446277
## 4     4 Hillsborough County    1378883
## 5     5       Orange County    1321194
## 6     6     Pinellas County     957875
## 7     7        Duval County     924229
## 8     8          Lee County     718679
## 9     9         Polk County     668671
## 10   10      Brevard County     576808
## 11   11      Volusia County     527634
## 12   12        Pasco County     510593
## 13   13     Seminole County     455086
## 14   14     Sarasota County     412144
## 15   15      Manatee County     373853
## 16   16      Collier County     363922
## 17   17       Marion County     348371
## 18   18      Osceola County     338619
## 19   19         Lake County     335362
## 20   20     Escambia County     311522
## 21   21    St. Lucie County     305591
## 22   22         Leon County     288102
## 23   23      Alachua County     263148
## 24   24    St. Johns County     235503
## 25   25         Clay County     207291
## 26   26     Okaloosa County     200737
## 27   27     Hernando County     182696
## 28   28          Bay County     182482
## 29   29    Charlotte County     176954
## 30   30   Santa Rosa County     170442
## 31   31       Martin County     157581
## 32   32 Indian River County     150984
## 33   33       Citrus County     143087
## 34   34       Sumter County     120999
## 35   35      Flagler County     107139
## 36   36    Highlands County     102101
## 37   37       Nassau County      80578
## 38   38       Monroe County      76325
## 39   39       Putnam County      72766
## 40   40     Columbia County      69105
## 41   41       Walton County      65858
## 42   42      Jackson County      48472
## 43   43      Gadsden County      46017
## 44   44     Suwannee County      43924
## 45   45   Okeechobee County      40572
## 46   46       Hendry County      40127
## 47   47         Levy County      39961
## 48   48       DeSoto County      36399
## 49   49      Wakulla County      31877
## 50   50        Baker County      27785
## 51   51       Hardee County      27228
## 52   52     Bradford County      26979
## 53   53   Washington County      24566
## 54   54       Taylor County      22098
## 55   55       Holmes County      19430
## 56   56      Madison County      18474
## 57   57    Gilchrist County      17615
## 58   58        Dixie County      16437
## 59   59         Gulf County      16055
## 60   60        Union County      15239
## 61   61      Calhoun County      14444
## 62   62     Hamilton County      14269
## 63   63    Jefferson County      14105
## 64   64       Glades County      13363
## 65   65     Franklin County      11736
## 66   66    Lafayette County       8744
## 67   67      Liberty County       8365
The counties are St. Lucie, St. Johns, and DeSoto. I will change the spelling of these counties in the FL_counties dataframe so they match the spelling of the counties I scraped from the web.
FL_counties$County[FL_counties$County == "Miami-dade County"] <- "Miami-Dade County"
FL_counties$County[FL_counties$County == "St Lucie County"] <- "St. Lucie County"
FL_counties$County[FL_counties$County == "St Johns County"] <- "St. Johns County"
FL_counties$County[FL_counties$County == "De Soto County"] <- "DeSoto County"
Now that FL_counties has the same names, I can redo the inner join. Then I’ll check to make sure there are 67 unique ranks (one for each county).
Florida_merged <- inner_join(FloridaDemographics, FL_counties, by = "County")
length(unique(Florida_merged$Rank))
## [1] 67
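Scanning the ranks by eye works for 67 counties. As a sketch of a more direct check, anti_join from dplyr returns the rows of one table that have no match in the other. The two small data frames below, including their numbers, are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for FloridaDemographics and FL_counties.
demo <- data.frame(County = c("Lee County", "DeSoto County"),
                   Population = c(100, 50))
shapes <- data.frame(County = c("Lee County", "De Soto County"),
                     long = c(-81.9, -81.8), lat = c(26.6, 27.2))

# Rows of demo whose County never appears in shapes.
unmatched <- anti_join(demo, shapes, by = "County")
unmatched$County
```

Run against the real tables, an empty result would mean every county found a match.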
Time to plot. First let me create the base map.
Fl_base <- ggplot(data = Florida_merged, mapping = aes(x = long, y = lat, group = group))+
  coord_fixed(1.3)+
  geom_polygon(color = "black", fill = "gray")
Now I will use geom_polygon and fill the map with populations for each county.
Fl_base +
  geom_polygon(aes(fill = Population), color = "white")

This map legend uses scientific notation, and I want to create more breaks in the legend. So I’ll load the scales package and pass its comma formatter to the labels argument of scale_fill_gradient. I’ll also create some breaks for the population range.
library(scales)
Fl_base +
  geom_polygon(aes(fill = Population), color = "white")+
  scale_fill_gradient(labels = comma, breaks = c(0, 500000, 1000000, 1500000, 2000000, 2500000, 3000000))
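Because Miami-Dade is so much larger than most counties, a linear color scale leaves the small counties nearly the same shade. One option, offered as a sketch rather than as part of the analysis above, is a log10 fill scale. The square polygons below are toy data standing in for Florida_merged.

```r
library(ggplot2)
library(scales)

# Two hypothetical square "counties" with very different populations.
toy_map <- data.frame(
  long = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  Population = rep(c(8000, 2700000), each = 4)
)

p <- ggplot(toy_map, aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3) +
  geom_polygon(aes(fill = Population), color = "white") +
  # A log scale spreads out the small values; newer ggplot2 releases
  # (3.5.0 onward) prefer the argument name `transform` over `trans`.
  scale_fill_gradient(trans = "log10", labels = comma,
                      breaks = c(10000, 100000, 1000000))
```

Printing p draws the map. The same scale line could be swapped into the Florida plot, with breaks chosen to suit the real population range.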