Introduction

In this assignment, I used the US Census 2000 dataset at the county level from Social Explorer to plot a population density map of the United States. Population levels are color coded, and grey bubbles sized by population are drawn on top. In addition, a non-spatial solution using ggplot is presented, and the two approaches are compared in the conclusion.

Data

The dataset used in this homework is the US Census 2000 dataset downloaded from Social Explorer:

https://www.socialexplorer.com/

Results

library(tidyverse)
library(sf)
library(tmap)
library(tigris)
library(spdep)
library(readr)
library(tmaptools)
library(ggplot2)

options(tigris_class = "sf")
county_data <- read_csv("R11929070_SL050.csv")
## Parsed with column specification:
## cols(
##   Geo_NAME = col_character(),
##   Geo_QName = col_character(),
##   Geo_AREALAND = col_double(),
##   Geo_AREAWATR = col_double(),
##   Geo_SUMLEV = col_character(),
##   Geo_GEOCOMP = col_character(),
##   Geo_REGION = col_integer(),
##   Geo_DIVISION = col_integer(),
##   Geo_FIPS = col_character(),
##   Geo_STATE = col_character(),
##   Geo_COUNTY = col_character(),
##   SE_T001_001 = col_integer()
## )
#us_counties <- counties(cb= TRUE)
us_counties <- counties(cb= TRUE, resolution = "20m")

##### cb = FALSE examples
#us_counties_f <- counties(cb= FALSE) # test cb = FALSE
#us_counties <- st_as_sf(us_counties)

county_data <- county_data %>% mutate(GEOID = Geo_FIPS)

comb_data <- us_counties %>% left_join(county_data, by="GEOID")

 
 ##### Filter out Alaska (02), Hawaii (15) and Puerto Rico (72)
 
 comb_data <- comb_data %>% filter(!STATEFP %in% c("02","15","72"))
 # tm_shape(comb_data, projection = 2163) + tm_polygons("GEOID", palette = "-RdBu", midpoint = 50) + tmap_options(max.categories = 3233) + tm_bubbles("SE_T001_001", col = "grey30", scale=.5)
 
colnames(comb_data)[which(colnames(comb_data) == "SE_T001_001")] <- "Population"
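
Because the join above uses a left join on GEOID, any county shape without a matching census row silently ends up with a missing population. A minimal sketch of one way to check for this, using dplyr's anti_join on toy tables (the table names and values below are made up for illustration and are not taken from the dataset):

```r
library(dplyr)

# Toy stand-ins for the shapes and the census table (hypothetical values).
shapes_toy <- tibble(GEOID = c("01001", "01003", "02013"))
census_toy <- tibble(GEOID = c("01001", "01003"),
                     Population = c(100, 200))

# Rows of shapes_toy that have no partner in census_toy.
unmatched <- anti_join(shapes_toy, census_toy, by = "GEOID")
print(unmatched)
```

An empty result would suggest that every shape found a census row. The same call could be applied to us_counties and county_data, assuming both keep GEOID as a zero-padded character column.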

The cb = TRUE option retrieves the generalized cartographic boundary files instead of the detailed TIGER/Line files, which saves time and memory. It is a good choice unless you know you need the more detailed data. With cb = FALSE, the data takes much longer to load than with cb = TRUE. In addition, I sometimes saw an error message such as “Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : Opening layer failed” when cb = FALSE was used.
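
One way to cope with the intermittent failure is to wrap the download in tryCatch and fall back to the generalized file. The sketch below is only an illustration: load_with_fallback is a hypothetical helper, not part of tigris, and the tigris calls are left commented out because they need a network connection.

```r
# Hypothetical helper: run `primary`; if it raises an error, report it
# and run `fallback` instead.
load_with_fallback <- function(primary, fallback) {
  tryCatch(
    primary(),
    error = function(e) {
      message("Primary load failed: ", conditionMessage(e))
      fallback()
    }
  )
}

# Intended use with tigris (not run here):
# us_counties <- load_with_fallback(
#   function() tigris::counties(cb = FALSE),
#   function() tigris::counties(cb = TRUE, resolution = "20m")
# )

# Small demonstration with stand-in functions.
result <- load_with_fallback(
  function() stop("Opening layer failed"),
  function() "generalized file"
)
```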

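# Adding borders. The commented-out version aggregates by STATEFP (state borders);
# the active version aggregates by COUNTYFP, which is only unique within a state.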
# us_states <- comb_data %>% 
#   aggregate_map(by = "STATEFP")

us_counties <- comb_data %>% 
  aggregate_map(by = "COUNTYFP")

tm_shape(comb_data, projection = 2163) +
  tm_polygons("Population", palette = "-RdBu", midpoint = 50,
              border.col = "grey", border.alpha = .4) +
  tm_bubbles("Population", col = "grey30", scale = .5) +
  tm_shape(us_counties) +
  tm_borders(lwd = 1, col = "black", alpha = 1) +
  tm_layout(bg.color = "ivory",
            title = "US Census 2000",
            title.position = c("right", "top"), title.size = 1.1,
            legend.position = c(0.85, 0), legend.text.size = 0.75,
            legend.width = 0.2) +
  tmap_options(max.categories = 3199)

The map shows that, according to the US Census 2000, most of the US population is concentrated on the East Coast, in the Midwest, and in the major cities on the West Coast.
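
The map above colors counties by total population. To show density in the stricter sense, population could be divided by land area. The sketch below uses toy rows with made-up values and assumes that Geo_AREALAND is recorded in square meters, which should be checked against the Social Explorer data dictionary before use.

```r
library(dplyr)

# Toy rows; the numbers are invented for illustration only.
toy_counties <- tibble(
  GEOID        = c("01001", "01003"),
  Population   = c(50000, 150000),
  Geo_AREALAND = c(1.5e9, 4.0e9)   # assumed unit: square meters
)

toy_density <- toy_counties %>%
  mutate(
    Area_km2 = Geo_AREALAND / 1e6,      # square meters to square kilometers
    Density  = Population / Area_km2    # people per square kilometer
  )
print(toy_density)
```

Under that assumption, a Density column computed this way on comb_data could be passed to tm_polygons in place of "Population".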

# Non-Spatial Solution
county_data <-  county_data %>% separate(Geo_QName, c("County","State"), sep=",")

population_by_state <- county_data %>% group_by(State) %>% summarize(Total_Population = sum(SE_T001_001))

print(population_by_state)
## # A tibble: 52 x 2
##    State                   Total_Population
##    <chr>                              <int>
##  1 " Alabama"                       4447100
##  2 " Alaska"                         626932
##  3 " Arizona"                       5130632
##  4 " Arkansas"                      2673400
##  5 " California"                   33871648
##  6 " Colorado"                      4301261
##  7 " Connecticut"                   3405565
##  8 " Delaware"                       783600
##  9 " District of Columbia"           572059
## 10 " Florida"                      15982378
## # ... with 42 more rows
ggplot(population_by_state, aes(x = State, y = Total_Population, fill = State)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Total Population") +
  ggtitle("US Census 2000: Population By State: Non-Spatial Approach")
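
The state names in the printed tibble carry a leading space, because Geo_QName was split on "," alone. A possible cleanup, sketched here on the first three rows of that output, trims the names with trimws and orders the bars by population with reorder. The object names state_totals and p are illustrative only.

```r
library(ggplot2)

# First three rows of the state totals shown above, leading space intact.
state_totals <- data.frame(
  State = c(" Alabama", " Alaska", " Arizona"),
  Total_Population = c(4447100, 626932, 5130632),
  stringsAsFactors = FALSE
)

# Remove the leading space left over from splitting on ",".
state_totals$State <- trimws(state_totals$State)

# Order bars from largest to smallest population.
p <- ggplot(state_totals,
            aes(x = reorder(State, -Total_Population), y = Total_Population)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("State") +
  ylab("Total Population")
```

Sorting the bars makes the largest states easy to pick out, which the alphabetical ordering does not.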

Conclusion

The spatial approach is designed for spatial data, that is, data that is directly or indirectly referenced to a location on Earth. Its strength is that it processes spatial data and solves location-related problems very well. Its weakness is that it is narrowly focused on one type of problem, and processing usually takes longer than with the non-spatial approach. The non-spatial approach (i.e. the tidyverse approach), on the other hand, is generic and can be used to solve many types of problems, as we saw in the previous assignments. However, it does not process and store spatial data as easily as the spatial approach.