For this lab, I wanted to try making a map-based visualization, which I had wanted to do ever since we started looking at different visualizations in class. I found a library, mapview, that allows you to make them pretty easily. I later realized ggplot has its own map function, and that mapview is useful for more advanced objects/settings, but I liked the default appearance of mapview and had already installed it anyhow.
The census data I decided to look at was the number of Chinese speakers at home in New Jersey. I used B16001, the one the census speaker recommended, which has data about languages spoken at home at the PUMA level. In particular, I wanted to see which parts of New Jersey saw increases in Chinese speakers at home over recent years. So, I loaded two dataframes, one from 2019 (the most recent year availbale) and one from 2012 (the earliest year with the same number of Geos as 2019). I obtained the names of the datasets from https://api.census.gov/data/2019/acs/acs5/groups/B16001.html and https://api.census.gov/data/2012/acs/acs5/groups/B16001.html.
library(tidycensus)
library(mapview)
#api_key <- "46e185d1689c269117e2a9822355433e2a3b968f"
census_api_key(Sys.getenv("CENSUS_API_KEY"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
acs_data_2019 = get_acs(geography = "public use microdata area",
variables = "B16001_075E",
state = "NJ",
survey = "acs5",
year = 2019,
geometry=TRUE)
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
##
|
| | 0%
|
|======= | 11%
|
|=============== | 22%
|
|================ | 23%
|
|======================== | 34%
|
|============================== | 43%
|
|====================================== | 54%
|
|============================================= | 65%
|
|================================================= | 70%
|
|========================================================= | 81%
|
|============================================================ | 86%
|
|==================================================================== | 97%
|
|======================================================================| 100%
# head(acs_data_2019)
acs_data_2012 = get_acs(geography = "public use microdata area",
variables = "B16001_066E",
year = 2012,
state = "NJ",
survey = "acs5",
geometry=TRUE)
## Getting data from the 2008-2012 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
##
|
| | 0%
|
|= | 1%
|
|== | 3%
|
|=== | 4%
|
|==== | 5%
|
|==== | 6%
|
|===== | 8%
|
|====== | 8%
|
|======= | 10%
|
|======== | 12%
|
|========= | 14%
|
|=========== | 15%
|
|============ | 17%
|
|============= | 19%
|
|============== | 21%
|
|=============== | 22%
|
|================ | 23%
|
|================= | 24%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 27%
|
|==================== | 28%
|
|==================== | 29%
|
|====================== | 31%
|
|======================== | 34%
|
|========================= | 36%
|
|============================ | 39%
|
|============================ | 41%
|
|============================= | 41%
|
|============================== | 43%
|
|=============================== | 45%
|
|================================= | 46%
|
|================================== | 49%
|
|=================================== | 50%
|
|==================================== | 52%
|
|===================================== | 53%
|
|====================================== | 55%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 58%
|
|========================================== | 59%
|
|========================================== | 60%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 63%
|
|============================================= | 65%
|
|============================================== | 65%
|
|=============================================== | 67%
|
|================================================= | 70%
|
|=================================================== | 72%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 78%
|
|========================================================= | 81%
|
|========================================================== | 83%
|
|============================================================ | 86%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|================================================================ | 91%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|=================================================================== | 96%
|
|===================================================================== | 98%
|
|======================================================================| 100%
Now I use the mapview package to plot the 2019 data on a map showing NJ PUMAs:
mapview(acs_data_2019, zcol="estimate")
which looks pretty neat. We can see there are higher concentrations of Chinese speakers towards the north-central region of New Jersey, especially near Trenton. There are a good number in regions close to New York City, as well. Now let’s look at how the situation was 7 years ago:
mapview(acs_data_2012, zcol="estimate")
Without looking super closely, the regions of higher Chinese speakers appear roughly the same. Now, I want to plot the change in number of speakers from 2012 to 2019 to see if that provides any extra visual insight.
To obtain the estimated change in Chinese speakers at home, I took the difference of the estimate column in the 2019 data and the estimate column in the 2012 data. For this, I found I could simply isolate each column using “$” and subtract the two. I also found that the two data tables had differently ordered rows, so I first had to sort them by GEOID before I did the subtraction. I stored the result as another column in the 2019 dataframe labelled “change_from_2012”.
sorted_2019 <- acs_data_2019[order(acs_data_2019$GEOID),]
sorted_2012 <- acs_data_2012[order(acs_data_2012$GEOID),]
# head(sorted_2019)
# head(sorted_2012)
sorted_2019$change_from_2012 <- sorted_2019$estimate - sorted_2012$estimate
print(sorted_2019$change_from_2012)
## [1] 7 -228 89 133 2335 147 137 -15 11 316 70 16 -236 -399 2333
## [16] 124 139 1383 107 65 -708 -970 -698 -705 299 203 -247 1918 1035 346
## [31] 227 -563 -119 202 -527 -61 -27 -12 91 225 109 -149 131 -663 -132
## [46] 111 2013 -472 -535 -448 95 86 -111 21 87 978 601 -367 -139 732
## [61] 60 224 -251 -434 477 184 269 277 1014 -151 10 25 258
Finally, I used mapview to plot the column:
mapview(sorted_2019, zcol="change_from_2012")
So, while I was initially looking for regions where the number of speakers particularly increased (since I expected a general increase due to population growth), there were also some regions where the Chinese-speaking population actually decreased a bit, which was interesting. These include a cluster near central jersey/around Middlesex county, as well as a few areas towards the very north of NJ. In between them, however, are a few regions of yellow-green where there was the highest increase, and not necessarily the ones that had the highest number in 2019 (which is neat!).