In this assignment, we’ll explore a dataset of environmental features in the City of Chicago. Along the way, you’ll hone in on key themes learned from the first three weeks of our class. You’ll specifically review:
You will be on your way to mastering the basics of sf,
tmap, and basics of the tidyverse. It’s
helpful if you’ve read through Chapter 3 on Attribute Data
Operations in our course textbook, Geocomputation in R.
This assignment uses the Chives dataset from chichives.com. ChiVes is a collaborative project that integrates and visualizes data; a handful of key metrics―tree cover, air pollution estimates, heat island effects, traffic volumes, and social vulnerability index― help to reveal where in the city people face particular challenges as we work towards a healthier Chicago. You will need to review the Data page to understand the variable names and definitions to complete this assignment.
Download and open the course files (including the .geojson file and this RMarkdown file) on your system, or RStudio Cloud environment. Open the RmD file in RStudio. If you haven’t already, set your working directory.
setwd("~/RPractice/Assignment 1")
Next, load up all the libraries needed for this assignment.
library(sf)
library(spData)
library(tidyverse)
library(tmap)
Now you’re ready to load in the data. Make sure you’ve loaded the geojson in the same file directory as your RmD document.
Use the sf library function st_read to read
in the geojson file. This spatial dataset was downloaded directly from
Chives.
chives <- st_read("chives-data.geojson")
## Reading layer `chives-base' from data source
## `C:\Users\luket\OneDrive\Documents\RPractice\Assignment 1\chives-data.geojson'
## using driver `GeoJSON'
## Simple feature collection with 801 features and 41 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94025 ymin: 41.64429 xmax: -87.52416 ymax: 42.02392
## Geodetic CRS: WGS 84
Over the past few weeks, we’ve been reading about, coding through examples of, and practicing attribute data wrangling. First we need to inspect and examine the data carefully, looking for interesting trends.
Use the head() function to inspect the data. (Uncomment
on your own machine.) There are many other functions to use, if you
prefer, like glimpse.
head(chives)
## Simple feature collection with 6 features and 41 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.70465 ymin: 41.7943 xmax: -87.63535 ymax: 41.85934
## Geodetic CRS: WGS 84
## geoid simpson specCt uniqSp ndvi acs_population svi_pecentile
## 1 17031291600 0.667 3 3 0.192543 976 63.9
## 2 17031580200 0.500 2 2 0.226763 3784 80.4
## 3 17031590500 0.973 78 52 0.405935 1842 80.7
## 4 17031590700 0.667 3 3 0.188116 3393 81.1
## 5 17031610400 NA NA NA 0.210973 1981 93.2
## 6 17031843800 0.667 3 3 0.247428 1482 82.6
## trees_n trees_area trees_den trees_crown_den zip_code asthma_5yr_avg
## 1 453 25969 0.002056 11.8 60608 158.5
## 2 1588 93486 0.002176 12.8 60632 168.0
## 3 2093 158883 0.003658 27.8 60609 195.5
## 4 1330 68804 0.001529 7.9 60609 195.5
## 5 339 15797 0.002077 9.7 60609 195.5
## 6 2582 163319 0.001972 12.5 60609 195.5
## asthma_age_adj_rate urban_flood_suscep heatisl nn_q3_pm2_5 logtraf
## 1 82.5 9.5 70.9 10.3 4.2
## 2 55.1 9.5 83.9 10.8 5.9
## 3 93.0 9.4 82.8 10.6 5.3
## 4 93.0 9.2 84.5 10.8 5.3
## 5 93.0 9.2 90.9 10.5 5.0
## 6 93.0 9.3 74.2 10.5 4.5
## city_property_tot hardship pct_white pct_black pct_nativeam pct_asian
## 1 0 21.3 66.6 9.3 0.0 1.4
## 2 1 46.6 43.9 0.3 0.7 12.3
## 3 11 27.7 36.7 2.3 1.4 27.3
## 4 6 37.4 41.4 2.6 0.2 20.2
## 5 0 40.0 52.6 2.9 2.5 0.0
## 6 167 22.7 25.8 66.4 0.0 7.3
## pct_other prop_comm prop_plndev prop_resdntl CLRRt AdAsthRt
## 1 0 0 0 0 25.7 10
## 2 0 1 0 0 26.2 11.5
## 3 0 11 0 0 21.1 NA
## 4 0 0 6 0 21.1 NA
## 5 0 0 0 0 36.4 10.5
## 6 0 41 0 126 36.4 10.5
## PhysRate LngCncRt CancerRt HypRt logtrafScld aland
## 1 25.199999999999999 91.37164 616.1548 29.699999999999999 61.76471 220340
## 2 35.799999999999997 44.32203 366.8973 15.800000000000001 86.76471 729817
## 3 24 46.11499 439.7247 32.700000000000003 77.94118 572106
## 4 24 46.11499 439.7247 32.700000000000003 77.94118 869830
## 5 32 67.22063 474.5761 23.100000000000001 73.52941 163180
## 6 32 67.22063 474.5761 23.100000000000001 66.17647 1309516
## awater intptlat intptlon commarea_n community
## 1 0 +41.8576415 -087.6922159 29 NORTH LAWNDALE
## 2 0 +41.8267718 -087.6993478 58 BRIGHTON PARK
## 3 24222 +41.8248978 -087.6804596 59 MCKINLEY PARK
## 4 12614 +41.8260539 -087.6627888 59 MCKINLEY PARK
## 5 0 +41.8103984 -087.6674491 61 NEW CITY
## 6 0 +41.8016566 -087.6404756 61 NEW CITY
## geometry
## 1 MULTIPOLYGON (((-87.68817 4...
## 2 MULTIPOLYGON (((-87.69463 4...
## 3 MULTIPOLYGON (((-87.67509 4...
## 4 MULTIPOLYGON (((-87.65598 4...
## 5 MULTIPOLYGON (((-87.66506 4...
## 6 MULTIPOLYGON (((-87.63536 4...
How many census tracts do we have in this dataset? Use the
dim() function to check.
dim(chives)
## [1] 801 42
There are 801 rows, or census tracts, and 42 columns, or field attributes in this dataset.
Let’s quickly map an interesting variable in our dataset using
tmap. We’ll explore the tree_crown_den
attribute, which shows us the proportion of a census tract covered by
tree crowns, as measured from LiDar data in 2010. Review the
Chives website to learn more about this variable.
In this map we’ll use quantile breaks to classify the variable of interest; a red-yellow-green diverging color palette; and will also rename the variable in the legend for easier viewing.
tm_shape(chives) + tm_fill("trees_crown_den",
style = "quantile",
palette = "RdYlGn",
title = "Tree Coverage %")
Let’s explore data breaks further. Here’s one way of selecting all census tracts that have less than 10% of its area covered by trees:
fewTrees <- chives %>%
filter(trees_crown_den < 10)
#head(fewTrees)
82 features were selected, meaning 82 out of all census tracts in the city have less than 10% of their areas covered by trees.
What’s another way to do this (ie. query your data)? Use a different selection option based on your readings and practice. The end output also identify tracts with less than 10% areas covered by trees.
### ADD YOUR CODE HERE ###
fewTrees <- chives %>%
subset(trees_crown_den < 10)
Let’s quickly map these tracts to see where they show up. We place
our original dataset, chives, as the first layer. We don’t pass
any parameters to the tm_fill() function to force a default
gray background. Then we add the new filtered dataset,
fewTrees, coloring it brown.
tm_shape(chives) + tm_fill() +
tm_shape(fewTrees) + tm_fill(col = "brown")
The subset of data that we have includes all of the original data.
Let’s map social vulnerability measures for the fewTrees
dataset. Higher SVI indicates higher vulnerability.
tm_shape(chives) + tm_fill() +
tm_shape(fewTrees) + tm_fill("svi_pecentile")
Let’s say we want to identify some key characteristics of areas without too many trees in the city. Using our same filter above (or whatever version you’d like to use), we can add an additional option. In the example below, we select all areas with <10% tree crown density that have high social vulnerability (ie. the social vulnerability index percentile is above 60%). Then, we map it.
fewTreesSVI <- chives %>%
filter(trees_crown_den < 10 & svi_pecentile > 60)
tm_shape(chives) + tm_fill() +
tm_shape(fewTreesSVI) + tm_fill(col = "brown")
We can see that several tracts could be found in areas with fewer trees, and higher social vulnerability. Explore another paired query, choosing a new variable (instead of social vulnerability). You can explore the data on your own, read the metadata further on the original data website, etc. to identify a reasonable cutoff.
Generate and run the query (with 2 criteria).
### ADD YOUR CODE HERE ###
AsianHardship <- chives %>%
filter(hardship > 20 & pct_asian > 50)
tm_shape(chives) + tm_fill() +
tm_shape(AsianHardship) + tm_fill(col = "brown")
Explain why you selected this query, and briefly describe any interesting patterns you discovered.
ADD YOUR DESCRIPTION HERE
Let’s take data wrangling a bit further and start to aggregate our
dataset. We will use group_by and summarise
functions from the tidyverse library we’re using.
(Technically, they’re from the dplyr package within the
tidyverse.)
Group_by tells the code which variable we should
aggregate on. For example, let’s say we want to know how many trees
exist in each community area in Chicago. We’ll (1) group_by
the “community” field and (2) summarise using the total sum
of trees. We’ll assign the total number of trees to equal a new
variable, called “Trees”. We may also want to know how many tracts are
included in each community, so are interested the total n
of tracts. The code looks like this:
treeCom <- chives %>%
group_by(community) %>%
summarize(Trees = sum(trees_n),
TotTracts = n())
head(treeCom)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -87.80623 ymin: 41.72865 xmax: -87.62886 ymax: 41.97595
## Geodetic CRS: WGS 84
## # A tibble: 6 × 4
## community Trees TotTracts geometry
## <chr> <int> <int> <POLYGON [°]>
## 1 ALBANY PARK 16496 11 ((-87.69782 41.9654, -87.69871 41.9662, -87.69…
## 2 ARCHER HEIGHTS 7794 5 ((-87.72337 41.80055, -87.72315 41.80055, -87.…
## 3 ARMOUR SQUARE 4866 5 ((-87.62978 41.83852, -87.62979 41.83926, -87.…
## 4 ASHBURN 30954 8 ((-87.72144 41.73486, -87.72142 41.73486, -87.…
## 5 AUBURN GRESHAM 28200 15 ((-87.63406 41.73594, -87.63406 41.73597, -87.…
## 6 AUSTIN 54572 24 ((-87.77536 41.902, -87.77535 41.90163, -87.77…
And we can map it!
tm_shape(treeCom) + tm_fill(col = "Trees")
Note that this dataset no longer shows tracts, since we’re actually working with communities. You can check the dimension of this data frame to be sure:
dim(treeCom)
## [1] 77 4
Take this aggregation further. Add more to the code chunk above to generate a community area dataset that includes at least five variables of interest. Note that some variables you’ll need to sum (like total number of trees), whereas others will need to be averaged. You should use the variables included in the original Chives dataset.
Your code should also output a table view of the data.
### ADD YOUR CODE HERE ###
treeCom <- chives %>%
group_by(community) %>%
summarize(Trees = sum(trees_n),
SVI = sum(svi_pecentile), Asthma = mean(asthma_5yr_avg, na.rm = TRUE), TreeDensity = mean(trees_den), AsianPopulation = sum(pct_asian))
head(treeCom)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -87.80623 ymin: 41.72865 xmax: -87.62886 ymax: 41.97595
## Geodetic CRS: WGS 84
## # A tibble: 6 × 7
## community Trees SVI Asthma TreeDens…¹ Asian…² geometry
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <POLYGON [°]>
## 1 ALBANY PARK 16496 794. 143. 0.00326 177. ((-87.69782 41.9654, -87…
## 2 ARCHER HEIGHTS 7794 379. 168 0.00179 31.7 ((-87.72337 41.80055, -8…
## 3 ARMOUR SQUARE 4866 455. 102. 0.00192 287. ((-87.62978 41.83852, -8…
## 4 ASHBURN 30954 481. 240. 0.00252 6.6 ((-87.72144 41.73486, -8…
## 5 AUBURN GRESHAM 28200 1170 382. 0.00289 5.7 ((-87.63406 41.73594, -8…
## 6 AUSTIN 54572 2088. 376. 0.00307 11.8 ((-87.77536 41.902, -87.…
## # … with abbreviated variable names ¹TreeDensity, ²AsianPopulation
Pro Tip: If there are null or NA variables in the
census tracts, you won’t get a mean value in aggregate. To force an
average when some areas may have values (and others remain NA/excluded)
add na.rm = TRUE. For example:
mean(trees, na.rm=TRUE).
Generate three maps of three different variables at the community
area level using tmap. You may get bonus points (up to 5pts
total) for going above and beyond in working with thoughtful
tmap parameters! Review the in-class tutorial from our
first week for more ideas, and/or explore the online textbook we use for
more ideas.
### ADD YOUR CODE HERE ###
tm_shape(treeCom) + tm_fill(col = "AsianPopulation")
tm_shape(treeCom) + tm_fill(col = "Asthma")
tm_shape(treeCom) + tm_fill(col = "TreeDensity")
What interesting trends did you find? Discuss in 1-2 paragraphs.
ADD YOUR DESCRIPTION HERE Like the previous map of Asian population, This one shows the highest population density in the area of Chinatown. Again, this is to be expected. There is also a large population in West Ridge. It is also quite interesting that there are communities with little to no Asian Population, such as Garfield Ridge, where Midway Airport is located. In regards to the Asthma map, there is a large amount of asthma cases in the southern area, including communities such as South Deering, South Chicago and Hegewisch. I don’t know why this area specifically, but I suspect it may have to do with the next map. The last map I chose to create was one based on the tree density statistic. The most dense areas are Forest Glenn and Beverly. Interstingly, the Far Southeast area, where South Deering i located, has a pretty average tree density. Since Asthma can be triggered by pollen from trees, this may be part of the re3ason for high asthma cases.
Render your document by clicking on the “Knit” option in RStudio. Upload the .html file to your assignment submission!
.