Assignment #1

Overview

In this assignment, we’ll explore a dataset of environmental features in the City of Chicago. Along the way, you’ll hone in on key themes learned from the first three weeks of our class. You’ll specifically review:

Data Inspection
Basic Mapping in R
Attribute Queries
Aggregating Data

You will be on your way to mastering the basics of sf, tmap, and basics of the tidyverse. It’s helpful if you’ve read through Chapter 3 on Attribute Data Operations in our course textbook, Geocomputation in R.

This assignment uses the Chives dataset from chichives.com. ChiVes is a collaborative project that integrates and visualizes data; a handful of key metrics―tree cover, air pollution estimates, heat island effects, traffic volumes, and social vulnerability index― help to reveal where in the city people face particular challenges as we work towards a healthier Chicago. You will need to review the Data page to understand the variable names and definitions to complete this assignment.

Environment Setup

Download and open the course files (including the .geojson file and this RMarkdown file) on your system, or RStudio Cloud environment. Open the RmD file in RStudio. If you haven’t already, set your working directory.

setwd("~/RPractice/Assignment 1")

Next, load up all the libraries needed for this assignment.

library(sf)
library(spData)
library(tidyverse)
library(tmap)

Now you’re ready to load in the data. Make sure you’ve loaded the geojson in the same file directory as your RmD document.

Use the sf library function st_read to read in the geojson file. This spatial dataset was downloaded directly from Chives.

chives <- st_read("chives-data.geojson")

## Reading layer `chives-base' from data source 
##   `C:\Users\luket\OneDrive\Documents\RPractice\Assignment 1\chives-data.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 801 features and 41 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.94025 ymin: 41.64429 xmax: -87.52416 ymax: 42.02392
## Geodetic CRS:  WGS 84

Explore Data

Over the past few weeks, we’ve been reading about, coding through examples of, and practicing attribute data wrangling. First we need to inspect and examine the data carefully, looking for interesting trends.

Use the head() function to inspect the data. (Uncomment on your own machine.) There are many other functions to use, if you prefer, like glimpse.

head(chives)

## Simple feature collection with 6 features and 41 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.70465 ymin: 41.7943 xmax: -87.63535 ymax: 41.85934
## Geodetic CRS:  WGS 84
##         geoid simpson specCt uniqSp     ndvi acs_population svi_pecentile
## 1 17031291600   0.667      3      3 0.192543            976          63.9
## 2 17031580200   0.500      2      2 0.226763           3784          80.4
## 3 17031590500   0.973     78     52 0.405935           1842          80.7
## 4 17031590700   0.667      3      3 0.188116           3393          81.1
## 5 17031610400      NA     NA     NA 0.210973           1981          93.2
## 6 17031843800   0.667      3      3 0.247428           1482          82.6
##   trees_n trees_area trees_den trees_crown_den zip_code asthma_5yr_avg
## 1     453      25969  0.002056            11.8    60608          158.5
## 2    1588      93486  0.002176            12.8    60632          168.0
## 3    2093     158883  0.003658            27.8    60609          195.5
## 4    1330      68804  0.001529             7.9    60609          195.5
## 5     339      15797  0.002077             9.7    60609          195.5
## 6    2582     163319  0.001972            12.5    60609          195.5
##   asthma_age_adj_rate urban_flood_suscep heatisl nn_q3_pm2_5 logtraf
## 1                82.5                9.5    70.9        10.3     4.2
## 2                55.1                9.5    83.9        10.8     5.9
## 3                93.0                9.4    82.8        10.6     5.3
## 4                93.0                9.2    84.5        10.8     5.3
## 5                93.0                9.2    90.9        10.5     5.0
## 6                93.0                9.3    74.2        10.5     4.5
##   city_property_tot hardship pct_white pct_black pct_nativeam pct_asian
## 1                 0     21.3      66.6       9.3          0.0       1.4
## 2                 1     46.6      43.9       0.3          0.7      12.3
## 3                11     27.7      36.7       2.3          1.4      27.3
## 4                 6     37.4      41.4       2.6          0.2      20.2
## 5                 0     40.0      52.6       2.9          2.5       0.0
## 6               167     22.7      25.8      66.4          0.0       7.3
##   pct_other prop_comm prop_plndev prop_resdntl CLRRt AdAsthRt
## 1         0         0           0            0  25.7       10
## 2         0         1           0            0  26.2     11.5
## 3         0        11           0            0  21.1       NA
## 4         0         0           6            0  21.1       NA
## 5         0         0           0            0  36.4     10.5
## 6         0        41           0          126  36.4     10.5
##             PhysRate LngCncRt CancerRt              HypRt logtrafScld   aland
## 1 25.199999999999999 91.37164 616.1548 29.699999999999999    61.76471  220340
## 2 35.799999999999997 44.32203 366.8973 15.800000000000001    86.76471  729817
## 3                 24 46.11499 439.7247 32.700000000000003    77.94118  572106
## 4                 24 46.11499 439.7247 32.700000000000003    77.94118  869830
## 5                 32 67.22063 474.5761 23.100000000000001    73.52941  163180
## 6                 32 67.22063 474.5761 23.100000000000001    66.17647 1309516
##   awater    intptlat     intptlon commarea_n      community
## 1      0 +41.8576415 -087.6922159         29 NORTH LAWNDALE
## 2      0 +41.8267718 -087.6993478         58  BRIGHTON PARK
## 3  24222 +41.8248978 -087.6804596         59  MCKINLEY PARK
## 4  12614 +41.8260539 -087.6627888         59  MCKINLEY PARK
## 5      0 +41.8103984 -087.6674491         61       NEW CITY
## 6      0 +41.8016566 -087.6404756         61       NEW CITY
##                         geometry
## 1 MULTIPOLYGON (((-87.68817 4...
## 2 MULTIPOLYGON (((-87.69463 4...
## 3 MULTIPOLYGON (((-87.67509 4...
## 4 MULTIPOLYGON (((-87.65598 4...
## 5 MULTIPOLYGON (((-87.66506 4...
## 6 MULTIPOLYGON (((-87.63536 4...

How many census tracts do we have in this dataset? Use the dim() function to check.

dim(chives)

## [1] 801  42

There are 801 rows, or census tracts, and 42 columns, or field attributes in this dataset.

Let’s quickly map an interesting variable in our dataset using tmap. We’ll explore the tree_crown_den attribute, which shows us the proportion of a census tract covered by tree crowns, as measured from LiDar data in 2010. Review the Chives website to learn more about this variable.

In this map we’ll use quantile breaks to classify the variable of interest; a red-yellow-green diverging color palette; and will also rename the variable in the legend for easier viewing.

tm_shape(chives) + tm_fill("trees_crown_den", 
                           style = "quantile", 
                           palette = "RdYlGn",
                           title = "Tree Coverage %")

Let’s explore data breaks further. Here’s one way of selecting all census tracts that have less than 10% of its area covered by trees:

fewTrees <- chives %>% 
  filter(trees_crown_den < 10)

#head(fewTrees)

82 features were selected, meaning 82 out of all census tracts in the city have less than 10% of their areas covered by trees.

Question 1

What’s another way to do this (ie. query your data)? Use a different selection option based on your readings and practice. The end output also identify tracts with less than 10% areas covered by trees.

  ### ADD YOUR CODE HERE ###

fewTrees <- chives %>%
  subset(trees_crown_den < 10)

Let’s quickly map these tracts to see where they show up. We place our original dataset, chives, as the first layer. We don’t pass any parameters to the tm_fill() function to force a default gray background. Then we add the new filtered dataset, fewTrees, coloring it brown.

tm_shape(chives) + tm_fill() + 
  tm_shape(fewTrees) + tm_fill(col = "brown")

The subset of data that we have includes all of the original data. Let’s map social vulnerability measures for the fewTrees dataset. Higher SVI indicates higher vulnerability.

tm_shape(chives) + tm_fill() + 
  tm_shape(fewTrees) + tm_fill("svi_pecentile")

Let’s say we want to identify some key characteristics of areas without too many trees in the city. Using our same filter above (or whatever version you’d like to use), we can add an additional option. In the example below, we select all areas with <10% tree crown density that have high social vulnerability (ie. the social vulnerability index percentile is above 60%). Then, we map it.

fewTreesSVI <- chives %>% 
  filter(trees_crown_den < 10 & svi_pecentile > 60)

tm_shape(chives) + tm_fill() + 
  tm_shape(fewTreesSVI) + tm_fill(col = "brown")

Question 2

We can see that several tracts could be found in areas with fewer trees, and higher social vulnerability. Explore another paired query, choosing a new variable (instead of social vulnerability). You can explore the data on your own, read the metadata further on the original data website, etc. to identify a reasonable cutoff.

Generate and run the query (with 2 criteria).

  ### ADD YOUR CODE HERE ###  


AsianHardship <- chives %>% 
  filter(hardship > 20 & pct_asian > 50)

tm_shape(chives) + tm_fill() + 
  tm_shape(AsianHardship) + tm_fill(col = "brown")

Question 3

Explain why you selected this query, and briefly describe any interesting patterns you discovered.

ADD YOUR DESCRIPTION HERE

As an Asian-American, hardships of any kind in Asian dominated communities are of utmost concern to me. By looking at the cross section of high population density (>50), and high hardship rates (> 20), I was able to identify areas where there may be issues of all kind. An interesting pattern is the location of these places. All 6 of the outlined regions are in the same spot of Chicago: Chinatown. I knew this was Chinatown because the data listed the communities as bBridgeport and Armor Square, which is where Chinatown is located. Although Chinatown is only a part of this region, it is a large proportion of it. Chinatown is a very important place to me, being a place I visited frequently as a kid. This correlation between hardship and Asian population being right here makes sense, but is still eye-opening.

Let’s take data wrangling a bit further and start to aggregate our dataset. We will use group_by and summarise functions from the tidyverse library we’re using. (Technically, they’re from the dplyr package within the tidyverse.)

Group_by tells the code which variable we should aggregate on. For example, let’s say we want to know how many trees exist in each community area in Chicago. We’ll (1) group_by the “community” field and (2) summarise using the total sum of trees. We’ll assign the total number of trees to equal a new variable, called “Trees”. We may also want to know how many tracts are included in each community, so are interested the total n of tracts. The code looks like this:

treeCom <- chives %>% 
  group_by(community) %>%
  summarize(Trees = sum(trees_n),
            TotTracts = n())

head(treeCom)

## Simple feature collection with 6 features and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.80623 ymin: 41.72865 xmax: -87.62886 ymax: 41.97595
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 4
##   community      Trees TotTracts                                        geometry
##   <chr>          <int>     <int>                                   <POLYGON [°]>
## 1 ALBANY PARK    16496        11 ((-87.69782 41.9654, -87.69871 41.9662, -87.69…
## 2 ARCHER HEIGHTS  7794         5 ((-87.72337 41.80055, -87.72315 41.80055, -87.…
## 3 ARMOUR SQUARE   4866         5 ((-87.62978 41.83852, -87.62979 41.83926, -87.…
## 4 ASHBURN        30954         8 ((-87.72144 41.73486, -87.72142 41.73486, -87.…
## 5 AUBURN GRESHAM 28200        15 ((-87.63406 41.73594, -87.63406 41.73597, -87.…
## 6 AUSTIN         54572        24 ((-87.77536 41.902, -87.77535 41.90163, -87.77…

And we can map it!

tm_shape(treeCom) + tm_fill(col = "Trees")

Note that this dataset no longer shows tracts, since we’re actually working with communities. You can check the dimension of this data frame to be sure:

dim(treeCom)

## [1] 77  4

Question 4

Take this aggregation further. Add more to the code chunk above to generate a community area dataset that includes at least five variables of interest. Note that some variables you’ll need to sum (like total number of trees), whereas others will need to be averaged. You should use the variables included in the original Chives dataset.

Your code should also output a table view of the data.

  ### ADD YOUR CODE HERE ###  
treeCom <- chives %>% 
  group_by(community) %>%
  summarize(Trees = sum(trees_n),
            SVI = sum(svi_pecentile), Asthma = mean(asthma_5yr_avg, na.rm = TRUE), TreeDensity = mean(trees_den), AsianPopulation = sum(pct_asian)) 

head(treeCom)

## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.80623 ymin: 41.72865 xmax: -87.62886 ymax: 41.97595
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 7
##   community      Trees   SVI Asthma TreeDens…¹ Asian…²                  geometry
##   <chr>          <int> <dbl>  <dbl>      <dbl>   <dbl>             <POLYGON [°]>
## 1 ALBANY PARK    16496  794.   143.    0.00326   177.  ((-87.69782 41.9654, -87…
## 2 ARCHER HEIGHTS  7794  379.   168     0.00179    31.7 ((-87.72337 41.80055, -8…
## 3 ARMOUR SQUARE   4866  455.   102.    0.00192   287.  ((-87.62978 41.83852, -8…
## 4 ASHBURN        30954  481.   240.    0.00252     6.6 ((-87.72144 41.73486, -8…
## 5 AUBURN GRESHAM 28200 1170    382.    0.00289     5.7 ((-87.63406 41.73594, -8…
## 6 AUSTIN         54572 2088.   376.    0.00307    11.8 ((-87.77536 41.902, -87.…
## # … with abbreviated variable names ¹TreeDensity, ²AsianPopulation

Pro Tip: If there are null or NA variables in the census tracts, you won’t get a mean value in aggregate. To force an average when some areas may have values (and others remain NA/excluded) add na.rm = TRUE. For example: mean(trees, na.rm=TRUE).

Question 5

Generate three maps of three different variables at the community area level using tmap. You may get bonus points (up to 5pts total) for going above and beyond in working with thoughtful tmap parameters! Review the in-class tutorial from our first week for more ideas, and/or explore the online textbook we use for more ideas.

  ### ADD YOUR CODE HERE ###  
tm_shape(treeCom) + tm_fill(col = "AsianPopulation")

tm_shape(treeCom) + tm_fill(col = "Asthma")

tm_shape(treeCom) + tm_fill(col = "TreeDensity")

Question 6

What interesting trends did you find? Discuss in 1-2 paragraphs.

ADD YOUR DESCRIPTION HERE Like the previous map of Asian population, This one shows the highest population density in the area of Chinatown. Again, this is to be expected. There is also a large population in West Ridge. It is also quite interesting that there are communities with little to no Asian Population, such as Garfield Ridge, where Midway Airport is located. In regards to the Asthma map, there is a large amount of asthma cases in the southern area, including communities such as South Deering, South Chicago and Hegewisch. I don’t know why this area specifically, but I suspect it may have to do with the next map. The last map I chose to create was one based on the tree density statistic. The most dense areas are Forest Glenn and Beverly. Interstingly, the Far Southeast area, where South Deering i located, has a pretty average tree density. Since Asthma can be triggered by pollen from trees, this may be part of the re3ason for high asthma cases.

Render your document by clicking on the “Knit” option in RStudio. Upload the .html file to your assignment submission!