Before loading anything into RStudio, make sure your shapefile is in the folder you will use throughout the lab. I named mine “Lab2” for simplicity. You will also need the following packages:
-tidyverse
-sf
-ggspatial
Use the following code to load the packages into your project.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggspatial)
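If one of the library() calls above stops with a "there is no package called ..." error, that package has not been installed yet. A minimal sketch for checking this up front; the variable names pkgs, installed, and missing_pkgs are my own and not part of the lab:

```r
# Packages this lab relies on (from the list above)
pkgs <- c("tidyverse", "sf", "ggspatial")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```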
Using the sf package, the following code can be used to load your shapefile into the project.
knox <- st_read('knox_acs.shp')
## Reading layer `knox_acs' from data source
## `/Users/rileyspeas/Desktop/Lab2/knox_acs.shp' using driver `ESRI Shapefile'
## Simple feature collection with 112 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -84.27347 ymin: 35.79381 xmax: -83.65092 ymax: 36.18648
## Geodetic CRS: NAD83
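If st_read() cannot open the file, a common cause is that only the .shp file was copied into the folder. A shapefile is really a set of files sharing one base name: the .shx and .dbf files must sit next to the .shp, and the .prj file carries the coordinate system. A small check, sketched here with the nc.shp sample that ships with sf as a stand-in for knox_acs.shp (the variable names are my own):

```r
# Stand-in path: the North Carolina sample shapefile bundled with sf.
# For the lab, this would be "knox_acs.shp" in your working directory.
shp <- system.file("shape/nc.shp", package = "sf")

# Build the names of the companion files from the shared base name
base_name <- tools::file_path_sans_ext(shp)
parts <- paste0(base_name, c(".shp", ".shx", ".dbf", ".prj"))

# TRUE for every companion file that is present
present <- file.exists(parts)
names(present) <- basename(parts)
print(present)
```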
This may sound more daunting than it truly is. First, let’s look at the columns in our data. We can calculate this fraction by dividing the number of 18-21 year olds, which is already provided to us, by the total population of each census tract. Writing code for this is much faster than calculating it by hand for every tract.
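One way to look at the columns is with names() and head(). The sketch below uses the sample shapefile bundled with sf as a stand-in, since it runs without the lab data; for the lab you would call the same functions on knox instead of nc.

```r
library(sf)

# Stand-in data: the sample shapefile bundled with sf (not the lab's data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Column names, including the geometry column that sf adds
print(names(nc))

# First rows of the attribute table only, with the geometry dropped
attrs <- st_drop_geometry(nc)
print(head(attrs, 3))
```

For knox, the same call should list the six fields reported by st_read() plus the geometry column.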
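Before writing the code for this task, we will copy the two columns we need into standalone vectors with easier names. Note that this does not rename the columns inside knox; it only creates new objects.
To create Age18_21, use the following.
Age18_21 <- knox$AGE18_21
To create Total_Pop, use the following.
Total_Pop <- knox$TOTAL_POP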
To add the new column, we will use the mutate() function from the dplyr package. Luckily, dplyr is included with the tidyverse, so we do not need to install anything additional.
To create this new column, use the following code.
knox <- mutate(knox, fraction=round((Age18_21/Total_Pop)*100, 1))
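mutate() can also refer to the columns of the data frame directly, without copying them into separate vectors first. A minimal sketch on made-up numbers, assuming the column names AGE18_21 and TOTAL_POP from the shapefile; the toy data frame is hypothetical:

```r
library(dplyr)

# Toy stand-in for three census tracts (made-up numbers, not the Knox data)
toy <- tibble(
  AGE18_21  = c(50, 200, 0),
  TOTAL_POP = c(1000, 800, 500)
)

# mutate() finds AGE18_21 and TOTAL_POP inside the data frame itself
toy <- mutate(toy, fraction = round((AGE18_21 / TOTAL_POP) * 100, 1))
print(toy$fraction)
```

Because the result is multiplied by 100, the values are percentages. A tract with a total population of zero would produce NaN, so it is worth checking for those before mapping.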
Let’s give those values a more specific name: College_Frac. As before, this copies the column into a standalone vector; the column inside knox is still called fraction.
College_Frac <- knox$fraction
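If you would rather change the name of the column inside the data frame itself, dplyr's rename() does that. A short sketch on a hypothetical toy data frame:

```r
library(dplyr)

# Made-up values standing in for the fraction column
toy <- tibble(fraction = c(5, 25, 0))

# rename(data, new_name = old_name) changes the column name in the data frame
toy <- rename(toy, College_Frac = fraction)
print(names(toy))
```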
Now that we have this data (and it is neatly rounded), let’s make a map out of it. Specifically, we want a choropleth map, since it uses color to show how the concentration varies across census tracts. Luckily, the ggplot2 package makes this easy!
Use the following code to create this map.
ggplot(knox) +
annotation_map_tile() +
geom_sf(aes(fill=College_Frac), alpha= 0.8) +
scale_fill_distiller(palette = 'Blues') +
labs(fill = "Proportion of 18-21 Year Olds per Tract\n(%)") +
coord_sf(datum= NA)
## Loading required namespace: raster
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
## Zoom: 9
#annotation_map_tile() adds a tiled basemap behind the data and picks the zoom level automatically (the output above reports Zoom: 9). geom_sf() draws the tracts, and aes(fill = ...) determines which data fills them; alpha sets how transparent the fill is, and I chose 0.8 so the tracts stay clearly visible. scale_fill_distiller() sets the color palette. coord_sf(datum = NA) removes the latitude and longitude grid lines and axis labels from the map.
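One detail worth checking in the legend: as far as I know, scale_fill_distiller() defaults to direction = -1, which reverses the palette so that with 'Blues' the lowest values come out darkest. If you want darker blue to mean a higher percentage, direction = 1 is one option. A sketch without the basemap, using the sample shapefile bundled with sf and its BIR74 column as stand-ins for knox and College_Frac:

```r
library(sf)
library(ggplot2)

# Stand-in data: sample shapefile bundled with sf (not the lab's data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# direction = 1 keeps the palette order, so larger values are darker blue
p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), alpha = 0.8) +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(fill = "Births, 1974") +
  coord_sf(datum = NA)

print(p)
```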
Creating a histogram takes only one function call, although the arguments can be a little trickier. First, we copy the median household income column into a vector with a more readable name.
Use the following code to create that vector.
median_household_inc <- knox$MEDHHINC
With that vector in hand, creating the histogram is much easier.
Use the following code to create this histogram.
hist(median_household_inc, main = "Median Household Income Across Census Tracts", xlab = "Median Household Income ($)", xlim = c(9000, 170000), ylab = "Density", col = "darkmagenta", freq = FALSE)
This distribution is skewed right. Most of the data is concentrated toward the left side of the histogram, implying that most census tracts have a median household income below $50,000.
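Two things in this step can be checked numerically. With freq = FALSE, the bar heights are densities rather than counts, so the bar areas add up to 1. And a mean well above the median is a quick sign of right skew. A sketch on made-up incomes (the income vector is hypothetical, not the Knox data):

```r
# Made-up incomes for eleven tracts (not the Knox data)
income <- c(22000, 28000, 31000, 35000, 38000, 42000,
            45000, 52000, 68000, 95000, 160000)

# plot = FALSE returns the bins without drawing anything
h <- hist(income, plot = FALSE)

# With freq = FALSE the bar heights are densities: height * width sums to 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# A mean well above the median is a quick sign of right skew
print(c(mean = mean(income), median = median(income)))
```

For the lab data, the same comparison would be mean(median_household_inc, na.rm = TRUE) against median(median_household_inc, na.rm = TRUE).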
### To create a scatterplot with Median Household Income and Median Home Value
Before creating the scatterplot, we need to define our x and y variables.
Use the following code to define the x and y variables.
x <- median_household_inc
y <- knox$MEDHVALUE
Now we can use the proper code to create our scatterplot.
Use the following code to create the scatterplot.
plot(x, y, main = "Median Household Income vs Median Home Value", xlab = "Median Household Income", ylab = "Median Home Value", ylim=c(65000,475300), pch= 19, frame= FALSE)
#Set the range of the y-axis with the ylim argument. To draw the points as solid circles, set pch to 19.
The points appear to follow a roughly exponential curve: as median household income increases, so does median home value.
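One hedged way to test whether a relationship is roughly exponential is to look at log(y) against x: an exponential curve becomes a straight line. The sketch below uses made-up, exactly exponential data (toy_x and toy_y are hypothetical names) so the idea is visible; on the lab data you would substitute x and y and expect a noisier result.

```r
# Made-up, exactly exponential data (not the Knox data)
toy_x <- seq(20000, 160000, by = 20000)
toy_y <- 80000 * exp(1e-5 * toy_x)

# If y grows exponentially in x, log(y) is a straight line in x
fit <- lm(log(toy_y) ~ toy_x)
slope <- coef(fit)[["toy_x"]]
print(slope)

# Visual check: the points should fall on a line on a log-scaled y-axis
plot(toy_x, toy_y, log = "y", pch = 19, frame = FALSE,
     xlab = "x", ylab = "y (log scale)")
```

If the points still curve on the log-scaled axis, a linear or other form may describe the data better than an exponential one.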
### To calculate correlation coefficients
We want to calculate three pairwise correlation coefficients among median household income, median home value, and the percent of 18-21 year olds.
Before beginning, copy the columns into vectors with more accessible names.
medianinc <- knox$MEDHHINC
college_frac <- knox$fraction
medianhome <- knox$MEDHVALUE
Now we can begin calculating the correlation coefficients.
Use the following code.
corr_coef1 <- cor(medianinc, medianhome, use = 'complete.obs')
corr_coef2 <- cor(medianhome, college_frac, use = 'complete.obs')
corr_coef3 <- cor(college_frac, medianinc, use= 'complete.obs')
The correlation coefficient between median income and median home value is 0.8389087. The correlation coefficient between median home value and the fraction of 18-21 year olds is -0.1974105. The correlation coefficient between the fraction of 18-21 year olds and median income is -0.3101236.
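The use = 'complete.obs' argument tells cor() to drop any pair in which either value is missing before computing the coefficient. A small sketch on made-up numbers (all names and values here are hypothetical) showing that this matches removing the incomplete pairs by hand, plus a way to get all pairwise coefficients in one rounded table:

```r
# Made-up values with one missing income (not the Knox data)
inc  <- c(30, 45, 60, NA, 90)
home <- c(100, 150, 210, 260, 300)

# complete.obs drops the fourth pair because inc is NA there
r_complete <- cor(inc, home, use = "complete.obs")

# Same result as removing that pair by hand
keep <- !is.na(inc) & !is.na(home)
r_manual <- cor(inc[keep], home[keep])

print(c(r_complete, r_manual))

# All pairwise correlations at once, rounded for reporting
toy <- data.frame(inc = inc, home = home)
print(round(cor(toy, use = "pairwise.complete.obs"), 2))
```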