Methods 1, Week 4

Download more data

Download the data in the additional data folder and save it in your part1/data/raw folder.

Outline

Research Journal
Readings Discussion
Homework questions and overview
Importing excel files
Paste function
In-class exercise
Homework

Research Journal

Readings Discussion

Who You Calling Hispanic

Homework Questions

What district has the highest student poverty rate?
What county is that district in?
What county has the highest poverty rate? (you should use the county poverty dataset created in script 1 to answer this question)
What school district has the lowest student poverty rate in Montgomery County?
What county has the largest range in student poverty rate?
What county has the smallest range in student poverty rate?

Importing excel files

Use the read_excel() function to import excel files:

We’ll use the Import Dataset user interface to learn about read_excel

ALWAYS copy the code to import your data into your script if you use this interface

Import Dataset

In the Files pane, right-click the data you want to read in:

Look at the data

Import Options

Use the Import Options to write the code to select the Sheet and Rows you want
Copy the code into your script

Import Options

Copy the code into your script

library(tidyverse)
library(readxl)

# import ny  asthma data
raw_asthma <- read_excel("data/raw/Asthma-SubCountyData.xlsx", 
                         sheet = "AD21", skip = 6)

Paste

Sometimes it is useful to concatenate strings with the paste() function:

paste0("Paste0", "has no spaces")

[1] "Paste0has no spaces"

paste("paste()", "adds a space", "between each item")

[1] "paste() adds a space between each item"

You can use this inside a mutate. For instance, if you want to add the word County after each county name in your asthma data.

asthma <- raw_asthma |> 
  mutate(County = paste(County, "County"))

In-class exercise - Join Practice

We’ll add 3 more datasets to our New York county poverty dataframe:

Bank-Owned ATMs by Zip Code
Lottery Retailers
Asthma Hospitalizations
- The Numerator column is the number of Asthma Hospitalizations. Notice that Numerator = “s” in some zip codes. You will need to use the filter() function to remove rows where Numerator = “s” before you add up the asthma hospitalizations for each county

Open your part1 project and create a new script, called ny_county_dataset. If you haven’t already, download the data in the additional data folder and save it in your part1/data/raw folder.

Aggregate & Join

Read in your processed county poverty dataset

For each of the three datasets:

Read in the dataset, look at it
Use summarise()* and group_by() to aggregate the data to County-level
Find the column you can join by
Use left_join() to join each dataset to your New York poverty dataframe
Calculate rates using the county population:
- asthma_hosp_per_10k_people = (asthma hospitalizations/population)* 10,000
- atms_per_10k_people = (atm locations/population)* 10,000
- lottery_per_10k_people = (lottery retailers/population)* 10,000

Example script with first join

# add atm, lottery and asthma data to ny county poverty dataframe

# load packages
library(tidyverse)

# import our county dataset with population and poverty
county_pov <- read_csv("data/processed/county_pov_rate_2019.csv") 

# import atm location data
raw_atms <- read_csv("data/raw/Bank-Owned_ATM_Locations_in_New_York_State.csv")

# aggregate to county-level and create a common join key
atms_by_county <- raw_atms |> 
  group_by(County) |> 
  summarise(atms = n()) |> 
  mutate(County = paste0(County, " County"))

# join atm data to county
county_pov_atms <- county_pov |> 
  left_join(atms_by_county, by = c("COUNTY" = "County")) |> 
  mutate(banks_per10000 = atms/county_pop*10000)

Create one county-level dataframe with:

County name, County ID, population, poverty rate, atms, atms per 10K people, lottery retailers, lottery retailers per 10k people, asthma hospitalizations, asthma hospitalizations per 10k people

Explore and Answer Questions

Explore your new dataframe. Follow your interest! Some things you could look at:

What is the average county asthma rate?
Where is the highest asthma rate?
How does the asthma rate compare to the poverty rate?

Upload your script for assignment 4a with the answer to these 3 questions (or other questions) at the bottom of your script

Next week we will use visualization to explore more.

Optional: Add more data

If you finish exploring those datasets in class, add more data and keep exploring health in NY counties.

Go to the County Health Rankings and Roadmaps website to download data for New York.

Download the 2019 New York Data excel sheet. (You want to get the 2019 data since all of your other data is for 2019)
- There is a LOT of data in here.
Add the Additional Measure Data tab using the Import Dataset
- Use the Import Options to select the Sheet, and skip the first row
Pick some of the variables and join them to your existing dataset
Do similar explorations as you did with the asthma and lottery retailer data.

Homework

Submit the script for your in-class assignment
Read Chapter 6, Never a Real Democracy from The Sum of Us. by Heather McGhee.
R: Explore apportionment and race data to check one of McGhee’s claims in the chapter of Sum of Us

Methods 1, Week 4

Download more data

Outline

Research Journal

Readings Discussion

Homework questions and overview

Importing excel files

Paste function

In-class exercise

Homework

Research Journal

Readings Discussion

Homework Questions

Importing excel files

Import Dataset

Look at the data

Import Options

Import Options

Paste

In-class exercise - Join Practice

Aggregate & Join

Example script with first join

Explore and Answer Questions

Optional: Add more data

Homework