Methods 1, Week 4

Download more data


Download the data in the additional data folder and save it in your part1/data/raw folder.

Outline

  • Research Journal

  • Readings Discussion

  • Homework questions and overview

  • Importing excel files

  • Paste function

  • In-class exercise

  • Homework

Research Journal

Readings Discussion


Who You Calling Hispanic

Homework Questions

  • What district has the highest student poverty rate?
  • What county is that district in?
  • What county has the highest poverty rate? (you should use the county poverty dataset created in script 1 to answer this question)
  • What school district has the lowest student poverty rate in Montgomery County?
  • What county has the largest range in student poverty rate?
  • What county has the smallest range in student poverty rate?

Importing excel files


Use the read_excel() function to import excel files:

We’ll use the Import Dataset user interface to learn about read_excel

  • ALWAYS copy the code to import your data into your script if you use this interface

Import Dataset

In the Files pane, right-click the data you want to read in:

Look at the data

Import Options

  • Use the Import Options to write the code to select the Sheet and Rows you want
  • Copy the code into your script

Import Options

  • Copy the code into your script
library(tidyverse)
library(readxl)

# import ny  asthma data
raw_asthma <- read_excel("data/raw/Asthma-SubCountyData.xlsx", 
                         sheet = "AD21", skip = 6)

Paste

Sometimes it is useful to concatenate strings with the paste() function:

paste0("Paste0", "has no spaces")
[1] "Paste0has no spaces"


paste("paste()", "adds a space", "between each item")
[1] "paste() adds a space between each item"


You can use this inside a mutate. For instance, if you want to add the word County after each county name in your asthma data.

asthma <- raw_asthma |> 
  mutate(County = paste(County, "County"))

In-class exercise - Join Practice

We’ll add 3 more datasets to our New York county poverty dataframe:

Open your part1 project and create a new script, called ny_county_dataset. If you haven’t already, download the data in the additional data folder and save it in your part1/data/raw folder.

Aggregate & Join

Read in your processed county poverty dataset

For each of the three datasets:

  • Read in the dataset, look at it
  • Use summarise()* and group_by() to aggregate the data to County-level
  • Find the column you can join by
  • Use left_join() to join each dataset to your New York poverty dataframe
  • Calculate rates using the county population:
    • asthma_hosp_per_10k_people = (asthma hospitalizations/population)* 10,000
    • atms_per_10k_people = (atm locations/population)* 10,000
    • lottery_per_10k_people = (lottery retailers/population)* 10,000

Example script with first join

# add atm, lottery and asthma data to ny county poverty dataframe

# load packages
library(tidyverse)

# import our county dataset with population and poverty
county_pov <- read_csv("data/processed/county_pov_rate_2019.csv") 

# import atm location data
raw_atms <- read_csv("data/raw/Bank-Owned_ATM_Locations_in_New_York_State.csv")

# aggregate to county-level and create a common join key
atms_by_county <- raw_atms |> 
  group_by(County) |> 
  summarise(atms = n()) |> 
  mutate(County = paste0(County, " County"))

# join atm data to county
county_pov_atms <- county_pov |> 
  left_join(atms_by_county, by = c("COUNTY" = "County")) |> 
  mutate(banks_per10000 = atms/county_pop*10000)

Create one county-level dataframe with:

County name, County ID, population, poverty rate, atms, atms per 10K people, lottery retailers, lottery retailers per 10k people, asthma hospitalizations, asthma hospitalizations per 10k people

Explore and Answer Questions


Explore your new dataframe. Follow your interest! Some things you could look at:

  • What is the average county asthma rate?
  • Where is the highest asthma rate?
  • How does the asthma rate compare to the poverty rate?

Upload your script for assignment 4a with the answer to these 3 questions (or other questions) at the bottom of your script

Next week we will use visualization to explore more.

Optional: Add more data

If you finish exploring those datasets in class, add more data and keep exploring health in NY counties.

Go to the County Health Rankings and Roadmaps website to download data for New York.

  • Download the 2019 New York Data excel sheet. (You want to get the 2019 data since all of your other data is for 2019)
    • There is a LOT of data in here.
  • Add the Additional Measure Data tab using the Import Dataset
    • Use the Import Options to select the Sheet, and skip the first row
  • Pick some of the variables and join them to your existing dataset
  • Do similar explorations as you did with the asthma and lottery retailer data.

Homework


  • Submit the script for your in-class assignment
  • Read Chapter 6, Never a Real Democracy from The Sum of Us. by Heather McGhee.
  • R: Explore apportionment and race data to check one of McGhee’s claims in the chapter of Sum of Us