Methods 1, Week 4

Outline

  • Research Journal

  • Readings Discussion

  • Homework questions and overview

  • Group and summarise

  • In-class exercise

  • Importing Excel files

  • Homework

Research Journal

Homework Questions

  • What district has the highest student poverty rate?
  • What county is that district in?
  • What county has the highest poverty rate? (you should use the county poverty dataset to answer this question)
  • What school district has the lowest student poverty rate in Montgomery County?
  • What is the average child poverty rate in a school district?
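Each of these questions can be answered with a few dplyr verbs. Below is a minimal sketch on made-up data. It assumes a district-level table with the slides' COUNTY and stpovrate columns plus a hypothetical `district` name column, because the real dataset's name column may be called something else.

```r
library(dplyr)

# Made-up district rows; COUNTY and stpovrate mirror the slides' columns,
# `district` is a hypothetical name column
districts <- tibble(
  district  = c("District A", "District B", "District C", "District D"),
  COUNTY    = c("Albany County", "Montgomery County", "Montgomery County", "Erie County"),
  stpovrate = c(0.12, 0.27, 0.15, 0.38)
)

# Highest student poverty rate (the row also shows which county it is in)
highest <- districts |>
  slice_max(stpovrate, n = 1)

# Lowest rate within one county: filter to the county first, then take the minimum
lowest_montgomery <- districts |>
  filter(COUNTY == "Montgomery County") |>
  slice_min(stpovrate, n = 1)

# Average (unweighted) district poverty rate
avg_rate <- districts |>
  summarise(mean_rate = mean(stpovrate))

print(highest)
print(lowest_montgomery)
print(avg_rate)
```

The county-level question works the same way: apply slice_max() to the poverty-rate column of the county poverty dataset.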

summarise() New York poverty


# calculate child poverty statistics for New York

ny_pov_stats <- sd_county_pov |>
  summarise(districts = n(),  # counts the number of school districts in the state
            kids = sum(stpop), # number of kids in the state
            kids_in_pov = sum(stpov), # number of kids living in poverty in the state
            child_poverty_rate = round(kids_in_pov/kids, 3), # the child poverty rate in the state
            mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts
            max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district
            min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district
            poverty_range = max_sd_child_povrate - min_sd_child_povrate) # the difference between the highest and lowest school district poverty rates


districts kids kids_in_pov child_poverty_rate mean_sd_child_povrate max_sd_child_povrate min_sd_child_povrate poverty_range
681 2901286 485661 0.167 0.131 0.501 0 0.501
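child_poverty_rate (0.167) and mean_sd_child_povrate (0.131) measure different things. The first pools every child in the dataset, so large districts count more. The second gives every district equal weight. The small sketch below uses three made-up districts to show how one large, high-poverty district pulls the pooled rate above the average of the district rates.

```r
library(dplyr)

# Made-up example: one large, high-poverty district and two small, low-poverty ones
toy_districts <- tibble(
  stpop = c(10000, 500, 500),  # kids in each district
  stpov = c(3000, 25, 25)      # kids in poverty in each district
) |>
  mutate(stpovrate = stpov / stpop)  # district rates: 0.30, 0.05, 0.05

toy_stats <- toy_districts |>
  summarise(
    child_poverty_rate    = sum(stpov) / sum(stpop),  # pooled rate, weighted by district size
    mean_sd_child_povrate = mean(stpovrate)           # unweighted mean of district rates
  )

print(toy_stats)
```

The pooled rate is 3050 / 11000 (about 0.277), while the mean of the district rates is 0.4 / 3 (about 0.133).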

group_by() & summarise()

  • We can also calculate the same statistics for each county by using:
    • group_by() and summarise() together
    • group_by() splits the rows by the values of another variable (here COUNTY), so summarise() returns one row per group instead of one row for the whole dataset
ny_county_pov_stats <- sd_county_pov |>
  group_by(COUNTY) |> 
  summarise(districts = n(), # number of school districts in the county
            kids = sum(stpop), # number of kids in the county
            kids_in_pov = sum(stpov), # number of kids living in poverty in the county
            child_poverty_rate = round(kids_in_pov/kids, 3), # the child poverty rate in the county
            mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts in the county
            max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district in the county
            min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district in the county
            poverty_range = max_sd_child_povrate - min_sd_child_povrate) # the difference between the highest and lowest school district poverty rates

write_csv(ny_county_pov_stats, "data/output/ny_county_poverty_stats.csv")

Summarize poverty in New York counties

COUNTY districts kids kids_in_pov child_poverty_rate mean_sd_child_povrate max_sd_child_povrate min_sd_child_povrate poverty_range
Albany County 12 40249 5403 0.134 0.129 0.266 0.041 0.225
Allegany County 12 6796 1633 0.240 0.241 0.315 0.185 0.130
Broome County 12 27673 5916 0.214 0.182 0.370 0.079 0.291
Cattaraugus County 12 13775 2594 0.188 0.183 0.239 0.126 0.113
Cayuga County 7 9694 1728 0.178 0.152 0.225 0.090 0.135
Chautauqua County 18 19226 4335 0.225 0.190 0.333 0.088 0.245
Chemung County 3 12057 2267 0.188 0.173 0.248 0.088 0.160
Chenango County 8 7478 1225 0.164 0.166 0.197 0.125 0.072
Clinton County 8 11080 1647 0.149 0.144 0.211 0.097 0.114
Columbia County 6 7212 894 0.124 0.118 0.185 0.079 0.106
Cortland County 5 6632 1179 0.178 0.189 0.242 0.114 0.128
Delaware County 13 5774 1012 0.175 0.165 0.211 0.106 0.105
Dutchess County 13 41543 3759 0.090 0.098 0.239 0.055 0.184
Erie County 28 134940 24775 0.184 0.124 0.381 0.041 0.340
Essex County 10 3533 578 0.164 0.160 0.246 0.054 0.192
Franklin County 7 7803 1733 0.222 0.218 0.316 0.130 0.186
Fulton County 6 7651 1643 0.215 0.185 0.309 0.131 0.178
Genesee County 8 8609 972 0.113 0.101 0.165 0.072 0.093
Greene County 6 5922 959 0.162 0.177 0.315 0.106 0.209
Hamilton County 7 413 51 0.123 0.120 0.222 0.000 0.222
Herkimer County 10 9636 1673 0.174 0.168 0.237 0.078 0.159
Jefferson County 11 17784 3598 0.202 0.191 0.293 0.127 0.166
Lewis County 5 4254 731 0.172 0.167 0.181 0.128 0.053
Livingston County 8 7917 1168 0.148 0.154 0.256 0.091 0.165
Madison County 10 10139 1190 0.117 0.125 0.214 0.066 0.148
Monroe County 18 113152 19831 0.175 0.113 0.379 0.034 0.345
Montgomery County 5 8413 2014 0.239 0.230 0.268 0.151 0.117
Nassau County 56 216350 12320 0.057 0.054 0.193 0.015 0.178
New York County 1 1193045 259012 0.217 0.217 0.217 0.217 0.000
Niagara County 10 30500 4912 0.161 0.129 0.314 0.048 0.266
Oneida County 15 35425 6159 0.174 0.119 0.321 0.045 0.276
Onondaga County 18 70736 12000 0.170 0.115 0.364 0.035 0.329
Ontario County 9 17119 1754 0.102 0.112 0.181 0.051 0.130
Orange County 17 72854 11745 0.161 0.126 0.501 0.051 0.450
Orleans County 5 5895 1027 0.174 0.169 0.192 0.123 0.069
Oswego County 9 19552 3686 0.189 0.186 0.239 0.152 0.087
Otsego County 12 6822 1067 0.156 0.155 0.235 0.090 0.145
Putnam County 6 14400 780 0.054 0.056 0.073 0.038 0.035
Rensselaer County 12 22228 3147 0.142 0.124 0.262 0.046 0.216
Rockland County 8 65876 12842 0.195 0.110 0.342 0.047 0.295
Saratoga County 12 34936 2350 0.067 0.088 0.151 0.041 0.110
Schenectady County 6 23088 4600 0.199 0.132 0.335 0.045 0.290
Schoharie County 6 4268 652 0.153 0.169 0.285 0.104 0.181
Schuyler County 3 2105 415 0.197 0.200 0.213 0.190 0.023
Seneca County 4 4926 917 0.186 0.192 0.234 0.157 0.077
St. Lawrence County 17 15926 3519 0.221 0.212 0.335 0.121 0.214
Steuben County 12 15402 2732 0.177 0.195 0.291 0.120 0.171
Suffolk County 68 230466 17818 0.077 0.081 0.417 0.000 0.417
Sullivan County 8 10351 2537 0.245 0.214 0.288 0.133 0.155
Tioga County 6 7354 1038 0.141 0.141 0.177 0.102 0.075
Tompkins County 6 11569 1457 0.126 0.137 0.175 0.101 0.074
Ulster County 9 23786 3177 0.134 0.134 0.215 0.074 0.141
Warren County 9 9039 1183 0.131 0.149 0.261 0.053 0.208
Washington County 11 8399 1283 0.153 0.163 0.267 0.102 0.165
Wayne County 11 14309 2098 0.147 0.153 0.233 0.052 0.181
Westchester County 40 157456 13588 0.086 0.058 0.165 0.015 0.150
Wyoming County 5 4410 510 0.116 0.118 0.155 0.088 0.067
Yates County 2 3339 828 0.248 0.260 0.305 0.216 0.089
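The small sketch below shows what group_by() changes. It uses three made-up districts in two hypothetical counties, and summarise() returns one row per COUNTY instead of a single row for the whole table.

```r
library(dplyr)

# Made-up district rows in two hypothetical counties
toy_rows <- tibble(
  COUNTY = c("County X", "County X", "County Y"),
  stpop  = c(1000, 3000, 2000),
  stpov  = c(100, 600, 200)
)

# group_by() splits rows by COUNTY; summarise() collapses each group to one row
toy_county <- toy_rows |>
  group_by(COUNTY) |>
  summarise(
    districts          = n(),          # districts in each county
    kids               = sum(stpop),   # kids in each county
    kids_in_pov        = sum(stpov),   # kids in poverty in each county
    child_poverty_rate = round(kids_in_pov / kids, 3)
  )

print(toy_county)
```

County X ends up with 2 districts, 4,000 kids, and a rate of 0.175. County Y has 1 district, 2,000 kids, and a rate of 0.1.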

Look at a histogram

A histogram is a chart that shows the distribution of your data.

  • The height of each bar shows how many districts have a poverty difference (pov_diff_county) that falls within that bar's range.


hist(sd_county_pov$pov_diff_county)

(pov_diff_county = stpovrate - county_pov_rate, the district's child poverty rate minus its county's poverty rate)
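One way to create pov_diff_county is with mutate(). This assumes each district row already carries both stpovrate and county_pov_rate. The values below are made up, and the title and axis label are optional extras to hist().

```r
library(dplyr)

# Made-up district rows; column names follow the slide
sd_toy <- tibble(
  stpovrate       = c(0.05, 0.10, 0.22, 0.30, 0.15),
  county_pov_rate = c(0.12, 0.12, 0.18, 0.18, 0.15)
)

# Positive values mean the district is poorer than its county as a whole
sd_toy <- sd_toy |>
  mutate(pov_diff_county = stpovrate - county_pov_rate)

# Base R histogram of the differences, with a title and axis label
hist(sd_toy$pov_diff_county,
     main = "District minus county poverty rate",
     xlab = "pov_diff_county")
```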

Look at a scatterplot

A scatterplot is a chart that is used to look at the relationship between two variables.

  • Each dot is a school district.
  • The pattern of dots helps you to determine whether a relationship exists between two variables.
  • Here, you can say there is a positive relationship between County Poverty Rate and School District Poverty Rate
    • meaning that as the County Poverty Rate increases, the School District Poverty Rate tends to increase as well
plot(sd_county_pov$county_pov_rate, sd_county_pov$stpovrate, 
     xlab="County Poverty Rate", 
     ylab="School District Poverty Rate")

In-class exercise - Join Practice

We’ll add 2 more datasets to our New York county poverty dataframe so that you can practice processing and exploring data on your own:

Open your part1 project and create a new script, called ny_county_health.

  • Load the tidyverse and readxl packages
  • Read in your processed 2022 New York county poverty dataset (county_pov_rate_2022.csv)

Process, Aggregate & Join

Read in both raw datasets and look at the data:

  • Process the datasets:
    • County Health Rankings Data
      • Select FIPS, County, and 5 variables that you want to compare to poverty
    • Lottery Retailers
      • Use group_by() and summarise() to aggregate the data to the county level (GEOID is the county number)
  • Use left_join() to join each processed, county-level dataset to your New York county poverty dataframe
  • Calculate the number of lottery retailers per 10,000 kids in each county: (number of retailers / number of kids) * 10,000
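The processing and join steps above can be sketched as follows. The three tables are tiny made-up stand-ins. GEOID as the county key and FIPS in the health data come from the slides, but the other column names (NAME, kids, retailer, pct_smokers) are hypothetical placeholders for whatever your raw files contain.

```r
library(dplyr)

# Made-up stand-ins for the three inputs
county_pov <- tibble(
  GEOID = c("001", "002"),
  NAME  = c("County A", "County B"),
  kids  = c(40000, 8000),
  child_poverty_rate = c(0.13, 0.24)
)
lottery <- tibble(
  GEOID    = c("001", "001", "001", "002"),  # one row per retailer
  retailer = c("Shop 1", "Shop 2", "Shop 3", "Shop 4")
)
health <- tibble(
  FIPS        = c("001", "002"),
  pct_smokers = c(0.14, 0.19)  # hypothetical health variable
)

# Aggregate retailers to one row per county
lottery_county <- lottery |>
  group_by(GEOID) |>
  summarise(lottery_retailers = n())

# Join both county-level tables onto the poverty data, then compute the rate
county_all <- county_pov |>
  left_join(lottery_county, by = "GEOID") |>
  left_join(health, by = c("GEOID" = "FIPS")) |>
  mutate(retailers_per_10k_kids = lottery_retailers / kids * 10000)

print(county_all)
```

A left join keeps every county in the poverty table, so a county with no retailers would get NA in lottery_retailers. dplyr also refuses to join keys of different types (for example, numeric FIPS against character GEOID), so convert one side with as.character() first if that happens.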

Create one county-level dataframe with: County name, County ID, number of kids, child poverty rate, lottery retailers, lottery retailers per 10k kids, and at least 5 variables from the County Health Data

Explore and Answer Questions


Explore your new dataframe. Follow your interest!

  • Create a histogram of each of the health indicators and of lottery retailers per 10k kids
  • Create a scatterplot that compares your new variables to the child poverty rate
  • Answer 3 simple research questions using your dataset
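You can script the exploration steps so that each variable gets its own histogram. The sketch below uses base R graphics on a made-up data frame. pct_smokers and retailers_per_10k_kids are hypothetical column names, so swap in the ones from your own dataframe.

```r
# Made-up county data; column names are hypothetical placeholders
county_all <- data.frame(
  child_poverty_rate     = c(0.13, 0.24, 0.21, 0.09, 0.18),
  pct_smokers            = c(0.14, 0.19, 0.18, 0.12, 0.16),
  retailers_per_10k_kids = c(7.5, 12.1, 10.4, 5.2, 9.0)
)

# One histogram per variable; hist() draws the plot and returns the binned counts
vars_to_plot <- c("pct_smokers", "retailers_per_10k_kids")
hists <- lapply(vars_to_plot, function(v) {
  hist(county_all[[v]], main = paste("Histogram of", v), xlab = v)
})

# Scatterplot of one new variable against the child poverty rate
plot(county_all$child_poverty_rate, county_all$retailers_per_10k_kids,
     xlab = "Child Poverty Rate",
     ylab = "Lottery Retailers per 10k Kids")
```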

Write out your new county dataframe to data/processed/ny_county_health_poverty_2022.csv

Upload your script for assignment 4a, with your answers to the 3 questions at the bottom of the script

Importing Excel files


Use the read_excel() function from the readxl package to import Excel files.

We’ll use the Import Dataset user interface to learn about read_excel

  • ALWAYS copy the code to import your data into your script if you use this interface

Import Dataset

In the Files pane, right-click the data file you want to read in and choose Import Dataset.

Look at the data

Import Options

  • Use the Import Options to write the code to select the Sheet and Rows you want
  • Copy the code into your script

library(tidyverse)
library(readxl)

# import the 2021 County Health Rankings data for New York
# (path is relative to the part1 project folder)
raw_health_data <- read_excel("data/raw/CountyHealthRankings/2021 County Health Rankings Data - v1.xlsx", 
    sheet = "Ranked Measure Data", skip = 1)
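The Import Options roughly map onto read_excel() arguments: Sheet becomes sheet, Skip becomes skip, Max Rows becomes n_max, and Range becomes range. In the code above, skip = 1 drops the first row of the sheet before any column names are read. The County Health Rankings file isn't bundled here, so this sketch uses the example workbook that ships with readxl instead.

```r
library(readxl)

# readxl ships small example workbooks; "datasets.xlsx" stands in for the real file
path <- readxl_example("datasets.xlsx")

# Sheet names in the workbook (what the Sheet dropdown lists)
sheets <- excel_sheets(path)
print(sheets)

# Pick a sheet by name and keep only the first 5 data rows (Max Rows -> n_max)
mt_head <- read_excel(path, sheet = "mtcars", n_max = 5)

# Read a rectangular cell range, header row included (Range -> range)
iris_block <- read_excel(path, sheet = "iris", range = "A1:C4")

print(mt_head)
print(iris_block)
```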

Homework


  • Submit the script for your in-class assignment
  • Read Chapter 6, “Never a Real Democracy,” from The Sum of Us by Heather McGhee.
  • R: Explore apportionment and race data to check one of McGhee’s claims in that chapter