Methods 1, Week 4

Outline

Research Journal
Readings Discussion
Homework questions and overview
Group and summarise
In-class exercise
Importing excel files
Homework

Research Journal

Homework Questions

What district has the highest student poverty rate?
What county is that district in?
What county has the highest poverty rate? (you should use the county poverty dataset to answer this question)
What school district has the lowest student poverty rate in Montgomery County?
What is the average child poverty rate in a school district?
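
One possible way to approach questions like these is sketched below on a tiny made-up dataset. The column names COUNTY and stpovrate come from the class data; the `district` column name, the district and county names, and all values are hypothetical placeholders, so adjust them to match your own dataframe.

```r
library(dplyr)

# Hypothetical toy data: `district` is an assumed column name, values are invented
toy_districts <- tibble(
  district  = c("District A", "District B", "District C", "District D"),
  COUNTY    = c("County X", "County X", "County Y", "County Y"),
  stpovrate = c(0.10, 0.25, 0.40, 0.05)
)

# The district with the highest student poverty rate
highest <- toy_districts |>
  slice_max(stpovrate, n = 1)

# The district with the lowest rate within a single county
lowest_in_x <- toy_districts |>
  filter(COUNTY == "County X") |>
  slice_min(stpovrate, n = 1)

# The average (unweighted) school district poverty rate
avg_rate <- toy_districts |>
  summarise(mean_sd_child_povrate = mean(stpovrate))
```

Sorting with arrange(desc(stpovrate)) and reading the top row would work just as well as slice_max().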

summarise() New York poverty

# calculate child poverty statistics for New York

ny_pov_stats <- sd_county_pov |>
  summarise(districts = n(),  # counts the number of school districts in the state
            kids = sum(stpop), # number of kids in the state
            kids_in_pov = sum(stpov), # number of kids living in poverty in the state
            child_poverty_rate = round(kids_in_pov/kids, 3), # the child poverty rate in the state
            mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts
            max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district
            min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district
            poverty_range = max_sd_child_povrate - min_sd_child_povrate) # the difference between the highest and lowest school district poverty

districts	kids	kids_in_pov	child_poverty_rate	mean_sd_child_povrate	max_sd_child_povrate	min_sd_child_povrate	poverty_range
681	2901286	485661	0.167	0.131	0.501	0	0.501
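
Note that child_poverty_rate (0.167) and mean_sd_child_povrate (0.131) are different numbers. The first pools all kids in the state; the second averages the district rates, so a tiny district counts as much as a huge one. A minimal sketch with two made-up districts (all values are hypothetical) shows how far apart the two can be:

```r
library(dplyr)

# Hypothetical toy data: one large high-poverty district, one tiny low-poverty district
toy <- tibble(
  stpop = c(10000, 100),  # number of kids in each district
  stpov = c(3000, 5)      # number of kids in poverty in each district
) |>
  mutate(stpovrate = stpov / stpop)

toy_stats <- toy |>
  summarise(
    kids = sum(stpop),
    kids_in_pov = sum(stpov),
    child_poverty_rate = round(kids_in_pov / kids, 3),   # pooled rate: all kids counted equally
    mean_sd_child_povrate = round(mean(stpovrate), 3)    # mean of rates: all districts counted equally
  )

toy_stats
```

Here the pooled rate is dominated by the large district, while the mean of district rates sits halfway between the two districts.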

group_by() & summarise()

We can also calculate the same statistics for each county by using:
- group_by() and summarise() together
- group_by() indicates that you want to summarize based on another variable rather than the whole dataset

ny_county_pov_stats <- sd_county_pov |>
  group_by(COUNTY) |> 
  summarise(districts = n(),  # counts the number of school districts in each county
            kids = sum(stpop), # number of kids in the county
            kids_in_pov = sum(stpov), # number of kids living in poverty in the county
            child_poverty_rate = round(kids_in_pov/kids, 3), # the child poverty rate in the county
            mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts in the county
            max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district in the county
            min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district in the county
            poverty_range = max_sd_child_povrate - min_sd_child_povrate) # the difference between the highest and lowest school district poverty in the county

write_csv(ny_county_pov_stats, "data/output/ny_county_poverty_stats.csv")
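
To see what group_by() changes, it can help to run the same idea on a few made-up rows. This is only a sketch: the county names and numbers are hypothetical, while the column names follow the class data.

```r
library(dplyr)

# Hypothetical toy data: three districts in two counties
toy <- tibble(
  COUNTY = c("County X", "County X", "County Y"),
  stpop  = c(1000, 3000, 2000),
  stpov  = c(100, 600, 500)
)

# With group_by(), summarise() returns one row per county instead of one row overall
by_county <- toy |>
  group_by(COUNTY) |>
  summarise(districts = n(),
            kids = sum(stpop),
            kids_in_pov = sum(stpov),
            child_poverty_rate = round(kids_in_pov / kids, 3))

# Sort so the county with the highest poverty rate comes first
by_county_sorted <- by_county |>
  arrange(desc(child_poverty_rate))

by_county_sorted
```

The result has two rows, one for each value of COUNTY, and each statistic is calculated only from that county's districts.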

Summarise poverty in New York counties

COUNTY	districts	kids	kids_in_pov	child_poverty_rate	mean_sd_child_povrate	max_sd_child_povrate	min_sd_child_povrate	poverty_range
Albany County	12	40249	5403	0.134	0.129	0.266	0.041	0.225
Allegany County	12	6796	1633	0.240	0.241	0.315	0.185	0.130
Broome County	12	27673	5916	0.214	0.182	0.370	0.079	0.291
Cattaraugus County	12	13775	2594	0.188	0.183	0.239	0.126	0.113
Cayuga County	7	9694	1728	0.178	0.152	0.225	0.090	0.135
Chautauqua County	18	19226	4335	0.225	0.190	0.333	0.088	0.245
Chemung County	3	12057	2267	0.188	0.173	0.248	0.088	0.160
Chenango County	8	7478	1225	0.164	0.166	0.197	0.125	0.072
Clinton County	8	11080	1647	0.149	0.144	0.211	0.097	0.114
Columbia County	6	7212	894	0.124	0.118	0.185	0.079	0.106
Cortland County	5	6632	1179	0.178	0.189	0.242	0.114	0.128
Delaware County	13	5774	1012	0.175	0.165	0.211	0.106	0.105
Dutchess County	13	41543	3759	0.090	0.098	0.239	0.055	0.184
Erie County	28	134940	24775	0.184	0.124	0.381	0.041	0.340
Essex County	10	3533	578	0.164	0.160	0.246	0.054	0.192
Franklin County	7	7803	1733	0.222	0.218	0.316	0.130	0.186
Fulton County	6	7651	1643	0.215	0.185	0.309	0.131	0.178
Genesee County	8	8609	972	0.113	0.101	0.165	0.072	0.093
Greene County	6	5922	959	0.162	0.177	0.315	0.106	0.209
Hamilton County	7	413	51	0.123	0.120	0.222	0.000	0.222
Herkimer County	10	9636	1673	0.174	0.168	0.237	0.078	0.159
Jefferson County	11	17784	3598	0.202	0.191	0.293	0.127	0.166
Lewis County	5	4254	731	0.172	0.167	0.181	0.128	0.053
Livingston County	8	7917	1168	0.148	0.154	0.256	0.091	0.165
Madison County	10	10139	1190	0.117	0.125	0.214	0.066	0.148
Monroe County	18	113152	19831	0.175	0.113	0.379	0.034	0.345
Montgomery County	5	8413	2014	0.239	0.230	0.268	0.151	0.117
Nassau County	56	216350	12320	0.057	0.054	0.193	0.015	0.178
New York County	1	1193045	259012	0.217	0.217	0.217	0.217	0.000
Niagara County	10	30500	4912	0.161	0.129	0.314	0.048	0.266
Oneida County	15	35425	6159	0.174	0.119	0.321	0.045	0.276
Onondaga County	18	70736	12000	0.170	0.115	0.364	0.035	0.329
Ontario County	9	17119	1754	0.102	0.112	0.181	0.051	0.130
Orange County	17	72854	11745	0.161	0.126	0.501	0.051	0.450
Orleans County	5	5895	1027	0.174	0.169	0.192	0.123	0.069
Oswego County	9	19552	3686	0.189	0.186	0.239	0.152	0.087
Otsego County	12	6822	1067	0.156	0.155	0.235	0.090	0.145
Putnam County	6	14400	780	0.054	0.056	0.073	0.038	0.035
Rensselaer County	12	22228	3147	0.142	0.124	0.262	0.046	0.216
Rockland County	8	65876	12842	0.195	0.110	0.342	0.047	0.295
Saratoga County	12	34936	2350	0.067	0.088	0.151	0.041	0.110
Schenectady County	6	23088	4600	0.199	0.132	0.335	0.045	0.290
Schoharie County	6	4268	652	0.153	0.169	0.285	0.104	0.181
Schuyler County	3	2105	415	0.197	0.200	0.213	0.190	0.023
Seneca County	4	4926	917	0.186	0.192	0.234	0.157	0.077
St. Lawrence County	17	15926	3519	0.221	0.212	0.335	0.121	0.214
Steuben County	12	15402	2732	0.177	0.195	0.291	0.120	0.171
Suffolk County	68	230466	17818	0.077	0.081	0.417	0.000	0.417
Sullivan County	8	10351	2537	0.245	0.214	0.288	0.133	0.155
Tioga County	6	7354	1038	0.141	0.141	0.177	0.102	0.075
Tompkins County	6	11569	1457	0.126	0.137	0.175	0.101	0.074
Ulster County	9	23786	3177	0.134	0.134	0.215	0.074	0.141
Warren County	9	9039	1183	0.131	0.149	0.261	0.053	0.208
Washington County	11	8399	1283	0.153	0.163	0.267	0.102	0.165
Wayne County	11	14309	2098	0.147	0.153	0.233	0.052	0.181
Westchester County	40	157456	13588	0.086	0.058	0.165	0.015	0.150
Wyoming County	5	4410	510	0.116	0.118	0.155	0.088	0.067
Yates County	2	3339	828	0.248	0.260	0.305	0.216	0.089

Look at a histogram

A histogram is a chart that shows the distribution of your data.

The height of each bar shows how many districts have a poverty difference within that range.

hist(sd_county_pov$pov_diff_county)

(pov_diff_county = stpovrate - county_pov_rate, that is, the school district's poverty rate minus its county's poverty rate)
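
If you want to check what the bars are counting, one option is to ask hist() for the bins without drawing them. The sketch below uses a few made-up rates (all values are hypothetical); only the column names come from the class data.

```r
library(dplyr)

# Hypothetical toy data: five districts with invented poverty rates
toy <- tibble(
  stpovrate       = c(0.05, 0.12, 0.20, 0.31, 0.18),
  county_pov_rate = c(0.10, 0.10, 0.15, 0.15, 0.15)
) |>
  mutate(pov_diff_county = stpovrate - county_pov_rate)

# plot = FALSE returns the bin edges and counts instead of drawing the chart
h <- hist(toy$pov_diff_county, plot = FALSE)

h$breaks  # the edges of each bar
h$counts  # how many districts fall inside each bar
```

A negative pov_diff_county means the district's poverty rate is lower than its county's rate; a positive value means it is higher.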

Look at a scatterplot

A scatterplot is a chart that is used to look at the relationship between two variables.

Each dot is a school district.
The pattern of dots helps you to determine whether a relationship exists between two variables.
Here, you can say there is a positive relationship between County Poverty Rate and School District Poverty Rate
- meaning as County Poverty Rate increases, the School District Poverty Rate tends to be higher also

plot(sd_county_pov$county_pov_rate, sd_county_pov$stpovrate, 
     xlab="County Poverty Rate", 
     ylab="School District Poverty Rate")

In-class exercise - Join Practice

We’ll add 2 more datasets, County Health Rankings data and lottery retailers, to our New York county poverty dataframe so that you can practice processing and exploring data on your own.

Open your part1 project and create a new script, called ny_county_health.

Load the tidyverse and readxl
Read in your processed 2022 New York county poverty dataset (county_pov_rate_2022.csv)

Process, Aggregate & Join

Read in both raw datasets and look at the data.

Process the datasets:
- County Health Rankings Data
  - Select FIPS, County, and 5 variables that you want to compare to poverty
- Lottery Retailers
  - Use group_by() and summarise() to aggregate the data to County-level (GEOID is the County Number)
Use left_join() to join each processed, county-level dataset to your New York county poverty dataframe
Calculate the number of lottery retailers per 10,000 kids in each county: (number of retailers / number of kids) * 10,000

Create one county-level dataframe with: County name, County ID, number of kids, child poverty rate, lottery retailers, lottery retailers per 10k kids, and at least 5 variables from the County Health Data
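
The aggregate, join, and rate steps above could look something like the sketch below. Everything in it is a stand-in: the tiny dataframes, the GEOID values, and the column names `retailer`, `lottery_retailers`, and `retailers_per_10k_kids` are hypothetical, so use the names in your own data.

```r
library(dplyr)

# Hypothetical raw data: one row per lottery retailer
lottery_raw <- tibble(
  GEOID    = c("001", "001", "002", "001"),
  retailer = c("Shop 1", "Shop 2", "Shop 3", "Shop 4")
)

# Hypothetical county poverty data: one row per county
county_pov <- tibble(
  GEOID = c("001", "002", "003"),
  kids  = c(40000, 7000, 300000)
)

# Aggregate retailers to the county level: one row per GEOID
lottery_county <- lottery_raw |>
  group_by(GEOID) |>
  summarise(lottery_retailers = n())

# Join to the county dataframe and calculate retailers per 10,000 kids
county_joined <- county_pov |>
  left_join(lottery_county, by = "GEOID") |>
  mutate(retailers_per_10k_kids = lottery_retailers / kids * 10000)

county_joined
```

left_join() keeps every county in the poverty dataframe, so a county with no match in the lottery data gets NA. The join column also needs the same type in both dataframes (both character or both numeric).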

Explore and Answer Questions

Explore your new dataframe. Follow your interest!

Create a histogram of each of the health indicators and of lottery retailers per 10k kids
Create a scatterplot that compares your new variables to student poverty rate
Answer 3 simple research questions using your dataset

Write out your new county dataframe to data/processed/ny_county_health_poverty_2022.csv

Upload your script for assignment 4a, with the answers to your 3 questions at the bottom of the script

Importing excel files

Use the read_excel() function to import Excel files.

We’ll use the Import Dataset user interface to learn about read_excel()

ALWAYS copy the code to import your data into your script if you use this interface

Import Dataset

In the Files pane, right-click the data you want to read in:

Look at the data

Import Options

Use the Import Options to write the code to select the Sheet and Rows you want
Copy the code into your script

library(tidyverse)
library(readxl)

# import NY health data
raw_health_data <- read_excel("~/Documents/spatial/NewSchool/methods1-materials-fall2024/methods1/part1/data/raw/CountyHealthRankings/2021 County Health Rankings Data - v1.xlsx", 
    sheet = "Ranked Measure Data", skip = 1)
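
To try the sheet and skip arguments without the class data, you can use the example workbook that ships with readxl. This is only a sketch: the example file and its "mtcars" sheet have nothing to do with the health data, and skipping its first row is done purely to show what skip does.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook so you know which names you can ask for
sheet_names <- excel_sheets(example_path)

# Read one sheet by name; the first row is used as the column names
full <- read_excel(example_path, sheet = "mtcars")

# skip = 1 ignores the first row, so the second row becomes the column names.
# That is wrong for this example file, but right for a file like the health data,
# where an extra title row sits above the real header.
skipped <- read_excel(example_path, sheet = "mtcars", skip = 1)

nrow(full)
nrow(skipped)
```

In your own script, a path relative to the project folder is usually easier to share than a full path that starts at your home directory.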

Homework

Submit the script for your in-class assignment
Read Chapter 6, "Never a Real Democracy," from The Sum of Us by Heather McGhee.
R: Explore apportionment and race data to check one of McGhee’s claims in that chapter of The Sum of Us
