What district has the highest student poverty rate?
What county is that district in?
What county has the highest poverty rate? (you should use the county poverty dataset to answer this question)
What school district has the lowest student poverty rate in Montgomery County?
What is the average child poverty rate in a school district?
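One way to approach questions like these is to sort or filter the district-level dataframe. The sketch below uses a small made-up dataframe (`toy_districts`, with an assumed `district` name column and invented values) that borrows the `COUNTY` and `stpovrate` column names from this lesson. It is not the real data.

```r
library(dplyr)

# Hypothetical toy data: district names, counties, and rates are invented
toy_districts <- tibble(
  district  = c("District A", "District B", "District C", "District D"),
  COUNTY    = c("County X", "County X", "County Y", "County Y"),
  stpovrate = c(0.10, 0.25, 0.05, 0.18)
)

# District with the highest student poverty rate (the row also shows its county)
highest <- toy_districts |>
  slice_max(stpovrate, n = 1)

# District with the lowest student poverty rate within a single county
lowest_in_y <- toy_districts |>
  filter(COUNTY == "County Y") |>
  slice_min(stpovrate, n = 1)

# Average district poverty rate (every district counts equally)
avg_rate <- mean(toy_districts$stpovrate)
```

The same pattern should carry over to the real dataframe once you swap in its actual column names.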
summarise() New York poverty
# calculate child poverty statistics for New York
ny_pov_stats <- sd_county_pov |>
  summarise(
    districts = n(), # counts the number of school districts in the state
    kids = sum(stpop), # number of kids in the state
    kids_in_pov = sum(stpov), # number of kids living in poverty in the state
    child_poverty_rate = round(kids_in_pov / kids, 3), # the child poverty rate in the state
    mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts
    max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district
    min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district
    poverty_range = max_sd_child_povrate - min_sd_child_povrate # the difference between the highest and lowest school district poverty
  )
| districts | kids | kids_in_pov | child_poverty_rate | mean_sd_child_povrate | max_sd_child_povrate | min_sd_child_povrate | poverty_range |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 681 | 2901286 | 485661 | 0.167 | 0.131 | 0.501 | 0 | 0.501 |
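Notice that the statewide child poverty rate (0.167) is not the same as the average school district rate (0.131). The first pools all kids together, so large districts count more; the second gives every district equal weight. A minimal sketch of the difference, using invented numbers rather than the real data:

```r
library(dplyr)

# Hypothetical toy data: one large high-poverty district, two small low-poverty ones
toy <- tibble(
  stpop = c(1000, 100, 100), # kids in each district
  stpov = c(300, 5, 5)       # kids in poverty in each district
) |>
  mutate(stpovrate = stpov / stpop)

toy_stats <- toy |>
  summarise(
    child_poverty_rate    = round(sum(stpov) / sum(stpop), 3), # pooled across all kids
    mean_sd_child_povrate = round(mean(stpovrate), 3)          # each district weighted equally
  )
```

In this toy example the pooled rate is pulled up by the large district, while the simple mean is pulled down by the two small ones.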
group_by() & summarise()
We can also calculate the same statistics for each county by using group_by() and summarise() together.
group_by() indicates that you want to summarize within each value of another variable (here, each county) rather than across the whole dataset.
ny_county_pov_stats <- sd_county_pov |>
  group_by(COUNTY) |>
  summarise(
    districts = n(), # number of school districts in the county
    kids = sum(stpop), # number of kids in the county
    kids_in_pov = sum(stpov), # number of kids living in poverty in the county
    child_poverty_rate = round(kids_in_pov / kids, 3), # the child poverty rate in the county
    mean_sd_child_povrate = round(mean(stpovrate), 3), # the average child poverty rate for school districts
    max_sd_child_povrate = round(max(stpovrate), 3), # the highest child poverty rate in a school district
    min_sd_child_povrate = round(min(stpovrate), 3), # the lowest child poverty rate in a school district
    poverty_range = max_sd_child_povrate - min_sd_child_povrate # the difference between the highest and lowest school district poverty
  )

write_csv(ny_county_pov_stats, "data/output/ny_county_poverty_stats.csv")
Summary of poverty in New York counties

| COUNTY | districts | kids | kids_in_pov | child_poverty_rate | mean_sd_child_povrate | max_sd_child_povrate | min_sd_child_povrate | poverty_range |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Albany County | 12 | 40249 | 5403 | 0.134 | 0.129 | 0.266 | 0.041 | 0.225 |
| Allegany County | 12 | 6796 | 1633 | 0.240 | 0.241 | 0.315 | 0.185 | 0.130 |
| Broome County | 12 | 27673 | 5916 | 0.214 | 0.182 | 0.370 | 0.079 | 0.291 |
| Cattaraugus County | 12 | 13775 | 2594 | 0.188 | 0.183 | 0.239 | 0.126 | 0.113 |
| Cayuga County | 7 | 9694 | 1728 | 0.178 | 0.152 | 0.225 | 0.090 | 0.135 |
| Chautauqua County | 18 | 19226 | 4335 | 0.225 | 0.190 | 0.333 | 0.088 | 0.245 |
| Chemung County | 3 | 12057 | 2267 | 0.188 | 0.173 | 0.248 | 0.088 | 0.160 |
| Chenango County | 8 | 7478 | 1225 | 0.164 | 0.166 | 0.197 | 0.125 | 0.072 |
| Clinton County | 8 | 11080 | 1647 | 0.149 | 0.144 | 0.211 | 0.097 | 0.114 |
| Columbia County | 6 | 7212 | 894 | 0.124 | 0.118 | 0.185 | 0.079 | 0.106 |
| Cortland County | 5 | 6632 | 1179 | 0.178 | 0.189 | 0.242 | 0.114 | 0.128 |
| Delaware County | 13 | 5774 | 1012 | 0.175 | 0.165 | 0.211 | 0.106 | 0.105 |
| Dutchess County | 13 | 41543 | 3759 | 0.090 | 0.098 | 0.239 | 0.055 | 0.184 |
| Erie County | 28 | 134940 | 24775 | 0.184 | 0.124 | 0.381 | 0.041 | 0.340 |
| Essex County | 10 | 3533 | 578 | 0.164 | 0.160 | 0.246 | 0.054 | 0.192 |
| Franklin County | 7 | 7803 | 1733 | 0.222 | 0.218 | 0.316 | 0.130 | 0.186 |
| Fulton County | 6 | 7651 | 1643 | 0.215 | 0.185 | 0.309 | 0.131 | 0.178 |
| Genesee County | 8 | 8609 | 972 | 0.113 | 0.101 | 0.165 | 0.072 | 0.093 |
| Greene County | 6 | 5922 | 959 | 0.162 | 0.177 | 0.315 | 0.106 | 0.209 |
| Hamilton County | 7 | 413 | 51 | 0.123 | 0.120 | 0.222 | 0.000 | 0.222 |
| Herkimer County | 10 | 9636 | 1673 | 0.174 | 0.168 | 0.237 | 0.078 | 0.159 |
| Jefferson County | 11 | 17784 | 3598 | 0.202 | 0.191 | 0.293 | 0.127 | 0.166 |
| Lewis County | 5 | 4254 | 731 | 0.172 | 0.167 | 0.181 | 0.128 | 0.053 |
| Livingston County | 8 | 7917 | 1168 | 0.148 | 0.154 | 0.256 | 0.091 | 0.165 |
| Madison County | 10 | 10139 | 1190 | 0.117 | 0.125 | 0.214 | 0.066 | 0.148 |
| Monroe County | 18 | 113152 | 19831 | 0.175 | 0.113 | 0.379 | 0.034 | 0.345 |
| Montgomery County | 5 | 8413 | 2014 | 0.239 | 0.230 | 0.268 | 0.151 | 0.117 |
| Nassau County | 56 | 216350 | 12320 | 0.057 | 0.054 | 0.193 | 0.015 | 0.178 |
| New York County | 1 | 1193045 | 259012 | 0.217 | 0.217 | 0.217 | 0.217 | 0.000 |
| Niagara County | 10 | 30500 | 4912 | 0.161 | 0.129 | 0.314 | 0.048 | 0.266 |
| Oneida County | 15 | 35425 | 6159 | 0.174 | 0.119 | 0.321 | 0.045 | 0.276 |
| Onondaga County | 18 | 70736 | 12000 | 0.170 | 0.115 | 0.364 | 0.035 | 0.329 |
| Ontario County | 9 | 17119 | 1754 | 0.102 | 0.112 | 0.181 | 0.051 | 0.130 |
| Orange County | 17 | 72854 | 11745 | 0.161 | 0.126 | 0.501 | 0.051 | 0.450 |
| Orleans County | 5 | 5895 | 1027 | 0.174 | 0.169 | 0.192 | 0.123 | 0.069 |
| Oswego County | 9 | 19552 | 3686 | 0.189 | 0.186 | 0.239 | 0.152 | 0.087 |
| Otsego County | 12 | 6822 | 1067 | 0.156 | 0.155 | 0.235 | 0.090 | 0.145 |
| Putnam County | 6 | 14400 | 780 | 0.054 | 0.056 | 0.073 | 0.038 | 0.035 |
| Rensselaer County | 12 | 22228 | 3147 | 0.142 | 0.124 | 0.262 | 0.046 | 0.216 |
| Rockland County | 8 | 65876 | 12842 | 0.195 | 0.110 | 0.342 | 0.047 | 0.295 |
| Saratoga County | 12 | 34936 | 2350 | 0.067 | 0.088 | 0.151 | 0.041 | 0.110 |
| Schenectady County | 6 | 23088 | 4600 | 0.199 | 0.132 | 0.335 | 0.045 | 0.290 |
| Schoharie County | 6 | 4268 | 652 | 0.153 | 0.169 | 0.285 | 0.104 | 0.181 |
| Schuyler County | 3 | 2105 | 415 | 0.197 | 0.200 | 0.213 | 0.190 | 0.023 |
| Seneca County | 4 | 4926 | 917 | 0.186 | 0.192 | 0.234 | 0.157 | 0.077 |
| St. Lawrence County | 17 | 15926 | 3519 | 0.221 | 0.212 | 0.335 | 0.121 | 0.214 |
| Steuben County | 12 | 15402 | 2732 | 0.177 | 0.195 | 0.291 | 0.120 | 0.171 |
| Suffolk County | 68 | 230466 | 17818 | 0.077 | 0.081 | 0.417 | 0.000 | 0.417 |
| Sullivan County | 8 | 10351 | 2537 | 0.245 | 0.214 | 0.288 | 0.133 | 0.155 |
| Tioga County | 6 | 7354 | 1038 | 0.141 | 0.141 | 0.177 | 0.102 | 0.075 |
| Tompkins County | 6 | 11569 | 1457 | 0.126 | 0.137 | 0.175 | 0.101 | 0.074 |
| Ulster County | 9 | 23786 | 3177 | 0.134 | 0.134 | 0.215 | 0.074 | 0.141 |
| Warren County | 9 | 9039 | 1183 | 0.131 | 0.149 | 0.261 | 0.053 | 0.208 |
| Washington County | 11 | 8399 | 1283 | 0.153 | 0.163 | 0.267 | 0.102 | 0.165 |
| Wayne County | 11 | 14309 | 2098 | 0.147 | 0.153 | 0.233 | 0.052 | 0.181 |
| Westchester County | 40 | 157456 | 13588 | 0.086 | 0.058 | 0.165 | 0.015 | 0.150 |
| Wyoming County | 5 | 4410 | 510 | 0.116 | 0.118 | 0.155 | 0.088 | 0.067 |
| Yates County | 2 | 3339 | 828 | 0.248 | 0.260 | 0.305 | 0.216 | 0.089 |
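To see what group_by() changes, it can help to run the same pattern on a tiny dataframe where you can check the arithmetic by hand. The sketch below uses invented counties and counts (only the column names `COUNTY`, `stpop`, and `stpov` come from this lesson) and adds an arrange() step to sort counties from highest to lowest child poverty rate.

```r
library(dplyr)

# Hypothetical toy data: three districts in two made-up counties
toy <- tibble(
  COUNTY = c("County X", "County X", "County Y"),
  stpop  = c(1000, 500, 200), # kids in each district
  stpov  = c(200, 50, 60)     # kids in poverty in each district
)

toy_county_stats <- toy |>
  group_by(COUNTY) |>                      # one group per county
  summarise(
    districts = n(),                       # districts in each county
    kids = sum(stpop),                     # kids in each county
    kids_in_pov = sum(stpov),              # kids in poverty in each county
    child_poverty_rate = round(kids_in_pov / kids, 3)
  ) |>
  arrange(desc(child_poverty_rate))        # highest-poverty county first
```

The result has one row per county instead of one row for the whole dataset, which is the only difference from the ungrouped summarise().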
Look at a histogram
A histogram is a chart that shows the distribution of your data.
The height of each bar indicates how many districts have a poverty difference within that range.
hist(sd_county_pov$pov_diff_county)
( pov_diff_county = stpovrate - county_pov_rate )
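A minimal sketch of how the difference variable and the histogram bins fit together, using invented rates rather than the real data. The `breaks` values are an assumption chosen to suit the toy numbers; `plot = FALSE` asks hist() to return the bin counts instead of drawing the chart.

```r
library(dplyr)

# Hypothetical toy data: five districts with invented rates
toy <- tibble(
  stpovrate       = c(0.05, 0.12, 0.35, 0.18, 0.22),
  county_pov_rate = c(0.10, 0.10, 0.10, 0.20, 0.20)
) |>
  # positive values mean the district is poorer than its county overall
  mutate(pov_diff_county = stpovrate - county_pov_rate)

# breaks sets the bin edges; each count is the height of one bar
h <- hist(toy$pov_diff_county, breaks = seq(-0.1, 0.3, by = 0.1), plot = FALSE)
h$counts
```

Dropping `plot = FALSE` draws the same bins as a chart, so the counts are a quick way to check what you are seeing.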
Look at a scatterplot
A scatterplot is a chart that is used to look at the relationship between two variables.
Each dot is a school district.
The pattern of dots helps you to determine whether a relationship exists between two variables.
Here, there is a positive relationship between County Poverty Rate and School District Poverty Rate: as the County Poverty Rate increases, the School District Poverty Rate tends to be higher too.
plot(sd_county_pov$county_pov_rate, sd_county_pov$stpovrate, xlab="County Poverty Rate", ylab="School District Poverty Rate")
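If you want a number to go with the visual impression, one common option is the correlation coefficient, and a fitted line can be drawn over the dots. This is a sketch on invented values, not the course's required method; cor(), lm(), and abline() are base R functions.

```r
# Hypothetical toy data, invented for illustration
county_pov_rate <- c(0.05, 0.10, 0.15, 0.20, 0.25)
stpovrate       <- c(0.04, 0.12, 0.13, 0.22, 0.24)

# Pearson correlation: values near +1 suggest a strong positive linear relationship
r <- cor(county_pov_rate, stpovrate)

# Simple linear model: the slope is the change in district rate per unit change in county rate
fit <- lm(stpovrate ~ county_pov_rate)

plot(county_pov_rate, stpovrate,
     xlab = "County Poverty Rate", ylab = "School District Poverty Rate")
abline(fit) # adds the fitted line to the scatterplot
```

A positive correlation and a positive slope both match the upward pattern of dots, but neither one shows that county poverty causes district poverty.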
In-class exercise - Join Practice
We’ll add 2 more datasets to our New York county poverty dataframe so that you can practice processing and exploring data on your own:
Open your part1 project and create a new script, called ny_county_health.
Load the tidyverse and readxl
Read in your processed 2022 New York county poverty dataset (county_pov_rate_2022.csv)
Process, Aggregate & Join
Read in both raw datasets and look at the data:
Process the datasets:
County Health Rankings Data
Select FIPS, County, and 5 variables that you want to compare to poverty
Lottery Retailers
Use group_by() and summarise() to aggregate the data to County-level (GEOID is the County Number)
Use left_join() to join each processed, county-level dataset to your New York county poverty dataframe
Calculate the number of lottery retailers per 10,000 kids in each county: (number of retailers / number of kids) * 10,000
Create one county-level dataframe with: County name, County ID, number of kids, child poverty rate, lottery retailers, lottery retailers per 10k kids, and at least 5 variables from the County Health Data
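The aggregate, join, and rate steps above can be sketched on toy data as follows. Everything here is hypothetical: the GEOID values, county names, retailer names, and the column names `kids`, `lottery_retailers`, and `retailers_per_10k_kids` are placeholders, so check the real files for their actual column names.

```r
library(dplyr)

# Hypothetical county-level poverty data (invented IDs and counts)
county_pov <- tibble(
  GEOID  = c("00001", "00002"),
  County = c("County X", "County Y"),
  kids   = c(40000, 5000)
)

# Hypothetical raw retailer data: one row per lottery retailer
retailers <- tibble(
  GEOID         = c("00001", "00001", "00001", "00002"),
  retailer_name = c("Shop 1", "Shop 2", "Shop 3", "Shop 4")
)

# Aggregate to county level: count the retailers in each county
retailers_by_county <- retailers |>
  group_by(GEOID) |>
  summarise(lottery_retailers = n())

# Join onto the county dataframe, then compute the rate per 10,000 kids
county_joined <- county_pov |>
  left_join(retailers_by_county, by = "GEOID") |>
  mutate(retailers_per_10k_kids = lottery_retailers / kids * 10000)
```

The join column must have the same type in both dataframes (for example, both character), so it is worth checking the ID columns with glimpse() or str() before joining.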
Explore and Answer Questions
Explore your new dataframe. Follow your interest!
Create a histogram of each of the health indicators and of lottery retailers per 10k kids
Create a scatterplot that compares your new variables to student poverty rate
Answer 3 simple research questions using your dataset
Write out your new county dataframe to data/processed/ny_county_health_poverty_2022.csv
Upload your script for assignment 4a with the answer to 3 questions at the bottom of your script
Importing Excel files
Use the read_excel() function to import Excel files:
We’ll use the Import Dataset user interface to learn about read_excel
ALWAYS copy the code to import your data into your script if you use this interface
Import Dataset
In the Files pane, right-click the data you want to read in:
Look at the data
Import Options
Use the Import Options to write the code to select the Sheet and Rows you want
Copy the code into your script
library(tidyverse)
library(readxl)

# import ny health data
raw_health_data <- read_excel(
  "~/Documents/spatial/NewSchool/methods1-materials-fall2024/methods1/part1/data/raw/CountyHealthRankings/2021 County Health Rankings Data - v1.xlsx",
  sheet = "Ranked Measure Data",
  skip = 1
)
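If you are unsure which sheet names a workbook contains, readxl can list them before you import. The sketch below uses a sample workbook that ships with the readxl package (via readxl_example()) rather than the County Health Rankings file, so the sheet name "mtcars" belongs to that sample only.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names so you know what to pass to `sheet =`
sheets <- excel_sheets(path)

# Read a single sheet by name
mtcars_sheet <- read_excel(path, sheet = "mtcars")
```

The `skip` argument tells read_excel() how many rows to skip before it starts reading, which helps when a sheet has an extra title row above the real column names.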
Homework
Submit the script for your in-class assignment
Read Chapter 6, "Never a Real Democracy," from The Sum of Us by Heather McGhee.
R: Explore apportionment and race data to check one of McGhee's claims in that chapter of The Sum of Us