The leading causes of death by sex and ethnicity in New York City since 2007.
The data can be downloaded from https://data.cityofnewyork.us/api/views/jb7j-dtam/rows.csv?accessType=DOWNLOAD
Year
Leading.Cause
Sex
Race.Ethnicity
Deaths
Death.Rate
Age.Adjusted.Death.Rate
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(tidyr)
## Loading required package: tidyr
require(knitr)
## Loading required package: knitr
require(ggplot2)
## Loading required package: ggplot2
nyc_data<-read.csv("https://data.cityofnewyork.us/api/views/jb7j-dtam/rows.csv?accessType=DOWNLOAD", sep = ",")
str(nyc_data)
## 'data.frame': 1094 obs. of 7 variables:
## $ Year : int 2014 2011 2008 2010 2012 2007 2011 2007 2012 2009 ...
## $ Leading.Cause : Factor w/ 26 levels "Accidents Except Drug Posioning (V01-X39, X43, X45-X59, Y85-Y86)",..: 12 7 19 19 7 21 2 10 14 13 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 2 2 1 2 2 1 1 1 2 ...
## $ Race.Ethnicity : Factor w/ 6 levels "Asian and Pacific Islander",..: 5 6 4 3 2 4 4 2 3 3 ...
## $ Deaths : Factor w/ 465 levels ".","10","1007",..: 24 269 409 10 132 1 87 122 4 85 ...
## $ Death.Rate : Factor w/ 442 levels ".","10","10.1",..: 1 189 1 409 167 1 1 100 393 62 ...
## $ Age.Adjusted.Death.Rate: Factor w/ 427 levels ".","10","10.1",..: 1 156 1 424 231 1 1 77 412 225 ...
nyc_data <- subset(nyc_data, nyc_data$Deaths != '.')
str(nyc_data)
## 'data.frame': 956 obs. of 7 variables:
## $ Year : int 2014 2011 2008 2010 2012 2011 2007 2012 2009 2008 ...
## $ Leading.Cause : Factor w/ 26 levels "Accidents Except Drug Posioning (V01-X39, X43, X45-X59, Y85-Y86)",..: 12 7 19 19 7 2 10 14 13 7 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 2 2 1 2 1 1 1 2 1 ...
## $ Race.Ethnicity : Factor w/ 6 levels "Asian and Pacific Islander",..: 5 6 4 3 2 4 2 3 3 4 ...
## $ Deaths : Factor w/ 465 levels ".","10","1007",..: 24 269 409 10 132 87 122 4 85 64 ...
## $ Death.Rate : Factor w/ 442 levels ".","10","10.1",..: 1 189 1 409 167 1 100 393 62 1 ...
## $ Age.Adjusted.Death.Rate: Factor w/ 427 levels ".","10","10.1",..: 1 156 1 424 231 1 77 412 225 1 ...
There are 956 observations after removing rows with “.” and 7 columns.
analysis1 <- nyc_data %>%
select(Year, Sex, Leading.Cause, DeathCount = ((as.numeric(as.character(Deaths)) ))) %>%
group_by(Year, Sex, Leading.Cause) %>%
summarise ( Total = sum(as.numeric(as.character(DeathCount))) ) %>%
select(Year, Sex, Leading.Cause,Total) %>%
arrange((Year), Sex ) %>%
top_n(1,Total)
kable(analysis1)
Year | Sex | Leading.Cause | Total |
---|---|---|---|
2007 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 11743 |
2007 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9698 |
2008 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 11605 |
2008 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9587 |
2009 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 10780 |
2009 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9304 |
2010 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9463 |
2010 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8464 |
2011 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9083 |
2011 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 7817 |
2012 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8777 |
2012 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 7954 |
2013 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8745 |
2013 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8014 |
2014 | F | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8556 |
2014 | M | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 7961 |
analysis2 <- nyc_data %>%
select(Year, Ethnicity=Race.Ethnicity, Leading.Cause, DeathCount=(as.numeric(as.character(Deaths)) )) %>%
group_by(Year, Ethnicity,Leading.Cause) %>%
summarise ( Total = sum(as.numeric(as.character(DeathCount))) ) %>%
select(Year, Ethnicity, Leading.Cause, Total) %>%
arrange(desc(Year), Ethnicity,Leading.Cause) %>%
top_n(1,Total)
kable(analysis2)
Year | Ethnicity | Leading.Cause | Total |
---|---|---|---|
2014 | Asian and Pacific Islander | Malignant Neoplasms (Cancer: C00-C97) | 1159 |
2014 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4152 |
2014 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2511 |
2014 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 210 |
2014 | Other Race/ Ethnicity | All Other Causes | 133 |
2014 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8497 |
2013 | Asian and Pacific Islander | Malignant Neoplasms (Cancer: C00-C97) | 1100 |
2013 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4233 |
2013 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2568 |
2013 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 257 |
2013 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 113 |
2013 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8620 |
2012 | Asian and Pacific Islander | Malignant Neoplasms (Cancer: C00-C97) | 1083 |
2012 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4209 |
2012 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2514 |
2012 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 175 |
2012 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 87 |
2012 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 8875 |
2011 | Asian and Pacific Islander | Malignant Neoplasms (Cancer: C00-C97) | 1001 |
2011 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4083 |
2011 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2549 |
2011 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 105 |
2011 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 73 |
2011 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9236 |
2010 | Asian and Pacific Islander | Malignant Neoplasms (Cancer: C00-C97) | 940 |
2010 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4297 |
2010 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2670 |
2010 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 140 |
2010 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 102 |
2010 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 9846 |
2009 | Asian and Pacific Islander | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 1002 |
2009 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4603 |
2009 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2731 |
2009 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 205 |
2009 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 78 |
2009 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 11465 |
2008 | Asian and Pacific Islander | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 1002 |
2008 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4802 |
2008 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2775 |
2008 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 200 |
2008 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 74 |
2008 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12339 |
2007 | Asian and Pacific Islander | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 924 |
2007 | Black Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 4843 |
2007 | Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 2745 |
2007 | Not Stated/Unknown | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 168 |
2007 | Other Race/ Ethnicity | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 79 |
2007 | White Non-Hispanic | Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12682 |
wide <- nyc_data %>%
select(Year, Leading.Cause, Deaths) %>%
filter(Leading.Cause == "Diseases of Heart (I00-I09, I11, I13, I20-I51)") %>%
group_by(Year, Leading.Cause) %>%
summarise ( Total = sum(as.numeric(as.character(Deaths) ) )) %>%
spread(Year, Total)
kable(wide)
Leading.Cause | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
---|---|---|---|---|---|---|---|---|
Diseases of Heart (I00-I09, I11, I13, I20-I51) | 21441 | 21192 | 20084 | 17927 | 16900 | 16731 | 16759 | 16517 |
Projected total New York City population for five intervals from 2010 through 2040 by Borough, broken down by 18 age cohorts. (Age groups may not add up to the total due to rounding.)
The data set can be downloaded from https://data.cityofnewyork.us/api/views/97pn-acdf/rows.csv?accessType=DOWNLOAD
Borough
Age
2010
2015
2020
2025
2030
2035
2040
population<-read.csv("https://data.cityofnewyork.us/api/views/97pn-acdf/rows.csv?accessType=DOWNLOAD", sep = ",")
str(population)
## 'data.frame': 114 obs. of 9 variables:
## $ Borough: Factor w/ 6 levels "Bronx","Brooklyn",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Age : Factor w/ 19 levels "0-4","10-14",..: 1 3 4 5 6 7 8 9 11 12 ...
## $ X2010 : int 521990 539844 647483 736105 667657 592299 571825 570273 546204 479661 ...
## $ X2015 : int 535209 505783 646075 770396 707726 611239 550097 535998 552074 493997 ...
## $ X2020 : int 545778 492532 606203 763956 743916 649594 569628 517668 520597 501239 ...
## $ X2025 : int 547336 519298 591683 715824 740268 684249 606185 537516 504322 474319 ...
## $ X2030 : int 542426 535024 625253 698195 693684 682964 638148 571723 523815 459574 ...
## $ X2035 : int 540523 546062 643728 740437 675497 639237 637517 600792 556586 477052 ...
## $ X2040 : int 546426 546750 657403 762757 715486 621899 596493 600514 584164 506390 ...
This data set has 114 observations and 9 columns.
head(population)
## Borough Age X2010 X2015 X2020 X2025 X2030 X2035 X2040
## 1 NYC Total 0-4 521990 535209 545778 547336 542426 540523 546426
## 2 NYC Total 15-19 539844 505783 492532 519298 535024 546062 546750
## 3 NYC Total 20-24 647483 646075 606203 591683 625253 643728 657403
## 4 NYC Total 25-29 736105 770396 763956 715824 698195 740437 762757
## 5 NYC Total 30-34 667657 707726 743916 740268 693684 675497 715486
## 6 NYC Total 35-39 592299 611239 649594 684249 682964 639237 621899
# remove the rows with total in columns Borough and Age.
analysis2.1 <- population %>%
select(Borough,Age,X2020 ) %>%
filter( Borough != "NYC Total" ) %>%
filter( Age != "Total" ) %>%
spread(Age,X2020)
kable(analysis2.1)
Borough | 0-4 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 5-9 | 50-54 | 55-59 | 60-64 | 65-69 | 70-74 | 75-79 | 80-84 | 85+ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bronx | 109972 | 95703 | 98987 | 113827 | 127995 | 109797 | 93530 | 87217 | 83496 | 105775 | 89495 | 85632 | 73506 | 56988 | 43380 | 31168 | 20292 | 20028 |
Brooklyn | 186886 | 166761 | 158561 | 180072 | 221421 | 221544 | 205525 | 181452 | 159615 | 180252 | 152501 | 144647 | 137606 | 113940 | 92570 | 61041 | 41717 | 42341 |
Manhattan | 82096 | 60331 | 69487 | 118301 | 180070 | 190129 | 146480 | 110439 | 94732 | 68243 | 94242 | 88669 | 84257 | 76334 | 64227 | 47159 | 31115 | 31971 |
Queens | 138141 | 129351 | 134198 | 162692 | 200730 | 189015 | 172928 | 160306 | 149550 | 132342 | 151019 | 147504 | 137219 | 107634 | 82663 | 57287 | 38275 | 39441 |
Staten Island | 28683 | 30307 | 31299 | 31311 | 33740 | 33431 | 31131 | 30214 | 30275 | 29394 | 33340 | 34787 | 31599 | 25968 | 21232 | 13581 | 8259 | 8604 |
analysis2.2 <- population %>%
select(Borough,Age,X2020,X2040 ) %>%
filter( Borough != "NYC Total" ) %>%
filter( Age == "Total" ) %>%
select(Borough,X2020,X2040) %>%
mutate(Pct.Change = round(((X2040 - X2020)/X2040) * 100, 2)) %>%
rename(Year_2020=X2020, Year_2040=X2040)
kable(analysis2.2)
Borough | Year_2020 | Year_2040 | Pct.Change |
---|---|---|---|
Bronx | 1446788 | 1579245 | 8.39 |
Brooklyn | 2648452 | 2840525 | 6.76 |
Manhattan | 1638282 | 1691617 | 3.15 |
Queens | 2330295 | 2412649 | 3.41 |
Staten Island | 487155 | 501109 | 2.78 |
The most recent school level results for New York City on the SAT. Results are available at the school level for the graduating seniors of 2012. Records contain 2012 College-bound seniors mean SAT scores taken during SY 2012.
The data set can be downloaded from the below URL: https://data.cityofnewyork.us/api/views/f9bf-2cp4/rows.csv?accessType=DOWNLOAD
DBN
SCHOOL NAME
Num of SAT Test Takers
SAT Critical Reading Avg. Score
SAT Math Avg. Score
SAT Writing Avg. Score
sat_result<-read.csv("https://data.cityofnewyork.us/api/views/f9bf-2cp4/rows.csv?accessType=DOWNLOAD", sep = ",")
str(sat_result)
## 'data.frame': 478 obs. of 6 variables:
## $ DBN : Factor w/ 478 levels "01M292","01M448",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ SCHOOL.NAME : Factor w/ 478 levels "47 THE AMERICAN SIGN LANGUAGE AND ENGLISH SECONDARY SCHOOL",..: 204 438 141 169 309 296 333 110 38 1 ...
## $ Num.of.SAT.Test.Takers : Factor w/ 175 levels "10","101","102",..: 68 166 136 135 98 11 35 42 20 36 ...
## $ SAT.Critical.Reading.Avg..Score: Factor w/ 164 levels "279","287","300",..: 34 62 56 93 69 17 145 95 159 74 ...
## $ SAT.Math.Avg..Score : Factor w/ 173 levels "312","315","317",..: 70 87 68 67 94 153 158 82 164 66 ...
## $ SAT.Writing.Avg..Score : Factor w/ 163 levels "286","291","297",..: 45 48 52 41 66 11 147 92 158 69 ...
# remove the rows that does not have scores
sat_result1 <- sat_result %>%
filter(Num.of.SAT.Test.Takers != 's' )
str(sat_result1)
## 'data.frame': 421 obs. of 6 variables:
## $ DBN : Factor w/ 478 levels "01M292","01M448",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ SCHOOL.NAME : Factor w/ 478 levels "47 THE AMERICAN SIGN LANGUAGE AND ENGLISH SECONDARY SCHOOL",..: 204 438 141 169 309 296 333 110 38 1 ...
## $ Num.of.SAT.Test.Takers : Factor w/ 175 levels "10","101","102",..: 68 166 136 135 98 11 35 42 20 36 ...
## $ SAT.Critical.Reading.Avg..Score: Factor w/ 164 levels "279","287","300",..: 34 62 56 93 69 17 145 95 159 74 ...
## $ SAT.Math.Avg..Score : Factor w/ 173 levels "312","315","317",..: 70 87 68 67 94 153 158 82 164 66 ...
## $ SAT.Writing.Avg..Score : Factor w/ 163 levels "286","291","297",..: 45 48 52 41 66 11 147 92 158 69 ...
# split the DBN column to School district and code
school_District <- substring(sat_result1$DBN,1,3 )
school_code <- substring(sat_result1$DBN,4,6 )
#change the name of the columns
names(sat_result1)[3:6] <- c('No_of_SAT_takers','Reading_Score','Math_Score','Writing_Score')
sat_result2 <- sat_result1 %>%
mutate(school_District, school_code) %>%
select(school_District, school_code,SCHOOL.NAME,No_of_SAT_takers,Reading_Score,Math_Score,Writing_Score )
str(sat_result1)
## 'data.frame': 421 obs. of 6 variables:
## $ DBN : Factor w/ 478 levels "01M292","01M448",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ SCHOOL.NAME : Factor w/ 478 levels "47 THE AMERICAN SIGN LANGUAGE AND ENGLISH SECONDARY SCHOOL",..: 204 438 141 169 309 296 333 110 38 1 ...
## $ No_of_SAT_takers: Factor w/ 175 levels "10","101","102",..: 68 166 136 135 98 11 35 42 20 36 ...
## $ Reading_Score : Factor w/ 164 levels "279","287","300",..: 34 62 56 93 69 17 145 95 159 74 ...
## $ Math_Score : Factor w/ 173 levels "312","315","317",..: 70 87 68 67 94 153 158 82 164 66 ...
## $ Writing_Score : Factor w/ 163 levels "286","291","297",..: 45 48 52 41 66 11 147 92 158 69 ...
This data set has 421 observations and 6 columns.
sat_result2$school_District = as.factor(sat_result2$school_District)
sat_result2$Reading_Score = as.numeric(sat_result2$Reading_Score)
sat_result2$Math_Score = as.numeric(sat_result2$Math_Score )
sat_result2$Writing_Score = as.numeric(sat_result2$Writing_Score)
school_Rank<-round(rank( -(sat_result2$Reading_Score+sat_result2$Math_Score+sat_result2$Writing_Score )))
sat_result3 <- sat_result2 %>%
mutate(school_Rank) %>%
select(school_District, school_code,SCHOOL.NAME,No_of_SAT_takers,Reading_Score,Math_Score,Writing_Score,school_Rank ) %>%
arrange(school_Rank) %>%
top_n(-10,school_Rank)
kable(sat_result3)
school_District | school_code | SCHOOL.NAME | No_of_SAT_takers | Reading_Score | Math_Score | Writing_Score | school_Rank |
---|---|---|---|---|---|---|---|
02M | 475 | STUYVESANT HIGH SCHOOL | 832 | 163 | 172 | 162 | 1 |
10X | 445 | BRONX HIGH SCHOOL OF SCIENCE | 731 | 160 | 171 | 161 | 2 |
31R | 605 | STATEN ISLAND TECHNICAL HIGH SCHOOL | 227 | 161 | 170 | 159 | 3 |
10X | 696 | HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE | 92 | 162 | 165 | 159 | 4 |
25Q | 525 | TOWNSEND HARRIS HIGH SCHOOL | 278 | 158 | 166 | 160 | 5 |
28Q | 687 | QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK COLLEGE | 121 | 157 | 169 | 157 | 6 |
01M | 696 | BARD HIGH SCHOOL EARLY COLLEGE | 130 | 159 | 164 | 158 | 7 |
05M | 692 | HIGH SCHOOL FOR MATHEMATICS, SCIENCE AND ENGINEERING AT CITY COLLEGE | 101 | 156 | 167 | 155 | 8 |
13K | 430 | BROOKLYN TECHNICAL HIGH SCHOOL | 1277 | 155 | 168 | 154 | 9 |
02M | 416 | ELEANOR ROOSEVELT HIGH SCHOOL | 127 | 152 | 163 | 156 | 10 |
Total_score = (sat_result2$Reading_Score + sat_result2$Math_Score + sat_result2$Writing_Score)
sat_result4 <- sat_result2 %>%
select( school_District, Reading_Score,Math_Score ,Writing_Score ) %>%
mutate(Total_score) %>%
group_by ( school_District) %>%
summarise(Avg_Reading_Score = round(mean(Reading_Score)), Avg_math_Score = round(mean(Math_Score)), Avg_writing_Score=round(mean(Writing_Score)), Totals = round(mean(Total_score))) %>%
top_n(5,Totals)
kable(sat_result4)
school_District | Avg_Reading_Score | Avg_math_Score | Avg_writing_Score | Totals |
---|---|---|---|---|
22K | 114 | 119 | 114 | 347 |
26Q | 112 | 121 | 113 | 346 |
30Q | 98 | 117 | 98 | 313 |
31R | 112 | 106 | 106 | 324 |
75Q | 104 | 103 | 114 | 321 |