CDC’s Division of Population Health provides cross-cutting set of 124 indicators that were developed by consensus and that allows states and territories and large metropolitan areas to uniformly define, collect, and report chronic disease data that are important to public health practice and available for states, territories and large metropolitan areas. In addition to providing access to state-specific indicator data, the CDI web site serves as a gateway to additional information and data resources.
Source: https://data.cdc.gov/Chronic-Disease-Indicators/U-S-Chronic-Disease-Indicators-CDI-/g4ie-h725
library(tidyverse)
library(knitr)
cd.data <- read.csv('C:\\DATA607\\project2\\U.S._Chronic_Disease_Indicators__CDI_.csv')
dim(cd.data)## [1] 814937 34
kable(head(cd.data))| YearStart | YearEnd | LocationAbbr | LocationDesc | DataSource | Topic | Question | Response | DataValueUnit | DataValueType | DataValue | DataValueAlt | DataValueFootnoteSymbol | DatavalueFootnote | LowConfidenceLimit | HighConfidenceLimit | StratificationCategory1 | Stratification1 | StratificationCategory2 | Stratification2 | StratificationCategory3 | Stratification3 | GeoLocation | ResponseID | LocationID | TopicID | QuestionID | DataValueTypeID | StratificationCategoryID1 | StratificationID1 | StratificationCategoryID2 | StratificationID2 | StratificationCategoryID3 | StratificationID3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017 | 2017 | RI | Rhode Island | YRBSS | Tobacco | Current smokeless tobacco use among youth | NA | % | Crude Prevalence | 8.0 | 8.0 | 5.2 | 12.3 | Gender | Male | NA | NA | NA | NA | (41.70828019300046, -71.52247031399963) | NA | 44 | TOB | TOB2_1 | CRDPREV | GENDER | GENM | NA | NA | NA | NA | ||
| 2013 | 2013 | MS | Mississippi | BRFSS | Arthritis | Fair or poor health among adults aged >= 18 years with arthritis | NA | % | Crude Prevalence | 46.9 | 46.9 | 42.3 | 51.6 | Gender | Male | NA | NA | NA | NA | (32.745510099000455, -89.53803082499968) | NA | 28 | ART | ART4_0 | CRDPREV | GENDER | GENM | NA | NA | NA | NA | ||
| 2016 | 2016 | KY | Kentucky | BRFSS | Cancer | Papanicolaou smear use among adult women aged 21-65 years | NA | % | Age-adjusted Prevalence | 76.9 | 76.9 | 59.4 | 88.3 | Race/Ethnicity | Multiracial, non-Hispanic | NA | NA | NA | NA | (37.645970271000465, -84.77497104799966) | NA | 21 | CAN | CAN2_1 | AGEADJPREV | RACE | MRC | NA | NA | NA | NA | ||
| 2015 | 2015 | KS | Kansas | BRFSS | Tobacco | Quit attempts in the past year among current smokers | NA | % | Age-adjusted Prevalence | 53.2 | 53.2 | 50.3 | 56.0 | Gender | Male | NA | NA | NA | NA | (38.34774030000045, -98.20078122699965) | NA | 20 | TOB | TOB3_0 | AGEADJPREV | GENDER | GENM | NA | NA | NA | NA | ||
| 2015 | 2015 | ND | North Dakota | ACS 1-Year Estimates | Overarching Conditions | Poverty | NA | % | Crude Prevalence | 8.9 | 8.9 | 8.2 | 9.6 | Race/Ethnicity | White, non-Hispanic | NA | NA | NA | NA | (47.47531977900047, -100.11842104899966) | NA | 38 | OVC | OVC3_1 | CRDPREV | RACE | WHT | NA | NA | NA | NA | ||
| 2016 | 2016 | ND | North Dakota | ACS 1-Year Estimates | Disability | Disability among adults aged >= 65 years | NA | % | Crude Prevalence | 35.0 | 35.0 | 32.5 | 37.5 | Gender | Male | NA | NA | NA | NA | (47.47531977900047, -100.11842104899966) | NA | 38 | DIS | DIS1_0 | CRDPREV | GENDER | GENM | NA | NA | NA | NA |
_ Data cleansing for national data _
cd.us.data <- cd.data %>%
select(YearStart,LocationAbbr,DataSource,Topic,Question,QuestionID,DataValueType,DataValue,DataValueAlt,Stratification1) %>%
filter(LocationAbbr=='US' , DataValueAlt!='NA')
kable(head(cd.us.data))| YearStart | LocationAbbr | DataSource | Topic | Question | QuestionID | DataValueType | DataValue | DataValueAlt | Stratification1 |
|---|---|---|---|---|---|---|---|---|---|
| 2015 | US | NVSS | Alcohol | Chronic liver disease mortality | ALC6_0 | Number | 40326 | 40326.0 | Overall |
| 2011 | US | NVSS | Asthma | Asthma mortality rate | AST4_1 | Crude Rate | 10.7 | 10.7 | Overall |
| 2010 | US | Death Certificate | Cancer | Cancer of the female breast, mortality | CAN5_2 | Average Annual Number | 41029 | 41029.0 | Overall |
| 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | ALC2_2 | Crude Prevalence | 11.6 | 11.6 | Female |
| 2018 | US | BRFSS | Cancer | Mammography use among women aged 50-74 years | CAN1_0 | Age-adjusted Prevalence | 77.9 | 77.9 | Overall |
| 2014 | US | BRFSS | Arthritis | Arthritis among adults aged >= 18 years who have diabetes | ART1_3 | Crude Prevalence | 51.9 | 51.9 | Overall |
unique(cd.us.data$DataValueType)## [1] Number
## [2] Crude Rate
## [3] Average Annual Number
## [4] Crude Prevalence
## [5] Age-adjusted Prevalence
## [6] Mean
## [7] Age-adjusted Rate
## [8] Age-adjusted Mean
## [9] Percent
## [10] Average Annual Crude Rate
## [11] Average Annual Age-adjusted Rate
## [12] Median
## [13] Per capita alcohol consumption
## [14] Adjusted by age, sex, race and ethnicity
## [15] Prevalence
## 19 Levels: Adjusted by age, sex, race and ethnicity ... Yes/No
unique(cd.us.data$YearStart)## [1] 2015 2011 2010 2018 2014 2017 2016 2013 2012 2009 2008 2001
cd.us.data.aprv <- cd.us.data %>%
filter(DataValueType=='Age-adjusted Prevalence')
nrow(cd.us.data.aprv)## [1] 1361
ggplot(data = cd.us.data.aprv, aes(x=Topic,y=DataValueAlt))+
geom_bar(stat = 'identity',aes(fill=Topic))+
facet_wrap(~Stratification1, ncol = 2)+
theme_bw()Diabetes appears to be most prevelant among Male, Femmale both and overall population also.
ggplot(data=filter(cd.us.data.aprv,Topic=='Alcohol'),aes(x=YearStart, y=DataValueAlt, group=Stratification1
,color=Stratification1)) +
geom_line()+
geom_point()+
facet_wrap(~Question,ncol = 1)+
theme_bw()Binge drinking for age >=18 are defintely higher in male population
cd.cp.2017 <- filter(cd.us.data,DataValueType=='Crude Prevalence',YearStart=='2017')
kable(head(cd.cp.2017))| YearStart | LocationAbbr | DataSource | Topic | Question | QuestionID | DataValueType | DataValue | DataValueAlt | Stratification1 |
|---|---|---|---|---|---|---|---|---|---|
| 2017 | US | ACS 1-Year Estimates | Overarching Conditions | Poverty | OVC3_1 | Crude Prevalence | 12.2 | 12.2 | Male |
| 2017 | US | ACS 1-Year Estimates | Disability | Disability among adults aged >= 65 years | DIS1_0 | Crude Prevalence | 47.4 | 47.4 | American Indian or Alaska Native |
| 2017 | US | YRBSS | Tobacco | Current smokeless tobacco use among youth | TOB2_1 | Crude Prevalence | 3.5 | 3.5 | Black, non-Hispanic |
| 2017 | US | BRFSS | Arthritis | Arthritis among adults aged >= 18 years who have diabetes | ART1_3 | Crude Prevalence | 43.7 | 43.7 | Male |
| 2017 | US | BRFSS | Arthritis | Arthritis among adults aged >= 18 years | ART1_1 | Crude Prevalence | 21.0 | 21.0 | Male |
| 2017 | US | BRFSS | Cardiovascular Disease | Influenza vaccination among noninstitutionalized adults aged >= 65 years with a history of coronary heart disease or stroke | CVD9_2 | Crude Prevalence | 66.0 | 66.0 | Male |
cd.cp.2018 <- filter(cd.us.data,DataValueType=='Crude Prevalence',YearStart=='2018')
kable(head(cd.cp.2018))| YearStart | LocationAbbr | DataSource | Topic | Question | QuestionID | DataValueType | DataValue | DataValueAlt | Stratification1 |
|---|---|---|---|---|---|---|---|---|---|
| 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | ALC2_2 | Crude Prevalence | 11.6 | 11.6 | Female |
| 2018 | US | BRFSS | Chronic Obstructive Pulmonary Disease | Influenza vaccination among noninstitutionalized adults aged >= 45 years with chronic obstructive pulmonary disease | COPD7_0 | Crude Prevalence | 47.9 | 47.9 | Male |
| 2018 | US | BRFSS | Tobacco | Quit attempts in the past year among current smokers | TOB3_0 | Crude Prevalence | 58.2 | 58.2 | Female |
| 2018 | US | BRFSS | Asthma | Pneumococcal vaccination among noninstitutionalized adults aged >= 65 years with asthma | AST6_2 | Crude Prevalence | 87.5 | 87.5 | Female |
| 2018 | US | BRFSS | Asthma | Pneumococcal vaccination among noninstitutionalized adults aged >= 65 years with asthma | AST6_2 | Crude Prevalence | 82.4 | 82.4 | Male |
| 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | ALC2_2 | Crude Prevalence | 21.2 | 21.2 | Male |
cd.cp.data.17.18.tidy <- merge(cd.cp.2017,cd.cp.2018,by.x =c('QuestionID','Stratification1'),
by.y = c('QuestionID','Stratification1'))
kable(head(cd.cp.data.17.18.tidy))| QuestionID | Stratification1 | YearStart.x | LocationAbbr.x | DataSource.x | Topic.x | Question.x | DataValueType.x | DataValue.x | DataValueAlt.x | YearStart.y | LocationAbbr.y | DataSource.y | Topic.y | Question.y | DataValueType.y | DataValue.y | DataValueAlt.y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALC2_2 | Female | 2017 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 12.4 | 12.4 | 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 11.6 | 11.6 |
| ALC2_2 | Male | 2017 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 22.3 | 22.3 | 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 21.2 | 21.2 |
| ALC2_2 | Overall | 2017 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 17.4 | 17.4 | 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among adults aged >= 18 years | Crude Prevalence | 16.2 | 16.2 |
| ALC2_3 | Overall | 2017 | US | BRFSS | Alcohol | Binge drinking prevalence among women aged 18-44 years | Crude Prevalence | 18.7 | 18.7 | 2018 | US | BRFSS | Alcohol | Binge drinking prevalence among women aged 18-44 years | Crude Prevalence | 18.0 | 18.0 |
| ALC5_1 | Female | 2017 | US | BRFSS | Alcohol | Heavy drinking among adults aged >= 18 years | Crude Prevalence | 5.8 | 5.8 | 2018 | US | BRFSS | Alcohol | Heavy drinking among adults aged >= 18 years | Crude Prevalence | 5.8 | 5.8 |
| ALC5_1 | Male | 2017 | US | BRFSS | Alcohol | Heavy drinking among adults aged >= 18 years | Crude Prevalence | 6.9 | 6.9 | 2018 | US | BRFSS | Alcohol | Heavy drinking among adults aged >= 18 years | Crude Prevalence | 7.2 | 7.2 |
colnames(cd.cp.data.17.18.tidy)## [1] "QuestionID" "Stratification1" "YearStart.x"
## [4] "LocationAbbr.x" "DataSource.x" "Topic.x"
## [7] "Question.x" "DataValueType.x" "DataValue.x"
## [10] "DataValueAlt.x" "YearStart.y" "LocationAbbr.y"
## [13] "DataSource.y" "Topic.y" "Question.y"
## [16] "DataValueType.y" "DataValue.y" "DataValueAlt.y"
cp.percent.change.2017.2018 <- cd.cp.data.17.18.tidy %>%
select(QuestionID,Question.x,Stratification1,YearStart.x,YearStart.y,DataSource.x,Topic.x,DataValue.x,DataValue.y)
kable(head(cp.percent.change.2017.2018))| QuestionID | Question.x | Stratification1 | YearStart.x | YearStart.y | DataSource.x | Topic.x | DataValue.x | DataValue.y |
|---|---|---|---|---|---|---|---|---|
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 12.4 | 11.6 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 22.3 | 21.2 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 17.4 | 16.2 |
| ALC2_3 | Binge drinking prevalence among women aged 18-44 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 18.7 | 18.0 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 5.8 | 5.8 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 6.9 | 7.2 |
cp.pc.17.18.clean <- rename(cp.percent.change.2017.2018,Question=Question.x,Year2017=YearStart.x,
Year2018=YearStart.y,DataSource=DataSource.x,Topic=Topic.x,
DataValue2018=DataValue.x,DataValue2017=DataValue.y)
kable(head(cp.pc.17.18.clean))| QuestionID | Question | Stratification1 | Year2017 | Year2018 | DataSource | Topic | DataValue2018 | DataValue2017 |
|---|---|---|---|---|---|---|---|---|
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 12.4 | 11.6 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 22.3 | 21.2 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 17.4 | 16.2 |
| ALC2_3 | Binge drinking prevalence among women aged 18-44 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 18.7 | 18.0 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 5.8 | 5.8 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 6.9 | 7.2 |
cp.percent.change <- cp.pc.17.18.clean %>%
mutate(PercentChange = (as.numeric(as.character(DataValue2018))-
as.numeric(as.character(DataValue2017)))/as.numeric(as.character(DataValue2017))*100)
kable(head(cp.percent.change))| QuestionID | Question | Stratification1 | Year2017 | Year2018 | DataSource | Topic | DataValue2018 | DataValue2017 | PercentChange |
|---|---|---|---|---|---|---|---|---|---|
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 12.4 | 11.6 | 6.896552 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 22.3 | 21.2 | 5.188679 |
| ALC2_2 | Binge drinking prevalence among adults aged >= 18 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 17.4 | 16.2 | 7.407407 |
| ALC2_3 | Binge drinking prevalence among women aged 18-44 years | Overall | 2017 | 2018 | BRFSS | Alcohol | 18.7 | 18.0 | 3.888889 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Female | 2017 | 2018 | BRFSS | Alcohol | 5.8 | 5.8 | 0.000000 |
| ALC5_1 | Heavy drinking among adults aged >= 18 years | Male | 2017 | 2018 | BRFSS | Alcohol | 6.9 | 7.2 | -4.166667 |
library(ggplot2)
ggplot(data = cp.percent.change, aes(x=Topic,y=PercentChange))+
geom_bar(stat = 'identity',aes(fill=Topic))+
facet_wrap(~Stratification1, ncol = 1)The Female stratification has visbile increase percentage change as compared to Male for Alcohol
cp.percent.change.arrange <- cp.percent.change %>%
arrange(desc(PercentChange))
ggplot(data = head(cp.percent.change.arrange), aes(x=Topic,y=PercentChange))+
geom_bar(stat = 'identity',aes(fill=Topic))+
facet_wrap(~Stratification1, ncol = 1) Top percentage increase in CDI topics are Asthma, Diabetes and Tobacco
The world made remarkable progress in child survival in the past few decades, and millions of children have better survival chances than in 1990–5 1 in 26 children died before reaching age five in 2018, compared to 1 in 11 in 1990. Moreover, progress in reducing child mortality has been accelerated in the 2000–2018 period compared with the 1990s, with the annual rate of reduction in the global under-five mortality rate increasing from 2.0 per cent in 1990–2000 to 3.8 per cent in 2000–2018. Despite the global progress in reducing child mortality over the past few decades, an estimated 5.3 million children under age five died in 2018–roughly half of those deaths occurred in sub-Saharan Africa.
Source: https://data.unicef.org/topic/child-survival/under-five-mortality/
unicef.data <- read.csv('https://raw.githubusercontent.com/keshaws/CUNY_MSDS_2020/master/DATA607/unicef-u5mr_2.csv')
head(unicef.data)## CountryName U5MR.1950 U5MR.1951 U5MR.1952 U5MR.1953 U5MR.1954
## 1 Afghanistan NA NA NA NA NA
## 2 Albania NA NA NA NA NA
## 3 Algeria NA NA NA NA 251
## 4 Andorra NA NA NA NA NA
## 5 Angola NA NA NA NA NA
## 6 Antigua & Barbuda NA NA NA NA NA
## U5MR.1955 U5MR.1956 U5MR.1957 U5MR.1958 U5MR.1959 U5MR.1960 U5MR.1961
## 1 NA NA NA NA NA NA 356.5
## 2 NA NA NA NA NA NA NA
## 3 249.9 249 248 247.5 246.7 246.3 246.1
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## U5MR.1962 U5MR.1963 U5MR.1964 U5MR.1965 U5MR.1966 U5MR.1967 U5MR.1968
## 1 350.6 345.0 339.7 334.1 328.7 323.3 318.1
## 2 NA NA NA NA NA NA NA
## 3 246.2 246.8 247.4 248.2 248.7 248.4 247.4
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## U5MR.1969 U5MR.1970 U5MR.1971 U5MR.1972 U5MR.1973 U5MR.1974 U5MR.1975
## 1 313.0 307.8 302.1 296.4 290.8 284.9 279.4
## 2 NA NA NA NA NA NA NA
## 3 245.3 241.7 236.5 230.0 222.5 214.2 205.0
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## U5MR.1976 U5MR.1977 U5MR.1978 U5MR.1979 U5MR.1980 U5MR.1981 U5MR.1982
## 1 273.6 267.8 261.6 255.5 249.1 242.7 236.2
## 2 NA NA 91.1 84.7 78.6 73.0 67.8
## 3 195.2 184.9 173.8 161.8 148.1 132.5 115.8
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA 234.1 232.8 231.5
## 6 NA NA NA NA NA NA NA
## U5MR.1983 U5MR.1984 U5MR.1985 U5MR.1986 U5MR.1987 U5MR.1988 U5MR.1989
## 1 229.7 222.9 216.0 209.2 202.1 195.0 187.8
## 2 62.8 58.3 54.3 50.7 47.6 44.9 42.5
## 3 99.2 83.8 71.2 61.9 55.4 51.2 48.5
## 4 NA NA NA NA NA NA NA
## 5 230.2 229.1 228.3 227.5 226.9 226.5 226.2
## 6 NA NA NA NA NA NA NA
## U5MR.1990 U5MR.1991 U5MR.1992 U5MR.1993 U5MR.1994 U5MR.1995 U5MR.1996
## 1 181.0 174.2 167.8 162.0 156.8 152.3 148.6
## 2 40.6 38.8 37.3 36.0 34.6 33.2 31.8
## 3 46.8 45.7 44.9 44.1 43.3 42.5 41.8
## 4 8.5 7.9 7.4 6.9 6.4 6.0 5.7
## 5 226.0 225.9 226.0 225.8 225.5 224.8 224.0
## 6 25.5 24.2 23.1 21.9 20.8 19.7 18.8
## U5MR.1997 U5MR.1998 U5MR.1999 U5MR.2000 U5MR.2001 U5MR.2002 U5MR.2003
## 1 145.5 142.6 139.9 137.0 133.8 130.3 126.8
## 2 30.3 28.9 27.5 26.2 24.9 23.6 22.5
## 3 41.1 40.6 40.2 39.7 38.9 37.8 36.5
## 4 5.3 5.0 4.8 4.6 4.4 4.2 4.1
## 5 222.6 220.8 218.9 216.7 214.1 211.7 209.2
## 6 17.9 17.0 16.2 15.5 14.8 14.1 13.5
## U5MR.2004 U5MR.2005 U5MR.2006 U5MR.2007 U5MR.2008 U5MR.2009 U5MR.2010
## 1 123.2 119.6 116.3 113.2 110.4 107.6 105.0
## 2 21.5 20.5 19.5 18.7 17.9 17.3 16.6
## 3 35.1 33.6 32.1 30.7 29.4 28.3 27.3
## 4 4.0 3.9 3.7 3.6 3.5 3.4 3.3
## 5 206.7 203.9 200.5 196.4 192.0 187.3 182.5
## 6 12.9 12.4 11.8 11.3 10.9 10.4 9.9
## U5MR.2011 U5MR.2012 U5MR.2013 U5MR.2014 U5MR.2015
## 1 102.3 99.5 96.7 93.9 91.1
## 2 16.0 15.5 14.9 14.4 14.0
## 3 26.6 26.1 25.8 25.6 25.5
## 4 3.2 3.1 3.0 2.9 2.8
## 5 177.3 172.2 167.1 162.2 156.9
## 6 9.5 9.1 8.7 8.4 8.1
unique(unicef.data$CountryName)## [1] Afghanistan Albania
## [3] Algeria Andorra
## [5] Angola Antigua & Barbuda
## [7] Azerbaijan Argentina
## [9] Australia Austria
## [11] Bahamas Bahrain
## [13] Bangladesh Armenia
## [15] Barbados Belgium
## [17] Bhutan Bolivia
## [19] Bosnia & Herzegovina Botswana
## [21] Brazil Belize
## [23] Solomon Islands Brunei
## [25] Bulgaria Myanmar
## [27] Burundi Belarus
## [29] Cambodia Cameroon
## [31] Canada Cape Verde
## [33] Central African Republic Sri Lanka
## [35] Chad Chile
## [37] China Colombia
## [39] Comoros Congo
## [41] Congo DR Cook Islands
## [43] Costa Rica Croatia
## [45] Cuba Cyprus
## [47] Czech Republic Benin
## [49] Denmark Dominica
## [51] Dominican Republic Ecuador
## [53] El Salvador Equatorial Guinea
## [55] Ethiopia Eritrea
## [57] Estonia Fiji
## [59] Finland France
## [61] Djibouti Gabon
## [63] Georgia Gambia The
## [65] State of Palestine Germany
## [67] Ghana Kiribati
## [69] Greece Grenada
## [71] Guatemala Guinea
## [73] Guyana Haiti
## [75] Honduras Hungary
## [77] Iceland India
## [79] Indonesia Iran
## [81] Iraq Ireland
## [83] Israel Italy
## [85] Cote d Ivoire Jamaica
## [87] Japan Kazakhstan
## [89] Jordan Kenya
## [91] Korea DPR Korea Rep
## [93] Kuwait Kyrgyzstan
## [95] Lao PDR Lebanon
## [97] Lesotho Latvia
## [99] Liberia Libya
## [101] Liechtenstein Lithuania
## [103] Luxembourg Madagascar
## [105] Malawi Malaysia
## [107] Maldives Mali
## [109] Malta Mauritania
## [111] Mauritius Mexico
## [113] Monaco Mongolia
## [115] Moldova Montenegro
## [117] Morocco Mozambique
## [119] Oman Namibia
## [121] Nauru Nepal
## [123] Netherlands Vanuatu
## [125] New Zealand Nicaragua
## [127] Niger Nigeria
## [129] Niue Norway
## [131] Federated States of Micronesia Marshall Islands
## [133] Palau Pakistan
## [135] Panama Papua New Guinea
## [137] Paraguay Peru
## [139] Philippines Poland
## [141] Portugal Guinea-Bissau
## [143] Timor Leste Qatar
## [145] Romania Russian Federation
## [147] Rwanda Saint Kitts & Nevis
## [149] Saint Lucia St Vincent & the Grenadines
## [151] San Marino Sao Tome & Principe
## [153] Saudi Arabia Senegal
## [155] Serbia Seychelles
## [157] Sierra Leone Singapore
## [159] Slovakia Vietnam
## [161] Slovenia Somalia
## [163] South Africa Zimbabwe
## [165] Spain South Sudan
## [167] Sudan Suriname
## [169] Swaziland Sweden
## [171] Switzerland Syria
## [173] Tajikistan Thailand
## [175] Togo Tonga
## [177] Trinidad & Tobago United Arab Emirates
## [179] Tunisia Turkey
## [181] Turkmenistan Tuvalu
## [183] Uganda Ukraine
## [185] Macedonia Egypt
## [187] United Kingdom Tanzania
## [189] United States of America Burkina Faso
## [191] Uruguay Uzbekistan
## [193] Venezuela Samoa
## [195] Yemen Zambia
## 196 Levels: Afghanistan Albania Algeria Andorra ... Zimbabwe
There are 196 countries listed here in this dataset
library(tidyverse)
library(dplyr)
#Pivot all years into one column
population.data.US.tidy <- unicef.data %>%
filter(CountryName == "United States of America") %>%
gather(Year, Mortalityrate, U5MR.1950:U5MR.2015) %>%
separate(Year, c("left","Year"), sep="U5MR.") %>%
select(-left) %>%
arrange(Year)
head(population.data.US.tidy)## CountryName Year Mortalityrate
## 1 United States of America 1950 37.7
## 2 United States of America 1951 36.6
## 3 United States of America 1952 35.6
## 4 United States of America 1953 34.7
## 5 United States of America 1954 33.8
## 6 United States of America 1955 33.0
Now, extract 1995 value as the baseline value and add to US population dataframe
us_base <- population.data.US.tidy[population.data.US.tidy$Year==1995,3]
population.data.US.tidy$us_baseline <- us_base
population.data.US.tidy <- population.data.US.tidy %>%
mutate(us_change = (as.numeric(as.character(Mortalityrate))-as.numeric(as.character(us_baseline)))/as.numeric(as.character(us_baseline)))
head(population.data.US.tidy)## CountryName Year Mortalityrate us_baseline us_change
## 1 United States of America 1950 37.7 9.5 2.968421
## 2 United States of America 1951 36.6 9.5 2.852632
## 3 United States of America 1952 35.6 9.5 2.747368
## 4 United States of America 1953 34.7 9.5 2.652632
## 5 United States of America 1954 33.8 9.5 2.557895
## 6 United States of America 1955 33.0 9.5 2.473684
creating a world dataframe, clean and turn into long data format
world.data <- unicef.data %>%
filter(CountryName != "United States of America") %>%
gather(Year, Mortalityrate, U5MR.1950:U5MR.2015) %>%
separate(Year, c("left","Year"), sep="U5MR.") %>%
select(-left) %>%
arrange(Year) %>%
filter(Mortalityrate!="NA")
head(world.data, 20) ## CountryName Year Mortalityrate
## 1 Australia 1950 31.6
## 2 Canada 1950 48.7
## 3 Benin 1950 348.2
## 4 Denmark 1950 34.1
## 5 Dominican Republic 1950 156.0
## 6 Fiji 1950 135.7
## 7 France 1950 57.1
## 8 Ghana 1950 256.7
## 9 Guyana 1950 157.7
## 10 Iraq 1950 364.3
## 11 Ireland 1950 54.2
## 12 Italy 1950 88.9
## 13 Japan 1950 91.3
## 14 Jordan 1950 275.1
## 15 Mauritania 1950 316.4
## 16 Netherlands 1950 31.9
## 17 New Zealand 1950 35.5
## 18 Norway 1950 32.8
## 19 Senegal 1950 346.2
## 20 Seychelles 1950 118.5
createing baseline point for each country
world_base <- world.data %>%
filter(Year == 1995) %>%
select(CountryName, world_Baseline = Mortalityrate)
world.data <- world.data %>%
left_join(world_base) %>%
arrange(CountryName) %>%
mutate(Mortality_change = (as.numeric(as.character(Mortalityrate))-as.numeric(as.character(world_Baseline)))/
as.numeric(as.character(world_Baseline)))## Joining, by = "CountryName"
head((world.data))## CountryName Year Mortalityrate world_Baseline Mortality_change
## 1 Afghanistan 1961 356.5 152.3 1.340775
## 2 Afghanistan 1962 350.6 152.3 1.302035
## 3 Afghanistan 1963 345.0 152.3 1.265266
## 4 Afghanistan 1964 339.7 152.3 1.230466
## 5 Afghanistan 1965 334.1 152.3 1.193697
## 6 Afghanistan 1966 328.7 152.3 1.158240
world.data$Year <- as.numeric(world.data$Year)
population.data.US.tidy$Year <- as.numeric(population.data.US.tidy$Year)Now ranking the country - Top 5 and Bottom 5
rank <- world.data %>%
filter(Year == 2010) %>%
arrange(desc(Mortality_change)) %>%
mutate(rank = seq(1,length(CountryName), by=1)) %>%
filter(rank < 6 | rank > 45 )
kable(head(rank,10))| CountryName | Year | Mortalityrate | world_Baseline | Mortality_change | rank |
|---|---|---|---|---|---|
| Haiti | 2010 | 208.8 | 124.9 | 0.6717374 | 1 |
| Niue | 2010 | 26.2 | 17.9 | 0.4636872 | 2 |
| Dominica | 2010 | 18.8 | 16.5 | 0.1393939 | 3 |
| Lesotho | 2010 | 100.7 | 98.9 | 0.0182002 | 4 |
| Seychelles | 2010 | 14.2 | 14.5 | -0.0206897 | 5 |
| Switzerland | 2010 | 4.5 | 6.4 | -0.2968750 | 46 |
| Benin | 2010 | 111.6 | 159.1 | -0.2985544 | 47 |
| Philippines | 2010 | 31.9 | 45.6 | -0.3004386 | 48 |
| Jamaica | 2010 | 18.1 | 26.0 | -0.3038462 | 49 |
| Djibouti | 2010 | 76.1 | 110.2 | -0.3094374 | 50 |
We can see Haiti is topping the list where as Djibouti is at the bottom in the top 50 list.
p <- ggplot(world.data, aes(Year, Mortality_change, group=CountryName)) +
theme_bw() +
theme(plot.background = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(linetype = 3, colour = "grey50"),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank()) +
geom_line(colour="grey90", alpha=.9)
print(p)p <- p +
geom_line(data=population.data.US.tidy, aes(Year, us_change, group=1), linetype=5)
print(p)p <- p +
geom_line(data=filter(world.data, CountryName=="Switzerland"),
aes(Year, Mortality_change, group=CountryName), colour="dodgerblue", size = 1)
print(p)We looked into the Unicef under five mortality data and ranked them according to percent mortality change. We took 1995 as a baseline year and also drew plot for Switzerland mortality percent change.
The New York City Police Department records reported crime and offense data in accordance with the New York State Penal Law and other New York State laws. For statistical presentation purposes the numerous law categories and subsections are summarized by law class: felony, misdemeanor and violation. These legal categories are then subdivided into broad crime and offense categories, e.g., Felonious Assault, Grand Larceny, Misdemeanor Criminal Mischief, etc. The tabular data presented here compile reported crime and offense data recorded by the New York City Police Department from 2000 through 2015. Separate tables are presented for the seven major felonies, other felony crimes, misdemeanors, and violations.
Source: https://www1.nyc.gov/site/nypd/stats/crime-statistics/park-crime-stats.page
nyc.crime.data<- read.csv('https://raw.githubusercontent.com/keshaws/CUNY_MSDS_2020/master/DATA607/nycparkcrime.csv',skip = 3)
head(nyc.crime.data)## PARK BOROUGH SIZE..ACRES.
## 1 PELHAM BAY PARK BRONX 2771.75
## 2 VAN CORTLANDT PARK BRONX 1146.43
## 3 ROCKAWAY BEACH AND BOARDWALK QUEENS 1072.56
## 4 FRESHKILLS PARK STATEN ISLAND 913.32
## 5 FLUSHING MEADOWS CORONA PARK QUEENS 897.69
## 6 LATOURETTE PARK & GOLF COURSE STATEN ISLAND 843.97
## CATEGORY MURDER RAPE ROBBERY FELONY.ASSAULT BURGLARY
## 1 ONE ACRE OR LARGER 0 0 0 0 0
## 2 ONE ACRE OR LARGER 0 0 0 0 1
## 3 ONE ACRE OR LARGER 0 0 0 0 0
## 4 ONE ACRE OR LARGER 0 0 0 0 0
## 5 ONE ACRE OR LARGER 0 0 4 1 5
## 6 ONE ACRE OR LARGER 0 0 0 0 0
## GRAND.LARCENY GRAND.LARCENY.OF.MOTOR.VEHICLE TOTAL X
## 1 0 0 0 NA
## 2 0 0 1 NA
## 3 0 0 0 NA
## 4 0 0 0 NA
## 5 10 0 20 NA
## 6 0 0 0 NA
Checking the features of the dataset
colnames(nyc.crime.data)## [1] "PARK" "BOROUGH"
## [3] "SIZE..ACRES." "CATEGORY"
## [5] "MURDER" "RAPE"
## [7] "ROBBERY" "FELONY.ASSAULT"
## [9] "BURGLARY" "GRAND.LARCENY"
## [11] "GRAND.LARCENY.OF.MOTOR.VEHICLE" "TOTAL"
## [13] "X"
getting rid of X (the unknown attribute)
nyc.crime.data$X <- NULLnyctidy <- nyc.crime.data %>%
gather('Assault_Type', 'Instance_Rate', 5:11)
nyctidy <-select(nyctidy,-c(Instance_Rate))
kable(head(nyctidy, 10)) | PARK | BOROUGH | SIZE..ACRES. | CATEGORY | TOTAL | Assault_Type |
|---|---|---|---|---|---|
| PELHAM BAY PARK | BRONX | 2771.75 | ONE ACRE OR LARGER | 0 | MURDER |
| VAN CORTLANDT PARK | BRONX | 1146.43 | ONE ACRE OR LARGER | 1 | MURDER |
| ROCKAWAY BEACH AND BOARDWALK | QUEENS | 1072.56 | ONE ACRE OR LARGER | 0 | MURDER |
| FRESHKILLS PARK | STATEN ISLAND | 913.32 | ONE ACRE OR LARGER | 0 | MURDER |
| FLUSHING MEADOWS CORONA PARK | QUEENS | 897.69 | ONE ACRE OR LARGER | 20 | MURDER |
| LATOURETTE PARK & GOLF COURSE | STATEN ISLAND | 843.97 | ONE ACRE OR LARGER | 0 | MURDER |
| MARINE PARK | BROOKLYN | 798.00 | ONE ACRE OR LARGER | 2 | MURDER |
| BELT PARKWAY/SHORE PARKWAY | BROOKLYN/QUEENS | 760.43 | ONE ACRE OR LARGER | 0 | MURDER |
| BRONX PARK | BRONX | 718.37 | ONE ACRE OR LARGER | 6 | MURDER |
| FRANKLIN D. ROOSEVELT BOARDWALK AND BEACH | STATEN ISLAND | 644.35 | ONE ACRE OR LARGER | 0 | MURDER |
Summarizing the dataset per borough
summarise_at(group_by(nyctidy,BOROUGH),vars(TOTAL),funs(sum(.,na.rm=TRUE)))## Warning: funs() is soft deprecated as of dplyr 0.8.0
## please use list() instead
##
## # Before:
## funs(name = f(.)
##
## # After:
## list(name = ~f(.))
## This warning is displayed once per session.
## # A tibble: 7 x 2
## BOROUGH TOTAL
## <fct> <int>
## 1 "" 1736
## 2 "BRONX " 343
## 3 "BROOKLYN " 364
## 4 BROOKLYN/QUEENS 7
## 5 "MANHATTAN " 609
## 6 "QUEENS " 399
## 7 STATEN ISLAND 14
Bronx clearly has highest crime rate followed by Manhattan.
Plotting the data for crimes by category of park
ggplot(nyctidy,aes(x=CATEGORY,y=TOTAL))+
geom_bar(aes(fill=TOTAL),stat='identity')+
ggtitle('Crimes by Category of Park')+
coord_flip()From the analysis it appears that the most of the crimes occur in one acre or large parks.
ggplot(nyctidy,aes(x=BOROUGH,y=TOTAL))+
geom_bar(aes(fill=CATEGORY),stat='identity')+
ggtitle('Types of Park - Borough-wise')+
coord_flip()Also, when plotting borough wise, Manhattan appears to be topping the list