1 CDC Population Health Data

1.1 Data Acquisition

CDC’s Division of Population Health provides cross-cutting set of 124 indicators that were developed by consensus and that allows states and territories and large metropolitan areas to uniformly define, collect, and report chronic disease data that are important to public health practice and available for states, territories and large metropolitan areas. In addition to providing access to state-specific indicator data, the CDI web site serves as a gateway to additional information and data resources.

Source: https://data.cdc.gov/Chronic-Disease-Indicators/U-S-Chronic-Disease-Indicators-CDI-/g4ie-h725

library(tidyverse)
library(knitr)
cd.data <- read.csv('C:\\DATA607\\project2\\U.S._Chronic_Disease_Indicators__CDI_.csv')
dim(cd.data)
## [1] 814937     34
kable(head(cd.data))
YearStart YearEnd LocationAbbr LocationDesc DataSource Topic Question Response DataValueUnit DataValueType DataValue DataValueAlt DataValueFootnoteSymbol DatavalueFootnote LowConfidenceLimit HighConfidenceLimit StratificationCategory1 Stratification1 StratificationCategory2 Stratification2 StratificationCategory3 Stratification3 GeoLocation ResponseID LocationID TopicID QuestionID DataValueTypeID StratificationCategoryID1 StratificationID1 StratificationCategoryID2 StratificationID2 StratificationCategoryID3 StratificationID3
2017 2017 RI Rhode Island YRBSS Tobacco Current smokeless tobacco use among youth NA % Crude Prevalence 8.0 8.0 5.2 12.3 Gender Male NA NA NA NA (41.70828019300046, -71.52247031399963) NA 44 TOB TOB2_1 CRDPREV GENDER GENM NA NA NA NA
2013 2013 MS Mississippi BRFSS Arthritis Fair or poor health among adults aged >= 18 years with arthritis NA % Crude Prevalence 46.9 46.9 42.3 51.6 Gender Male NA NA NA NA (32.745510099000455, -89.53803082499968) NA 28 ART ART4_0 CRDPREV GENDER GENM NA NA NA NA
2016 2016 KY Kentucky BRFSS Cancer Papanicolaou smear use among adult women aged 21-65 years NA % Age-adjusted Prevalence 76.9 76.9 59.4 88.3 Race/Ethnicity Multiracial, non-Hispanic NA NA NA NA (37.645970271000465, -84.77497104799966) NA 21 CAN CAN2_1 AGEADJPREV RACE MRC NA NA NA NA
2015 2015 KS Kansas BRFSS Tobacco Quit attempts in the past year among current smokers NA % Age-adjusted Prevalence 53.2 53.2 50.3 56.0 Gender Male NA NA NA NA (38.34774030000045, -98.20078122699965) NA 20 TOB TOB3_0 AGEADJPREV GENDER GENM NA NA NA NA
2015 2015 ND North Dakota ACS 1-Year Estimates Overarching Conditions Poverty NA % Crude Prevalence 8.9 8.9 8.2 9.6 Race/Ethnicity White, non-Hispanic NA NA NA NA (47.47531977900047, -100.11842104899966) NA 38 OVC OVC3_1 CRDPREV RACE WHT NA NA NA NA
2016 2016 ND North Dakota ACS 1-Year Estimates Disability Disability among adults aged >= 65 years NA % Crude Prevalence 35.0 35.0 32.5 37.5 Gender Male NA NA NA NA (47.47531977900047, -100.11842104899966) NA 38 DIS DIS1_0 CRDPREV GENDER GENM NA NA NA NA

1.2 Data Cleansing

_ Data cleansing for national data _

cd.us.data <- cd.data %>%
  select(YearStart,LocationAbbr,DataSource,Topic,Question,QuestionID,DataValueType,DataValue,DataValueAlt,Stratification1) %>%
  filter(LocationAbbr=='US' , DataValueAlt!='NA')
kable(head(cd.us.data))
YearStart LocationAbbr DataSource Topic Question QuestionID DataValueType DataValue DataValueAlt Stratification1
2015 US NVSS Alcohol Chronic liver disease mortality ALC6_0 Number 40326 40326.0 Overall
2011 US NVSS Asthma Asthma mortality rate AST4_1 Crude Rate 10.7 10.7 Overall
2010 US Death Certificate Cancer Cancer of the female breast, mortality CAN5_2 Average Annual Number 41029 41029.0 Overall
2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years ALC2_2 Crude Prevalence 11.6 11.6 Female
2018 US BRFSS Cancer Mammography use among women aged 50-74 years CAN1_0 Age-adjusted Prevalence 77.9 77.9 Overall
2014 US BRFSS Arthritis Arthritis among adults aged >= 18 years who have diabetes ART1_3 Crude Prevalence 51.9 51.9 Overall

1.3 EDA (Exploratory Data Analysis)

unique(cd.us.data$DataValueType)
##  [1] Number                                  
##  [2] Crude Rate                              
##  [3] Average Annual Number                   
##  [4] Crude Prevalence                        
##  [5] Age-adjusted Prevalence                 
##  [6] Mean                                    
##  [7] Age-adjusted Rate                       
##  [8] Age-adjusted Mean                       
##  [9] Percent                                 
## [10] Average Annual Crude Rate               
## [11] Average Annual Age-adjusted Rate        
## [12] Median                                  
## [13] Per capita alcohol consumption          
## [14] Adjusted by age, sex, race and ethnicity
## [15] Prevalence                              
## 19 Levels: Adjusted by age, sex, race and ethnicity ... Yes/No
unique(cd.us.data$YearStart)
##  [1] 2015 2011 2010 2018 2014 2017 2016 2013 2012 2009 2008 2001
cd.us.data.aprv <- cd.us.data %>%
    filter(DataValueType=='Age-adjusted Prevalence')
nrow(cd.us.data.aprv)
## [1] 1361
ggplot(data = cd.us.data.aprv, aes(x=Topic,y=DataValueAlt))+
  geom_bar(stat = 'identity',aes(fill=Topic))+
  facet_wrap(~Stratification1, ncol = 2)+
  theme_bw()

Diabetes appears to be most prevelant among Male, Femmale both and overall population also.

ggplot(data=filter(cd.us.data.aprv,Topic=='Alcohol'),aes(x=YearStart, y=DataValueAlt, group=Stratification1
                                                                ,color=Stratification1)) +
    geom_line()+
    geom_point()+
    facet_wrap(~Question,ncol = 1)+
    theme_bw()

Binge drinking for age >=18 are defintely higher in male population

cd.cp.2017 <- filter(cd.us.data,DataValueType=='Crude Prevalence',YearStart=='2017')
kable(head(cd.cp.2017))
YearStart LocationAbbr DataSource Topic Question QuestionID DataValueType DataValue DataValueAlt Stratification1
2017 US ACS 1-Year Estimates Overarching Conditions Poverty OVC3_1 Crude Prevalence 12.2 12.2 Male
2017 US ACS 1-Year Estimates Disability Disability among adults aged >= 65 years DIS1_0 Crude Prevalence 47.4 47.4 American Indian or Alaska Native
2017 US YRBSS Tobacco Current smokeless tobacco use among youth TOB2_1 Crude Prevalence 3.5 3.5 Black, non-Hispanic
2017 US BRFSS Arthritis Arthritis among adults aged >= 18 years who have diabetes ART1_3 Crude Prevalence 43.7 43.7 Male
2017 US BRFSS Arthritis Arthritis among adults aged >= 18 years ART1_1 Crude Prevalence 21.0 21.0 Male
2017 US BRFSS Cardiovascular Disease Influenza vaccination among noninstitutionalized adults aged >= 65 years with a history of coronary heart disease or stroke CVD9_2 Crude Prevalence 66.0 66.0 Male
cd.cp.2018 <- filter(cd.us.data,DataValueType=='Crude Prevalence',YearStart=='2018')
kable(head(cd.cp.2018))
YearStart LocationAbbr DataSource Topic Question QuestionID DataValueType DataValue DataValueAlt Stratification1
2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years ALC2_2 Crude Prevalence 11.6 11.6 Female
2018 US BRFSS Chronic Obstructive Pulmonary Disease Influenza vaccination among noninstitutionalized adults aged >= 45 years with chronic obstructive pulmonary disease COPD7_0 Crude Prevalence 47.9 47.9 Male
2018 US BRFSS Tobacco Quit attempts in the past year among current smokers TOB3_0 Crude Prevalence 58.2 58.2 Female
2018 US BRFSS Asthma Pneumococcal vaccination among noninstitutionalized adults aged >= 65 years with asthma AST6_2 Crude Prevalence 87.5 87.5 Female
2018 US BRFSS Asthma Pneumococcal vaccination among noninstitutionalized adults aged >= 65 years with asthma AST6_2 Crude Prevalence 82.4 82.4 Male
2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years ALC2_2 Crude Prevalence 21.2 21.2 Male
cd.cp.data.17.18.tidy <- merge(cd.cp.2017,cd.cp.2018,by.x =c('QuestionID','Stratification1'),
                                     by.y = c('QuestionID','Stratification1'))
kable(head(cd.cp.data.17.18.tidy))
QuestionID Stratification1 YearStart.x LocationAbbr.x DataSource.x Topic.x Question.x DataValueType.x DataValue.x DataValueAlt.x YearStart.y LocationAbbr.y DataSource.y Topic.y Question.y DataValueType.y DataValue.y DataValueAlt.y
ALC2_2 Female 2017 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 12.4 12.4 2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 11.6 11.6
ALC2_2 Male 2017 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 22.3 22.3 2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 21.2 21.2
ALC2_2 Overall 2017 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 17.4 17.4 2018 US BRFSS Alcohol Binge drinking prevalence among adults aged >= 18 years Crude Prevalence 16.2 16.2
ALC2_3 Overall 2017 US BRFSS Alcohol Binge drinking prevalence among women aged 18-44 years Crude Prevalence 18.7 18.7 2018 US BRFSS Alcohol Binge drinking prevalence among women aged 18-44 years Crude Prevalence 18.0 18.0
ALC5_1 Female 2017 US BRFSS Alcohol Heavy drinking among adults aged >= 18 years Crude Prevalence 5.8 5.8 2018 US BRFSS Alcohol Heavy drinking among adults aged >= 18 years Crude Prevalence 5.8 5.8
ALC5_1 Male 2017 US BRFSS Alcohol Heavy drinking among adults aged >= 18 years Crude Prevalence 6.9 6.9 2018 US BRFSS Alcohol Heavy drinking among adults aged >= 18 years Crude Prevalence 7.2 7.2
colnames(cd.cp.data.17.18.tidy)
##  [1] "QuestionID"      "Stratification1" "YearStart.x"    
##  [4] "LocationAbbr.x"  "DataSource.x"    "Topic.x"        
##  [7] "Question.x"      "DataValueType.x" "DataValue.x"    
## [10] "DataValueAlt.x"  "YearStart.y"     "LocationAbbr.y" 
## [13] "DataSource.y"    "Topic.y"         "Question.y"     
## [16] "DataValueType.y" "DataValue.y"     "DataValueAlt.y"
cp.percent.change.2017.2018 <- cd.cp.data.17.18.tidy %>%
    select(QuestionID,Question.x,Stratification1,YearStart.x,YearStart.y,DataSource.x,Topic.x,DataValue.x,DataValue.y)
kable(head(cp.percent.change.2017.2018))
QuestionID Question.x Stratification1 YearStart.x YearStart.y DataSource.x Topic.x DataValue.x DataValue.y
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 12.4 11.6
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 22.3 21.2
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Overall 2017 2018 BRFSS Alcohol 17.4 16.2
ALC2_3 Binge drinking prevalence among women aged 18-44 years Overall 2017 2018 BRFSS Alcohol 18.7 18.0
ALC5_1 Heavy drinking among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 5.8 5.8
ALC5_1 Heavy drinking among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 6.9 7.2
cp.pc.17.18.clean <- rename(cp.percent.change.2017.2018,Question=Question.x,Year2017=YearStart.x,
                                           Year2018=YearStart.y,DataSource=DataSource.x,Topic=Topic.x,
                                            DataValue2018=DataValue.x,DataValue2017=DataValue.y)
kable(head(cp.pc.17.18.clean))
QuestionID Question Stratification1 Year2017 Year2018 DataSource Topic DataValue2018 DataValue2017
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 12.4 11.6
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 22.3 21.2
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Overall 2017 2018 BRFSS Alcohol 17.4 16.2
ALC2_3 Binge drinking prevalence among women aged 18-44 years Overall 2017 2018 BRFSS Alcohol 18.7 18.0
ALC5_1 Heavy drinking among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 5.8 5.8
ALC5_1 Heavy drinking among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 6.9 7.2
cp.percent.change <- cp.pc.17.18.clean %>%
    mutate(PercentChange = (as.numeric(as.character(DataValue2018))-
                   as.numeric(as.character(DataValue2017)))/as.numeric(as.character(DataValue2017))*100)
kable(head(cp.percent.change))
QuestionID Question Stratification1 Year2017 Year2018 DataSource Topic DataValue2018 DataValue2017 PercentChange
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 12.4 11.6 6.896552
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 22.3 21.2 5.188679
ALC2_2 Binge drinking prevalence among adults aged >= 18 years Overall 2017 2018 BRFSS Alcohol 17.4 16.2 7.407407
ALC2_3 Binge drinking prevalence among women aged 18-44 years Overall 2017 2018 BRFSS Alcohol 18.7 18.0 3.888889
ALC5_1 Heavy drinking among adults aged >= 18 years Female 2017 2018 BRFSS Alcohol 5.8 5.8 0.000000
ALC5_1 Heavy drinking among adults aged >= 18 years Male 2017 2018 BRFSS Alcohol 6.9 7.2 -4.166667
library(ggplot2)
ggplot(data = cp.percent.change, aes(x=Topic,y=PercentChange))+
  geom_bar(stat = 'identity',aes(fill=Topic))+
  facet_wrap(~Stratification1, ncol = 1)

The Female stratification has visbile increase percentage change as compared to Male for Alcohol

cp.percent.change.arrange <- cp.percent.change %>%
    arrange(desc(PercentChange))
ggplot(data = head(cp.percent.change.arrange), aes(x=Topic,y=PercentChange))+
  geom_bar(stat = 'identity',aes(fill=Topic))+
  facet_wrap(~Stratification1, ncol = 1) 

Top percentage increase in CDI topics are Asthma, Diabetes and Tobacco

2 Unicef Data

2.1 Data Acquisition

The world made remarkable progress in child survival in the past few decades, and millions of children have better survival chances than in 1990–5 1 in 26 children died before reaching age five in 2018, compared to 1 in 11 in 1990. Moreover, progress in reducing child mortality has been accelerated in the 2000–2018 period compared with the 1990s, with the annual rate of reduction in the global under-five mortality rate increasing from 2.0 per cent in 1990–2000 to 3.8 per cent in 2000–2018. Despite the global progress in reducing child mortality over the past few decades, an estimated 5.3 million children under age five died in 2018–roughly half of those deaths occurred in sub-Saharan Africa.

Source: https://data.unicef.org/topic/child-survival/under-five-mortality/

unicef.data <- read.csv('https://raw.githubusercontent.com/keshaws/CUNY_MSDS_2020/master/DATA607/unicef-u5mr_2.csv')
head(unicef.data)
##         CountryName U5MR.1950 U5MR.1951 U5MR.1952 U5MR.1953 U5MR.1954
## 1       Afghanistan        NA        NA        NA        NA        NA
## 2           Albania        NA        NA        NA        NA        NA
## 3           Algeria        NA        NA        NA        NA       251
## 4           Andorra        NA        NA        NA        NA        NA
## 5            Angola        NA        NA        NA        NA        NA
## 6 Antigua & Barbuda        NA        NA        NA        NA        NA
##   U5MR.1955 U5MR.1956 U5MR.1957 U5MR.1958 U5MR.1959 U5MR.1960 U5MR.1961
## 1        NA        NA        NA        NA        NA        NA     356.5
## 2        NA        NA        NA        NA        NA        NA        NA
## 3     249.9       249       248     247.5     246.7     246.3     246.1
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1962 U5MR.1963 U5MR.1964 U5MR.1965 U5MR.1966 U5MR.1967 U5MR.1968
## 1     350.6     345.0     339.7     334.1     328.7     323.3     318.1
## 2        NA        NA        NA        NA        NA        NA        NA
## 3     246.2     246.8     247.4     248.2     248.7     248.4     247.4
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1969 U5MR.1970 U5MR.1971 U5MR.1972 U5MR.1973 U5MR.1974 U5MR.1975
## 1     313.0     307.8     302.1     296.4     290.8     284.9     279.4
## 2        NA        NA        NA        NA        NA        NA        NA
## 3     245.3     241.7     236.5     230.0     222.5     214.2     205.0
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1976 U5MR.1977 U5MR.1978 U5MR.1979 U5MR.1980 U5MR.1981 U5MR.1982
## 1     273.6     267.8     261.6     255.5     249.1     242.7     236.2
## 2        NA        NA      91.1      84.7      78.6      73.0      67.8
## 3     195.2     184.9     173.8     161.8     148.1     132.5     115.8
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA     234.1     232.8     231.5
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1983 U5MR.1984 U5MR.1985 U5MR.1986 U5MR.1987 U5MR.1988 U5MR.1989
## 1     229.7     222.9     216.0     209.2     202.1     195.0     187.8
## 2      62.8      58.3      54.3      50.7      47.6      44.9      42.5
## 3      99.2      83.8      71.2      61.9      55.4      51.2      48.5
## 4        NA        NA        NA        NA        NA        NA        NA
## 5     230.2     229.1     228.3     227.5     226.9     226.5     226.2
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1990 U5MR.1991 U5MR.1992 U5MR.1993 U5MR.1994 U5MR.1995 U5MR.1996
## 1     181.0     174.2     167.8     162.0     156.8     152.3     148.6
## 2      40.6      38.8      37.3      36.0      34.6      33.2      31.8
## 3      46.8      45.7      44.9      44.1      43.3      42.5      41.8
## 4       8.5       7.9       7.4       6.9       6.4       6.0       5.7
## 5     226.0     225.9     226.0     225.8     225.5     224.8     224.0
## 6      25.5      24.2      23.1      21.9      20.8      19.7      18.8
##   U5MR.1997 U5MR.1998 U5MR.1999 U5MR.2000 U5MR.2001 U5MR.2002 U5MR.2003
## 1     145.5     142.6     139.9     137.0     133.8     130.3     126.8
## 2      30.3      28.9      27.5      26.2      24.9      23.6      22.5
## 3      41.1      40.6      40.2      39.7      38.9      37.8      36.5
## 4       5.3       5.0       4.8       4.6       4.4       4.2       4.1
## 5     222.6     220.8     218.9     216.7     214.1     211.7     209.2
## 6      17.9      17.0      16.2      15.5      14.8      14.1      13.5
##   U5MR.2004 U5MR.2005 U5MR.2006 U5MR.2007 U5MR.2008 U5MR.2009 U5MR.2010
## 1     123.2     119.6     116.3     113.2     110.4     107.6     105.0
## 2      21.5      20.5      19.5      18.7      17.9      17.3      16.6
## 3      35.1      33.6      32.1      30.7      29.4      28.3      27.3
## 4       4.0       3.9       3.7       3.6       3.5       3.4       3.3
## 5     206.7     203.9     200.5     196.4     192.0     187.3     182.5
## 6      12.9      12.4      11.8      11.3      10.9      10.4       9.9
##   U5MR.2011 U5MR.2012 U5MR.2013 U5MR.2014 U5MR.2015
## 1     102.3      99.5      96.7      93.9      91.1
## 2      16.0      15.5      14.9      14.4      14.0
## 3      26.6      26.1      25.8      25.6      25.5
## 4       3.2       3.1       3.0       2.9       2.8
## 5     177.3     172.2     167.1     162.2     156.9
## 6       9.5       9.1       8.7       8.4       8.1
unique(unicef.data$CountryName)
##   [1] Afghanistan                    Albania                       
##   [3] Algeria                        Andorra                       
##   [5] Angola                         Antigua & Barbuda             
##   [7] Azerbaijan                     Argentina                     
##   [9] Australia                      Austria                       
##  [11] Bahamas                        Bahrain                       
##  [13] Bangladesh                     Armenia                       
##  [15] Barbados                       Belgium                       
##  [17] Bhutan                         Bolivia                       
##  [19] Bosnia & Herzegovina           Botswana                      
##  [21] Brazil                         Belize                        
##  [23] Solomon Islands                Brunei                        
##  [25] Bulgaria                       Myanmar                       
##  [27] Burundi                        Belarus                       
##  [29] Cambodia                       Cameroon                      
##  [31] Canada                         Cape Verde                    
##  [33] Central African Republic       Sri Lanka                     
##  [35] Chad                           Chile                         
##  [37] China                          Colombia                      
##  [39] Comoros                        Congo                         
##  [41] Congo DR                       Cook Islands                  
##  [43] Costa Rica                     Croatia                       
##  [45] Cuba                           Cyprus                        
##  [47] Czech Republic                 Benin                         
##  [49] Denmark                        Dominica                      
##  [51] Dominican Republic             Ecuador                       
##  [53] El Salvador                    Equatorial Guinea             
##  [55] Ethiopia                       Eritrea                       
##  [57] Estonia                        Fiji                          
##  [59] Finland                        France                        
##  [61] Djibouti                       Gabon                         
##  [63] Georgia                        Gambia The                    
##  [65] State of Palestine             Germany                       
##  [67] Ghana                          Kiribati                      
##  [69] Greece                         Grenada                       
##  [71] Guatemala                      Guinea                        
##  [73] Guyana                         Haiti                         
##  [75] Honduras                       Hungary                       
##  [77] Iceland                        India                         
##  [79] Indonesia                      Iran                          
##  [81] Iraq                           Ireland                       
##  [83] Israel                         Italy                         
##  [85] Cote d Ivoire                  Jamaica                       
##  [87] Japan                          Kazakhstan                    
##  [89] Jordan                         Kenya                         
##  [91] Korea DPR                      Korea Rep                     
##  [93] Kuwait                         Kyrgyzstan                    
##  [95] Lao PDR                        Lebanon                       
##  [97] Lesotho                        Latvia                        
##  [99] Liberia                        Libya                         
## [101] Liechtenstein                  Lithuania                     
## [103] Luxembourg                     Madagascar                    
## [105] Malawi                         Malaysia                      
## [107] Maldives                       Mali                          
## [109] Malta                          Mauritania                    
## [111] Mauritius                      Mexico                        
## [113] Monaco                         Mongolia                      
## [115] Moldova                        Montenegro                    
## [117] Morocco                        Mozambique                    
## [119] Oman                           Namibia                       
## [121] Nauru                          Nepal                         
## [123] Netherlands                    Vanuatu                       
## [125] New Zealand                    Nicaragua                     
## [127] Niger                          Nigeria                       
## [129] Niue                           Norway                        
## [131] Federated States of Micronesia Marshall Islands              
## [133] Palau                          Pakistan                      
## [135] Panama                         Papua New Guinea              
## [137] Paraguay                       Peru                          
## [139] Philippines                    Poland                        
## [141] Portugal                       Guinea-Bissau                 
## [143] Timor Leste                    Qatar                         
## [145] Romania                        Russian Federation            
## [147] Rwanda                         Saint Kitts & Nevis           
## [149] Saint Lucia                    St Vincent & the Grenadines   
## [151] San Marino                     Sao Tome & Principe           
## [153] Saudi Arabia                   Senegal                       
## [155] Serbia                         Seychelles                    
## [157] Sierra Leone                   Singapore                     
## [159] Slovakia                       Vietnam                       
## [161] Slovenia                       Somalia                       
## [163] South Africa                   Zimbabwe                      
## [165] Spain                          South Sudan                   
## [167] Sudan                          Suriname                      
## [169] Swaziland                      Sweden                        
## [171] Switzerland                    Syria                         
## [173] Tajikistan                     Thailand                      
## [175] Togo                           Tonga                         
## [177] Trinidad & Tobago              United Arab Emirates          
## [179] Tunisia                        Turkey                        
## [181] Turkmenistan                   Tuvalu                        
## [183] Uganda                         Ukraine                       
## [185] Macedonia                      Egypt                         
## [187] United Kingdom                 Tanzania                      
## [189] United States of America       Burkina Faso                  
## [191] Uruguay                        Uzbekistan                    
## [193] Venezuela                      Samoa                         
## [195] Yemen                          Zambia                        
## 196 Levels: Afghanistan Albania Algeria Andorra ... Zimbabwe

There are 196 countries listed here in this dataset

2.2 Tidying the data

library(tidyverse)
library(dplyr)
#Pivot all years into one column
population.data.US.tidy <- unicef.data %>%
        filter(CountryName == "United States of America") %>%
        gather(Year, Mortalityrate, U5MR.1950:U5MR.2015) %>%
        separate(Year, c("left","Year"), sep="U5MR.") %>%
        select(-left) %>%
        arrange(Year)
head(population.data.US.tidy)
##                CountryName Year Mortalityrate
## 1 United States of America 1950          37.7
## 2 United States of America 1951          36.6
## 3 United States of America 1952          35.6
## 4 United States of America 1953          34.7
## 5 United States of America 1954          33.8
## 6 United States of America 1955          33.0

Now, extract 1995 value as the baseline value and add to US population dataframe

us_base <- population.data.US.tidy[population.data.US.tidy$Year==1995,3]
population.data.US.tidy$us_baseline <- us_base

population.data.US.tidy <- population.data.US.tidy %>% 
  mutate(us_change = (as.numeric(as.character(Mortalityrate))-as.numeric(as.character(us_baseline)))/as.numeric(as.character(us_baseline)))
head(population.data.US.tidy)
##                CountryName Year Mortalityrate us_baseline us_change
## 1 United States of America 1950          37.7         9.5  2.968421
## 2 United States of America 1951          36.6         9.5  2.852632
## 3 United States of America 1952          35.6         9.5  2.747368
## 4 United States of America 1953          34.7         9.5  2.652632
## 5 United States of America 1954          33.8         9.5  2.557895
## 6 United States of America 1955          33.0         9.5  2.473684

creating a world dataframe, clean and turn into long data format

world.data <- unicef.data %>%
        filter(CountryName != "United States of America") %>%
        gather(Year, Mortalityrate, U5MR.1950:U5MR.2015) %>%
        separate(Year, c("left","Year"), sep="U5MR.") %>%
        select(-left) %>%
        arrange(Year) %>%
        filter(Mortalityrate!="NA")
head(world.data, 20)        
##           CountryName Year Mortalityrate
## 1           Australia 1950          31.6
## 2              Canada 1950          48.7
## 3               Benin 1950         348.2
## 4             Denmark 1950          34.1
## 5  Dominican Republic 1950         156.0
## 6                Fiji 1950         135.7
## 7              France 1950          57.1
## 8               Ghana 1950         256.7
## 9              Guyana 1950         157.7
## 10               Iraq 1950         364.3
## 11            Ireland 1950          54.2
## 12              Italy 1950          88.9
## 13              Japan 1950          91.3
## 14             Jordan 1950         275.1
## 15         Mauritania 1950         316.4
## 16        Netherlands 1950          31.9
## 17        New Zealand 1950          35.5
## 18             Norway 1950          32.8
## 19            Senegal 1950         346.2
## 20         Seychelles 1950         118.5

createing baseline point for each country

world_base <- world.data %>%
        filter(Year == 1995) %>%
        select(CountryName, world_Baseline = Mortalityrate)

world.data <- world.data %>%
        left_join(world_base) %>%
        arrange(CountryName) %>%
        mutate(Mortality_change = (as.numeric(as.character(Mortalityrate))-as.numeric(as.character(world_Baseline)))/
                 as.numeric(as.character(world_Baseline)))
## Joining, by = "CountryName"
head((world.data))
##   CountryName Year Mortalityrate world_Baseline Mortality_change
## 1 Afghanistan 1961         356.5          152.3         1.340775
## 2 Afghanistan 1962         350.6          152.3         1.302035
## 3 Afghanistan 1963         345.0          152.3         1.265266
## 4 Afghanistan 1964         339.7          152.3         1.230466
## 5 Afghanistan 1965         334.1          152.3         1.193697
## 6 Afghanistan 1966         328.7          152.3         1.158240
world.data$Year <- as.numeric(world.data$Year)
population.data.US.tidy$Year <- as.numeric(population.data.US.tidy$Year)

Now ranking the country - Top 5 and Bottom 5

rank <- world.data %>% 
        filter(Year == 2010) %>% 
        arrange(desc(Mortality_change)) %>%
        mutate(rank = seq(1,length(CountryName), by=1)) %>%
        filter(rank < 6 | rank > 45  )
kable(head(rank,10))
CountryName Year Mortalityrate world_Baseline Mortality_change rank
Haiti 2010 208.8 124.9 0.6717374 1
Niue 2010 26.2 17.9 0.4636872 2
Dominica 2010 18.8 16.5 0.1393939 3
Lesotho 2010 100.7 98.9 0.0182002 4
Seychelles 2010 14.2 14.5 -0.0206897 5
Switzerland 2010 4.5 6.4 -0.2968750 46
Benin 2010 111.6 159.1 -0.2985544 47
Philippines 2010 31.9 45.6 -0.3004386 48
Jamaica 2010 18.1 26.0 -0.3038462 49
Djibouti 2010 76.1 110.2 -0.3094374 50

We can see Haiti is topping the list where as Djibouti is at the bottom in the top 50 list.

2.3 EDA

p <- ggplot(world.data, aes(Year, Mortality_change, group=CountryName)) +
        theme_bw() +
        theme(plot.background = element_blank(),
              panel.grid.minor = element_blank(),
              panel.grid.major.x = element_blank(),
              panel.grid.major.y = element_line(linetype = 3, colour = "grey50"),
              panel.border = element_blank(),
              panel.background = element_blank(),
              axis.ticks = element_blank(),  
              axis.title = element_blank()) +
        geom_line(colour="grey90", alpha=.9)

print(p)

p <- p +
        geom_line(data=population.data.US.tidy, aes(Year, us_change, group=1), linetype=5)

print(p)

p <- p +
        geom_line(data=filter(world.data, CountryName=="Switzerland"), 
                  aes(Year, Mortality_change, group=CountryName), colour="dodgerblue", size = 1)

print(p)

We looked into the Unicef under five mortality data and ranked them according to percent mortality change. We took 1995 as a baseline year and also drew plot for Switzerland mortality percent change.

3 3 NYC PArk Crime Data

3.1 Data Acquisuition

The New York City Police Department records reported crime and offense data in accordance with the New York State Penal Law and other New York State laws. For statistical presentation purposes the numerous law categories and subsections are summarized by law class: felony, misdemeanor and violation. These legal categories are then subdivided into broad crime and offense categories, e.g., Felonious Assault, Grand Larceny, Misdemeanor Criminal Mischief, etc. The tabular data presented here compile reported crime and offense data recorded by the New York City Police Department from 2000 through 2015. Separate tables are presented for the seven major felonies, other felony crimes, misdemeanors, and violations.

Source: https://www1.nyc.gov/site/nypd/stats/crime-statistics/park-crime-stats.page

nyc.crime.data<- read.csv('https://raw.githubusercontent.com/keshaws/CUNY_MSDS_2020/master/DATA607/nycparkcrime.csv',skip = 3)
head(nyc.crime.data)
##                            PARK       BOROUGH SIZE..ACRES.
## 1               PELHAM BAY PARK BRONX              2771.75
## 2            VAN CORTLANDT PARK BRONX              1146.43
## 3  ROCKAWAY BEACH AND BOARDWALK QUEENS             1072.56
## 4               FRESHKILLS PARK STATEN ISLAND       913.32
## 5  FLUSHING MEADOWS CORONA PARK QUEENS              897.69
## 6 LATOURETTE PARK & GOLF COURSE STATEN ISLAND       843.97
##             CATEGORY MURDER RAPE ROBBERY FELONY.ASSAULT BURGLARY
## 1 ONE ACRE OR LARGER      0    0       0              0        0
## 2 ONE ACRE OR LARGER      0    0       0              0        1
## 3 ONE ACRE OR LARGER      0    0       0              0        0
## 4 ONE ACRE OR LARGER      0    0       0              0        0
## 5 ONE ACRE OR LARGER      0    0       4              1        5
## 6 ONE ACRE OR LARGER      0    0       0              0        0
##   GRAND.LARCENY GRAND.LARCENY.OF.MOTOR.VEHICLE TOTAL  X
## 1             0                              0     0 NA
## 2             0                              0     1 NA
## 3             0                              0     0 NA
## 4             0                              0     0 NA
## 5            10                              0    20 NA
## 6             0                              0     0 NA

Checking the features of the dataset

colnames(nyc.crime.data)
##  [1] "PARK"                           "BOROUGH"                       
##  [3] "SIZE..ACRES."                   "CATEGORY"                      
##  [5] "MURDER"                         "RAPE"                          
##  [7] "ROBBERY"                        "FELONY.ASSAULT"                
##  [9] "BURGLARY"                       "GRAND.LARCENY"                 
## [11] "GRAND.LARCENY.OF.MOTOR.VEHICLE" "TOTAL"                         
## [13] "X"

getting rid of X (the unknown attribute)

nyc.crime.data$X <- NULL

3.2 Tidying the data

nyctidy <- nyc.crime.data %>%
  gather('Assault_Type', 'Instance_Rate', 5:11)
  nyctidy <-select(nyctidy,-c(Instance_Rate))
  
kable(head(nyctidy, 10))  
PARK BOROUGH SIZE..ACRES. CATEGORY TOTAL Assault_Type
PELHAM BAY PARK BRONX 2771.75 ONE ACRE OR LARGER 0 MURDER
VAN CORTLANDT PARK BRONX 1146.43 ONE ACRE OR LARGER 1 MURDER
ROCKAWAY BEACH AND BOARDWALK QUEENS 1072.56 ONE ACRE OR LARGER 0 MURDER
FRESHKILLS PARK STATEN ISLAND 913.32 ONE ACRE OR LARGER 0 MURDER
FLUSHING MEADOWS CORONA PARK QUEENS 897.69 ONE ACRE OR LARGER 20 MURDER
LATOURETTE PARK & GOLF COURSE STATEN ISLAND 843.97 ONE ACRE OR LARGER 0 MURDER
MARINE PARK BROOKLYN 798.00 ONE ACRE OR LARGER 2 MURDER
BELT PARKWAY/SHORE PARKWAY BROOKLYN/QUEENS 760.43 ONE ACRE OR LARGER 0 MURDER
BRONX PARK BRONX 718.37 ONE ACRE OR LARGER 6 MURDER
FRANKLIN D. ROOSEVELT BOARDWALK AND BEACH STATEN ISLAND 644.35 ONE ACRE OR LARGER 0 MURDER

3.3 Data Summary

Summarizing the dataset per borough

summarise_at(group_by(nyctidy,BOROUGH),vars(TOTAL),funs(sum(.,na.rm=TRUE)))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## please use list() instead
## 
## # Before:
## funs(name = f(.)
## 
## # After: 
## list(name = ~f(.))
## This warning is displayed once per session.
## # A tibble: 7 x 2
##   BOROUGH         TOTAL
##   <fct>           <int>
## 1 ""               1736
## 2 "BRONX        "   343
## 3 "BROOKLYN     "   364
## 4 BROOKLYN/QUEENS     7
## 5 "MANHATTAN    "   609
## 6 "QUEENS       "   399
## 7 STATEN ISLAND      14

Bronx clearly has highest crime rate followed by Manhattan.

3.4 EDA

Plotting the data for crimes by category of park

ggplot(nyctidy,aes(x=CATEGORY,y=TOTAL))+
  geom_bar(aes(fill=TOTAL),stat='identity')+
  ggtitle('Crimes by Category of Park')+
  coord_flip()

From the analysis it appears that the most of the crimes occur in one acre or large parks.

ggplot(nyctidy,aes(x=BOROUGH,y=TOTAL))+
  geom_bar(aes(fill=CATEGORY),stat='identity')+
  ggtitle('Types of Park - Borough-wise')+
  coord_flip()

Also, when plotting borough wise, Manhattan appears to be topping the list