Aim And Methodology:

  • Aim:

The goal of this project is to:

  • Show the merit and demerit of visual analytics on data analysis, and how to improve it.
  • Provide answer to the question, “Does demographics like Age, Sex, Location have impact on types of death, economic status?”
  • Make the visualized image(s) tells the story in an unambiguous way, understandable, even to a layman!

  • Methodology:

The project would lay more emphasis on the explanatory techniques. It will be used in making data presentation to the viewers in a more succinct way. I therefore plan to use the R programing language to explore and analysis the dataset.

The dataset to be used is the World Health Nutrition and Population Statistics from year 2010 to 2016 . This can be obtained from http://databank.worldbank.org/data/reports.aspx?source=health-nutrition-and-population-statistics#advancedDownloadOptions .

Load this libraries and dataset and lets get to work!

Glimpse

  • Although the dataset contain lots of missing values, we will deal with it!
suppressMessages(library(knitr))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(sqldf))
suppressPackageStartupMessages(library(googleVis))
df <- read.csv("WorldHealth.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
kable(head(df[200:206, ]))
Year_Code Country_Name Country_Code Adults_15_living_HIV Adults_Children_0_14_15_living_HIV AIDS_estimated_deaths_UNAIDS Adults_children_0_14_15_newly_infected_HIV Adults_15_newly_infected_HIV Children_0_14_living_with_HIV Children_orphaned_by_HIV_AIDS Children_0_14_newly_infected_HIV Incidence_tuberculosis_per_100000 Labor_force_total Mortality_traffic_injury_100K Population_female Population_male Population_total Malaria_cases_reported Suicide_mortality_per_100K Tuberculosis_death_per_100K Tuberculosis_case_detection Tuberculosis_treatment_success_NewCases
200 YR2010 Uganda UGA 1100000 1300000 63000 99000 73000 190000 1400000 26000 210.0 14270644 29.50000 17074596 16840537 33915133 1581160 6.90000 18.00 62 68
201 YR2010 Ukraine UKR 200000 200000 12000 16000 16000 3100 39000 500 110.0 23063322 15.10000 24698001 21172699 45870700 NA 22.50000 17.00 67 69
202 YR2010 United Arab Emirates ARE NA NA NA NA NA NA NA NA 2.4 5781880 11.70000 2109194 6161490 8270684 NA 3.50000 0.49 65 72
203 YR2010 United Kingdom GBR NA NA NA NA NA NA NA NA 14.0 32127930 3.70000 31932131 30834234 62766365 NA 7.70000 0.59 89 81
204 YR2010 United States USA 1000000 NA NA NA 43000 NA NA NA 4.1 157264492 11.50000 156551423 152796770 309348193 NA 12.90000 0.20 87 66
205 YR2010 Upper middle income UMC NA NA NA NA NA NA NA NA 92.0 1291951878 20.46007 1221792253 1242542506 2464534816 NA 10.07323 6.20 80 81
  • World longitudes and latitudes
lat_long <- read.csv("Countries_long_lat2.csv", header = TRUE, sep = ",")
colnames(lat_long) <- c("Country", "Country_Code", "Latitude", "Longtitude")
kable(head(lat_long))
Country Country_Code Latitude Longtitude
Albania ALB 41.0000 20.0000
Algeria DZA 28.0000 3.0000
American Samoa ASM -14.3333 -170.0000
Andorra AND 42.5000 1.6000
Angola AGO -12.5000 18.5000
Anguilla AIA 18.2500 -63.1667

Cleaning and renaming of dataset and column respectively.

options(warn = -1)
df2 <- merge(df, lat_long, by.x = "Country_Code", by.y = "Country_Code", all = FALSE)
df2[, 4:19] <- sapply(df2[, 4:19], as.numeric) # Covet the columns to Numeric
kable(head(df2))
Country_Code Year_Code Country_Name Adults_15_living_HIV Adults_Children_0_14_15_living_HIV AIDS_estimated_deaths_UNAIDS Adults_children_0_14_15_newly_infected_HIV Adults_15_newly_infected_HIV Children_0_14_living_with_HIV Children_orphaned_by_HIV_AIDS Children_0_14_newly_infected_HIV Incidence_tuberculosis_per_100000 Labor_force_total Mortality_traffic_injury_100K Population_female Population_male Population_total Malaria_cases_reported Suicide_mortality_per_100K Tuberculosis_death_per_100K Tuberculosis_case_detection Tuberculosis_treatment_success_NewCases Country Latitude Longtitude
ABW YR2011 Aruba NA NA NA NA NA NA NA NA 9.0 NA NA 53404 48649 102053 NA NA 0.74 87 93 Aruba 12.52111 -69.9667
ABW YR2014 Aruba NA NA NA NA NA NA NA NA 2.2 NA NA 54417 49378 103795 NA NA 0.18 87 NA Aruba 12.52111 -69.9667
ABW YR2010 Aruba NA NA NA NA NA NA NA NA 6.8 NA NA 53202 48467 101669 NA NA 0.56 87 NA Aruba 12.52111 -69.9667
ABW YR2015 Aruba NA NA NA NA NA NA NA NA 12.0 NA NA 54743 49598 104341 NA NA 1.00 NA NA Aruba 12.52111 -69.9667
ABW YR2013 Aruba NA NA NA NA NA NA NA NA 12.0 NA NA 54060 49127 103187 NA NA 1.00 87 NA Aruba 12.52111 -69.9667
ABW YR2012 Aruba NA NA NA NA NA NA NA NA 31.0 NA NA 53701 48876 102577 NA NA 2.60 87 NA Aruba 12.52111 -69.9667

Merging column lonitude and Latitude together for a better coordinate to be in maps (googlevis)

df2$Lat_Long = paste(df2$Latitude, df2$Longtitude, sep=":")
kable(head(df2))
Country_Code Year_Code Country_Name Adults_15_living_HIV Adults_Children_0_14_15_living_HIV AIDS_estimated_deaths_UNAIDS Adults_children_0_14_15_newly_infected_HIV Adults_15_newly_infected_HIV Children_0_14_living_with_HIV Children_orphaned_by_HIV_AIDS Children_0_14_newly_infected_HIV Incidence_tuberculosis_per_100000 Labor_force_total Mortality_traffic_injury_100K Population_female Population_male Population_total Malaria_cases_reported Suicide_mortality_per_100K Tuberculosis_death_per_100K Tuberculosis_case_detection Tuberculosis_treatment_success_NewCases Country Latitude Longtitude Lat_Long
ABW YR2011 Aruba NA NA NA NA NA NA NA NA 9.0 NA NA 53404 48649 102053 NA NA 0.74 87 93 Aruba 12.52111 -69.9667 12.52111:-69.9667
ABW YR2014 Aruba NA NA NA NA NA NA NA NA 2.2 NA NA 54417 49378 103795 NA NA 0.18 87 NA Aruba 12.52111 -69.9667 12.52111:-69.9667
ABW YR2010 Aruba NA NA NA NA NA NA NA NA 6.8 NA NA 53202 48467 101669 NA NA 0.56 87 NA Aruba 12.52111 -69.9667 12.52111:-69.9667
ABW YR2015 Aruba NA NA NA NA NA NA NA NA 12.0 NA NA 54743 49598 104341 NA NA 1.00 NA NA Aruba 12.52111 -69.9667 12.52111:-69.9667
ABW YR2013 Aruba NA NA NA NA NA NA NA NA 12.0 NA NA 54060 49127 103187 NA NA 1.00 87 NA Aruba 12.52111 -69.9667 12.52111:-69.9667
ABW YR2012 Aruba NA NA NA NA NA NA NA NA 31.0 NA NA 53701 48876 102577 NA NA 2.60 87 NA Aruba 12.52111 -69.9667 12.52111:-69.9667

we are now to goint make use of sql to subset(query) columns so as to diffentiate between year 2000 and 2010 where the number children orphaned by HIV/AIDS more than 50000.

Twentyfirst_Cen <- sqldf("SELECT Country_Name, Year_Code, Lat_Long, Population_total, (Children_orphaned_by_HIV_AIDS/Population_total)*100 as 'Percentage_Orphaned_byHIV' FROM df2 where Percentage_Orphaned_byHIV >= 2 ORDER BY Percentage_Orphaned_byHIV DESC LIMIT 50" )
  • The world map showing the countries where children are orphaned by HIV?AIDS (2000, 2010)

Twenty First century coutries that are still battling with HIV/AIDS related deaths.

Malaria_Inc_Map <- gvisGeoChart(Twentyfirst_Cen, locationvar ="Lat_Long", hovervar ="Country_Name",sizevar = "Percentage_Orphaned_byHIV", colorvar = "Population_total",
options=list(displayMode="Markers",
colorAxis="{colors:['purple', 'red', 'orange', 'grey', 'pink']}",
backgroundColor="lightblue"), chartid="Lost_Their_Parents_To_HIV_AIDS")
plot(Malaria_Inc_Map )
  • From the map above, we can see that majority of countries where more that 50000 children lost their parents to HIV/AIDS are in southern part of Africa.

A better way to see for youself!

ggplotly(ggplot(Twentyfirst_Cen, aes(x=Percentage_Orphaned_byHIV, y=reorder(Country_Name, +Percentage_Orphaned_byHIV), fill=Year_Code)) +
geom_point(colour="purple", size=2, alpha=.8) +
scale_fill_brewer(palette="Blues", breaks=rev(levels(Twentyfirst_Cen$Year_Code))) + labs(title="Chart of Children Orphaned By HIV/AIDS BY Countries (%)"))
  • The chart below depicts the 20-21st Century Countries Where Most Children Are Losing Their Parent To HIV/AIDS
Combo <- gvisComboChart(Twentyfirst_Cen, xvar="Country_Name",
yvar="Percentage_Orphaned_byHIV",
options=list(seriesType="bars",
bar="{groupWidth:'100%'}",
title="Interactive Chart Of Countries And Related HIV/AIDS Orphaned ",
series='{0: {type:"line"}}'),chartid = "ER")
plot(Combo)

The chart shows that majority of the HIV/AIDS related death were rampant in the late Nineteen century than it were in the 20-21th century.

Tools And Packages

Tools To Be Used:

  • GoogleVis
  • Plotly
  • GGplot2
  • Sqldf

Packages To Be Used:

  • Plotly
  • Knitr
  • Dplyr
  • Plyr
  • MASS
  • Reshape2
  • Ggplot2
  • Graphics
  • Ggthemes
  • GoogleVis

etc