The goal of this project is to:
Make the visualized image(s) tells the story in an unambiguous way, understandable, even to a layman!
Methodology:
The project would lay more emphasis on the explanatory techniques. It will be used in making data presentation to the viewers in a more succinct way. I therefore plan to use the R programing language to explore and analysis the dataset.
The dataset to be used is the World Health Nutrition and Population Statistics from year 2010 to 2016 . This can be obtained from http://databank.worldbank.org/data/reports.aspx?source=health-nutrition-and-population-statistics#advancedDownloadOptions .
Load this libraries and dataset and lets get to work!
suppressMessages(library(knitr))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(sqldf))
suppressPackageStartupMessages(library(googleVis))
df <- read.csv("WorldHealth.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
kable(head(df[200:206, ]))
| Year_Code | Country_Name | Country_Code | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | YR2010 | Uganda | UGA | 1100000 | 1300000 | 63000 | 99000 | 73000 | 190000 | 1400000 | 26000 | 210.0 | 14270644 | 29.50000 | 17074596 | 16840537 | 33915133 | 1581160 | 6.90000 | 18.00 | 62 | 68 |
| 201 | YR2010 | Ukraine | UKR | 200000 | 200000 | 12000 | 16000 | 16000 | 3100 | 39000 | 500 | 110.0 | 23063322 | 15.10000 | 24698001 | 21172699 | 45870700 | NA | 22.50000 | 17.00 | 67 | 69 |
| 202 | YR2010 | United Arab Emirates | ARE | NA | NA | NA | NA | NA | NA | NA | NA | 2.4 | 5781880 | 11.70000 | 2109194 | 6161490 | 8270684 | NA | 3.50000 | 0.49 | 65 | 72 |
| 203 | YR2010 | United Kingdom | GBR | NA | NA | NA | NA | NA | NA | NA | NA | 14.0 | 32127930 | 3.70000 | 31932131 | 30834234 | 62766365 | NA | 7.70000 | 0.59 | 89 | 81 |
| 204 | YR2010 | United States | USA | 1000000 | NA | NA | NA | 43000 | NA | NA | NA | 4.1 | 157264492 | 11.50000 | 156551423 | 152796770 | 309348193 | NA | 12.90000 | 0.20 | 87 | 66 |
| 205 | YR2010 | Upper middle income | UMC | NA | NA | NA | NA | NA | NA | NA | NA | 92.0 | 1291951878 | 20.46007 | 1221792253 | 1242542506 | 2464534816 | NA | 10.07323 | 6.20 | 80 | 81 |
lat_long <- read.csv("Countries_long_lat2.csv", header = TRUE, sep = ",")
colnames(lat_long) <- c("Country", "Country_Code", "Latitude", "Longtitude")
kable(head(lat_long))
| Country | Country_Code | Latitude | Longtitude |
|---|---|---|---|
| Albania | ALB | 41.0000 | 20.0000 |
| Algeria | DZA | 28.0000 | 3.0000 |
| American Samoa | ASM | -14.3333 | -170.0000 |
| Andorra | AND | 42.5000 | 1.6000 |
| Angola | AGO | -12.5000 | 18.5000 |
| Anguilla | AIA | 18.2500 | -63.1667 |
Cleaning and renaming of dataset and column respectively.
options(warn = -1)
df2 <- merge(df, lat_long, by.x = "Country_Code", by.y = "Country_Code", all = FALSE)
df2[, 4:19] <- sapply(df2[, 4:19], as.numeric) # Covet the columns to Numeric
kable(head(df2))
| Country_Code | Year_Code | Country_Name | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases | Country | Latitude | Longtitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABW | YR2011 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 9.0 | NA | NA | 53404 | 48649 | 102053 | NA | NA | 0.74 | 87 | 93 | Aruba | 12.52111 | -69.9667 |
| ABW | YR2014 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 2.2 | NA | NA | 54417 | 49378 | 103795 | NA | NA | 0.18 | 87 | NA | Aruba | 12.52111 | -69.9667 |
| ABW | YR2010 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 6.8 | NA | NA | 53202 | 48467 | 101669 | NA | NA | 0.56 | 87 | NA | Aruba | 12.52111 | -69.9667 |
| ABW | YR2015 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 12.0 | NA | NA | 54743 | 49598 | 104341 | NA | NA | 1.00 | NA | NA | Aruba | 12.52111 | -69.9667 |
| ABW | YR2013 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 12.0 | NA | NA | 54060 | 49127 | 103187 | NA | NA | 1.00 | 87 | NA | Aruba | 12.52111 | -69.9667 |
| ABW | YR2012 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 31.0 | NA | NA | 53701 | 48876 | 102577 | NA | NA | 2.60 | 87 | NA | Aruba | 12.52111 | -69.9667 |
Merging column lonitude and Latitude together for a better coordinate to be in maps (googlevis)
df2$Lat_Long = paste(df2$Latitude, df2$Longtitude, sep=":")
kable(head(df2))
| Country_Code | Year_Code | Country_Name | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases | Country | Latitude | Longtitude | Lat_Long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABW | YR2011 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 9.0 | NA | NA | 53404 | 48649 | 102053 | NA | NA | 0.74 | 87 | 93 | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | YR2014 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 2.2 | NA | NA | 54417 | 49378 | 103795 | NA | NA | 0.18 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | YR2010 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 6.8 | NA | NA | 53202 | 48467 | 101669 | NA | NA | 0.56 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | YR2015 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 12.0 | NA | NA | 54743 | 49598 | 104341 | NA | NA | 1.00 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | YR2013 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 12.0 | NA | NA | 54060 | 49127 | 103187 | NA | NA | 1.00 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | YR2012 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 31.0 | NA | NA | 53701 | 48876 | 102577 | NA | NA | 2.60 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
we are now to goint make use of sql to subset(query) columns so as to diffentiate between year 2000 and 2010 where the number children orphaned by HIV/AIDS more than 50000.
Twentyfirst_Cen <- sqldf("SELECT Country_Name, Year_Code, Lat_Long, Population_total, (Children_orphaned_by_HIV_AIDS/Population_total)*100 as 'Percentage_Orphaned_byHIV' FROM df2 where Percentage_Orphaned_byHIV >= 2 ORDER BY Percentage_Orphaned_byHIV DESC LIMIT 50" )
Twenty First century coutries that are still battling with HIV/AIDS related deaths.
Malaria_Inc_Map <- gvisGeoChart(Twentyfirst_Cen, locationvar ="Lat_Long", hovervar ="Country_Name",sizevar = "Percentage_Orphaned_byHIV", colorvar = "Population_total",
options=list(displayMode="Markers",
colorAxis="{colors:['purple', 'red', 'orange', 'grey', 'pink']}",
backgroundColor="lightblue"), chartid="Lost_Their_Parents_To_HIV_AIDS")
plot(Malaria_Inc_Map )
A better way to see for youself!
ggplotly(ggplot(Twentyfirst_Cen, aes(x=Percentage_Orphaned_byHIV, y=reorder(Country_Name, +Percentage_Orphaned_byHIV), fill=Year_Code)) +
geom_point(colour="purple", size=2, alpha=.8) +
scale_fill_brewer(palette="Blues", breaks=rev(levels(Twentyfirst_Cen$Year_Code))) + labs(title="Chart of Children Orphaned By HIV/AIDS BY Countries (%)"))
Combo <- gvisComboChart(Twentyfirst_Cen, xvar="Country_Name",
yvar="Percentage_Orphaned_byHIV",
options=list(seriesType="bars",
bar="{groupWidth:'100%'}",
title="Interactive Chart Of Countries And Related HIV/AIDS Orphaned ",
series='{0: {type:"line"}}'),chartid = "ER")
plot(Combo)
The chart shows that majority of the HIV/AIDS related death were rampant in the late Nineteen century than it were in the 20-21th century.
Tools To Be Used:
Packages To Be Used:
etc