Your final project is to create a public visualization using data relevant to a current policy, business, or justice issue. You may use any dataset you can find for this assignment, as long as it is either public or you have permission from the data’s owner/administrator to work with it and share it.
Recommended data sources are governmental data, data provided by non-profit or non-governmental organizations, and data available from large, semi-structured data sets (e.g., social networks, company financials).
You must document each step of your data analysis process (excluding data acquisition) in code: this will include changing the format of the data and the creation of any images or interactive displays that are made.
You must also include a short (2-3 paragraph) write-up on the visualization. This write-up must include the following: the data source, the parameters of the data set (geography, timeframe, what the data points are, etc.), what the data shows, and why it is important.
Grading:
This assignment will account for 40% of your final grade. Points will be awarded for the following components:
Dates:
Note: The type of deliverable you provide will depend on the strategy you use for this project. If you put together an interactive visualization, you should provide code that I can run and host locally. If you choose static visualizations, your write-up will be more important to your overall grade, and it may be useful to think about how you present these visualizations (in a formatted R Markdown document, for example).
Proposal
You must submit a proposal for your project by 03/26. This proposal must include: a link to the data source, an explanation of what you want to show, why this is relevant to a current policy, business, or justice issue, and which technologies you plan to use.
Your instructor must approve this proposal; you may have to refine it somewhat. You will present your final project during our last meetup. If you are not able to attend the lecture that day, you must write up a status report with screenshots of current progress and any issues you are experiencing.
The goal of this project is to:
This project emphasizes explanatory techniques, which present the data to viewers in a more succinct way. I therefore plan to use the R programming language to explore and analyze the dataset.
The dataset used is the World Health Nutrition and Population Statistics for the years 2000 to 2019. It can be obtained from DataBank (Health Nutrition and Population Statistics), last updated on 12/20/2019.
Load the source dataset
df <- read.csv("World Health Nutrition and Population Statistics_2020-2020.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
knitr::kable(head(df[200:206, ]))

| | Year | Country_Name | Country_Code | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 2000 | Ukraine | UKR | 170000 | 170000 | 4500 | 29000 | 29000 | 760 | 5600 | 500 | 114.0 | 23221521 | NA | 26284954 | 22890894 | 49175848 | NA | 36.90000 | 23.00 | 59 | NA |
| 201 | 2000 | Upper middle income | UMC | NA | NA | NA | NA | NA | NA | NA | NA | 103.0 | 1179100250 | NA | 1149821735 | 1165590459 | 2317310149 | NA | 14.01896 | 9.80 | 48 | 81 |
| 202 | 2000 | Uruguay | URY | 5900 | 6000 | 500 | 590 | 570 | 100 | 1100 | 100 | 22.0 | 1567214 | NA | 1713077 | 1606659 | 3319736 | NA | 17.40000 | 2.30 | 87 | 85 |
| 203 | 2000 | United States | USA | NA | NA | NA | NA | NA | NA | NA | NA | 6.7 | 146767130 | NA | 143178430 | 138983981 | 282162411 | NA | 11.30000 | 0.32 | 87 | 83 |
| 204 | 2000 | Uzbekistan | UZB | 14000 | 14000 | 840 | 2100 | 1900 | 730 | 7600 | 200 | 99.0 | 9733490 | NA | 12392841 | 12257559 | 24650400 | 126 | 7.60000 | 16.00 | 64 | 80 |
| 205 | 2000 | St. Vincent and the Grenadines | VCT | NA | NA | NA | NA | NA | NA | NA | NA | 17.0 | 48284 | NA | 53486 | 54298 | 107784 | NA | 6.30000 | 3.20 | 87 | 100 |
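One detail about the call above: `head()` returns at most six rows by default, so although the slice `200:206` selects seven rows, the table stops at row 205. A minimal base R sketch on a hypothetical stand-in data frame (`demo` is not part of the project) illustrates this:

```r
# Hypothetical stand-in for df: 300 rows with a single column.
demo <- data.frame(id = 1:300)

# The slice keeps seven rows (200 through 206) ...
slice <- demo[200:206, , drop = FALSE]
print(nrow(slice))

# ... but head() keeps only the first six by default.
shown <- head(slice)
print(rownames(shown))

# Passing n explicitly shows the whole slice.
all_rows <- head(slice, n = 7)
print(nrow(all_rows))
```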
World longitudes and latitudes
lat_long <- read.csv("Countries-longitude_latitude.csv", header = TRUE, sep = ",")
colnames(lat_long) <- c("Country", "Country_Code", "Latitude", "Longtitude")
knitr::kable(head(lat_long))

| Country | Country_Code | Latitude | Longtitude |
|---|---|---|---|
| Afghanistan | AFG | 33.0000 | 65.0 |
| Albania | ALB | 41.0000 | 20.0 |
| Algeria | DZA | 28.0000 | 3.0 |
| American Samoa | ASM | -14.3333 | -170.0 |
| Andorra | AND | 42.5000 | 1.6 |
| Angola | AGO | -12.5000 | 18.5 |
Cleaning the dataset and renaming its columns.
options(warn = -1)
df2 <- merge(df, lat_long, by.x = "Country_Code", by.y = "Country_Code", all = FALSE)
df2[, 5:11] <- sapply(df2[, 5:11], as.numeric)

The Latitude and Longitude columns are merged into a single Lat_Long column so that googleVis can use it as a map coordinate.
| Country_Code | Year | Country_Name | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases | Country | Latitude | Longtitude | Lat_Long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABW | 2006 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.9 | NA | NA | 52897 | 47937 | 100834 | NA | NA | 0.73 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2015 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 11.0 | NA | NA | 54743 | 49598 | 104341 | NA | NA | 0.92 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2017 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.7 | NA | NA | 55331 | 50035 | 105366 | NA | NA | 0.72 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2005 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.6 | NA | NA | 52456 | 47575 | 100031 | NA | NA | 0.71 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2010 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 6.8 | NA | NA | 53202 | 48467 | 101669 | NA | NA | 0.56 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2003 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.1 | NA | NA | 50707 | 46310 | 97017 | NA | NA | 0.67 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
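The merge and the coordinate column can be sketched on a few rows taken from the tables in this document. This is a minimal illustration, not the project's actual code: the `paste()` call is an assumption inferred from the `Lat_Long` values shown above (for example `12.52111:-69.9667`), and the toy data frames `health` and `coords` are hypothetical stand-ins for `df` and `lat_long`. It also assumes that aggregate rows such as "Upper middle income" have no entry in the coordinates file.

```r
# Toy stand-in for df: one country and one income-group aggregate.
health <- data.frame(
  Country_Code = c("ABW", "UMC"),
  Country_Name = c("Aruba", "Upper middle income"),
  Year = c(2006, 2000),
  Population_total = c(100834, 2317310149),
  stringsAsFactors = FALSE
)

# Toy stand-in for lat_long (column name spelled as in the project).
coords <- data.frame(
  Country = c("Aruba", "Afghanistan"),
  Country_Code = c("ABW", "AFG"),
  Latitude = c(12.52111, 33),
  Longtitude = c(-69.9667, 65),
  stringsAsFactors = FALSE
)

# all = FALSE is an inner join: codes missing from either side are dropped.
joined <- merge(health, coords, by = "Country_Code", all = FALSE)

# Assumed construction of Lat_Long as "latitude:longitude".
joined$Lat_Long <- paste(joined$Latitude, joined$Longtitude, sep = ":")
print(joined[, c("Country_Code", "Country_Name", "Lat_Long")])
```

Under these assumptions only the Aruba row survives the join, which would explain why aggregate rows do not appear in the merged table.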
We now use SQL (through sqldf) to query the merged data for the years 2000 to 2019. The query expresses the number of children orphaned by HIV/AIDS as a percentage of total population, keeps the rows where that percentage is at least 2, and returns the 50 highest values.
library(sqldf)  # provides sqldf(), which runs SQL queries against data frames
Twentyfirst_Cen <- sqldf("SELECT Country_Name, Year, Lat_Long, Population_total, (Children_orphaned_by_HIV_AIDS/Population_total)*100 as 'Percentage_Orphaned_byHIV' FROM df2 where Percentage_Orphaned_byHIV >= 2 ORDER BY Percentage_Orphaned_byHIV DESC LIMIT 50")
head(Twentyfirst_Cen)
## Country_Name Year Lat_Long Population_total Percentage_Orphaned_byHIV
## 1 Zimbabwe 2004 -20:30 12019912 8.319528
## 2 Zimbabwe 2005 -20:30 12076699 8.280408
## 3 Zimbabwe 2006 -20:30 12155491 8.226735
## 4 Zimbabwe 2003 -20:30 11982224 8.178782
## 5 Zimbabwe 2007 -20:30 12255922 8.159321
## 6 Zimbabwe 2008 -20:30 12379549 8.077839
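The arithmetic behind `Percentage_Orphaned_byHIV` can be checked by hand. The sketch below repeats the query's logic in base R on four rows. The first three come from the year-2000 table earlier in this document, while the Zimbabwe orphan count of 1,000,000 is an assumption back-calculated from the printed percentage and population, not a value read from the dataset. The data frame `toy` is a hypothetical stand-in for `df2`.

```r
# Toy stand-in for df2. The Zimbabwe orphan count is back-calculated
# (8.319528% of 12,019,912 is roughly 1,000,000), not taken from the data.
toy <- data.frame(
  Country_Name = c("Ukraine", "Uruguay", "Uzbekistan", "Zimbabwe"),
  Year = c(2000, 2000, 2000, 2004),
  Children_orphaned_by_HIV_AIDS = c(5600, 1100, 7600, 1000000),
  Population_total = c(49175848, 3319736, 24650400, 12019912),
  stringsAsFactors = FALSE
)

# Orphans as a percentage of total population.
toy$Percentage_Orphaned_byHIV <-
  toy$Children_orphaned_by_HIV_AIDS / toy$Population_total * 100

# Keep rows at or above 2 percent; which() also drops NA comparisons.
top <- toy[which(toy$Percentage_Orphaned_byHIV >= 2), ]

# Sort descending and keep at most 50 rows, as the SQL query does.
top <- top[order(-top$Percentage_Orphaned_byHIV), ]
top <- head(top, 50)
print(top[, c("Country_Name", "Year", "Percentage_Orphaned_byHIV")])
```

In the full data many orphan counts are `NA`; `which()` excludes those rows, which matches how the SQL `WHERE` clause treats `NULL`.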
The world map showing the countries where children are orphaned by HIV/AIDS (2000-2019)
Show_map <- googleVis::gvisGeoChart(Twentyfirst_Cen, locationvar ="Lat_Long", hovervar ="Country_Name", sizevar = "Percentage_Orphaned_byHIV", colorvar = "Population_total",
options=list(displayMode="Markers",
colorAxis="{colors:['purple', 'red', 'orange', 'grey', 'pink']}",
backgroundColor="lightblue"),
chartid="Lost_Their_Parents_To_HIV_AIDS")
plot(Show_map)

From the map above, we can see that the majority of the countries with the highest share of children orphaned by HIV/AIDS are in the southern part of Africa.
library(ggplot2)  # geom_point(), scale_fill_brewer() and labs() are called below without a namespace prefix
# Year is wrapped in factor() because scale_fill_brewer() is a discrete scale
plotly::ggplotly(ggplot2::ggplot(Twentyfirst_Cen, ggplot2::aes(x=Percentage_Orphaned_byHIV, y=reorder(Country_Name, +Percentage_Orphaned_byHIV), fill=factor(Year))) +
geom_point(colour="purple", size=2, alpha=.8) +
scale_fill_brewer(palette="Blues", breaks=rev(levels(factor(Twentyfirst_Cen$Year)))) +
labs(title="Chart of Children Orphaned By HIV/AIDS By Country (%)"))

The chart below depicts the 20th-21st century countries where most children are losing their parents to HIV/AIDS.
Combo <- googleVis::gvisComboChart(Twentyfirst_Cen, xvar="Country_Name", yvar="Percentage_Orphaned_byHIV",
options=list(seriesType="bars",
bar="{groupWidth:'100%'}",
title="Interactive Chart Of Countries And Related HIV/AIDS Orphaned ",
series='{0: {type:"line"}}'),chartid = "ER")
plot(Combo)
Malaria_Inc_Map <- googleVis::gvisGeoChart(Twentyfirst_Cen, locationvar ="Lat_Long", hovervar ="Country_Name",sizevar = "Percentage_Orphaned_byHIV", colorvar = "Population_total",
options=list(displayMode="Markers",
colorAxis="{colors:['purple', 'red', 'orange', 'grey', 'pink']}",
backgroundColor="lightblue"),
chartid="Lost_Their_Parents_To_HIV_AIDS")
plot(Malaria_Inc_Map)

The chart shows that HIV/AIDS-related deaths were more rampant in the late twentieth century than in the twenty-first century.