1 Final Project Proposal Requirement

Your final project is to create a public visualization using data relevant to a current policy, business, or justice issue. You may use any dataset you can find for this assignment, as long as it is either public or you have permission from the data’s owner/administrator to work with it and share it.

Recommended data sources include governmental data, data provided by non-profit or non-governmental organizations, and data available from large, semi-structured data sets (e.g., social networks, company financials).

You must document each step of your data analysis process (excluding data acquisition) in code: this will include changing the format of the data and the creation of any images or interactive displays that are made.

You must also include a short (2-3 paragraph) write-up on the visualization. This write-up must include the following: the data source, the parameters of the data set (geography, timeframe, what the data points are, etc.), what the data shows, and why it is important.

Grading:

This assignment will account for 40% of your final grade. Points will be awarded for the following components:

  • 25% - finding your dataset(s) and getting approval for your project on time, recognition of strengths/weaknesses of the data, analysis to find insights in the data
  • 25% - data handling: cleaning, outlier/null handling, and transfer/loading data to the web
  • 40% - data presentation: compliance with best data visualization practices, clarity, information-to-ink ratio, how memorable the visualization is
  • 10% - contextual write-up: why the data is important, why the insights are important

Due Dates:

Note - The type of deliverable you provide will depend on the strategy you use for this project. If you put together an interactive visualization, you should provide code that I can run and host locally. If you choose static visualizations, your write-up will be more important to your overall grade, and it may be useful to think about how you present these visualizations (in a formatted R Markdown document, for example).

Proposal

You must submit a proposal for your project by 03/26. This proposal must include: a link to the data source, an explanation of what you want to show, why this is relevant to a current policy, business, or justice issue, and which technologies you plan to use.

Your instructor must approve this proposal; you may have to refine it somewhat. You will present your final project during our last meetup. If you are not able to attend that meetup, you must write up a status report with screenshots of current progress and any issues you are experiencing.

2 Prerequisites: Available Libraries

  • knitr
  • plyr
  • dplyr
  • sqldf
  • data.table
  • DT
  • kableExtra
  • ggplot2
  • reshape2
  • plotly
  • graphics
  • ggthemes
  • googleVis
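
As a minimal sketch (not necessarily how the original document loads them), the packages listed above could be checked and attached as follows. The variable names `pkgs`, `installed`, and `missing_pkgs` are assumptions introduced for illustration.

```r
# Packages from the list above.
pkgs <- c("knitr", "plyr", "dplyr", "sqldf", "data.table", "DT",
          "kableExtra", "ggplot2", "reshape2", "plotly", "graphics",
          "ggthemes", "googleVis")

# Check which packages are installed, without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything missing so it can be installed with install.packages().
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the installed packages in the listed order (plyr before dplyr).
invisible(lapply(pkgs[installed], library, character.only = TRUE))
```

Checking with `requireNamespace()` first means a missing package produces a readable message instead of stopping the whole script at the first `library()` call.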

3 Final Project Proposal

3.1 Aim:

The goals of this project are to:

  • Examine whether demographics such as age, sex, and location have an impact on types of death and economic status.
  • Show the merits and demerits of visual analytics in data analysis, and how to improve it.
  • Make the visualizations tell the story unambiguously, in a way that is understandable even to a layperson.

3.2 Methodology:

The project will emphasize explanatory techniques, which will be used to present the data to viewers more succinctly. I therefore plan to use the R programming language to explore and analyze the dataset.

The dataset to be used is the World Health Nutrition and Population Statistics for the years 2000 to 2019. It can be obtained from DataBank: Health Nutrition and Population Statistics, last updated on 12/20/2019.

Load the source dataset

|  | Year | Country_Name | Country_Code | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 2000 | Ukraine | UKR | 170000 | 170000 | 4500 | 29000 | 29000 | 760 | 5600 | 500 | 114.0 | 23221521 | NA | 26284954 | 22890894 | 49175848 | NA | 36.90000 | 23.00 | 59 | NA |
| 201 | 2000 | Upper middle income | UMC | NA | NA | NA | NA | NA | NA | NA | NA | 103.0 | 1179100250 | NA | 1149821735 | 1165590459 | 2317310149 | NA | 14.01896 | 9.80 | 48 | 81 |
| 202 | 2000 | Uruguay | URY | 5900 | 6000 | 500 | 590 | 570 | 100 | 1100 | 100 | 22.0 | 1567214 | NA | 1713077 | 1606659 | 3319736 | NA | 17.40000 | 2.30 | 87 | 85 |
| 203 | 2000 | United States | USA | NA | NA | NA | NA | NA | NA | NA | NA | 6.7 | 146767130 | NA | 143178430 | 138983981 | 282162411 | NA | 11.30000 | 0.32 | 87 | 83 |
| 204 | 2000 | Uzbekistan | UZB | 14000 | 14000 | 840 | 2100 | 1900 | 730 | 7600 | 200 | 99.0 | 9733490 | NA | 12392841 | 12257559 | 24650400 | 126 | 7.60000 | 16.00 | 64 | 80 |
| 205 | 2000 | St. Vincent and the Grenadines | VCT | NA | NA | NA | NA | NA | NA | NA | NA | 17.0 | 48284 | NA | 53486 | 54298 | 107784 | NA | 6.30000 | 3.20 | 87 | 100 |
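
The loading step might look roughly like the sketch below. To keep it runnable without the real export, a few rows and columns copied from the table above are embedded as CSV text. The object names `csv_text` and `hnp`, and the file name mentioned in the comment, are hypothetical.

```r
# A subset of rows and columns from the table above, as CSV text.
csv_text <- "Year,Country_Name,Country_Code,Children_orphaned_by_HIV_AIDS,Population_total
2000,Ukraine,UKR,5600,49175848
2000,Upper middle income,UMC,NA,2317310149
2000,Uruguay,URY,1100,3319736
2000,United States,USA,NA,282162411
2000,Uzbekistan,UZB,7600,24650400"

# With the real data, the text argument would be replaced by a file path,
# e.g. read.csv("hnp_stats.csv", ...) where the file name is hypothetical.
hnp <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "NA")

# Inspect the column types and the first rows.
str(hnp)
head(hnp)
```

Note that aggregate rows such as "Upper middle income" sit alongside individual countries, so they may need to be filtered out before any country-level comparison.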

World longitudes and latitudes

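| Country | Country_Code | Latitude | Longtitude |
|---|---|---|---|
| Afghanistan | AFG | 33.0000 | 65.0 |
| Albania | ALB | 41.0000 | 20.0 |
| Algeria | DZA | 28.0000 | 3.0 |
| American Samoa | ASM | -14.3333 | -170.0 |
| Andorra | AND | 42.5000 | 1.6 |
| Angola | AGO | -12.5000 | 18.5 |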
Angola AGO -12.5000 18.5

Clean the dataset and rename its columns.
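
A cleaning and renaming step could be sketched as follows. The raw column names and the ".." missing-value marker are assumptions about what an export might look like, not details taken from the source file, and the missing Uruguay value is artificial, added only to show the conversion.

```r
# Hypothetical raw data: column names and the ".." marker are assumptions.
raw <- data.frame(
  "Country Name"      = c("Ukraine", "Uruguay"),
  "Country Code"      = c("UKR", "URY"),
  "Population, total" = c("49175848", ".."),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rename the columns to the underscore style used in this document.
names(raw) <- c("Country_Name", "Country_Code", "Population_total")

# Treat the placeholder as missing, then convert the column to numeric.
raw$Population_total[raw$Population_total == ".."] <- NA
raw$Population_total <- as.numeric(raw$Population_total)

str(raw)
```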

Merge the Latitude and Longitude columns into a single Lat_Long column, in the "latitude:longitude" format that googleVis maps accept as coordinates.

| Country_Code | Year | Country_Name | Adults_15_living_HIV | Adults_Children_0_14_15_living_HIV | AIDS_estimated_deaths_UNAIDS | Adults_children_0_14_15_newly_infected_HIV | Adults_15_newly_infected_HIV | Children_0_14_living_with_HIV | Children_orphaned_by_HIV_AIDS | Children_0_14_newly_infected_HIV | Incidence_tuberculosis_per_100000 | Labor_force_total | Mortality_traffic_injury_100K | Population_female | Population_male | Population_total | Malaria_cases_reported | Suicide_mortality_per_100K | Tuberculosis_death_per_100K | Tuberculosis_case_detection | Tuberculosis_treatment_success_NewCases | Country | Latitude | Longtitude | Lat_Long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABW | 2006 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.9 | NA | NA | 52897 | 47937 | 100834 | NA | NA | 0.73 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2015 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 11.0 | NA | NA | 54743 | 49598 | 104341 | NA | NA | 0.92 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2017 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.7 | NA | NA | 55331 | 50035 | 105366 | NA | NA | 0.72 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2005 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.6 | NA | NA | 52456 | 47575 | 100031 | NA | NA | 0.71 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2010 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 6.8 | NA | NA | 53202 | 48467 | 101669 | NA | NA | 0.56 | 87 | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
| ABW | 2003 | Aruba | NA | NA | NA | NA | NA | NA | NA | NA | 8.1 | NA | NA | 50707 | 46310 | 97017 | NA | NA | 0.67 | NA | NA | Aruba | 12.52111 | -69.9667 | 12.52111:-69.9667 |
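
One way to produce a table like the one above is a join on Country_Code followed by `paste()`. The sketch below uses two Aruba rows and the Aruba coordinates copied from the merged table. The object names `health`, `coords`, and `merged`, and the choice of a left join, are assumptions.

```r
# Two health rows and the matching coordinates, copied from the table above.
health <- data.frame(
  Country_Code     = c("ABW", "ABW"),
  Year             = c(2006, 2015),
  Country_Name     = c("Aruba", "Aruba"),
  Population_total = c(100834, 104341),
  stringsAsFactors = FALSE
)
coords <- data.frame(
  Country      = "Aruba",
  Country_Code = "ABW",
  Latitude     = 12.52111,
  Longtitude   = -69.9667,  # column name spelled as in the source data
  stringsAsFactors = FALSE
)

# Left join: keep every health row, attach coordinates where the code matches.
merged <- merge(health, coords, by = "Country_Code", all.x = TRUE)

# Build the "latitude:longitude" string used for map locations.
merged$Lat_Long <- paste(merged$Latitude, merged$Longtitude, sep = ":")

merged
```

Joining on Country_Code rather than on the country name avoids mismatches caused by spelling differences between the two sources.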

We now use SQL to select a subset of columns for the years 2000 to 2019, keeping only rows where more than 50,000 children were orphaned by HIV/AIDS.

##   Country_Name Year Lat_Long Population_total Percentage_Orphaned_byHIV
## 1     Zimbabwe 2004   -20:30         12019912                  8.319528
## 2     Zimbabwe 2005   -20:30         12076699                  8.280408
## 3     Zimbabwe 2006   -20:30         12155491                  8.226735
## 4     Zimbabwe 2003   -20:30         11982224                  8.178782
## 5     Zimbabwe 2007   -20:30         12255922                  8.159321
## 6     Zimbabwe 2008   -20:30         12379549                  8.077839
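
A query of this kind could be sketched with sqldf as shown below. The Zimbabwe populations and the Uruguay row come from the tables in this document, but the Zimbabwe orphan counts are back-calculated from the percentages printed above and should be treated as illustrative. The percentage formula (orphans divided by total population, times 100) is an assumption that is consistent with that output, and the base R fallback is included only so the sketch runs when sqldf is not installed.

```r
# Illustrative input; Zimbabwe orphan counts are back-calculated, not sourced.
merged <- data.frame(
  Country_Name = c("Zimbabwe", "Zimbabwe", "Zimbabwe", "Uruguay"),
  Year         = c(2003, 2004, 2005, 2000),
  Lat_Long     = c("-20:30", "-20:30", "-20:30", NA),
  Population_total              = c(11982224, 12019912, 12076699, 3319736),
  Children_orphaned_by_HIV_AIDS = c(980000, 1000000, 1000000, 1100),
  stringsAsFactors = FALSE
)

query <- "
  SELECT Country_Name, Year, Lat_Long, Population_total,
         100.0 * Children_orphaned_by_HIV_AIDS / Population_total
           AS Percentage_Orphaned_byHIV
  FROM merged
  WHERE Year BETWEEN 2000 AND 2019
    AND Children_orphaned_by_HIV_AIDS > 50000
  ORDER BY Percentage_Orphaned_byHIV DESC"

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  orphaned <- sqldf(query)
} else {
  # Base R equivalent of the query above.
  keep <- with(merged, Year >= 2000 & Year <= 2019 &
                 !is.na(Children_orphaned_by_HIV_AIDS) &
                 Children_orphaned_by_HIV_AIDS > 50000)
  orphaned <- merged[keep, c("Country_Name", "Year", "Lat_Long",
                             "Population_total")]
  orphaned$Percentage_Orphaned_byHIV <-
    100 * merged$Children_orphaned_by_HIV_AIDS[keep] / orphaned$Population_total
  orphaned <- orphaned[order(-orphaned$Percentage_Orphaned_byHIV), ]
}

head(orphaned)
```

Expressing the count as a share of the population makes countries of different sizes comparable, which is why the rows are sorted by the percentage rather than by the raw count.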

The world map showing the countries where children are orphaned by HIV/AIDS (2000-2019)
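
A map like this could be drawn with `gvisGeoChart()` from googleVis, which accepts locations in "latitude:longitude" form. The sketch below uses two rows copied from the query output above. The object names `map_data` and `geo` and the chart options are assumptions, not the settings used for the original map.

```r
# Two rows copied from the query output above.
map_data <- data.frame(
  Country_Name = c("Zimbabwe", "Zimbabwe"),
  Year         = c(2004, 2005),
  Lat_Long     = c("-20:30", "-20:30"),
  Percentage_Orphaned_byHIV = c(8.319528, 8.280408),
  stringsAsFactors = FALSE
)

# Build the chart only if googleVis is available.
if (requireNamespace("googleVis", quietly = TRUE)) {
  geo <- googleVis::gvisGeoChart(
    map_data,
    locationvar = "Lat_Long",                   # "latitude:longitude" strings
    colorvar    = "Percentage_Orphaned_byHIV",  # marker colour
    hovervar    = "Country_Name",               # label shown on hover
    options     = list(displayMode = "markers", region = "world")
  )
  # plot(geo) would open the interactive map in a browser.
}
```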

From the map above, we can see that the majority of countries where more than 50,000 children lost their parents to HIV/AIDS are in the southern part of Africa.

The chart below shows the countries where the most children lost their parents to HIV/AIDS between 2000 and 2019.
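
The original chart type is not stated, so the following is only a sketch of one option: a ggplot2 line chart of the orphaned percentage over time, using the six Zimbabwe rows from the query output above. The object names `chart_data` and `p` and the axis labels are assumptions.

```r
# Values copied from the query output above, ordered by year.
chart_data <- data.frame(
  Country_Name = "Zimbabwe",
  Year = c(2003, 2004, 2005, 2006, 2007, 2008),
  Percentage_Orphaned_byHIV = c(8.178782, 8.319528, 8.280408,
                                8.226735, 8.159321, 8.077839),
  stringsAsFactors = FALSE
)

# Draw the chart only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(chart_data,
              aes(x = Year, y = Percentage_Orphaned_byHIV,
                  colour = Country_Name)) +
    geom_line() +
    geom_point() +
    labs(x = "Year",
         y = "Children orphaned by HIV/AIDS (% of population)",
         colour = "Country")
  print(p)
}
```

With more countries in `chart_data`, the `colour` mapping would draw one line per country, which makes it easier to compare trends than a single aggregated bar.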

The chart shows that HIV/AIDS-related deaths were more rampant in the early years of the period (around the turn of the century) than in the later years.