Introduction

For this project I will be using the “Aids.csv” file from CORGIS-Edu (https://corgis-edu.github.io/corgis/csv/aids/). The data set was obtained from UNAIDS, an organization whose mission is to reduce the transmission of HIV/AIDS while providing resources to countries affected by the disease. My question is: what is the relationship between the total number of people living with HIV and the number of AIDS-related deaths across different regions from 1990 to 2015?

Chunk Description

In the first chunk I install all the packages I need for the project, load the CSV file, and look at the data using the head, str (structure), glimpse, and summary functions. The data set contains the number of people affected by the disease, newly reported cases, and AIDS-related deaths for a large set of countries from 1990 to 2015.
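A minimal sketch of what this first chunk could look like. The package list is inferred from the functions used later in the report, and the file path (“Aids.csv” in the working directory) is an assumption, so adjust both as needed.

```r
# Packages inferred from functions used later in this report (an assumption).
# Install once with install.packages("<name>") if any are missing.
library(tidyverse)    # read_csv, mutate, filter, glimpse, ggplot2
library(janitor)      # clean_names
library(countrycode)  # countrycode

# Assumed location of the downloaded file.
csv_path <- "Aids.csv"

if (file.exists(csv_path)) {
  data <- read_csv(csv_path)

  print(head(data))     # first six rows
  str(data)             # column types and example values
  glimpse(data)         # compact, one-line-per-column overview
  print(summary(data))  # per-column summary statistics
} else {
  message("Aids.csv not found; download it from the CORGIS-Edu page first.")
}
```

The object is named data here only so that it matches the name used in the cleaning chunk.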

Cleaning My Data

This chunk cleans the data set before I start the analysis. I use clean_names to standardize the column names, then the mutate function with countrycode to assign each country to a continent, and the filter function to check for countries that could not be mapped. I also use the mutate function with replace_na to replace NA with “Other” so that no region is missing.

Clean_Data <- data %>% clean_names()
names(Clean_Data)
##  [1] "country"                                            
##  [2] "year"                                               
##  [3] "data_aids_related_deaths_aids_orphans"              
##  [4] "data_aids_related_deaths_adults"                    
##  [5] "data_aids_related_deaths_all_ages"                  
##  [6] "data_aids_related_deaths_children"                  
##  [7] "data_aids_related_deaths_female_adults"             
##  [8] "data_aids_related_deaths_male_adults"               
##  [9] "data_hiv_prevalence_adults"                         
## [10] "data_hiv_prevalence_young_men"                      
## [11] "data_hiv_prevalence_young_women"                    
## [12] "data_new_hiv_infections_young_adults"               
## [13] "data_new_hiv_infections_male_adults"                
## [14] "data_new_hiv_infections_female_adults"              
## [15] "data_new_hiv_infections_children"                   
## [16] "data_new_hiv_infections_all_ages"                   
## [17] "data_new_hiv_infections_adults"                     
## [18] "data_new_hiv_infections_incidence_rate_among_adults"
## [19] "data_people_living_with_hiv_total"                  
## [20] "data_people_living_with_hiv_male_adults"            
## [21] "data_people_living_with_hiv_female_adults"          
## [22] "data_people_living_with_hiv_children"               
## [23] "data_people_living_with_hiv_adults"
Clean_Data <- Clean_Data %>%
  mutate(country_region = countrycode(country, 
                                      origin = "country.name", 
                                      destination = "continent"))
# Checking for unmapped countries
Clean_Data %>% filter(is.na(country_region)) %>% distinct(country)
## [1] country
## <0 rows> (or 0-length row.names)
# Optional: Fill NAs with "Other"
Clean_Data <- Clean_Data %>%
  mutate(country_region = replace_na(country_region, "Other"))

Creating Scatterplot

In this chunk I create a scatterplot to examine the relationship between the total number of people living with HIV and the number of AIDS-related deaths across different continents.

geom_point draws the scatterplot itself, scale_color_brewer sets the color palette, and theme_minimal removes the grey background I was initially getting.

I use the aes() function to map the data set columns to visual properties: the x-axis is the total number of people living with HIV, and the y-axis is AIDS-related deaths for all ages. I assigned a different color to each geographic region so the scatterplot is not confusing to read.
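The description above can be sketched roughly as follows. This is not the original chunk: the palette name ("Set1"), the point transparency, and the axis labels are assumptions, and the small stand-in data frame holds made-up numbers that are only there so the example runs on its own.

```r
library(ggplot2)

# Stand-in rows with made-up values, used only if Clean_Data from the
# cleaning chunk is not already in the workspace.
if (!exists("Clean_Data")) {
  Clean_Data <- data.frame(
    country_region = c("Africa", "Africa", "Asia", "Europe", "Americas"),
    data_people_living_with_hiv_total = c(500000, 120000, 80000, 20000, 60000),
    data_aids_related_deaths_all_ages = c(30000, 9000, 4000, 500, 2500)
  )
}

hiv_plot <- ggplot(
  Clean_Data,
  aes(
    x = data_people_living_with_hiv_total,   # people living with HIV
    y = data_aids_related_deaths_all_ages,   # AIDS-related deaths, all ages
    color = country_region                   # one color per continent
  )
) +
  geom_point(alpha = 0.6) +                  # alpha is an assumed choice
  scale_color_brewer(palette = "Set1") +     # palette name is an assumption
  labs(
    x = "People living with HIV (total)",
    y = "AIDS-related deaths (all ages)",
    color = "Continent"
  ) +
  theme_minimal()                            # drops the grey background

print(hiv_plot)
```

Because a few countries have far larger counts than the rest, adding scale_x_log10() and scale_y_log10() might make the smaller regions easier to see, although the plot described here uses the raw counts.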

Conclusion

The final plot shows a positive correlation between AIDS-related deaths and the number of people living with HIV, meaning that a higher number of people living with HIV generally goes with a higher number of AIDS-related deaths. Africa appears to have more outliers than the other regions, which seems credible given the media portrayal of the continent as having the largest number of AIDS patients, compounded by the fact that many of its countries are classified as “underdeveloped” by other nations. This result answered my question, and even though I expected Africa to stand out before making the visualization, it is still overwhelming to see by how much. In future research I would like to do a similar analysis for children and note any differences.
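The positive correlation read off the plot could also be checked numerically. The sketch below is an optional addition, not part of the original analysis; it assumes the cleaned column names listed earlier and falls back to a stand-in data frame with made-up numbers so that it runs on its own.

```r
# Stand-in rows with made-up values, used only if Clean_Data from the
# cleaning chunk is not already in the workspace.
if (!exists("Clean_Data")) {
  Clean_Data <- data.frame(
    data_people_living_with_hiv_total = c(500000, 120000, 80000, 20000, 60000),
    data_aids_related_deaths_all_ages = c(30000, 9000, 4000, 500, 2500)
  )
}

living <- Clean_Data$data_people_living_with_hiv_total
deaths <- Clean_Data$data_aids_related_deaths_all_ages

# Pearson measures linear association; Spearman is rank-based and is less
# sensitive to the very large counts of a few countries.
pearson_r  <- cor(living, deaths, use = "complete.obs", method = "pearson")
spearman_r <- cor(living, deaths, use = "complete.obs", method = "spearman")

print(c(pearson = pearson_r, spearman = spearman_r))
```

Values close to 1 would support the positive relationship described above, while values near 0 would suggest the pattern in the plot is driven by only a few points.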

References

Aids.csv data set: CORGIS-Edu (https://corgis-edu.github.io/corgis/csv/aids/)