For this project I will be using the "aids.csv" file I downloaded from CORGIS-Edu (https://corgis-edu.github.io/corgis/csv/aids/). The data set was obtained from UNAIDS, an organization whose role is to reduce the transmission of AIDS while providing resources to the countries affected by the disease. The data set I use in this project contains, for a large set of countries, the number of people living with HIV, the number of new infections reported, and the number of AIDS-related deaths. The Year column runs from 1990 to 2020, as the summary output shows.
## Chunk Information
In this chunk I load the packages I need for the project, read in the CSV file, and inspect the data with the str(), head(), glimpse(), and summary() functions.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(countrycode)
readr::read_csv("aids.csv")
## Rows: 2759 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country
## dbl (22): Year, Data.AIDS-Related Deaths.AIDS Orphans, Data.AIDS-Related Dea...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 2,759 × 23
## Country Year Data.AIDS-Related De…¹ Data.AIDS-Related De…²
## <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan 1990 100 100
## 2 Algeria 1990 200 100
## 3 Angola 1990 1300 500
## 4 Argentina 1990 500 200
## 5 Armenia 1990 100 100
## 6 Azerbaijan 1990 100 100
## 7 Benin 1990 2800 1000
## 8 Bolivia (Plurinational S… 1990 200 100
## 9 Botswana 1990 1800 500
## 10 Burkina Faso 1990 16000 3200
## # ℹ 2,749 more rows
## # ℹ abbreviated names: ¹`Data.AIDS-Related Deaths.AIDS Orphans`,
## # ²`Data.AIDS-Related Deaths.Adults`
## # ℹ 19 more variables: `Data.AIDS-Related Deaths.All Ages` <dbl>,
## # `Data.AIDS-Related Deaths.Children` <dbl>,
## # `Data.AIDS-Related Deaths.Female Adults` <dbl>,
## # `Data.AIDS-Related Deaths.Male Adults` <dbl>, …
data <- read.csv("aids.csv")
str(data)
## 'data.frame': 2759 obs. of 23 variables:
## $ Country : chr "Afghanistan" "Algeria" "Angola" "Argentina" ...
## $ Year : int 1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
## $ Data.AIDS.Related.Deaths.AIDS.Orphans : int 100 200 1300 500 100 100 2800 200 1800 16000 ...
## $ Data.AIDS.Related.Deaths.Adults : int 100 100 500 200 100 100 1000 100 500 3200 ...
## $ Data.AIDS.Related.Deaths.All.Ages : int 100 100 1000 500 100 100 1000 100 1000 7200 ...
## $ Data.AIDS.Related.Deaths.Children : int 100 100 500 100 100 100 1000 100 1000 4000 ...
## $ Data.AIDS.Related.Deaths.Female.Adults : int 100 100 200 100 100 100 500 100 500 1600 ...
## $ Data.AIDS.Related.Deaths.Male.Adults : int 100 100 200 200 100 100 500 100 500 1700 ...
## $ Data.HIV.Prevalence.Adults : num 0.1 0.1 0.2 0.1 0.1 0.1 0.8 0.1 5.7 2.6 ...
## $ Data.HIV.Prevalence.Young.Men : num 0.1 0.1 0.1 0.1 0.1 0.1 0.3 0.1 2.8 1.4 ...
## $ Data.HIV.Prevalence.Young.Women : num 0.1 0.1 0.2 0.1 0.1 0.1 0.8 0.1 6.7 2.4 ...
## $ Data.New.HIV.Infections.Young.Adults : int 100 100 2600 4100 100 100 3900 500 14000 15000 ...
## $ Data.New.HIV.Infections.Male.Adults : int 100 100 1200 3100 100 100 1700 200 6600 7800 ...
## $ Data.New.HIV.Infections.Female.Adults : int 100 100 1700 1200 100 100 2400 200 8700 8800 ...
## $ Data.New.HIV.Infections.Children : int 100 100 1000 200 100 100 1100 100 1200 8100 ...
## $ Data.New.HIV.Infections.All.Ages : int 100 100 3400 4500 100 100 5300 500 16000 25000 ...
## $ Data.New.HIV.Infections.Adults : int 100 100 2800 4400 100 100 4200 500 15000 17000 ...
## $ Data.New.HIV.Infections.Incidence.Rate.Among.Adults: num 0.01 0.01 0.47 0.19 0.01 ...
## $ Data.People.Living.with.HIV.Total : int 500 500 12000 13000 100 200 21000 1200 40000 130000 ...
## $ Data.People.Living.with.HIV.Male.Adults : int 500 500 4600 9100 100 100 8100 1000 17000 53000 ...
## $ Data.People.Living.with.HIV.Female.Adults : int 100 200 6100 3700 100 100 11000 500 22000 56000 ...
## $ Data.People.Living.with.HIV.Children : int 100 100 1100 200 100 100 2300 100 1800 19000 ...
## $ Data.People.Living.with.HIV.Adults : int 500 500 11000 13000 100 200 19000 1100 38000 110000 ...
head(data)
## Country Year Data.AIDS.Related.Deaths.AIDS.Orphans
## 1 Afghanistan 1990 100
## 2 Algeria 1990 200
## 3 Angola 1990 1300
## 4 Argentina 1990 500
## 5 Armenia 1990 100
## 6 Azerbaijan 1990 100
## Data.AIDS.Related.Deaths.Adults Data.AIDS.Related.Deaths.All.Ages
## 1 100 100
## 2 100 100
## 3 500 1000
## 4 200 500
## 5 100 100
## 6 100 100
## Data.AIDS.Related.Deaths.Children Data.AIDS.Related.Deaths.Female.Adults
## 1 100 100
## 2 100 100
## 3 500 200
## 4 100 100
## 5 100 100
## 6 100 100
## Data.AIDS.Related.Deaths.Male.Adults Data.HIV.Prevalence.Adults
## 1 100 0.1
## 2 100 0.1
## 3 200 0.2
## 4 200 0.1
## 5 100 0.1
## 6 100 0.1
## Data.HIV.Prevalence.Young.Men Data.HIV.Prevalence.Young.Women
## 1 0.1 0.1
## 2 0.1 0.1
## 3 0.1 0.2
## 4 0.1 0.1
## 5 0.1 0.1
## 6 0.1 0.1
## Data.New.HIV.Infections.Young.Adults Data.New.HIV.Infections.Male.Adults
## 1 100 100
## 2 100 100
## 3 2600 1200
## 4 4100 3100
## 5 100 100
## 6 100 100
## Data.New.HIV.Infections.Female.Adults Data.New.HIV.Infections.Children
## 1 100 100
## 2 100 100
## 3 1700 1000
## 4 1200 200
## 5 100 100
## 6 100 100
## Data.New.HIV.Infections.All.Ages Data.New.HIV.Infections.Adults
## 1 100 100
## 2 100 100
## 3 3400 2800
## 4 4500 4400
## 5 100 100
## 6 100 100
## Data.New.HIV.Infections.Incidence.Rate.Among.Adults
## 1 0.01
## 2 0.01
## 3 0.47
## 4 0.19
## 5 0.01
## 6 0.01
## Data.People.Living.with.HIV.Total Data.People.Living.with.HIV.Male.Adults
## 1 500 500
## 2 500 500
## 3 12000 4600
## 4 13000 9100
## 5 100 100
## 6 200 100
## Data.People.Living.with.HIV.Female.Adults
## 1 100
## 2 200
## 3 6100
## 4 3700
## 5 100
## 6 100
## Data.People.Living.with.HIV.Children Data.People.Living.with.HIV.Adults
## 1 100 500
## 2 100 500
## 3 1100 11000
## 4 200 13000
## 5 100 100
## 6 100 200
glimpse(data)
## Rows: 2,759
## Columns: 23
## $ Country <chr> "Afghanistan", "Al…
## $ Year <int> 1990, 1990, 1990, …
## $ Data.AIDS.Related.Deaths.AIDS.Orphans <int> 100, 200, 1300, 50…
## $ Data.AIDS.Related.Deaths.Adults <int> 100, 100, 500, 200…
## $ Data.AIDS.Related.Deaths.All.Ages <int> 100, 100, 1000, 50…
## $ Data.AIDS.Related.Deaths.Children <int> 100, 100, 500, 100…
## $ Data.AIDS.Related.Deaths.Female.Adults <int> 100, 100, 200, 100…
## $ Data.AIDS.Related.Deaths.Male.Adults <int> 100, 100, 200, 200…
## $ Data.HIV.Prevalence.Adults <dbl> 0.1, 0.1, 0.2, 0.1…
## $ Data.HIV.Prevalence.Young.Men <dbl> 0.1, 0.1, 0.1, 0.1…
## $ Data.HIV.Prevalence.Young.Women <dbl> 0.1, 0.1, 0.2, 0.1…
## $ Data.New.HIV.Infections.Young.Adults <int> 100, 100, 2600, 41…
## $ Data.New.HIV.Infections.Male.Adults <int> 100, 100, 1200, 31…
## $ Data.New.HIV.Infections.Female.Adults <int> 100, 100, 1700, 12…
## $ Data.New.HIV.Infections.Children <int> 100, 100, 1000, 20…
## $ Data.New.HIV.Infections.All.Ages <int> 100, 100, 3400, 45…
## $ Data.New.HIV.Infections.Adults <int> 100, 100, 2800, 44…
## $ Data.New.HIV.Infections.Incidence.Rate.Among.Adults <dbl> 0.01, 0.01, 0.47, …
## $ Data.People.Living.with.HIV.Total <int> 500, 500, 12000, 1…
## $ Data.People.Living.with.HIV.Male.Adults <int> 500, 500, 4600, 91…
## $ Data.People.Living.with.HIV.Female.Adults <int> 100, 200, 6100, 37…
## $ Data.People.Living.with.HIV.Children <int> 100, 100, 1100, 20…
## $ Data.People.Living.with.HIV.Adults <int> 500, 500, 11000, 1…
summary(data)
## Country Year Data.AIDS.Related.Deaths.AIDS.Orphans
## Length:2759 Min. :1990 Min. : 100
## Class :character 1st Qu.:1997 1st Qu.: 1200
## Mode :character Median :2005 Median : 12000
## Mean :2005 Mean : 104294
## 3rd Qu.:2013 3rd Qu.: 66000
## Max. :2020 Max. :1800000
## Data.AIDS.Related.Deaths.Adults Data.AIDS.Related.Deaths.All.Ages
## Min. : 100 Min. : 100
## 1st Qu.: 200 1st Qu.: 500
## Median : 1100 Median : 1400
## Mean : 7755 Mean : 10172
## 3rd Qu.: 5000 3rd Qu.: 6600
## Max. :220000 Max. :270000
## Data.AIDS.Related.Deaths.Children Data.AIDS.Related.Deaths.Female.Adults
## Min. : 100 Min. : 100
## 1st Qu.: 100 1st Qu.: 100
## Median : 500 Median : 500
## Mean : 2499 Mean : 4134
## 3rd Qu.: 1700 3rd Qu.: 2400
## Max. :49000 Max. :130000
## Data.AIDS.Related.Deaths.Male.Adults Data.HIV.Prevalence.Adults
## Min. : 100 Min. : 0.100
## 1st Qu.: 100 1st Qu.: 0.100
## Median : 1000 Median : 0.600
## Mean : 3711 Mean : 2.601
## 3rd Qu.: 2800 3rd Qu.: 2.100
## Max. :92000 Max. :28.900
## Data.HIV.Prevalence.Young.Men Data.HIV.Prevalence.Young.Women
## Min. :0.1000 Min. : 0.100
## 1st Qu.:0.1000 1st Qu.: 0.100
## Median :0.1000 Median : 0.200
## Mean :0.6811 Mean : 1.754
## 3rd Qu.:0.6000 3rd Qu.: 1.400
## Max. :8.9000 Max. :23.900
## Data.New.HIV.Infections.Young.Adults Data.New.HIV.Infections.Male.Adults
## Min. : 100 Min. : 100
## 1st Qu.: 1000 1st Qu.: 500
## Median : 2700 Median : 1600
## Mean : 15691 Mean : 7341
## 3rd Qu.: 10000 3rd Qu.: 5500
## Max. :460000 Max. :190000
## Data.New.HIV.Infections.Female.Adults Data.New.HIV.Infections.Children
## Min. : 100 Min. : 100
## 1st Qu.: 500 1st Qu.: 100
## Median : 1100 Median : 500
## Mean : 9440 Mean : 3763
## 3rd Qu.: 5400 3rd Qu.: 2500
## Max. :280000 Max. :73000
## Data.New.HIV.Infections.All.Ages Data.New.HIV.Infections.Adults
## Min. : 100 Min. : 100
## 1st Qu.: 1000 1st Qu.: 1000
## Median : 3600 Median : 2900
## Mean : 20377 Mean : 16715
## 3rd Qu.: 13000 3rd Qu.: 11000
## Max. :520000 Max. :470000
## Data.New.HIV.Infections.Incidence.Rate.Among.Adults
## Min. : 0.010
## 1st Qu.: 0.140
## Median : 0.400
## Mean : 2.447
## 3rd Qu.: 1.860
## Max. :45.110
## Data.People.Living.with.HIV.Total Data.People.Living.with.HIV.Male.Adults
## Min. : 100 Min. : 100
## 1st Qu.: 7900 1st Qu.: 4300
## Median : 37000 Median : 17000
## Mean : 229503 Mean : 88794
## 3rd Qu.: 140000 3rd Qu.: 63000
## Max. :7800000 Max. :2700000
## Data.People.Living.with.HIV.Female.Adults Data.People.Living.with.HIV.Children
## Min. : 100 Min. : 100
## 1st Qu.: 2700 1st Qu.: 200
## Median : 13000 Median : 1700
## Mean : 120779 Mean : 20145
## 3rd Qu.: 66500 3rd Qu.: 13000
## Max. :4800000 Max. :380000
## Data.People.Living.with.HIV.Adults
## Min. : 100
## 1st Qu.: 7500
## Median : 34000
## Mean : 209528
## 3rd Qu.: 130000
## Max. :7500000
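The two readers used above treat column names differently: read_csv() kept names such as Data.AIDS-Related Deaths.AIDS Orphans, while read.csv() replaced the spaces and hyphens with dots. A minimal sketch of that behaviour, using a small inline CSV string (an assumption for illustration, not the real file) and base R only:

```r
# Tiny inline CSV standing in for aids.csv; illustrative extract only.
csv_text <- "Country,Data.AIDS-Related Deaths.All Ages\nAngola,1000\nBenin,1000\n"

# Default: check.names = TRUE rewrites headers into syntactically valid R names.
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as written in the file.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

Either form works for this project, because the names are standardized again during cleaning.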
This chunk cleans the data set before I start the analysis. The only step I needed was the clean_names() function, because I did not notice any missing values in the data set.
Clean_Data <- data %>% clean_names()
names(Clean_Data)
## [1] "country"
## [2] "year"
## [3] "data_aids_related_deaths_aids_orphans"
## [4] "data_aids_related_deaths_adults"
## [5] "data_aids_related_deaths_all_ages"
## [6] "data_aids_related_deaths_children"
## [7] "data_aids_related_deaths_female_adults"
## [8] "data_aids_related_deaths_male_adults"
## [9] "data_hiv_prevalence_adults"
## [10] "data_hiv_prevalence_young_men"
## [11] "data_hiv_prevalence_young_women"
## [12] "data_new_hiv_infections_young_adults"
## [13] "data_new_hiv_infections_male_adults"
## [14] "data_new_hiv_infections_female_adults"
## [15] "data_new_hiv_infections_children"
## [16] "data_new_hiv_infections_all_ages"
## [17] "data_new_hiv_infections_adults"
## [18] "data_new_hiv_infections_incidence_rate_among_adults"
## [19] "data_people_living_with_hiv_total"
## [20] "data_people_living_with_hiv_male_adults"
## [21] "data_people_living_with_hiv_female_adults"
## [22] "data_people_living_with_hiv_children"
## [23] "data_people_living_with_hiv_adults"
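The observation that there are no missing values can also be checked explicitly rather than by eye. A minimal sketch, assuming a small toy data frame (hypothetical values, with one NA added on purpose) in place of the real data:

```r
# Hypothetical toy data frame with one deliberately missing value.
toy <- data.frame(
  country = c("Angola", "Benin", "Botswana"),
  year = c(1990, 1990, 1990),
  data_aids_related_deaths_all_ages = c(1000, NA, 1000)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE if any value anywhere in the data frame is missing.
has_missing <- anyNA(toy)
print(has_missing)
```

On the real data the equivalent call would be colSums(is.na(Clean_Data)). I would expect zeros everywhere, since the summary() output above shows no NA's rows.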
Clean_Data <- Clean_Data %>%
mutate(country_region = countrycode(country,
origin = "country.name",
destination = "continent"))
# Checking for unmapped countries
Clean_Data %>% filter(is.na(country_region)) %>% distinct(country)
## [1] country
## <0 rows> (or 0-length row.names)
# Optional: Fill NAs with "Other"
Clean_Data <- Clean_Data %>%
mutate(country_region = replace_na(country_region, "Other"))
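To show what the mapping and the "Other" fallback do when a name cannot be matched, here is a minimal sketch on a plain character vector. The third name is made up so that it fails to match; it is not from the data set.

```r
library(countrycode)
library(tidyr)

# Hypothetical input: two real country names plus one invented name.
countries <- c("Angola", "Argentina", "Atlantis")

# Unmatched names come back as NA (countrycode also raises a warning for them,
# which is silenced here to keep the output short).
regions <- suppressWarnings(
  countrycode(countries, origin = "country.name", destination = "continent")
)

# Replace any NA with the fallback label.
regions_filled <- replace_na(regions, "Other")
print(regions_filled)
```

In my data the check returned zero rows, so the fallback step did not change anything, but it guards against unmatched names if the file is updated.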
In this chunk I fit a multiple linear regression that models AIDS-related deaths as a function of the number of people living with HIV and the number of new HIV infections.
model <- lm(data_aids_related_deaths_all_ages ~ data_people_living_with_hiv_total + data_new_hiv_infections_all_ages, data = Clean_Data)
summary(model)
##
## Call:
## lm(formula = data_aids_related_deaths_all_ages ~ data_people_living_with_hiv_total +
## data_new_hiv_infections_all_ages, data = Clean_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86159 -2197 -1922 -658 83331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.995e+03 2.412e+02 8.27 <2e-16 ***
## data_people_living_with_hiv_total 1.350e-02 6.560e-04 20.57 <2e-16 ***
## data_new_hiv_infections_all_ages 2.493e-01 7.846e-03 31.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11740 on 2756 degrees of freedom
## Multiple R-squared: 0.7479, Adjusted R-squared: 0.7477
## F-statistic: 4087 on 2 and 2756 DF, p-value: < 2.2e-16
# Based on the model output, the fitted equation is: Deaths (all ages) = 1995 + (0.0135 x Total PLHIV) + (0.2493 x New infections)
# This means that for every 1 additional new HIV infection, the model predicts approximately 0.25 additional deaths, holding the total number of people living with HIV constant.
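The interpretation above can be checked by plugging numbers into the fitted equation by hand. This is a minimal sketch: the coefficients are the rounded values from the summary(model) output, and the helper function and input values are hypothetical.

```r
# Coefficients copied (rounded) from the summary(model) output above.
intercept <- 1995
b_plhiv <- 0.0135
b_new_infections <- 0.2493

# Hypothetical helper that applies the fitted equation by hand.
predict_deaths <- function(plhiv, new_infections) {
  intercept + b_plhiv * plhiv + b_new_infections * new_infections
}

# Hypothetical country-year: 100,000 people living with HIV, 5,000 new infections.
baseline <- predict_deaths(plhiv = 100000, new_infections = 5000)

# Same country-year with one additional new infection.
one_more <- predict_deaths(plhiv = 100000, new_infections = 5001)

print(baseline)            # 1995 + 1350 + 1246.5
print(one_more - baseline) # about 0.2493, the new-infections coefficient
```

With the fitted object available, predict(model, newdata = ...) would give the same kind of prediction using the unrounded coefficients.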
## Plotting results
# Splitting the plotting area into 2 rows and 2 columns
par(mfrow = c(2, 2))
# Generate the diagnostic plots
plot(model)
# Reset plotting area back to 1x1
par(mfrow = c(1, 1))
ggplot(Clean_Data, aes(x = data_people_living_with_hiv_total,
y = data_aids_related_deaths_all_ages,
color = country_region)) +
geom_point(alpha = 0.6) +
scale_color_brewer(palette = "Set1") + # Professional non-default palette
labs(title = "AIDS-Related Deaths vs. People Living with HIV by Region",
x = "People living with HIV (total)",
y = "AIDS-related deaths (all ages)",
color = "Region") +
theme_minimal()
I first cleaned the data set with the clean_names() function from the janitor package, which standardizes the column names into a machine-friendly format. I then used mutate() to create a country_region column for my final plot, which required installing the countrycode package. While building the final plot I ran into errors, and after some research I found that my CSV used older forms of some country names. To handle this I filtered with is.na() to check for countries that did not map to a region, and used mutate() again to label any unmapped ones as "Other". My final visualization plots AIDS-related deaths (all ages) against the total number of people living with HIV, colored by region. I already knew Africa had the highest numbers for both, but seeing the visualization put things in perspective for me. I also wanted to look at the statistics for children, but that produced errors as well; I will research this after turning in the assignment.
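As a starting point for the children's statistics, here is a minimal sketch of one possible approach. It assumes the cleaned column names listed earlier and uses a small toy data frame with made-up values, so it is not a result from the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows using the cleaned column names; values are made up.
toy <- data.frame(
  country_region = c("Africa", "Africa", "Americas", "Asia"),
  data_people_living_with_hiv_children = c(1100, 2300, 200, 100),
  data_aids_related_deaths_children = c(500, 1000, 100, 100)
)

# Total children's figures per region.
children_by_region <- toy %>%
  group_by(country_region) %>%
  summarise(
    children_living_with_hiv = sum(data_people_living_with_hiv_children),
    children_deaths = sum(data_aids_related_deaths_children),
    .groups = "drop"
  )
print(children_by_region)

# Same style of scatter plot as the final plot, restricted to the children's columns.
p <- ggplot(toy, aes(x = data_people_living_with_hiv_children,
                     y = data_aids_related_deaths_children,
                     color = country_region)) +
  geom_point(alpha = 0.6) +
  labs(x = "Children living with HIV",
       y = "AIDS-related deaths (children)",
       color = "Region") +
  theme_minimal()
# print(p) draws the plot.
```

Replacing toy with Clean_Data should apply the same steps to the full data set, provided the column names match the names(Clean_Data) output exactly.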