Project 1 Data 110

Introduction

For this project I will be using the “aids.csv” file I obtained from CORGIS-Edu (https://corgis-edu.github.io/corgis/csv/aids/). The data comes from UNAIDS, the United Nations programme whose mission is to reduce the transmission of HIV while providing resources to countries affected by the disease. For a large set of countries and for each year from 1990 to 2020, the dataset records the number of people living with HIV, the number of new HIV infections, the number of AIDS-related deaths, and HIV prevalence rates.

Chunk Information

In this chunk I load the packages I need for the project (tidyverse, janitor, and countrycode; they must already be installed, for example with install.packages()), read in the CSV file, and inspect the data with the head(), str(), glimpse(), and summary() functions.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(countrycode)

# Preview the file with readr; the result is printed but not saved
readr::read_csv("aids.csv")

## Rows: 2759 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Country
## dbl (22): Year, Data.AIDS-Related Deaths.AIDS Orphans, Data.AIDS-Related Dea...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 2,759 × 23
##    Country                    Year Data.AIDS-Related De…¹ Data.AIDS-Related De…²
##    <chr>                     <dbl>                  <dbl>                  <dbl>
##  1 Afghanistan                1990                    100                    100
##  2 Algeria                    1990                    200                    100
##  3 Angola                     1990                   1300                    500
##  4 Argentina                  1990                    500                    200
##  5 Armenia                    1990                    100                    100
##  6 Azerbaijan                 1990                    100                    100
##  7 Benin                      1990                   2800                   1000
##  8 Bolivia (Plurinational S…  1990                    200                    100
##  9 Botswana                   1990                   1800                    500
## 10 Burkina Faso               1990                  16000                   3200
## # ℹ 2,749 more rows
## # ℹ abbreviated names: ¹`Data.AIDS-Related Deaths.AIDS Orphans`,
## #   ²`Data.AIDS-Related Deaths.Adults`
## # ℹ 19 more variables: `Data.AIDS-Related Deaths.All Ages` <dbl>,
## #   `Data.AIDS-Related Deaths.Children` <dbl>,
## #   `Data.AIDS-Related Deaths.Female Adults` <dbl>,
## #   `Data.AIDS-Related Deaths.Male Adults` <dbl>, …

# Read the file again with base R and store it as `data` for the rest of the analysis
data <- read.csv("aids.csv")
str(data)

## 'data.frame':    2759 obs. of  23 variables:
##  $ Country                                            : chr  "Afghanistan" "Algeria" "Angola" "Argentina" ...
##  $ Year                                               : int  1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
##  $ Data.AIDS.Related.Deaths.AIDS.Orphans              : int  100 200 1300 500 100 100 2800 200 1800 16000 ...
##  $ Data.AIDS.Related.Deaths.Adults                    : int  100 100 500 200 100 100 1000 100 500 3200 ...
##  $ Data.AIDS.Related.Deaths.All.Ages                  : int  100 100 1000 500 100 100 1000 100 1000 7200 ...
##  $ Data.AIDS.Related.Deaths.Children                  : int  100 100 500 100 100 100 1000 100 1000 4000 ...
##  $ Data.AIDS.Related.Deaths.Female.Adults             : int  100 100 200 100 100 100 500 100 500 1600 ...
##  $ Data.AIDS.Related.Deaths.Male.Adults               : int  100 100 200 200 100 100 500 100 500 1700 ...
##  $ Data.HIV.Prevalence.Adults                         : num  0.1 0.1 0.2 0.1 0.1 0.1 0.8 0.1 5.7 2.6 ...
##  $ Data.HIV.Prevalence.Young.Men                      : num  0.1 0.1 0.1 0.1 0.1 0.1 0.3 0.1 2.8 1.4 ...
##  $ Data.HIV.Prevalence.Young.Women                    : num  0.1 0.1 0.2 0.1 0.1 0.1 0.8 0.1 6.7 2.4 ...
##  $ Data.New.HIV.Infections.Young.Adults               : int  100 100 2600 4100 100 100 3900 500 14000 15000 ...
##  $ Data.New.HIV.Infections.Male.Adults                : int  100 100 1200 3100 100 100 1700 200 6600 7800 ...
##  $ Data.New.HIV.Infections.Female.Adults              : int  100 100 1700 1200 100 100 2400 200 8700 8800 ...
##  $ Data.New.HIV.Infections.Children                   : int  100 100 1000 200 100 100 1100 100 1200 8100 ...
##  $ Data.New.HIV.Infections.All.Ages                   : int  100 100 3400 4500 100 100 5300 500 16000 25000 ...
##  $ Data.New.HIV.Infections.Adults                     : int  100 100 2800 4400 100 100 4200 500 15000 17000 ...
##  $ Data.New.HIV.Infections.Incidence.Rate.Among.Adults: num  0.01 0.01 0.47 0.19 0.01 ...
##  $ Data.People.Living.with.HIV.Total                  : int  500 500 12000 13000 100 200 21000 1200 40000 130000 ...
##  $ Data.People.Living.with.HIV.Male.Adults            : int  500 500 4600 9100 100 100 8100 1000 17000 53000 ...
##  $ Data.People.Living.with.HIV.Female.Adults          : int  100 200 6100 3700 100 100 11000 500 22000 56000 ...
##  $ Data.People.Living.with.HIV.Children               : int  100 100 1100 200 100 100 2300 100 1800 19000 ...
##  $ Data.People.Living.with.HIV.Adults                 : int  500 500 11000 13000 100 200 19000 1100 38000 110000 ...
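The two reads above name the columns differently: readr::read_csv() keeps the original headers (for example `Data.AIDS-Related Deaths.All Ages`), while base read.csv() passes them through make.names(), which replaces characters that are not valid in R names, such as hyphens and spaces, with dots. The sketch below shows that conversion on a tiny stand-in file built from three rows printed above, so it runs without the real aids.csv; the object names are illustrative only.

```r
# Headers as they appear in aids.csv (taken from the read_csv() output above)
raw_names <- c("Country", "Year", "Data.AIDS-Related Deaths.All Ages")

# make.names() is the conversion read.csv() applies when check.names = TRUE (the default)
syntactic <- make.names(raw_names)
print(syntactic)

# Stand-in CSV: three of the rows shown by str(), written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Year,Data.AIDS-Related Deaths.All Ages",
  "Afghanistan,1990,100",
  "Algeria,1990,100",
  "Angola,1990,1000"
), tmp)

base_df <- read.csv(tmp)                       # names converted to dots
raw_df  <- read.csv(tmp, check.names = FALSE)  # names kept exactly as written
print(names(base_df))
print(names(raw_df))
```

The values are identical either way; only the column names differ.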

head(data)

##       Country Year Data.AIDS.Related.Deaths.AIDS.Orphans
## 1 Afghanistan 1990                                   100
## 2     Algeria 1990                                   200
## 3      Angola 1990                                  1300
## 4   Argentina 1990                                   500
## 5     Armenia 1990                                   100
## 6  Azerbaijan 1990                                   100
##   Data.AIDS.Related.Deaths.Adults Data.AIDS.Related.Deaths.All.Ages
## 1                             100                               100
## 2                             100                               100
## 3                             500                              1000
## 4                             200                               500
## 5                             100                               100
## 6                             100                               100
##   Data.AIDS.Related.Deaths.Children Data.AIDS.Related.Deaths.Female.Adults
## 1                               100                                    100
## 2                               100                                    100
## 3                               500                                    200
## 4                               100                                    100
## 5                               100                                    100
## 6                               100                                    100
##   Data.AIDS.Related.Deaths.Male.Adults Data.HIV.Prevalence.Adults
## 1                                  100                        0.1
## 2                                  100                        0.1
## 3                                  200                        0.2
## 4                                  200                        0.1
## 5                                  100                        0.1
## 6                                  100                        0.1
##   Data.HIV.Prevalence.Young.Men Data.HIV.Prevalence.Young.Women
## 1                           0.1                             0.1
## 2                           0.1                             0.1
## 3                           0.1                             0.2
## 4                           0.1                             0.1
## 5                           0.1                             0.1
## 6                           0.1                             0.1
##   Data.New.HIV.Infections.Young.Adults Data.New.HIV.Infections.Male.Adults
## 1                                  100                                 100
## 2                                  100                                 100
## 3                                 2600                                1200
## 4                                 4100                                3100
## 5                                  100                                 100
## 6                                  100                                 100
##   Data.New.HIV.Infections.Female.Adults Data.New.HIV.Infections.Children
## 1                                   100                              100
## 2                                   100                              100
## 3                                  1700                             1000
## 4                                  1200                              200
## 5                                   100                              100
## 6                                   100                              100
##   Data.New.HIV.Infections.All.Ages Data.New.HIV.Infections.Adults
## 1                              100                            100
## 2                              100                            100
## 3                             3400                           2800
## 4                             4500                           4400
## 5                              100                            100
## 6                              100                            100
##   Data.New.HIV.Infections.Incidence.Rate.Among.Adults
## 1                                                0.01
## 2                                                0.01
## 3                                                0.47
## 4                                                0.19
## 5                                                0.01
## 6                                                0.01
##   Data.People.Living.with.HIV.Total Data.People.Living.with.HIV.Male.Adults
## 1                               500                                     500
## 2                               500                                     500
## 3                             12000                                    4600
## 4                             13000                                    9100
## 5                               100                                     100
## 6                               200                                     100
##   Data.People.Living.with.HIV.Female.Adults
## 1                                       100
## 2                                       200
## 3                                      6100
## 4                                      3700
## 5                                       100
## 6                                       100
##   Data.People.Living.with.HIV.Children Data.People.Living.with.HIV.Adults
## 1                                  100                                500
## 2                                  100                                500
## 3                                 1100                              11000
## 4                                  200                              13000
## 5                                  100                                100
## 6                                  100                                200

glimpse(data)

## Rows: 2,759
## Columns: 23
## $ Country                                             <chr> "Afghanistan", "Al…
## $ Year                                                <int> 1990, 1990, 1990, …
## $ Data.AIDS.Related.Deaths.AIDS.Orphans               <int> 100, 200, 1300, 50…
## $ Data.AIDS.Related.Deaths.Adults                     <int> 100, 100, 500, 200…
## $ Data.AIDS.Related.Deaths.All.Ages                   <int> 100, 100, 1000, 50…
## $ Data.AIDS.Related.Deaths.Children                   <int> 100, 100, 500, 100…
## $ Data.AIDS.Related.Deaths.Female.Adults              <int> 100, 100, 200, 100…
## $ Data.AIDS.Related.Deaths.Male.Adults                <int> 100, 100, 200, 200…
## $ Data.HIV.Prevalence.Adults                          <dbl> 0.1, 0.1, 0.2, 0.1…
## $ Data.HIV.Prevalence.Young.Men                       <dbl> 0.1, 0.1, 0.1, 0.1…
## $ Data.HIV.Prevalence.Young.Women                     <dbl> 0.1, 0.1, 0.2, 0.1…
## $ Data.New.HIV.Infections.Young.Adults                <int> 100, 100, 2600, 41…
## $ Data.New.HIV.Infections.Male.Adults                 <int> 100, 100, 1200, 31…
## $ Data.New.HIV.Infections.Female.Adults               <int> 100, 100, 1700, 12…
## $ Data.New.HIV.Infections.Children                    <int> 100, 100, 1000, 20…
## $ Data.New.HIV.Infections.All.Ages                    <int> 100, 100, 3400, 45…
## $ Data.New.HIV.Infections.Adults                      <int> 100, 100, 2800, 44…
## $ Data.New.HIV.Infections.Incidence.Rate.Among.Adults <dbl> 0.01, 0.01, 0.47, …
## $ Data.People.Living.with.HIV.Total                   <int> 500, 500, 12000, 1…
## $ Data.People.Living.with.HIV.Male.Adults             <int> 500, 500, 4600, 91…
## $ Data.People.Living.with.HIV.Female.Adults           <int> 100, 200, 6100, 37…
## $ Data.People.Living.with.HIV.Children                <int> 100, 100, 1100, 20…
## $ Data.People.Living.with.HIV.Adults                  <int> 500, 500, 11000, 1…

summary(data)

##    Country               Year      Data.AIDS.Related.Deaths.AIDS.Orphans
##  Length:2759        Min.   :1990   Min.   :    100                      
##  Class :character   1st Qu.:1997   1st Qu.:   1200                      
##  Mode  :character   Median :2005   Median :  12000                      
##                     Mean   :2005   Mean   : 104294                      
##                     3rd Qu.:2013   3rd Qu.:  66000                      
##                     Max.   :2020   Max.   :1800000                      
##  Data.AIDS.Related.Deaths.Adults Data.AIDS.Related.Deaths.All.Ages
##  Min.   :   100                  Min.   :   100                   
##  1st Qu.:   200                  1st Qu.:   500                   
##  Median :  1100                  Median :  1400                   
##  Mean   :  7755                  Mean   : 10172                   
##  3rd Qu.:  5000                  3rd Qu.:  6600                   
##  Max.   :220000                  Max.   :270000                   
##  Data.AIDS.Related.Deaths.Children Data.AIDS.Related.Deaths.Female.Adults
##  Min.   :  100                     Min.   :   100                        
##  1st Qu.:  100                     1st Qu.:   100                        
##  Median :  500                     Median :   500                        
##  Mean   : 2499                     Mean   :  4134                        
##  3rd Qu.: 1700                     3rd Qu.:  2400                        
##  Max.   :49000                     Max.   :130000                        
##  Data.AIDS.Related.Deaths.Male.Adults Data.HIV.Prevalence.Adults
##  Min.   :  100                        Min.   : 0.100            
##  1st Qu.:  100                        1st Qu.: 0.100            
##  Median : 1000                        Median : 0.600            
##  Mean   : 3711                        Mean   : 2.601            
##  3rd Qu.: 2800                        3rd Qu.: 2.100            
##  Max.   :92000                        Max.   :28.900            
##  Data.HIV.Prevalence.Young.Men Data.HIV.Prevalence.Young.Women
##  Min.   :0.1000                Min.   : 0.100                 
##  1st Qu.:0.1000                1st Qu.: 0.100                 
##  Median :0.1000                Median : 0.200                 
##  Mean   :0.6811                Mean   : 1.754                 
##  3rd Qu.:0.6000                3rd Qu.: 1.400                 
##  Max.   :8.9000                Max.   :23.900                 
##  Data.New.HIV.Infections.Young.Adults Data.New.HIV.Infections.Male.Adults
##  Min.   :   100                       Min.   :   100                     
##  1st Qu.:  1000                       1st Qu.:   500                     
##  Median :  2700                       Median :  1600                     
##  Mean   : 15691                       Mean   :  7341                     
##  3rd Qu.: 10000                       3rd Qu.:  5500                     
##  Max.   :460000                       Max.   :190000                     
##  Data.New.HIV.Infections.Female.Adults Data.New.HIV.Infections.Children
##  Min.   :   100                        Min.   :  100                   
##  1st Qu.:   500                        1st Qu.:  100                   
##  Median :  1100                        Median :  500                   
##  Mean   :  9440                        Mean   : 3763                   
##  3rd Qu.:  5400                        3rd Qu.: 2500                   
##  Max.   :280000                        Max.   :73000                   
##  Data.New.HIV.Infections.All.Ages Data.New.HIV.Infections.Adults
##  Min.   :   100                   Min.   :   100                
##  1st Qu.:  1000                   1st Qu.:  1000                
##  Median :  3600                   Median :  2900                
##  Mean   : 20377                   Mean   : 16715                
##  3rd Qu.: 13000                   3rd Qu.: 11000                
##  Max.   :520000                   Max.   :470000                
##  Data.New.HIV.Infections.Incidence.Rate.Among.Adults
##  Min.   : 0.010                                     
##  1st Qu.: 0.140                                     
##  Median : 0.400                                     
##  Mean   : 2.447                                     
##  3rd Qu.: 1.860                                     
##  Max.   :45.110                                     
##  Data.People.Living.with.HIV.Total Data.People.Living.with.HIV.Male.Adults
##  Min.   :    100                   Min.   :    100                        
##  1st Qu.:   7900                   1st Qu.:   4300                        
##  Median :  37000                   Median :  17000                        
##  Mean   : 229503                   Mean   :  88794                        
##  3rd Qu.: 140000                   3rd Qu.:  63000                        
##  Max.   :7800000                   Max.   :2700000                        
##  Data.People.Living.with.HIV.Female.Adults Data.People.Living.with.HIV.Children
##  Min.   :    100                           Min.   :   100                      
##  1st Qu.:   2700                           1st Qu.:   200                      
##  Median :  13000                           Median :  1700                      
##  Mean   : 120779                           Mean   : 20145                      
##  3rd Qu.:  66500                           3rd Qu.: 13000                      
##  Max.   :4800000                           Max.   :380000                      
##  Data.People.Living.with.HIV.Adults
##  Min.   :    100                   
##  1st Qu.:   7500                   
##  Median :  34000                   
##  Mean   : 209528                   
##  3rd Qu.: 130000                   
##  Max.   :7500000

Cleaning Data

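This chunk cleans the dataset before the analysis. The main step is janitor's clean_names() function, which converts the column names to lowercase snake_case; I did not notice any missing values in the dataset, so no other cleaning was needed. I then add a country_region column with the countrycode package so the countries can be grouped by continent.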

Clean_Data <- data %>% clean_names()
names(Clean_Data)

##  [1] "country"                                            
##  [2] "year"                                               
##  [3] "data_aids_related_deaths_aids_orphans"              
##  [4] "data_aids_related_deaths_adults"                    
##  [5] "data_aids_related_deaths_all_ages"                  
##  [6] "data_aids_related_deaths_children"                  
##  [7] "data_aids_related_deaths_female_adults"             
##  [8] "data_aids_related_deaths_male_adults"               
##  [9] "data_hiv_prevalence_adults"                         
## [10] "data_hiv_prevalence_young_men"                      
## [11] "data_hiv_prevalence_young_women"                    
## [12] "data_new_hiv_infections_young_adults"               
## [13] "data_new_hiv_infections_male_adults"                
## [14] "data_new_hiv_infections_female_adults"              
## [15] "data_new_hiv_infections_children"                   
## [16] "data_new_hiv_infections_all_ages"                   
## [17] "data_new_hiv_infections_adults"                     
## [18] "data_new_hiv_infections_incidence_rate_among_adults"
## [19] "data_people_living_with_hiv_total"                  
## [20] "data_people_living_with_hiv_male_adults"            
## [21] "data_people_living_with_hiv_female_adults"          
## [22] "data_people_living_with_hiv_children"               
## [23] "data_people_living_with_hiv_adults"
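For these particular headers, the effect of clean_names() can be approximated in base R: lowercase everything, then collapse each run of non-alphanumeric characters into a single underscore. This is only a simplified sketch that happens to reproduce the names above; janitor's real implementation handles many more cases (camelCase, accents, duplicate names), so `approx_clean_names` is a hypothetical helper, not janitor's code.

```r
# Hypothetical helper: a rough base-R approximation of janitor::clean_names()
approx_clean_names <- function(x) {
  x <- tolower(x)                  # "Data.AIDS..." -> "data.aids..."
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of dots, spaces, hyphens -> "_"
  gsub("^_+|_+$", "", x)           # trim leading/trailing underscores
}

# A few of the original read.csv() names from the str() output
old_names <- c("Country", "Year",
               "Data.AIDS.Related.Deaths.AIDS.Orphans",
               "Data.New.HIV.Infections.Incidence.Rate.Among.Adults",
               "Data.People.Living.with.HIV.Total")
print(approx_clean_names(old_names))
```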

# Map each country name to its continent with countrycode
Clean_Data <- Clean_Data %>%
  mutate(country_region = countrycode(country, 
                                      origin = "country.name", 
                                      destination = "continent"))
# Checking for unmapped countries
Clean_Data %>% filter(is.na(country_region)) %>% distinct(country)

## [1] country
## <0 rows> (or 0-length row.names)

# Optional safeguard: label any unmapped countries as "Other" (the check above returned 0 rows, so nothing changes here)
Clean_Data <- Clean_Data %>%
  mutate(country_region = replace_na(country_region, "Other"))
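The cleaning note above says no missing values were noticed, and the summary() output shows no "NA's" row for any column, which is consistent with that. One way to confirm it for every column at once is colSums(is.na(...)), which counts the NA entries per column. The sketch runs it on a small stand-in data frame built from the first five rows of the str() output; on the real data the same call would be `colSums(is.na(Clean_Data))`.

```r
# Stand-in for Clean_Data: first five rows of three columns from the str() output
sample_df <- data.frame(
  country = c("Afghanistan", "Algeria", "Angola", "Argentina", "Armenia"),
  year = rep(1990L, 5),
  data_aids_related_deaths_all_ages = c(100L, 100L, 1000L, 500L, 100L)
)

# Number of NA values in each column (0 everywhere means nothing is missing)
na_counts <- colSums(is.na(sample_df))
print(na_counts)

# TRUE only if no column contains a missing value
print(!anyNA(sample_df))
```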

Creating a Linear Regression

In this chunk I fit a multiple linear regression that models AIDS-related deaths (all ages) as a function of the total number of people living with HIV and the number of new HIV infections (all ages).
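Written out, the model is the following, where i indexes the country-year rows, Deaths is data_aids_related_deaths_all_ages, PLHIV is data_people_living_with_hiv_total, NewInf is data_new_hiv_infections_all_ages, and lm() estimates the β coefficients by ordinary least squares:

$$
\text{Deaths}_i = \beta_0 + \beta_1\,\text{PLHIV}_i + \beta_2\,\text{NewInf}_i + \varepsilon_i
$$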

model <- lm(data_aids_related_deaths_all_ages ~ data_people_living_with_hiv_total + data_new_hiv_infections_all_ages, data = Clean_Data)
summary(model)

## 
## Call:
## lm(formula = data_aids_related_deaths_all_ages ~ data_people_living_with_hiv_total + 
##     data_new_hiv_infections_all_ages, data = Clean_Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -86159  -2197  -1922   -658  83331 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       1.995e+03  2.412e+02    8.27   <2e-16 ***
## data_people_living_with_hiv_total 1.350e-02  6.560e-04   20.57   <2e-16 ***
## data_new_hiv_infections_all_ages  2.493e-01  7.846e-03   31.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11740 on 2756 degrees of freedom
## Multiple R-squared:  0.7479, Adjusted R-squared:  0.7477 
## F-statistic:  4087 on 2 and 2756 DF,  p-value: < 2.2e-16

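# Based on the model output, the fitted equation is:
#   Deaths (all ages) = 1995 + (0.0135 x Total PLHIV) + (0.2493 x New Infections)
# (PLHIV = people living with HIV.) Each additional new HIV infection is associated with
# about 0.25 more predicted deaths, holding the total number of people living with HIV constant.
# Likewise, each additional 1,000 people living with HIV is associated with about 13.5 more
# predicted deaths, holding new infections constant.
# Together the two predictors explain about 75% of the variance in deaths (R-squared = 0.748).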
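To make the fitted equation concrete, the sketch below plugs the rounded estimates from summary(model) into a small function. The inputs (100,000 people living with HIV and 10,000 new infections) are hypothetical values chosen only to show the arithmetic, and `predict_deaths` is an illustrative helper, not part of the original analysis.

```r
# Rounded coefficient estimates copied from the summary(model) output above
b0      <- 1995    # intercept
b_plhiv <- 0.0135  # per person living with HIV (total)
b_new   <- 0.2493  # per new HIV infection (all ages)

# Hypothetical helper: predicted AIDS-related deaths (all ages)
predict_deaths <- function(plhiv_total, new_infections) {
  b0 + b_plhiv * plhiv_total + b_new * new_infections
}

# Hypothetical country-year: 100,000 PLHIV and 10,000 new infections
base_pred <- predict_deaths(100000, 10000)
print(base_pred)  # 1995 + 1350 + 2493

# One extra new infection, PLHIV held fixed: the prediction rises by b_new
print(predict_deaths(100000, 10001) - base_pred)

# 1,000 extra people living with HIV, new infections held fixed: rises by 13.5
print(predict_deaths(101000, 10000) - base_pred)
```

With the real fitted object, `predict(model, newdata = data.frame(data_people_living_with_hiv_total = 100000, data_new_hiv_infections_all_ages = 10000))` gives the same kind of prediction using the unrounded coefficients.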

Plotting Results

# Split the plotting area into 2 rows and 2 columns
par(mfrow = c(2, 2))

# Generate the diagnostic plots
plot(model)

# Reset plotting area back to 1x1
par(mfrow = c(1, 1))
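By default, plot() on an lm object draws four diagnostic panels: residuals vs fitted values, a normal Q-Q plot of the residuals, a scale-location plot, and residuals vs leverage. A common alternative to resetting mfrow by hand is to keep the settings that par() returns and restore them afterwards, which also restores whatever layout was active before. The sketch fits a stand-in model to the ten 1990 rows shown in the str() output so it runs without aids.csv; `demo` and `demo_model` are illustrative names.

```r
# Stand-in data: the ten 1990 rows printed by str(data) above
demo <- data.frame(
  deaths = c(100, 100, 1000, 500, 100, 100, 1000, 100, 1000, 7200),
  plhiv  = c(500, 500, 12000, 13000, 100, 200, 21000, 1200, 40000, 130000),
  new    = c(100, 100, 3400, 4500, 100, 100, 5300, 500, 16000, 25000)
)
demo_model <- lm(deaths ~ plhiv + new, data = demo)

# par() returns the previous values of the parameters it changes
old_par <- par(mfrow = c(2, 2))
plot(demo_model)  # four default diagnostic panels
par(old_par)      # restore the layout that was active before
```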

Creating the Plot

ggplot(Clean_Data, aes(x = data_people_living_with_hiv_total, 
                       y = data_aids_related_deaths_all_ages, 
                       color = country_region)) +
  geom_point(alpha = 0.6) +
  scale_color_brewer(palette = "Set1") + # Professional non-default palette
  labs(title = "AIDS-Related Deaths vs. People Living with HIV by Region",
       x = "People living with HIV (total)",
       y = "AIDS-related deaths (all ages)",
       color = "Region") +
  theme_minimal()
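The scatter plot shows the regional pattern visually; a numeric companion is to total both quantities by region. The sketch below uses base R's aggregate() on the ten 1990 rows from the str() output, with region labels typed in by hand as a stand-in for the countrycode result (an assumption). Because it covers only those ten rows, its totals say nothing about the full dataset; on the real data the grouping column would be country_region in Clean_Data.

```r
# Stand-in data: ten 1990 rows from the str() output, in the same order
# (Afghanistan, Algeria, Angola, Argentina, Armenia, Azerbaijan,
#  Benin, Bolivia, Botswana, Burkina Faso). Regions typed by hand,
# assumed to match countrycode(..., destination = "continent").
demo <- data.frame(
  region = c("Asia", "Africa", "Africa", "Americas", "Asia",
             "Asia", "Africa", "Americas", "Africa", "Africa"),
  deaths = c(100, 100, 1000, 500, 100, 100, 1000, 100, 1000, 7200),
  plhiv  = c(500, 500, 12000, 13000, 100, 200, 21000, 1200, 40000, 130000)
)

# Sum deaths and people living with HIV within each region
by_region <- aggregate(cbind(deaths, plhiv) ~ region, data = demo, FUN = sum)

# Largest death totals first
by_region <- by_region[order(-by_region$deaths), ]
print(by_region)
```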

To prepare the data, I first used the clean_names() function from the janitor package to standardize the column names of my dataset into a machine-friendly format. I then used mutate() to create a country_region column for my final plot, which required installing the countrycode package. When I received errors while creating the final plot, I researched the cause and found that my CSV contained some older country names. To handle this, I filtered with the is.na() function to find countries that were not mapped to a region and used mutate() again to correct those errors. My final visualization plots AIDS-related deaths (all ages) against the total number of people living with HIV, colored by region. I already knew that Africa had the highest numbers of both, but seeing the visualization put the scale in perspective for me. I also wanted to look at the statistics for children, but that attempt also produced errors; after turning in this assignment I will research this as well.

Arinze Ugbah

2026-03-29
