HIV_AID_NY

Author

Zijin Wang

Introduction

HIV/AIDS has long been a significant public health concern, particularly in densely populated areas. This document explores a dataset of HIV/AIDS statistics for New York City covering 2011 to 2015. The dataset includes variables such as year, borough, gender, race, number of diagnoses, and death rates, among other metrics. The main objective of this analysis is to uncover trends, patterns, and insights about the disease’s prevalence and impact. The dataset was sourced from NYC Health.

Loading necessary libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(treemap)

Loading the dataset

data <- read.csv("HIV_AIDS_NY.csv")

Exploratory data analysis to understand the structure and cleanliness of the data

head(data)
  Year Borough UHF      Gender     Age Race HIV.diagnoses HIV.diagnosis.rate
1 2011     All All         All     All  All          3379               48.3
2 2011     All All        Male     All  All          2595               79.1
3 2011     All All      Female     All  All           733               21.1
4 2011     All All Transgender     All  All            51            99999.0
5 2011     All All      Female 13 - 19  All            47               13.6
6 2011     All All      Female 20 - 29  All           178               24.7
  Concurrent.diagnoses X..linked.to.care.within.3.months AIDS.diagnoses
1                  640                                66           2366
2                  480                                66           1712
3                  153                                66            622
4                    7                                63             32
5                    4                                64             22
6                   20                                67             96
  AIDS.diagnosis.rate PLWDHI.prevalence X..viral.suppression Deaths Death.rate
1                33.8               1.1                   71   2040       13.6
2                52.2               1.7                   72   1423       13.4
3                17.6               0.6                   68    605       14.0
4             99999.0           99999.0                   55     12       11.1
5                 6.4               0.1                   57      1        1.4
6                13.3               0.3                   48     19        7.2
  HIV.related.death.rate Non.HIV.related.death.rate
1                    5.8                        7.8
2                    5.7                        7.7
3                    6.0                        8.0
4                    5.7                        5.4
5                    1.4                        0.0
6                    3.2                        4.0
summary(data)
      Year        Borough              UHF               Gender         
 Min.   :2011   Length:6005        Length:6005        Length:6005       
 1st Qu.:2012   Class :character   Class :character   Class :character  
 Median :2013   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2013                                                           
 3rd Qu.:2014                                                           
 Max.   :2015                                                           
     Age                Race           HIV.diagnoses    HIV.diagnosis.rate
 Length:6005        Length:6005        Min.   :   0.0   Min.   :    0.0   
 Class :character   Class :character   1st Qu.:   0.0   1st Qu.:    0.0   
 Mode  :character   Mode  :character   Median :   3.0   Median :   18.5   
                                       Mean   :  26.5   Mean   :  119.5   
                                       3rd Qu.:  13.0   3rd Qu.:   49.4   
                                       Max.   :3379.0   Max.   :99999.0   
 Concurrent.diagnoses X..linked.to.care.within.3.months AIDS.diagnoses   
 Min.   :  0.000      Min.   :    0                     Min.   :    0.0  
 1st Qu.:  0.000      1st Qu.:   67                     1st Qu.:    0.0  
 Median :  1.000      Median :   83                     Median :    2.0  
 Mean   :  5.095      Mean   :25399                     Mean   :   33.3  
 3rd Qu.:  3.000      3rd Qu.:99999                     3rd Qu.:    8.0  
 Max.   :640.000      Max.   :99999                     Max.   :99999.0  
 AIDS.diagnosis.rate PLWDHI.prevalence X..viral.suppression     Deaths        
 Min.   :    0.0     Min.   :    0.0   Min.   :    0        Min.   :    0.00  
 1st Qu.:    0.0     1st Qu.:    0.2   1st Qu.:   71        1st Qu.:    0.00  
 Median :   10.4     Median :    0.6   Median :   79        Median :    1.00  
 Mean   :  122.8     Mean   :  317.5   Mean   : 2656        Mean   :   49.45  
 3rd Qu.:   30.6     3rd Qu.:    1.5   3rd Qu.:   87        3rd Qu.:    8.00  
 Max.   :99999.0     Max.   :99999.0   Max.   :99999        Max.   :99999.00  
   Death.rate     HIV.related.death.rate Non.HIV.related.death.rate
 Min.   :  0.00   Min.   :    0.0        Min.   :    0.0           
 1st Qu.:  0.00   1st Qu.:    0.0        1st Qu.:    0.0           
 Median :  6.00   Median :    3.0        Median :    5.5           
 Mean   : 10.34   Mean   :20003.2        Mean   :20005.1           
 3rd Qu.: 14.10   3rd Qu.:   14.4        3rd Qu.:   22.1           
 Max.   :263.20   Max.   :99999.0        Max.   :99999.0           
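
The summary above shows a maximum of 99999 in many of the rate and percentage columns, and means (for example 25399 for a percentage column) that are far outside any plausible range. This suggests, though the source does not confirm it, that 99999 is a placeholder for unavailable values rather than a measurement. A minimal sketch of recoding it to NA before summarizing, assuming that interpretation; recode_sentinel is a hypothetical helper and the toy data frame stands in for data:

```r
# Hypothetical helper: treat a sentinel value (assumed here to be 99999)
# as missing in every numeric column of a data frame.
recode_sentinel <- function(df, sentinel = 99999) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    x[which(x == sentinel)] <- NA  # which() skips values that are already NA
    x
  })
  df
}

# Toy rows modelled on the head() output above
toy <- data.frame(
  Gender = c("Male", "Female", "Transgender"),
  HIV.diagnoses = c(2595, 733, 51),
  HIV.diagnosis.rate = c(79.1, 21.1, 99999)
)
clean <- recode_sentinel(toy)

# The mean now ignores the placeholder instead of being dominated by it
mean(clean$HIV.diagnosis.rate, na.rm = TRUE)
```

Applied to the real data as data <- recode_sentinel(data), this would need na.rm = TRUE in later calls to sum() or mean() on the affected columns.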

A Treemap of Diagnoses Across Boroughs

This treemap provides a visual representation of the number of HIV diagnoses across different boroughs in New York City.

treemap(data,
        index = c("Borough"),
        vSize = "HIV.diagnoses",
        title="HIV Diagnoses Across Boroughs",
        fontsize.labels = 12)
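
The head() output shows rows in which Borough, UHF, Gender, Age, and Race are "All", and the first row (3379) equals the sum of the three gender rows beneath it (2595 + 733 + 51), so these appear to be precomputed totals. Passing every row to treemap() would then count the same diagnoses more than once and add an "All" tile. A minimal sketch of selecting only the borough-level total rows first, assuming such rows exist with every other dimension set to "All" (worth checking with table(data$Borough, data$UHF)); borough_totals is a hypothetical helper, and the borough labels and counts in the toy data are illustrative only:

```r
# Hypothetical helper: keep one total row per borough and year, then sum
# HIV diagnoses across years for each borough.
borough_totals <- function(df) {
  keep <- df$Borough != "All" & df$UHF == "All" & df$Gender == "All" &
    df$Age == "All" & df$Race == "All"
  aggregate(HIV.diagnoses ~ Borough, data = df[keep, ], FUN = sum)
}

# Toy data: one citywide row, one gender subtotal, and three borough totals
toy <- data.frame(
  Year = c(2011, 2011, 2011, 2011, 2012),
  Borough = c("All", "Bronx", "Bronx", "Brooklyn", "Bronx"),
  UHF = "All",
  Gender = c("All", "All", "Male", "All", "All"),
  Age = "All",
  Race = "All",
  HIV.diagnoses = c(100, 40, 30, 60, 35)
)
totals <- borough_totals(toy)
totals
```

The result could then be passed to the same call, for example treemap(borough_totals(data), index = "Borough", vSize = "HIV.diagnoses").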

Gender Dynamics in HIV Diagnoses

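# Keep only the citywide rows split by gender (every other dimension "All"),
# so that subtotal rows are not counted again, then total diagnoses per gender
data_citywide <- subset(data, Borough == "All" & UHF == "All" & Age == "All" &
                          Race == "All" & Gender != "All")
data_gender <- aggregate(HIV.diagnoses ~ Gender, data = data_citywide, sum)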

plot_ly(data_gender, labels = ~Gender, values = ~HIV.diagnoses, type = 'pie') %>%
  layout(title = "Gender Distribution of HIV Diagnoses in NYC")

Mortality Rates: A Focus on UHF Neighborhoods

# Grouping data by UHF neighborhoods and calculating the average death rate
data_grouped <- aggregate(Death.rate ~ UHF, data=data, mean)
# Representing this data using a bar chart
plot_ly(data = data_grouped, x = ~UHF, y = ~Death.rate, type = "bar", color = ~UHF) %>%
  layout(title = "Age-adjusted Mortality Rates among People with HIV in Different UHF Neighborhoods of NYC",
         yaxis = list(title = "Age-adjusted Mortality Rate"),
         xaxis = list(title = "UHF Neighborhoods", tickangle = -45))
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors

Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
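
The warnings above arise because the default Set2 palette provides at most 8 colours while the chart maps one colour to each UHF neighborhood. One possible way to avoid them, sketched here under the assumption that a distinct colour per bar is wanted, is to interpolate the palette to the required length; extended_set2 is a hypothetical helper:

```r
# Hypothetical helper: stretch the 8-colour Set2 palette to n colours by
# interpolation, so brewer.pal() is never asked for more than it provides.
extended_set2 <- function(n) {
  grDevices::colorRampPalette(RColorBrewer::brewer.pal(8, "Set2"))(n)
}

bar_colors <- extended_set2(42)  # 42 is an arbitrary example size
length(bar_colors)
```

In the chart above this would mean adding colors = extended_set2(nrow(data_grouped)) to the plot_ly() call alongside color = ~UHF. Interpolated colours can be hard to tell apart, so a single fill colour may read just as well, since the x axis already names each neighborhood.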

Conclusion and Analysis

After loading the dataset, I began with a quick exploratory data analysis to identify any inconsistencies or missing values. The data consists primarily of metrics associated with HIV/AIDS in New York City, broken down by year, borough, and demographic information such as age, gender, and race. One critical step in the data cleaning process concerned the ‘Year’ column, which was observed to have fractional values that make no logical sense for a time-based metric. This column was therefore converted to integer format so that each year is represented as a whole number.

The visualizations crafted from this dataset serve as a comprehensive representation of the HIV/AIDS situation in New York City over the years. The treemap offers a quick glance at the distribution of diagnoses across boroughs, highlighting areas of concern. The gender pie chart gives insight into the gender dynamics of the disease, while the streamgraph provides a temporal view, allowing us to see trends over time. Lastly, the bar chart focusing on UHF neighborhoods gives granular detail about the mortality rates associated with HIV/AIDS in specific parts of the city.

There were certain visualizations and analyses I considered but could not fully implement due to data limitations. For instance, a geographical heatmap would have been insightful, but it would require additional geographical data that is not available in the current dataset. Similarly, further demographic insights that consider race, age, and gender together would be interesting for future analyses.

In conclusion, the dataset offers a comprehensive look into the HIV/AIDS situation in New York City. The visualizations created serve as a testament to the power of data in understanding and potentially mitigating health crises.