HIV/AIDS has long been a significant public health concern, particularly in densely populated areas. This document explores a dataset of HIV/AIDS statistics for New York City. The dataset includes variables such as year, borough, UHF neighborhood, gender, age, race, number of diagnoses, death rates, and several other metrics. Our main objective is to uncover trends, patterns, and insights about the disease’s prevalence and its impact. The dataset was sourced from NYC Health.
Loading necessary libraries
library(tidyverse)
library(ggplot2)
library(plotly)
library(treemap)
Loading the dataset
data <- read.csv("HIV_AIDS_NY.csv")
Exploratory data analysis to understand the structure and cleanliness of the data
head(data)
Year Borough UHF Gender Age Race HIV.diagnoses HIV.diagnosis.rate
1 2011 All All All All All 3379 48.3
2 2011 All All Male All All 2595 79.1
3 2011 All All Female All All 733 21.1
4 2011 All All Transgender All All 51 99999.0
5 2011 All All Female 13 - 19 All 47 13.6
6 2011 All All Female 20 - 29 All 178 24.7
Concurrent.diagnoses X..linked.to.care.within.3.months AIDS.diagnoses
1 640 66 2366
2 480 66 1712
3 153 66 622
4 7 63 32
5 4 64 22
6 20 67 96
AIDS.diagnosis.rate PLWDHI.prevalence X..viral.suppression Deaths Death.rate
1 33.8 1.1 71 2040 13.6
2 52.2 1.7 72 1423 13.4
3 17.6 0.6 68 605 14.0
4 99999.0 99999.0 55 12 11.1
5 6.4 0.1 57 1 1.4
6 13.3 0.3 48 19 7.2
HIV.related.death.rate Non.HIV.related.death.rate
1 5.8 7.8
2 5.7 7.7
3 6.0 8.0
4 5.7 5.4
5 1.4 0.0
6 3.2 4.0
summary(data)
Year Borough UHF Gender
Min. :2011 Length:6005 Length:6005 Length:6005
1st Qu.:2012 Class :character Class :character Class :character
Median :2013 Mode :character Mode :character Mode :character
Mean :2013
3rd Qu.:2014
Max. :2015
Age Race HIV.diagnoses HIV.diagnosis.rate
Length:6005 Length:6005 Min. : 0.0 Min. : 0.0
Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0.0
Mode :character Mode :character Median : 3.0 Median : 18.5
Mean : 26.5 Mean : 119.5
3rd Qu.: 13.0 3rd Qu.: 49.4
Max. :3379.0 Max. :99999.0
Concurrent.diagnoses X..linked.to.care.within.3.months AIDS.diagnoses
Min. : 0.000 Min. : 0 Min. : 0.0
1st Qu.: 0.000 1st Qu.: 67 1st Qu.: 0.0
Median : 1.000 Median : 83 Median : 2.0
Mean : 5.095 Mean :25399 Mean : 33.3
3rd Qu.: 3.000 3rd Qu.:99999 3rd Qu.: 8.0
Max. :640.000 Max. :99999 Max. :99999.0
AIDS.diagnosis.rate PLWDHI.prevalence X..viral.suppression Deaths
Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.00
1st Qu.: 0.0 1st Qu.: 0.2 1st Qu.: 71 1st Qu.: 0.00
Median : 10.4 Median : 0.6 Median : 79 Median : 1.00
Mean : 122.8 Mean : 317.5 Mean : 2656 Mean : 49.45
3rd Qu.: 30.6 3rd Qu.: 1.5 3rd Qu.: 87 3rd Qu.: 8.00
Max. :99999.0 Max. :99999.0 Max. :99999 Max. :99999.00
Death.rate HIV.related.death.rate Non.HIV.related.death.rate
Min. : 0.00 Min. : 0.0 Min. : 0.0
1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
Median : 6.00 Median : 3.0 Median : 5.5
Mean : 10.34 Mean :20003.2 Mean :20005.1
3rd Qu.: 14.10 3rd Qu.: 14.4 3rd Qu.: 22.1
Max. :263.20 Max. :99999.0 Max. :99999.0
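Several columns in the summary above have a maximum of exactly 99999 (for example HIV.diagnosis.rate, X..viral.suppression and HIV.related.death.rate), and head() shows 99999.0 as the diagnosis rate in the transgender row. These values distort the means: HIV.related.death.rate has a median of 3.0 but a mean of about 20,003. That pattern suggests 99999 is a placeholder rather than a measurement. The sketch below assumes that reading (it is not confirmed by the dataset's documentation) and recodes the value to NA in numeric columns before any rates are computed; the helper name recode_sentinel is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): recode the value
# 99999 as NA in every numeric column.
# ASSUMPTION: 99999 is a placeholder for suppressed or unavailable values,
# inferred from the summary() output, not from the dataset's documentation.
recode_sentinel <- function(df, code = 99999) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[!is.na(x) & x == code] <- NA  # only exact matches are recoded
    x
  })
  df
}

# Usage, once `data` has been loaded with read.csv():
# data_clean <- recode_sentinel(data)
# summary(data_clean$HIV.diagnosis.rate)
```

Character columns such as Borough and Gender are left untouched, so the result can be passed to the same aggregate() and plotting calls as the original data frame.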
A Treemap of Diagnoses Across Boroughs
This treemap provides a visual representation of the number of HIV diagnoses across different boroughs in New York City.
treemap(data,
        index = c("Borough"),
        vSize = "HIV.diagnoses",
        title = "HIV Diagnoses Across Boroughs",
        fontsize.labels = 12)
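The head() output suggests the file mixes totals with their breakdowns: for 2011, the Male, Female and Transgender rows (2595 + 733 + 51) add up to the 3379 in the row where every category is "All". Summing HIV.diagnoses over every row, as the treemap call does, is therefore likely to count diagnoses several times and to draw "All" as if it were a borough. A minimal sketch, assuming that rows with "All" in UHF, Gender, Age and Race hold the borough-level totals (an assumption, not something the source confirms), keeps only those rows before aggregating; borough_totals is a hypothetical helper name.

```r
# Hypothetical helper: one total per borough, summed over the years present.
# ASSUMPTION: borough-level totals are the rows with "All" in UHF, Gender,
# Age and Race; Borough == "All" is the citywide total and is excluded.
borough_totals <- function(df) {
  rows <- df$Borough != "All" & df$UHF == "All" & df$Gender == "All" &
    df$Age == "All" & df$Race == "All"
  aggregate(HIV.diagnoses ~ Borough, data = df[rows, ], FUN = sum)
}

# Usage:
# treemap(borough_totals(data), index = "Borough", vSize = "HIV.diagnoses",
#         title = "HIV Diagnoses Across Boroughs", fontsize.labels = 12)
```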
Gender Dynamics in HIV Diagnoses
# Grouping data by gender and summarizing the total HIV diagnoses for each gender
data_gender <- aggregate(HIV.diagnoses ~ Gender, data = data, FUN = sum)
plot_ly(data_gender, labels = ~Gender, values = ~HIV.diagnoses, type = "pie") %>%
  layout(title = "Gender Distribution of HIV Diagnoses in NYC")
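Aggregating over every row also affects the pie chart: in head(data), the 2011 Male, Female and Transgender rows sum to 3379, the value in the 2011 row where Gender is "All", and the Female age-group rows repeat part of the Female total. A sketch under the assumption (not confirmed by the source) that citywide totals per gender are the rows with "All" in Borough, UHF, Age and Race; gender_totals is a hypothetical helper name.

```r
# Hypothetical helper: citywide diagnoses per gender, summed over the years present.
# ASSUMPTION: the relevant rows have "All" in Borough, UHF, Age and Race;
# Gender == "All" is the overall total and is excluded from the slices.
gender_totals <- function(df) {
  rows <- df$Gender != "All" & df$Borough == "All" & df$UHF == "All" &
    df$Age == "All" & df$Race == "All"
  aggregate(HIV.diagnoses ~ Gender, data = df[rows, ], FUN = sum)
}

# Usage:
# plot_ly(gender_totals(data), labels = ~Gender, values = ~HIV.diagnoses,
#         type = "pie")
```

With this filter the slices for a given year add up to that year's "All" total instead of a multiple of it.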
Temporal Trends: Diagnoses Over Time
# Summarizing diagnoses by year and borough to observe yearly trends across boroughs
stream_data <- data %>%
  group_by(Year, Borough) %>%
  summarize(Total = sum(HIV.diagnoses), .groups = "drop")

# Check for unique years
cat("Unique years in the data:", unique(stream_data$Year), "\n")
Unique years in the data: 2011 2012 2013 2014 2015
# Convert Year to integer (a precaution in case read.csv stored it as double)
stream_data$Year <- as.integer(stream_data$Year)

# Plotting with explicit x-axis tick values, one per year
plot_ly(stream_data, x = ~Year, y = ~Total, color = ~Borough,
        type = "scatter", mode = "lines", fill = "tonexty") %>%
  layout(xaxis = list(tickvals = unique(stream_data$Year), title = "Year"),
         yaxis = list(title = "Total Diagnoses"),
         title = "Yearly Trends in HIV Diagnoses Across Different Boroughs")
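With fill = "tonexty", plotly shades the area between each borough's line and the line of the trace drawn before it, so the chart shows overlapping filled lines rather than stacked borough contributions. If a stacked view is wanted, one common approach is to plot running totals within each year. The sketch below assumes borough-level totals are the rows with "All" in UHF, Gender, Age and Race (an assumption not confirmed by the source); yearly_stack and the Stacked column are hypothetical names.

```r
# Hypothetical helper: yearly borough totals plus a running sum within each year,
# which is what a stacked area chart needs when fill = "tonexty" is used.
# ASSUMPTION: borough-level totals are the rows with "All" in UHF, Gender,
# Age and Race; Borough == "All" (the citywide total) is excluded.
yearly_stack <- function(df) {
  rows <- df$Borough != "All" & df$UHF == "All" & df$Gender == "All" &
    df$Age == "All" & df$Race == "All"
  totals <- aggregate(HIV.diagnoses ~ Year + Borough, data = df[rows, ], FUN = sum)
  # Sort alphabetically by borough within each year, matching the default
  # factor level order used for the color = ~Borough traces
  totals <- totals[order(totals$Year, totals$Borough), ]
  totals$Stacked <- ave(totals$HIV.diagnoses, totals$Year, FUN = cumsum)
  totals
}

# Usage:
# plot_ly(yearly_stack(data), x = ~Year, y = ~Stacked, color = ~Borough,
#         type = "scatter", mode = "lines", fill = "tonexty")
```

The top line for each year then equals the sum over all boroughs, and the height of each band is that borough's own total.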
Mortality Rates: A Focus on UHF Neighborhoods
# Grouping data by UHF neighborhood and calculating the average death rate
data_grouped <- aggregate(Death.rate ~ UHF, data = data, FUN = mean)
# Representing this data using a bar chart. A single bar colour is used because
# mapping color = ~UHF exceeds the 8-colour limit of the default Set2 palette
# (RColorBrewer warns "n too large"), and each bar is already labelled on the x-axis.
plot_ly(data = data_grouped, x = ~UHF, y = ~Death.rate, type = "bar") %>%
  layout(title = "Age-adjusted Mortality Rates among People with HIV in Different UHF Neighborhoods of NYC",
         yaxis = list(title = "Age-adjusted Mortality Rate"),
         xaxis = list(title = "UHF Neighborhoods", tickangle = -45))
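data_grouped averages Death.rate over every row for a neighborhood, which may include gender, age and race breakdown rows as well as overall rows, so a bar can mix rates for different subgroups. A sketch under the assumption (not confirmed by the source) that a neighborhood's overall rate is stored in rows where Gender, Age and Race are all "All": it averages only those rows across years and sorts the neighborhoods from highest to lowest rate. uhf_death_rates is a hypothetical helper name.

```r
# Hypothetical helper: one average death rate per UHF neighborhood, using only
# the rows assumed to hold each neighborhood's overall (all-demographics) rate.
# ASSUMPTION: those rows have "All" in Gender, Age and Race; UHF == "All" is dropped.
uhf_death_rates <- function(df) {
  rows <- df$UHF != "All" & df$Gender == "All" & df$Age == "All" & df$Race == "All"
  out <- aggregate(Death.rate ~ UHF, data = df[rows, ], FUN = mean)  # mean across years
  out <- out[order(out$Death.rate, decreasing = TRUE), ]
  # Factor levels in sorted order, so the bars can be drawn from highest to lowest
  out$UHF <- factor(out$UHF, levels = out$UHF)
  out
}

# Usage:
# plot_ly(uhf_death_rates(data), x = ~UHF, y = ~Death.rate, type = "bar") %>%
#   layout(xaxis = list(title = "UHF Neighborhoods", tickangle = -45))
```

Sorting the bars makes the neighborhoods with the highest mortality easy to pick out, which is harder with alphabetical order across dozens of UHF names.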
Conclusion and Analysis
After loading the dataset, I began with a quick exploratory data analysis to identify inconsistencies or missing values. The data consists of HIV/AIDS metrics for New York City, broken down by year, borough, UHF neighborhood, and demographic characteristics such as age, gender, and race. One data-cleaning step concerned the ‘Year’ column: fractional year values make no logical sense for a time-based metric, so the column was converted to integer format and the x-axis ticks of the trend plot were set explicitly to the years present in the data (2011 through 2015).
The visualizations built from this dataset give an overview of the HIV/AIDS situation in New York over those years. The treemap offers a quick glance at the distribution of diagnoses across boroughs, highlighting areas of concern. The gender pie chart shows how diagnoses are split across genders, while the yearly area chart provides a temporal view of trends in each borough. Lastly, the bar chart of UHF neighborhoods gives a more granular, neighborhood-level view of mortality among people with HIV.
Some visualizations and analyses I considered could not be fully implemented because of data limitations. For instance, a geographic heatmap would have been insightful, but it requires geographic data (such as neighborhood boundaries) that is not included in the dataset. Similarly, analyzing race, age, and gender together would be an interesting direction for future work.
In conclusion, the dataset offers a detailed look at the HIV/AIDS situation in New York, and the visualizations show how data can help in understanding, and potentially mitigating, health crises.