HIV/AIDS remains a significant public health concern, particularly in densely populated areas. This document explores a dataset of HIV/AIDS statistics for New York City, sourced from NYC Health. The dataset includes variables such as year, borough, gender, age, race, number of diagnoses, and death rates, among other metrics. The main objective of this analysis is to uncover trends and patterns in the disease's prevalence and impact.
Loading necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Loading the dataset
getwd()
[1] "/Users/zwang30/Desktop/DATA110"
data <- read_csv("HIV_AIDS_NY.csv")
Rows: 6005 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Borough, UHF, Gender, Age, Race
dbl (13): Year, HIV diagnoses, HIV diagnosis rate, Concurrent diagnoses, % l...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Exploratory data analysis to understand the structure and cleanliness of the data
head(data)
# A tibble: 6 × 18
Year Borough UHF Gender Age Race `HIV diagnoses` `HIV diagnosis rate`
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 2011 All All All All All 3379 48.3
2 2011 All All Male All All 2595 79.1
3 2011 All All Female All All 733 21.1
4 2011 All All Transgen… All All 51 99999
5 2011 All All Female 13 -… All 47 13.6
6 2011 All All Female 20 -… All 178 24.7
# ℹ 10 more variables: `Concurrent diagnoses` <dbl>,
# `% linked to care within 3 months` <dbl>, `AIDS diagnoses` <dbl>,
# `AIDS diagnosis rate` <dbl>, `PLWDHI prevalence` <dbl>,
# `% viral suppression` <dbl>, Deaths <dbl>, `Death rate` <dbl>,
# `HIV-related death rate` <dbl>, `Non-HIV-related death rate` <dbl>
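The preview above shows an HIV diagnosis rate of 99999 in its fourth row, which looks like a placeholder for an unavailable rate rather than a real value. That reading is an assumption; the NYC Health data dictionary would confirm it. A minimal sketch of recoding such a value to NA before computing summaries, using a small tibble copied from the first four rows of the preview (the names `preview` and `cleaned` are illustrative):

```r
library(dplyr)
library(tibble)

# Values copied from the first four rows of the head(data) preview
preview <- tibble(
  `HIV diagnoses` = c(3379, 2595, 733, 51),
  `HIV diagnosis rate` = c(48.3, 79.1, 21.1, 99999)
)

# Assumption: 99999 marks an unavailable rate, so recode it to NA
cleaned <- preview %>%
  mutate(`HIV diagnosis rate` = na_if(`HIV diagnosis rate`, 99999))

# The placeholder inflates the mean unless it is removed first
mean(preview$`HIV diagnosis rate`)
mean(cleaned$`HIV diagnosis rate`, na.rm = TRUE)
```

The same `na_if()` call could be applied to the other rate columns if they turn out to use the same placeholder.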
Temporal Trends: Diagnoses Over Time
# Converting data into a time-series format to observe yearly trends in diagnoses across different boroughs
stream_data <- data %>%
  group_by(Year, Borough) %>%
  summarize(Total = sum(`HIV diagnoses`)) %>%
  ungroup() # Remove the grouping
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
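One caveat about this sum, stated as an assumption: the preview shows rows where Borough, UHF, Gender, Age, and Race are "All", which look like pre-aggregated totals stored alongside their breakdowns. If the file is organised that way, summing every row within a Year and Borough counts the same diagnoses more than once. A minimal sketch of the effect, using the first three rows of the preview (the object names are illustrative):

```r
library(dplyr)
library(tibble)

# Rows 1-3 of the head(data) preview: a citywide total and two of its parts
preview <- tibble(
  Year = c(2011, 2011, 2011),
  Borough = c("All", "All", "All"),
  Gender = c("All", "Male", "Female"),
  `HIV diagnoses` = c(3379, 2595, 733)
)

# Summing every row mixes the total with its own breakdown
naive_total <- preview %>%
  summarize(Total = sum(`HIV diagnoses`, na.rm = TRUE)) %>%
  pull(Total)

# Keeping only the Gender == "All" row avoids the overlap
filtered_total <- preview %>%
  filter(Gender == "All") %>%
  summarize(Total = sum(`HIV diagnoses`, na.rm = TRUE)) %>%
  pull(Total)

naive_total     # 6707
filtered_total  # 3379
```

On the full data, the same idea would mean filtering UHF, Gender, Age, and Race to "All" before grouping by Year and Borough. Whether every borough has such a row is something to check in the file itself.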
# Convert the Year column to a Date (January 1 of each year) so the axis does not depend on the day the code is run
stream_data$Year <- as.Date(paste0(stream_data$Year, "-01-01"))

# Create a text variable for tooltips
tooltip_text <- paste(
  "Year: ", format(stream_data$Year, "%Y"), "<br>",
  "Borough: ", stream_data$Borough, "<br>",
  "Total Diagnoses: ", stream_data$Total
)

# Plotting with tooltips; hoverinfo is set here so no second, duplicate set of traces is added
plot_ly(stream_data, x = ~Year, y = ~Total, color = ~Borough,
        type = 'scatter', mode = 'lines', fill = 'tonexty',
        text = tooltip_text, hoverinfo = "text") %>%
  layout(xaxis = list(type = 'date', title = "Year"),
         yaxis = list(title = "Total Diagnoses"),
         title = "Yearly Trends in HIV Diagnoses Across Different Boroughs")
Conclusion and Analysis
My analysis began by loading tidyverse, ggplot2, and plotly, which provide tools for data manipulation and visualization. Its central component is a plot of yearly trends in HIV diagnoses across the boroughs of New York City. The plot's interactive tooltips allow the data to be explored by year and borough, which can help surface patterns relevant to public health interventions.
This analysis is a preliminary step in understanding HIV/AIDS in New York City. Future work could examine disparities among demographic groups, for instance the relationship between HIV diagnoses and gender or race, or investigate the impact of interventions over time.
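As a starting point for the demographic comparison mentioned above, here is a minimal sketch that computes each gender's share of the citywide 2011 diagnoses. It uses only the first three rows of the `head(data)` preview and assumes the "All" row is the citywide total; the names `preview`, `citywide`, and `gender_share` are illustrative.

```r
library(dplyr)
library(tibble)

# Rows 1-3 of the head(data) preview (2011, all boroughs)
preview <- tibble(
  Gender = c("All", "Male", "Female"),
  `HIV diagnoses` = c(3379, 2595, 733)
)

# Assumption: the Gender == "All" row is the citywide total
citywide <- preview %>%
  filter(Gender == "All") %>%
  pull(`HIV diagnoses`)

# Percentage of citywide diagnoses attributed to each listed gender
gender_share <- preview %>%
  filter(Gender != "All") %>%
  mutate(Share = round(100 * `HIV diagnoses` / citywide, 1))

gender_share
```

Applied to the full dataset, the same calculation would need to be repeated per year and extended to the remaining gender and race categories.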