HIV is a virus that attacks cells that help the body fight infection, making a person more vulnerable to other infections and diseases. It is spread by contact with certain bodily fluids of a person with HIV, most commonly during unprotected sex or through sharing needles. There is no cure, and if left untreated, HIV can progress to AIDS.
This dataset covers HIV surveillance in New York City and includes variables such as year, borough, race, gender, age, and deaths. Here I analyze the HIV diagnosis rate in New York City and compare the five boroughs to see which has the highest number of HIV cases and whether any trends are emerging.
Rows: 6005 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Borough, UHF, Gender, Age, Race
dbl (13): Year, HIV diagnoses, HIV diagnosis rate, Concurrent diagnoses, % l...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(HIV_AIDS_NY)
# A tibble: 6 × 18
Year Borough UHF Gender Age Race `HIV diagnoses` `HIV diagnosis rate`
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 2011 All All All All All 3379 48.3
2 2011 All All Male All All 2595 79.1
3 2011 All All Female All All 733 21.1
4 2011 All All Transgen… All All 51 99999
5 2011 All All Female 13 -… All 47 13.6
6 2011 All All Female 20 -… All 178 24.7
# ℹ 10 more variables: `Concurrent diagnoses` <dbl>,
# `% linked to care within 3 months` <dbl>, `AIDS diagnoses` <dbl>,
# `AIDS diagnosis rate` <dbl>, `PLWDHI prevalence` <dbl>,
# `% viral suppression` <dbl>, Deaths <dbl>, `Death rate` <dbl>,
# `HIV-related death rate` <dbl>, `Non-HIV-related death rate` <dbl>
# A tibble: 6 × 6
# Groups: Year, Borough [1]
Year Borough Age `Death rate` `HIV diagnoses` `HIV diagnosis rate`
<dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 2011 All All 13.6 3379 48.3
2 2011 All All 13.4 2595 79.1
3 2011 All All 14 733 21.1
4 2011 All All 11.1 51 99999
5 2011 All 13 - 19 1.4 47 13.6
6 2011 All 20 - 29 7.2 178 24.7
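One detail in the output above deserves attention: the `HIV diagnosis rate` of 99999 in row 4 is far outside the range of the other rates, so it is most likely a placeholder for a rate that was not reported rather than a real value. That reading is an assumption, not something the output confirms. Under that assumption, a minimal sketch of turning the placeholder into `NA` is shown below. The names `hiv_sample` and `hiv_clean` are hypothetical, and the six rows are retyped from the printed tibble so the sketch runs without the original CSV.

```r
library(dplyr)
library(tibble)

# The six rows from the printed tibble above, retyped by hand
hiv_sample <- tribble(
  ~Year, ~Borough, ~Age,      ~`Death rate`, ~`HIV diagnoses`, ~`HIV diagnosis rate`,
  2011,  "All",    "All",     13.6,          3379,             48.3,
  2011,  "All",    "All",     13.4,          2595,             79.1,
  2011,  "All",    "All",     14,            733,              21.1,
  2011,  "All",    "All",     11.1,          51,               99999,
  2011,  "All",    "13 - 19", 1.4,           47,               13.6,
  2011,  "All",    "20 - 29", 7.2,           178,              24.7
)

# Assumption: 99999 is a missing-value code, so replace it with NA
hiv_clean <- hiv_sample |>
  mutate(`HIV diagnosis rate` = na_if(`HIV diagnosis rate`, 99999))

# Compare the average rate before and after the replacement
mean_raw   <- mean(hiv_sample$`HIV diagnosis rate`)
mean_clean <- mean(hiv_clean$`HIV diagnosis rate`, na.rm = TRUE)
print(c(raw = mean_raw, clean = mean_clean))
```

If the placeholder is left in, a single row dominates any average or axis scale that uses the rate column, which is why it is worth handling before plotting or modeling.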
plot1 <- HIV_AIDS_NY |>
  ggplot() +
  geom_bar(aes(x = `Year`, y = `HIV diagnoses`, fill = `Borough`),
           position = "dodge", stat = "identity") +
  labs(fill = "Borough",
       y = "Number of HIV Diagnoses",
       title = "HIV Diagnoses by Year and Borough",
       caption = "nyc.gov") +
  theme_minimal()
plot1
plot2 <- HIV_AIDS_NY |>
  ggplot() +
  geom_bar(aes(x = `Borough`, y = `HIV diagnoses`, fill = `Borough`),
           position = "dodge", stat = "identity") +
  labs(fill = "Borough",
       y = "Number of HIV Diagnoses",
       title = "HIV Diagnoses by Borough",
       caption = "nyc.gov") +
  theme_minimal()
plot2
model <- lm(`Death rate` ~ `Age`, data = HIV_AIDS_NY)
summary(model)
plot3 <- HIV_AIDS_NY |>
  ggplot(aes(x = `Age`, y = `Death rate`)) +
  geom_point(size = 2) +
  labs(title = "Linear Regression: Age vs. Death Rate for HIV/AIDS Deaths",
       x = "Age",
       y = "Death Rate",
       caption = "nyc.gov") +
  theme_minimal()
plot3
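Because `Age` is stored as text (for example "13 - 19" and "All"), `lm()` treats it as a categorical variable. It estimates one coefficient per age group relative to a baseline group instead of a single slope for age. The sketch below illustrates this on the six rows printed earlier. It is only an illustration of how the model behaves, not a reproduction of the full analysis, and the names `age_sample`, `death_rate`, and `toy_model` are hypothetical.

```r
# Age group and death rate from the six rows printed earlier, retyped by hand
age_sample <- data.frame(
  Age        = c("All", "All", "All", "All", "13 - 19", "20 - 29"),
  death_rate = c(13.6, 13.4, 14, 11.1, 1.4, 7.2)
)

# With a character predictor, lm() builds one indicator column per age group
toy_model <- lm(death_rate ~ Age, data = age_sample)

# One intercept (the baseline group) plus one offset per remaining group
print(coef(toy_model))

# The fitted values are simply the mean death rate within each age group
group_means <- ave(age_sample$death_rate, age_sample$Age)
print(all.equal(unname(fitted(toy_model)), group_means))
```

Read this way, the regression compares average death rates between age groups. The "All" rows are aggregates over every age, so they would probably need to be filtered out before interpreting the age-group coefficients.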
How you cleaned the dataset up (be detailed and specific, using proper terminology where appropriate).
What the visualization represents, any interesting patterns or surprises that arise within the visualization. Anything that you might have shown that you could not get to work or that you wished you could have included.
I got the inspiration for this project from the hate crime homework we did a couple of weeks ago. I started by cleaning up the dataset and selecting the columns I wanted to focus on. I created a bar graph of the number of HIV diagnoses by year and borough, which helped me spot the trend I was looking for: Brooklyn and Manhattan had noticeably more diagnoses than the other boroughs. Then I ran a linear regression to look at the relationship between age and death rate, which suggested that the death rate tends to increase with age, while younger individuals had a 50% lower mortality rate. One thing I really wanted to explore was whether the people in the dataset were locals, tourists, or both, especially since New York is such a popular tourist destination.