This data set includes tables on people living with HIV/AIDS, newly diagnosed HIV cases, and all-cause deaths among people with HIV/AIDS, broken down by borough, neighborhood, gender, age, and race/ethnicity. It is a data frame with 6,005 rows and 18 variables, containing both quantitative and categorical values. It records HIV and AIDS diagnoses reported in New York City from 2011 through 2015. I chose to explore this data set because I want to gain a better understanding of HIV/AIDS.
• Year: the year of diagnosis
• Borough: the New York City borough (Bronx, Brooklyn, Manhattan, Queens, Staten Island, or All)
• UHF: the United Hospital Fund (UHF) neighborhood
• Gender: Female, Male, Transgender, or All
• Age: the age group the row describes (13 - 19 through 60+, or All)
• Race: Black, White, Latino/Hispanic, Asian/Pacific Islander, Other/Unknown, or All
• HIV diagnoses: number of new HIV diagnoses
• HIV diagnosis rate: HIV diagnoses per 100,000
• Concurrent diagnoses: number of people diagnosed with HIV and AIDS at the same time
• % linked to care within 3 months: percentage of newly diagnosed people linked to care within 3 months of diagnosis
• AIDS diagnoses: number of new AIDS diagnoses
• AIDS diagnosis rate: AIDS diagnoses per 100,000
• PLWDHI prevalence: the prevalence of People Living with Diagnosed HIV Infection
• % viral suppression: percentage of people with diagnosed HIV who are virally suppressed
• Deaths: number of deaths among people with HIV/AIDS
• Death rate: deaths per 100,000
• HIV-related death rate: HIV-related deaths per 100,000
• Non-HIV-related death rate: non-HIV-related deaths per 100,000
library(readr)
HIV_AIDS_NY <- read_csv("C:/Users/Mitcheyla$/Desktop/DATA110 -VISUALISATION/HIV_AIDS_NY.csv")
## Rows: 6005 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Borough, UHF, Gender, Age, Race
## dbl (13): Year, HIV diagnoses, HIV diagnosis rate, Concurrent diagnoses, % l...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(HIV_AIDS_NY)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ dplyr 1.0.10
## ✔ tibble 3.1.8 ✔ stringr 1.5.0
## ✔ tidyr 1.2.1 ✔ forcats 0.5.2
## ✔ purrr 0.3.5
## Warning: package 'dplyr' was built under R version 4.2.2
## Warning: package 'stringr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
str(HIV_AIDS_NY)
## spc_tbl_ [6,005 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Year : num [1:6005] 2011 2011 2011 2011 2011 ...
## $ Borough : chr [1:6005] "All" "All" "All" "All" ...
## $ UHF : chr [1:6005] "All" "All" "All" "All" ...
## $ Gender : chr [1:6005] "All" "Male" "Female" "Transgender" ...
## $ Age : chr [1:6005] "All" "All" "All" "All" ...
## $ Race : chr [1:6005] "All" "All" "All" "All" ...
## $ HIV diagnoses : num [1:6005] 3379 2595 733 51 47 ...
## $ HIV diagnosis rate : num [1:6005] 48.3 79.1 21.1 99999 13.6 ...
## $ Concurrent diagnoses : num [1:6005] 640 480 153 7 4 20 31 50 32 23 ...
## $ % linked to care within 3 months: num [1:6005] 66 66 66 63 64 67 66 62 72 68 ...
## $ AIDS diagnoses : num [1:6005] 2366 1712 622 32 22 ...
## $ AIDS diagnosis rate : num [1:6005] 33.8 52.2 17.6 99999 6.4 ...
## $ PLWDHI prevalence : num [1:6005] 1.1 1.7 0.6 99999 0.1 ...
## $ % viral suppression : num [1:6005] 71 72 68 55 57 48 61 66 73 81 ...
## $ Deaths : num [1:6005] 2040 1423 605 12 1 ...
## $ Death rate : num [1:6005] 13.6 13.4 14 11.1 1.4 7.2 9.4 15.9 24.1 33.5 ...
## $ HIV-related death rate : num [1:6005] 5.8 5.7 6 5.7 1.4 3.2 5.7 7.8 11.5 10.6 ...
## $ Non-HIV-related death rate : num [1:6005] 7.8 7.7 8 5.4 0 4 3.7 8.1 12.6 22.9 ...
## - attr(*, "spec")=
## .. cols(
## .. Year = col_double(),
## .. Borough = col_character(),
## .. UHF = col_character(),
## .. Gender = col_character(),
## .. Age = col_character(),
## .. Race = col_character(),
## .. `HIV diagnoses` = col_double(),
## .. `HIV diagnosis rate` = col_double(),
## .. `Concurrent diagnoses` = col_double(),
## .. `% linked to care within 3 months` = col_double(),
## .. `AIDS diagnoses` = col_double(),
## .. `AIDS diagnosis rate` = col_double(),
## .. `PLWDHI prevalence` = col_double(),
## .. `% viral suppression` = col_double(),
## .. Deaths = col_double(),
## .. `Death rate` = col_double(),
## .. `HIV-related death rate` = col_double(),
## .. `Non-HIV-related death rate` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
dim(HIV_AIDS_NY)
## [1] 6005 18
summary(HIV_AIDS_NY)
## Year Borough UHF Gender
## Min. :2011 Length:6005 Length:6005 Length:6005
## 1st Qu.:2012 Class :character Class :character Class :character
## Median :2013 Mode :character Mode :character Mode :character
## Mean :2013
## 3rd Qu.:2014
## Max. :2015
## Age Race HIV diagnoses HIV diagnosis rate
## Length:6005 Length:6005 Min. : 0.0 Min. : 0.0
## Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0.0
## Mode :character Mode :character Median : 3.0 Median : 18.5
## Mean : 26.5 Mean : 119.5
## 3rd Qu.: 13.0 3rd Qu.: 49.4
## Max. :3379.0 Max. :99999.0
## Concurrent diagnoses % linked to care within 3 months AIDS diagnoses
## Min. : 0.000 Min. : 0 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 67 1st Qu.: 0.0
## Median : 1.000 Median : 83 Median : 2.0
## Mean : 5.095 Mean :25399 Mean : 33.3
## 3rd Qu.: 3.000 3rd Qu.:99999 3rd Qu.: 8.0
## Max. :640.000 Max. :99999 Max. :99999.0
## AIDS diagnosis rate PLWDHI prevalence % viral suppression Deaths
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.2 1st Qu.: 71 1st Qu.: 0.00
## Median : 10.4 Median : 0.6 Median : 79 Median : 1.00
## Mean : 122.8 Mean : 317.5 Mean : 2656 Mean : 49.45
## 3rd Qu.: 30.6 3rd Qu.: 1.5 3rd Qu.: 87 3rd Qu.: 8.00
## Max. :99999.0 Max. :99999.0 Max. :99999 Max. :99999.00
## Death rate HIV-related death rate Non-HIV-related death rate
## Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 6.00 Median : 3.0 Median : 5.5
## Mean : 10.34 Mean :20003.2 Mean :20005.1
## 3rd Qu.: 14.10 3rd Qu.: 14.4 3rd Qu.: 22.1
## Max. :263.20 Max. :99999.0 Max. :99999.0
sum(is.na(HIV_AIDS_NY))
## [1] 0
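sum(is.na()) returns 0, but that does not mean nothing is missing: this data set appears to use 99999 as a placeholder (the summary above shows 99999 as the maximum of many columns, and the transgender rows carry 99999 in every rate column). A minimal sketch, using a hypothetical helper `count_placeholders()`, counts the placeholder in each numeric column instead:

```r
# Hypothetical helper: count how many cells in each numeric column hold the
# placeholder code (99999 by default). Character columns are skipped.
count_placeholders <- function(df, code = 99999) {
  is_num <- vapply(df, is.numeric, logical(1))
  colSums(df[is_num] == code)  # a cell that is NA would make its column total NA
}

# Rows copied from the head() output above (a few columns only)
toy <- data.frame(
  gender = c("All", "Male", "Female", "Transgender"),
  hivdiagnoses = c(3379, 2595, 733, 51),
  hivdiagnosisrate = c(48.3, 79.1, 21.1, 99999)
)
count_placeholders(toy)  # 0 for hivdiagnoses, 1 for hivdiagnosisrate
```

On the full data, `count_placeholders(HIV_AIDS_NY)` would show how many placeholder cells each column holds before any filtering.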
head(HIV_AIDS_NY,10)
## # A tibble: 10 × 18
## Year Borough UHF Gender Age Race HIV d…¹ HIV d…² Concu…³ % lin…⁴
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2011 All All All All All 3379 48.3 640 66
## 2 2011 All All Male All All 2595 79.1 480 66
## 3 2011 All All Female All All 733 21.1 153 66
## 4 2011 All All Transgender All All 51 99999 7 63
## 5 2011 All All Female 13 - 19 All 47 13.6 4 64
## 6 2011 All All Female 20 - 29 All 178 24.7 20 67
## 7 2011 All All Female 30 - 39 All 176 26.9 31 66
## 8 2011 All All Female 40 - 49 All 195 33 50 62
## 9 2011 All All Female 50 - 59 All 130 23.5 32 72
## 10 2011 All All Female 60+ All 57 6.7 23 68
## # … with 8 more variables: `AIDS diagnoses` <dbl>, `AIDS diagnosis rate` <dbl>,
## # `PLWDHI prevalence` <dbl>, `% viral suppression` <dbl>, Deaths <dbl>,
## # `Death rate` <dbl>, `HIV-related death rate` <dbl>,
## # `Non-HIV-related death rate` <dbl>, and abbreviated variable names
## # ¹`HIV diagnoses`, ²`HIV diagnosis rate`, ³`Concurrent diagnoses`,
## # ⁴`% linked to care within 3 months`
tail(HIV_AIDS_NY, 10)
## # A tibble: 10 × 18
## Year Borough UHF Gender Age Race HIV d…¹ HIV d…² Concu…³ % lin…⁴
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2015 Staten Island Willo… Male 30 -… All 0 0 0 99999
## 2 2015 Staten Island Willo… Male 40 -… All 1 18.8 0 100
## 3 2015 Staten Island Willo… Male 50 -… All 0 0 0 99999
## 4 2015 Staten Island Willo… Male 60+ All 0 0 0 99999
## 5 2015 Staten Island Willo… Male All All 4 11.2 0 75
## 6 2015 Staten Island Willo… Male All Asia… 0 0 0 99999
## 7 2015 Staten Island Willo… Male All Black 1 72.4 0 100
## 8 2015 Staten Island Willo… Male All Lati… 2 43.9 0 100
## 9 2015 Staten Island Willo… Male All Othe… 1 219. 0 0
## 10 2015 Staten Island Willo… Male All White 0 0 0 99999
## # … with 8 more variables: `AIDS diagnoses` <dbl>, `AIDS diagnosis rate` <dbl>,
## # `PLWDHI prevalence` <dbl>, `% viral suppression` <dbl>, Deaths <dbl>,
## # `Death rate` <dbl>, `HIV-related death rate` <dbl>,
## # `Non-HIV-related death rate` <dbl>, and abbreviated variable names
## # ¹`HIV diagnoses`, ²`HIV diagnosis rate`, ³`Concurrent diagnoses`,
## # ⁴`% linked to care within 3 months`
library(tidyr)
# make names lowercase, remove spaces and hyphens, and replace % with "percent"
names(HIV_AIDS_NY) <- tolower(names(HIV_AIDS_NY))
names(HIV_AIDS_NY) <- gsub(" ","",names(HIV_AIDS_NY))
names(HIV_AIDS_NY) <- gsub("-","",names(HIV_AIDS_NY))
names(HIV_AIDS_NY) <- gsub("%", "percent", names(HIV_AIDS_NY))
# removing hyphens already turns "hiv-relateddeathrate" into "hivrelateddeathrate"
# and "non-hiv-relateddeathrate" into "nonhivrelateddeathrate", so no extra rename is needed
str(HIV_AIDS_NY)
## spc_tbl_ [6,005 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ year : num [1:6005] 2011 2011 2011 2011 2011 ...
## $ borough : chr [1:6005] "All" "All" "All" "All" ...
## $ uhf : chr [1:6005] "All" "All" "All" "All" ...
## $ gender : chr [1:6005] "All" "Male" "Female" "Transgender" ...
## $ age : chr [1:6005] "All" "All" "All" "All" ...
## $ race : chr [1:6005] "All" "All" "All" "All" ...
## $ hivdiagnoses : num [1:6005] 3379 2595 733 51 47 ...
## $ hivdiagnosisrate : num [1:6005] 48.3 79.1 21.1 99999 13.6 ...
## $ concurrentdiagnoses : num [1:6005] 640 480 153 7 4 20 31 50 32 23 ...
## $ percentlinkedtocarewithin3months: num [1:6005] 66 66 66 63 64 67 66 62 72 68 ...
## $ aidsdiagnoses : num [1:6005] 2366 1712 622 32 22 ...
## $ aidsdiagnosisrate : num [1:6005] 33.8 52.2 17.6 99999 6.4 ...
## $ plwdhiprevalence : num [1:6005] 1.1 1.7 0.6 99999 0.1 ...
## $ percentviralsuppression : num [1:6005] 71 72 68 55 57 48 61 66 73 81 ...
## $ deaths : num [1:6005] 2040 1423 605 12 1 ...
## $ deathrate : num [1:6005] 13.6 13.4 14 11.1 1.4 7.2 9.4 15.9 24.1 33.5 ...
## $ hivrelateddeathrate : num [1:6005] 5.8 5.7 6 5.7 1.4 3.2 5.7 7.8 11.5 10.6 ...
## $ nonhivrelateddeathrate : num [1:6005] 7.8 7.7 8 5.4 0 4 3.7 8.1 12.6 22.9 ...
## - attr(*, "spec")=
## .. cols(
## .. Year = col_double(),
## .. Borough = col_character(),
## .. UHF = col_character(),
## .. Gender = col_character(),
## .. Age = col_character(),
## .. Race = col_character(),
## .. `HIV diagnoses` = col_double(),
## .. `HIV diagnosis rate` = col_double(),
## .. `Concurrent diagnoses` = col_double(),
## .. `% linked to care within 3 months` = col_double(),
## .. `AIDS diagnoses` = col_double(),
## .. `AIDS diagnosis rate` = col_double(),
## .. `PLWDHI prevalence` = col_double(),
## .. `% viral suppression` = col_double(),
## .. Deaths = col_double(),
## .. `Death rate` = col_double(),
## .. `HIV-related death rate` = col_double(),
## .. `Non-HIV-related death rate` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
ls(HIV_AIDS_NY)
## [1] "age" "aidsdiagnoses"
## [3] "aidsdiagnosisrate" "borough"
## [5] "concurrentdiagnoses" "deathrate"
## [7] "deaths" "gender"
## [9] "hivdiagnoses" "hivdiagnosisrate"
## [11] "hivrelateddeathrate" "nonhivrelateddeathrate"
## [13] "percentlinkedtocarewithin3months" "percentviralsuppression"
## [15] "plwdhiprevalence" "race"
## [17] "uhf" "year"
# drop rows that contain the 99999 placeholder (used where a value is not available) in these columns
HIV_AIDS_NY <- HIV_AIDS_NY %>%
filter(hivdiagnoses != 99999.0,
hivdiagnosisrate != 99999.0,
percentlinkedtocarewithin3months != 99999.0,
aidsdiagnoses!= 99999.0,
aidsdiagnosisrate != 99999.0,
plwdhiprevalence != 99999.0,
deaths != 99999.0,
deathrate != 99999.0
)
str(HIV_AIDS_NY)
## spc_tbl_ [4,478 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ year : num [1:4478] 2011 2011 2011 2011 2011 ...
## $ borough : chr [1:4478] "All" "All" "All" "All" ...
## $ uhf : chr [1:4478] "All" "All" "All" "All" ...
## $ gender : chr [1:4478] "All" "Male" "Female" "Female" ...
## $ age : chr [1:4478] "All" "All" "All" "13 - 19" ...
## $ race : chr [1:4478] "All" "All" "All" "All" ...
## $ hivdiagnoses : num [1:4478] 3379 2595 733 47 178 ...
## $ hivdiagnosisrate : num [1:4478] 48.3 79.1 21.1 13.6 24.7 26.9 33 23.5 6.7 2.2 ...
## $ concurrentdiagnoses : num [1:4478] 640 480 153 4 20 31 50 32 23 2 ...
## $ percentlinkedtocarewithin3months: num [1:4478] 66 66 66 64 67 66 62 72 68 91 ...
## $ aidsdiagnoses : num [1:4478] 2366 1712 622 22 96 ...
## $ aidsdiagnosisrate : num [1:4478] 33.8 52.2 17.6 6.4 13.3 20.3 35.5 24 7.1 1.6 ...
## $ plwdhiprevalence : num [1:4478] 1.1 1.7 0.6 0.1 0.3 0.6 1.4 1.3 0.3 0.1 ...
## $ percentviralsuppression : num [1:4478] 71 72 68 57 48 61 66 73 81 77 ...
## $ deaths : num [1:4478] 2040 1423 605 1 19 ...
## $ deathrate : num [1:4478] 13.6 13.4 14 1.4 7.2 9.4 15.9 24.1 33.5 13.1 ...
## $ hivrelateddeathrate : num [1:4478] 5.8 5.7 6 1.4 3.2 5.7 7.8 11.5 10.6 2.6 ...
## $ nonhivrelateddeathrate : num [1:4478] 7.8 7.7 8 0 4 3.7 8.1 12.6 22.9 10.6 ...
## - attr(*, "spec")=
## .. cols(
## .. Year = col_double(),
## .. Borough = col_character(),
## .. UHF = col_character(),
## .. Gender = col_character(),
## .. Age = col_character(),
## .. Race = col_character(),
## .. `HIV diagnoses` = col_double(),
## .. `HIV diagnosis rate` = col_double(),
## .. `Concurrent diagnoses` = col_double(),
## .. `% linked to care within 3 months` = col_double(),
## .. `AIDS diagnoses` = col_double(),
## .. `AIDS diagnosis rate` = col_double(),
## .. `PLWDHI prevalence` = col_double(),
## .. `% viral suppression` = col_double(),
## .. Deaths = col_double(),
## .. `Death rate` = col_double(),
## .. `HIV-related death rate` = col_double(),
## .. `Non-HIV-related death rate` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
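The filter above checks eight columns but not percentviralsuppression, hivrelateddeathrate, or nonhivrelateddeathrate, whose maximum in the original summary is 99999, so placeholders can remain in those columns. It also drops a whole row when only one value is a placeholder. A minimal alternative sketch, assuming 99999 never occurs as a genuine value, converts the placeholder to NA in every numeric column with dplyr's `across()` and `na_if()`; `placeholder_to_na()` is a hypothetical helper name.

```r
library(dplyr)

# Sketch: turn the 99999 placeholder into NA in every numeric column,
# assuming 99999 never occurs as a genuine value.
placeholder_to_na <- function(df, code = 99999) {
  df %>% mutate(across(where(is.numeric), ~ na_if(.x, code)))
}

# Rows 1 and 4 of the head() output above (a few columns only)
toy <- data.frame(
  gender = c("All", "Transgender"),
  hivdiagnosisrate = c(48.3, 99999),
  aidsdiagnosisrate = c(33.8, 99999)
)
clean <- placeholder_to_na(toy)
mean(clean$hivdiagnosisrate, na.rm = TRUE)  # 48.3: the placeholder no longer inflates the mean
```

Keeping the rows and marking only the unavailable cells as NA lets each later summary decide, through `na.rm = TRUE`, which values to ignore.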
1- Do the numbers of HIV diagnoses, AIDS diagnoses, and deaths decrease over time?
2- Which race has the highest death rate?
3- Are the HIV and AIDS diagnosis rates the same across boroughs? (ANOVA test)
4- Is there a correlation between percent viral suppression and the HIV-related death rate?
5- Which race has the highest death rate, HIV diagnosis rate, and AIDS diagnosis rate?
To answer the first question, whether the number of people diagnosed with HIV/AIDS decreases over time, I will create a new data set that includes only the relevant variables. I will apply select(), tidyr::gather(), group_by(), and summarize() to the HIV_AIDS_NY data set to gather HIV diagnoses, AIDS diagnoses, and deaths into an "id" column and their counts into a "frequency" column.
hiv_new_variable <- HIV_AIDS_NY %>%
select(year,deaths, aidsdiagnoses, hivdiagnoses) %>%
tidyr::gather("id", "frequency", 2:4) %>%
group_by(year,id) %>%
summarize(frequency = sum(frequency))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
hiv_new_variable
## # A tibble: 15 × 3
## # Groups: year [5]
## year id frequency
## <dbl> <chr> <dbl>
## 1 2011 aidsdiagnoses 25797
## 2 2011 deaths 21431
## 3 2011 hivdiagnoses 36657
## 4 2012 aidsdiagnoses 22681
## 5 2012 deaths 19770
## 6 2012 hivdiagnoses 33604
## 7 2013 aidsdiagnoses 20742
## 8 2013 deaths 19109
## 9 2013 hivdiagnoses 30959
## 10 2014 aidsdiagnoses 16110
## 11 2014 deaths 18323
## 12 2014 hivdiagnoses 29980
## 13 2015 aidsdiagnoses 14161
## 14 2015 deaths 17466
## 15 2015 hivdiagnoses 27670
The table shows that more people with HIV/AIDS died in 2011 than in any later year, and the number of deaths falls every year after that, from 21,431 in 2011 to 17,466 in 2015.
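These yearly sums add up every row, but the table mixes citywide "All" rows with subgroup rows (gender, age, race, borough, neighborhood), so each case is counted several times: the 2011 sum of HIV diagnoses is 36,657, while the single citywide row for 2011 reports 3,379. A minimal sketch, using a hypothetical helper `citywide_by_year()`, keeps only the rows where every grouping column is "All" and uses `pivot_longer()`, tidyr's current replacement for the superseded `gather()`:

```r
library(dplyr)
library(tidyr)

# Sketch: yearly citywide totals from the rows where every grouping column is "All",
# so each diagnosis or death is counted once instead of once per subgroup.
citywide_by_year <- function(df) {
  df %>%
    filter(borough == "All", uhf == "All", gender == "All",
           age == "All", race == "All") %>%
    select(year, deaths, aidsdiagnoses, hivdiagnoses) %>%
    pivot_longer(-year, names_to = "id", values_to = "frequency") %>%
    arrange(year, id)
}

# First three rows of the head() output above
toy <- data.frame(
  year = 2011, borough = "All", uhf = "All",
  gender = c("All", "Male", "Female"), age = "All", race = "All",
  deaths = c(2040, 1423, 605),
  aidsdiagnoses = c(2366, 1712, 622),
  hivdiagnoses = c(3379, 2595, 733)
)
citywide_by_year(toy)  # one row per id for 2011; hivdiagnoses is 3379, not 3379 + 2595 + 733
```

If the 99999 filter removed a citywide row for some year, that year would be missing from the result, so it is worth checking that all five years appear.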
library(ggalluvial)
p1 <- hiv_new_variable %>%
ggplot(aes(x=year, y=frequency, alluvium = id)) +
geom_alluvium(aes(fill = id, ymin=0, ymax=100000), alpha = .9,
position="stack", stat="identity", curve_type="sigmoid") +
xlab("Year") +
ylab("Frequency") +
ggtitle("Deaths, AIDS Diagnoses,
and HIV Diagnoses from 2011-2015") +
scale_fill_brewer(palette = "Set2") +
scale_y_continuous() +
theme(axis.text.x = element_text(angle = 0))
p1
As the plot shows, HIV diagnoses, AIDS diagnoses, and deaths were all highest in 2011 and declined over time.
Which race has the highest death rate?
ggplot(data = HIV_AIDS_NY, aes(x = race, y = deathrate, color = race)) +
xlab("Race") +
ylab("Death rate (per 100,000)") +
theme_minimal(base_size = 12) +
ggtitle("Death Rate by Race") +
geom_boxplot() +
scale_color_brewer(palette = 'Set1')
In this visualization, the death rate for Black individuals is slightly higher than for the other races. All groups have some outliers.
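The boxplot also includes the "All" race category, which overlaps with every other group. To put a number on "slightly higher", a minimal sketch, using a hypothetical helper `death_rate_by_race()` built on dplyr as loaded above, summarises the death rate by race without the "All" rows. The median and interquartile range are used because the boxplot shows outliers in every group.

```r
library(dplyr)

# Sketch: median and IQR of the death rate for each race, excluding the
# overlapping "All" rows, sorted from highest to lowest median.
death_rate_by_race <- function(df) {
  df %>%
    filter(race != "All") %>%
    group_by(race) %>%
    summarise(median_rate = median(deathrate, na.rm = TRUE),
              iqr_rate = IQR(deathrate, na.rm = TRUE),
              n = n(),
              .groups = "drop") %>%
    arrange(desc(median_rate))
}
```

Calling `death_rate_by_race(HIV_AIDS_NY)` would list the races from highest to lowest median death rate, with the number of rows behind each median.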
Are the HIV and AIDS diagnosis rates the same across boroughs? (ANOVA test)
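• Null hypothesis (H0): μa = μbx = μbr = μm = μq = μs (the mean HIV diagnosis rate is the same in every group)
• Alternative hypothesis (H1): at least one group mean differs
• μa: mean for All
• μbx: mean for the Bronx
• μbr: mean for Brooklyn
• μm: mean for Manhattan
• μq: mean for Queens
• μs: mean for Staten Island
• Significance level α = 0.05 (95% confidence level)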
results <- aov(HIV_AIDS_NY$hivdiagnosisrate ~ HIV_AIDS_NY$borough)
summary(results)
## Df Sum Sq Mean Sq F value Pr(>F)
## HIV_AIDS_NY$borough 5 1125030 225006 82.68 <2e-16 ***
## Residuals 4472 12170155 2721
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P-value: < 2e-16. R prints "<2e-16" for any p-value below about 2.2e-16, so the p-value is far below α = 0.05. Conclusion: reject the null hypothesis H0. The mean HIV diagnosis rate is not the same in every group; at least one group mean differs.
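The ANOVA only says that at least one group mean differs, and its "All" group overlaps with every borough, which breaks the independence assumption of the test. A minimal follow-up sketch, assuming it is acceptable to drop the "All" rows, refits the model on borough rows only and applies Tukey's HSD (`stats::TukeyHSD()`) to compare each pair of boroughs. `borough_pairs()` is a hypothetical helper name.

```r
# Sketch: one-way ANOVA on borough rows only (no "All"), followed by
# Tukey's HSD pairwise comparisons between boroughs.
borough_pairs <- function(df, outcome = "hivdiagnosisrate") {
  sub <- df[df$borough != "All", ]
  sub$borough <- factor(sub$borough)
  fit <- aov(reformulate("borough", response = outcome), data = sub)
  TukeyHSD(fit, "borough")
}
```

`borough_pairs(HIV_AIDS_NY)` would give one row per pair of boroughs (difference in means, confidence interval, adjusted p-value), and `borough_pairs(HIV_AIDS_NY, "aidsdiagnosisrate")` does the same for the AIDS diagnosis rate.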
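• Null hypothesis (H0): μa = μbx = μbr = μm = μq = μs (the mean AIDS diagnosis rate is the same in every group)
• Alternative hypothesis (H1): at least one group mean differs
• μa: mean for All
• μbx: mean for the Bronx
• μbr: mean for Brooklyn
• μm: mean for Manhattan
• μq: mean for Queens
• μs: mean for Staten Island
• Significance level α = 0.05 (95% confidence level)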
results1 <- aov(HIV_AIDS_NY$aidsdiagnosisrate ~ HIV_AIDS_NY$borough)
summary(results1)
## Df Sum Sq Mean Sq F value Pr(>F)
## HIV_AIDS_NY$borough 5 479480 95896 73.46 <2e-16 ***
## Residuals 4472 5837973 1305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P-value: < 2e-16, far below α = 0.05. Conclusion: reject the null hypothesis H0. The mean AIDS diagnosis rate is not the same in every group; at least one group mean differs.
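The code that creates HIVVariable is not shown in this report. Judging only from its printed output (columns race, id, and rate, grouped by race, with no "All" race), one possible reconstruction is the mean of each rate per race. This is a guess: `build_rates_by_race()` is a hypothetical helper, and the use of `mean()` is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of HIVVariable: the mean of each rate per race,
# excluding the overlapping "All" rows, left grouped by race.
build_rates_by_race <- function(df) {
  df %>%
    filter(race != "All") %>%
    select(race, hivdiagnosisrate, aidsdiagnosisrate, hivrelateddeathrate) %>%
    pivot_longer(-race, names_to = "id", values_to = "rate") %>%
    group_by(race, id) %>%
    summarise(rate = mean(rate), .groups = "drop_last")
}
```

If the 99999 placeholders are converted to NA first, `mean(rate, na.rm = TRUE)` keeps the remaining values from being averaged with the placeholder.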
# give the rate variables readable labels; values not listed are left unchanged
HIVVariable <- HIVVariable %>%
mutate(id = recode(id,
percentlinkedtocarewithin3months = "Linked to Care",
hivrelateddeathrate = "Death from HIV",
hivdiagnosisrate = "HIV Diagnosis",
aidsdiagnosisrate = "AIDS Diagnosis"))
HIVVariable
## # A tibble: 15 × 3
## # Groups: race [5]
## race id rate
## <chr> <chr> <dbl>
## 1 Asian/Pacific Islander AIDS Diagnosis 11.6
## 2 Asian/Pacific Islander HIV Diagnosis 29.1
## 3 Asian/Pacific Islander Death from HIV 19234.
## 4 Black AIDS Diagnosis 66.3
## 5 Black HIV Diagnosis 93.4
## 6 Black Death from HIV 19108.
## 7 Latino/Hispanic AIDS Diagnosis 30.1
## 8 Latino/Hispanic HIV Diagnosis 54.2
## 9 Latino/Hispanic Death from HIV 19672.
## 10 Other/Unknown AIDS Diagnosis 20.9
## 11 Other/Unknown HIV Diagnosis 85.0
## 12 Other/Unknown Death from HIV 17744.
## 13 White AIDS Diagnosis 19.1
## 14 White HIV Diagnosis 40.4
## 15 White Death from HIV 18983.
raceplot <- HIVVariable %>%
ggplot() +
geom_col(aes(x = id, y = rate, fill = race), position = "dodge") +
ggtitle("AIDS Diagnosis Rate, HIV Related Death Rate,
and HIV Diagnosis Rate For Each Race") +
ylab("Rate (per 100,000)") +
scale_y_continuous(trans = 'log10') +
theme(axis.title.x = element_blank())
raceplot
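In this bar chart, the HIV diagnosis rate is higher than the AIDS diagnosis rate for every race. The HIV-related death rates look almost the same across races, but their values (between about 17,700 and 19,700 per 100,000) are implausibly high; they most likely reflect 99999 placeholders in hivrelateddeathrate, a column the earlier filter does not check, rather than real rates.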
p3 <- HIV3 %>%
ggplot(aes(id, rate, color = gender)) +
geom_boxplot() +
facet_wrap(~gender) +
ylab("Rate (per 100,000)") +
xlab("") +
scale_y_continuous(trans = 'log10') +
ggtitle("AIDS Diagnosis Rate, HIV Related Death Rate,
and HIV Diagnosis Rate For Gender") +
theme(axis.text.x = element_text(angle = -45))
p3
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 2101 rows containing non-finite values (`stat_boxplot()`).
For this plot, I used facet_wrap() to split the boxplots by gender. This visualization gives an idea of how AIDS diagnoses, HIV diagnoses, and HIV-related deaths vary by gender. The median HIV diagnosis rate for females is higher than for males and slightly different from All (all genders combined). However, the HIV-related death rate is higher for males. The median AIDS diagnosis rates for males and All are almost the same, but the rate for females is higher than for the other genders.
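Question 4 (whether percent viral suppression is correlated with the HIV-related death rate) is not analysed above. A minimal sketch, using a hypothetical helper `suppression_vs_death()`, treats 99999 as missing in both columns and runs `stats::cor.test()` with Spearman's rank correlation, chosen here because it is less sensitive to outliers than Pearson's correlation. Because the "All" rows overlap with subgroup rows, the observations are not independent, so the p-value should be read with caution.

```r
# Sketch: Spearman correlation between percent viral suppression and the
# HIV-related death rate, with the 99999 placeholder treated as missing.
suppression_vs_death <- function(df, code = 99999) {
  x <- df$percentviralsuppression
  y <- df$hivrelateddeathrate
  x[x == code] <- NA
  y[y == code] <- NA
  ok <- complete.cases(x, y)  # keep rows where both values are available
  # exact = FALSE uses the asymptotic p-value, which also works with tied ranks
  cor.test(x[ok], y[ok], method = "spearman", exact = FALSE)
}
```

`suppression_vs_death(HIV_AIDS_NY)` would return the estimated rho (negative if higher viral suppression goes with lower HIV-related death rates) and its p-value; a correlation alone would not show that one causes the other.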
HIV is caused by a virus. According to the Mayo Clinic article "HIV/AIDS", it can spread through sexual contact, illicit injection drug use or sharing needles, contact with infected blood, or from mother to child during pregnancy, childbirth, or breastfeeding. HIV/AIDS is a life-threatening condition caused by the human immunodeficiency virus, which damages the immune system. There is no cure, but medications can control the infection and prevent the disease from progressing. Without treatment, HIV typically turns into AIDS in about 8 to 10 years. However, access to better antiviral treatments has dramatically decreased deaths from AIDS, and most people with HIV today do not develop AIDS.
While working with this data set, I was concerned about its lack of transparency. For example, the gender column contains Female, Male, Transgender, and All. All is not a gender; it combines males, females, and transgender people. The borough, race, and age variables also have an All category, so Black, White, and Asian/Pacific Islander individuals, for instance, are counted again in All. This overlap is unclear and makes it harder to get real insight into HIV and AIDS. In addition, no variable records how someone acquired the virus (for example, sexual contact, illicit injection drug use, sharing needles, contact with infected blood, or transmission from mother to child during pregnancy, childbirth, or breastfeeding), so we cannot fit a linear regression on transmission route. This data set is therefore not sufficient for a full understanding of the death rate related to HIV/AIDS. I also had questions such as whether consent is needed to access people’s personal information, and who stores, owns, or controls this data set, even though it comes from the New York City government website. Finally, I wanted to run a multiple regression analysis to see which variables might be correlated, but the structure of the data set did not allow it. However, I was surprised to see in the bar chart of rates by race that the HIV-related death rate is almost the same across races, although those values may reflect unfiltered 99999 placeholders rather than real rates.
https://www.nyc.gov/site/doh/data/data-sets/hiv-aids-surveillance-and-epidemiology-reports.page
https://www.mayoclinic.org/diseases-conditions/hiv-aids/symptoms-causes/syc-20373524