Original Visualization
Source: CDC Youth Risk Behavior Surveillance
The data visualization shows percentage trends for three variables from 2001 to 2017. The data comes from the Centers for Disease Control and Prevention (CDC) Youth Risk Behavior Surveillance System, which surveys students aged 13 to 18 and tracks behaviors ranging from sexual activity to recreational drug use. The CDC conducts the survey every two years, and respondents are anonymous. Its main purpose is to inform school districts and health departments of these trends so that they can support young people in the United States and reduce the transmission of sexually transmitted diseases and infections, including human immunodeficiency virus (HIV).
Online Article
https://www.ashasexualhealth.org/national-youth-risk-behavior-survey-yrbs/
Objective
Based on the organization and the content hosted on its website, the main audience is families with teenagers who want to learn about healthy sexual lives. The organization focuses on raising awareness through education and on making information accessible. Its aim is to help prevent STIs and unwanted pregnancies among young people by informing teenagers of the risks and supporting them in practicing safe sex.
The article above includes a data visualization indicating a downward trend in unsafe sexual activity as of 2017. This suggests that the organization (American Sexual Health Association) and similar groups are on the right track and that their efforts are producing positive results.
The objective of the data visualization is therefore to inform all related parties that, in general, the percentage of respondents reporting risky behavior has fallen compared to previous years, and to encourage them to keep up the work so that the percentage continues to decrease.
Issues
Because the author scaled the y-axis in intervals of 20, it is hard to see the trend of each line, which is the main point of this data visualization as explained in the objective. The author tried to compensate by labeling the highest value, the lowest value and the 2001 value for each category, to show that there is a general downward trend by the latest year. A line graph should not need this: the axis scale itself should make the trend readable.
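To make the scaling issue concrete, the sketch below compares the two tick intervals against the spread of one series. It uses the "ever had sex" percentages for 2001 to 2017 from the dataset listed under Reconstruction Code; the 0 to 70 axis range and the variable names are assumptions chosen for illustration.

```r
# "Ever had sex" percentages for 2001-2017, taken from the dataset in this report
ever_had_sex <- c(45.6, 46.7, 46.8, 47.8, 46.0, 47.4, 46.8, 41.2, 39.5)

# Candidate y-axis breaks on an assumed 0-70 percentage axis
breaks_coarse <- seq(0, 70, by = 20)  # interval used in the original chart
breaks_fine   <- seq(0, 70, by = 5)   # interval used in the reconstruction

# Total spread of the series (highest value minus lowest value)
spread <- diff(range(ever_had_sex))

# How many tick intervals the whole series spans under each scale
intervals_coarse <- spread / 20
intervals_fine   <- spread / 5

print(c(coarse = intervals_coarse, fine = intervals_fine))
```

With intervals of 20 the entire series fits inside less than half of one interval, whereas with intervals of 5 it spans more than one and a half, so movement is far easier to judge against the gridlines.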
The online article names three measures: the percentage of respondents who have ever had sex, who used a condom, and who have used illicit drugs. The data visualization, however, omits the percentage who have used drugs and instead shows the percentage who have had four or more sexual partners. After comparing against the original CDC source, I concluded that the plotted values are accurate for the categories shown, but the mismatch with the article will confuse readers because the chart does not answer the question the text raises.
The legend is also confusing because the order of its items does not match the order of the lines in the graph. This is because legend items are ordered alphabetically by default. The legend should be reordered to match the lines so that viewers can read it directly against the graph.
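The legend issue can be illustrated with a minimal base R sketch. It assumes the category names used in the dataset under Reconstruction Code and the 2017 values from that dataset; ordering the levels by the latest value is one possible way to match the legend to the lines, not necessarily what the original author's tool supports.

```r
# 2017 values for the three categories shown in the original chart
latest <- c(
  Ever.had.sex = 39.5,
  Had.4.or.more.partners = 9.7,
  Used.a.condom.during.sex = 53.8
)

# Default factor levels are sorted alphabetically
default_levels <- levels(factor(names(latest)))

# Order the levels by the latest value, highest first, so the legend
# follows the top-to-bottom order of the lines at the right edge
by_value <- names(sort(latest, decreasing = TRUE))
ordered_levels <- levels(factor(names(latest), levels = by_value))

print(default_levels)
print(ordered_levels)
```

The default order starts with "Ever.had.sex", while the value-based order starts with "Used.a.condom.during.sex", which is the topmost line in the chart.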
Reconstruction Code
# Load the packages used below (assumed; the original listing did not show them).
library(tidyr)    # pivot_longer(); also re-exports %>%
library(ggplot2)  # plotting
# The dataset includes all four categories mentioned in either the data visualization or the online article.
# The time frame is 2001 to 2019 (ten survey cycles) to show a roughly 20-year trend.
# The article was published around 2018, so its author only had access to data up to 2017.
# The author could still have started from 1999 to cover 20 years, since CDC documentation goes back to 1991.
dataset <- read.csv("Assignment2.csv")
dataset
## Year Ever.had.sex Had.4.or.more.partners Used.a.condom.during.sex
## 1 2001 45.6 14.2 57.9
## 2 2003 46.7 14.4 63.0
## 3 2005 46.8 14.3 62.8
## 4 2007 47.8 14.9 61.5
## 5 2009 46.0 13.8 61.1
## 6 2011 47.4 15.3 60.2
## 7 2013 46.8 15.0 59.1
## 8 2015 41.2 11.5 56.9
## 9 2017 39.5 9.7 53.8
## 10 2019 38.4 8.6 54.3
## Have.used.drugs
## 1 25.6
## 2 25.4
## 3 23.3
## 4 22.6
## 5 20.0
## 6 22.5
## 7 17.3
## 8 15.4
## 9 14.0
## 10 14.8
# Use pivot_longer() to convert from wide to long format (one row per year and category)
dataset <- dataset %>%
  pivot_longer(
    cols = c("Ever.had.sex", "Had.4.or.more.partners", "Used.a.condom.during.sex", "Have.used.drugs"),
    names_to = "Category",
    values_to = "Percentage"
  )
dataset
## # A tibble: 40 × 3
## Year Category Percentage
## <int> <chr> <dbl>
## 1 2001 Ever.had.sex 45.6
## 2 2001 Had.4.or.more.partners 14.2
## 3 2001 Used.a.condom.during.sex 57.9
## 4 2001 Have.used.drugs 25.6
## 5 2003 Ever.had.sex 46.7
## 6 2003 Had.4.or.more.partners 14.4
## 7 2003 Used.a.condom.during.sex 63
## 8 2003 Have.used.drugs 25.4
## 9 2005 Ever.had.sex 46.8
## 10 2005 Had.4.or.more.partners 14.3
## # ℹ 30 more rows
# Convert Category to a factor whose level order is the intended legend order
dataset$Category <- factor(
  dataset$Category,
  levels = c("Used.a.condom.during.sex", "Ever.had.sex", "Have.used.drugs", "Had.4.or.more.partners")
)
# Plot the lines; y-axis breaks every 5 points to show the trend more clearly
# linewidth replaces the line `size` argument, which was deprecated in ggplot2 3.4.0
datasetplot <- ggplot(dataset, aes(x = Year, y = Percentage)) +
  geom_line(aes(color = Category), linewidth = 1) +
  scale_color_manual(
    name = "Category",
    labels = c("Used a condom", "Ever had sex", "Have taken drugs", "Had 4 or more sexual partners"),
    values = c("darkgrey", "blue", "darkgreen", "red")
  ) +
  scale_x_continuous(breaks = seq(2001, 2019, 2)) +
  scale_y_continuous(breaks = seq(0, 70, 5)) +
  ggtitle("Trends in Youth Risk Behaviors: 2001-2019")
datasetplot
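As a numeric companion to the plot, the following sketch computes the change in each category between 2001 and 2019. It rebuilds the wide table from the values printed above rather than reading Assignment2.csv, so that it runs on its own; the object names `yrbs` and `change` are illustrative and not part of the original code.

```r
# Wide table rebuilt from the values printed earlier in this report
yrbs <- data.frame(
  Year = seq(2001, 2019, by = 2),
  Ever.had.sex = c(45.6, 46.7, 46.8, 47.8, 46.0, 47.4, 46.8, 41.2, 39.5, 38.4),
  Had.4.or.more.partners = c(14.2, 14.4, 14.3, 14.9, 13.8, 15.3, 15.0, 11.5, 9.7, 8.6),
  Used.a.condom.during.sex = c(57.9, 63.0, 62.8, 61.5, 61.1, 60.2, 59.1, 56.9, 53.8, 54.3),
  Have.used.drugs = c(25.6, 25.4, 23.3, 22.6, 20.0, 22.5, 17.3, 15.4, 14.0, 14.8)
)

# First and last survey years, dropping the Year column
first <- unlist(yrbs[yrbs$Year == 2001, -1])
last  <- unlist(yrbs[yrbs$Year == 2019, -1])

# Change in percentage points from 2001 to 2019 for each category
change <- round(last - first, 1)
print(change)
```

All four series are lower in 2019 than in 2001. How each change should be read depends on the measure: for condom use, a lower percentage is not a reduction in risk.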
References
American Sexual Health Association (2017-2018), National Youth Risk Behavior Survey, viewed 5th April 2023, https://www.ashasexualhealth.org/national-youth-risk-behavior-survey-yrbs/
Centers for Disease Control and Prevention (2001), Youth Risk Behavior Surveillance – United States, 2001, https://www.cdc.gov/mmwr/preview/mmwrhtml/ss5104a1.htm
Centers for Disease Control and Prevention (2003), Youth Risk Behavior Surveillance – United States, 2003, https://www.cdc.gov/mmwr/preview/mmwrhtml/ss5302a1.htm
Centers for Disease Control and Prevention (2005), Youth Risk Behavior Surveillance – United States, 2005, https://www.cdc.gov/mmwr/preview/mmwrhtml/ss5505a1.htm
Centers for Disease Control and Prevention (2018), Youth Risk Behavior Survey Data Summary & Trends Report 2007-2017, https://www.cdc.gov/healthyyouth/data/yrbs/pdf/trendsreport.pdf
Centers for Disease Control and Prevention (2020), Youth Risk Behavior Survey Data Summary & Trends Report 2009-2019, https://www.cdc.gov/healthyyouth/data/yrbs/pdf/YRBSDataSummaryTrendsReport2019-508.pdf