CDC Presentation
This dataset includes data on adolescents' diet, physical activity, and weight status from the Youth Risk Behavior Surveillance System (YRBSS). The data are used by the Data, Trends, and Maps database of the Division of Nutrition, Physical Activity, and Obesity (DNPAO), which provides national and state-specific data on obesity, nutrition, and physical activity. It was published by the Centers for Disease Control and Prevention (CDC).
I chose this dataset because obesity is an issue in my immediate and extended family. I believe that if young people are educated about the importance of eating healthily and exercising, it is possible to break the cycle of obesity. As a result, in the future we will have people who are fit to work and who have fewer health issues.
Overweight and Obesity
Health Issues
The data were collected by the Youth Risk Behavior Surveillance System (YRBSS), which monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.
YearStart and YearEnd: the period covered by the survey, that is, when it started and when it ended.
LocationDesc and LocationAbbr: the states of the United States and their abbreviations.
Question: the question that was asked of students in grades 9-12.
Sample_Size: the number of participants (observations) in the study.
Low_Confidence_Limit: the lower bound of the confidence interval, a number determined by the data that is expected to lie below the true value of the parameter with a given degree of confidence.
High_Confidence_Limit: the upper bound of the confidence interval. A higher confidence level means that a higher percentage of all samples produce an interval that contains the true value of the parameter, and a smaller margin of error means the estimate is closer to the true value.
GeoLocation and LocationID: geolocation is a general term that covers all techniques used to identify a location; here it holds the latitude and longitude of the state, and the LocationID is the state code.
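As a rough illustration of how the confidence limits described above depend on the sample size, the sketch below computes a 95% interval for a percentage with the normal approximation. The value 16.7 and the sample sizes 130 and 2573 are taken from rows shown in the glimpse output further down. The helper name ci_percent is hypothetical, and the calculation assumes simple random sampling; YRBSS uses a complex survey design, so these numbers are not expected to match the published limits.

```r
# Hypothetical helper: normal-approximation confidence interval for a percentage.
# Assumption: simple random sampling (the real survey design is more complex).
ci_percent <- function(percent, n, level = 0.95) {
  p <- percent / 100
  z <- qnorm(1 - (1 - level) / 2)        # about 1.96 for a 95% level
  margin <- z * sqrt(p * (1 - p) / n)    # margin of error on the proportion scale
  c(low = 100 * (p - margin), high = 100 * (p + margin))
}

# Same percentage, two different sample sizes
ci_small <- ci_percent(16.7, n = 130)
ci_large <- ci_percent(16.7, n = 2573)

print(ci_small)
print(ci_large)
```

With the same percentage, the larger sample gives the narrower interval, which is why the sample size matters when reading the low and high limits.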
The tidyverse package was used to read and prepare the data, and setwd() was used to set the working directory so that R could find the file.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")
The first chunk reads the csv file “Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv” into an object named youthobesity. The file has 40,096 observations and 31 variables, and it covers most of the states of the United States.
setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")
youthobesity <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv")
## Rows: 40096 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl (8): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## lgl (1): Data_Value_Unit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Second chunk. A summary of the file. It shows that the median year in which students were surveyed is 2011.
summary(youthobesity)
## YearStart YearEnd LocationAbbr LocationDesc
## Min. :2001 Min. :2001 Length:40096 Length:40096
## 1st Qu.:2007 1st Qu.:2007 Class :character Class :character
## Median :2011 Median :2011 Mode :character Mode :character
## Mean :2011 Mean :2011
## 3rd Qu.:2015 3rd Qu.:2015
## Max. :2019 Max. :2019
##
## Datasource Class Topic Question
## Length:40096 Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Data_Value_Unit Data_Value_Type Data_Value Data_Value_Alt
## Mode:logical Length:40096 Min. : 0.00 Min. : 0.00
## NA's:40096 Class :character 1st Qu.:15.60 1st Qu.:15.60
## Mode :character Median :23.70 Median :23.70
## Mean :26.31 Mean :26.31
## 3rd Qu.:36.60 3rd Qu.:36.60
## Max. :81.90 Max. :81.90
## NA's :10636 NA's :10636
## Data_Value_Footnote_Symbol Data_Value_Footnote Low_Confidence_Limit
## Length:40096 Length:40096 Min. : 0.00
## Class :character Class :character 1st Qu.:12.20
## Mode :character Mode :character Median :19.00
## Mean :21.85
## 3rd Qu.:31.20
## Max. :75.80
## NA's :10636
## High_Confidence_Limit Sample_Size Total Gender
## Min. : 0.00 Min. : 100.0 Length:40096 Length:40096
## 1st Qu.:19.80 1st Qu.: 371.8 Class :character Class :character
## Median :29.40 Median : 710.0 Mode :character Mode :character
## Mean :31.53 Mean : 1333.2
## 3rd Qu.:42.20 3rd Qu.: 1394.0
## Max. :89.60 Max. :53042.0
## NA's :10636 NA's :10636
## Grade Race/Ethnicity GeoLocation ClassID
## Length:40096 Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## TopicID QuestionID DataValueTypeID LocationID
## Length:40096 Length:40096 Length:40096 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.:16.00
## Mode :character Mode :character Mode :character Median :30.00
## Mean :30.22
## 3rd Qu.:45.00
## Max. :78.00
##
## StratificationCategory1 Stratification1 StratificationCategoryId1
## Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## StratificationID1
## Length:40096
## Class :character
## Mode :character
##
##
##
##
Third chunk. Checking how many columns and rows the dataset has and which variables are quantitative and which are qualitative.
glimpse(youthobesity)
## Rows: 40,096
## Columns: 31
## $ YearStart <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ YearEnd <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ LocationAbbr <chr> "AK", "AZ", "DC", "IL", "MD", "RI", "MT", "…
## $ LocationDesc <chr> "Alaska", "Arizona", "District of Columbia"…
## $ Datasource <chr> "Youth Risk Behavior Surveillance System", …
## $ Class <chr> "Physical Activity", "Obesity / Weight Stat…
## $ Topic <chr> "Physical Activity - Behavior", "Obesity / …
## $ Question <chr> "Percent of students in grades 9-12 who ach…
## $ Data_Value_Unit <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Type <chr> "Value", "Value", "Value", "Value", "Value"…
## $ Data_Value <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Alt <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Footnote_Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Footnote <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Low_Confidence_Limit <dbl> 5.5, 10.7, 16.2, 49.8, 14.5, 8.9, 10.0, 7.7…
## $ High_Confidence_Limit <dbl> 16.5, 16.5, 17.9, 79.1, 19.2, 14.8, 14.9, 1…
## $ Sample_Size <dbl> 130, 1739, 8978, 876, 2573, 1199, 1045, 649…
## $ Total <chr> NA, "Total", "Total", NA, NA, NA, NA, NA, N…
## $ Gender <chr> NA, NA, NA, NA, NA, NA, NA, "Female", NA, N…
## $ Grade <chr> NA, NA, NA, "10th", NA, "9th", "10th", NA, …
## $ `Race/Ethnicity` <chr> "Asian", NA, NA, NA, "2 or more races", NA,…
## $ GeoLocation <chr> "(64.845079957001, -147.722059036)", "(34.8…
## $ ClassID <chr> "PA", "OWS", "OWS", "PA", "OWS", "OWS", "OW…
## $ TopicID <chr> "PA1", "OWS1", "OWS1", "PA1", "OWS1", "OWS1…
## $ QuestionID <chr> "Q048", "Q038", "Q038", "Q049", "Q039", "Q0…
## $ DataValueTypeID <chr> "VALUE", "VALUE", "VALUE", "VALUE", "VALUE"…
## $ LocationID <dbl> 2, 4, 11, 17, 24, 44, 30, 31, 35, 36, 37, 3…
## $ StratificationCategory1 <chr> "Race/Ethnicity", "Total", "Total", "Grade"…
## $ Stratification1 <chr> "Asian", "Total", "Total", "10th", "2 or mo…
## $ StratificationCategoryId1 <chr> "RACE", "OVR", "OVR", "GRADE", "RACE", "GRA…
## $ StratificationID1 <chr> "RACEASN", "OVERALL", "OVERALL", "GRADE10",…
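The split into quantitative and qualitative variables can also be checked in code rather than read off the glimpse output. The sketch below uses a small toy data frame (a few values copied from the glimpse output above) in place of youthobesity; the object names toy, quantitative, and qualitative are hypothetical.

```r
# Toy data frame standing in for youthobesity (values copied from the glimpse output).
toy <- data.frame(
  YearStart    = c(2019, 2019, 2011),
  LocationAbbr = c("AK", "AZ", "RI"),
  Data_Value   = c(9.6, 13.3, 11.5),
  Sample_Size  = c(130, 1739, 1199),
  stringsAsFactors = FALSE
)

# TRUE for columns stored as numbers, FALSE for character columns
is_quantitative <- sapply(toy, is.numeric)

quantitative <- names(toy)[is_quantitative]
qualitative  <- names(toy)[!is_quantitative]

print(quantitative)
print(qualitative)
```

A column stored as a number is not always a true quantity: LocationID, for example, is a numeric state code that works as a label.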
Fourth chunk. This chunk creates obesity from the youthobesity dataset. It filters on “Class” to keep only the rows about students who have obesity or are overweight.
obesity <- youthobesity %>%
  filter(Class == "Obesity / Weight Status")
Fifth chunk. This chunk creates obesity2 from obesity. The goal of this chunk is to keep only the rows whose end year is 2019.
obesity2 <- obesity %>%
  # YearEnd is a numeric column, so it is compared with a number
  filter(YearEnd == 2019)
Sixth chunk. This chunk creates obesity3 from obesity2. The purpose of this chunk is to keep only the rows with a sample size greater than 1000.
obesity3 <- obesity2 %>%
  filter(Sample_Size > 1000)
Seventh chunk. This chunk creates obesity4 from obesity3. The goal of this chunk is to keep only the state of Maryland.
obesity4 <- obesity3 %>%
  filter(LocationDesc == "Maryland")
Eighth chunk. This chunk creates obesity5 from obesity4. The goal is to keep only the 9th, 10th, 11th, and 12th grades.
obesity5 <- obesity4 %>%
  filter(Grade %in% c("9th", "10th", "11th", "12th"))
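The five filtering steps above could also be written as one filter() call, since conditions separated by commas are combined with AND. The sketch below shows this on a small made-up tibble; the rows in toy and the name maryland_grades are hypothetical and are not taken from the real file.

```r
library(dplyr)

# Hypothetical rows that mimic the columns used by the filters above.
toy <- tibble(
  Class        = c("Obesity / Weight Status", "Obesity / Weight Status",
                   "Physical Activity", "Obesity / Weight Status"),
  YearEnd      = c(2019, 2019, 2019, 2017),
  Sample_Size  = c(1500, 800, 2000, 1800),
  LocationDesc = c("Maryland", "Maryland", "Maryland", "Maryland"),
  Grade        = c("9th", "10th", "11th", "12th")
)

# All conditions in one call; a row is kept only if every condition is TRUE.
maryland_grades <- toy %>%
  filter(
    Class == "Obesity / Weight Status",
    YearEnd == 2019,
    Sample_Size > 1000,
    LocationDesc == "Maryland",
    Grade %in% c("9th", "10th", "11th", "12th")
  )

print(maryland_grades)
```

Keeping the intermediate objects, as the chunks above do, is still useful when a plot needs one of the earlier steps, as p1 does with obesity3.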
Extra packages needed in order to create the highcharter plots.
library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.2.3
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
The first plot, named p1, was created from obesity3, which keeps all 31 variables shown in the glimpse above. The data were filtered to the class “Obesity / Weight Status”, the end year 2019, and sample sizes greater than 1000; they include every grade (plus NA) and all the states.
p1 <- obesity3 %>%
  # establish plot and aesthetics
  hchart('scatter', hcaes(x = Grade, y = Sample_Size, group = LocationDesc)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "Grade")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "at the Schools in USA") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p1
The second plot, named p2, reads from obesity5. This data has 8 observations and the 31 variables. After filtering, the main variables used in this plot are the state of Maryland, the class Obesity / Weight Status, and the High Confidence Limit.
p2 <- obesity5 %>%
  # establish plot and aesthetics
  hchart('scatter', hcaes(x = High_Confidence_Limit, y = Sample_Size, group = Grade)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "High Confidence Limit")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "in the Schools in Maryland") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p2
Question: What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.
The visualization p2 represents the sample sizes collected in the 9th, 10th, 11th, and 12th grades plus the high confidence limit of each grade related to obesity in the state of Maryland. In the visualization, the 11th grade has the highest sample sizes and a high confidence limit above ten. My goal was to see obesity and overweight in Maryland, so I wonder whether this visualization shows that the rate of obesity is higher in the 11th grade than in the other grades. What I wished for and could not get to work was using this data to create a Maryland map with GIS. I would also like to try to make a USA map with some states besides Maryland to show their level of obesity or activity for each grade.
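A possible first step toward the map mentioned above is to turn the GeoLocation text into numeric latitude and longitude columns. The sketch below assumes every value has the "(latitude, longitude)" form seen in the glimpse output; the helper name parse_geolocation is hypothetical, and the example string is the Alaska value from that output.

```r
# Hypothetical helper: split a "(lat, lon)" string into numeric columns.
# Assumption: values look like "(64.845079957001, -147.722059036)" or are NA.
parse_geolocation <- function(x) {
  cleaned <- gsub("[()]", "", x)                  # drop the parentheses
  parts <- strsplit(cleaned, ",", fixed = TRUE)   # split latitude from longitude
  data.frame(
    lat = as.numeric(trimws(sapply(parts, `[`, 1))),
    lon = as.numeric(trimws(sapply(parts, `[`, 2)))
  )
}

coords <- parse_geolocation("(64.845079957001, -147.722059036)")
print(coords)
```

The resulting lat and lon columns could then be joined back to the filtered data and passed to a mapping package.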