Nutrition, Physical Activity, and Obesity - Youth Risk Behavior Surveillance System

CDC Presentation

This dataset includes data on adolescents’ diet, physical activity, and weight status from the Youth Risk Behavior Surveillance System (YRBSS). These data are used by the Division of Nutrition, Physical Activity, and Obesity (DNPAO) Data, Trends, and Maps database, which provides national and state-specific data on obesity, nutrition, and physical activity. The dataset was published by the Centers for Disease Control and Prevention (CDC).

I chose this dataset because obesity is an issue in my immediate and extended family. I believe that if young people are educated about the importance of healthy eating and exercise, it is possible to break the cycle of obesity. As a result, in the future we would have people who are more able to work and who have fewer health issues.

Overweight vs. Obesity

Health Issues

The variables in this dataset that were used:

Data source

The data were collected by the Youth Risk Behavior Surveillance System (YRBSS), which monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.

“YearStart” and “YearEnd”

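The years in which the survey period started and ended. In the summary and glimpse output below, the two columns have identical values, so each record appears to cover a single survey year.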

LocationDesc & LocationAbbr

The name of each U.S. state or jurisdiction (for example, the District of Columbia) and its two-letter abbreviation.

“Class” and “Topic” are divided into the following categories:

  • Physical Activity - whether students are physically active.
  • Obesity / Weight Status - whether students are obese or overweight.
  • Sugar Drinks - students who drink sodas, juice, etc.
  • Fruits and Vegetables - students who eat fruits and vegetables.
  • Television Viewing - students who spend time watching television.

Question

The survey question asked of students in grades 9-12.

Sample_Size

The number of participants (observations) behind each estimate.

Low_Confidence_Limit

The lower limit of the confidence interval around Data_Value, the estimated percentage of students. It is computed from the data, and the true percentage is expected to lie above it with a given degree of confidence.

High_Confidence_Limit

The upper limit of the same confidence interval. The true percentage is expected to lie between Low_Confidence_Limit and High_Confidence_Limit with the stated level of confidence. A narrower interval (a smaller margin of error) means a more precise estimate, and larger samples generally produce narrower intervals.
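To make the two limits concrete, here is a small R sketch that uses the values from the second row of the glimpse() output (Arizona, 2019). It computes the width of the published interval and its half-width (the margin of error) and, for comparison only, a simple Wald 95% interval for a proportion. The Wald formula and the 1.96 multiplier are assumptions of this sketch, not the method CDC used: they assume a simple random sample, while YRBSS estimates are weighted and come from a clustered sample, which is a likely reason the published interval is wider.

```r
# Values copied from the second row of the glimpse() output (Arizona, 2019)
data_value <- 13.3   # Data_Value, in percent
low_cl     <- 10.7   # Low_Confidence_Limit
high_cl    <- 16.5   # High_Confidence_Limit
n          <- 1739   # Sample_Size

# Width of the published interval and its half-width (margin of error),
# both in percentage points
ci_width        <- high_cl - low_cl
margin_of_error <- ci_width / 2

# Simple Wald 95% interval for a proportion, ASSUMING a simple random sample.
# This is only a comparison, not how YRBSS computes its limits.
p         <- data_value / 100
se        <- sqrt(p * (1 - p) / n)
wald_low  <- 100 * (p - 1.96 * se)
wald_high <- 100 * (p + 1.96 * se)

cat(sprintf("Published interval: %.1f to %.1f (margin about %.1f points)\n",
            low_cl, high_cl, margin_of_error))
cat(sprintf("Simple Wald interval: %.1f to %.1f\n", wald_low, wald_high))
```

With these inputs the Wald interval is roughly 11.7 to 14.9, noticeably narrower than the published 10.7 to 16.5, so the published limits should be used as given rather than recomputed from Sample_Size.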

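GeoLocation & LocationID

GeoLocation gives the latitude and longitude of a point in each location, stored as text such as “(64.845079957001, -147.722059036)” for Alaska. LocationID is the numeric state code; it matches the state FIPS code (for example, 2 for Alaska and 24 for Maryland in the glimpse output).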
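Because GeoLocation is stored as text, it has to be split into numbers before it can be placed on a map. The sketch below shows one way to do that in base R. parse_geolocation() is a hypothetical helper written for this example, and it assumes every non-missing value has the “(latitude, longitude)” form seen in the glimpse() output.

```r
# Hypothetical helper (not part of the dataset or any package):
# turns "(lat, long)" strings into a data frame of numeric coordinates
parse_geolocation <- function(x) {
  # Remove the surrounding parentheses, then split on the comma
  stripped <- gsub("[()]", "", x)
  parts <- strsplit(stripped, ",\\s*")
  data.frame(
    latitude  = as.numeric(vapply(parts, `[`, character(1), 1)),
    longitude = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

# The Alaska value from the glimpse() output, plus a missing value
coords <- parse_geolocation(c("(64.845079957001, -147.722059036)", NA))
print(coords)
```

A missing GeoLocation simply becomes NA in both columns. Applied to youthobesity$GeoLocation, the same helper would give the numeric coordinates needed for the kind of state map described at the end of this report.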

Loading Tidyverse and Creating File Path

The tidyverse packages were used to read and filter the data (the plots themselves are made with highcharter), and setwd() was used to set the working directory so that R can find the file.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")

Reading the CSV File

In this first chunk, the data frame youthobesity is created by reading the CSV file “Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv”. The file has 40,096 observations and 31 variables and covers nearly all U.S. states.

setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")
youthobesity <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv")
## Rows: 40096 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl  (8): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## lgl  (1): Data_Value_Unit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

File Summary

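Second chunk: summary() of the data frame. It shows that the survey years range from 2001 to 2019, with a median of 2011. It also shows that Data_Value, both confidence limits, and Sample_Size are missing in 10,636 of the 40,096 rows (about 27%).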

summary(youthobesity)
##    YearStart       YearEnd     LocationAbbr       LocationDesc      
##  Min.   :2001   Min.   :2001   Length:40096       Length:40096      
##  1st Qu.:2007   1st Qu.:2007   Class :character   Class :character  
##  Median :2011   Median :2011   Mode  :character   Mode  :character  
##  Mean   :2011   Mean   :2011                                        
##  3rd Qu.:2015   3rd Qu.:2015                                        
##  Max.   :2019   Max.   :2019                                        
##                                                                     
##   Datasource           Class              Topic             Question        
##  Length:40096       Length:40096       Length:40096       Length:40096      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Data_Value_Unit Data_Value_Type      Data_Value    Data_Value_Alt 
##  Mode:logical    Length:40096       Min.   : 0.00   Min.   : 0.00  
##  NA's:40096      Class :character   1st Qu.:15.60   1st Qu.:15.60  
##                  Mode  :character   Median :23.70   Median :23.70  
##                                     Mean   :26.31   Mean   :26.31  
##                                     3rd Qu.:36.60   3rd Qu.:36.60  
##                                     Max.   :81.90   Max.   :81.90  
##                                     NA's   :10636   NA's   :10636  
##  Data_Value_Footnote_Symbol Data_Value_Footnote Low_Confidence_Limit
##  Length:40096               Length:40096        Min.   : 0.00       
##  Class :character           Class :character    1st Qu.:12.20       
##  Mode  :character           Mode  :character    Median :19.00       
##                                                 Mean   :21.85       
##                                                 3rd Qu.:31.20       
##                                                 Max.   :75.80       
##                                                 NA's   :10636       
##  High_Confidence_Limit  Sample_Size         Total              Gender         
##  Min.   : 0.00         Min.   :  100.0   Length:40096       Length:40096      
##  1st Qu.:19.80         1st Qu.:  371.8   Class :character   Class :character  
##  Median :29.40         Median :  710.0   Mode  :character   Mode  :character  
##  Mean   :31.53         Mean   : 1333.2                                        
##  3rd Qu.:42.20         3rd Qu.: 1394.0                                        
##  Max.   :89.60         Max.   :53042.0                                        
##  NA's   :10636         NA's   :10636                                          
##     Grade           Race/Ethnicity     GeoLocation          ClassID         
##  Length:40096       Length:40096       Length:40096       Length:40096      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    TopicID           QuestionID        DataValueTypeID      LocationID   
##  Length:40096       Length:40096       Length:40096       Min.   : 1.00  
##  Class :character   Class :character   Class :character   1st Qu.:16.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :30.00  
##                                                           Mean   :30.22  
##                                                           3rd Qu.:45.00  
##                                                           Max.   :78.00  
##                                                                          
##  StratificationCategory1 Stratification1    StratificationCategoryId1
##  Length:40096            Length:40096       Length:40096             
##  Class :character        Class :character   Class :character         
##  Mode  :character        Mode  :character   Mode  :character         
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##  StratificationID1 
##  Length:40096      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

Glimpse

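Third chunk: glimpse() shows how many rows and columns the dataset has and which columns are quantitative (<dbl>) or qualitative (<chr>). As the column specification printed by read_csv() also shows, 22 columns are character, 8 are numeric (double), and 1 is logical.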

glimpse(youthobesity)
## Rows: 40,096
## Columns: 31
## $ YearStart                  <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ YearEnd                    <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ LocationAbbr               <chr> "AK", "AZ", "DC", "IL", "MD", "RI", "MT", "…
## $ LocationDesc               <chr> "Alaska", "Arizona", "District of Columbia"…
## $ Datasource                 <chr> "Youth Risk Behavior Surveillance System", …
## $ Class                      <chr> "Physical Activity", "Obesity / Weight Stat…
## $ Topic                      <chr> "Physical Activity - Behavior", "Obesity / …
## $ Question                   <chr> "Percent of students in grades 9-12 who ach…
## $ Data_Value_Unit            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Type            <chr> "Value", "Value", "Value", "Value", "Value"…
## $ Data_Value                 <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Alt             <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Footnote_Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Footnote        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Low_Confidence_Limit       <dbl> 5.5, 10.7, 16.2, 49.8, 14.5, 8.9, 10.0, 7.7…
## $ High_Confidence_Limit      <dbl> 16.5, 16.5, 17.9, 79.1, 19.2, 14.8, 14.9, 1…
## $ Sample_Size                <dbl> 130, 1739, 8978, 876, 2573, 1199, 1045, 649…
## $ Total                      <chr> NA, "Total", "Total", NA, NA, NA, NA, NA, N…
## $ Gender                     <chr> NA, NA, NA, NA, NA, NA, NA, "Female", NA, N…
## $ Grade                      <chr> NA, NA, NA, "10th", NA, "9th", "10th", NA, …
## $ `Race/Ethnicity`           <chr> "Asian", NA, NA, NA, "2 or more races", NA,…
## $ GeoLocation                <chr> "(64.845079957001, -147.722059036)", "(34.8…
## $ ClassID                    <chr> "PA", "OWS", "OWS", "PA", "OWS", "OWS", "OW…
## $ TopicID                    <chr> "PA1", "OWS1", "OWS1", "PA1", "OWS1", "OWS1…
## $ QuestionID                 <chr> "Q048", "Q038", "Q038", "Q049", "Q039", "Q0…
## $ DataValueTypeID            <chr> "VALUE", "VALUE", "VALUE", "VALUE", "VALUE"…
## $ LocationID                 <dbl> 2, 4, 11, 17, 24, 44, 30, 31, 35, 36, 37, 3…
## $ StratificationCategory1    <chr> "Race/Ethnicity", "Total", "Total", "Grade"…
## $ Stratification1            <chr> "Asian", "Total", "Total", "10th", "2 or mo…
## $ StratificationCategoryId1  <chr> "RACE", "OVR", "OVR", "GRADE", "RACE", "GRA…
## $ StratificationID1          <chr> "RACEASN", "OVERALL", "OVERALL", "GRADE10",…

Filtering the File by Class

Fourth chunk: obesity is created from youthobesity by filtering Class to “Obesity / Weight Status”, which keeps only the rows about students who are obese or overweight.

obesity <- youthobesity %>%
  filter(Class == "Obesity / Weight Status")

Filtering the File by Year End

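Fifth chunk: obesity2 is created from obesity by keeping only the rows whose end year (YearEnd) is 2019. Because YearEnd is numeric (<dbl>), it is compared with the number 2019; comparing it with the string "2019" also works, but only because R silently converts the number to text.

obesity2 <- obesity %>%
  filter(YearEnd == 2019)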

Filtering the File by Sample Size

Sixth chunk: obesity3 is created from obesity2 by keeping only the rows with a sample size greater than 1000.

obesity3 <- obesity2 %>%
  filter(Sample_Size > 1000)

Filtering the File by Location

Seventh chunk: obesity4 is created from obesity3 by keeping only the state of Maryland.

obesity4 <- obesity3 %>%
  filter(LocationDesc == "Maryland")

Filtering the File by Grades

Eighth chunk: obesity5 is created from obesity4 by keeping only grades 9th, 10th, 11th, and 12th, which also drops the rows where Grade is NA.

obesity5 <- obesity4 %>%
  filter(Grade %in% c("9th", "10th", "11th", "12th"))
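The five filtering steps above can also be written as a single filter() call, because comma-separated conditions in dplyr::filter() are combined with a logical AND. The sketch below runs that combined filter on a few made-up rows; the toy data frame and its values are hypothetical and only mimic the real column names, so the result can be checked by eye.

```r
library(dplyr)

# Hypothetical toy rows with the same column names as youthobesity;
# all values are made up for illustration
toy <- data.frame(
  Class        = c("Obesity / Weight Status", "Obesity / Weight Status",
                   "Physical Activity", "Obesity / Weight Status",
                   "Obesity / Weight Status"),
  YearEnd      = c(2019, 2017, 2019, 2019, 2019),
  Sample_Size  = c(1500, 2000, 1800, 900, 1200),
  LocationDesc = c("Maryland", "Maryland", "Maryland", "Maryland", "Alaska"),
  Grade        = c("11th", "9th", "10th", "12th", NA)
)

# One filter() call: every condition must be TRUE for a row to be kept,
# which is the same as chaining the five separate filter() steps
maryland_2019 <- toy %>%
  filter(Class == "Obesity / Weight Status",
         YearEnd == 2019,          # numeric column, compared with a number
         Sample_Size > 1000,
         LocationDesc == "Maryland",
         Grade %in% c("9th", "10th", "11th", "12th"))

print(maryland_2019)
```

Only the first toy row passes every condition. With the real data, replacing toy with youthobesity should return the same rows as obesity5, while skipping the intermediate objects.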

Loading Devtools and Highcharter

Extra packages loaded to create the Highcharts plots.

library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.2.3
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use

First Plot

The first plot, p1, is built from obesity3, which keeps all 31 variables shown in the glimpse output above. Its rows were filtered to the class “Obesity / Weight Status”, an end year of 2019, and a sample size greater than 1000. It includes all states and all grades, plus NA for rows that are stratified by something other than grade (such as gender or race/ethnicity).

p1 <- obesity3 %>%
  # establish plot type and aesthetics
  hchart("scatter", hcaes(x = Grade, y = Sample_Size, group = LocationDesc)) %>%
  # establish color axis
  hc_colorAxis() %>%
  # establish font for the chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "Grade")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "at Schools in the USA") %>%
  # establish theme and a shared tooltip
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p1

Second Plot

The second plot, p2, uses obesity5, which has 8 observations and 31 variables. These rows are limited to the state of Maryland and the class “Obesity / Weight Status”, and the plot compares each grade's High_Confidence_Limit (x-axis) with its Sample_Size (y-axis).

p2 <- obesity5 %>%
  # establish plot type and aesthetics
  hchart("scatter", hcaes(x = High_Confidence_Limit, y = Sample_Size, group = Grade)) %>%
  # establish color axis
  hc_colorAxis() %>%
  # establish font for the chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "High Confidence Limit")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "at Schools in Maryland") %>%
  # establish theme and a shared tooltip
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p2

Question: What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.

Visualization p2 shows the sample sizes collected in grades 9, 10, 11, and 12, together with the high confidence limit of the obesity estimate for each grade in Maryland. In the plot, 11th grade has the largest sample sizes and a high confidence limit above ten. My goal was to look at obesity and overweight in Maryland, so I wondered whether this visualization shows that the obesity rate is higher in 11th grade than in the other grades. However, the sample size only reflects how many students answered, and the high confidence limit is the upper bound of the estimate rather than the estimate itself, so comparing Data_Value (with its confidence limits) across grades would answer that question more directly. What I wished to do but could not get to work was to use this data to create a Maryland map with GIS. I would also like to make a U.S. map with some states besides Maryland to show their level of obesity or physical activity for each grade.
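One way to check whether one grade really stands out is to rank the grades by Data_Value and see whether the top grade's confidence interval overlaps the others. The sketch below does this in base R with made-up numbers (they are NOT the Maryland results). If obesity5 holds more than one question per grade (for example, separate obesity and overweight questions), it would need to be filtered to a single QuestionID first so each grade appears once. Interval overlap is only a rough, conservative check, not a formal significance test.

```r
# Made-up values for illustration only; they are NOT the Maryland results.
# With the real data, use the matching columns of obesity5 instead.
grades <- data.frame(
  Grade                 = c("9th", "10th", "11th", "12th"),
  Data_Value            = c(12.0, 13.5, 15.0, 14.0),
  Low_Confidence_Limit  = c(10.0, 11.5, 12.8, 11.9),
  High_Confidence_Limit = c(14.0, 15.5, 17.2, 16.1)
)

# Rank grades from the highest to the lowest estimated percentage
ranked <- grades[order(grades$Data_Value, decreasing = TRUE), ]
top    <- ranked[1, ]
others <- ranked[-1, ]

# Two intervals overlap when each one starts before the other one ends
overlaps <- top$Low_Confidence_Limit <= others$High_Confidence_Limit &
  others$Low_Confidence_Limit <= top$High_Confidence_Limit

result <- data.frame(compared_with = others$Grade, overlaps_top = overlaps)
cat("Highest estimate:", top$Grade, "\n")
print(result)
```

In this toy example 11th grade has the highest estimate, but its interval overlaps every other grade's interval, so the data alone would not show a clear difference. Non-overlapping intervals would be stronger evidence that one grade is different.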