Nutrition, Physical Activity, and Obesity - Youth Risk Behavior Surveillance System

Presentation

This dataset includes data on adolescents’ diet, physical activity, and weight status from the Youth Risk Behavior Surveillance System (YRBSS). The data are used for the Data, Trends, and Maps database of the Division of Nutrition, Physical Activity, and Obesity (DNPAO), which provides national and state-specific data on obesity, nutrition, and physical activity. It was published by the Centers for Disease Control and Prevention (CDC).

I chose this dataset because obesity is an issue in my immediate and extended family. I believe that if young people are taught the importance of healthy eating and exercise, it is possible to break the cycle of obesity. As a result, people in the future will have more energy for work and fewer health issues.

[Image: Overweight and Obesity]

[Image: Health Issues]

The variables in this dataset that will be used are:

Data source

The data were collected by the Youth Risk Behavior Surveillance System (YRBSS), which monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.

“YearStart” and “YearEnd”

The period during which the survey was conducted.

LocationDesc

The name of the state or other location (for example, the District of Columbia); its abbreviation is stored in “LocationAbbr”.

“Class” and “Topic” are divided into:

  • Physical Activity
  • Obesity / Weight Status
  • Sugar Drinks
  • Fruits and Vegetables
  • Television Viewing

Question

The survey measure, stated as the percent of students in grades 9-12 who achieve 1 hour of physical activity, have obesity, eat vegetables, drink sugary drinks, or watch TV.

Sample_Size

The number of participants or observations behind each reported value.

Low_Confidence_Limit

The lower bound of the confidence interval for Data_Value: a number, determined by the data, that the true value of the parameter is expected to exceed with a given degree of confidence.

High_Confidence_Limit

The upper bound of the confidence interval for Data_Value. Together with Low_Confidence_Limit, it marks the range that is expected to contain the true value of the parameter with a given degree of confidence. A narrower range means a smaller margin of error, so the estimate is closer to the true value of the parameter.
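
As a small worked example, the first row shown in the glimpse() output further down has Data_Value 9.6, Low_Confidence_Limit 5.5, and High_Confidence_Limit 16.5. The sketch below computes the width of that interval. The variable names are my own, and the confidence level behind the limits is not stated in the file.

```r
# Values copied from the first row of the glimpse() output
data_value <- 9.6
low_limit  <- 5.5
high_limit <- 16.5

# Width of the confidence interval, in percentage points
interval_width <- high_limit - low_limit

# The reported estimate should fall between the two limits
inside <- data_value >= low_limit & data_value <= high_limit

interval_width
inside
```

For this row the interval is 11 percentage points wide, which is a wide range; the same row has a Sample_Size of only 130.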

GeoLocation & LocationID

GeoLocation stores the latitude and longitude of each location as text, for example "(64.845079957001, -147.722059036)". LocationID is a numeric code that identifies the location.
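
Because GeoLocation is read as text, it has to be split into numeric columns before it can be used for a map. A minimal sketch, assuming the "(latitude, longitude)" layout seen in the glimpse() output; the toy table and the column names lat and lon are hypothetical.

```r
library(tidyverse)

# Toy table with the first GeoLocation value shown in the glimpse() output
locations <- tibble(
  LocationAbbr = "AK",
  GeoLocation  = "(64.845079957001, -147.722059036)"
)

# Split "(lat, lon)" into two numeric columns.
# convert = TRUE turns the captured text into numbers.
coords <- tidyr::extract(
  locations,
  GeoLocation,
  into    = c("lat", "lon"),
  regex   = "\\(\\s*([-0-9.]+)\\s*,\\s*([-0-9.]+)\\s*\\)",
  convert = TRUE
)

coords
```

With the real data, youthobesity would take the place of locations.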

Loading Tidyverse and Setting the Working Directory

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")

Reading the CSV File

youthobesity <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv")
## Rows: 40096 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl  (8): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## lgl  (1): Data_Value_Unit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

File Summary

summary(youthobesity)
##    YearStart       YearEnd     LocationAbbr       LocationDesc      
##  Min.   :2001   Min.   :2001   Length:40096       Length:40096      
##  1st Qu.:2007   1st Qu.:2007   Class :character   Class :character  
##  Median :2011   Median :2011   Mode  :character   Mode  :character  
##  Mean   :2011   Mean   :2011                                        
##  3rd Qu.:2015   3rd Qu.:2015                                        
##  Max.   :2019   Max.   :2019                                        
##                                                                     
##   Datasource           Class              Topic             Question        
##  Length:40096       Length:40096       Length:40096       Length:40096      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Data_Value_Unit Data_Value_Type      Data_Value    Data_Value_Alt 
##  Mode:logical    Length:40096       Min.   : 0.00   Min.   : 0.00  
##  NA's:40096      Class :character   1st Qu.:15.60   1st Qu.:15.60  
##                  Mode  :character   Median :23.70   Median :23.70  
##                                     Mean   :26.31   Mean   :26.31  
##                                     3rd Qu.:36.60   3rd Qu.:36.60  
##                                     Max.   :81.90   Max.   :81.90  
##                                     NA's   :10636   NA's   :10636  
##  Data_Value_Footnote_Symbol Data_Value_Footnote Low_Confidence_Limit
##  Length:40096               Length:40096        Min.   : 0.00       
##  Class :character           Class :character    1st Qu.:12.20       
##  Mode  :character           Mode  :character    Median :19.00       
##                                                 Mean   :21.85       
##                                                 3rd Qu.:31.20       
##                                                 Max.   :75.80       
##                                                 NA's   :10636       
##  High_Confidence_Limit  Sample_Size         Total              Gender         
##  Min.   : 0.00         Min.   :  100.0   Length:40096       Length:40096      
##  1st Qu.:19.80         1st Qu.:  371.8   Class :character   Class :character  
##  Median :29.40         Median :  710.0   Mode  :character   Mode  :character  
##  Mean   :31.53         Mean   : 1333.2                                        
##  3rd Qu.:42.20         3rd Qu.: 1394.0                                        
##  Max.   :89.60         Max.   :53042.0                                        
##  NA's   :10636         NA's   :10636                                          
##     Grade           Race/Ethnicity     GeoLocation          ClassID         
##  Length:40096       Length:40096       Length:40096       Length:40096      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    TopicID           QuestionID        DataValueTypeID      LocationID   
##  Length:40096       Length:40096       Length:40096       Min.   : 1.00  
##  Class :character   Class :character   Class :character   1st Qu.:16.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :30.00  
##                                                           Mean   :30.22  
##                                                           3rd Qu.:45.00  
##                                                           Max.   :78.00  
##                                                                          
##  StratificationCategory1 Stratification1    StratificationCategoryId1
##  Length:40096            Length:40096       Length:40096             
##  Class :character        Class :character   Class :character         
##  Mode  :character        Mode  :character   Mode  :character         
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##  StratificationID1 
##  Length:40096      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

Glimpse

Checking the columns and rows

glimpse(youthobesity)
## Rows: 40,096
## Columns: 31
## $ YearStart                  <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ YearEnd                    <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ LocationAbbr               <chr> "AK", "AZ", "DC", "IL", "MD", "RI", "MT", "…
## $ LocationDesc               <chr> "Alaska", "Arizona", "District of Columbia"…
## $ Datasource                 <chr> "Youth Risk Behavior Surveillance System", …
## $ Class                      <chr> "Physical Activity", "Obesity / Weight Stat…
## $ Topic                      <chr> "Physical Activity - Behavior", "Obesity / …
## $ Question                   <chr> "Percent of students in grades 9-12 who ach…
## $ Data_Value_Unit            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Type            <chr> "Value", "Value", "Value", "Value", "Value"…
## $ Data_Value                 <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Alt             <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Footnote_Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Footnote        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Low_Confidence_Limit       <dbl> 5.5, 10.7, 16.2, 49.8, 14.5, 8.9, 10.0, 7.7…
## $ High_Confidence_Limit      <dbl> 16.5, 16.5, 17.9, 79.1, 19.2, 14.8, 14.9, 1…
## $ Sample_Size                <dbl> 130, 1739, 8978, 876, 2573, 1199, 1045, 649…
## $ Total                      <chr> NA, "Total", "Total", NA, NA, NA, NA, NA, N…
## $ Gender                     <chr> NA, NA, NA, NA, NA, NA, NA, "Female", NA, N…
## $ Grade                      <chr> NA, NA, NA, "10th", NA, "9th", "10th", NA, …
## $ `Race/Ethnicity`           <chr> "Asian", NA, NA, NA, "2 or more races", NA,…
## $ GeoLocation                <chr> "(64.845079957001, -147.722059036)", "(34.8…
## $ ClassID                    <chr> "PA", "OWS", "OWS", "PA", "OWS", "OWS", "OW…
## $ TopicID                    <chr> "PA1", "OWS1", "OWS1", "PA1", "OWS1", "OWS1…
## $ QuestionID                 <chr> "Q048", "Q038", "Q038", "Q049", "Q039", "Q0…
## $ DataValueTypeID            <chr> "VALUE", "VALUE", "VALUE", "VALUE", "VALUE"…
## $ LocationID                 <dbl> 2, 4, 11, 17, 24, 44, 30, 31, 35, 36, 37, 3…
## $ StratificationCategory1    <chr> "Race/Ethnicity", "Total", "Total", "Grade"…
## $ Stratification1            <chr> "Asian", "Total", "Total", "10th", "2 or mo…
## $ StratificationCategoryId1  <chr> "RACE", "OVR", "OVR", "GRADE", "RACE", "GRA…
## $ StratificationID1          <chr> "RACEASN", "OVERALL", "OVERALL", "GRADE10",…

Filtering the File by Class

obesity <- youthobesity %>%
  filter(Class == "Obesity / Weight Status")

Filtering the File by Year End

obesity2 <- obesity %>%
  filter(YearEnd == 2019)

Filtering the File by Sample Size

obesity3 <- obesity2 %>%
  filter(Sample_Size > 1000)

Filtering the File by Location

obesity4 <- obesity3 %>%
  # keep Maryland only; the second plot and its discussion cover this state
  filter(LocationDesc == "Maryland")

Filtering the File by Grades

obesity5 <- obesity4 %>%
  filter(Grade %in% c("9th", "10th", "11th", "12th"))
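
The filtering steps above can also be written as a single filter() call, because conditions separated by commas are combined with AND. A minimal sketch on hypothetical toy rows: the toy table and its values are made up for illustration, and only Maryland is kept here.

```r
library(dplyr)

# Hypothetical toy rows that mimic the columns used by the filters
toy <- tibble(
  Class        = c("Obesity / Weight Status", "Obesity / Weight Status",
                   "Physical Activity", "Obesity / Weight Status",
                   "Obesity / Weight Status"),
  YearEnd      = c(2019, 2019, 2019, 2017, 2019),
  Sample_Size  = c(1500, 800, 2000, 1800, 1600),
  LocationDesc = c("Maryland", "Maryland", "Maryland", "Maryland", "Virginia"),
  Grade        = c("11th", "9th", "10th", "12th", "10th")
)

# All five conditions in one call; a row must pass every one of them
toy_filtered <- toy %>%
  filter(
    Class == "Obesity / Weight Status",
    YearEnd == 2019,
    Sample_Size > 1000,
    LocationDesc == "Maryland",
    Grade %in% c("9th", "10th", "11th", "12th")
  )

toy_filtered
```

Only the first toy row passes every condition. The separate objects are still useful when a later step, such as the first plot, needs one of the intermediate tables.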

Loading Devtools and Highcharter

library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.2.3
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

First Plot

p1 <- obesity3 %>%
  # establish plot and aesthetics
  hchart("scatter", hcaes(x = Grade, y = Sample_Size, group = LocationDesc)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "Grade")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "in Schools in the USA") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  # share one tooltip across series
  hc_tooltip(shared = TRUE)
p1

Second Plot

p2 <- obesity5 %>%
  # establish plot and aesthetics
  hchart("scatter", hcaes(x = High_Confidence_Limit, y = Sample_Size, group = Grade)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "High Confidence Limit")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "in Schools in Maryland") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  # share one tooltip across series
  hc_tooltip(shared = TRUE)
p2

What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.

Visualization p2 shows the sample sizes collected in the 9th, 10th, 11th, and 12th grades against the high confidence limit for each grade, for obesity in the state of Maryland. In the visualization, the 11th grade has the largest sample sizes and a high confidence limit above ten, which makes me wonder whether this indicates that the obesity rate is higher in the 11th grade. I could not get GIS to work. I would have liked to make a map of the USA with some states showing their level of obesity or physical activity for each grade.
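
Sample_Size counts respondents, so by itself it cannot show whether obesity is more common in one grade. Data_Value, the reported percentage, is the column that speaks to that question. A minimal sketch of such a comparison follows, using hypothetical toy numbers rather than values from the file. With the real data, obesity5 would take the place of toy, and since the class contains more than one question, adding Question to group_by() would keep them separate.

```r
library(dplyr)

# Hypothetical values for illustration only, not taken from the dataset
toy <- tibble(
  Grade      = c("9th", "9th", "10th", "10th", "11th", "11th", "12th", "12th"),
  Data_Value = c(10, 12, 14, 16, 11, 13, 12, 14)
)

# Average reported percentage per grade, highest first
by_grade <- toy %>%
  group_by(Grade) %>%
  summarise(mean_value = mean(Data_Value, na.rm = TRUE)) %>%
  arrange(desc(mean_value))

by_grade
```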