Presentation
This dataset contains data on adolescents' diet, physical activity, and weight status from the Youth Risk Behavior Surveillance System (YRBSS). The data are used in the Data, Trends, and Maps database of the Division of Nutrition, Physical Activity, and Obesity (DNPAO), which provides national and state-specific data on obesity, nutrition, and physical activity. It was published by the Centers for Disease Control and Prevention (CDC).
I chose this dataset because obesity is an issue in my immediate and extended family. I believe that if young people are taught the importance of healthy eating and exercise, it is possible to break the cycle of obesity. As a result, future generations will have more energy for work and fewer health issues.
Overweight and Obesity
Health Issues
Datasource: the data were collected by the Youth Risk Behavior Surveillance System (YRBSS), which monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.
YearStart and YearEnd: the period in which the survey was conducted.
LocationAbbr and LocationDesc: the states and their abbreviations.
Question: what was measured, such as the percent of students in grades 9-12 who achieve 1 hour of physical activity, have obesity, eat vegetables, drink sugary drinks, or watch TV.
Sample_Size: the number of participants or observations included in the study.
Low_Confidence_Limit: the lower bound of the confidence interval, a number computed from the data that lies below the true value of the parameter with a given degree of confidence.
High_Confidence_Limit: the upper bound of the confidence interval. A higher confidence level means that a higher percentage of all samples produce an interval containing the true value of the parameter, and a small margin of error means the estimate is likely closer to that true value.
GeoLocation: the latitude and longitude of the location. Geolocation is a general term that encompasses all techniques for identifying a location.
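To make the confidence limits more concrete, here is a minimal sketch of how a lower and upper limit can be computed for a percentage. It uses the normal-approximation (Wald) formula and assumes a simple random sample, which is a simplification; the function name `wald_ci` is hypothetical and this is not necessarily how the published limits were produced. The inputs (16.7% from a sample of 2573) are taken from one row of this dataset.

```r
# Normal-approximation (Wald) confidence interval for a percentage.
# Assumption: the data come from a simple random sample. Survey data with a
# more complex design will generally have wider published limits.
wald_ci <- function(percent, n, level = 0.95) {
  p <- percent / 100                      # convert percent to a proportion
  z <- qnorm(1 - (1 - level) / 2)         # critical value for the chosen level
  se <- sqrt(p * (1 - p) / n)             # standard error of the proportion
  c(low = 100 * (p - z * se), high = 100 * (p + z * se))
}

# Same estimate, three scenarios
ci_95    <- wald_ci(16.7, 2573)                # 95% confidence
ci_99    <- wald_ci(16.7, 2573, level = 0.99)  # higher confidence, wider interval
ci_small <- wald_ci(16.7, 257)                 # smaller sample, wider interval

print(round(rbind(ci_95, ci_99, ci_small), 1))
```

Raising the confidence level or shrinking the sample both widen the interval. The dataset reports 14.5 to 19.2 for this row, which is wider than the simple formula gives, presumably because the survey design is more complex than a simple random sample.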
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/aline/Downloads/New folder Project 2 NEW")
youthobesity <- read_csv("Nutrition__Physical_Activity__and_Obesity_-_Youth_Risk_Behavior_Surveillance_System.csv")
## Rows: 40096 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Da...
## dbl (8): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## lgl (1): Data_Value_Unit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(youthobesity)
## YearStart YearEnd LocationAbbr LocationDesc
## Min. :2001 Min. :2001 Length:40096 Length:40096
## 1st Qu.:2007 1st Qu.:2007 Class :character Class :character
## Median :2011 Median :2011 Mode :character Mode :character
## Mean :2011 Mean :2011
## 3rd Qu.:2015 3rd Qu.:2015
## Max. :2019 Max. :2019
##
## Datasource Class Topic Question
## Length:40096 Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Data_Value_Unit Data_Value_Type Data_Value Data_Value_Alt
## Mode:logical Length:40096 Min. : 0.00 Min. : 0.00
## NA's:40096 Class :character 1st Qu.:15.60 1st Qu.:15.60
## Mode :character Median :23.70 Median :23.70
## Mean :26.31 Mean :26.31
## 3rd Qu.:36.60 3rd Qu.:36.60
## Max. :81.90 Max. :81.90
## NA's :10636 NA's :10636
## Data_Value_Footnote_Symbol Data_Value_Footnote Low_Confidence_Limit
## Length:40096 Length:40096 Min. : 0.00
## Class :character Class :character 1st Qu.:12.20
## Mode :character Mode :character Median :19.00
## Mean :21.85
## 3rd Qu.:31.20
## Max. :75.80
## NA's :10636
## High_Confidence_Limit Sample_Size Total Gender
## Min. : 0.00 Min. : 100.0 Length:40096 Length:40096
## 1st Qu.:19.80 1st Qu.: 371.8 Class :character Class :character
## Median :29.40 Median : 710.0 Mode :character Mode :character
## Mean :31.53 Mean : 1333.2
## 3rd Qu.:42.20 3rd Qu.: 1394.0
## Max. :89.60 Max. :53042.0
## NA's :10636 NA's :10636
## Grade Race/Ethnicity GeoLocation ClassID
## Length:40096 Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## TopicID QuestionID DataValueTypeID LocationID
## Length:40096 Length:40096 Length:40096 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.:16.00
## Mode :character Mode :character Mode :character Median :30.00
## Mean :30.22
## 3rd Qu.:45.00
## Max. :78.00
##
## StratificationCategory1 Stratification1 StratificationCategoryId1
## Length:40096 Length:40096 Length:40096
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## StratificationID1
## Length:40096
## Class :character
## Mode :character
##
##
##
##
Checking the columns and rows
glimpse(youthobesity)
## Rows: 40,096
## Columns: 31
## $ YearStart <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ YearEnd <dbl> 2019, 2019, 2019, 2019, 2019, 2011, 2019, 2…
## $ LocationAbbr <chr> "AK", "AZ", "DC", "IL", "MD", "RI", "MT", "…
## $ LocationDesc <chr> "Alaska", "Arizona", "District of Columbia"…
## $ Datasource <chr> "Youth Risk Behavior Surveillance System", …
## $ Class <chr> "Physical Activity", "Obesity / Weight Stat…
## $ Topic <chr> "Physical Activity - Behavior", "Obesity / …
## $ Question <chr> "Percent of students in grades 9-12 who ach…
## $ Data_Value_Unit <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Type <chr> "Value", "Value", "Value", "Value", "Value"…
## $ Data_Value <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Alt <dbl> 9.6, 13.3, 17.1, 65.9, 16.7, 11.5, 12.2, 10…
## $ Data_Value_Footnote_Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Footnote <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Low_Confidence_Limit <dbl> 5.5, 10.7, 16.2, 49.8, 14.5, 8.9, 10.0, 7.7…
## $ High_Confidence_Limit <dbl> 16.5, 16.5, 17.9, 79.1, 19.2, 14.8, 14.9, 1…
## $ Sample_Size <dbl> 130, 1739, 8978, 876, 2573, 1199, 1045, 649…
## $ Total <chr> NA, "Total", "Total", NA, NA, NA, NA, NA, N…
## $ Gender <chr> NA, NA, NA, NA, NA, NA, NA, "Female", NA, N…
## $ Grade <chr> NA, NA, NA, "10th", NA, "9th", "10th", NA, …
## $ `Race/Ethnicity` <chr> "Asian", NA, NA, NA, "2 or more races", NA,…
## $ GeoLocation <chr> "(64.845079957001, -147.722059036)", "(34.8…
## $ ClassID <chr> "PA", "OWS", "OWS", "PA", "OWS", "OWS", "OW…
## $ TopicID <chr> "PA1", "OWS1", "OWS1", "PA1", "OWS1", "OWS1…
## $ QuestionID <chr> "Q048", "Q038", "Q038", "Q049", "Q039", "Q0…
## $ DataValueTypeID <chr> "VALUE", "VALUE", "VALUE", "VALUE", "VALUE"…
## $ LocationID <dbl> 2, 4, 11, 17, 24, 44, 30, 31, 35, 36, 37, 3…
## $ StratificationCategory1 <chr> "Race/Ethnicity", "Total", "Total", "Grade"…
## $ Stratification1 <chr> "Asian", "Total", "Total", "10th", "2 or mo…
## $ StratificationCategoryId1 <chr> "RACE", "OVR", "OVR", "GRADE", "RACE", "GRA…
## $ StratificationID1 <chr> "RACEASN", "OVERALL", "OVERALL", "GRADE10",…
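# keep only the obesity / weight status records
obesity <- youthobesity %>%
  filter(Class == "Obesity / Weight Status")
# keep the 2019 survey (YearEnd is numeric)
obesity2 <- obesity %>%
  filter(YearEnd == 2019)
# keep estimates based on more than 1000 students
obesity3 <- obesity2 %>%
  filter(Sample_Size > 1000)
# keep Maryland and the District of Columbia
obesity4 <- obesity3 %>%
  filter(LocationDesc %in% c("Maryland", "District of Columbia"))
# keep the rows that are broken down by grade
obesity5 <- obesity4 %>%
  filter(Grade %in% c("9th", "10th", "11th", "12th"))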
library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.2.3
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
p1 <- obesity3 %>%
  # establish plot and aesthetics
  hchart("scatter", hcaes(x = Grade, y = Sample_Size, group = LocationDesc)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "Grade")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "in Schools in the USA") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p1
p2 <- obesity5 %>%
  # establish plot and aesthetics
  hchart("scatter", hcaes(x = High_Confidence_Limit, y = Sample_Size, group = Grade)) %>%
  # establish color
  hc_colorAxis() %>%
  # establish font for chart
  hc_chart(style = list(fontFamily = "NewCenturySchoolbook",
                        fontWeight = "bold")) %>%
  # establish titles on the x and y axes
  hc_xAxis(title = list(text = "High Confidence Limit")) %>%
  hc_yAxis(title = list(text = "Sample Size")) %>%
  # establish title and subtitle
  hc_title(text = "Obesity / Weight Status") %>%
  hc_subtitle(text = "in Schools in Maryland") %>%
  # establish theme
  hc_add_theme(hc_theme_gridlight()) %>%
  hc_tooltip(shared = TRUE)
p2
What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.
The visualization p2 shows the sample sizes collected in grades 9, 10, 11, and 12, plotted against the high confidence limit for each grade, for obesity in the state of Maryland. In the visualization, 11th grade has the larger sample sizes and its high confidence limit is above ten, so I wonder whether this indicates that the obesity rate is higher in 11th grade. I could not get GIS to work. I would have liked to make a map of the USA with some states showing their level of obesity or physical activity for each grade.
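One possible route to the map described above is the `hcmap()` function from highcharter, which is already loaded in this project. The following is only a sketch under stated assumptions: the helper names `state_values` and `state_map` are hypothetical, the question ID has to be looked up in the `Question` column, and `hcmap()` needs an internet connection because it downloads the map geometry.

```r
library(dplyr)

# Keep one overall ("Total") value per state for a single question and year.
# Assumption: `question_id` is supplied by the user after checking which
# QuestionID matches the measure to be mapped.
state_values <- function(data, question_id, year) {
  data %>%
    filter(QuestionID == question_id,
           YearEnd == year,
           StratificationCategory1 == "Total",
           !is.na(Data_Value)) %>%
    select(LocationAbbr, LocationDesc, Data_Value)
}

# Draw a choropleth of the US states, joining the map's two-letter state code
# ("hc-a2") to the LocationAbbr column of the data.
state_map <- function(values, title) {
  highcharter::hcmap(
    "countries/us/us-all",
    data = values,
    value = "Data_Value",
    joinBy = c("hc-a2", "LocationAbbr"),
    name = title
  ) %>%
    highcharter::hc_title(text = title)
}
```

With the data loaded as `youthobesity`, a call could look like `state_values(youthobesity, "Q038", 2019) %>% state_map("Obesity, grades 9-12")`, assuming that `Q038` is the obesity question. Rows for locations that are not on the state map, such as the national total, simply have nothing to join to.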