This dataset1 contains data on nutrition status and food habits collected from 1265 students in rural and urban, private and government schools in Dehradun, Uttarakhand. The questionnaire used to collect the data covered an extensive range of questions, including the participant’s family type (nuclear, joint), their daily caloric intake (classified by breakfast, lunch and dinner), their physical activity, food habits, occupation of their parents, height, and weight.
knitr::opts_chunk$set(out.width='900px', dpi=200)
library(tidyverse)
library(knitr)
library(kableExtra)
library(DT)
library(dplyr)
library(viridis)
library(hrbrthemes)
library(ggThemeAssist)
library(ggridges)
library(forcats)
data = read.csv("https://raw.githubusercontent.com/thedivtagguy/blog/main/dataset.csv")
data$group_code <- gsub("UG", "Urban Government", data$group_code)
data$group_code <- gsub("RG", "Rural Government", data$group_code)
data$group_code <- gsub("RP", "Rural Private", data$group_code)
data$group_code <- gsub("UP", "Urban Private", data$group_code)
Let’s expand the table to see what is inside.
Variable_Explain <- c("Type of School (UG/UP/RP/RG)", "Class", "Gender", "Age", "Age Group (15-18, etc)", "Father's Education", "Father's Occupution", "Mother's Education", "Mother's Occupuation", "Vegetarian/Non-Vegetarian", "Do they have breakfast daily?", "Do they carry a school lunch?", "Do they have mid-day meal?", "Do they skip meals?", "How often do they eat out", "Do they eat in front of the TV?", "Calories for Breakfast", "Calories for Tiffin", "Calories for Lunch", "Calories for Evening", "Calories for Dinner", "Other Calories", "Calories for Snacks", "Total Calories", "Recommended Daily Allowance for calories", "Excess or Deficit of RDA", "Percentage of Excess or Deficit", "Frequency of having soft drinks", "Frequency of Having Junk Food", "Frequency of having sweets", "Frequency of having snacks", "Frequency of having noodles", "Frequency of having candies", "Fruit intake", "Distance to school", "How do they go to school?", "Does their school have a playground?", "Does their school give them sports materials?", "Does their school have rides?", "Do they play during recess?", "What do they do during recess?", "Do they play during the evening", "Do they watch TV?", "How long do they watch TV?", "Do they have a computer at home?", "Do they have a playground near their house?", "How do they see themselves?", "Height", "Weight", "BMI", "Z-score", "Nutrition category based on BMI")
library(DT)
cols <- colnames(data) %>% as.data.frame()
dataframe.variables <- data.frame(cols, Variable_Explain)
names(dataframe.variables)[names(dataframe.variables) == "."] <- "Variable Name"
names(dataframe.variables)[names(dataframe.variables) == "Variable_Explain"] <- "Variable Information"
DT::datatable(dataframe.variables)
The large amount of variables for each observation give us a chance to look into a lot of things, but for the purpose of this piece we will try to look at the following questions:
How many calories do students eat during the day and how does this vary with their age?
As students get older, how does their intake compare against the Recommended Daily Allowance?
Does the midday meal plan have anything to do with caloric intake through the ages?
What are their nutritional categories, and how do they vary with school?
data %>%
ggplot( aes(x=fct_reorder(group_code, total_calories, .desc = TRUE), y=total_calories, fill=group_code)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
theme_ipsum() +
theme(legend.position="none",plot.title = element_text(size=11)) +
xlab("Type of School") +
ylab("Calories")+ theme(axis.line = element_line(),
panel.border = element_blank(),
panel.background = element_blank(),
plot.background = element_rect(color = "#E5E5E5"),
axis.text.x = element_text(angle = 45, hjust = 1))
As the boxplot above shows, students from urban and rural private schools have a better daily caloric intake than those in government schools (irrespective of area).
How does this vary with the various ages, for each school type? For this, it makes sense to have some standard to compare it against. For this, we will use the RDA observations:
Recommended Dietary Allowances (RDAs) are the levels of intake of essential nutrients that, on the basis of scientific knowledge, are judged by the Food and Nutrition Board to be adequate to meet the known nutrient needs of practically all healthy persons.2
Alright! The percent_exc_def
observations for each participant tells us how much lesser or how much more are they consuming, compared against what is recommended for them at that age.
#Editing the dataset to something more meaningful. Mainly rounding off numbers.
data$age_years <- round(data$age_years)
data$age_years <- as.factor(data$age_years)
data$percent_exc_def <- round(data$percent_exc_def)
#New Dataframe with only these values
calories <- data %>% select(percent_exc_def, RDA, age_years, MDM, group_code)
calories %>%
#Reorder based on age
mutate(age_years = fct_reorder(age_years, percent_exc_def, .desc = TRUE)) %>%
ggplot( aes(y=age_years, x=percent_exc_def, fill=age_years)) +
geom_density_ridges(alpha=0.6) +
scale_fill_viridis(discrete=TRUE) +
scale_color_viridis(discrete=TRUE) +
theme_ipsum() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
labs(title= "Percentage of Excess or Deficit Caloric Intake", subtitle = "(by age) against the Recommended Daily Allowance")+
xlab("Percentage of Excess/Deficit") +
ylab("Age")
As students get older, they seem to be drifting further away from their recommended daily allowances and their caloric intakes are decreasing! What could be the possible reason for this pattern?
calories %>%
#Reorder based on age
mutate(age_years = fct_reorder(age_years, percent_exc_def, .desc = TRUE)) %>%
ggplot( aes(y=age_years, x=percent_exc_def, fill=MDM)) +
geom_density_ridges(alpha=0.6) +
#Facet Wrap by School Type
facet_wrap(~group_code) +
scale_fill_viridis(discrete=TRUE) +
scale_color_viridis(discrete=TRUE) +
theme_ipsum() +
theme(
panel.spacing = unit(2, "lines"),
strip.text.x = element_text(size = 8)) +
labs(title= "Percentage of Excess or Deficit Caloric Intake", subtitle = "(by age) against the Recommended Daily Allowance")+
xlab("Percentage of Excess/Deficit") +
ylab("Age")
Interesting! We can almost see the ages at which students stop receiving midday meals in government schools. This only exacerbates the shift away from their RDAs, which you can also compare with their private school counterparts. The difference is clear, there are more students in private schools, both rural and urban, whose intake is far lesser than what it should be at their age.
To see this, first, we need to count the number of students in each category:
#Count Number of Students in Each category
category <- data %>%
arrange(Nutrition_cat) %>%
group_by(Nutrition_cat, group_code) %>%
summarise(count = n())
Let’s see what this looks like:
And now let’s plot it:
ggplot(category, aes(fill=Nutrition_cat, y=count, x=group_code)) +
geom_bar(position="stack", stat="identity")+
scale_fill_viridis(discrete=TRUE) +
scale_color_viridis(discrete=TRUE) +
theme_ipsum() +
theme(
panel.spacing = unit(2, "lines"),
strip.text.x = element_text(size = 8)) +
labs(title= "Nutritional Category by School Type")+
xlab("School Type") +
ylab("Number of Students")
Students in Urban Private schools are more overweight, whereas those in Rural Government schools are more underweight. This makes sense, given that the kind of school is a good indicator of the student’s socio-economic status.
Madhavi Bhargava,Overweight and Obesity in School Children of a Hill State in North India, 2016↩︎