library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The obesity level dataset contains data for estimating obesity levels in individuals from Mexico, Peru and Colombia. The records are labeled with a class variable, NObesity (Obesity Level), stored in the NObeyesdad column, which classifies individuals into seven categories: Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II, and Obesity Type III. The dataset comprises 17 variables and 2111 observations. 23% of the data was collected from users through an online survey; the rest was generated synthetically with the Weka tool and the SMOTE filter. The dataset has no missing values.
Dataset and its brief documentation link and official documentation link.
Which variables have the strongest influence on an individual's obesity level?
How do the different variables affect an individual’s obesity level?
obdf <- read.csv("~/Downloads/ObesityDataSet_raw_and_data_sinthetic.csv", header=TRUE)
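The figures quoted in the description (17 variables, 2111 observations, no missing values) can be checked once the file is loaded. The helper below is a minimal sketch; `describe_data` is a hypothetical name, and the toy data frame is made up purely to show the call, it is not part of the obesity data.

```r
# Hypothetical helper: report the basic shape of a data frame.
describe_data <- function(df) {
  list(
    n_rows    = nrow(df),        # number of observations
    n_cols    = ncol(df),        # number of variables
    n_missing = sum(is.na(df))   # total count of NA cells
  )
}

# Toy data frame for illustration only (NOT the obesity data).
toy <- data.frame(
  Age    = c(21, 35, NA),
  Gender = c("Female", "Male", "Male")
)
shape <- describe_data(toy)
print(shape)
```

On the real data, `describe_data(obdf)` should agree with the description above if the file matches the documented version, and `table(obdf$NObeyesdad)` lists how many records fall in each class.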
# Count records and compute the median weight for each Gender / obesity level pair
gas <- obdf |> group_by(Gender, NObeyesdad) |> summarize(Median_Weight = median(Weight), count = n())
## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.
ggplot(gas, aes(x = NObeyesdad, y = count, fill = Gender)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(obdf, aes(x = NObeyesdad, y = FAF, fill = NObeyesdad)) +
geom_boxplot() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Investigate correlations between the variables and obesity levels (and, if possible, among the other variables).
Test the hypotheses below and try to draw some meaningful conclusions from the dataset.
Higher vegetable consumption (FCVC) along with higher physical activity frequency (FAF) is associated with healthier obesity levels.
Does age play a role in obesity levels? For example, are younger people more likely than older people to fall in the normal weight category?
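One possible way to approach the correlation question is to treat the obesity class as an ordinal scale and compute a Spearman rank correlation against each numeric variable. This is only a sketch: `rank_correlations` is a hypothetical helper, the level spellings in `obesity_levels` are an assumption about how the classes are coded in NObeyesdad, and the toy data frame is invented for illustration.

```r
# Classes ordered from lightest to heaviest.
# ASSUMPTION: these spellings match the values stored in NObeyesdad.
obesity_levels <- c("Insufficient_Weight", "Normal_Weight",
                    "Overweight_Level_I", "Overweight_Level_II",
                    "Obesity_Type_I", "Obesity_Type_II", "Obesity_Type_III")

# Hypothetical helper: Spearman correlation between every numeric column
# and the ordinal rank of the class column.
rank_correlations <- function(df, class_col, levels) {
  class_rank   <- as.integer(factor(df[[class_col]], levels = levels))
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  sapply(numeric_cols, function(col) {
    cor(df[[col]], class_rank, method = "spearman", use = "complete.obs")
  })
}

# Toy data for illustration only (NOT the obesity data).
toy <- data.frame(
  FAF        = c(3, 2, 1, 0),
  Age        = c(20, 25, 30, 35),
  NObeyesdad = c("Insufficient_Weight", "Normal_Weight",
                 "Obesity_Type_I", "Obesity_Type_III")
)
toy_cor <- rank_correlations(toy, "NObeyesdad", obesity_levels)
print(toy_cor)
```

On the real data the call would be `rank_correlations(obdf, "NObeyesdad", obesity_levels)`. Because Weight and Height largely determine the class, they would be expected to dominate the result, so the behavioural variables such as FCVC and FAF are probably the more informative ones to compare. A rank correlation only captures monotone association and says nothing about causation.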
ggplot(obdf, aes(x = FCVC, y = FAF, color = NObeyesdad)) +
geom_point(alpha = 0.7) +
theme_minimal() +
scale_color_brewer(palette = "Set1") + facet_wrap(~ NObeyesdad)
ggplot(obdf, aes(x = NObeyesdad, y = Age)) +
geom_boxplot(fill = "lightblue") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
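The boxplot gives a visual impression of the age hypothesis; a rank-based test is one way to check whether the age distributions differ across obesity levels. The sketch below uses the Kruskal-Wallis test from base R, chosen here as an assumption because it does not require normally distributed ages. `age_by_level_test` is a hypothetical helper and the toy data is invented for illustration.

```r
# Hypothetical helper: Kruskal-Wallis test of Age across obesity levels.
age_by_level_test <- function(df) {
  df$NObeyesdad <- factor(df$NObeyesdad)   # grouping variable as a factor
  kruskal.test(Age ~ NObeyesdad, data = df)
}

# Toy data for illustration only (NOT the obesity data).
toy <- data.frame(
  Age        = c(19, 21, 22, 30, 34, 38),
  NObeyesdad = rep(c("Normal_Weight", "Obesity_Type_I"), each = 3)
)
result <- age_by_level_test(toy)
print(result)
```

On the real data, `age_by_level_test(obdf)` tests the null hypothesis that age has the same distribution in every class. A small p-value indicates that at least one class differs, but not which one, and since most of the records are synthetic, any p-value should be read with caution.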