Many people struggle to lose weight, while others struggle to gain it. Some also find it difficult to monitor and control their eating habits and the other factors that affect their weight.
I used obesity level data for people from Mexico, Peru, and Colombia. The data were collected through an online survey covering eating habits and family history as internal factors and physical activity as an external factor. I use this data to build software tools that help people reach their ideal weight, especially those who are obese.
obesity <- read.csv("Obesity.csv")
Data Inspection
head(obesity)
Data Cleansing and Coercion
Data Type Check
str(obesity)
#> 'data.frame': 2111 obs. of 17 variables:
#> $ Gender : chr "Female" "Female" "Male" "Male" ...
#> $ Age : num 21 21 23 27 22 29 23 22 24 22 ...
#> $ Height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
#> $ Weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
#> $ family_history_with_overweight: chr "yes" "yes" "yes" "no" ...
#> $ FAVC : chr "no" "no" "no" "no" ...
#> $ FCVC : num 2 3 2 3 2 2 3 2 3 2 ...
#> $ NCP : num 3 3 3 3 1 3 3 3 3 3 ...
#> $ CAEC : chr "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
#> $ SMOKE : chr "no" "yes" "no" "no" ...
#> $ CH2O : num 2 3 2 2 2 2 2 2 2 2 ...
#> $ SCC : chr "no" "yes" "no" "no" ...
#> $ FAF : num 0 3 2 2 0 0 1 3 1 1 ...
#> $ TUE : num 1 0 1 0 0 0 0 0 1 1 ...
#> $ CALC : chr "no" "Sometimes" "Frequently" "Frequently" ...
#> $ MTRANS : chr "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
#> $ NObeyesdad : chr "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
Several columns are stored as character vectors but hold categorical values, so they must be converted to factors.
obesity$Gender <- as.factor(obesity$Gender)
obesity$family_history_with_overweight <- as.factor(obesity$family_history_with_overweight)
obesity$FAVC <- as.factor(obesity$FAVC)
obesity$CAEC <- as.factor(obesity$CAEC)
obesity$SMOKE <- as.factor(obesity$SMOKE)
obesity$SCC <- as.factor(obesity$SCC)
obesity$CALC <- as.factor(obesity$CALC)
obesity$MTRANS <- as.factor(obesity$MTRANS)
obesity$NObeyesdad <- as.factor(obesity$NObeyesdad)
str(obesity)
#> 'data.frame': 2111 obs. of 17 variables:
#> $ Gender : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 2 1 2 2 2 ...
#> $ Age : num 21 21 23 27 22 29 23 22 24 22 ...
#> $ Height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
#> $ Weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
#> $ family_history_with_overweight: Factor w/ 2 levels "no","yes": 2 2 2 1 1 1 2 1 2 2 ...
#> $ FAVC : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 2 1 2 2 ...
#> $ FCVC : num 2 3 2 3 2 2 3 2 3 2 ...
#> $ NCP : num 3 3 3 3 1 3 3 3 3 3 ...
#> $ CAEC : Factor w/ 4 levels "Always","Frequently",..: 4 4 4 4 4 4 4 4 4 4 ...
#> $ SMOKE : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
#> $ CH2O : num 2 3 2 2 2 2 2 2 2 2 ...
#> $ SCC : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
#> $ FAF : num 0 3 2 2 0 0 1 3 1 1 ...
#> $ TUE : num 1 0 1 0 0 0 0 0 1 1 ...
#> $ CALC : Factor w/ 4 levels "Always","Frequently",..: 3 4 2 2 4 4 4 4 2 3 ...
#> $ MTRANS : Factor w/ 5 levels "Automobile","Bike",..: 4 4 4 5 4 1 3 4 4 4 ...
#> $ NObeyesdad : Factor w/ 7 levels "Insufficient_Weight",..: 2 2 2 6 7 2 2 2 2 2 ...
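The nine conversions above can also be written as one step. The following is a minimal sketch, assuming every character column in the data frame is categorical; the `toy` data frame and its values are made up for illustration and only mirror a few of the real column names.

```r
# Toy stand-in for the survey data (values invented for illustration).
toy <- data.frame(
  Gender = c("Female", "Male", "Male"),
  Age    = c(21, 23, 27),
  SMOKE  = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert only those to factors.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

str(toy)
```

Applied to the real data, the same two lines would replace `toy` with `obesity`. Numeric columns such as `Age` are left untouched because they are not flagged by `is.character()`.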
Missing Value Check
colSums(is.na(obesity))
#> Gender Age
#> 0 0
#> Height Weight
#> 0 0
#> family_history_with_overweight FAVC
#> 0 0
#> FCVC NCP
#> 0 0
#> CAEC SMOKE
#> 0 0
#> CH2O SCC
#> 0 0
#> FAF TUE
#> 0 0
#> CALC MTRANS
#> 0 0
#> NObeyesdad
#> 0
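Every column reports zero missing values, so no rows need to be dropped or imputed here. For a data set that does contain gaps, the same check could be followed by a filter. This is only a sketch on an invented `toy` data frame, not part of the original analysis.

```r
# Toy data with two deliberate gaps (values invented for illustration).
toy <- data.frame(
  Age    = c(21, NA, 27),
  Weight = c(64, 56, NA)
)

na_per_column <- colSums(is.na(toy))      # number of NAs in each column
has_any_na    <- any(is.na(toy))          # TRUE if there is at least one NA
complete_rows <- toy[complete.cases(toy), , drop = FALSE]  # rows without NAs

na_per_column
nrow(complete_rows)
```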
summary(obesity)
#> Gender Age Height Weight
#> Female:1043 Min. :14.00 Min. :1.450 Min. : 39.00
#> Male :1068 1st Qu.:19.95 1st Qu.:1.630 1st Qu.: 65.47
#> Median :22.78 Median :1.700 Median : 83.00
#> Mean :24.31 Mean :1.702 Mean : 86.59
#> 3rd Qu.:26.00 3rd Qu.:1.768 3rd Qu.:107.43
#> Max. :61.00 Max. :1.980 Max. :173.00
#>
#> family_history_with_overweight FAVC FCVC NCP
#> no : 385 no : 245 Min. :1.000 Min. :1.000
#> yes:1726 yes:1866 1st Qu.:2.000 1st Qu.:2.659
#> Median :2.386 Median :3.000
#> Mean :2.419 Mean :2.686
#> 3rd Qu.:3.000 3rd Qu.:3.000
#> Max. :3.000 Max. :4.000
#>
#> CAEC SMOKE CH2O SCC FAF
#> Always : 53 no :2067 Min. :1.000 no :2015 Min. :0.0000
#> Frequently: 242 yes: 44 1st Qu.:1.585 yes: 96 1st Qu.:0.1245
#> no : 51 Median :2.000 Median :1.0000
#> Sometimes :1765 Mean :2.008 Mean :1.0103
#> 3rd Qu.:2.477 3rd Qu.:1.6667
#> Max. :3.000 Max. :3.0000
#>
#> TUE CALC MTRANS
#> Min. :0.0000 Always : 1 Automobile : 457
#> 1st Qu.:0.0000 Frequently: 70 Bike : 7
#> Median :0.6253 no : 639 Motorbike : 11
#> Mean :0.6579 Sometimes :1401 Public_Transportation:1580
#> 3rd Qu.:1.0000 Walking : 56
#> Max. :2.0000
#>
#> NObeyesdad
#> Insufficient_Weight:272
#> Normal_Weight :287
#> Obesity_Type_I :351
#> Obesity_Type_II :297
#> Obesity_Type_III :324
#> Overweight_Level_I :290
#> Overweight_Level_II:290
Covariance and Correlation of The Data
# Age ~ Weight
cov(obesity$Age, obesity$Weight)
#> [1] 33.66718
cor(obesity$Age, obesity$Weight)
#> [1] 0.2025601
Age has a weak positive correlation with body weight (r ≈ 0.20): older respondents tend to weigh somewhat more.
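Covariance depends on the units of both variables, which is why it looks much larger than the correlation. Correlation is the covariance divided by the product of the two standard deviations, so it always falls between -1 and 1. The short sketch below checks that relationship on made-up numbers (not the survey data).

```r
# Invented values, used only to illustrate the formula.
age    <- c(21, 23, 27, 22, 29)
weight <- c(64, 77, 87, 89.8, 53)

# Pearson correlation computed by hand from the covariance ...
r_manual <- cov(age, weight) / (sd(age) * sd(weight))

# ... and by the built-in function.
r_builtin <- cor(age, weight)

all.equal(r_manual, r_builtin)
```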
# Frequency of Consumption of Vegetables (FCVC) ~ Weight
cov(obesity$FCVC, obesity$Weight)
#> [1] 3.022323
cor(obesity$FCVC, obesity$Weight)
#> [1] 0.2161247
Frequency of Consumption of Vegetables (FCVC) has a weak positive correlation with body weight (r ≈ 0.22): respondents who eat vegetables more often tend to weigh somewhat more.
# Number of Main Meals (NCP) ~ Weight
cov(obesity$NCP, obesity$Weight)
#> [1] 2.189976
cor(obesity$NCP, obesity$Weight)
#> [1] 0.107469
Number of Main Meals (NCP) has a weak positive correlation with body weight (r ≈ 0.11): respondents who eat more main meals tend to weigh slightly more.
# Consumption of Water Daily (CH2O) ~ Weight
cov(obesity$CH2O, obesity$Weight)
#> [1] 3.220031
cor(obesity$CH2O, obesity$Weight)
#> [1] 0.2005754
Consumption of Water Daily (CH2O) has a weak positive correlation with body weight (r ≈ 0.20): respondents who drink more water tend to weigh somewhat more.
# Physical Activity Frequency (FAF) ~ Weight
cov(obesity$FAF, obesity$Weight)
#> [1] -1.145898
cor(obesity$FAF, obesity$Weight)
#> [1] -0.05143627
Physical Activity Frequency (FAF) has a very weak negative correlation with body weight (r ≈ -0.05): respondents who are physically active more often tend to weigh marginally less.
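The pairwise calls above repeat the same pattern, so they can be collapsed into one call over a vector of column names. This is a sketch on an invented `toy` data frame; only the column names are taken from the data set.

```r
# Toy data (values invented for illustration).
toy <- data.frame(
  Age    = c(21, 23, 27, 22, 29, 24),
  FCVC   = c(2, 3, 2, 3, 2, 1),
  FAF    = c(0, 3, 2, 2, 0, 1),
  Weight = c(64, 56, 77, 87, 89.8, 53)
)

# Correlate each predictor with Weight and collect the results
# in a named numeric vector.
predictors <- c("Age", "FCVC", "FAF")
cors <- sapply(toy[predictors], function(x) cor(x, toy$Weight))

# Strongest positive correlation first.
sort(cors, decreasing = TRUE)
```

With the real data, `predictors` would list `Age`, `FCVC`, `NCP`, `CH2O`, and `FAF`, and `toy` would be replaced by `obesity`.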
Data Manipulation and Transformation
Classification of Respondents Based on Obesity Levels and Gender
# Female Respondents
table(obesity$NObeyesdad[obesity$Gender == 'Female'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 173 141 156 2
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 323 145 103
# Male Respondents
table(obesity$NObeyesdad[obesity$Gender == 'Male'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 99 146 195 295
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 1 145 187
A total of 323 female respondents suffer from obesity type III, 156 from obesity type I, and 2 from obesity type II. Meanwhile, 295 male respondents suffer from obesity type II, 195 from obesity type I, and 1 from obesity type III.
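The two separate tables can also be produced as a single cross-tabulation by passing both factors to `table()`. The sketch below uses a small invented `toy` data frame; the column names follow the data set.

```r
# Toy data (values invented for illustration).
toy <- data.frame(
  Gender     = factor(c("Female", "Female", "Male", "Male", "Male")),
  NObeyesdad = factor(c("Normal_Weight", "Obesity_Type_III",
                        "Obesity_Type_II", "Obesity_Type_II",
                        "Normal_Weight"))
)

# Rows are obesity levels, columns are genders.
counts <- table(toy$NObeyesdad, toy$Gender)

# addmargins() appends row and column totals.
addmargins(counts)
```

Having both genders side by side makes contrasts such as obesity type II versus type III easier to spot than in two separate printouts.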
Classification of Respondents Who Have a Family History with Overweight Based on Obesity Levels
table(obesity$NObeyesdad[obesity$family_history_with_overweight == 'yes'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 126 155 344 296
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 324 209 272
Of the 1,726 respondents who have a family history of being overweight, 481 of them are overweight, 344 respondents suffer from obesity type I, 296 respondents suffer from obesity type II, and 324 respondents suffer from obesity type III.
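The totals quoted above come from adding the two overweight levels together. A small sketch of that arithmetic, using the counts printed in the table; the three group labels are my own naming, not categories from the data set.

```r
# Counts copied from the table above (family history == "yes").
counts <- c(
  Insufficient_Weight = 126, Normal_Weight       = 155,
  Obesity_Type_I      = 344, Obesity_Type_II     = 296,
  Obesity_Type_III    = 324, Overweight_Level_I  = 209,
  Overweight_Level_II = 272
)

# Assign each level to a broader group (labels chosen for this sketch).
group <- ifelse(startsWith(names(counts), "Overweight"), "Overweight",
         ifelse(startsWith(names(counts), "Obesity"), "Obese",
                "Not overweight"))

# Sum the counts within each group.
totals <- tapply(counts, group, sum)
totals
```

The `Overweight` total is 481 and the three groups add up to the 1,726 respondents with a family history of overweight.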
Classification of Respondents Based on Obesity Levels and Frequency of Consumption of High Caloric Food
table(obesity$NObeyesdad[obesity$FAVC == 'yes'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 221 208 340 290
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 323 268 216
Of the 1,866 respondents who frequently eat high-calorie food, 484 are overweight, 340 suffer from obesity type I, 290 suffer from obesity type II, and 323 suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Frequency of Consumption of Vegetables
table(obesity$NObeyesdad[obesity$FCVC < 3])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 189 173 324 272
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 0 250 251
Among respondents who never or only occasionally eat vegetables with their meals (FCVC < 3), 501 are overweight, 324 suffer from obesity type I, and 272 suffer from obesity type II. None of them suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Number of Main Meals
table(obesity$NObeyesdad[obesity$NCP >= 3])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 204 235 180 186
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 324 164 138
Among respondents who eat three or more main meals (NCP >= 3), 302 are overweight, 180 suffer from obesity type I, 186 suffer from obesity type II, and 324 suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Consumption of Food Between Meals
table(obesity$NObeyesdad[obesity$CAEC != "no"])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 269 277 350 296
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 324 255 289
Among respondents who eat between meals (CAEC other than "no"), 544 are overweight, 350 suffer from obesity type I, 296 suffer from obesity type II, and 324 suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Smoking
table(obesity$NObeyesdad[obesity$SMOKE == 'yes'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 1 13 6 15
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 1 3 5
Of the 44 respondents who smoke, 30 are overweight or obese. Smokers who are overweight or obese have a greater risk of cancer, heart disease, and diabetes.
Classification of Respondents Based on Obesity Levels and Consumption of Water Daily
table(obesity$NObeyesdad[obesity$CH2O < 2])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 128 83 119 128
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 110 96 105
Drinking water can reduce hunger. Of the 769 respondents in the lowest water consumption group (CH2O < 2, less than 1 liter), 558 are overweight or obese.
Classification of Respondents Based on Obesity Levels and Consumption of Alcohol
table(obesity$NObeyesdad[obesity$CALC != "no"])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 155 180 186 226
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 323 240 162
Among respondents who drink alcohol (CALC other than "no"), 402 are overweight, 186 suffer from obesity type I, 226 suffer from obesity type II, and 323 suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Calories Consumption Monitoring
table(obesity$NObeyesdad[obesity$SCC == 'no'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 250 257 349 296
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 324 253 286
table(obesity$NObeyesdad[obesity$SCC == 'yes'])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 22 30 2 1
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 0 37 4
Most overweight and obese respondents do not monitor the calories they eat each day: 1,508 of them answered "no", compared with 44 who answered "yes".
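The two groups differ greatly in size (2,015 respondents versus 96), so shares within each group are easier to compare than raw counts. The sketch below rebuilds the two tables above as a matrix and converts each row to percentages with `prop.table()`; the object names are chosen for this example only.

```r
level_names <- c("Insufficient_Weight", "Normal_Weight", "Obesity_Type_I",
                 "Obesity_Type_II", "Obesity_Type_III",
                 "Overweight_Level_I", "Overweight_Level_II")

# Counts copied from the two tables above, one row per SCC answer.
scc_counts <- rbind(
  no  = c(250, 257, 349, 296, 324, 253, 286),
  yes = c(22, 30, 2, 1, 0, 37, 4)
)
colnames(scc_counts) <- level_names

# margin = 1 divides every cell by its row total,
# giving the distribution of obesity levels within each SCC group.
row_shares <- prop.table(scc_counts, margin = 1)
round(100 * row_shares, 1)
```

On the full data frame, the same shares could be obtained directly with `prop.table(table(obesity$SCC, obesity$NObeyesdad), margin = 1)`.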
Classification of Respondents Based on Obesity Levels and Physical Activity Frequency
table(obesity$NObeyesdad[obesity$FAF <= 1])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 115 177 215 151
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 207 185 195
Among respondents who report fewer than 2 days of physical activity (FAF <= 1), 380 are overweight, 215 suffer from obesity type I, 151 suffer from obesity type II, and 207 suffer from obesity type III.
Classification of Respondents Based on Obesity Levels and Transportation Used
table(obesity$NObeyesdad[obesity$MTRANS == "Public_Transportation"])
#>
#> Insufficient_Weight Normal_Weight Obesity_Type_I Obesity_Type_II
#> 220 200 236 200
#> Obesity_Type_III Overweight_Level_I Overweight_Level_II
#> 323 212 189
Among respondents who usually use public transportation, 401 are overweight, 236 suffer from obesity type I, 200 suffer from obesity type II, and 323 suffer from obesity type III.
Men and women are equally likely to suffer from obesity. Most of the respondents who are overweight and obese have the following characteristics:
Based on the conclusions above, I recommend building software with the following features:
Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.