Introduction

Many people struggle to lose weight, while others struggle to gain it. Some of them also find it difficult to monitor and control their eating habits and the other external factors that could help them.

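I use obesity-level data for people from Mexico, Peru, and Colombia (Palechor & de la Hoz Manotas, 2019). The original responses were collected through an online survey covering eating habits and family history (internal factors) and physical activity (an external factor). According to the dataset's authors, about 77% of the 2,111 records were generated synthetically with the SMOTE filter in Weka and only 23% were collected directly from users. This is why answers that were originally whole numbers, such as FCVC, NCP, CH2O, FAF, and TUE, appear with decimal values. I use this data to inform software tools that help people reach an ideal weight, especially people who are obese.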

Import Data

obesity <- read.csv("Obesity.csv")

Data Inspection

head(obesity)

Data Cleansing and Coercion

Data Type Check

str(obesity)
#> 'data.frame':    2111 obs. of  17 variables:
#>  $ Gender                        : chr  "Female" "Female" "Male" "Male" ...
#>  $ Age                           : num  21 21 23 27 22 29 23 22 24 22 ...
#>  $ Height                        : num  1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
#>  $ Weight                        : num  64 56 77 87 89.8 53 55 53 64 68 ...
#>  $ family_history_with_overweight: chr  "yes" "yes" "yes" "no" ...
#>  $ FAVC                          : chr  "no" "no" "no" "no" ...
#>  $ FCVC                          : num  2 3 2 3 2 2 3 2 3 2 ...
#>  $ NCP                           : num  3 3 3 3 1 3 3 3 3 3 ...
#>  $ CAEC                          : chr  "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
#>  $ SMOKE                         : chr  "no" "yes" "no" "no" ...
#>  $ CH2O                          : num  2 3 2 2 2 2 2 2 2 2 ...
#>  $ SCC                           : chr  "no" "yes" "no" "no" ...
#>  $ FAF                           : num  0 3 2 2 0 0 1 3 1 1 ...
#>  $ TUE                           : num  1 0 1 0 0 0 0 0 1 1 ...
#>  $ CALC                          : chr  "no" "Sometimes" "Frequently" "Frequently" ...
#>  $ MTRANS                        : chr  "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
#>  $ NObeyesdad                    : chr  "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...

Several columns, such as Gender, CAEC, and the target NObeyesdad, are stored as character strings even though they are categorical, so I convert them to factors.

obesity$Gender <- as.factor(obesity$Gender)
obesity$family_history_with_overweight <- as.factor(obesity$family_history_with_overweight)
obesity$FAVC <- as.factor(obesity$FAVC)
obesity$CAEC <- as.factor(obesity$CAEC)
obesity$SMOKE <- as.factor(obesity$SMOKE)
obesity$SCC <- as.factor(obesity$SCC)
obesity$CALC <- as.factor(obesity$CALC)
obesity$MTRANS <- as.factor(obesity$MTRANS)
obesity$NObeyesdad <- as.factor(obesity$NObeyesdad)

str(obesity)
#> 'data.frame':    2111 obs. of  17 variables:
#>  $ Gender                        : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 2 1 2 2 2 ...
#>  $ Age                           : num  21 21 23 27 22 29 23 22 24 22 ...
#>  $ Height                        : num  1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
#>  $ Weight                        : num  64 56 77 87 89.8 53 55 53 64 68 ...
#>  $ family_history_with_overweight: Factor w/ 2 levels "no","yes": 2 2 2 1 1 1 2 1 2 2 ...
#>  $ FAVC                          : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 2 1 2 2 ...
#>  $ FCVC                          : num  2 3 2 3 2 2 3 2 3 2 ...
#>  $ NCP                           : num  3 3 3 3 1 3 3 3 3 3 ...
#>  $ CAEC                          : Factor w/ 4 levels "Always","Frequently",..: 4 4 4 4 4 4 4 4 4 4 ...
#>  $ SMOKE                         : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
#>  $ CH2O                          : num  2 3 2 2 2 2 2 2 2 2 ...
#>  $ SCC                           : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
#>  $ FAF                           : num  0 3 2 2 0 0 1 3 1 1 ...
#>  $ TUE                           : num  1 0 1 0 0 0 0 0 1 1 ...
#>  $ CALC                          : Factor w/ 4 levels "Always","Frequently",..: 3 4 2 2 4 4 4 4 2 3 ...
#>  $ MTRANS                        : Factor w/ 5 levels "Automobile","Bike",..: 4 4 4 5 4 1 3 4 4 4 ...
#>  $ NObeyesdad                    : Factor w/ 7 levels "Insufficient_Weight",..: 2 2 2 6 7 2 2 2 2 2 ...
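
The nine `as.factor()` calls above can also be written as one step that converts every character column (passing `stringsAsFactors = TRUE` to `read.csv()` is another option). The sketch below runs on a small hypothetical data frame, `toy`, whose values are illustrative and not taken from the survey. It also assumes that ordering the `NObeyesdad` levels from lightest to heaviest, instead of alphabetically, is useful so that tables print in a natural order.

```r
# Hypothetical toy data standing in for a few of the survey columns
toy <- data.frame(
  Gender = c("Female", "Male", "Male"),
  Age = c(21, 23, 27),
  NObeyesdad = c("Normal_Weight", "Obesity_Type_I", "Overweight_Level_I")
)

# Convert every character column to a factor in one pass
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], as.factor)

# Assumed ordering of the obesity levels, from lightest to heaviest
weight_order <- c("Insufficient_Weight", "Normal_Weight",
                  "Overweight_Level_I", "Overweight_Level_II",
                  "Obesity_Type_I", "Obesity_Type_II", "Obesity_Type_III")
toy$NObeyesdad <- factor(toy$NObeyesdad, levels = weight_order)

str(toy)
```

With the real data, the same two steps apply to `obesity` in place of `toy`. Numeric columns such as `Age` are left untouched because only character columns are selected.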

Missing Value Check

colSums(is.na(obesity))
#>                         Gender                            Age 
#>                              0                              0 
#>                         Height                         Weight 
#>                              0                              0 
#> family_history_with_overweight                           FAVC 
#>                              0                              0 
#>                           FCVC                            NCP 
#>                              0                              0 
#>                           CAEC                          SMOKE 
#>                              0                              0 
#>                           CH2O                            SCC 
#>                              0                              0 
#>                            FAF                            TUE 
#>                              0                              0 
#>                           CALC                         MTRANS 
#>                              0                              0 
#>                     NObeyesdad 
#>                              0
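
`is.na()` only flags true `NA` values. By default, `read.csv()` reads a blank cell in a text column as an empty string rather than `NA`, so zero counts above do not fully rule out missing answers in the categorical columns. A quick complementary check, sketched on a hypothetical data frame:

```r
# Hypothetical data with one blank text cell and no NA values
toy <- data.frame(
  Gender = c("Female", "", "Male"),
  Age = c(21, 23, 27)
)

colSums(is.na(toy))  # reports zero missing values in both columns

# Count empty or whitespace-only strings in character/factor columns
text_cols <- vapply(toy, function(col) is.character(col) || is.factor(col),
                    logical(1))
blank_counts <- vapply(toy[text_cols],
                       function(col) sum(trimws(as.character(col)) == ""),
                       integer(1))
blank_counts
```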

Data Explanation

summary(obesity)
#>     Gender          Age            Height          Weight      
#>  Female:1043   Min.   :14.00   Min.   :1.450   Min.   : 39.00  
#>  Male  :1068   1st Qu.:19.95   1st Qu.:1.630   1st Qu.: 65.47  
#>                Median :22.78   Median :1.700   Median : 83.00  
#>                Mean   :24.31   Mean   :1.702   Mean   : 86.59  
#>                3rd Qu.:26.00   3rd Qu.:1.768   3rd Qu.:107.43  
#>                Max.   :61.00   Max.   :1.980   Max.   :173.00  
#>                                                                
#>  family_history_with_overweight  FAVC           FCVC            NCP       
#>  no : 385                       no : 245   Min.   :1.000   Min.   :1.000  
#>  yes:1726                       yes:1866   1st Qu.:2.000   1st Qu.:2.659  
#>                                            Median :2.386   Median :3.000  
#>                                            Mean   :2.419   Mean   :2.686  
#>                                            3rd Qu.:3.000   3rd Qu.:3.000  
#>                                            Max.   :3.000   Max.   :4.000  
#>                                                                           
#>          CAEC      SMOKE           CH2O        SCC            FAF        
#>  Always    :  53   no :2067   Min.   :1.000   no :2015   Min.   :0.0000  
#>  Frequently: 242   yes:  44   1st Qu.:1.585   yes:  96   1st Qu.:0.1245  
#>  no        :  51              Median :2.000              Median :1.0000  
#>  Sometimes :1765              Mean   :2.008              Mean   :1.0103  
#>                               3rd Qu.:2.477              3rd Qu.:1.6667  
#>                               Max.   :3.000              Max.   :3.0000  
#>                                                                          
#>       TUE                 CALC                        MTRANS    
#>  Min.   :0.0000   Always    :   1   Automobile           : 457  
#>  1st Qu.:0.0000   Frequently:  70   Bike                 :   7  
#>  Median :0.6253   no        : 639   Motorbike            :  11  
#>  Mean   :0.6579   Sometimes :1401   Public_Transportation:1580  
#>  3rd Qu.:1.0000                     Walking              :  56  
#>  Max.   :2.0000                                                 
#>                                                                 
#>                NObeyesdad 
#>  Insufficient_Weight:272  
#>  Normal_Weight      :287  
#>  Obesity_Type_I     :351  
#>  Obesity_Type_II    :297  
#>  Obesity_Type_III   :324  
#>  Overweight_Level_I :290  
#>  Overweight_Level_II:290
  1. The data come from 1,043 female and 1,068 male respondents aged 14 to 61 years.
  2. A total of 1,726 respondents have a family history of being overweight.
  3. A total of 1,866 respondents frequently consume high-calorie foods (FAVC).
  4. The median FCVC (Frequency of Consumption of Vegetables) is 2.4, meaning that a typical respondent sometimes eats vegetables with meals.
  5. The median NCP (Number of Main Meals) is 3.0, that is, a typical respondent has three main meals a day.
  6. A total of 1,765 respondents sometimes eat food between meals (CAEC).
  7. A total of 2,067 respondents do not smoke.
  8. The median CH2O (Consumption of Water Daily) is 2.0, that is, a typical respondent drinks 1-2 L of water a day.
  9. A total of 2,015 respondents do not monitor the calories they eat daily (SCC).
  10. The median FAF (Physical Activity Frequency) is 1.0, that is, a typical respondent does physical activity on 1 or 2 days.
  11. The median TUE (Time Using Technology Devices) is 0.63, which I read as roughly 2-3 hours a day spent on technology devices.
  12. A total of 1,401 respondents sometimes drink alcohol (CALC).
  13. A total of 1,580 respondents usually use public transportation (MTRANS).
  14. The three most common weight categories are Obesity Type I (351 respondents), Obesity Type III (324), and Obesity Type II (297).
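
The ranking of weight categories in the last item can be read straight off a sorted frequency table. The sketch below rebuilds the `NObeyesdad` column from the counts printed by `summary()` above, so it runs without `Obesity.csv`. With the real data, `sort(table(obesity$NObeyesdad), decreasing = TRUE)` should give the same ranking.

```r
# Category counts copied from the summary() output above
counts <- c(Insufficient_Weight = 272, Normal_Weight = 287,
            Obesity_Type_I = 351, Obesity_Type_II = 297,
            Obesity_Type_III = 324, Overweight_Level_I = 290,
            Overweight_Level_II = 290)
levels_vec <- rep(names(counts), times = counts)

# Most frequent categories first
ranked <- sort(table(levels_vec), decreasing = TRUE)
head(ranked, 3)

# Percentage of all respondents in each category, one decimal place
round(100 * prop.table(ranked), 1)
```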

Exploratory Data Analysis

Covariance and Correlation of the Data

# Age ~ Weight
cov(obesity$Age, obesity$Weight)
#> [1] 33.66718
cor(obesity$Age, obesity$Weight)
#> [1] 0.2025601

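Age has a weak positive correlation with body weight (r ≈ 0.20): older respondents tend to weigh somewhat more. Because the correlation is weak and the survey is a single snapshot, this does not show that individuals gain weight as they age.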
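
The two numbers printed above are linked: Pearson's correlation is the covariance divided by the product of the two standard deviations. That is why the correlation is unit-free, while the covariance (33.67 here, in years × kilograms) changes with the units. A small check on hypothetical vectors that are not taken from the survey:

```r
# Hypothetical ages (years) and weights (kg)
age <- c(18, 21, 25, 30, 41, 55)
weight <- c(60, 72, 65, 80, 78, 90)

r_builtin <- cor(age, weight)
r_manual <- cov(age, weight) / (sd(age) * sd(weight))
all.equal(r_builtin, r_manual)  # TRUE

# Rescaling weight to grams changes the covariance but not the correlation
all.equal(cor(age, weight * 1000), r_builtin)  # TRUE
cov(age, weight * 1000) / cov(age, weight)     # covariance scales by 1000
```

For this reason the correlation, not the covariance, is the number to compare across the variable pairs in this section.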

# Frequency of Consumption of Vegetables (FCVC) ~ Weight
cov(obesity$FCVC, obesity$Weight)
#> [1] 3.022323
cor(obesity$FCVC, obesity$Weight)
#> [1] 0.2161247

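Frequency of Consumption of Vegetables (FCVC) also has a weak positive correlation with body weight (r ≈ 0.22): respondents who report eating vegetables more often tend to weigh slightly more. This is a counterintuitive association, not evidence that eating vegetables causes weight gain.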

# Number of Main Meals (NCP) ~ Weight
cov(obesity$NCP, obesity$Weight)
#> [1] 2.189976
cor(obesity$NCP, obesity$Weight)
#> [1] 0.107469

Number of Main Meals (NCP) has a very weak positive correlation with body weight (r ≈ 0.11): respondents who have more main meals tend to weigh slightly more.

# Consumption of Water Daily (CH2O) ~ Weight
cov(obesity$CH2O, obesity$Weight)
#> [1] 3.220031
cor(obesity$CH2O, obesity$Weight)
#> [1] 0.2005754

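Consumption of Water Daily (CH2O) has a weak positive correlation with body weight (r ≈ 0.20): respondents who drink more water tend to weigh slightly more. As with the other correlations, this is an association, not evidence that drinking water causes weight gain.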

# Physical Activity Frequency (FAF) ~ Weight
cov(obesity$FAF, obesity$Weight)
#> [1] -1.145898
cor(obesity$FAF, obesity$Weight)
#> [1] -0.05143627

Physical Activity Frequency (FAF) has a very weak negative correlation with body weight (r ≈ -0.05): respondents who exercise more often tend to weigh marginally less, but the association is close to zero.
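
Instead of checking each pair separately, all numeric columns can be correlated with `Weight` in one call, and `cor.test()` reports whether a coefficient is distinguishable from zero. The sketch uses a hypothetical stand-in for `obesity`: the column names match the survey, but the values are simulated. With the real data, replace `toy` with `obesity`.

```r
# Hypothetical stand-in for the survey's columns (simulated values)
set.seed(1)
n <- 200
toy <- data.frame(
  Age  = runif(n, 14, 61),
  FCVC = runif(n, 1, 3),
  FAF  = runif(n, 0, 3)
)
toy$Weight <- 50 + 0.5 * toy$Age - 3 * toy$FAF + rnorm(n, sd = 10)
toy$Gender <- factor(sample(c("Female", "Male"), n, replace = TRUE))

# Keep only numeric columns, then read off each column's correlation with Weight
num_cols <- vapply(toy, is.numeric, logical(1))
cor_matrix <- cor(toy[num_cols])
round(cor_matrix[, "Weight"], 3)

# Test whether the FAF-Weight correlation differs from zero
cor.test(toy$FAF, toy$Weight)
```

With 2,111 respondents, even a coefficient as small as the FAF one can come out statistically significant. A small p-value only says the coefficient is probably not zero; it does not make a weak association strong.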

Data Manipulation and Transformation

Classification of Respondents Based on Obesity Levels and Gender

# Female Respondents
table(obesity$NObeyesdad[obesity$Gender == 'Female'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 173                 141                 156                   2 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 323                 145                 103
# Male Respondents
table(obesity$NObeyesdad[obesity$Gender == 'Male'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                  99                 146                 195                 295 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                   1                 145                 187

Among female respondents, 323 have obesity Type III, 156 have obesity Type I, and only 2 have obesity Type II. Among male respondents, 295 have obesity Type II, 195 have obesity Type I, and only 1 has obesity Type III. Overall, 481 of 1,043 women (46%) and 491 of 1,068 men (46%) fall into one of the obesity categories, but the type differs sharply by gender: Type III is almost entirely female and Type II almost entirely male.
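
Both tables can be produced at once with `table(obesity$NObeyesdad, obesity$Gender)`. The sketch below rebuilds that cross-tabulation from the counts printed above, so it runs without `Obesity.csv`, and computes each gender's share in any obesity category.

```r
# Counts per obesity level, copied from the two tables above
ob_levels <- c("Insufficient_Weight", "Normal_Weight", "Obesity_Type_I",
               "Obesity_Type_II", "Obesity_Type_III",
               "Overweight_Level_I", "Overweight_Level_II")
female <- c(173, 141, 156, 2, 323, 145, 103)
male   <- c(99, 146, 195, 295, 1, 145, 187)

# Rebuild one row per respondent so table() can cross-tabulate
resp <- data.frame(
  NObeyesdad = c(rep(ob_levels, times = female), rep(ob_levels, times = male)),
  Gender     = rep(c("Female", "Male"), times = c(sum(female), sum(male)))
)
by_gender <- table(resp$NObeyesdad, resp$Gender)
by_gender

# Share of each gender that falls in any obesity category
obese_rows <- grepl("^Obesity", rownames(by_gender))
obese_share <- colSums(by_gender[obese_rows, ]) / colSums(by_gender)
round(obese_share, 3)
```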

Classification of Respondents Who Have a Family History with Overweight Based on Obesity Levels

table(obesity$NObeyesdad[obesity$family_history_with_overweight == 'yes'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 126                 155                 344                 296 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 324                 209                 272

Of the 1,726 respondents who have a family history of being overweight, 481 are overweight (Levels I and II combined), 344 have obesity Type I, 296 have obesity Type II, and 324 have obesity Type III.
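
The counts above only describe respondents with a family history. A comparison with the 385 respondents without one makes the pattern clearer. In the sketch below, the "no" counts are derived by subtracting the table above from the overall category counts in `summary(obesity)`. With the real data, `prop.table(table(obesity$family_history_with_overweight, obesity$NObeyesdad), margin = 1)` should give the same row proportions.

```r
ob_levels <- c("Insufficient_Weight", "Normal_Weight", "Obesity_Type_I",
               "Obesity_Type_II", "Obesity_Type_III",
               "Overweight_Level_I", "Overweight_Level_II")
overall <- c(272, 287, 351, 297, 324, 290, 290)        # from summary(obesity)
with_history <- c(126, 155, 344, 296, 324, 209, 272)   # table above
without_history <- overall - with_history              # derived by subtraction

history_tab <- rbind(yes = with_history, no = without_history)
colnames(history_tab) <- ob_levels

# Row proportions: distribution of weight categories within each group
round(prop.table(history_tab, margin = 1), 3)

# Share of each group that is overweight or obese
heavy <- !(ob_levels %in% c("Insufficient_Weight", "Normal_Weight"))
round(rowSums(history_tab[, heavy]) / rowSums(history_tab), 3)
```

Row proportions (`margin = 1`) are used because the two groups differ greatly in size, so raw counts alone would mostly reflect that 1,726 respondents answered "yes".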

Classification of Respondents Based on Obesity Levels and Frequency of Consumption of High Caloric Food

table(obesity$NObeyesdad[obesity$FAVC == 'yes'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 221                 208                 340                 290 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 323                 268                 216

Of the 1,866 respondents who frequently eat high-calorie foods, 484 are overweight, 340 have obesity Type I, 290 have obesity Type II, and 323 have obesity Type III.
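
Each section in this part adds up the two overweight levels and lists the three obesity types by hand. A small helper that collapses the seven levels into four broader groups can do that arithmetic. `broad_group()` is a hypothetical helper, not part of the original analysis; here it is applied to the FAVC counts above.

```r
# Hypothetical helper: collapse the seven NObeyesdad levels into four groups
broad_group <- function(level) {
  ifelse(grepl("^Obesity", level), "Obese",
         ifelse(grepl("^Overweight", level), "Overweight",
                ifelse(level == "Normal_Weight", "Normal", "Underweight")))
}

# Counts for FAVC == "yes", copied from the table above
favc_yes <- c(Insufficient_Weight = 221, Normal_Weight = 208,
              Obesity_Type_I = 340, Obesity_Type_II = 290,
              Obesity_Type_III = 323, Overweight_Level_I = 268,
              Overweight_Level_II = 216)

# Sum the counts within each broad group
grouped <- tapply(favc_yes, broad_group(names(favc_yes)), sum)
grouped
```

With the real data, the helper can be applied to the filtered column directly, for example `table(broad_group(as.character(obesity$NObeyesdad[obesity$FAVC == "yes"])))`.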

Classification of Respondents Based on Obesity Levels and Frequency of Consumption of Vegetables

table(obesity$NObeyesdad[obesity$FCVC < 3])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 189                 173                 324                 272 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                   0                 250                 251

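Among the respondents who never or only occasionally eat vegetables with their meals (FCVC < 3), 501 are overweight, 324 have obesity Type I, and 272 have obesity Type II. None of them has obesity Type III: all 324 Type III respondents report FCVC = 3 (always eating vegetables), which likely contributes to the positive correlation between FCVC and weight found earlier.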
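
Because `FCVC` is stored as a decimal number (its median is 2.386) while the survey answers correspond to whole numbers from 1 to 3, the filter `FCVC < 3` also catches values such as 2.9 that are closer to "always". If the goal is to match the original answers, one option is to round first. Whether rounding is appropriate for this partly synthetic data is an assumption of this sketch, shown on hypothetical values:

```r
# Hypothetical FCVC values, including non-integer ones like those in the data
fcvc <- c(1, 2, 2.386, 2.6, 2.9, 3)

sum(fcvc < 3)         # raw threshold: 5 values
sum(round(fcvc) < 3)  # rounded to the nearest answer first: 3 values
```

Note that R's `round()` sends exact halves such as 2.5 to the even number (2), so values sitting exactly between two answers deserve a deliberate choice.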

Classification of Respondents Based on Obesity Levels and Number of Main Meals

table(obesity$NObeyesdad[obesity$NCP >= 3])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 204                 235                 180                 186 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 324                 164                 138

Among the respondents who have three or more main meals a day (NCP ≥ 3), 302 are overweight, 180 have obesity Type I, 186 have obesity Type II, and 324 have obesity Type III.

Classification of Respondents Based on Obesity Levels and Consumption of Food Between Meals

table(obesity$NObeyesdad[obesity$CAEC != "no"])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 269                 277                 350                 296 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 324                 255                 289

Among the respondents who eat food between meals at least sometimes, 544 are overweight, 350 have obesity Type I, 296 have obesity Type II, and 324 have obesity Type III. Note that this group covers 2,060 of the 2,111 respondents, so it includes almost everyone in every weight category.

Classification of Respondents Based on Obesity Levels and Smoking

table(obesity$NObeyesdad[obesity$SMOKE == 'yes'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                   1                  13                   6                  15 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                   1                   3                   5

Of the 44 smokers, 30 are overweight or obese (8 overweight and 22 obese), so most smokers in the sample carry excess weight. Smokers who are overweight or obese are generally considered to be at greater risk of cancer, heart disease, and diabetes.
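
With only 44 smokers, the share who are overweight or obese is estimated with considerable uncertainty. A sketch using base R's `binom.test()` to attach an exact 95% confidence interval to that share, with the counts summed from the table above:

```r
# Overweight or obese smokers: Overweight I + II, Obesity I + II + III
heavy_smokers <- 3 + 5 + 6 + 15 + 1
smokers <- 44

ci <- binom.test(heavy_smokers, smokers)$conf.int
round(c(estimate = heavy_smokers / smokers, lower = ci[1], upper = ci[2]), 3)
```

A wide interval here is a reminder that conclusions about smokers rest on a small subgroup.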

Classification of Respondents Based on Obesity Levels and Consumption of Water Daily

table(obesity$NObeyesdad[obesity$CH2O < 2])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 128                  83                 119                 128 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 110                  96                 105

Drinking water is often said to reduce hunger. Of the 769 respondents whose water consumption falls below the 1-2 L level (CH2O < 2), 558 are overweight or obese, so most of them carry excess weight.
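
This share is easier to judge against a baseline, because 1,552 of all 2,111 respondents are overweight or obese. A sketch comparing respondents with `CH2O < 2` against everyone else with base R's `prop.test()`, using counts from the table above and the overall totals from `summary(obesity)`:

```r
# Overweight or obese respondents among those with CH2O < 2 (table above)
low_water_heavy <- 96 + 105 + 119 + 128 + 110
low_water_n <- 769

# Whole sample, from summary(obesity)
all_heavy <- 290 + 290 + 351 + 297 + 324
all_n <- 2111

# Everyone who is not in the low-water group
rest_heavy <- all_heavy - low_water_heavy
rest_n <- all_n - low_water_n

round(c(low_water = low_water_heavy / low_water_n,
        rest = rest_heavy / rest_n), 3)

# Two-sample test of equal proportions
prop.test(c(low_water_heavy, rest_heavy), c(low_water_n, rest_n))
```

With these counts, the low-water group's share (about 73%) is slightly lower than that of the other respondents (about 74%), so the data alone do not show that drinking less water goes with being heavier.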

Classification of Respondents Based on Obesity Levels and Consumption of Alcohol

table(obesity$NObeyesdad[obesity$CALC != "no"])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 155                 180                 186                 226 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 323                 240                 162

Among the respondents who drink alcohol at least sometimes, 402 are overweight, 186 have obesity Type I, 226 have obesity Type II, and 323 have obesity Type III.

Classification of Respondents Based on Obesity Levels and Calories Consumption Monitoring

table(obesity$NObeyesdad[obesity$SCC == 'no'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 250                 257                 349                 296 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 324                 253                 286
table(obesity$NObeyesdad[obesity$SCC == 'yes'])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                  22                  30                   2                   1 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                   0                  37                   4

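Most overweight and obese respondents do not monitor the calories they eat daily. Among the 2,015 respondents who do not monitor calories, 1,508 (75%) are overweight or obese, compared with 44 of the 96 (46%) who do.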
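
The comparison can be tested formally by collapsing the two tables above into a 2 × 2 table (overweight or obese versus any other category) and applying base R's `chisq.test()`. For a 2 × 2 table it applies Yates's continuity correction by default. A sketch with the counts summed by hand from the tables above:

```r
# Weight status by calorie monitoring (SCC), summed from the tables above
scc_tab <- matrix(
  c(37 + 4 + 2 + 1 + 0,          22 + 30,      # SCC == "yes"
    253 + 286 + 349 + 296 + 324, 250 + 257),   # SCC == "no"
  nrow = 2, byrow = TRUE,
  dimnames = list(SCC = c("yes", "no"),
                  Status = c("overweight_or_obese", "other"))
)
scc_tab
round(prop.table(scc_tab, margin = 1), 3)

# Chi-squared test of independence between SCC and weight status
chisq.test(scc_tab)
```

Like the other tables here, this shows an association only: people may start monitoring calories because of their weight, not the other way round.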

Classification of Respondents Based on Obesity Levels and Physical Activity Frequency

table(obesity$NObeyesdad[obesity$FAF <= 1])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 115                 177                 215                 151 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 207                 185                 195

Among the respondents who do no physical activity or are active on at most 1 or 2 days (FAF ≤ 1), 380 are overweight, 215 have obesity Type I, 151 have obesity Type II, and 207 have obesity Type III.

Classification of Respondents Based on Obesity Levels and Transportation Used

table(obesity$NObeyesdad[obesity$MTRANS == "Public_Transportation"])
#> 
#> Insufficient_Weight       Normal_Weight      Obesity_Type_I     Obesity_Type_II 
#>                 220                 200                 236                 200 
#>    Obesity_Type_III  Overweight_Level_I Overweight_Level_II 
#>                 323                 212                 189

Among the respondents who usually use public transportation, 401 are overweight, 236 have obesity Type I, 200 have obesity Type II, and 323 have obesity Type III.

Conclusion

Men and women are about equally likely to be obese (46% of each), although the type of obesity differs by gender. Overweight and obese respondents form the majority in every group examined above, namely respondents who:

  1. Have a family history of being overweight
  2. Eat high-calorie foods frequently
  3. Never or only occasionally eat vegetables with their meals
  4. Have at least three main meals a day
  5. Eat food between meals
  6. Smoke or drink alcohol
  7. Do not drink enough water
  8. Do not monitor the calories they eat daily
  9. Rarely do physical activity
  10. Use public transportation

Business Recommendations

Based on the conclusions above, I recommend building software with the following features:

  1. Monitor the number of calories users eat daily
  2. Provide low-calorie, healthy recipes
  3. Track how much water users drink daily
  4. Provide health information, for example facts about the dangers of smoking and drinking alcohol, aimed especially at users who are overweight or obese
  5. Record users’ physical activity, estimate the total calories burned, and recommend other physical activities
  6. Recommend the fitness center nearest to the user’s location
  7. Let users interact with the software through voice and images, since most respondents spend only about 2-3 hours a day on technology devices

References

Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.