Group Members of Group 7

  • Muhd Izani Zulkifli (22102674)
  • Nur Fatihah Atikah Mohd Rashdan (17100287/2)
  • Pavitra a/p Palanisamy (S2181697)
  • Shameen Izwan Anthonysamy (s2180659)

Title

Obesity Level Predictive Modeling

Dataset Details

Dataset: Estimation of Obesity Levels

Source: https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition

This dataset includes data for estimating obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition. It consists of 17 attributes and 2111 records. The records are labeled with the class variable NObeyesdad (Obesity Level), which allows classification of the data using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; the remaining 23% was collected directly from users through a web platform.

Introduction

Obesity is a global health challenge with significant implications for individuals and society. As the prevalence of obesity continues to rise, understanding the factors contributing to obesity and developing effective predictive models are crucial for preventive healthcare interventions. Predictive modeling in the context of obesity aims to anticipate and identify individuals at risk, enabling timely interventions and personalized healthcare strategies.

Questions

  1. Do various external factors and their frequency contribute to the prediction of body mass index (BMI)?

  2. How can we predict the obesity level based on the various external factors contributing to body mass index (BMI)?

Objectives:

  1. Predict the body mass index (BMI) based on various features (Linear Regression).

  2. Classify individuals into obesity-level groups (Classification).


Initialization

Import necessary libraries

library(dplyr)
library(plotly)
library(ggplot2)
library(tidyr)
library(ggcorrplot)
library(e1071) 
library(caTools)
library(tidyverse)
library(caret)

Data Ingestion

Load dataset of Obesity Level as dataframe

obesityDF <- read.csv("ObesityDataSet_raw_and_data_sinthetic.csv")

Data Understanding

Inspect the variables in the dataset

head(obesityDF)
##   Gender Age Height Weight family_history_with_overweight FAVC FCVC NCP
## 1 Female  21   1.62   64.0                            yes   no    2   3
## 2 Female  21   1.52   56.0                            yes   no    3   3
## 3   Male  23   1.80   77.0                            yes   no    2   3
## 4   Male  27   1.80   87.0                             no   no    3   3
## 5   Male  22   1.78   89.8                             no   no    2   1
## 6   Male  29   1.62   53.0                             no  yes    2   3
##        CAEC SMOKE CH2O SCC FAF TUE       CALC                MTRANS
## 1 Sometimes    no    2  no   0   1         no Public_Transportation
## 2 Sometimes   yes    3 yes   3   0  Sometimes Public_Transportation
## 3 Sometimes    no    2  no   2   1 Frequently Public_Transportation
## 4 Sometimes    no    2  no   2   0 Frequently               Walking
## 5 Sometimes    no    2  no   0   0  Sometimes Public_Transportation
## 6 Sometimes    no    2  no   0   0  Sometimes            Automobile
##            NObeyesdad
## 1       Normal_Weight
## 2       Normal_Weight
## 3       Normal_Weight
## 4  Overweight_Level_I
## 5 Overweight_Level_II
## 6       Normal_Weight

Data Preprocessing

1. Check for any NA values

colSums(is.na(obesityDF)) 
##                         Gender                            Age 
##                              0                              0 
##                         Height                         Weight 
##                              0                              0 
## family_history_with_overweight                           FAVC 
##                              0                              0 
##                           FCVC                            NCP 
##                              0                              0 
##                           CAEC                          SMOKE 
##                              0                              0 
##                           CH2O                            SCC 
##                              0                              0 
##                            FAF                            TUE 
##                              0                              0 
##                           CALC                         MTRANS 
##                              0                              0 
##                     NObeyesdad 
##                              0

2. Clean the data

  • Round the values of the Age variable to whole numbers (the column stays of type numeric)
  • Keep the result in a new dataframe
obesityDF$Age <- round(obesityDF$Age)
age_obesity <- obesityDF
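round() returns a double, which is why str() still reports Age as num. A minimal sketch, on made-up values, of how the column could be stored as an integer type if that is wanted (ages, rounded and ages_int are illustrative names, not part of the original script):

```r
# Made-up example values, not taken from the dataset
ages <- c(21.2, 22.7, 26.9)

# round() keeps the double (numeric) storage type
rounded <- round(ages)
print(class(rounded))

# as.integer() converts the rounded values to integer storage
ages_int <- as.integer(rounded)
print(class(ages_int))
print(ages_int)
```

Whether the integer type is needed depends on later steps; lm() and cor() treat both types as numeric.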
  • Inspect the structure of the dataframe
# Optional: use this to inspect the column names and types
str(age_obesity)
## 'data.frame':    2111 obs. of  17 variables:
##  $ Gender                        : chr  "Female" "Female" "Male" "Male" ...
##  $ Age                           : num  21 21 23 27 22 29 23 22 24 22 ...
##  $ Height                        : num  1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
##  $ Weight                        : num  64 56 77 87 89.8 53 55 53 64 68 ...
##  $ family_history_with_overweight: chr  "yes" "yes" "yes" "no" ...
##  $ FAVC                          : chr  "no" "no" "no" "no" ...
##  $ FCVC                          : num  2 3 2 3 2 2 3 2 3 2 ...
##  $ NCP                           : num  3 3 3 3 1 3 3 3 3 3 ...
##  $ CAEC                          : chr  "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
##  $ SMOKE                         : chr  "no" "yes" "no" "no" ...
##  $ CH2O                          : num  2 3 2 2 2 2 2 2 2 2 ...
##  $ SCC                           : chr  "no" "yes" "no" "no" ...
##  $ FAF                           : num  0 3 2 2 0 0 1 3 1 1 ...
##  $ TUE                           : num  1 0 1 0 0 0 0 0 1 1 ...
##  $ CALC                          : chr  "no" "Sometimes" "Frequently" "Frequently" ...
##  $ MTRANS                        : chr  "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
##  $ NObeyesdad                    : chr  "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
  • Add a BMI column
  • Calculated as BMI = weight (kg) ÷ height² (m²)
age_obesity$BMI <- age_obesity$Weight / (age_obesity$Height^2)
str(age_obesity)
## 'data.frame':    2111 obs. of  18 variables:
##  $ Gender                        : chr  "Female" "Female" "Male" "Male" ...
##  $ Age                           : num  21 21 23 27 22 29 23 22 24 22 ...
##  $ Height                        : num  1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
##  $ Weight                        : num  64 56 77 87 89.8 53 55 53 64 68 ...
##  $ family_history_with_overweight: chr  "yes" "yes" "yes" "no" ...
##  $ FAVC                          : chr  "no" "no" "no" "no" ...
##  $ FCVC                          : num  2 3 2 3 2 2 3 2 3 2 ...
##  $ NCP                           : num  3 3 3 3 1 3 3 3 3 3 ...
##  $ CAEC                          : chr  "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
##  $ SMOKE                         : chr  "no" "yes" "no" "no" ...
##  $ CH2O                          : num  2 3 2 2 2 2 2 2 2 2 ...
##  $ SCC                           : chr  "no" "yes" "no" "no" ...
##  $ FAF                           : num  0 3 2 2 0 0 1 3 1 1 ...
##  $ TUE                           : num  1 0 1 0 0 0 0 0 1 1 ...
##  $ CALC                          : chr  "no" "Sometimes" "Frequently" "Frequently" ...
##  $ MTRANS                        : chr  "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
##  $ NObeyesdad                    : chr  "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
##  $ BMI                           : num  24.4 24.2 23.8 26.9 28.3 ...
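As a quick check of the formula, the BMI of the first three records can be reproduced by hand from the Height and Weight values shown in head(obesityDF). The helper name compute_bmi is hypothetical and only used for this sketch:

```r
# Hypothetical helper, not part of the original script
compute_bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Weight (kg) and Height (m) of the first three records in the dataset
example_bmi <- compute_bmi(weight_kg = c(64, 56, 77),
                           height_m  = c(1.62, 1.52, 1.80))
print(round(example_bmi, 5))
```

For the first record this is 64 ÷ 1.62² = 64 ÷ 2.6244 ≈ 24.39.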
  • Reorder columns to put BMI immediately after Height and Weight
obesity_new <- age_obesity[, c("Gender", "Age", "Height", "Weight", "BMI", "family_history_with_overweight", "FAVC", "FCVC", "NCP", "CAEC", "SMOKE", "CH2O", "SCC", "FAF", "TUE", "CALC", "MTRANS", "NObeyesdad")]
str(obesity_new)
## 'data.frame':    2111 obs. of  18 variables:
##  $ Gender                        : chr  "Female" "Female" "Male" "Male" ...
##  $ Age                           : num  21 21 23 27 22 29 23 22 24 22 ...
##  $ Height                        : num  1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
##  $ Weight                        : num  64 56 77 87 89.8 53 55 53 64 68 ...
##  $ BMI                           : num  24.4 24.2 23.8 26.9 28.3 ...
##  $ family_history_with_overweight: chr  "yes" "yes" "yes" "no" ...
##  $ FAVC                          : chr  "no" "no" "no" "no" ...
##  $ FCVC                          : num  2 3 2 3 2 2 3 2 3 2 ...
##  $ NCP                           : num  3 3 3 3 1 3 3 3 3 3 ...
##  $ CAEC                          : chr  "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
##  $ SMOKE                         : chr  "no" "yes" "no" "no" ...
##  $ CH2O                          : num  2 3 2 2 2 2 2 2 2 2 ...
##  $ SCC                           : chr  "no" "yes" "no" "no" ...
##  $ FAF                           : num  0 3 2 2 0 0 1 3 1 1 ...
##  $ TUE                           : num  1 0 1 0 0 0 0 0 1 1 ...
##  $ CALC                          : chr  "no" "Sometimes" "Frequently" "Frequently" ...
##  $ MTRANS                        : chr  "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
##  $ NObeyesdad                    : chr  "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
  • Rename the columns for readability
names(obesity_new) <- c("Gender", 
                        "Age", 
                        "Height", 
                        "Weight", 
                        "BMI", 
                        "Family_History_with_Overweight",
                        "High_Caloric_Food_Consumption",
                        "Frequency_Consumption_of_Vegetables", 
                        "Number_of_Main_Meals",
                        "Consumption_of_Food_Between_Meals",
                        "Smoke",
                        "Consumption_of_Water_Daily",
                        "Calories_Consumption_Monitoring",
                        "Physical_Activity_Frequency",
                        "Time_Using_Technology",
                        "Consumption_of_Alcohol", 
                        "Transportation_Used", 
                        "Obesity")
  • Replace the underscores with spaces in the values of the Obesity column
obesity_new$Obesity <- gsub("_", " ", obesity_new$Obesity)
head(obesity_new)
##   Gender Age Height Weight      BMI Family_History_with_Overweight
## 1 Female  21   1.62   64.0 24.38653                            yes
## 2 Female  21   1.52   56.0 24.23823                            yes
## 3   Male  23   1.80   77.0 23.76543                            yes
## 4   Male  27   1.80   87.0 26.85185                             no
## 5   Male  22   1.78   89.8 28.34238                             no
## 6   Male  29   1.62   53.0 20.19509                             no
##   High_Caloric_Food_Consumption Frequency_Consumption_of_Vegetables
## 1                            no                                   2
## 2                            no                                   3
## 3                            no                                   2
## 4                            no                                   3
## 5                            no                                   2
## 6                           yes                                   2
##   Number_of_Main_Meals Consumption_of_Food_Between_Meals Smoke
## 1                    3                         Sometimes    no
## 2                    3                         Sometimes   yes
## 3                    3                         Sometimes    no
## 4                    3                         Sometimes    no
## 5                    1                         Sometimes    no
## 6                    3                         Sometimes    no
##   Consumption_of_Water_Daily Calories_Consumption_Monitoring
## 1                          2                              no
## 2                          3                             yes
## 3                          2                              no
## 4                          2                              no
## 5                          2                              no
## 6                          2                              no
##   Physical_Activity_Frequency Time_Using_Technology Consumption_of_Alcohol
## 1                           0                     1                     no
## 2                           3                     0              Sometimes
## 3                           2                     1             Frequently
## 4                           2                     0             Frequently
## 5                           0                     0              Sometimes
## 6                           0                     0              Sometimes
##     Transportation_Used             Obesity
## 1 Public_Transportation       Normal Weight
## 2 Public_Transportation       Normal Weight
## 3 Public_Transportation       Normal Weight
## 4               Walking  Overweight Level I
## 5 Public_Transportation Overweight Level II
## 6            Automobile       Normal Weight
  • Save as a new CSV file
write.csv(obesity_new, "obesity_new1.csv", row.names = FALSE)

Exploratory Data Analysis

1. Plot a histogram of BMI by gender

ggplot(obesity_new, aes(x = BMI, fill = Gender)) +
  geom_histogram(position = "identity", alpha = 0.7, bins = 20) +
  labs(title = "Histogram of BMI by Gender",
       x = "BMI",
       y = "Frequency") +
  scale_fill_manual(values = c("Male" = "blue", "Female" = "pink")) +
  theme_minimal()

2. Plot the relationship between age and BMI

# One point per record; bars with stat = "identity" would stack the BMI values at each age
ggplot(obesity_new, aes(x = Age, y = BMI)) +
  geom_point(color = "lightcoral", alpha = 0.5) +
  labs(title = "Relationship between Age and BMI",
       x = "Age",
       y = "BMI")

3. Alcohol Consumption, Family History with Overweight vs BMI

# Bars show the mean BMI of each group
ggplot(obesity_new, aes(x = Consumption_of_Alcohol, y = BMI, fill = Family_History_with_Overweight)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge", width = 0.5) +
  scale_fill_manual(values = c("#eae2b7", "#fcbf49")) +
  theme_minimal() +
  labs(title = "Relationship between BMI, Alcohol Consumption and Family History with Overweight",
       x = "Alcohol Consumption",
       y = "Mean BMI",
       fill = "Family History with Overweight")

4. Obesity vs Age

ggplot(obesity_new, aes(x = Obesity, y = Age)) +
  geom_boxplot(
    color = "#003049",
    fill = "#eae2b7",
    alpha = 0.2,
    notch = TRUE,
    notchwidth = 0.8,
    outlier.colour = "#d62828",
    outlier.fill = "#d62828",
    outlier.size = 3
  ) +
  theme_minimal() +
  labs(title = "Relationship between Obesity Level and Age", x = "Obesity Level", y = "Age")

5. Find the correlation between the numerical fields

numericFields <- dplyr::select_if(obesity_new, is.numeric)
r <- cor(numericFields, use="complete.obs")
ggcorrplot(r)

Modeling

1. BMI Prediction (Regression Model)

  • Convert character columns to factors and remove rows with missing values
df_reg <- obesity_new 
df_reg <- df_reg %>%
  mutate_if(is.character, as.factor) %>%
  na.omit()
set.seed(123)
  • Split the data into training (80%) and testing (20%) sets
splitIndex <- createDataPartition(df_reg$BMI, p = 0.8, list = FALSE)
training_data <- df_reg[splitIndex, ]
testing_data <- df_reg[-splitIndex, ]
  • Create a linear regression model and train it on the training data
model <- lm(BMI ~ ., data = training_data)
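Because BMI is computed directly from Height and Weight, a model fitted on all columns can recover it very closely from those two predictors alone. The following is a minimal sketch, on made-up toy data, of how the formula could exclude them to look at the contribution of habit-related features only. The names toy, full_fit and habits_fit are illustrative and not part of the original script:

```r
set.seed(1)

# Made-up toy data with two body measurements and two habit features
toy <- data.frame(
  Height = runif(50, 1.5, 1.9),
  Weight = runif(50, 50, 110),
  Physical_Activity_Frequency = sample(0:3, 50, replace = TRUE),
  Number_of_Main_Meals = sample(1:4, 50, replace = TRUE)
)
toy$BMI <- toy$Weight / toy$Height^2

# All columns as predictors, including the ones BMI is derived from
full_fit <- lm(BMI ~ ., data = toy)

# Same model with Height and Weight removed from the formula
habits_fit <- lm(BMI ~ . - Height - Weight, data = toy)

print(summary(full_fit)$r.squared)
print(summary(habits_fit)$r.squared)
```

In the toy data the habit features are random, so the second R-squared is expected to be low; on the real dataset the comparison would show how much the external factors explain on their own.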
  • Make predictions with the trained model using the testing data
pred_reg <- round(as.numeric(predict(model, newdata = testing_data)),digits = 2)
pred_reg
##   [1] 23.93 23.80 22.96 27.60 22.13 27.84 27.13 22.62 21.45 19.78 27.90 21.55
##  [13] 22.13 20.70 36.08 28.95 17.69 22.39 21.66 28.41 22.24 26.42 22.80 22.78
##  [25] 21.40 25.77 23.91 23.48 26.48 16.80 26.36 26.41 21.93 26.08 21.16 21.66
##  [37] 26.50 22.87 23.26 25.49 38.13 30.44 31.75 35.53 28.12 21.92 38.84 21.29
##  [49] 20.82 23.40 25.56 20.06 21.43 22.30 21.85 23.52 23.24 19.68 19.68 20.98
##  [61] 38.06 17.21 22.44 24.15 14.10 17.04 29.79 22.65 20.05 21.08 25.27 22.90
##  [73] 28.99 28.73 18.43 22.53 28.19 22.01 32.06 17.71 16.89 24.80 37.19 25.09
##  [85] 25.98 21.24 20.92 22.99 25.32 21.01 33.90 20.55 21.42 31.77 26.35 19.18
##  [97] 17.96 41.33 49.67 39.60 17.24 17.98 15.85 19.20 19.02 15.96 17.53 17.99
## [109] 18.08 16.98 16.70 19.47 17.94 16.48 17.94 15.00 15.39 18.90 16.52 18.16
## [121] 17.77 17.97 17.60 16.91 17.54 17.84 19.25 15.83 16.56 18.26 18.97 18.64
## [133] 16.89 16.33 16.68 16.29 17.30 15.26 17.70 17.45 15.53 15.16 19.12 16.65
## [145] 17.67 18.01 17.86 17.77 19.05 26.44 26.81 26.29 26.04 25.80 26.48 24.83
## [157] 25.71 25.36 26.52 25.43 25.13 26.22 25.72 26.55 26.09 27.75 25.36 26.50
## [169] 26.50 25.95 26.49 24.88 24.75 26.86 26.19 27.00 26.17 25.31 24.75 26.07
## [181] 25.82 26.18 25.58 25.88 26.12 26.30 25.91 24.99 25.62 26.23 25.49 25.26
## [193] 26.09 25.41 28.18 28.71 27.99 28.19 27.86 29.05 26.77 28.28 28.79 29.63
## [205] 29.28 28.16 27.98 28.16 28.25 27.55 27.80 28.47 28.56 27.25 28.73 27.73
## [217] 27.71 27.98 28.73 27.10 28.84 28.61 27.88 28.24 28.78 26.76 27.50 28.61
## [229] 28.03 28.07 27.66 27.35 28.30 27.85 27.94 29.39 28.22 28.18 28.74 27.65
## [241] 29.05 28.88 29.11 28.31 27.81 33.52 31.26 32.69 31.58 31.56 31.72 31.32
## [253] 30.80 31.03 32.96 32.11 31.57 30.88 32.55 31.66 32.58 31.21 32.37 32.08
## [265] 31.92 31.53 33.06 33.09 33.93 31.94 34.13 34.25 32.18 31.23 31.74 31.18
## [277] 31.88 33.77 31.69 31.79 32.30 32.39 31.82 32.61 30.03 32.51 32.91 31.50
## [289] 33.48 32.67 30.94 33.64 32.28 33.61 33.49 30.92 33.52 32.58 33.71 31.48
## [301] 33.00 32.55 33.32 36.52 38.00 37.58 37.83 36.62 36.02 36.29 36.39 36.61
## [313] 37.89 36.04 37.80 35.39 36.26 36.97 36.21 35.62 36.66 35.85 37.33 38.17
## [325] 36.02 36.12 37.04 36.27 38.04 37.07 36.12 38.16 37.19 36.86 36.28 38.30
## [337] 37.42 36.19 35.65 37.29 36.56 35.96 36.15 35.94 35.11 37.13 36.29 36.74
## [349] 36.47 35.79 36.08 36.15 38.42 37.61 35.28 36.05 37.13 36.43 36.50 35.63
## [361] 42.29 44.00 41.36 38.63 44.55 41.36 43.61 43.60 39.58 41.05 38.22 44.44
## [373] 40.94 46.79 41.10 40.60 44.10 38.19 38.26 44.11 47.25 43.96 39.54 40.81
## [385] 39.88 40.26 42.95 41.11 41.42 44.05 47.68 40.33 41.06 40.58 40.13 39.17
## [397] 44.27 47.01 40.72 40.91 44.40 44.14 44.84 44.71 44.77 43.96 43.63 41.10
## [409] 39.36 40.06 43.97 41.17 39.61 38.33 44.13 44.21 42.57 41.04 44.00 44.14
  • Compute the Mean Squared Error on the testing data and review the model summary
mse_reg <- mean((testing_data$BMI - pred_reg)^2)
summary(model)
## 
## Call:
## lm(formula = BMI ~ ., data = training_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.40885 -0.31094  0.01986  0.33013  2.47279 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  4.850e+01  7.715e-01  62.864
## GenderMale                                   5.410e-01  4.471e-02  12.099
## Age                                          1.156e-03  3.271e-03   0.354
## Height                                      -2.636e+01  3.566e-01 -73.931
## Weight                                       2.608e-01  2.756e-03  94.599
## Family_History_with_Overweightyes            4.228e-01  4.560e-02   9.272
## High_Caloric_Food_Consumptionyes             1.162e-01  5.012e-02   2.319
## Frequency_Consumption_of_Vegetables          1.212e-01  3.178e-02   3.815
## Number_of_Main_Meals                         6.277e-02  2.006e-02   3.129
## Consumption_of_Food_Between_MealsFrequently -2.891e-01  1.002e-01  -2.885
## Consumption_of_Food_Between_Mealsno          9.128e-02  1.334e-01   0.684
## Consumption_of_Food_Between_MealsSometimes  -8.814e-02  9.337e-02  -0.944
## Smokeyes                                    -2.703e-01  1.038e-01  -2.604
## Consumption_of_Water_Daily                  -7.265e-02  2.508e-02  -2.896
## Calories_Consumption_Monitoringyes          -1.343e-01  7.221e-02  -1.860
## Physical_Activity_Frequency                 -2.396e-02  1.893e-02  -1.266
## Time_Using_Technology                        1.081e-02  2.524e-02   0.428
## Consumption_of_AlcoholFrequently             6.844e-02  5.895e-01   0.116
## Consumption_of_Alcoholno                     1.846e-01  5.853e-01   0.315
## Consumption_of_AlcoholSometimes             -5.956e-02  5.855e-01  -0.102
## Transportation_UsedBike                      4.064e-04  2.420e-01   0.002
## Transportation_UsedMotorbike                -3.408e-01  2.405e-01  -1.417
## Transportation_UsedPublic_Transportation    -2.703e-01  4.619e-02  -5.851
## Transportation_UsedWalking                   2.001e-03  9.998e-02   0.020
## ObesityNormal Weight                         8.860e-01  7.096e-02  12.485
## ObesityObesity Type I                        3.298e+00  1.403e-01  23.516
## ObesityObesity Type II                       3.737e+00  1.822e-01  20.516
## ObesityObesity Type III                      6.200e+00  2.157e-01  28.749
## ObesityOverweight Level I                    1.949e+00  9.612e-02  20.278
## ObesityOverweight Level II                   2.301e+00  1.147e-01  20.055
##                                             Pr(>|t|)    
## (Intercept)                                  < 2e-16 ***
## GenderMale                                   < 2e-16 ***
## Age                                         0.723719    
## Height                                       < 2e-16 ***
## Weight                                       < 2e-16 ***
## Family_History_with_Overweightyes            < 2e-16 ***
## High_Caloric_Food_Consumptionyes            0.020515 *  
## Frequency_Consumption_of_Vegetables         0.000141 ***
## Number_of_Main_Meals                        0.001782 ** 
## Consumption_of_Food_Between_MealsFrequently 0.003962 ** 
## Consumption_of_Food_Between_Mealsno         0.493915    
## Consumption_of_Food_Between_MealsSometimes  0.345283    
## Smokeyes                                    0.009294 ** 
## Consumption_of_Water_Daily                  0.003827 ** 
## Calories_Consumption_Monitoringyes          0.063075 .  
## Physical_Activity_Frequency                 0.205859    
## Time_Using_Technology                       0.668479    
## Consumption_of_AlcoholFrequently            0.907599    
## Consumption_of_Alcoholno                    0.752558    
## Consumption_of_AlcoholSometimes             0.918990    
## Transportation_UsedBike                     0.998660    
## Transportation_UsedMotorbike                0.156594    
## Transportation_UsedPublic_Transportation    5.89e-09 ***
## Transportation_UsedWalking                  0.984030    
## ObesityNormal Weight                         < 2e-16 ***
## ObesityObesity Type I                        < 2e-16 ***
## ObesityObesity Type II                       < 2e-16 ***
## ObesityObesity Type III                      < 2e-16 ***
## ObesityOverweight Level I                    < 2e-16 ***
## ObesityOverweight Level II                   < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5736 on 1661 degrees of freedom
## Multiple R-squared:  0.995,  Adjusted R-squared:  0.9949 
## F-statistic: 1.129e+04 on 29 and 1661 DF,  p-value: < 2.2e-16
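The summary above reports the fit on the training data, while mse_reg is computed on the testing data but not displayed. A minimal sketch of test-set metrics follows, using a hypothetical helper named regression_metrics. The example vectors are made up for illustration and are not taken from the model output:

```r
# Hypothetical helper: MSE, RMSE and R-squared for a set of predictions
regression_metrics <- function(actual, predicted) {
  residuals <- actual - predicted
  mse <- mean(residuals^2)
  r2 <- 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
  c(MSE = mse, RMSE = sqrt(mse), R2 = r2)
}

# Made-up values standing in for testing_data$BMI and pred_reg
actual <- c(24.4, 26.9, 28.3, 20.2)
predicted <- c(23.9, 27.6, 27.8, 20.7)

metrics <- regression_metrics(actual, predicted)
print(round(metrics, 3))
```

With the objects from this report, the call would presumably be regression_metrics(testing_data$BMI, pred_reg).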

2. Obesity Level Prediction (Classification Model)

  • Randomly assign Gender and Obesity values by sampling with replacement (this overwrites the original values in both columns)
df_svm <- obesity_new
df_svm$Gender = sample(c("Male", "Female"), 2111, replace = TRUE)
df_svm$Obesity <- sample(c('Normal Weight','Insufficient Weight', 'Overweight Level I','Overweight Level II',
                            'Obesity Type I','Obesity Type II','Obesity Type III'), 2111, replace = TRUE)
  • Convert the variables into factors
df_svm$Gender <- as.factor(df_svm$Gender)
df_svm$Obesity <- as.factor(df_svm$Obesity)
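The sampling step above replaces the recorded Gender and Obesity values with random draws, so a classifier trained afterwards sees labels that are unrelated to the features. If the intent is to predict the recorded obesity level, one option is to convert the existing column to a factor directly. This is a minimal sketch on a made-up vector; toy_labels and obesity_levels are illustrative names:

```r
# The seven class labels, listed from lowest to highest weight category
obesity_levels <- c("Insufficient Weight", "Normal Weight",
                    "Overweight Level I", "Overweight Level II",
                    "Obesity Type I", "Obesity Type II", "Obesity Type III")

# Made-up labels standing in for obesity_new$Obesity
toy_labels <- c("Normal Weight", "Overweight Level I",
                "Obesity Type I", "Normal Weight")

# Convert the existing labels to a factor with a fixed level order
toy_factor <- factor(toy_labels, levels = obesity_levels)
print(table(toy_factor))
```

Fixing the level order also keeps tables and plots sorted by severity instead of alphabetically.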

set.seed(456)
  • Split the data into training (70%) and testing (30%) sets
train_index <- sample(seq_len(nrow(df_svm)), 0.7 * nrow(df_svm))
train_data <- df_svm[train_index, ]
test_data <- df_svm[-train_index, ]
  • Create a Support Vector Machine model and train it on the training data
model <- svm(formula = Obesity ~ ., 
                 data = train_data,
                 kernel = 'linear')
  • Make predictions with the trained model using the testing data
pred_svm <- predict(model, newdata = test_data)
pred_svm
##                   2                   6                   7                  14 
## Overweight Level II     Obesity Type II       Normal Weight     Obesity Type II 
##                  16                  19                  21                  25 
## Overweight Level II    Obesity Type III Insufficient Weight       Normal Weight 
##                  28                  30                  33                  38 
##     Obesity Type II Insufficient Weight Insufficient Weight     Obesity Type II 
##                  39                  45                  49                  50 
##       Normal Weight Overweight Level II Insufficient Weight     Obesity Type II 
##                  51                  52                  53                  54 
## Insufficient Weight     Obesity Type II Insufficient Weight Overweight Level II 
##                  55                  58                  59                  63 
## Overweight Level II    Obesity Type III Insufficient Weight      Obesity Type I 
##                  72                  75                  79                  81 
## Overweight Level II     Obesity Type II       Normal Weight Overweight Level II 
##                  83                  84                  87                  89 
##       Normal Weight Overweight Level II       Normal Weight    Obesity Type III 
##                  93                  94                  96                  98 
## Overweight Level II Insufficient Weight Insufficient Weight    Obesity Type III 
##                 102                 103                 104                 105 
## Overweight Level II    Obesity Type III       Normal Weight    Obesity Type III 
##                 109                 114                 120                 124 
## Overweight Level II      Obesity Type I Overweight Level II       Normal Weight 
##                 129                 134                 139                 140 
##    Obesity Type III       Normal Weight Overweight Level II     Obesity Type II 
##                 145                 146                 157                 158 
## Overweight Level II      Obesity Type I    Obesity Type III     Obesity Type II 
##                 163                 166                 167                 174 
##       Normal Weight Overweight Level II Overweight Level II Overweight Level II 
##                 175                 178                 183                 190 
##      Obesity Type I Insufficient Weight     Obesity Type II       Normal Weight 
##                 192                 193                 198                 199 
## Overweight Level II Overweight Level II     Obesity Type II    Obesity Type III 
##                 207                 216                 224                 226 
## Insufficient Weight       Normal Weight       Normal Weight Overweight Level II 
##                 227                 232                 233                 238 
##     Obesity Type II       Normal Weight Overweight Level II       Normal Weight 
##                 241                 244                 246                 248 
## Insufficient Weight       Normal Weight  Overweight Level I     Obesity Type II 
##                 251                 260                 264                 270 
##       Normal Weight    Obesity Type III Insufficient Weight Insufficient Weight 
##                 271                 273                 275                 278 
## Insufficient Weight     Obesity Type II Overweight Level II     Obesity Type II 
##                 282                 285                 287                 289 
##     Obesity Type II      Obesity Type I Insufficient Weight       Normal Weight 
##                 294                 296                 300                 303 
## Overweight Level II Insufficient Weight Insufficient Weight Insufficient Weight 
##                 311                 313                 315                 318 
## Insufficient Weight Insufficient Weight       Normal Weight       Normal Weight 
##                 321                 326                 332                 336 
##       Normal Weight     Obesity Type II Insufficient Weight     Obesity Type II 
##                 342                 347                 348                 354 
## Overweight Level II Overweight Level II Insufficient Weight      Obesity Type I 
##                 358                 365                 369                 379 
##      Obesity Type I       Normal Weight      Obesity Type I     Obesity Type II 
##                 382                 383                 391                 393 
##     Obesity Type II    Obesity Type III       Normal Weight Overweight Level II 
##                 397                 398                 400                 407 
## Insufficient Weight     Obesity Type II    Obesity Type III Overweight Level II 
##                 408                 409                 412                 417 
##    Obesity Type III Overweight Level II Overweight Level II Overweight Level II 
##                 418                 420                 421                 422 
##     Obesity Type II      Obesity Type I Insufficient Weight Insufficient Weight 
##                 423                 426                 427                 430 
##     Obesity Type II       Normal Weight Overweight Level II Overweight Level II 
##                 432                 435                 436                 437 
##     Obesity Type II Insufficient Weight Insufficient Weight     Obesity Type II 
##                 439                 444                 447                 452 
##     Obesity Type II       Normal Weight     Obesity Type II Overweight Level II 
##                 467                 468                 469                 471 
##    Obesity Type III    Obesity Type III     Obesity Type II     Obesity Type II 
##                 473                 479                 480                 481 
##    Obesity Type III Insufficient Weight     Obesity Type II Insufficient Weight 
##                 483                 487                 488                 490 
##    Obesity Type III Insufficient Weight  Overweight Level I    Obesity Type III 
##                 491                 497                 500                 505 
## Overweight Level II Insufficient Weight     Obesity Type II       Normal Weight 
##                 506                 508                 509                 513 
##       Normal Weight       Normal Weight Insufficient Weight    Obesity Type III 
##                 518                 520                 523                 525 
##    Obesity Type III       Normal Weight    Obesity Type III    Obesity Type III 
##                 527                 532                 534                 541 
##    Obesity Type III       Normal Weight       Normal Weight Overweight Level II 
##                 544                 547                 555                 560 
##       Normal Weight     Obesity Type II     Obesity Type II Insufficient Weight 
##                 563                 564                 571                 574 
##     Obesity Type II Insufficient Weight     Obesity Type II Insufficient Weight 
##                 579                 587                 588                 591 
##    Obesity Type III     Obesity Type II     Obesity Type II     Obesity Type II 
##                 593                 597                 601                 605 
##       Normal Weight       Normal Weight    Obesity Type III    Obesity Type III 
##                 607                 611                 612                 616 
## Insufficient Weight       Normal Weight Overweight Level II    Obesity Type III 
##                 617                 622                 628                 635 
## Overweight Level II       Normal Weight Insufficient Weight       Normal Weight 
##                 637                 639                 643                 646 
##     Obesity Type II       Normal Weight       Normal Weight       Normal Weight 
##                 648                 649                 651                 653 
##       Normal Weight    Obesity Type III Overweight Level II    Obesity Type III 
##                 654                 656                 658                 662 
##    Obesity Type III     Obesity Type II Insufficient Weight    Obesity Type III 
##                 669                 672                 675                 677 
## Insufficient Weight       Normal Weight     Obesity Type II     Obesity Type II 
##                 678                 680                 683                 690 
## Overweight Level II       Normal Weight Insufficient Weight     Obesity Type II 
##                 694                 700                 701                 704 
##       Normal Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                 713                 714                 715                 722 
## Overweight Level II    Obesity Type III    Obesity Type III Insufficient Weight 
##                 729                 731                 732                 733 
##       Normal Weight Insufficient Weight       Normal Weight      Obesity Type I 
##                 741                 742                 747                 752 
##    Obesity Type III    Obesity Type III      Obesity Type I     Obesity Type II 
##                 757                 763                 764                 766 
##    Obesity Type III      Obesity Type I      Obesity Type I      Obesity Type I 
##                 768                 773                 775                 776 
##       Normal Weight       Normal Weight      Obesity Type I      Obesity Type I 
##                 777                 781                 785                 786 
##      Obesity Type I       Normal Weight    Obesity Type III    Obesity Type III 
##                 788                 790                 803                 805 
##    Obesity Type III      Obesity Type I    Obesity Type III Insufficient Weight 
##                 817                 818                 820                 823 
##       Normal Weight       Normal Weight       Normal Weight Insufficient Weight 
##                 824                 829                 834                 838 
##       Normal Weight      Obesity Type I      Obesity Type I       Normal Weight 
##                 842                 847                 849                 854 
##       Normal Weight Insufficient Weight Insufficient Weight      Obesity Type I 
##                 856                 860                 863                 865 
##      Obesity Type I       Normal Weight    Obesity Type III       Normal Weight 
##                 866                 867                 868                 871 
##    Obesity Type III    Obesity Type III    Obesity Type III    Obesity Type III 
##                 884                 885                 889                 893 
##      Obesity Type I Insufficient Weight     Obesity Type II    Obesity Type III 
##                 897                 902                 909                 919 
##      Obesity Type I       Normal Weight       Normal Weight      Obesity Type I 
##                 922                 926                 927                 929 
##      Obesity Type I      Obesity Type I       Normal Weight       Normal Weight 
##                 937                 956                 957                 959 
##      Obesity Type I    Obesity Type III Insufficient Weight      Obesity Type I 
##                 963                 965                 968                 971 
##    Obesity Type III    Obesity Type III    Obesity Type III       Normal Weight 
##                 973                 975                 976                 981 
## Insufficient Weight Insufficient Weight Insufficient Weight Insufficient Weight 
##                 982                 986                 987                 989 
## Insufficient Weight       Normal Weight Overweight Level II Insufficient Weight 
##                 994                 997                1000                1001 
##    Obesity Type III Insufficient Weight       Normal Weight       Normal Weight 
##                1005                1009                1011                1013 
## Overweight Level II       Normal Weight Insufficient Weight Insufficient Weight 
##                1015                1017                1022                1027 
##       Normal Weight       Normal Weight      Obesity Type I       Normal Weight 
##                1029                1037                1039                1045 
##    Obesity Type III       Normal Weight       Normal Weight Overweight Level II 
##                1047                1048                1056                1058 
##       Normal Weight Overweight Level II       Normal Weight       Normal Weight 
##                1061                1064                1070                1073 
##       Normal Weight       Normal Weight      Obesity Type I     Obesity Type II 
##                1075                1077                1079                1082 
##       Normal Weight Insufficient Weight       Normal Weight Overweight Level II 
##                1083                1087                1089                1091 
## Overweight Level II       Normal Weight       Normal Weight Insufficient Weight 
##                1093                1095                1100                1102 
## Insufficient Weight      Obesity Type I Overweight Level II       Normal Weight 
##                1103                1104                1106                1110 
## Overweight Level II Overweight Level II Insufficient Weight      Obesity Type I 
##                1112                1116                1119                1127 
## Insufficient Weight Insufficient Weight    Obesity Type III      Obesity Type I 
##                1128                1129                1136                1137 
##       Normal Weight       Normal Weight    Obesity Type III Insufficient Weight 
##                1139                1140                1149                1152 
## Overweight Level II       Normal Weight Insufficient Weight       Normal Weight 
##                1153                1160                1163                1167 
##       Normal Weight       Normal Weight       Normal Weight    Obesity Type III 
##                1174                1177                1179                1182 
##       Normal Weight Overweight Level II Overweight Level II       Normal Weight 
##                1184                1185                1189                1198 
##       Normal Weight       Normal Weight Overweight Level II      Obesity Type I 
##                1200                1205                1207                1211 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1216                1217                1218                1222 
##    Obesity Type III       Normal Weight      Obesity Type I       Normal Weight 
##                1223                1234                1236                1238 
##       Normal Weight       Normal Weight       Normal Weight     Obesity Type II 
##                1240                1243                1244                1247 
##       Normal Weight    Obesity Type III     Obesity Type II Insufficient Weight 
##                1248                1250                1253                1257 
##      Obesity Type I Insufficient Weight Insufficient Weight Insufficient Weight 
##                1260                1263                1265                1266 
##       Normal Weight Insufficient Weight    Obesity Type III Insufficient Weight 
##                1274                1280                1281                1282 
##      Obesity Type I       Normal Weight Insufficient Weight Insufficient Weight 
##                1285                1287                1289                1290 
##      Obesity Type I      Obesity Type I      Obesity Type I       Normal Weight 
##                1294                1295                1296                1298 
##       Normal Weight       Normal Weight    Obesity Type III       Normal Weight 
##                1303                1312                1313                1316 
## Insufficient Weight      Obesity Type I    Obesity Type III    Obesity Type III 
##                1321                1322                1327                1334 
## Insufficient Weight       Normal Weight    Obesity Type III      Obesity Type I 
##                1341                1349                1355                1358 
## Insufficient Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                1359                1362                1365                1367 
## Insufficient Weight Insufficient Weight      Obesity Type I      Obesity Type I 
##                1369                1371                1372                1374 
##       Normal Weight       Normal Weight Insufficient Weight Insufficient Weight 
##                1377                1378                1382                1384 
## Insufficient Weight       Normal Weight    Obesity Type III      Obesity Type I 
##                1387                1389                1393                1394 
##       Normal Weight       Normal Weight    Obesity Type III Insufficient Weight 
##                1395                1399                1401                1404 
## Insufficient Weight       Normal Weight       Normal Weight Insufficient Weight 
##                1418                1423                1424                1427 
##    Obesity Type III       Normal Weight     Obesity Type II       Normal Weight 
##                1435                1437                1439                1445 
##      Obesity Type I       Normal Weight    Obesity Type III      Obesity Type I 
##                1446                1453                1472                1475 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                1479                1480                1482                1483 
## Insufficient Weight Insufficient Weight      Obesity Type I      Obesity Type I 
##                1484                1486                1492                1494 
##    Obesity Type III Insufficient Weight Insufficient Weight Insufficient Weight 
##                1498                1499                1500                1502 
## Insufficient Weight      Obesity Type I      Obesity Type I      Obesity Type I 
##                1512                1513                1516                1522 
## Insufficient Weight Insufficient Weight       Normal Weight Insufficient Weight 
##                1524                1528                1530                1532 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1533                1536                1544                1552 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1555                1557                1559                1560 
##       Normal Weight    Obesity Type III     Obesity Type II       Normal Weight 
##                1564                1565                1566                1568 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1571                1572                1573                1576 
## Insufficient Weight       Normal Weight Insufficient Weight       Normal Weight 
##                1577                1580                1583                1584 
## Insufficient Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                1585                1586                1588                1593 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                1597                1598                1599                1603 
## Insufficient Weight    Obesity Type III       Normal Weight     Obesity Type II 
##                1607                1613                1614                1616 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1617                1624                1625                1627 
##       Normal Weight Insufficient Weight       Normal Weight Insufficient Weight 
##                1629                1633                1635                1640 
##       Normal Weight       Normal Weight Insufficient Weight       Normal Weight 
##                1641                1643                1646                1650 
## Insufficient Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                1651                1652                1658                1659 
##       Normal Weight       Normal Weight    Obesity Type III       Normal Weight 
##                1663                1666                1667                1668 
##       Normal Weight    Obesity Type III       Normal Weight Insufficient Weight 
##                1673                1675                1676                1685 
##       Normal Weight       Normal Weight       Normal Weight Insufficient Weight 
##                1686                1687                1690                1693 
## Insufficient Weight     Obesity Type II       Normal Weight Insufficient Weight 
##                1694                1695                1700                1701 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1702                1705                1706                1709 
##       Normal Weight Insufficient Weight       Normal Weight    Obesity Type III 
##                1713                1716                1719                1720 
##       Normal Weight Insufficient Weight       Normal Weight      Obesity Type I 
##                1722                1727                1729                1730 
##       Normal Weight       Normal Weight       Normal Weight Insufficient Weight 
##                1731                1734                1736                1737 
##       Normal Weight Insufficient Weight Insufficient Weight Insufficient Weight 
##                1738                1740                1747                1752 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1753                1756                1758                1761 
## Insufficient Weight       Normal Weight Insufficient Weight Insufficient Weight 
##                1769                1771                1777                1779 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1782                1790                1791                1799 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                1803                1805                1808                1811 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1813                1818                1830                1834 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1842                1850                1854                1858 
##       Normal Weight       Normal Weight Insufficient Weight       Normal Weight 
##                1859                1865                1868                1869 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                1870                1872                1877                1883 
##       Normal Weight     Obesity Type II Insufficient Weight       Normal Weight 
##                1886                1889                1894                1898 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1906                1919                1922                1931 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                1936                1937                1941                1945 
##       Normal Weight       Normal Weight       Normal Weight Insufficient Weight 
##                1947                1948                1949                1950 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                1952                1955                1958                1969 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                1973                1974                1975                1982 
## Insufficient Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                1985                1986                1997                1998 
##       Normal Weight Insufficient Weight Insufficient Weight       Normal Weight 
##                2000                2002                2008                2010 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                2011                2012                2019                2021 
##       Normal Weight Insufficient Weight       Normal Weight       Normal Weight 
##                2022                2023                2024                2025 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                2028                2039                2043                2045 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                2055                2056                2058                2062 
## Insufficient Weight       Normal Weight       Normal Weight       Normal Weight 
##                2063                2066                2072                2073 
##       Normal Weight       Normal Weight Insufficient Weight     Obesity Type II 
##                2074                2076                2077                2081 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                2085                2086                2087                2090 
##       Normal Weight       Normal Weight       Normal Weight       Normal Weight 
##                2091                2094                2098                2106 
##       Normal Weight       Normal Weight       Normal Weight Insufficient Weight 
##                2108                2109 
## Insufficient Weight Insufficient Weight 
## 7 Levels: Insufficient Weight Normal Weight Obesity Type I ... Overweight Level II
  • Check prediction accuracy using a confusion matrix
# Note: despite its name, mse_svm holds a confusion-matrix object, not a mean squared error
mse_svm <- confusionMatrix(pred_svm, test_data$Obesity)
mse_svm
## Confusion Matrix and Statistics
## 
##                      Reference
## Prediction            Insufficient Weight Normal Weight Obesity Type I
##   Insufficient Weight                  23            11             15
##   Normal Weight                        36            43             38
##   Obesity Type I                        4             9              7
##   Obesity Type II                       7             4              7
##   Obesity Type III                     21             8              6
##   Overweight Level I                    1             0              0
##   Overweight Level II                  14             8              7
##                      Reference
## Prediction            Obesity Type II Obesity Type III Overweight Level I
##   Insufficient Weight              29               26                 18
##   Normal Weight                    29               31                 42
##   Obesity Type I                    9               12                  6
##   Obesity Type II                   6               13                  8
##   Obesity Type III                 12                7                  9
##   Overweight Level I                1                0                  0
##   Overweight Level II               4                5                  6
##                      Reference
## Prediction            Overweight Level II
##   Insufficient Weight                  28
##   Normal Weight                        30
##   Obesity Type I                        7
##   Obesity Type II                       9
##   Obesity Type III                      9
##   Overweight Level I                    0
##   Overweight Level II                   9
## 
## Overall Statistics
##                                        
##                Accuracy : 0.1498       
##                  95% CI : (0.123, 0.18)
##     No Information Rate : 0.1672       
##     P-Value [Acc > NIR] : 0.8907       
##                                        
##                   Kappa : 0.0077       
##                                        
##  Mcnemar's Test P-Value : <2e-16       
## 
## Statistics by Class:
## 
##                      Class: Insufficient Weight Class: Normal Weight
## Sensitivity                             0.21698              0.51807
## Specificity                             0.75947              0.62613
## Pos Pred Value                          0.15333              0.17269
## Neg Pred Value                          0.82851              0.89610
## Prevalence                              0.16719              0.13091
## Detection Rate                          0.03628              0.06782
## Detection Prevalence                    0.23659              0.39274
## Balanced Accuracy                       0.48823              0.57210
##                      Class: Obesity Type I Class: Obesity Type II
## Sensitivity                        0.08750               0.066667
## Specificity                        0.91516               0.911765
## Pos Pred Value                     0.12963               0.111111
## Neg Pred Value                     0.87414               0.855172
## Prevalence                         0.12618               0.141956
## Detection Rate                     0.01104               0.009464
## Detection Prevalence               0.08517               0.085174
## Balanced Accuracy                  0.50133               0.489216
##                      Class: Obesity Type III Class: Overweight Level I
## Sensitivity                          0.07447                  0.000000
## Specificity                          0.87963                  0.996330
## Pos Pred Value                       0.09722                  0.000000
## Neg Pred Value                       0.84520                  0.859177
## Prevalence                           0.14826                  0.140379
## Detection Rate                       0.01104                  0.000000
## Detection Prevalence                 0.11356                  0.003155
## Balanced Accuracy                    0.47705                  0.498165
##                      Class: Overweight Level II
## Sensitivity                             0.09783
## Specificity                             0.91882
## Pos Pred Value                          0.16981
## Neg Pred Value                          0.85714
## Prevalence                              0.14511
## Detection Rate                          0.01420
## Detection Prevalence                    0.08360
## Balanced Accuracy                       0.50832
summary(pred_svm)
## Insufficient Weight       Normal Weight      Obesity Type I     Obesity Type II 
##                 150                 249                  54                  54 
##    Obesity Type III  Overweight Level I Overweight Level II 
##                  72                   2                  53
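
The headline figures in the confusion matrix output can be checked by hand. The sketch below is a minimal base-R illustration, not part of the original analysis: it re-enters the counts printed above into a matrix (the names `classes`, `cm`, `n_test`, `accuracy`, `nir` and `sensitivity` are introduced here for illustration only) and recomputes the overall accuracy, the no-information rate, and the per-class sensitivity.

```r
# Class labels in the order used by the printed confusion matrix.
classes <- c("Insufficient Weight", "Normal Weight", "Obesity Type I",
             "Obesity Type II", "Obesity Type III",
             "Overweight Level I", "Overweight Level II")

# Counts copied from the output above (rows = prediction, columns = reference).
cm <- matrix(
  c(23, 11, 15, 29, 26, 18, 28,
    36, 43, 38, 29, 31, 42, 30,
     4,  9,  7,  9, 12,  6,  7,
     7,  4,  7,  6, 13,  8,  9,
    21,  8,  6, 12,  7,  9,  9,
     1,  0,  0,  1,  0,  0,  0,
    14,  8,  7,  4,  5,  6,  9),
  nrow = 7, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

n_test <- sum(cm)                       # number of test observations

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / n_test

# No-information rate: share of the most frequent reference class,
# i.e. the accuracy of always guessing that class.
nir <- max(colSums(cm)) / n_test

# Sensitivity per class: correct predictions over actual members of the class.
sensitivity <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(nir, 4))
print(round(sensitivity, 5))
```

Only 95 of the 634 test cases lie on the diagonal, which gives the reported accuracy of 0.1498. The largest reference class (Insufficient Weight, 106 cases) gives the no-information rate of 0.1672, so always predicting that class would score higher than the fitted model.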

Conclusion

  1. BMI Prediction (Regression Model)

    The regression model is able to predict BMI, with a residual standard error (RSE) of 0.995.

  2. Obesity Level Prediction (Classification Model)

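    The SVM classification model does not yet predict Obesity Level reliably. Its test accuracy is only about 15% (0.1498), which is below the no-information rate of 0.1672, and its Kappa of 0.0077 indicates almost no agreement beyond chance. The model requires substantially more tuning before it can be used.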