
Predicting the Work Life Balance Score

Group 11

AMANI ALSHANQITI (S2127083)

PRIYADARSHINI NAIR AP MUNIANDY(22062712)

DHIVASHINI LINGADARAN (S2127834)

CHE NADZIRAH CHE AB RAZAK (S2170502)

IZZAH ATHIRAH MOHAMAD RADZI (S2179297)

1.0 Introduction

In today’s fast-paced world, achieving a healthy work-life balance has become increasingly challenging. The global pandemic has further highlighted the importance of finding a harmonious equilibrium between work and personal life. Recognizing this need, we present a project aimed at developing a Work Life Balance Calculator, which will empower employees and citizens to assess their work-life balance and identify areas for improvement.

1.1 Objectives

The objective of this project is to develop a Work-Life Balance Calculator that can assess and predict work-life balance based on various variables. The dataset contains information on different aspects of individuals’ lives, such as daily habits, stress levels, social connections, and achievements. By analyzing this data, we aim to:

Predict the “WORK_LIFE_BALANCE_SCORE” variable using regression models: The goal is to understand the relationship between work-life balance and other variables in the dataset. We want to identify which factors significantly influence work-life balance and develop predictive models that can estimate work-life balance scores based on those factors.
Predict the “BMI_RANGE” variable using classification models: Here, the focus is on predicting the categorical variable “BMI_RANGE” based on the available features. The goal is to assess the accuracy of different classification models in predicting BMI ranges and identify the most effective model.

The questions we are interested in answering from this dataset include:

How accurately can we predict work-life balance scores using regression models? Which regression model performs the best in terms of predicting work-life balance?
How accurately can we classify individuals into different BMI ranges using classification models? Which classification model achieves the highest accuracy in predicting BMI ranges?

By addressing these questions, we aim to gain insights into the factors influencing work-life balance and the ability to predict work-life balance scores, as well as the effectiveness of different models in predicting BMI ranges. These findings will contribute to the development of the Work-Life Balance Calculator and enable individuals and organizations to improve work-life balance and overall well-being.

2.0 Data Understanding

Load the dataset

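dataset <- read.csv("BALANCESCORE.csv")
# Drop the Timestamp column; a logical mask is safe even if the column is absent
# (a negative which() index would select zero columns in that case).
dataset <- dataset[, names(dataset) != "Timestamp"]
# Show only the first rows instead of printing all 15972 records.
head(dataset)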

Number of rows and columns:

n_rows <- nrow(dataset)
n_cols <- ncol(dataset)
cat("Number of rows is", n_rows, "\n")

## Number of rows is 15972

cat("Number of columns is", n_cols, "\n")

## Number of columns is 23

2.1 Data Source

Source: https://www.kaggle.com/datasets/ydalat/lifestyle-and-wellbeing-data

Title: Lifestyle_and_Wellbeing_Data

Year: 2021

Purpose: To evaluate and understand how individuals can reinvent their lifestyles to optimize their overall well-being while supporting the UN Sustainable Development Goals.

Total number of rows: 15972

Total number of columns: 24 (23 after the Timestamp column is dropped)

Target Variable: WORK_LIFE_BALANCE_SCORE

Features: FRUITS_VEGGIES, DAILY_STRESS, PLACES_VISITED, CORE_CIRCLE, SUPPORTING_OTHERS, SOCIAL_NETWORK, ACHIEVEMENT, DONATION, BMI_RANGE, TODO_COMPLETED, FLOW, DAILY_STEPS, LIVE_VISION, SLEEP_HOURS, LOST_VACATION, DAILY_SHOUTING, SUFFICIENT_INCOME, PERSONAL_AWARDS, TIME_FOR_PASSION, WEEKLY_MEDITATION, AGE, and GENDER.

2.2 Data Cleaning

2.3 Data Pre-Processing

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Target variable and features:

# str() prints the structure and returns NULL, so there is nothing to assign.
str(dataset)

## 'data.frame':    15972 obs. of  23 variables:
##  $ FRUITS_VEGGIES         : int  4 1 2 4 4 3 2 5 2 1 ...
##  $ DAILY_STRESS           : chr  "2" "4" "2" "2" ...
##  $ PLACES_VISITED         : int  10 3 10 10 10 6 3 8 6 1 ...
##  $ CORE_CIRCLE            : int  6 8 5 4 10 10 8 6 10 3 ...
##  $ SUPPORTING_OTHERS      : int  10 0 2 6 5 10 6 10 10 6 ...
##  $ SOCIAL_NETWORK         : int  10 2 8 10 10 6 5 10 10 5 ...
##  $ ACHIEVEMENT            : int  3 1 3 4 0 3 1 6 10 3 ...
##  $ DONATION               : int  5 0 4 0 1 5 2 4 5 5 ...
##  $ BMI_RANGE              : int  2 1 2 1 1 2 1 2 1 1 ...
##  $ TODO_COMPLETED         : int  8 2 7 8 7 8 8 4 10 2 ...
##  $ FLOW                   : int  8 1 1 2 1 4 1 4 6 2 ...
##  $ DAILY_STEPS            : int  7 8 6 1 10 1 4 3 7 8 ...
##  $ LIVE_VISION            : int  5 2 10 1 2 5 6 3 4 1 ...
##  $ SLEEP_HOURS            : int  7 7 8 8 8 8 4 7 7 7 ...
##  $ LOST_VACATION          : int  10 7 0 1 0 0 0 1 0 0 ...
##  $ DAILY_SHOUTING         : int  0 1 0 1 3 0 3 2 1 6 ...
##  $ SUFFICIENT_INCOME      : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ PERSONAL_AWARDS        : int  10 4 5 3 4 5 3 10 10 5 ...
##  $ TIME_FOR_PASSION       : int  8 1 2 3 8 1 2 6 5 1 ...
##  $ WEEKLY_MEDITATION      : int  10 7 7 3 6 2 5 8 7 1 ...
##  $ AGE                    : chr  "51 or more" "21 to 35" "21 to 35" "21 to 35" ...
##  $ GENDER                 : chr  "Male" "Male" "Male" "Male" ...
##  $ WORK_LIFE_BALANCE_SCORE: num  727 619 686 674 708 ...

Summary of the data frame

summary(dataset)

##  FRUITS_VEGGIES  DAILY_STRESS       PLACES_VISITED    CORE_CIRCLE    
##  Min.   :0.000   Length:15972       Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:2.000   Class :character   1st Qu.: 2.000   1st Qu.: 3.000  
##  Median :3.000   Mode  :character   Median : 5.000   Median : 5.000  
##  Mean   :2.923                      Mean   : 5.233   Mean   : 5.508  
##  3rd Qu.:4.000                      3rd Qu.: 8.000   3rd Qu.: 8.000  
##  Max.   :5.000                      Max.   :10.000   Max.   :10.000  
##  SUPPORTING_OTHERS SOCIAL_NETWORK    ACHIEVEMENT        DONATION    
##  Min.   : 0.000    Min.   : 0.000   Min.   : 0.000   Min.   :0.000  
##  1st Qu.: 3.000    1st Qu.: 4.000   1st Qu.: 2.000   1st Qu.:1.000  
##  Median : 5.000    Median : 6.000   Median : 3.000   Median :3.000  
##  Mean   : 5.616    Mean   : 6.474   Mean   : 4.001   Mean   :2.715  
##  3rd Qu.:10.000    3rd Qu.:10.000   3rd Qu.: 6.000   3rd Qu.:5.000  
##  Max.   :10.000    Max.   :10.000   Max.   :10.000   Max.   :5.000  
##    BMI_RANGE     TODO_COMPLETED        FLOW         DAILY_STEPS    
##  Min.   :1.000   Min.   : 0.000   Min.   : 0.000   Min.   : 1.000  
##  1st Qu.:1.000   1st Qu.: 4.000   1st Qu.: 1.000   1st Qu.: 3.000  
##  Median :1.000   Median : 6.000   Median : 3.000   Median : 5.000  
##  Mean   :1.411   Mean   : 5.746   Mean   : 3.195   Mean   : 5.704  
##  3rd Qu.:2.000   3rd Qu.: 8.000   3rd Qu.: 5.000   3rd Qu.: 8.000  
##  Max.   :2.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##   LIVE_VISION      SLEEP_HOURS     LOST_VACATION    DAILY_SHOUTING  
##  Min.   : 0.000   Min.   : 1.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 1.000   1st Qu.: 6.000   1st Qu.: 0.000   1st Qu.: 1.000  
##  Median : 3.000   Median : 7.000   Median : 0.000   Median : 2.000  
##  Mean   : 3.752   Mean   : 7.043   Mean   : 2.899   Mean   : 2.931  
##  3rd Qu.: 5.000   3rd Qu.: 8.000   3rd Qu.: 5.000   3rd Qu.: 4.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##  SUFFICIENT_INCOME PERSONAL_AWARDS  TIME_FOR_PASSION WEEKLY_MEDITATION
##  Min.   :1.000     Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   
##  1st Qu.:1.000     1st Qu.: 3.000   1st Qu.: 1.000   1st Qu.: 4.000   
##  Median :2.000     Median : 5.000   Median : 3.000   Median : 7.000   
##  Mean   :1.729     Mean   : 5.712   Mean   : 3.327   Mean   : 6.233   
##  3rd Qu.:2.000     3rd Qu.: 9.000   3rd Qu.: 5.000   3rd Qu.:10.000   
##  Max.   :2.000     Max.   :10.000   Max.   :10.000   Max.   :10.000   
##      AGE               GENDER          WORK_LIFE_BALANCE_SCORE
##  Length:15972       Length:15972       Min.   :480.0          
##  Class :character   Class :character   1st Qu.:636.0          
##  Mode  :character   Mode  :character   Median :667.7          
##                                        Mean   :666.8          
##                                        3rd Qu.:698.5          
##                                        Max.   :820.2

Convert the “DAILY_STRESS” column to numeric and calculate the counts of its unique values

dataset$DAILY_STRESS <- as.numeric(dataset$DAILY_STRESS)

## Warning: NAs introduced by coercion

value_counts <- table(dataset$DAILY_STRESS)
print(value_counts)

## 
##    0    1    2    3    4    5 
##  676 2478 3407 4398 2960 2052
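The coercion warning above means that at least one DAILY_STRESS entry is not a number. The following is a minimal sketch, on made-up values rather than the real column, of how such entries could be listed before they are turned into NA. The vector `raw_stress` and the bad value "n/a" are illustrative assumptions.

```r
# Toy stand-in for a character column that should be numeric (assumed values).
raw_stress <- c("2", "4", "n/a", "3", "5")

# Coerce quietly, then flag entries that were present but failed to parse.
parsed <- suppressWarnings(as.numeric(raw_stress))
bad <- is.na(parsed) & !is.na(raw_stress)

bad_values <- unique(raw_stress[bad])  # the offending strings
n_bad <- sum(bad)                      # rows that na.omit() would later drop

print(bad_values)
print(n_bad)
```

Listing the offending strings first makes it possible to decide whether dropping those rows is acceptable or whether the values can be repaired.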

Remove rows with missing values

dataset <- na.omit(dataset)

Confirm that no missing values remain in any column

missing_counts <- colSums(is.na(dataset))
print(missing_counts)

##          FRUITS_VEGGIES            DAILY_STRESS          PLACES_VISITED 
##                       0                       0                       0 
##             CORE_CIRCLE       SUPPORTING_OTHERS          SOCIAL_NETWORK 
##                       0                       0                       0 
##             ACHIEVEMENT                DONATION               BMI_RANGE 
##                       0                       0                       0 
##          TODO_COMPLETED                    FLOW             DAILY_STEPS 
##                       0                       0                       0 
##             LIVE_VISION             SLEEP_HOURS           LOST_VACATION 
##                       0                       0                       0 
##          DAILY_SHOUTING       SUFFICIENT_INCOME         PERSONAL_AWARDS 
##                       0                       0                       0 
##        TIME_FOR_PASSION       WEEKLY_MEDITATION                     AGE 
##                       0                       0                       0 
##                  GENDER WORK_LIFE_BALANCE_SCORE 
##                       0                       0

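Missing values can distort statistical analysis and lead to inaccurate or biased results. The only missing values were the NAs introduced when DAILY_STRESS was coerced to numeric; after removing those rows with na.omit(), every column has zero missing values.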

View the dataset

3.0 Exploratory Data Analysis (EDA)

Load the tidyverse package

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Calculate the mean BMI range by age and gender using group_by and summarise

library(dplyr)
result <- dataset %>%
  group_by(AGE, GENDER) %>%
  summarise(mean_BMI_RANGE = mean(BMI_RANGE), .groups = "drop")

Pivot the result table

library(tidyr)
result_table <- result %>%
  pivot_wider(names_from = GENDER, values_from = mean_BMI_RANGE)
print(result_table)

## # A tibble: 4 × 3
##   AGE          Female  Male
##   <chr>         <dbl> <dbl>
## 1 21 to 35       1.36  1.33
## 2 36 to 50       1.47  1.52
## 3 51 or more     1.53  1.52
## 4 Less than 20   1.23  1.22
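The age groups in the table above are sorted alphabetically, so "Less than 20" comes last. One possible way to get a natural order is to turn the column into an ordered factor. The sketch below uses a small made-up vector; the level order is taken from the four labels printed above.

```r
# The four age bands that appear in the dataset, in ascending order.
age_levels <- c("Less than 20", "21 to 35", "36 to 50", "51 or more")

# Toy age column (assumed values, not the real data).
age <- c("51 or more", "21 to 35", "Less than 20", "36 to 50", "21 to 35")

# An ordered factor makes tables and ggplot2 axes follow age_levels.
age_f <- factor(age, levels = age_levels, ordered = TRUE)
age_counts <- table(age_f)
print(age_counts)
```

Applied to the real data, the same `factor()` call on `dataset$AGE` would likely make the grouped tables and the age plots appear in ascending order without per-plot axis settings.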

Distribution of Age

library(ggplot2)
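# AGE is a categorical age band, so a bar chart of counts is used here;
# a density estimate is only meaningful for a continuous variable.
plot1 <- ggplot(dataset, aes(x = AGE)) +
  geom_bar(fill = "lightblue") +
  labs(title = "Distribution of Age (Bar Chart)")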
plot1

Distribution of Gender

plot2 <- ggplot(dataset, aes(x = GENDER, fill = GENDER)) +
  geom_bar() +
  labs(title = "Distribution of Gender")
plot2

Distribution of Daily Stress by Gender

plot3 <- ggplot(dataset, aes(x = GENDER, y = DAILY_STRESS, fill = GENDER)) +
  geom_violin(scale ="width") +
  scale_fill_manual(values = c("pink", "blue")) +
  labs(x = "Gender", title = "Distribution of Daily Stress by Gender") +
  theme_minimal()

plot3

Work-Life Balance Score vs. Weekly Meditation

ggplot(dataset, aes(x = WORK_LIFE_BALANCE_SCORE, y = WEEKLY_MEDITATION)) +
  geom_point(color = "lightblue") +
  labs(title = "Work-Life Balance Score vs. Weekly Meditation")

Histogram of Age

ggplot(dataset, aes(x = AGE)) +
  geom_bar(stat = "count", fill = "steelblue", color = "black") +
  labs(x = "Age", y = "Frequency") +
  ggtitle("Distribution of Age")

Boxplot of Work Life Balance Score by Age

ggplot(dataset, aes(x = AGE, y = WORK_LIFE_BALANCE_SCORE)) +
  geom_boxplot(fill = "orange", color = "black") +
  labs(x = "", y = "Work-Life Balance Score") +
  ggtitle("Distribution of Work-Life Balance Score by Age")

Daily Steps

plot5 <- ggplot(dataset, aes(x = DAILY_STEPS)) +
  geom_histogram(fill = "lightblue", bins = 20) +
  labs(title = "Histogram of Daily Steps")
plot5

HEALTHY BODY (HOW TO KEEP OUR BMI BELOW 25)

Load the necessary libraries

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

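Create a subset of the dataset where BMI_RANGE is below 25. Note that BMI_RANGE holds only the codes 1 and 2 (see the summary above), not a BMI value, so this condition keeps every row.

subset_data <- subset(dataset, BMI_RANGE < 25)  # always TRUE for codes 1 and 2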
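Because BMI_RANGE takes only the values 1 and 2 in this dataset (see the summary above), the mean of the codes minus 1 equals the share of respondents with code 2. A small sketch of computing that share per age group directly, on made-up rows; the data frame `toy` and its values are assumptions for illustration.

```r
# Toy rows with the same two columns as the real data (assumed values).
toy <- data.frame(
  AGE = c("21 to 35", "21 to 35", "36 to 50", "36 to 50", "36 to 50", "51 or more"),
  BMI_RANGE = c(1, 2, 2, 2, 1, 1)
)

# Share of rows with BMI_RANGE code 2 within each age group.
share_code2 <- tapply(toy$BMI_RANGE == 2, toy$AGE, mean)
print(round(share_code2, 2))
```

Reading the bar heights in the plots below as shares of code 2 (after subtracting 1) may be easier to interpret than an average of category codes.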

Plot 1: BODY_MASS_INDEX BY AGE (geom bar)

plot6 <- ggplot(subset_data, aes(x = AGE, y = BMI_RANGE)) +
  geom_bar(stat = "summary", fun = "mean", fill = "salmon") +
  labs(x = "AGE", y = "BMI") +
  ggtitle("BODY_MASS_INDEX BY AGE")
plot6

Plot 2: BODY_MASS_INDEX BY GENDER & AGE

# The title is set once in labs(); the second, identical ggtitle() was redundant.
plot8 <- ggplot(subset_data, aes(x = AGE, y = BMI_RANGE, fill = GENDER)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "BODY_MASS_INDEX BY GENDER & AGE") +
  scale_fill_manual(values = c("darksalmon", "cornflowerblue"))
plot8

Plot 3: BODY_MASS_INDEX & SLEEP HOURS (geom_smooth)

plot4 <- ggplot(subset_data, aes(x = SLEEP_HOURS, y = BMI_RANGE)) +
  geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
  labs(x = "Sleep Hours", y = "BMI") +
  ggtitle("BODY_MASS_INDEX & SLEEP HOURS")
plot4

## `geom_smooth()` using formula = 'y ~ x'

Plot 4: BODY_MASS_INDEX & SERVINGS OF FRUITS/VEGGIES (geom_bar)

plot5 <- ggplot(subset_data, aes(x = FRUITS_VEGGIES, y = BMI_RANGE)) +
  geom_bar(stat = "summary", fun = "mean", fill = "yellow") +
  labs(x = "Servings of Fruits/Veggies", y = "BMI") +
  ggtitle("BODY_MASS_INDEX & SERVINGS OF FRUITS/VEGGIES")
plot5

Plot 5: BODY_MASS_INDEX & DAILY STEPS (geom_smooth)

plot6 <- ggplot(subset_data, aes(x = DAILY_STEPS, y = BMI_RANGE)) +
  geom_smooth(method = "lm", se = FALSE, color = "grey") +
  labs(x = "Daily Steps", y = "BMI") +
  ggtitle("BODY_MASS_INDEX & DAILY STEPS")
plot6

## `geom_smooth()` using formula = 'y ~ x'

HEALTHY MIND (WHAT DRIVES OUR DAILY_STRESS?)

Load the reshape2 library

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

Create a pivot table of mean DAILY_STRESS by age group and gender using the dcast() function from the reshape2 package

# Without fun.aggregate, dcast() falls back to length() and returns row counts.
df3 <- dcast(dataset, AGE ~ GENDER, value.var = "DAILY_STRESS", fun.aggregate = mean)

head(df3)

Plot 1: AVERAGE DAILY_STRESS BY AGE GROUP

plot1 <- ggplot(dataset, aes(x = AGE, y = DAILY_STRESS, fill = GENDER)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge", color = "black") +
  labs(x = "Age Group", y = "Average Daily Stress") +
  ggtitle("AVERAGE DAILY_STRESS BY AGE GROUP")
plot1

Plot 2: DAILY_STRESS BY GENDER

plot2 <- ggplot(dataset, aes(x = GENDER, y = DAILY_STRESS, fill = GENDER)) +
  geom_violin(trim = FALSE, scale = "count") +
  labs(x = "Gender", y = "Daily Stress") +
  ggtitle("DAILY_STRESS BY GENDER")
plot2

PERSONAL ACHIEVEMENTS (WHAT DRIVES US TO ACHIEVE REMARKABLE THINGS?)

Plot 1: CORE_CIRCLE BY GENDER (Violin Plot)

plot1 <- ggplot(dataset, aes(x = GENDER, y = CORE_CIRCLE, fill = GENDER)) +
  geom_violin() +
  labs(x = "Gender", y = "Core Circle") +
  ggtitle("CORE_CIRCLE BY GENDER")
plot1

Plot 2: LOST_VACATION BY AGE GROUP (Box Plot)

plot2 <- ggplot(dataset, aes(x = AGE, y = LOST_VACATION)) +
  geom_boxplot() +
  labs(x = "Age Group", y = "Lost Vacation") +
  scale_x_discrete(limits = c("Less than 20", "21 to 35", "36 to 50", "51 or more")) +
  ggtitle("LOST_VACATION BY AGE GROUP")
plot2

Plot 3: PLACES & DAILY_STRESS (Bar Chart)

plot3 <- ggplot(dataset, aes(x = PLACES_VISITED, y = DAILY_STRESS)) +
  geom_bar(stat = "summary", fun = "mean", fill = "steelblue") +
  labs(x = "Places Visited", y = "Daily Stress") +
  ggtitle("PLACES & DAILY_STRESS")
plot3

Plot 4: LOST VACATION & DAILY_STRESS (Box Plot)

# LOST_VACATION is numeric; converting it to a factor gives one box per value
# instead of a single box over a continuous x axis.
plot4 <- ggplot(dataset, aes(x = factor(LOST_VACATION), y = DAILY_STRESS)) +
  geom_boxplot() +
  labs(x = "Lost Vacation", y = "Daily Stress") +
  ggtitle("LOST VACATION & DAILY_STRESS")
plot4

Plot 5: FRIENDS & DAILY_STRESS (Bar Chart)

plot5 <- ggplot(dataset, aes(x = SOCIAL_NETWORK, y = DAILY_STRESS)) +
  geom_bar(stat = "summary", fun = "mean", fill = "steelblue") +
  labs(x = "Social Network", y = "Daily Stress") +
  ggtitle("FRIENDS & DAILY_STRESS")
plot5

# Correlation matrix: GENDER and AGE are categorical and DAILY_STRESS is left out.
columns <- setdiff(names(dataset), c("GENDER", "AGE", "DAILY_STRESS"))
cor_matrix <- cor(dataset[, columns])
cor_df <- reshape2::melt(cor_matrix)
ggplot(cor_df, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(x = "Features", y = "Features", title = "Correlation Matrix")
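A full correlation heat map can be hard to read when the interest is in a single target. As a complement, the sketch below ranks features by the absolute value of their correlation with one target column. It runs on simulated data; the data frame `toy`, its column names, and the helper `rank_by_correlation` are hypothetical and not part of the original analysis.

```r
set.seed(1)
n <- 200

# Simulated features: the score depends strongly on a, weakly on b, not on c.
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$score <- 3 * toy$a - 1 * toy$b + rnorm(n, sd = 0.5)

# Pearson correlation of every other column with the target,
# sorted by absolute value (strongest first).
rank_by_correlation <- function(df, target) {
  features <- setdiff(names(df), target)
  r <- sapply(features, function(f) cor(df[[f]], df[[target]]))
  r[order(abs(r), decreasing = TRUE)]
}

ranked <- rank_by_correlation(toy, "score")
print(round(ranked, 2))
```

On the real data, a call such as `rank_by_correlation(dataset[, columns], "WORK_LIFE_BALANCE_SCORE")` would give a sorted list to read alongside the heat map.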

4.0 Modeling

In this part, we address two different problems using our dataset.

The first problem

The first problem is a regression problem: predicting the “WORK_LIFE_BALANCE_SCORE” variable from the other variables in the dataset. The code fits three regression models: Linear Regression, Support Vector Regression (SVR), and Random Forest.

Load the necessary libraries

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:gridExtra':
## 
##     combine

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

Split the data into training and testing sets

set.seed(123)
train_indices <- createDataPartition(dataset$WORK_LIFE_BALANCE_SCORE, p = 0.8, list = FALSE)
train_data <- dataset[train_indices, ]
test_data <- dataset[-train_indices, ]
# Report the size of each set instead of printing every row.
dim(train_data)
dim(test_data)

Linear Regression Model

lm_model <- lm(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)

Support Vector Regression (SVR) Model

# svm() is provided by the e1071 package; with a numeric response it fits a regression.
svr_model <- svm(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)

Random Forest Model

rf_model <- randomForest(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)

Compare the performance of the models

lm_predictions <- predict(lm_model, test_data)
svr_predictions <- predict(svr_model, test_data)
rf_predictions <- predict(rf_model, test_data)

Calculate the Root Mean Squared Error (RMSE) for each model

lm_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - lm_predictions)^2))
svr_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - svr_predictions)^2))
rf_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - rf_predictions)^2))
cat("Linear Regression RMSE:", lm_rmse, "\n")

## Linear Regression RMSE: 1.597709e-12

cat("SVR RMSE:", svr_rmse, "\n")

## SVR RMSE: 3.087885

cat("Random Forest RMSE:", rf_rmse, "\n")

## Random Forest RMSE: 10.83695
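The three RMSE lines above repeat the same expression. A small helper, sketched here on made-up numbers, could compute RMSE together with MAE and R-squared in one call. The function name `regression_metrics` and the example vectors are assumptions for illustration. An RMSE as close to zero as the linear-regression value above would correspond to an R-squared of essentially 1, which may indicate that the score is computed directly from the features; that would be worth checking before interpreting the ranking of models.

```r
# Returns RMSE, MAE and R-squared for a pair of numeric vectors.
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

# Made-up scores and predictions, only to show the call.
actual    <- c(600, 650, 700, 750)
predicted <- c(610, 640, 705, 745)

m <- regression_metrics(actual, predicted)
print(round(m, 3))
```

With the objects from this report, the same helper could be called as `regression_metrics(test_data$WORK_LIFE_BALANCE_SCORE, lm_predictions)` and likewise for the other two models.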

plot_data <- data.frame(
  Actual = test_data$WORK_LIFE_BALANCE_SCORE,
  Linear_Regression = lm_predictions,
  SVR = svr_predictions,
  Random_Forest = rf_predictions
)

plot_data <- reshape2::melt(plot_data, id.vars = "Actual", variable.name = "Model")
ggplot(plot_data, aes(x = Actual, y = value, color = Model)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black") +
  labs(x = "Actual WORK_LIFE_BALANCE_SCORE", y = "Predicted WORK_LIFE_BALANCE_SCORE") +
  ggtitle("Comparison of Predicted vs Actual WORK_LIFE_BALANCE_SCORE") +
  theme_minimal()

The second problem

The second problem is a classification problem. The goal is to predict a categorical variable (BMI_RANGE) and to evaluate the accuracy, which measures the proportion of correctly predicted class labels.

Load the necessary libraries

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## Loaded glmnet 4.1-7

Convert relevant columns to factors

dataset$BMI_RANGE <- as.factor(dataset$BMI_RANGE)
dataset$GENDER <- as.factor(dataset$GENDER)
dataset$AGE <- as.factor(dataset$AGE)

Split the data into training and testing sets
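The split code is not shown in this report; the models below use objects named train and test. The following is a minimal sketch of one way such a split could be made in base R, stratified by class so that both sets keep similar class proportions. The toy data frame, the 80/20 ratio (borrowed from the regression split above), and the helper `stratified_split` are assumptions, not the authors' code.

```r
set.seed(123)

# Toy data with a two-level class label (assumed values).
toy <- data.frame(
  BMI_RANGE = factor(rep(c("1", "2"), times = c(60, 40))),
  x = rnorm(100)
)

# Sample a fraction p of the row indices within each class.
stratified_split <- function(df, label, p = 0.8) {
  idx_by_class <- split(seq_len(nrow(df)), df[[label]])
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(p * length(idx)))]
  }))
  list(train = df[train_idx, ], test = df[-train_idx, ])
}

parts <- stratified_split(toy, "BMI_RANGE")
train <- parts$train
test  <- parts$test

print(table(train$BMI_RANGE))
print(table(test$BMI_RANGE))
```

Since caret is already loaded in this report, `createDataPartition()` on the BMI_RANGE factor would be an alternative that also samples within classes.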

Random Forest

model_rf <- randomForest(BMI_RANGE ~ ., data = train)
predictions_rf <- predict(model_rf, newdata = test)
accuracy_rf <- sum(predictions_rf == test$BMI_RANGE) / nrow(test)

Logistic Regression

model_lr <- glm(BMI_RANGE ~ ., data = train, family = binomial, control = list(maxit = 1000))

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

predictions_lr <- predict(model_lr, newdata = test, type = "response")
predictions_lr <- ifelse(predictions_lr > 0.5, "2", "1")
accuracy_lr <- sum(predictions_lr == test$BMI_RANGE) / nrow(test)

Decision Tree

model_dt <- rpart(BMI_RANGE ~ ., data = train, method = "class")
predictions_dt <- predict(model_dt, newdata = test, type = "class")
accuracy_dt <- sum(predictions_dt == test$BMI_RANGE) / nrow(test)

Print the accuracies

cat("Random Forest Accuracy:", accuracy_rf, "\n")

## Random Forest Accuracy: 0.7647673

cat("Logistic Regression Accuracy:", accuracy_lr, "\n")

## Logistic Regression Accuracy: 1

cat("Decision Tree Accuracy:", accuracy_dt, "\n")

## Decision Tree Accuracy: 0.6530996

plot_data <- data.frame(
  Actual = test$BMI_RANGE,
  Predicted = predictions_rf
)

ggplot(plot_data, aes(x = Actual, y = Predicted)) +
  geom_jitter(width = 0.1, height = 0.1) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black") +
  labs(x = "Actual BMI_RANGE", y = "Predicted BMI_RANGE") +
  ggtitle("Comparison of Predicted vs Actual BMI_RANGE") +
  theme_minimal()
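A jittered scatter of two-level factors shows where predictions land but not how many fall in each cell. A confusion matrix gives those counts directly. The sketch below uses made-up labels; the vectors `actual` and `predicted` are assumptions standing in for test$BMI_RANGE and a model's predictions.

```r
# Made-up true and predicted class labels with the two BMI_RANGE codes.
actual    <- factor(c("1", "1", "1", "2", "2", "1", "2", "2"), levels = c("1", "2"))
predicted <- factor(c("1", "1", "2", "2", "1", "1", "2", "2"), levels = c("1", "2"))

# Rows are predictions, columns are true classes.
cm <- table(Predicted = predicted, Actual = actual)

# Overall accuracy and the share of each true class that was recovered.
accuracy <- sum(diag(cm)) / sum(cm)
recall_by_class <- diag(cm) / colSums(cm)

print(cm)
print(accuracy)
print(recall_by_class)
```

Per-class recall is useful here because the two BMI_RANGE codes are not equally frequent, so overall accuracy alone can hide poor performance on the smaller class.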

Conclusion

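In conclusion, the development of a Work Life Balance Calculator through this project addresses the pressing need for individuals and organizations to prioritize work-life balance in today’s fast-paced world. Using data mining techniques and machine learning algorithms, we compared three regression models for WORK_LIFE_BALANCE_SCORE and three classification models for BMI_RANGE. On the test set, linear regression gave the lowest RMSE (1.597709e-12), followed by SVR (3.087885) and random forest (10.83695); for BMI_RANGE, logistic regression reached an accuracy of 1, compared with 0.7647673 for random forest and 0.6530996 for the decision tree. These results help in understanding the factors that contribute to work-life balance and in identifying areas for improvement.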

The Work Life Balance Calculator serves as a valuable tool for individuals to assess their work-life balance, understand their strengths and areas for improvement, and make informed decisions to enhance their overall well-being. For organizations, the calculator offers insights into employees’ work-life balance, enabling them to develop tailored plans to optimize productivity and support their workforce.

Ultimately, this project contributes to the promotion of work-life balance and the improvement of overall performance and well-being. By prioritizing work-life balance, individuals can achieve greater satisfaction and fulfillment, leading to a more productive and harmonious society as a whole.