In today’s fast-paced world, achieving a healthy work-life balance
has become increasingly challenging. The global pandemic has further
highlighted the importance of finding a harmonious equilibrium between
work and personal life. Recognizing this need, we present a project
aimed at developing a Work-Life Balance Calculator, which will empower
employees and citizens to assess their work-life balance and identify
areas for improvement.
The questions we are interested in answering from this dataset include:
How accurately can we predict work-life balance scores using regression models? Which regression model performs the best in terms of predicting work-life balance?
How accurately can we classify individuals into different BMI ranges using classification models? Which classification model achieves the highest accuracy in predicting BMI ranges?
By addressing these questions, we aim to gain insights into the
factors influencing work-life balance and the ability to predict
work-life balance scores, as well as the effectiveness of different
models in predicting BMI ranges. These findings will contribute to the
development of the Work-Life Balance Calculator and enable individuals
and organizations to improve work-life balance and overall
well-being.
The objective of this project is to develop a Work-Life Balance Calculator that can assess and predict work-life balance based on various variables. The dataset contains information related to different aspects of individuals' lives, such as daily habits, stress levels, social connections, and achievements. By analyzing this data, we aim to:
Predict the “WORK_LIFE_BALANCE_SCORE” variable using regression models: The goal is to understand the relationship between work-life balance and other variables in the dataset. We want to identify which factors significantly influence work-life balance and develop predictive models that can estimate work-life balance scores based on those factors.
Predict the “BMI_RANGE” variable using classification models:
Here, the focus is on predicting the categorical variable “BMI_RANGE”
based on the available features. The goal is to assess the accuracy of
different classification models in predicting BMI ranges and identify
the most effective model.
Source: https://www.kaggle.com/datasets/ydalat/lifestyle-and-wellbeing-data
Title: Lifestyle_and_Wellbeing_Data
Year: 2021
Purpose: To evaluate and understand how individuals can reinvent their lifestyles to optimize their overall well-being while supporting the UN Sustainable Development Goals.
Target Variable: WORK_LIFE_BALANCE_SCORE
Features:
- FRUITS_VEGGIES : Fruits or vegetables eaten daily
- DAILY_STRESS : Stress experienced daily
- PLACES_VISITED : New places visited
- CORE_CIRCLE : Number of people who are very close to you
- SUPPORTING_OTHERS : Number of people you help to achieve a better life
- SOCIAL_NETWORK : Number of people you interact with during a typical day
- ACHIEVEMENT : Remarkable achievements you're proud of
- DONATION : Number of times you donate your time or money to good causes
- BMI_RANGE : BMI range (coded 1 = below 25, 2 = 25 or above)
- TODO_COMPLETED : Completion of weekly to-do lists
- FLOW : Hours you experience "flow"
- DAILY_STEPS : Number of steps taken in a day
- LIVE_VISION : Number of years to your life vision
- SLEEP_HOURS : Hours of sleep
- LOST_VACATION : Number of vacation days lost in a year
- DAILY_SHOUTING : Tendency to shout or sulk at people daily
- SUFFICIENT_INCOME : Sufficient income to cover basic needs
- PERSONAL_AWARDS : Recognitions received in life
- TIME_FOR_PASSION : Number of hours spent doing your hobby
- WEEKLY_MEDITATION : Number of times you get to meditate in a week
- AGE
- GENDER
# Load the data and drop the Timestamp column, which is not a predictor
dataset <- read.csv("BALANCESCORE.csv")
dataset <- dataset[, -which(names(dataset) == "Timestamp")]
head(dataset)
n_rows <- nrow(dataset)
n_cols <- ncol(dataset)
cat("Number of rows is", n_rows, "\n")
## Number of rows is 15972
cat("Number of columns is", n_cols, "\n")
## Number of columns is 23
str(dataset)
## 'data.frame': 15972 obs. of 23 variables:
## $ FRUITS_VEGGIES : int 3 2 2 3 5 3 4 3 5 4 ...
## $ DAILY_STRESS : chr "2" "3" "3" "3" ...
## $ PLACES_VISITED : int 2 4 3 10 3 3 10 5 6 2 ...
## $ CORE_CIRCLE : int 5 3 4 3 3 9 6 3 4 6 ...
## $ SUPPORTING_OTHERS : int 0 8 4 10 10 10 10 5 3 10 ...
## $ SOCIAL_NETWORK : int 5 10 10 7 4 10 10 7 3 10 ...
## $ ACHIEVEMENT : int 2 5 3 2 2 2 3 4 5 0 ...
## $ DONATION : int 0 2 2 5 4 3 5 0 4 4 ...
## $ BMI_RANGE : int 1 2 2 2 2 1 2 1 1 2 ...
## $ TODO_COMPLETED : int 6 5 2 3 5 6 8 8 10 3 ...
## $ FLOW : int 4 2 2 5 0 1 8 2 2 2 ...
## $ DAILY_STEPS : int 5 5 4 5 5 7 7 8 1 3 ...
## $ LIVE_VISION : int 0 5 5 0 0 10 5 10 5 0 ...
## $ SLEEP_HOURS : int 7 8 8 5 7 8 7 6 10 6 ...
## $ LOST_VACATION : int 5 2 10 7 0 0 10 0 0 0 ...
## $ DAILY_SHOUTING : int 5 2 2 5 0 2 0 2 2 0 ...
## $ SUFFICIENT_INCOME : int 1 2 2 1 2 2 2 2 2 1 ...
## $ PERSONAL_AWARDS : int 4 3 4 5 8 10 10 8 10 3 ...
## $ TIME_FOR_PASSION : int 0 2 8 2 1 8 8 2 3 8 ...
## $ WEEKLY_MEDITATION : int 5 6 3 0 5 3 10 2 10 1 ...
## $ AGE : chr "36 to 50" "36 to 50" "36 to 50" "51 or more" ...
## $ GENDER : chr "Female" "Female" "Female" "Female" ...
## $ WORK_LIFE_BALANCE_SCORE: num 610 656 632 623 664 ...
summary(dataset)
## FRUITS_VEGGIES DAILY_STRESS PLACES_VISITED CORE_CIRCLE
## Min. :0.000 Length:15972 Min. : 0.000 Min. : 0.000
## 1st Qu.:2.000 Class :character 1st Qu.: 2.000 1st Qu.: 3.000
## Median :3.000 Mode :character Median : 5.000 Median : 5.000
## Mean :2.923 Mean : 5.233 Mean : 5.508
## 3rd Qu.:4.000 3rd Qu.: 8.000 3rd Qu.: 8.000
## Max. :5.000 Max. :10.000 Max. :10.000
## SUPPORTING_OTHERS SOCIAL_NETWORK ACHIEVEMENT DONATION
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000
## 1st Qu.: 3.000 1st Qu.: 4.000 1st Qu.: 2.000 1st Qu.:1.000
## Median : 5.000 Median : 6.000 Median : 3.000 Median :3.000
## Mean : 5.616 Mean : 6.474 Mean : 4.001 Mean :2.715
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.: 6.000 3rd Qu.:5.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :5.000
## BMI_RANGE TODO_COMPLETED FLOW DAILY_STEPS
## Min. :1.000 Min. : 0.000 Min. : 0.000 Min. : 1.000
## 1st Qu.:1.000 1st Qu.: 4.000 1st Qu.: 1.000 1st Qu.: 3.000
## Median :1.000 Median : 6.000 Median : 3.000 Median : 5.000
## Mean :1.411 Mean : 5.746 Mean : 3.195 Mean : 5.704
## 3rd Qu.:2.000 3rd Qu.: 8.000 3rd Qu.: 5.000 3rd Qu.: 8.000
## Max. :2.000 Max. :10.000 Max. :10.000 Max. :10.000
## LIVE_VISION SLEEP_HOURS LOST_VACATION DAILY_SHOUTING
## Min. : 0.000 Min. : 1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 1.000 1st Qu.: 6.000 1st Qu.: 0.000 1st Qu.: 1.000
## Median : 3.000 Median : 7.000 Median : 0.000 Median : 2.000
## Mean : 3.752 Mean : 7.043 Mean : 2.899 Mean : 2.931
## 3rd Qu.: 5.000 3rd Qu.: 8.000 3rd Qu.: 5.000 3rd Qu.: 4.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## SUFFICIENT_INCOME PERSONAL_AWARDS TIME_FOR_PASSION WEEKLY_MEDITATION
## Min. :1.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.: 1.000 1st Qu.: 4.000
## Median :2.000 Median : 5.000 Median : 3.000 Median : 7.000
## Mean :1.729 Mean : 5.712 Mean : 3.327 Mean : 6.233
## 3rd Qu.:2.000 3rd Qu.: 9.000 3rd Qu.: 5.000 3rd Qu.:10.000
## Max. :2.000 Max. :10.000 Max. :10.000 Max. :10.000
## AGE GENDER WORK_LIFE_BALANCE_SCORE
## Length:15972 Length:15972 Min. :480.0
## Class :character Class :character 1st Qu.:636.0
## Mode :character Mode :character Median :667.7
## Mean :666.8
## 3rd Qu.:698.5
## Max. :820.2
# DAILY_STRESS was read in as character; coerce it to numeric
# (non-numeric entries become NA, producing the warning below)
dataset$DAILY_STRESS <- as.numeric(dataset$DAILY_STRESS)
## Warning: NAs introduced by coercion
table(dataset$DAILY_STRESS)
##
## 0 1 2 3 4 5
## 676 2478 3407 4398 2960 2052
# Drop the rows where coercion produced NA, then verify none remain
dataset <- na.omit(dataset)
missing_counts <- colSums(is.na(dataset))
print(missing_counts)
## FRUITS_VEGGIES DAILY_STRESS PLACES_VISITED
## 0 0 0
## CORE_CIRCLE SUPPORTING_OTHERS SOCIAL_NETWORK
## 0 0 0
## ACHIEVEMENT DONATION BMI_RANGE
## 0 0 0
## TODO_COMPLETED FLOW DAILY_STEPS
## 0 0 0
## LIVE_VISION SLEEP_HOURS LOST_VACATION
## 0 0 0
## DAILY_SHOUTING SUFFICIENT_INCOME PERSONAL_AWARDS
## 0 0 0
## TIME_FOR_PASSION WEEKLY_MEDITATION AGE
## 0 0 0
## GENDER WORK_LIFE_BALANCE_SCORE
## 0 0
Missing values can distort statistical analysis and lead to inaccurate or biased results. After coercing DAILY_STRESS to numeric, the handful of rows where coercion produced NA were removed, so the remaining dataset has no missing values.
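Had the dataset contained substantial missingness, dropping rows would not be the only option. Below is a hypothetical sketch of median imputation for the numeric columns (not needed here, since no NAs remain):
# Hypothetical: replace NAs in numeric columns with the column median
impute_median <- function(x) {
  if (is.numeric(x)) x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
dataset_imputed <- as.data.frame(lapply(dataset, impute_median))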
library(dplyr)
result <- dataset %>%
group_by(AGE, GENDER) %>%
summarise(mean_BMI_RANGE = mean(BMI_RANGE), .groups = "drop")
library(tidyr)
result_table <- result %>%
pivot_wider(names_from = GENDER, values_from = mean_BMI_RANGE)
print(result_table)
## # A tibble: 4 × 3
## AGE Female Male
## <chr> <dbl> <dbl>
## 1 21 to 35 1.36 1.33
## 2 36 to 50 1.47 1.52
## 3 51 or more 1.53 1.52
## 4 Less than 20 1.23 1.22
library(ggplot2)
# AGE is a categorical variable, so a density plot is not meaningful;
# a bar chart shows its distribution instead
plot1 <- ggplot(dataset, aes(x = AGE)) +
  geom_bar(fill = "lightblue") +
  labs(title = "Distribution of Age")
plot1
plot2 <- ggplot(dataset, aes(x = GENDER, fill = GENDER)) +
geom_bar() +
labs(title = "Distribution of Gender")
plot2
plot3 <- ggplot(dataset, aes(x = GENDER, y = DAILY_STRESS, fill = GENDER)) +
geom_violin(scale = "width") +
scale_fill_manual(values = c("pink", "blue")) +
labs(x = "Gender", title = "Distribution of Daily Stress by Gender") +
theme_minimal()
plot3
ggplot(dataset, aes(x = WORK_LIFE_BALANCE_SCORE, y = WEEKLY_MEDITATION)) +
geom_point(color = "green") +
labs(title = "Work-Life Balance Score vs. Weekly Meditation")
ggplot(dataset, aes(x = AGE)) +
geom_bar(stat = "count", fill = "steelblue", color = "black") +
labs(x = "Age", y = "Frequency") +
ggtitle("Distribution of Age")
ggplot(dataset, aes(x = AGE, y = WORK_LIFE_BALANCE_SCORE)) +
geom_boxplot(fill = "orange", color = "black") +
labs(x = "", y = "Work-Life Balance Score") +
ggtitle("Distribution of Work-Life Balance Score by Age")
plot5 <- ggplot(dataset, aes(x = DAILY_STEPS)) +
geom_histogram(fill = "lightblue", bins = 20) +
labs(title = "Histogram of Daily Steps")
plot5
# Note: BMI_RANGE is coded 1 (BMI below 25) or 2 (BMI 25 and above), so
# this condition keeps every row
subset_data <- subset(dataset, BMI_RANGE < 25)
plot6 <- ggplot(subset_data, aes(x = AGE, y = BMI_RANGE)) +
geom_bar(stat = "summary", fun = "mean", fill = "salmon") +
labs(x = "AGE", y = "BMI") +
ggtitle("BODY_MASS_INDEX BY AGE")
plot6
plot8 <- ggplot(subset_data, aes(x = AGE, y = BMI_RANGE, fill = GENDER)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "BODY_MASS_INDEX BY GENDER & AGE") +
scale_fill_manual(values = c("darksalmon", "cornflowerblue"))
plot9 <- plot8 + ggtitle("BODY_MASS_INDEX BY GENDER & AGE")
plot9
plot4 <- ggplot(subset_data, aes(x = SLEEP_HOURS, y = BMI_RANGE)) +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(x = "Sleep Hours", y = "BMI") +
ggtitle("BODY_MASS_INDEX vs SLEEP HOURS")
plot4
## `geom_smooth()` using formula = 'y ~ x'
plot5 <- ggplot(subset_data, aes(x = FRUITS_VEGGIES, y = BMI_RANGE)) +
geom_bar(stat = "summary", fun = "mean", fill = "yellow") +
labs(x = "Servings of Fruits/Veggies", y = "BMI") +
ggtitle("BODY_MASS_INDEX vs. SERVINGS OF FRUITS/VEGGIES")
plot5
plot6 <- ggplot(subset_data, aes(x = DAILY_STEPS, y = BMI_RANGE)) +
geom_smooth(method = "lm", se = FALSE, color = "grey") +
labs(x = "Daily Steps", y = "BMI") +
ggtitle("BODY_MASS_INDEX BY DAILY STEPS TAKEN")
plot6
## `geom_smooth()` using formula = 'y ~ x'
library(reshape2)  # provides dcast() and melt(), used below
# Count respondents in each AGE x GENDER cell; the aggregation function
# is stated explicitly rather than letting dcast default to length
df3 <- dcast(dataset, AGE ~ GENDER, value.var = "DAILY_STRESS", fun.aggregate = length)
head(df3)
plot1 <- ggplot(dataset, aes(x = AGE, y = DAILY_STRESS, fill = GENDER)) +
geom_bar(stat = "summary", fun = "mean", position = "dodge", color = "black") +
labs(x = "Age Group", y = "Average Daily Stress") +
ggtitle("AVERAGE DAILY_STRESS BY AGE GROUP")
plot1
plot2 <- ggplot(dataset, aes(x = GENDER, y = DAILY_STRESS, fill = GENDER)) +
geom_violin(trim = FALSE, scale = "count") +
labs(x = "Gender", y = "Daily Stress") +
ggtitle("DAILY_STRESS BY GENDER")
plot2
plot1 <- ggplot(dataset, aes(x = GENDER, y = CORE_CIRCLE, fill = GENDER)) +
geom_violin() +
labs(x = "Gender", y = "Core Circle") +
ggtitle("CORE CIRCLE BY GENDER")
plot1
plot2 <- ggplot(dataset, aes(x = AGE, y = LOST_VACATION)) +
geom_boxplot() +
labs(x = "Age Group", y = "Lost Vacation") +
scale_x_discrete(limits = c("Less than 20", "21 to 35", "36 to 50", "51 or more")) +
ggtitle("LOST VACATION BY AGE GROUP")
plot2
plot3 <- ggplot(dataset, aes(x = PLACES_VISITED, y = DAILY_STRESS)) +
geom_bar(stat = "summary", fun = "mean", fill = "steelblue") +
labs(x = "Places Visited", y = "Daily Stress") +
ggtitle("PLACES VISITED vs DAILY STRESS")
plot3
# LOST_VACATION is numeric, so group by its values to get one box per level
plot4 <- ggplot(dataset, aes(x = LOST_VACATION, y = DAILY_STRESS, group = LOST_VACATION)) +
  geom_boxplot() +
  labs(x = "Lost Vacation", y = "Daily Stress") +
  ggtitle("LOST VACATION vs DAILY STRESS")
plot4
columns <- setdiff(names(dataset), c("GENDER", "AGE", "DAILY_STRESS"))
cor_matrix <- cor(dataset[, columns])
cor_df <- reshape2::melt(cor_matrix)
ggplot(cor_df, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient(low = "blue", high = "red") +
labs(x = "Features", y = "Features", title = "Correlation Matrix")
In this part, we address two problems on this dataset.
The first is a regression problem: predicting the "WORK_LIFE_BALANCE_SCORE" variable from the other variables in the dataset. The code implements three regression models: Linear Regression, Support Vector Regression (SVR), and Random Forest.
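Each model is evaluated on the held-out test set with root mean square error (RMSE), where $y_i$ is the observed score, $\hat{y}_i$ the prediction, and $n$ the number of test cases:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$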
library(caret)         # createDataPartition() for the train/test split
library(e1071)         # svm() for support vector regression
library(randomForest)  # randomForest()
set.seed(123)
train_indices <- createDataPartition(dataset$WORK_LIFE_BALANCE_SCORE, p = 0.8, list = FALSE)
train_data <- dataset[train_indices, ]
test_data <- dataset[-train_indices, ]
cat("Training rows:", nrow(train_data), "| Test rows:", nrow(test_data), "\n")
lm_model <- lm(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)
svr_model <- svm(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)
rf_model <- randomForest(WORK_LIFE_BALANCE_SCORE ~ ., data = train_data)
lm_predictions <- predict(lm_model, test_data)
svr_predictions <- predict(svr_model, test_data)
rf_predictions <- predict(rf_model, test_data)
lm_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - lm_predictions)^2))
svr_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - svr_predictions)^2))
rf_rmse <- sqrt(mean((test_data$WORK_LIFE_BALANCE_SCORE - rf_predictions)^2))
cat("Linear Regression RMSE:", lm_rmse, "\n")
## Linear Regression RMSE: 9.21053e-13
cat("SVR RMSE:", svr_rmse, "\n")
## SVR RMSE: 2.971714
cat("Random Forest RMSE:", rf_rmse, "\n")
## Random Forest RMSE: 10.73447
Comparing these root mean square error (RMSE) values, Linear Regression has by far the lowest RMSE (9.21e-13, effectively zero). An error this close to machine precision suggests that WORK_LIFE_BALANCE_SCORE is an exact (or near-exact) linear function of the other variables, i.e., the published score appears to be computed from these inputs, rather than that linear regression has remarkable predictive skill. SVR has an RMSE of 2.97, and Random Forest has the highest RMSE of 10.73 among the three models.
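One way to sanity-check this interpretation is to inspect the linear model's fit on the training data: an R-squared of essentially 1 and a vanishing largest residual would confirm that the score is linearly determined by the predictors. A minimal check:
# If the score is an exact linear function of the inputs, R^2 is ~1 and
# even the largest training residual is numerically negligible
summary(lm_model)$r.squared
max(abs(residuals(lm_model)))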
plot_data <- data.frame(
Actual = test_data$WORK_LIFE_BALANCE_SCORE,
Linear_Regression = lm_predictions,
SVR = svr_predictions,
Random_Forest = rf_predictions)
plot_data <- reshape2::melt(plot_data, id.vars = "Actual", variable.name = "Model")
ggplot(plot_data, aes(x = Actual, y = value, color = Model)) +
geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black") +
labs(x = "Actual WORK_LIFE_BALANCE_SCORE", y = "Predicted WORK_LIFE_BALANCE_SCORE") +
ggtitle("Comparison of Predicted vs Actual WORK_LIFE_BALANCE_SCORE") +
theme_minimal()
The second problem is a classification problem. The goal is to predict a categorical variable (BMI_RANGE) and to evaluate accuracy, the proportion of correctly predicted class labels.
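Formally, for $n$ test cases with true labels $y_i$ and predicted labels $\hat{y}_i$:

$$\text{Accuracy} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\hat{y}_i = y_i\}$$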
library(rpart)  # rpart() for the decision tree model
dataset$BMI_RANGE <- as.factor(dataset$BMI_RANGE)
dataset$GENDER <- as.factor(dataset$GENDER)
dataset$AGE <- as.factor(dataset$AGE)
# Re-split with the same partition indices so the factor coding carries
# over into the train and test sets used below
train <- dataset[train_indices, ]
test <- dataset[-train_indices, ]
model_rf <- randomForest(BMI_RANGE ~ ., data = train)
predictions_rf <- predict(model_rf, newdata = test)
accuracy_rf <- sum(predictions_rf == test$BMI_RANGE) / nrow(test)
model_lr <- glm(BMI_RANGE ~ ., data = train, family = binomial, control = list(maxit = 1000))
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
predictions_lr <- predict(model_lr, newdata = test, type = "response")
# glm models the probability of the second factor level ("2"); threshold at 0.5
predictions_lr <- ifelse(predictions_lr > 0.5, "2", "1")
accuracy_lr <- sum(predictions_lr == test$BMI_RANGE) / nrow(test)
model_dt <- rpart(BMI_RANGE ~ ., data = train, method = "class")
predictions_dt <- predict(model_dt, newdata = test, type = "class")
accuracy_dt <- sum(predictions_dt == test$BMI_RANGE) / nrow(test)
cat("Random Forest Accuracy:", accuracy_rf, "\n")
## Random Forest Accuracy: 0.768733
cat("Logistic Regression Accuracy:", accuracy_lr, "\n")
## Logistic Regression Accuracy: 1
cat("Decision Tree Accuracy:", accuracy_dt, "\n")
## Decision Tree Accuracy: 0.6551868
Comparing these accuracy values, Logistic Regression reaches an accuracy of 1, i.e., it classifies every test case correctly. As with the near-zero regression RMSE, a perfect score is a red flag: WORK_LIFE_BALANCE_SCORE remains among the predictors and appears to encode BMI_RANGE, so this result likely reflects target leakage rather than genuine predictive power. Random Forest achieves 0.769 and Decision Tree 0.655.
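Accuracy alone also hides where a model errs. A quick per-class breakdown for the Random Forest, using base R's table():
# Confusion matrix: rows are predictions, columns are actual classes
table(Predicted = predictions_rf, Actual = test$BMI_RANGE)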
# Create data frames for plotting
plot_data_rf <- data.frame(
Actual = test$BMI_RANGE,
Predicted = predictions_rf,
Model = "Random Forest"
)
plot_data_lr <- data.frame(
Actual = test$BMI_RANGE,
Predicted = predictions_lr,
Model = "Logistic Regression"
)
plot_data_dt <- data.frame(
Actual = test$BMI_RANGE,
Predicted = predictions_dt,
Model = "Decision Tree"
)
# Combine the data frames
plot_data <- rbind(plot_data_rf, plot_data_lr, plot_data_dt)
# Plot the comparisons
ggplot(plot_data, aes(x = Actual, y = Predicted, color = Model)) +
geom_jitter(width = 0.1, height = 0.1) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black") +
labs(x = "Actual BMI_RANGE", y = "Predicted BMI_RANGE") +
ggtitle("Comparison of Predicted vs Actual BMI_RANGE") +
theme_minimal()
Question 1: How accurately can we predict work-life balance scores
using regression models? Which regression model performs the best in
terms of predicting work-life balance?
To answer the first question, we compare the root mean square error (RMSE) of each regression model. Linear Regression has the lowest RMSE (9.21e-13, effectively zero), SVR an RMSE of 2.97, and Random Forest the highest at 10.73. Taken at face value, Linear Regression is the best of the three; however, an RMSE at machine precision indicates that WORK_LIFE_BALANCE_SCORE is a (near-)deterministic linear combination of the other variables, so the comparison says more about how the score was constructed than about predictive skill on genuinely unseen behavior.
Question 2: How accurately can we classify individuals into different
BMI ranges using classification models? Which classification model has
the highest accuracy in predicting BMI ranges?
As per the results, Random Forest reaches an accuracy of 0.769, Logistic Regression a perfect 1, and Decision Tree 0.655. A perfect score is suspicious rather than impressive: WORK_LIFE_BALANCE_SCORE is still among the predictors and appears to encode BMI_RANGE, so the logistic model most likely exploits target leakage (the glm fit even warned that fitted probabilities were numerically 0 or 1, a sign of perfect separation). A simple check is to refit without the leaked column, as sketched below.
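A minimal sketch of that check, refitting the logistic model with WORK_LIFE_BALANCE_SCORE excluded (an additional diagnostic, not part of the original analysis); a large drop in accuracy would confirm the leakage reading:
# Refit without the (likely leaked) score column
model_lr_noscore <- glm(BMI_RANGE ~ . - WORK_LIFE_BALANCE_SCORE, data = train,
  family = binomial, control = list(maxit = 1000))
pred_noscore <- ifelse(predict(model_lr_noscore, newdata = test, type = "response") > 0.5, "2", "1")
mean(pred_noscore == test$BMI_RANGE)  # accuracy without the leaked column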
In conclusion, the development of a Work-Life Balance Calculator through this project addresses the pressing need for individuals and organizations to prioritize work-life balance in today's fast-paced world. By leveraging data mining techniques and machine learning algorithms, we have made progress in understanding the key factors that contribute to work-life balance and identifying areas for improvement.
The Work-Life Balance Calculator serves as a valuable tool for individuals to assess their work-life balance, understand their strengths and areas for improvement, and make informed decisions to enhance their overall well-being. For organizations, the calculator offers insights into employees' work-life balance, enabling them to develop tailored plans to optimize productivity and support their workforce.
Ultimately, this project contributes to the promotion of work-life balance and the improvement of overall performance and well-being. By prioritizing work-life balance, individuals can achieve greater satisfaction and fulfillment, leading to a more productive and harmonious society as a whole.