Nowadays, weight management has become a significant concern for many individuals, and there has been an increase in childhood obesity [1], leading to the emergence of various theories and approaches. Some focus on the Body Mass Index (BMI) as a key metric, while others emphasize the importance of body fat percentage. Additionally, there are those who advocate for muscle gain or achieving a “size zero” figure. The primary concern arises when excessive weight gain leads to obesity, which can have severe health implications and can even be fatal [2].
Different schools of thought propose diverse strategies for weight management. Some argue that calorie counting is the most effective method, while others recommend specific dietary restrictions, such as gluten-free, organic, or sugar-free diets [3]. Furthermore, certain approaches highlight the importance of macronutrient composition, particularly increasing protein intake, as a means to achieve weight loss or muscle growth [4].
This paper aims to explore the multifaceted factors influencing body weight, including age, calorie intake, physical activity, stress levels, sleep patterns, and others. Using a data set sourced from Kaggle [5], the study will first examine the correlation between body weight and age. For the first plot, only the first two columns of the data set were used, and the plots were produced with R code [6]. Subsequently, the data will be clustered to identify patterns and relationships among the variables [7]. The findings will contribute to a deeper understanding of the complex interplay between these factors and their impact on weight management.
library(ggplot2)
library(dplyr)
library(cluster)
# Load the dataset
data <- read.csv("C:/Users/Ceee/Desktop/data science/2400-DS1UL/papers/weight.csv") # Replace with the actual file name if different
# Select the relevant columns for clustering
features <- data %>%
  select(Age, Current.Weight..lbs.)
# Perform k-means clustering (e.g., 3 clusters)
set.seed(123) # Ensures reproducibility
kmeans_result <- kmeans(features, centers = 3)
# Add cluster assignments to the dataset
data$Cluster <- as.factor(kmeans_result$cluster)
# Create a scatter plot for Age vs Current Weight with Clusters
ggplot(data, aes(x = Age, y = Current.Weight..lbs., color = Cluster)) +
  geom_point(size = 3) +
  labs(title = "Clustering of Participants by Age and Current Weight",
       x = "Age",
       y = "Current Weight (lbs)") +
  theme_minimal()
From the plot we can clearly see that weight is spread across all ages. Young participants have weights that older participants also have, meaning that age is not the only factor contributing to a person's weight.
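The visual impression above can also be checked numerically. The following is a minimal sketch, not the analysis used in this paper: it runs on simulated values (the data frame `toy` is hypothetical and only reuses the column names from the code above), computes the Pearson correlation between age and weight, and then repeats k-means on standardized columns, since weight in pounds otherwise contributes far more to the Euclidean distance than age. Comparing the within-cluster sum of squares across several values of k is one common heuristic for judging whether three clusters is a reasonable choice.

```r
# Simulated stand-in data; these are NOT the values from the Kaggle data set.
set.seed(123)
n <- 100
toy <- data.frame(
  Age = sample(18:60, n, replace = TRUE),
  Current.Weight..lbs. = round(rnorm(n, mean = 170, sd = 30), 1)
)

# Pearson correlation between age and weight; values near 0 indicate
# little linear association between the two columns.
age_weight_cor <- cor(toy$Age, toy$Current.Weight..lbs.)

# Standardize both columns so that neither dominates the distance
# calculation used by k-means.
toy_scaled <- scale(toy[, c("Age", "Current.Weight..lbs.")])

# Total within-cluster sum of squares for k = 2..6 (elbow heuristic).
ks <- 2:6
wss <- sapply(ks, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 20)$tot.withinss
})

print(round(age_weight_cor, 3))
print(data.frame(k = ks, tot_withinss = round(wss, 1)))
```

On the real data, `toy` would be replaced by the two columns selected earlier; whether standardizing changes the cluster assignments depends on the data.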
The next plots include more variables; we would like to see how the attributes relate once several variables are included, using two dimension-reduction techniques: a self-organizing map [8] and a t-SNE projection displayed as a bubble plot. Variables in the data set that were stored as factors were converted to ranked levels; this was done for weight (1 being the smallest and 7 the largest), weight change, and other variables.
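The conversion to ranked levels could look roughly like the sketch below. The category labels (`low`, `medium`, `high`), the example weights, and the use of equal-width bins are all assumptions made for illustration; the paper does not state the labels or the binning rule actually used.

```r
# Hypothetical example values; the real category labels are not shown
# in the paper, so these level names are assumptions.
stress_raw <- c("low", "high", "medium", "low", "high")

# Declare the order explicitly, then convert to integer ranks (1 = lowest).
stress_levels <- c("low", "medium", "high")
stress_rank <- as.integer(
  factor(stress_raw, levels = stress_levels, ordered = TRUE)
)

# Bin a continuous weight (lbs) into 7 ranks, 1 = smallest, 7 = largest.
# Equal-width bins are an assumption, not the paper's documented rule.
weight_lbs <- c(110, 135, 150, 172, 190, 215, 250)
weight_rank <- cut(weight_lbs, breaks = 7, labels = FALSE)

print(stress_rank)
print(weight_rank)
```

Stating the level order explicitly matters: without it, R orders factor levels alphabetically, which would rank "high" below "low".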
data <- read.csv("C:/Users/Ceee/Desktop/data science/2400-DS1UL/papers/reduction.csv")
# Install and load the kohonen package
if (!requireNamespace("kohonen", quietly = TRUE)) {
  install.packages("kohonen")
}
library(kohonen)
# Preprocess the data
# Exclude non-numeric or non-relevant columns (e.g., IDs or text labels) if necessary
numeric_data <- data[sapply(data, is.numeric)]
# Normalize the data
data_scaled <- scale(numeric_data)
# Define the grid for the SOM
som_grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal") # Adjust dimensions as needed
# Train the SOM
som_model <- som(X = as.matrix(data_scaled), grid = som_grid, rlen = 100)
# Plot the SOM
plot(som_model, type = "codes") # Codebook vectors
plot(som_model, type = "changes") # Training progress
plot(som_model, type = "counts") # Number of data points in each node
plot(som_model, type = "dist.neighbours") # Distance between neighboring nodes
The codes plot analyses the relationship between various factors and weight change. The key variables are age, weight ranking, calorie surplus, exercise, sleep and stress, measured over a period of weeks. The colors represent the variables and the size of the sectors shows the ranking of the variable.
For example, the top right circle would represent people who are well advanced in age, have a high weight ranking, consume high excess calories, have gained weight, have participated for many weeks, have low stress levels and have low sleep quality.
The training progress plot shows how the fit improved over the 100 training iterations (rlen = 100), measured as the mean distance from each observation to its closest codebook vector. The counts plot shows the number of participants mapped to each node; for example, the count for the top right circle would be 4. The neighbor distance plot shows how closely each circle is related to its neighbors; for example, the bottom left circle is the most distant from its neighbors, while those with the brightest colors are closer.
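The quantities behind the counts and training progress plots can also be read directly from the fitted object. The sketch below is illustrative only: it fits a map to simulated data (the matrix `toy` is hypothetical, not reduction.csv) and assumes the kohonen package is installed, skipping the SOM step otherwise.

```r
# Simulated stand-in data with three standardized numeric columns.
set.seed(123)
toy <- scale(matrix(rnorm(300), ncol = 3,
                    dimnames = list(NULL, c("v1", "v2", "v3"))))

if (requireNamespace("kohonen", quietly = TRUE)) {
  grid  <- kohonen::somgrid(xdim = 5, ydim = 5, topo = "hexagonal")
  model <- kohonen::som(X = toy, grid = grid, rlen = 100)

  # "counts": number of observations mapped to each of the 25 nodes.
  # Using explicit factor levels keeps empty nodes in the table as zeros.
  node_counts <- table(
    factor(model$unit.classif, levels = seq_len(nrow(grid$pts)))
  )

  # "changes": one entry per training iteration (rlen = 100), not per
  # data row.
  n_iterations <- NROW(model$changes)

  print(node_counts)
  print(n_iterations)
}
```

The node counts always sum to the number of rows in the input, which is a quick sanity check when reading the counts plot.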
library(ggplot2)
library(Rtsne)
# Load the data
data <- read.csv("C:/Users/Ceee/Desktop/data science/2400-DS1UL/papers/reduction.csv")
# Select relevant features for dimension reduction
features <- data[, c("Age", "weight.ranking", "calories.suplus.level",
                     "weight.change", "Duration..weeks.", "exercise.level",
                     "sleep.level", "Stress.Level")]
# Perform t-SNE (or you could use prcomp for PCA)
set.seed(123) # For reproducibility
tsne_result <- Rtsne(as.matrix(features), dims = 2, perplexity = 30)
# Add t-SNE results to the original data
data$Dim1 <- tsne_result$Y[,1]
data$Dim2 <- tsne_result$Y[,2]
# Create a bubble plot
ggplot(data, aes(x = Dim1, y = Dim2, size = weight.ranking, color = Age)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(3, 15)) + # Adjust bubble sizes
  theme_minimal() +
  labs(title = "Bubble Plot with t-SNE Dimension Reduction",
       x = "Dimension 1",
       y = "Dimension 2",
       size = "weight.ranking",
       color = "Age")
In this plot, the shade of blue represents the age of the participant, starting with a darker shade for the younger participants and a lighter shade for those over 50. The size of the circles represents the weight ranking: the smallest circles represent the lowest rank and the largest circles the highest. The other variables of the data set are spread throughout the plot, showing that very few participants have completely identical attributes, as the points spread out with only some overlap.
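The comment in the code above mentions prcomp as an alternative to t-SNE. A minimal sketch of that alternative follows, on simulated ranked variables (the data frame `toy` is hypothetical and reuses only some of the column names from the paper). Unlike t-SNE, PCA reports how much of the total variance each dimension captures, which can help judge how much information a two-dimensional plot retains.

```r
# Simulated stand-in for the ranked variables; NOT the paper's data.
set.seed(123)
n <- 60
toy <- data.frame(
  Age            = sample(18:60, n, replace = TRUE),
  weight.ranking = sample(1:7, n, replace = TRUE),
  exercise.level = sample(1:4, n, replace = TRUE),
  sleep.level    = sample(1:4, n, replace = TRUE),
  Stress.Level   = sample(1:9, n, replace = TRUE)
)

# Center and scale so that variables on different ranges are comparable.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component.
explained <- pca$sdev^2 / sum(pca$sdev^2)

# First two component scores, usable as x and y in a bubble plot.
scores <- as.data.frame(pca$x[, 1:2])

print(round(explained, 3))
print(head(scores))
```

The `PC1` and `PC2` columns of `scores` could take the place of `Dim1` and `Dim2` in the ggplot call above.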
From the results, we observe that there are no attributes that perfectly predict or align with a specific variable. For instance, individuals within the same age group do not necessarily share the same weight, and participants with similar stress levels or who participated for the same duration do not consistently exhibit the same weight changes. This indicates that the relationships between variables are complex and influenced by multiple factors. However, certain patterns and associations do emerge in parts of the data, suggesting that while no single attribute is a definitive predictor, combinations of factors may play a role in influencing outcomes.
These associations will be explored in greater detail in the next section using association rules, which will help identify meaningful relationships and potential trends within the data set. Additionally, the analysis will highlight how variables such as physical activity level, sleep quality, and duration of participation interact to influence weight changes, providing deeper insights into the underlying dynamics of the data.
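As a brief illustration of what association rules measure, the sketch below computes support, confidence, and lift by hand for a single rule. The six participants, the indicator names `high_exercise` and `weight_loss`, and the rule itself are hypothetical and chosen only to show the arithmetic; they are not results from the data set.

```r
# Hypothetical binary indicators for six participants (made-up values).
toy <- data.frame(
  high_exercise = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
  weight_loss   = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Rule under study: {high_exercise} => {weight_loss}
support_lhs  <- mean(toy$high_exercise)                    # P(A)
support_rhs  <- mean(toy$weight_loss)                      # P(B)
support_rule <- mean(toy$high_exercise & toy$weight_loss)  # P(A and B)

# Confidence: share of high-exercise participants who also lost weight.
confidence <- support_rule / support_lhs

# Lift: confidence relative to the baseline rate of weight loss;
# values above 1 suggest a positive association.
lift <- confidence / support_rhs

print(c(support = support_rule, confidence = confidence, lift = lift))
```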
References:
1. Ogden, C. L., Carroll, M. D., Kit, B. K., & Flegal, K. M. (2014). Prevalence of Childhood and Adult Obesity in the United States, 2011-2012. JAMA.
2. World Health Organization. Obesity and overweight. https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
3. Mozaffarian, D., Hao, T., Rimm, E. B., Willett, W. C., & Hu, F. B. (2011). Changes in Diet and Lifestyle and Long-Term Weight Gain in Women and Men. The New England Journal of Medicine.
4. Leidy, H. J., Clifton, P. M., Astrup, A., Wycherley, T. P., Westerterp-Plantenga, M. S., Luscombe-Marsh, N. D., … & Mattes, R. D. (2015). The Role of Protein in Weight Loss and Maintenance. The American Journal of Clinical Nutrition.
5. https://fragilestatesindex.org/excel
6. de Vries, A., & Meys, J. (2012). R for Dummies. John Wiley & Sons, Ltd.
7. Celebi, M. E., & Aydin, K. (2016). Unsupervised Learning Algorithms. Springer International.
8. https://cran.r-project.org/web/packages/kohonen/kohonen.pdf