In recent years, the global consumption patterns of instant noodles have become a subject of interest, reflecting changes in lifestyles, dietary preferences, and economic factors. Instant noodles, a convenient and quick food option, are not only popular but also hold significance as an economic indicator. This study explores the consumption trends of instant noodles across different countries and regions, shedding light on the broader implications for economics and societal well-being. Additionally, it touches upon the relevance of instant food consumption in the broader context, emphasizing its economic and cultural dimensions.
Understanding the patterns and factors influencing instant noodle consumption is crucial for interpreting today’s market and social behaviors. Instant noodles are often considered a cost-effective and easily accessible food option, making them a staple for individuals from varied socio-economic backgrounds. Their popularity has grown significantly in recent decades, driven by their convenience and affordability (Park, Lee, et al., 2011). Analyzing consumption trends can provide insights into economic disparities and access to food resources. Shifts in instant food consumption can also serve as an economic indicator, reflecting changes in consumer behavior and overall economic health. Understanding dietary patterns associated with adverse health effects, such as frequent instant noodle consumption, has implications for the food industry and policymakers: it may prompt discussions on food regulations, labeling requirements, and public health campaigns that promote healthier eating habits. Among college students in Korea, regular instant noodle consumption was positively correlated with elevated plasma triglycerides, diastolic blood pressure, and fasting blood glucose. Students who ate instant noodles more often were more likely to have multiple cardiometabolic risk factors, and the odds ratio for hypertriglyceridemia was notably higher among those who consumed instant noodles three or more times per week than among less frequent consumers (Huh, Kim, et al., 2017).
The dataset contains information on instant noodle consumption across countries and regions. It includes consumption values in millions of US dollars for the years 2018-2022, the country or region, each country’s 2022 population rank, its three-letter country code (CCA3), the country or territory name, capital, continent, and 2022 population. The dataset’s richness allows a comprehensive analysis of the factors influencing instant noodle consumption and a nuanced understanding of the economic and cultural dynamics at play. The dataset can be found at: https://www.kaggle.com/datasets/fortuneuwha/world-instant-noodles-consumption-2022
## Country.Region X2018 X2019 X2020 X2021 X2022 Rank CCA3 Country.Territory
## 1 China 40250 41450 46360 43990 45070 1 CHN China
## 2 Indonesia 12540 12520 12640 13270 14260 4 IDN Indonesia
## 3 India 6060 6730 6730 7560 7580 2 IND India
## 4 Japan 5780 5630 5970 5850 5980 11 JPN Japan
## 5 Philippines 3980 3850 4470 4440 4290 13 PHL Philippines
## 6 South Korea 3820 3900 4130 3790 3950 29 KOR South Korea
## Capital Continent X2022.Population
## 1 Beijing Asia 1425887337
## 2 Jakarta Asia 275501339
## 3 New Delhi Asia 1417173173
## 4 Tokyo Asia 123951692
## 5 Manila Asia 115559009
## 6 Seoul Asia 51815810
First, we check the proportion of missing values in each column.
missing_val <- sapply(instant_noodles_df, function(x) sum(is.na(x))/nrow(instant_noodles_df))
print(missing_val)
## Country.Region X2018 X2019 X2020
## 0.00000000 0.01886792 0.01886792 0.01886792
## X2021 X2022 Rank CCA3
## 0.00000000 0.01886792 0.00000000 0.00000000
## Country.Territory Capital Continent X2022.Population
## 0.00000000 0.00000000 0.00000000 0.00000000
na_indices <- which(is.na(instant_noodles_df), arr.ind = TRUE)
print(na_indices)
## row col
## [1,] 43 2
## [2,] 43 3
## [3,] 43 4
## [4,] 53 6
# Verifying which countries have NA values
print(instant_noodles_df$Country.Region[43])
## [1] "Serbia"
print(instant_noodles_df$Country.Region[53])
## [1] "Ukraine"
As Serbia and Ukraine have missing data, the gaps can be filled using each country’s own values from the other years (a row-wise mean). Using the mean of the whole column would not be realistic, as some countries, especially in Asia, differ significantly from other regions.
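The row-wise idea can be expressed as a small helper function. This is a minimal sketch under stated assumptions rather than the procedure used below: `impute_row_mean()` is a hypothetical name, the year columns are assumed to be numeric, and every row is assumed to have at least one observed year (otherwise its mean is undefined). The usage example rebuilds only the Serbia and Ukraine year values reported in this section.

```r
# Hypothetical helper: replace each NA in the selected columns with the
# mean of the non-missing values in the same row (i.e. the same country).
# Assumes numeric columns and at least one observed value per row.
impute_row_mean <- function(df, cols) {
  m <- as.matrix(df[, cols])
  row_means <- rowMeans(m, na.rm = TRUE)      # mean of the available years per row
  na_pos <- which(is.na(m), arr.ind = TRUE)   # (row, col) positions of the NAs
  m[na_pos] <- row_means[na_pos[, "row"]]     # fill each NA with its row's mean
  df[, cols] <- m
  df
}

# Usage with the Serbia and Ukraine values shown in this section
years <- c("X2018", "X2019", "X2020", "X2021", "X2022")
toy <- data.frame(
  Country.Region = c("Serbia", "Ukraine"),
  X2018 = c(NA, 320), X2019 = c(NA, 340), X2020 = c(NA, 320),
  X2021 = c(50, 350), X2022 = c(50, NA)
)
filled <- impute_row_mean(toy, years)
print(filled)
```

Applied to these two rows, it fills Serbia’s 2018-2020 gaps with 50 and Ukraine’s 2022 gap with the mean of its four observed years, 332.5.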
head(instant_noodles_df)
## Country.Region X2018 X2019 X2020 X2021 X2022 Rank CCA3 Country.Territory
## 1 China 40250 41450 46360 43990 45070 1 CHN China
## 2 Indonesia 12540 12520 12640 13270 14260 4 IDN Indonesia
## 3 India 6060 6730 6730 7560 7580 2 IND India
## 4 Japan 5780 5630 5970 5850 5980 11 JPN Japan
## 5 Philippines 3980 3850 4470 4440 4290 13 PHL Philippines
## 6 South Korea 3820 3900 4130 3790 3950 29 KOR South Korea
## Capital Continent X2022.Population
## 1 Beijing Asia 1425887337
## 2 Jakarta Asia 275501339
## 3 New Delhi Asia 1417173173
## 4 Tokyo Asia 123951692
## 5 Manila Asia 115559009
## 6 Seoul Asia 51815810
tail(instant_noodles_df)
## Country.Region X2018 X2019 X2020 X2021 X2022 Rank CCA3 Country.Territory
## 48 Denmark 20 20 10 10 20 115 DNK Denmark
## 49 Finland 10 20 20 20 20 118 FIN Finland
## 50 Switzerland 10 10 10 10 20 101 CHE Switzerland
## 51 Argentina 10 10 0 20 10 33 ARG Argentina
## 52 Costa Rica 10 10 10 20 10 124 CRI Costa Rica
## 53 Ukraine 320 340 320 350 NA 38 UKR Ukraine
## Capital Continent X2022.Population
## 48 Copenhagen Europe 5882261
## 49 Helsinki Europe 5540745
## 50 Bern Europe 8740472
## 51 Buenos Aires South America 45510318
## 52 San José North America 5180829
## 53 Kiev Europe 39701739
As the first step, let’s verify the information for Serbia.
target_row_S <- instant_noodles_df$Country.Region == "Serbia"
target_columns_S <- c("X2018","X2019","X2020", "X2021","X2022")
print(instant_noodles_df[target_row_S, target_columns_S])
## X2018 X2019 X2020 X2021 X2022
## 43 NA NA NA 50 50
As both available values (2021 and 2022) equal 50, the missing years 2018-2020 are filled with this value.
instant_noodles_df <- instant_noodles_df %>%
mutate(
X2018 = ifelse(row_number() == 43, coalesce(X2018, 50), X2018),
X2019 = ifelse(row_number() == 43, coalesce(X2019, 50), X2019),
X2020 = ifelse(row_number() == 43, coalesce(X2020, 50), X2020)
)
Now, let’s verify information for Ukraine.
target_row_U <- instant_noodles_df$Country.Region == "Ukraine"
target_columns_U <- c("X2018","X2019","X2020", "X2021","X2022")
print(instant_noodles_df[target_row_U, target_columns_U])
## X2018 X2019 X2020 X2021 X2022
## 53 320 340 320 350 NA
As only the 2022 value is missing, it is replaced with the mean of the four available years (2018-2021).
# Combine the values with c(): mean(320, 340, 320, 350) would average only the first value
mean_U <- mean(c(320, 340, 320, 350))
instant_noodles_df$X2022[target_row_U] <- mean_U
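In R, `mean()` averages only its first argument; further positional arguments are interpreted as `trim` and `na.rm`. The small check below, plain arithmetic on the Ukraine values, shows why the four values must be combined with `c()` before averaging.

```r
# Correct: combine the four observed years into one vector first
correct <- mean(c(320, 340, 320, 350))   # (320 + 340 + 320 + 350) / 4

# Pitfall: only the first argument (320) is averaged. 340 is taken as `trim`,
# and because trim >= 0.5 the result is median(320), i.e. 320.
pitfall <- mean(320, 340, 320, 350)

print(c(correct = correct, pitfall = pitfall))
```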
With the missing values imputed, the next step is to select the numerical variables relevant to this study.
instant_noodles <- select(instant_noodles_df, -c(Rank, CCA3, Country.Territory, Capital))
# Keep only the yearly consumption columns (X2018 to X2022)
instant_noodles_numeric <- instant_noodles[, c("X2018", "X2019", "X2020", "X2021", "X2022")]
str(instant_noodles_numeric)
## 'data.frame': 53 obs. of 5 variables:
## $ X2018: num 40250 12540 6060 5780 3980 ...
## $ X2019: num 41450 12520 6730 5630 3850 ...
## $ X2020: num 46360 12640 6730 5970 4470 ...
## $ X2021: int 43990 13270 7560 5850 4440 3790 3630 2850 2620 2100 ...
## $ X2022: num 45070 14260 7580 5980 4290 ...
instant_noodles_numeric$X2021 <- as.numeric(instant_noodles_numeric$X2021)
Since X2021 was stored as an integer, it is converted so that all five columns are numeric for the subsequent operations.
Principal Component Analysis is a dimensionality reduction technique that transforms the original variables into a new set of uncorrelated variables, known as principal components. This method allows for simplifying the dataset while retaining the most important information, making it suitable for identifying patterns and trends in instant noodle consumption across different dimensions.
In the context of instant noodle consumption, PCA is applied here to examine how the consumption values of the different years relate to one another.
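To make the transformation concrete, the following sketch computes principal components directly from the eigendecomposition of the covariance matrix and compares the result with `prcomp()`, which centres the variables but, by default, does not scale them (as in this study). The data are simulated for illustration only: `x` is a made-up 30-by-3 matrix, not the noodle dataset.

```r
set.seed(42)
# Simulated data: 30 observations of 3 variables, two of them correlated
x <- matrix(rnorm(90), ncol = 3)
x[, 2] <- x[, 1] + rnorm(30, sd = 0.3)

# PCA by hand: eigendecomposition of the covariance matrix of the centred data
xc <- scale(x, center = TRUE, scale = FALSE)
e <- eigen(cov(xc), symmetric = TRUE)   # eigenvalues in decreasing order
loadings <- e$vectors                   # columns = principal directions
scores <- xc %*% loadings               # coordinates on the new axes

# The same analysis with prcomp() (centred, unscaled by default)
p <- prcomp(x)
print(round(cbind(by_hand = sqrt(e$values), prcomp = p$sdev), 4))

# The new variables are uncorrelated: off-diagonal covariances are ~0
print(round(cov(scores), 6))
```

The standard deviations agree, and the loadings agree up to sign, which is arbitrary for principal components.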
pca <- prcomp(instant_noodles_numeric)
pca$rotation
## PC1 PC2 PC3 PC4 PC5
## X2018 -0.4153831 0.12463223 -0.7419234 0.3072575 0.4087373
## X2019 -0.4268055 0.05931286 -0.2983563 -0.7232140 -0.4497379
## X2020 -0.4728055 -0.83377411 0.1539094 0.2143048 -0.1079864
## X2021 -0.4529543 0.18472313 0.5025249 -0.3207473 0.6366315
## X2022 -0.4653830 0.50164419 0.2903684 0.4834760 -0.4622867
summary(pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.396e+04 225.35105 149.30117 63.19075 46.54961
## Proportion of Variance 9.996e-01 0.00026 0.00011 0.00002 0.00001
## Cumulative Proportion 9.996e-01 0.99985 0.99997 0.99999 1.00000
Standard Deviation: Spread of the data along each principal component. A higher standard deviation means that the data points are more dispersed along that component.
Proportion of Variance: The share of the total variance in the data explained by each principal component. A component with a high proportion captures most of the information in the data, while the subsequent components contribute progressively less to the overall variance.
Cumulative Proportion: This is the cumulative sum of the proportions of variance.
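These three quantities are linked by simple arithmetic: each component’s proportion of variance is its variance (the squared standard deviation) divided by the sum over all components, and the cumulative proportion is the running total. The sketch below plugs in the standard deviations printed by summary(pca); PC1 enters at its rounded display value, 1.396e+04, so the results agree with the table only to the displayed precision.

```r
# Standard deviations of PC1..PC5 as printed by summary(pca)
sdev <- c(1.396e4, 225.35105, 149.30117, 63.19075, 46.54961)

prop_var <- sdev^2 / sum(sdev^2)   # proportion of variance per component
cum_prop <- cumsum(prop_var)       # cumulative proportion

print(round(rbind(prop_var, cum_prop), 5))
```

Rounded, the two rows agree with the Proportion of Variance and Cumulative Proportion rows above (for example 0.99959 for PC1 and a cumulative 0.99985 after PC2).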
fviz_pca_var(pca, col.var = "purple")
PC1 dominates the variance, explaining 99.96% of it. Its loadings are nearly equal for all five years (between -0.42 and -0.47), so PC1 essentially represents a country’s overall level of consumption; because prcomp() is run here on unscaled data, the large differences between countries (e.g. China compared with Denmark) dominate this component. The subsequent principal components capture smaller, more specific variations associated with individual years; for example, PC2 contrasts 2020 (loading -0.83) with 2022 (loading 0.50).
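The interpretation of PC1 as an overall consumption level can be illustrated on simulated data. The sketch below assumes hypothetical countries whose consumption differs by orders of magnitude (roughly the 10 to 45,000 range in the table) but changes by only a few percent from year to year; `level` and `consumption` are made-up values, not the noodle dataset.

```r
set.seed(2022)
n <- 50
# Hypothetical overall consumption level per country, spanning 10 to ~40,000
level <- 10^runif(n, min = 1, max = 4.6)
# Five years, each within about 5% of the country's level
consumption <- sapply(1:5, function(j) level * (1 + rnorm(n, sd = 0.05)))
colnames(consumption) <- paste0("X", 2018:2022)

p <- prcomp(consumption)                     # centred, unscaled, as in the study
pc1_share <- p$sdev[1]^2 / sum(p$sdev^2)     # proportion of variance of PC1
# Correlation between PC1 scores and each country's average consumption
pc1_vs_mean <- cor(p$x[, 1], rowMeans(consumption))

print(round(p$rotation[, 1], 3))             # loadings: nearly equal across years
print(c(pc1_share = pc1_share, abs_cor = abs(pc1_vs_mean)))
```

Under these assumptions PC1 again carries almost all of the variance, its loadings are nearly equal, and its scores track the countries’ average consumption almost perfectly (up to the arbitrary sign).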
library(pdp)
pca_var <- get_pca_var(pca)
fviz_contrib(pca, "var", axes = 1:5, fill = "lightblue", color = "tomato")
Based on the PCA results and the contributions of each year (2018 to 2022) across all five principal components, 2020 stands out as the most influential year, contributing over 20%. This indicates that the patterns and variations captured by the components are strongly influenced by the 2020 consumption values. The year 2022 follows closely, contributing slightly less than 2020 but still exceeding the 20% threshold, which is the contribution expected if all five years contributed equally (100% / 5). The contribution of 2021 is just above 20%, indicating a moderate influence: while not as dominant as 2020 or 2022, 2021 still contributes substantially to the observed patterns. The years 2019 and 2018 have a lower impact, with contributions below 20% and 2019 slightly ahead of 2018. Because PC1 accounts for almost all of the variance, these contributions essentially mirror the squared PC1 loadings. The diminishing contributions of the earlier years suggest that consumption patterns may have evolved over time.
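The percentages in the contribution plot can be approximately reproduced from the printed loadings and standard deviations. As far as we understand factoextra’s behaviour (an assumption worth checking against its documentation), a variable’s contribution to a single component is its squared loading times 100, and for several axes the per-component contributions are averaged using the components’ eigenvalues as weights. The sketch below uses the rounded values printed above.

```r
# Loadings (pca$rotation) and standard deviations (summary(pca)) as printed above
rotation <- matrix(c(
  -0.4153831,  0.12463223, -0.7419234,  0.3072575,  0.4087373,
  -0.4268055,  0.05931286, -0.2983563, -0.7232140, -0.4497379,
  -0.4728055, -0.83377411,  0.1539094,  0.2143048, -0.1079864,
  -0.4529543,  0.18472313,  0.5025249, -0.3207473,  0.6366315,
  -0.4653830,  0.50164419,  0.2903684,  0.4834760, -0.4622867),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("X", 2018:2022), paste0("PC", 1:5)))
sdev <- c(1.396e4, 225.35105, 149.30117, 63.19075, 46.54961)
eig <- sdev^2                            # eigenvalue (variance) of each component

contrib_per_pc <- rotation^2 * 100       # % contribution of each year to each PC
# Combined contribution over PC1-PC5, weighting each PC by its eigenvalue
contrib <- drop(contrib_per_pc %*% eig) / sum(eig)
expected <- 100 / nrow(rotation)         # uniform contribution: 20% for 5 years

print(round(sort(contrib, decreasing = TRUE), 2))
```

The ordering (2020, 2022, 2021, 2019, 2018) and the position of each year relative to the 20% line match the description above.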
CLARA (Clustering Large Applications) is a clustering algorithm designed to handle large datasets efficiently. It draws a representative sample from the dataset, performs k-medoids clustering on the sample, and then assigns the remaining observations to the clusters found in the sample. The procedure is repeated for several samples, and the set of medoids with the lowest overall dissimilarity is kept. This approach makes it suitable for large-scale applications where traditional clustering methods might be computationally expensive.
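The sampling idea can be sketched in a few lines. This is a simplified illustration, not a reimplementation of `cluster::clara()`: `clara_sketch()` is a hypothetical function, it uses Euclidean distance, and it relies on `cluster::pam()` for the k-medoids step on each sample. The default sample size mirrors `clara()`’s documented default of `min(n, 40 + 2k)`, and the two-group data in the usage example are simulated.

```r
library(cluster)  # pam(); ships with R as a recommended package

# Hypothetical minimal CLARA: run PAM on several random samples, assign every
# observation to its nearest sample medoid, and keep the sample whose medoids
# give the lowest total distance over the whole dataset.
clara_sketch <- function(x, k, samples = 5, sampsize = min(nrow(x), 40 + 2 * k)) {
  x <- as.matrix(x)
  best <- NULL
  for (s in seq_len(samples)) {
    idx <- sample.int(nrow(x), sampsize)            # random subset of rows
    fit <- pam(x[idx, , drop = FALSE], k)           # k-medoids on the subset
    medoids <- x[idx[fit$id.med], , drop = FALSE]   # medoid coordinates
    # Euclidean distance from every observation to every medoid (n x k)
    d <- sapply(seq_len(k), function(j) sqrt(rowSums(sweep(x, 2, medoids[j, ])^2)))
    d <- matrix(d, ncol = k)
    cost <- sum(apply(d, 1, min))                   # total distance to nearest medoid
    if (is.null(best) || cost < best$cost) {
      best <- list(medoids = medoids,
                   clustering = max.col(-d, ties.method = "first"),
                   cost = cost)
    }
  }
  best
}

# Usage: two well-separated simulated groups of 50 points each
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2))
res <- clara_sketch(x, k = 2)
print(table(res$clustering))
```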
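library(cluster)
library(factoextra)
# Average silhouette width of CLARA solutions with 2 to 10 clusters,
# using only the yearly consumption columns (as in the k-means analysis)
for (i in 2:10) {
  cl <- clara(instant_noodles_numeric, i)
  print(paste('Average silhouette for', i, 'clusters:', cl$silinfo$avg.width))
}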
## [1] "Average silhouette for 2 clusters: 0.934338067785307"
## [1] "Average silhouette for 3 clusters: 0.64989627607545"
## [1] "Average silhouette for 4 clusters: 0.571684061013459"
## [1] "Average silhouette for 5 clusters: 0.612482760382871"
## [1] "Average silhouette for 6 clusters: 0.56930648529619"
## [1] "Average silhouette for 7 clusters: 0.57922499591618"
## [1] "Average silhouette for 8 clusters: 0.623826391890962"
## [1] "Average silhouette for 9 clusters: 0.653681379553354"
## [1] "Average silhouette for 10 clusters: 0.650406448529437"
The average silhouette width is a measure of how well-separated
clusters are in a clustering solution. It ranges from -1 to 1, where a
higher value indicates better-defined clusters.
In this case, the silhouette analysis suggests that a 2-cluster solution has a very high average silhouette width of 0.93, indicating well-defined and well-separated clusters. With 3 clusters the width drops to 0.65: separation is still good, but less distinct than with 2 clusters. For 4-7 clusters the width falls further, to between 0.57 and 0.61, indicating some overlap or less distinct clusters. For 8-10 clusters it rises slightly to 0.62-0.65 and then stabilizes, suggesting that additional clusters do not substantially improve separation. None of the larger solutions comes close to the 2-cluster value.
The Silhouette Method is a technique for choosing the optimal number of clusters in a dataset. The idea is to calculate the average silhouette width for different numbers of clusters and choose the number that maximizes this score. It can be applied to various clustering algorithms, such as K-Means.
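The silhouette width itself is straightforward to compute. The sketch below implements the standard definition in base R; `silhouette_widths()` is a hypothetical helper (in practice `cluster::silhouette()` does this), it assumes at least two clusters, and the four-point example is made up for illustration.

```r
# Silhouette width of observation i:
#   a(i) = mean distance to the other members of its own cluster
#   b(i) = smallest mean distance to the members of any other cluster
#   s(i) = (b(i) - a(i)) / max(a(i), b(i)), ranging from -1 to 1
silhouette_widths <- function(d, cl) {
  d <- as.matrix(d)
  sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)             # singleton cluster: s(i) = 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)     # d[i, i] = 0 is excluded from the mean
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(d[i, cl == k])))
    (b - a) / max(a, b)
  })
}

# Toy example: two tight, well-separated groups on a line
x <- c(0, 1, 10, 11)
cl <- c(1, 1, 2, 2)
s <- silhouette_widths(dist(x), cl)
print(round(s, 4))
print(mean(s))   # average silhouette width
```

Both groups are tight and far apart, so every point gets a width close to 1, about 0.90 on average. The Silhouette Method repeats this average for each candidate number of clusters.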
set.seed(12345)
d <- dist(instant_noodles_numeric)   # pairwise Euclidean distances, computed once
silhouette_values <- numeric(10)
for (k in 2:10) {
  kmeans_model <- kmeans(instant_noodles_numeric, centers = k)
  # silhouette() returns one row per observation; average its sil_width column
  sil <- silhouette(kmeans_model$cluster, d)
  silhouette_values[k] <- mean(sil[, "sil_width"])
}
plot(2:10, silhouette_values[2:10], type = "b", main = "Silhouette Method",
xlab = "Number of Clusters (k)", ylab = "Average Silhouette Score")
Based on the plot, the optimal number of clusters is the one with the highest average silhouette score. In the context of instant noodle consumption, the Silhouette Method suggests that the dataset could be grouped into more clusters, each representing a distinct pattern of instant noodle consumption. Such a segmentation could be driven by various factors, such as geographical location, economic conditions, or cultural preferences; note, however, that the clustering itself uses only the 2018-2022 consumption values, so these factors would have to be examined separately.
The study examined the global consumption patterns of instant noodles, providing insights into the economic and societal implications of this popular food item. The relevance of instant noodle consumption in economics was highlighted, emphasizing its role as an economic indicator and its impact on various aspects of well-being. Principal Component Analysis (PCA) was used to identify trends in instant noodle consumption over the years studied. The results showed that 2020 contributed most to the variance in consumption, followed by 2022 and 2021, giving a more nuanced view of how instant noodle consumption has evolved. CLARA and the silhouette method were applied to select the optimal number of clusters; clustering consumption data in this way can reveal distinct patterns, trends, and consumer behaviors.
The overall findings emphasize the importance of recent years in shaping consumption trends, suggesting potential shifts in dietary preferences and economic factors. The observed trends in instant noodle consumption could indicate broader lifestyle changes, reflecting a preference for quick, convenient, and appetizing food options. The claim that these mass-produced foods help consumers make the most of limited time aligns with the fast-paced lifestyle that many people lead. The convenience and time-saving aspects of instant noodles make them an attractive choice, especially for students and people with busy schedules (Tran, Nguyen, 2015). These findings open avenues for future research on the connections between food consumption, economic indicators, and societal well-being.
Huh I.S., Kim H., Jo H.K., Lim C.S., Kim J.S., Kim S.J., Kwon O., Oh B., Chang N. (2017). Instant noodle consumption is associated with cardiometabolic risk factors among college students in Seoul. Nutrition Research and Practice.

Park J., Lee J.S., Jang Y.A., Chung H.R., Kim J. (2011). A comparison of food and nutrient intake between instant noodle consumers and non-instant noodle consumers in Korean adults. Nutrition Research and Practice.

Tran H., Nguyen T. (2015). The effect of instant foods. Academia.