Open Food Facts Dimension Reduction

Objectives

This work applies dimension reduction to the Open Food Facts data and then performs clustering.

#install.packages('smacof')
#install.packages('psych')
#install.packages('mice')
#install.packages('clusterSim')

# Load necessary libraries
library(tidyverse)
library(cluster)
library(factoextra)
library(psych)
library(smacof)
library(gridExtra)
library(mice)
library(dplyr)
library(clusterSim)

Data - Open Food Facts

# Load the data
open_food_facts <- read.csv("trimmed_open_food_facts_dataset.csv", sep = ",")
head(open_food_facts)
##                  product_name          brands manufacturing_places
## 1 Roasted & Salted Pistachios Paramount Farms                     
## 2                   Panettone            Coop                     
## 3   Creme legere semi epaisse       Carrefour                     
## 4     Organic Cheese Crackers     Full Circle                     
## 5            Miel de Montagne        Eric Bur                     
## 6        Gummies, Peach Rings          Kroger                     
##   purchase_places allergens traces additives_n ingredients_from_palm_oil_n
## 1                                            0                           0
## 2                                           NA                          NA
## 3                                            0                           0
## 4                                            3                           0
## 5     Lyon,France                            0                           0
## 6                                            7                           0
##   energy.from.fat_100g trans.fat_100g sugars_100g fiber_100g proteins_100g
## 1                   NA              0         3.9        5.2         11.69
## 2                   NA             NA        29.0        2.0          7.00
## 3                   NA             NA         3.1         NA          2.80
## 4                   NA              0         0.0        3.3          6.67
## 5                   NA             NA        80.0        0.0          0.40
## 6                   NA              0        50.0        0.0          5.00
##   sodium_100g alcohol_100g fruits.vegetables.nuts_100g cocoa_100g
## 1 0.286000000           NA                          NA         NA
## 2 0.157480315           NA                          NA         NA
## 3 0.055118110           NA                          NA         NA
## 4 0.900000000           NA                          NA         NA
## 5 0.002362205           NA                          NA         NA
## 6 0.012000000           NA                          NA         NA
##   nutrition_grade_fr
## 1                  a
## 2                  d
## 3                  d
## 4                  d
## 5                  d
## 6                  d
str(open_food_facts)
## 'data.frame':    1500 obs. of  18 variables:
##  $ product_name               : chr  "Roasted & Salted Pistachios" "Panettone" "Creme legere semi epaisse" "Organic Cheese Crackers" ...
##  $ brands                     : chr  "Paramount Farms" "Coop" "Carrefour" "Full Circle" ...
##  $ manufacturing_places       : chr  "" "" "" "" ...
##  $ purchase_places            : chr  "" "" "" "" ...
##  $ allergens                  : chr  "" "" "" "" ...
##  $ traces                     : chr  "" "" "" "" ...
##  $ additives_n                : num  0 NA 0 3 0 7 0 5 1 NA ...
##  $ ingredients_from_palm_oil_n: num  0 NA 0 0 0 0 0 0 0 NA ...
##  $ energy.from.fat_100g       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ trans.fat_100g             : num  0 NA NA 0 NA 0 NA NA 0 NA ...
##  $ sugars_100g                : num  3.9 29 3.1 0 80 50 30 1.5 2.27 4 ...
##  $ fiber_100g                 : num  5.2 2 NA 3.3 0 0 NA 2 2.3 NA ...
##  $ proteins_100g              : num  11.69 7 2.8 6.67 0.4 ...
##  $ sodium_100g                : num  0.286 0.15748 0.05512 0.9 0.00236 ...
##  $ alcohol_100g               : num  NA NA NA NA NA NA NA 0 NA NA ...
##  $ fruits.vegetables.nuts_100g: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cocoa_100g                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nutrition_grade_fr         : chr  "a" "d" "d" "d" ...
summary(open_food_facts)
##  product_name          brands          manufacturing_places purchase_places   
##  Length:1500        Length:1500        Length:1500          Length:1500       
##  Class :character   Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character   Mode  :character     Mode  :character  
##                                                                               
##                                                                               
##                                                                               
##                                                                               
##   allergens            traces           additives_n    
##  Length:1500        Length:1500        Min.   : 0.000  
##  Class :character   Class :character   1st Qu.: 0.000  
##  Mode  :character   Mode  :character   Median : 1.000  
##                                        Mean   : 2.041  
##                                        3rd Qu.: 3.000  
##                                        Max.   :22.000  
##                                        NA's   :146     
##  ingredients_from_palm_oil_n energy.from.fat_100g trans.fat_100g  
##  Min.   :0.00000             Min.   :  0.0        Min.   :0.0000  
##  1st Qu.:0.00000             1st Qu.: 65.0        1st Qu.:0.0000  
##  Median :0.00000             Median :130.0        Median :0.0000  
##  Mean   :0.03323             Mean   :193.3        Mean   :0.0251  
##  3rd Qu.:0.00000             3rd Qu.:290.0        3rd Qu.:0.0000  
##  Max.   :2.00000             Max.   :450.0        Max.   :4.5500  
##  NA's   :146                 NA's   :1497         NA's   :746     
##   sugars_100g       fiber_100g    proteins_100g     sodium_100g     
##  Min.   :  0.00   Min.   : 0.00   Min.   : 0.000   Min.   : 0.0000  
##  1st Qu.:  1.23   1st Qu.: 0.00   1st Qu.: 1.820   1st Qu.: 0.0330  
##  Median :  5.00   Median : 1.40   Median : 5.710   Median : 0.2362  
##  Mean   : 14.92   Mean   : 2.79   Mean   : 7.834   Mean   : 0.4476  
##  3rd Qu.: 23.00   3rd Qu.: 3.60   3rd Qu.:11.000   3rd Qu.: 0.5118  
##  Max.   :100.00   Max.   :51.60   Max.   :55.000   Max.   :23.7500  
##  NA's   :1        NA's   :254     NA's   :1        NA's   :1        
##   alcohol_100g  fruits.vegetables.nuts_100g   cocoa_100g    nutrition_grade_fr
##  Min.   :0      Min.   :  0.00              Min.   :28.00   Length:1500       
##  1st Qu.:0      1st Qu.:  7.75              1st Qu.:29.50   Class :character  
##  Median :0      Median : 45.00              Median :38.50   Mode  :character  
##  Mean   :0      Mean   : 42.19              Mean   :43.75                     
##  3rd Qu.:0      3rd Qu.: 53.75              3rd Qu.:52.75                     
##  Max.   :0      Max.   :100.00              Max.   :70.00                     
##  NA's   :1489   NA's   :1486                NA's   :1496
dim(open_food_facts)
## [1] 1500   18
# Drop columns in which 90% or more of the values are missing (NA)
data_cleaned <- open_food_facts[, colSums(is.na(open_food_facts)) / nrow(open_food_facts) < 0.9]
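
The filter above can be checked on a small example. The sketch below uses a made-up data frame (`toy` and its column names are hypothetical, not part of the dataset) and `colMeans(is.na(...))`, which is equivalent to dividing `colSums` by `nrow`. It also illustrates a caveat that appears to apply here: empty strings are not `NA`, so text columns that are blank in most rows are not removed by this rule.

```r
# Toy data frame with hypothetical values (not taken from the Open Food Facts file)
toy <- data.frame(
  all_missing    = rep(NA_real_, 10),                    # 100% NA
  partly_missing = c(1, NA, 3, 4, NA, 6, 7, 8, 9, 10),   # 20% NA
  empty_strings  = rep("", 10),                          # 0% NA: "" is not NA
  stringsAsFactors = FALSE
)

# Share of missing values in each column
na_fraction <- colMeans(is.na(toy))
print(na_fraction)

# Keep only the columns whose NA share is strictly below 90%
toy_cleaned <- toy[, na_fraction < 0.9, drop = FALSE]
print(names(toy_cleaned))
```

If blank text fields should count as missing, one option is to convert them to `NA` before filtering; whether that is appropriate depends on how those columns are used later.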

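The imputation step that follows relies on predictive mean matching. As a rough illustration of the idea, the base R sketch below imputes a single variable from one predictor. It is a simplification under stated assumptions: the function `pmm_sketch`, its `donors` argument, and the toy vectors are hypothetical, and the real `mice` implementation additionally draws the regression coefficients at random and cycles over all incomplete variables.

```r
# Simplified predictive mean matching for one variable (illustrative only;
# this is not the algorithm exactly as implemented in mice).
pmm_sketch <- function(y, x, donors = 5) {
  observed <- !is.na(y)
  fit <- lm(y ~ x, subset = observed)             # regression on the observed cases
  predicted <- predict(fit, newdata = data.frame(x = x))
  y_imputed <- y
  for (i in which(!observed)) {
    # Gap between this case's prediction and the prediction of each observed case
    gap <- abs(predicted[observed] - predicted[i])
    donor_pool <- order(gap)[seq_len(min(donors, sum(observed)))]
    # Borrow the actually observed value of one randomly chosen donor
    pick <- donor_pool[sample.int(length(donor_pool), 1)]
    y_imputed[i] <- y[observed][pick]
  }
  y_imputed
}

# Toy example with made-up numbers
set.seed(486)
x <- 1:10
y <- c(2.1, 3.9, NA, 8.2, 9.8, NA, 14.1, 16.0, 18.2, NA)
result <- pmm_sketch(y, x, donors = 3)
print(result)
```

Because every imputed value is copied from an observed case, the imputations cannot fall outside the observed range, which is the property the comment below refers to.
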
# Impute the remaining missing values
# mice() performs multiple imputation, filling in the missing values of data_cleaned (m = 5 imputed datasets)
# Method "pmm" (predictive mean matching) predicts each missing value with a regression model,
# then replaces it with an observed value whose prediction is closest, so imputed values stay within the range of the observed data
imputed_data <- mice(data_cleaned, method = "pmm", m = 5, maxit = 50, seed = 486)
## 
##  iter imp variable
##   1   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   1   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   1   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   1   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   1   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   2   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   2   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   2   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   2   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   2   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   3   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   3   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   3   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   3   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   3   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   4   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   4   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   4   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   4   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   4   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   5   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   5   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   5   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   5   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   5   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   6   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   6   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   6   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   6   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   6   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   7   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   7   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   7   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   7   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   7   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   8   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   8   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   8   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   8   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   8   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   9   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   9   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   9   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   9   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   9   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   10   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   10   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   10   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   10   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   10   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   11   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   11   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   11   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   11   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   11   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   12   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   12   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   12   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   12   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   12   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   13   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   13   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   13   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   13   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   13   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   14   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   14   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   14   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   14   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   14   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   15   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   15   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   15   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   15   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   15   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   16   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   16   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   16   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   16   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   16   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   17   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   17   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   17   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   17   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   17   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   18   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   18   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   18   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   18   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   18   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   19   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   19   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   19   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   19   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   19   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   20   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   20   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   20   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   20   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   20   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   21   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   21   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   21   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   21   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   21   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   22   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   22   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   22   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   22   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   22   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   23   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   23   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   23   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   23   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   23   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   24   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   24   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   24   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   24   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   24   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   25   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   25   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   25   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   25   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   25   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   26   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   26   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   26   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   26   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   26   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   27   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   27   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   27   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   27   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   27   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   28   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   28   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   28   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   28   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   28   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   29   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   29   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   29   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   29   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   29   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   30   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   30   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   30   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   30   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   30   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   31   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   31   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   31   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   31   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   31   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   32   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   32   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   32   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   32   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   32   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   33   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   33   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   33   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   33   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   33   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   34   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   34   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   34   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   34   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   34   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   35   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   35   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   35   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   35   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   35   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   36   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   36   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   36   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   36   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   36   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   37   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   37   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   37   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   37   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   37   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   38   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   38   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   38   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   38   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   38   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   39   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   39   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   39   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   39   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   39   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   40   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   40   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   40   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   40   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   40   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   41   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   41   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   41   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   41   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   41   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   42   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   42   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   42   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   42   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   42   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   43   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   43   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   43   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   43   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   43   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   44   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   44   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   44   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   44   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   44   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   45   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   45   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   45   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   45   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   45   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   46   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   46   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   46   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   46   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   46   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   47   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   47   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   47   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   47   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   47   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   48   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   48   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   48   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   48   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   48   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   49   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   49   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   49   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   49   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   49   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   50   1  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   50   2  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   50   3  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   50   4  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
##   50   5  additives_n  ingredients_from_palm_oil_n  trans.fat_100g  sugars_100g  fiber_100g  proteins_100g  sodium_100g
## Warning: Number of logged events: 7
# Extract the first completed dataset, with the missing values filled in
data_complete <- mice::complete(imputed_data)

# Check variable types
sapply(data_complete, class)
##                product_name                      brands 
##                 "character"                 "character" 
##        manufacturing_places             purchase_places 
##                 "character"                 "character" 
##                   allergens                      traces 
##                 "character"                 "character" 
##                 additives_n ingredients_from_palm_oil_n 
##                   "numeric"                   "numeric" 
##              trans.fat_100g                 sugars_100g 
##                   "numeric"                   "numeric" 
##                  fiber_100g               proteins_100g 
##                   "numeric"                   "numeric" 
##                 sodium_100g          nutrition_grade_fr 
##                   "numeric"                 "character"
# Convert character columns to numeric codes (character -> factor -> integer level index)
data_complete <- data_complete %>%
  mutate_if(is.character, as.factor) %>%
  mutate_if(is.factor, as.numeric)

# To view the summary of the data
summary(data_complete)
##   product_name        brands       manufacturing_places purchase_places
##  Min.   :   1.0   Min.   :   1.0   Min.   :  1.000      Min.   :  1.0  
##  1st Qu.: 356.8   1st Qu.: 251.0   1st Qu.:  1.000      1st Qu.:  1.0  
##  Median : 723.5   Median : 557.0   Median :  1.000      Median :  1.0  
##  Mean   : 723.7   Mean   : 561.3   Mean   :  7.663      Mean   : 13.5  
##  3rd Qu.:1089.2   3rd Qu.: 864.2   3rd Qu.:  1.000      3rd Qu.:  1.0  
##  Max.   :1458.0   Max.   :1164.0   Max.   :111.000      Max.   :131.0  
##    allergens          traces         additives_n    
##  Min.   :  1.00   Min.   :  1.000   Min.   : 0.000  
##  1st Qu.:  1.00   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :  1.00   Median :  1.000   Median : 1.000  
##  Mean   : 12.82   Mean   :  5.603   Mean   : 2.003  
##  3rd Qu.:  1.00   3rd Qu.:  1.000   3rd Qu.: 3.000  
##  Max.   :175.00   Max.   :108.000   Max.   :22.000  
##  ingredients_from_palm_oil_n trans.fat_100g     sugars_100g     
##  Min.   :0.00000             Min.   :0.00000   Min.   :  0.000  
##  1st Qu.:0.00000             1st Qu.:0.00000   1st Qu.:  1.245  
##  Median :0.00000             Median :0.00000   Median :  5.000  
##  Mean   :0.03333             Mean   :0.02543   Mean   : 14.938  
##  3rd Qu.:0.00000             3rd Qu.:0.00000   3rd Qu.: 23.020  
##  Max.   :2.00000             Max.   :4.55000   Max.   :100.000  
##    fiber_100g     proteins_100g     sodium_100g      nutrition_grade_fr
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000   Min.   :1.000     
##  1st Qu.: 0.000   1st Qu.: 1.830   1st Qu.: 0.0330   1st Qu.:2.000     
##  Median : 1.400   Median : 5.705   Median : 0.2362   Median :3.000     
##  Mean   : 2.729   Mean   : 7.832   Mean   : 0.4476   Mean   :3.212     
##  3rd Qu.: 3.600   3rd Qu.:11.000   3rd Qu.: 0.5118   3rd Qu.:4.000     
##  Max.   :51.600   Max.   :55.000   Max.   :23.7500   Max.   :5.000
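The conversion above replaces each text value by the index of its factor level. A minimal sketch of that step on a hypothetical toy vector (the values below are made up for illustration and are not taken from the dataset):

```r
# Hypothetical toy vector standing in for a text column such as purchase_places
places <- c("", "Lyon,France", "", "Paris", "")

# Same two-step conversion as above: character -> factor -> integer code
codes <- as.numeric(as.factor(places))

levels(as.factor(places))  # levels are sorted, so the empty string comes first
codes                      # the empty string is coded as 1
```

This is consistent with the medians of 1 reported for manufacturing_places, purchase_places, allergens and traces, where most entries appear to be empty. The codes follow the sort order of the labels, so distances between codes carry no nutritional meaning.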
# Confirm that no NAs remain
any(is.na(data_complete))
## [1] FALSE
# Standardize each column for comparability (type "n1": subtract the mean, divide by the standard deviation)
data_normalized <- data.Normalization(data_complete, type = "n1", normalization = "column")
str(data_normalized)
## 'data.frame':    1500 obs. of  14 variables:
##  $ product_name               : num  1.039 0.615 -0.833 0.49 0.241 ...
##  $ brands                     : num  0.681 -0.987 -1.169 -0.51 -0.677 ...
##  $ manufacturing_places       : num  -0.331 -0.331 -0.331 -0.331 -0.331 ...
##  $ purchase_places            : num  -0.436 -0.436 -0.436 -0.436 2.282 ...
##  $ allergens                  : num  -0.335 -0.335 -0.335 -0.335 -0.335 ...
##  $ traces                     : num  -0.268 -0.268 -0.268 -0.268 -0.268 ...
##  $ additives_n                : num  -0.75 -0.376 -0.75 0.373 -0.75 ...
##  $ ingredients_from_palm_oil_n: num  -0.178 -0.178 -0.178 -0.178 -0.178 ...
##  $ trans.fat_100g             : num  -0.0834 -0.0834 -0.0834 -0.0834 -0.0834 ...
##  $ sugars_100g                : num  -0.557 0.71 -0.597 -0.754 3.283 ...
##  $ fiber_100g                 : num  0.546 -0.161 -0.183 0.126 -0.603 ...
##  $ proteins_100g              : num  0.484 -0.104 -0.632 -0.146 -0.933 ...
##  $ sodium_100g                : num  -0.129 -0.231 -0.313 0.361 -0.355 ...
##  $ nutrition_grade_fr         : num  -1.643 0.585 0.585 0.585 0.585 ...
##  - attr(*, "normalized:shift")= Named num [1:14] 723.71 561.35 7.66 13.5 12.82 ...
##   ..- attr(*, "names")= chr [1:14] "product_name" "brands" "manufacturing_places" "purchase_places" ...
##  - attr(*, "normalized:scale")= Named num [1:14] 424.8 345.9 20.1 28.7 35.2 ...
##   ..- attr(*, "names")= chr [1:14] "product_name" "brands" "manufacturing_places" "purchase_places" ...
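The standardization can be reproduced by hand. The sketch below uses base R only and a hypothetical toy column; it assumes that type "n1" means (x - mean) / sd, with the mean and standard deviation playing the role of the "normalized:shift" and "normalized:scale" attributes shown above.

```r
# Hypothetical toy column; any numeric vector works
x <- c(3.9, 29.0, 3.1, 0.0, 80.0)

shift <- mean(x)    # analogous to the "normalized:shift" attribute
scale_sd <- sd(x)   # analogous to the "normalized:scale" attribute
z <- (x - shift) / scale_sd

# Base R's scale() performs the same centring and scaling
z_base <- as.numeric(scale(x))

all.equal(z, z_base)
mean(z)   # numerically zero
sd(z)     # one
```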

Dimensional Reduction - PCA

# Scree plot to find elbow point
fviz_eig(princomp(data_normalized), addlabels = TRUE, ylim = c(0, 20)) +
  geom_vline(xintercept = 4, linetype = "dashed", color = "red") +  # Add elbow point
  labs(title = "Scree Plot with Suggested Elbow Point")

# The scree plot suggests an elbow at 4 components.

# Varimax rotation improves interpretability: it maximizes the variance of the squared
# loadings within each component, so each variable loads strongly on few components

# When nfactors = 4
pca <- principal(data_normalized, nfactors = 4, rotate = "varimax")
summary(pca)
## 
## Factor analysis with Call: principal(r = data_normalized, nfactors = 4, rotate = "varimax")
## 
## Test of the hypothesis that 4 factors are sufficient.
## The degrees of freedom for the model is 41  and the objective function was  1.15 
## The number of observations was  1500  with Chi Square =  1711.94  with prob <  0 
## 
## The root mean square of the residuals (RMSA) is  0.09
#Identify variables with high loadings for each factor
print(loadings(pca), digits = 3, cutoff = 0.4, sort = TRUE)
## 
## Loadings:
##                             RC1    RC2    RC3    RC4   
## manufacturing_places         0.717                     
## purchase_places              0.798                     
## allergens                    0.647                     
## traces                       0.629                     
## additives_n                         0.566              
## sugars_100g                         0.723              
## nutrition_grade_fr                  0.706              
## fiber_100g                                 0.800       
## proteins_100g                              0.645  0.470
## sodium_100g                                       0.781
## product_name                                           
## brands                                                 
## ingredients_from_palm_oil_n                            
## trans.fat_100g                             0.412       
## 
##                  RC1   RC2   RC3   RC4
## SS loadings    2.019 1.719 1.300 1.161
## Proportion Var 0.144 0.123 0.093 0.083
## Cumulative Var 0.144 0.267 0.360 0.443
# When nfactors = 3
pca <- principal(data_normalized, nfactors = 3, rotate = "varimax")
summary(pca)
## 
## Factor analysis with Call: principal(r = data_normalized, nfactors = 3, rotate = "varimax")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 52  and the objective function was  0.87 
## The number of observations was  1500  with Chi Square =  1299.83  with prob <  7.8e-238 
## 
## The root mean square of the residuals (RMSA) is  0.09
#Identify variables with high loadings for each factor
print(loadings(pca), digits = 3, cutoff = 0.4, sort = TRUE)
## 
## Loadings:
##                             RC1    RC2    RC3   
## manufacturing_places         0.714              
## purchase_places              0.799              
## allergens                    0.653              
## traces                       0.625              
## additives_n                         0.584       
## sugars_100g                         0.691       
## nutrition_grade_fr                  0.735       
## fiber_100g                                 0.680
## proteins_100g                              0.769
## product_name                                    
## brands                                          
## ingredients_from_palm_oil_n                     
## trans.fat_100g                                  
## sodium_100g                                     
## 
##                  RC1   RC2   RC3
## SS loadings    2.030 1.737 1.295
## Proportion Var 0.145 0.124 0.093
## Cumulative Var 0.145 0.269 0.362
# When nfactors = 5
pca <- principal(data_normalized, nfactors = 5, rotate = "varimax")
summary(pca)
## 
## Factor analysis with Call: principal(r = data_normalized, nfactors = 5, rotate = "varimax")
## 
## Test of the hypothesis that 5 factors are sufficient.
## The degrees of freedom for the model is 31  and the objective function was  1.5 
## The number of observations was  1500  with Chi Square =  2229.02  with prob <  0 
## 
## The root mean square of the residuals (RMSA) is  0.1
#Identify variables with high loadings for each factor
print(loadings(pca), digits = 3, cutoff = 0.4, sort = TRUE)
## 
## Loadings:
##                             RC1    RC2    RC3    RC4    RC5   
## manufacturing_places         0.721                            
## purchase_places              0.799                            
## allergens                    0.647                            
## traces                       0.635                            
## additives_n                         0.592                     
## sugars_100g                         0.702                     
## nutrition_grade_fr                  0.724                     
## fiber_100g                                 0.840              
## proteins_100g                              0.564  0.560       
## sodium_100g                                       0.780       
## product_name                                             0.662
## brands                                                   0.717
## ingredients_from_palm_oil_n                                   
## trans.fat_100g                             0.422              
## 
##                  RC1   RC2   RC3   RC4   RC5
## SS loadings    2.016 1.724 1.261 1.170 1.120
## Proportion Var 0.144 0.123 0.090 0.084 0.080
## Cumulative Var 0.144 0.267 0.357 0.441 0.521
# Visualise how the data points are distributed in the reduced-dimensional space
# (pca now holds the last fit above, nfactors = 5)
biplot(pca)

Explanation of PCA results

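The analysis uses Principal Component Analysis (PCA) with 4 components and varimax rotation. Varimax is an orthogonal rotation that pushes each loading towards either a large value or zero, so that each variable is associated with as few components as possible. This simplifies interpretation without changing the total variance explained by the retained components.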
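The effect of the rotation can be sketched with base R alone. The example below is an illustration under stated assumptions, not the psych implementation: it uses the built-in USArrests data as a stand-in, keeps 2 components, and calls stats::varimax() without Kaiser normalization so that the raw criterion can be compared directly. The helper vmax() is hypothetical.

```r
X <- scale(USArrests)   # stand-in data, not the food data
p <- prcomp(X)

k <- 2
# Unrotated component loadings: eigenvectors scaled by the component standard deviations
L <- p$rotation[, 1:k] %*% diag(p$sdev[1:k], nrow = k)

# Orthogonal varimax rotation (no Kaiser normalization in this sketch)
rot <- varimax(L, normalize = FALSE)
L_rot <- unclass(rot$loadings)

# Hypothetical helper: variance of the squared loadings, summed over components
vmax <- function(A) sum(apply(A^2, 2, var))

c(before = vmax(L), after = vmax(L_rot))   # the rotation should not lower this

# Communalities (row sums of squared loadings) are unchanged by the rotation,
# so the total variance explained by the k components is unchanged as well
all.equal(rowSums(L^2), rowSums(L_rot^2))
colSums(L^2)       # variance per component before rotation
colSums(L_rot^2)   # redistributed across components after rotation
```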

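RMSR (root mean square of the residuals, labelled RMSA in the output above) measures the typical difference between the observed correlations and the correlations reproduced by the model. Smaller values indicate a better fit.

Interpretation (rule of thumb): RMSR < 0.05: excellent fit; RMSR of about 0.05 to 0.10: adequate fit; RMSR > 0.10: poor fit.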
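As a rough illustration of how such a residual measure arises, the sketch below rebuilds the correlation matrix from the first k components and takes the root mean square of the off-diagonal differences. It uses USArrests as stand-in data; rmsr_k() is a hypothetical helper, and restricting the residuals to the off-diagonal entries is an assumption about the definition, not a statement about the psych internals.

```r
# Hypothetical helper: RMSR of a k-component solution for correlation matrix R
rmsr_k <- function(R, k) {
  e <- eigen(R, symmetric = TRUE)
  # Component loadings: eigenvectors scaled by the square roots of the eigenvalues
  L <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  resid <- R - L %*% t(L)                  # observed minus reproduced correlations
  off <- resid[row(resid) != col(resid)]   # off-diagonal residuals only
  sqrt(mean(off^2))
}

R <- cor(USArrests)   # stand-in data, not the food data
rmsr_values <- sapply(1:4, function(k) rmsr_k(R, k))
names(rmsr_values) <- paste0("k=", 1:4)
rmsr_values           # with k equal to the number of variables the residuals vanish
```

Because the reproduced correlations depend only on the product of the loadings with their transpose, an orthogonal rotation such as varimax leaves this quantity unchanged.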

For comparison, PCA is also run with nfactors = 3 and nfactors = 5.

nfactors = 4

When nfactors = 4, Chi-Square = 1711.94 and the p-value is reported as < 0, that is, too small to display. The null hypothesis that 4 factors are sufficient is therefore rejected: 4 factors leave some of the correlation structure unexplained. This may partly be due to noise in the data, and with 1500 observations the chi-square test is sensitive to even small departures from a perfect fit.

The objective function was 1.15. The smaller the value, the better the model fit.

The root mean square of the residuals (RMSR) is 0.09, so the fit is considered adequate.

nfactors = 3

When nfactors = 3, Chi-Square = 1299.83 and p-value < 7.8e-238, so the null hypothesis that 3 factors are sufficient is rejected: 3 factors are not sufficient to perfectly explain the data.

The objective function was 0.87, which is lower (better) than the 1.15 obtained with nfactors = 4.

The root mean square of the residuals (RMSR) is 0.09, so the fit is considered adequate.

nfactors = 5

When nfactors = 5, Chi-Square = 2229.02 and the p-value is reported as < 0 (too small to display), so the null hypothesis that 5 factors are sufficient is rejected: 5 factors are not sufficient to perfectly explain the data.

The objective function was 1.5, the worst of the three models.

The root mean square of the residuals (RMSR) is 0.1, which is on the borderline between an acceptable fit and a poor fit.

Summary of PCA results

Metric               nfactors = 3   nfactors = 4   nfactors = 5
Degrees of Freedom   52             41             31
Objective Function   0.87           1.15           1.5
Chi-Square           1299.83        1711.94        2229.02
RMSR                 0.09           0.09           0.10
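The degrees of freedom in the table follow from the number of variables and factors, and the chi-square values are close to the objective function multiplied by a corrected sample size. The sketch below checks both. The degrees-of-freedom formula is the standard one for a factor model; the Bartlett-style correction is an assumption about how the statistic was computed, and the objective values are the rounded ones from the output, so only approximate agreement is expected.

```r
p <- 14            # variables in data_normalized
n <- 1500          # observations
f <- c(3, 4, 5)    # numbers of factors compared

# Degrees of freedom of a model with f factors on p variables
df <- ((p - f)^2 - (p + f)) / 2
df   # 52 41 31, matching the reported values

# Assumption: chi-square = objective function * Bartlett-corrected sample size
objective <- c(0.87, 1.15, 1.5)   # rounded values from the output
n_corrected <- n - 1 - (2 * p + 5) / 6 - 2 * f / 3
chi_approx <- objective * n_corrected

reported <- c(1299.83, 1711.94, 2229.02)
pct_diff <- 100 * (chi_approx - reported) / reported
round(pct_diff, 2)   # each within 1 percent of the reported value
```

Under this assumption the statistic grows in proportion to the sample size, which is why a large dataset rejects the hypothesis of sufficiency so readily.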

Analysis and visualisation of features

# Elbow point is 4
pca <- prcomp(data_normalized, scale. = TRUE)

# Contribution of variables to the first two principal components
contrib_PC1 <- fviz_contrib(pca, choice = "var", axes = 1, xtickslab.rt = 90)
contrib_PC2 <- fviz_contrib(pca, choice = "var", axes = 2, xtickslab.rt = 90)
gridExtra::grid.arrange(contrib_PC1, contrib_PC2, top = "Contribution to the first two Principal Components")

# MDS
dist_matrix <- dist(data_normalized)
mds_fit <- mds(dist_matrix, ndim = 2, type = "ratio")
#summary(mds_fit)
plot(mds_fit, plot.type = "stressplot")

plot(mds_fit, pch = 21, cex = 1, bg = "coral2", main = "MDS Projection")

The red line in the plots created by fviz_contrib() represents the expected average contribution of each variable to the specified principal component.

Based on the plot of contributions to the first two principal components, purchase_places, manufacturing_places, allergens and traces contribute the most to Dim-1, each above the average contribution, whereas sugars_100g, nutrition_grade_fr, additives_n and proteins_100g contribute the most to Dim-2, again above average.
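The contribution values and the reference line can be approximated by hand. The sketch below assumes that a variable's contribution to a single component is its squared eigenvector entry expressed in percent, and that the reference line is 100 divided by the number of variables. It uses USArrests as stand-in data, so the variable names differ from the food data.

```r
X <- scale(USArrests)   # stand-in data, not the food data
p <- prcomp(X)

# Contribution (%) of each variable to PC1: squared eigenvector entries
contrib_pc1 <- 100 * p$rotation[, 1]^2
contrib_pc1
sum(contrib_pc1)   # contributions add up to 100

# Reference line: contribution expected if all variables contributed equally
expected <- 100 / ncol(X)

# Variables above the reference line
names(contrib_pc1)[contrib_pc1 > expected]
```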

Clustering

K-means, PAM and hierarchical clustering are applied to the normalized data and their results are compared.
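The silhouette method used in the next step picks the number of clusters with the highest average silhouette width. A minimal sketch of that criterion, computed directly with cluster::silhouette() on USArrests as stand-in data (the range 2 to 6 is an arbitrary choice for illustration):

```r
library(cluster)   # silhouette(); already loaded at the top of this report

X <- scale(USArrests)   # stand-in data, not the food data
d <- dist(X)

set.seed(791)
avg_sil <- sapply(2:6, function(k) {
  km <- kmeans(X, centers = k, nstart = 25)
  # Average silhouette width over all observations for this k
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- 2:6

avg_sil
as.integer(names(which.max(avg_sil)))   # k with the highest average width
```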

# Determine optimal number of clusters
fviz_nbclust(data_normalized, kmeans, method = "silhouette") +
  labs(title = "Optimal Number of Clusters")

## K-means clustering
set.seed(791)
kmeans_result <- kmeans(data_normalized, centers = 4, nstart = 25)
fviz_cluster(kmeans_result, data = data_normalized, ellipse.type = "convex", main = "K-means Clustering")

## PAM clustering
pam_result <- pam(data_normalized, k = 4)
fviz_cluster(pam_result, data = data_normalized, ellipse.type = "convex", main = "PAM Clustering")

## Hierarchical clustering
hc <- hclust(dist_matrix, method = "ward.D2")
plot(hc, main = "Hierarchical Clustering Dendrogram")
rect.hclust(hc, k = 4, border = "red")

## Analysis and comparison
# Comparing clustering results
compare_clusters <- table(kmeans_result$cluster, pam_result$clustering)
print(compare_clusters)
##    
##       1   2   3   4
##   1  10 147 209   1
##   2  14  49  19 161
##   3   0   5   3   0
##   4 362 508   4   8

Summary

            PAM Cluster 1   PAM Cluster 2   PAM Cluster 3   PAM Cluster 4
K-Means 1   10              147             209             1
K-Means 2   14              49              19              161
K-Means 3   0               5               3               0
K-Means 4   362             508             4               8

Based on the counts in the table above, the largest overlap is between K-Means Cluster 4 and PAM Cluster 2 (508 observations), although K-Means Cluster 4 also contains almost all of PAM Cluster 1 (362 observations). K-Means Cluster 1 agrees mainly with PAM Cluster 3 (209 observations) and K-Means Cluster 2 with PAM Cluster 4 (161 observations). On the other hand, K-Means Cluster 3 has very few points (8 in total), with 5 in PAM Cluster 2 and 3 in PAM Cluster 3. This may suggest that K-Means Cluster 3 represents a small, distinct group or just noise.
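The degree of agreement is easier to judge from row and column percentages. The sketch below re-enters the cross-tabulation printed by table() earlier in this section; the labels km1 to km4 and pam1 to pam4 are hypothetical names added for readability.

```r
# Cross-tabulation copied from the printed table() output
# (rows: K-means clusters, columns: PAM clusters)
tab <- matrix(c( 10, 147, 209,   1,
                 14,  49,  19, 161,
                  0,   5,   3,   0,
                362, 508,   4,   8),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("km", 1:4), paste0("pam", 1:4)))

sum(tab)   # 1500, so every observation is counted once

# Row percentages: how each K-means cluster is spread over the PAM clusters
round(100 * prop.table(tab, margin = 1))

# Column percentages: which K-means clusters each PAM cluster comes from
round(100 * prop.table(tab, margin = 2))
```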

Furthermore, in the cluster plots the K-means clusters appear well defined and separated, whereas the PAM clusters overlap. The cluster shapes also differ between the two methods.

For hierarchical clustering, cutting the dendrogram at a height of 50 yields 4 clusters.
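Cutting a dendrogram by height and cutting it by number of clusters are two views of the same operation. The sketch below illustrates this with cutree() on USArrests as stand-in data; hc_toy and h_mid are hypothetical names, and the cut height is derived from the toy tree rather than from the dendrogram above.

```r
X <- scale(USArrests)   # stand-in data, not the food data
hc_toy <- hclust(dist(X), method = "ward.D2")

# Cut by number of clusters
by_k <- cutree(hc_toy, k = 4)

# Cut by height: any height between the merge that leaves 4 clusters
# and the merge that leaves 3 clusters gives the same partition
n <- nrow(X)
h_mid <- mean(hc_toy$height[c(n - 4, n - 3)])
by_h <- cutree(hc_toy, h = h_mid)

table(by_k)          # cluster sizes
all(by_k == by_h)    # both cuts assign the same labels
```

The resulting membership vector can then be cross-tabulated against the K-means or PAM labels in the same way as the comparison table above.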