Introduction

This project will be based on the Meat Consumption Per Capita Dataset from Kaggle (https://www.kaggle.com/datasets/scibearia/meat-consumption-per-capita). The dataset provides a comprehensive view of global meat consumption trends across different countries and years.

While the dataset can be used for time-series forecasting and economic modeling, this project will primarily focus on:

Principal Component Analysis (PCA) to explore whether dimensionality reduction can help in understanding global meat consumption patterns.

Dataset Overview

The dataset consists of multiple observations per country, detailing per capita consumption of different types of meat over several decades. The key variables include:

Country (Entity)

Year

Meat categories: Poultry, Beef, Pork, Fish & Seafood, Sheep & Goat, Other meats.

Main Objectives

Perform PCA analysis to identify key consumption trends and reduce dimensionality.

Visualize PCA-transformed data to see how different countries cluster based on dietary habits.

# Load necessary libraries
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(psych)       # Correlation analysis
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(factoextra)  # PCA visualization
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(GGally)      # Pairwise correlation visualization
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(corrplot)    # Correlation heatmap
## corrplot 0.95 loaded
library(gridExtra)   # Grid arrangement for multiple plots
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(rgl)         # 3D visualization
library(viridis)     # Improved color schemes
## Loading required package: viridisLite
# Load dataset
data <- read.csv("Consumption of meat per capita.csv")

# Display data structure
str(data)
## 'data.frame':    10080 obs. of  8 variables:
##  $ Entity          : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Year            : int  1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
##  $ Poultry         : num  0.642 0.673 0.673 0.684 0.715 ...
##  $ Beef            : num  4.89 5.11 5.16 5.13 5.09 ...
##  $ Sheep.and.goat  : num  8.33 8.07 8.25 8.52 8.82 ...
##  $ Pork            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Other.meats     : num  0.866 0.897 1.091 1.031 1.082 ...
##  $ Fish.and.seafood: num  0.0306 0.0306 0.0306 0.0306 0.0306 ...
summary(data)
##     Entity               Year         Poultry            Beef       
##  Length:10080       Min.   :1961   Min.   : 0.000   Min.   : 0.000  
##  Class :character   1st Qu.:1977   1st Qu.: 1.985   1st Qu.: 4.251  
##  Mode  :character   Median :1993   Median : 7.303   Median : 8.111  
##                     Mean   :1992   Mean   :12.545   Mean   :11.862  
##                     3rd Qu.:2008   3rd Qu.:18.782   3rd Qu.:16.738  
##                     Max.   :2021   Max.   :89.530   Max.   :92.689  
##                                                     NA's   :1       
##  Sheep.and.goat        Pork         Other.meats       Fish.and.seafood
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.00000   Min.   :  0.00  
##  1st Qu.: 0.472   1st Qu.: 1.063   1st Qu.: 0.09998   1st Qu.:  5.01  
##  Median : 1.243   Median : 5.349   Median : 0.67702   Median : 11.85  
##  Mean   : 3.456   Mean   :11.687   Mean   : 1.86409   Mean   : 16.77  
##  3rd Qu.: 3.740   3rd Qu.:17.406   3rd Qu.: 1.77109   3rd Qu.: 23.26  
##  Max.   :66.910   Max.   :78.026   Max.   :70.12380   Max.   :183.21  
##                   NA's   :288      NA's   :7

Entity (Country/Region): The dataset includes multiple countries and regions.

Year: Ranges from 1961 to 2021

Poultry and beef are the most consumed meats globally, with poultry consumption increasing over time.

Pork consumption varies significantly by region, with many missing values likely due to cultural/religious dietary restrictions.

Sheep and goat meat is relatively less consumed, with a strong skew toward lower values.

Fish and seafood have a large variation, with some regions consuming significantly more than others.

Data preprocessing

# Fill missing values with 0 (assuming NA means no consumption)
data[is.na(data)] <- 0

# Remove the "Year" column (not needed for analysis)
data <- data[ , !names(data) %in% "Year"]

# Extract only numeric variables (exclude country names)
data_pca <- data[, !names(data) %in% "Entity"]

Identify highly correlated variables and understand data structure

# Compute correlation matrix (Pearson correlation)
corr_matrix <- cor(data_pca, method = "pearson")

# Plot correlation heatmap
corrplot(corr_matrix, method = "color", tl.col = "black", tl.srt = 45)

PCA will likely group Beef, Poultry, and Sheep & Goat into a similar principal component (PC1).

Fish & Other Meats might form PC2, representing alternative diets.

Pork may be independent or align with a separate component.

# Visualize pairwise relationships between variables
ggpairs(data_pca)

Countries consuming high beef and pork likely have a diet rich in red meat.

Countries consuming more poultry and fish may follow a leaner protein-based diet.

Sheep & Goat consumption seems independent and might be regionally specific.

Many scatter plots show non-linear patterns, meaning PCA might not capture all variations well.

Some categories have dense clusters, indicating potential subgroups in meat consumption habits.

PCA might group beef, pork, and poultry into one principal component, given their correlations.

Fish and Sheep/Goat might contribute to different components, as their correlations with other meats are weak.

# Standardize numerical data (mean = 0, standard deviation = 1)
data_scaled <- scale(data_pca)

# Display standardized data
head(data_scaled)
##         Poultry       Beef Sheep.and.goat       Pork Other.meats
## [1,] -0.8628289 -0.6179321      0.7421958 -0.7897296  -0.2167707
## [2,] -0.8605832 -0.5986854      0.7026601 -0.7897296  -0.2100034
## [3,] -0.8605520 -0.5938760      0.7298334 -0.7897296  -0.1677431
## [4,] -0.8597867 -0.5963497      0.7708730 -0.7897296  -0.1809317
## [5,] -0.8575461 -0.5997800      0.8164720 -0.7897296  -0.1697396
## [6,] -0.8560531 -0.4349283      0.8696415 -0.7897296  -0.1763428
##      Fish.and.seafood
## [1,]       -0.9712197
## [2,]       -0.9712186
## [3,]       -0.9712175
## [4,]       -0.9712166
## [5,]       -0.9712158
## [6,]       -0.9712154

Meat consumption variables have different units. If we don’t standardize, variables with larger values dominate PCA.

# Perform PCA
pca_result <- prcomp(data_scaled, center = TRUE, scale. = TRUE)

# Display PCA summary (variance explained by each component)
summary(pca_result)
## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5     PC6
## Standard deviation     1.3413 1.1576 0.9954 0.8924 0.7848 0.67677
## Proportion of Variance 0.2998 0.2233 0.1651 0.1327 0.1026 0.07634
## Cumulative Proportion  0.2998 0.5232 0.6883 0.8210 0.9237 1.00000
# Plot Scree Plot (Eigenvalue visualization)
fviz_eig(pca_result)
## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.

# Visualize variable contributions to PCA
fviz_pca_var(pca_result, col.var = "steelblue")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the factoextra package.
##   Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

PC1 + PC2 together capture over half the variance (52.3%), meaning a 2D visualization will still be informative.

PC1 + PC2 + PC3 = 68.8%, meaning a 3D PCA plot will be useful.

Most of the meaningful structure is captured in the first 3 PCs. Further PCs might contain noise or minor variations.

# Extract PCA variable contributions
var <- get_pca_var(pca_result)

# Visualize contributions of variables to the first five principal components
a <- fviz_contrib(pca_result, "var", axes = 1)
b <- fviz_contrib(pca_result, "var", axes = 2)
c <- fviz_contrib(pca_result, "var", axes = 3)
d <- fviz_contrib(pca_result, "var", axes = 4)
e <- fviz_contrib(pca_result, "var", axes = 5)

# Arrange all contribution plots in a grid layout
grid.arrange(a, b, c, d, e, top = 'Contribution to the Principal Components')

PC1 (Dim-1): Poultry, Pork, and Beef Dominate

PC2 (Dim-2): Sheep & Goat and Other Meats Dominate

PC3 (Dim-3): Fish & Seafood is Dominant

PC4 (Dim-4): Other Meats, Pork, and Sheep & Goat

PC5 (Dim-5): Poultry and Fish & Seafood

To find which countries contribute most to each PC, we sort them based on PCA scores:

# Convert PCA results into a data frame
pca_scores <- as.data.frame(pca_result$x)
# Add country names to the PCA scores
pca_scores$Entity <- data$Entity  
# Display the first few rows
head(pca_scores)
##         PC1        PC2        PC3        PC4         PC5       PC6      Entity
## 1 -1.515253 -0.5211654 -0.4739945 -0.4726498 -0.09443124 0.5162450 Afghanistan
## 2 -1.508497 -0.5030524 -0.4800276 -0.4496135 -0.08724477 0.4847503 Afghanistan
## 3 -1.506462 -0.5482398 -0.4633194 -0.4365870 -0.08134442 0.4889619 Afghanistan
## 4 -1.503114 -0.5691420 -0.4684336 -0.4646208 -0.09008138 0.5119487 Afghanistan
## 5 -1.500449 -0.6067225 -0.4617617 -0.4801807 -0.09210960 0.5345582 Afghanistan
## 6 -1.416930 -0.6970382 -0.5486674 -0.5069828 -0.12094715 0.4583083 Afghanistan
# Function to get top & bottom 10 countries for each principal component
get_top_countries <- function(pc_num, top_n = 10) {
  pca_scores %>%
    select(Entity, all_of(pc_num)) %>%
    arrange(desc(get(pc_num))) %>%
    head(top_n)
}

get_bottom_countries <- function(pc_num, bottom_n = 10) {
  pca_scores %>%
    select(Entity, all_of(pc_num)) %>%
    arrange(get(pc_num)) %>%
    head(bottom_n)
}

# Get top & bottom 10 countries for PC1 - PC5
top_pc1 <- get_top_countries("PC1")
bottom_pc1 <- get_bottom_countries("PC1")

top_pc2 <- get_top_countries("PC2")
bottom_pc2 <- get_bottom_countries("PC2")

top_pc3 <- get_top_countries("PC3")
bottom_pc3 <- get_bottom_countries("PC3")

top_pc4 <- get_top_countries("PC4")
bottom_pc4 <- get_bottom_countries("PC4")

top_pc5 <- get_top_countries("PC5")
bottom_pc5 <- get_bottom_countries("PC5")

# Display results
print("Top 10 PC1 countries:"); print(top_pc1)
## [1] "Top 10 PC1 countries:"
##       Entity      PC1
## 1  Hong Kong 5.380615
## 2  Hong Kong 5.307077
## 3  Hong Kong 5.301547
## 4  Hong Kong 5.137700
## 5  Hong Kong 5.087819
## 6  Hong Kong 4.981222
## 7  Hong Kong 4.920758
## 8  Hong Kong 4.919761
## 9  Hong Kong 4.906405
## 10 Hong Kong 4.894522
print("Bottom 10 PC1 countries:"); print(bottom_pc1)
## [1] "Bottom 10 PC1 countries:"
##        Entity       PC1
## 1  East Timor -1.997005
## 2  East Timor -1.994470
## 3  East Timor -1.992430
## 4  East Timor -1.974470
## 5  East Timor -1.964462
## 6  East Timor -1.958699
## 7  East Timor -1.953250
## 8  East Timor -1.948576
## 9  East Timor -1.926350
## 10 East Timor -1.925148
print("Top 10 PC2 countries:"); print(top_pc2)
## [1] "Top 10 PC2 countries:"
##      Entity      PC2
## 1  Maldives 1.709515
## 2  Maldives 1.665250
## 3  Maldives 1.629073
## 4  Maldives 1.554424
## 5  Maldives 1.532603
## 6  Maldives 1.525660
## 7  Maldives 1.487139
## 8    Poland 1.482120
## 9    Poland 1.473206
## 10   Poland 1.466066
print("Bottom 10 PC2 countries:"); print(bottom_pc2)
## [1] "Bottom 10 PC2 countries:"
##      Entity       PC2
## 1  Mongolia -11.81983
## 2  Mongolia -10.76036
## 3  Mongolia -10.63994
## 4  Mongolia -10.44542
## 5  Mongolia -10.37681
## 6  Mongolia -10.31988
## 7  Mongolia -10.28165
## 8  Mongolia -10.26857
## 9  Mongolia -10.23558
## 10 Mongolia -10.17855
print("Top 10 PC3 countries:"); print(top_pc3)
## [1] "Top 10 PC3 countries:"
##        Entity      PC3
## 1    Maldives 7.439996
## 2    Maldives 7.238787
## 3    Maldives 7.150949
## 4    Maldives 7.058171
## 5    Maldives 7.019179
## 6    Maldives 6.632838
## 7    Maldives 6.543751
## 8    Maldives 6.476443
## 9  East Timor 6.445988
## 10   Maldives 6.440809
print("Bottom 10 PC3 countries:"); print(bottom_pc3)
## [1] "Bottom 10 PC3 countries:"
##         Entity       PC3
## 1    Argentina -4.124698
## 2      Uruguay -4.094990
## 3    Argentina -4.055789
## 4    Argentina -4.023726
## 5    Argentina -3.993172
## 6    Argentina -3.958204
## 7    Argentina -3.939394
## 8  New Zealand -3.925270
## 9      Uruguay -3.918682
## 10   Argentina -3.915795
print("Top 10 PC4 countries:"); print(top_pc4)
## [1] "Top 10 PC4 countries:"
##        Entity      PC4
## 1  East Timor 9.604594
## 2  East Timor 9.451846
## 3  East Timor 9.280131
## 4  East Timor 9.057036
## 5  East Timor 8.927782
## 6  East Timor 8.771158
## 7  East Timor 8.658840
## 8  East Timor 8.632725
## 9  East Timor 8.516485
## 10 East Timor 8.507668
print("Bottom 10 PC4 countries:"); print(bottom_pc4)
## [1] "Bottom 10 PC4 countries:"
##     Entity       PC4
## 1  Iceland -4.545047
## 2  Iceland -4.083219
## 3  Iceland -4.070697
## 4  Iceland -3.982658
## 5  Iceland -3.954908
## 6  Iceland -3.882869
## 7  Iceland -3.880827
## 8  Iceland -3.778623
## 9  Iceland -3.773029
## 10 Iceland -3.763433
print("Top 10 PC5 countries:"); print(top_pc5)
## [1] "Top 10 PC5 countries:"
##                              Entity      PC5
## 1                            Kuwait 4.174722
## 2                            Kuwait 4.018714
## 3                        East Timor 3.884854
## 4  Saint Vincent and the Grenadines 3.845636
## 5  Saint Vincent and the Grenadines 3.830227
## 6                        East Timor 3.806357
## 7                        East Timor 3.749014
## 8                        East Timor 3.742880
## 9  Saint Vincent and the Grenadines 3.712498
## 10 Saint Vincent and the Grenadines 3.664930
print("Bottom 10 PC5 countries:"); print(bottom_pc5)
## [1] "Bottom 10 PC5 countries:"
##      Entity       PC5
## 1  Maldives -3.886574
## 2  Maldives -3.853110
## 3  Maldives -3.842812
## 4  Maldives -3.801326
## 5  Maldives -3.799984
## 6  Maldives -3.578453
## 7  Maldives -3.557185
## 8  Maldives -3.464623
## 9  Maldives -3.437015
## 10 Maldives -3.342350

This 3D PCA scatter plot represents the principal component scores for different countries.

The majority of the countries seem to be clustered in the middle, suggesting that most countries have similar meat consumption patterns.This indicates that many countries do not differ significantly in their dietary habits, at least in terms of meat type preference.

There are a few countries positioned far from the dense cluster.These points likely represent outliers, which could be:

Countries with extremely high or low meat consumption.

Countries that primarily consume one type of meat.

Countries with unique dietary restrictions or cultural influences.

### **3D PCA Visualization**
# Generate color coding for countries
classification_col <- as.factor(data$Entity)  
Entity_col <- viridis(length(unique(classification_col)))[as.numeric(classification_col)]


# Open 3D visualization window
open3d()
## wgl 
##   1
# Plot 3D PCA results with transparency and optimized size
plot3d(pca_result$x[,1:3], 
       col = Entity_col, 
       alpha = 0.5,  # Transparency to avoid overlapping issues
       size = 1.2,   # Reduce point size
       type = "s", 
       xlab = "PC1", ylab = "PC2", zlab = "PC3")

# Ensure each country is labeled only once
unique_countries <- unique(data$Entity)
indices <- match(unique_countries, data$Entity)  # Get first occurrence index

# Display country names only once
text3d(pca_result$x[indices,1:3], 
       texts = unique_countries, 
       col = "black", cex = 0.7)

# Draw contribution vectors for variables
coords <- NULL
for (i in 1:nrow(pca_result$rotation)) {
  coords <- rbind(coords, rbind(c(0,0,0), pca_result$rotation[i,1:3]))
}
lines3d(coords, col = "red", lwd = 2)

# Display interactive 3D plot
rglwidget()