1 Data import

library(readxl)
# Read the 2015 workbook; read_xlsx() returns a tibble, so convert it to a plain data frame.
mydata <- read_xlsx("/cloud/project/Student depression/2015.xlsx")
mydata <- as.data.frame(mydata)

2 Data description

# Rename the 11 columns in one assignment.
colnames(mydata)[1:11] <- c("Country", "Region", "ID", "Happiness_Score",
                            "GDP_per_capita", "Family", "Life_Expectancy",
                            "Freedom", "Trust", "Generosity", "Dystopia")


head(mydata)
##       Country         Region ID Happiness_Score GDP_per_capita  Family
## 1 Switzerland Western Europe  1           7.587        1.39651 1.34951
## 2     Iceland Western Europe  2           7.561        1.30232 1.40223
## 3     Denmark Western Europe  3           7.527        1.32548 1.36058
## 4      Norway Western Europe  4           7.522        1.45900 1.33095
## 5      Canada  North America  5           7.427        1.32629 1.32261
## 6     Finland Western Europe  6           7.406        1.29025 1.31826
##   Life_Expectancy Freedom   Trust Generosity Dystopia
## 1         0.94143 0.66557 0.41978    0.29678  2.51738
## 2         0.94784 0.62877 0.14145    0.43630  2.70201
## 3         0.87464 0.64938 0.48357    0.34139  2.49204
## 4         0.88521 0.66973 0.36503    0.34699  2.46531
## 5         0.90563 0.63297 0.32957    0.45811  2.45176
## 6         0.88911 0.64169 0.41372    0.23351  2.61955

Source: 2015.csv from Kaggle:

https://www.kaggle.com/datasets/unsdsn/world-happiness

Unit of Observation: Each row represents a country.

Sample Size: 158 countries (rows in the dataset).

Variables Analyzed:

  1. Economy (GDP per Capita): The extent to which GDP per capita contributes to the Happiness Score.

  2. Family: The extent to which social support contributes to the Happiness Score.

  3. Health (Life Expectancy): The extent to which life expectancy contributes to the Happiness Score.

  4. Freedom: The extent to which perceived freedom to make life choices contributes to the Happiness Score.

  5. Trust (Government Corruption): The extent to which the perception of government corruption contributes to the Happiness Score.

  6. Generosity: The extent to which willingness to donate to others contributes to the Happiness Score.

  7. Dystopia Residual: The extent to which the Dystopia Residual, a baseline metric for unhappiness, contributes to the Happiness Score.

summary(mydata[ , c(-1, -2, -3)])
##  Happiness_Score GDP_per_capita       Family       Life_Expectancy 
##  Min.   :2.839   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:4.526   1st Qu.:0.5458   1st Qu.:0.8568   1st Qu.:0.4392  
##  Median :5.232   Median :0.9102   Median :1.0295   Median :0.6967  
##  Mean   :5.376   Mean   :0.8461   Mean   :0.9910   Mean   :0.6303  
##  3rd Qu.:6.244   3rd Qu.:1.1584   3rd Qu.:1.2144   3rd Qu.:0.8110  
##  Max.   :7.587   Max.   :1.6904   Max.   :1.4022   Max.   :1.0252  
##     Freedom           Trust           Generosity        Dystopia     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.3286  
##  1st Qu.:0.3283   1st Qu.:0.06168   1st Qu.:0.1506   1st Qu.:1.7594  
##  Median :0.4355   Median :0.10722   Median :0.2161   Median :2.0954  
##  Mean   :0.4286   Mean   :0.14342   Mean   :0.2373   Mean   :2.0990  
##  3rd Qu.:0.5491   3rd Qu.:0.18025   3rd Qu.:0.3099   3rd Qu.:2.4624  
##  Max.   :0.6697   Max.   :0.55191   Max.   :0.7959   Max.   :3.6021

The average happiness score across all countries is 5.376. The median (5.232) is slightly lower than the mean, suggesting that the distribution of happiness scores may be slightly right-skewed.

The lowest happiness score is 2.839, and the highest is 7.587, showing a wide range of happiness across countries.

The average GDP per capita contribution to happiness is 0.8461.

The median generosity score is 0.216, so half of the countries have a generosity score below this value and the other half above it.
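The comparison of mean and median above is only an informal indicator of skewness. As a minimal sketch, a moment-based skewness coefficient can back it up. The helper `skewness_moment()` and the toy vector below are illustrative assumptions and are not part of the original analysis.

```r
# Moment-based sample skewness (hypothetical helper, not from the original report).
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s2 <- mean((x - m)^2)          # population variance
  mean((x - m)^3) / s2^1.5       # third central moment, standardized
}

# Toy vector with a long right tail, used only to show the behaviour.
toy <- c(1, 2, 2, 3, 3, 3, 4, 10)

mean(toy) > median(toy)          # mean above median hints at right skew
skewness_moment(toy)             # positive value indicates right skew

# With the data from this report one could call, for example:
# skewness_moment(mydata$Happiness_Score)
```

A value close to zero would suggest a roughly symmetric distribution, while a clearly positive value would support the right-skew reading.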

4 Conclusion

We clustered 153 of the 158 countries in the dataset based on five standardized variables: Family, Life Expectancy, Freedom, Trust, and Generosity. The analysis revealed three distinct clusters that highlight significant disparities in happiness-related factors.
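The text does not say which clustering algorithm produced the three groups. The following is a minimal sketch of one common workflow, assuming Ward hierarchical clustering on Euclidean distances. The data frame `toy` is simulated and only stands in for the five clustering variables.

```r
set.seed(1)  # reproducible toy data

# Simulated stand-in for the five clustering variables (NOT the real data).
toy <- data.frame(
  Family          = runif(30),
  Life_Expectancy = runif(30),
  Freedom         = runif(30),
  Trust           = runif(30),
  Generosity      = runif(30)
)

# Standardize each variable to mean 0 and standard deviation 1.
toy_std <- as.data.frame(scale(toy))

# Assumed method: Ward hierarchical clustering, cut into three groups.
distances <- dist(toy_std, method = "euclidean")
tree      <- hclust(distances, method = "ward.D2")
toy_std$cluster <- cutree(tree, k = 3)

# Cluster sizes and percentage shares.
sizes <- table(toy_std$cluster)
round(100 * prop.table(sizes), 1)

# Cluster means of the standardized variables (values above 0 are above average).
aggregate(. ~ cluster, data = toy_std, FUN = mean)
```

Because the variables are standardized, a positive cluster mean reads directly as "above average" and a negative one as "below average", which is how the cluster descriptions in this section are phrased.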

Cluster 1: High Performers in Happiness

This cluster consists of 31 countries (20.3%), which have above-average values across all variables. These countries enjoy strong social support systems, higher life expectancy, greater freedom, high levels of trust, and generosity, all contributing to their higher happiness scores.

Statistical tests confirmed that Cluster 1 had significantly higher average values for Family, Life Expectancy, Freedom, Trust, and Generosity (p < 0.001). However, the normality test for GDP per capita revealed that the distribution within this cluster is not normal, limiting further interpretations.
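The normality test is not named in the text. As a sketch under that caveat, the Shapiro-Wilk test (`shapiro.test()` in base R) can be run separately within each cluster. The data below are simulated, with the first group deliberately drawn from a skewed distribution.

```r
set.seed(2)  # reproducible toy data

# Simulated stand-in: a GDP-like variable and a cluster label (NOT the real data).
toy <- data.frame(
  GDP_per_capita = c(rexp(30),
                     rnorm(30, mean = 1.0, sd = 0.2),
                     rnorm(30, mean = 0.5, sd = 0.2)),
  cluster        = factor(rep(1:3, each = 30))
)

# Shapiro-Wilk p-value per cluster; a small p-value speaks against normality.
p_values <- sapply(split(toy$GDP_per_capita, toy$cluster),
                   function(x) shapiro.test(x)$p.value)
round(p_values, 4)
```

If normality is rejected in any cluster, a rank-based comparison of the groups is a common fallback.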

Cluster 2: Moderate Performers in Happiness

This cluster represents 73 countries (47.7%), which show average to below-average performance in happiness-related variables. While Family and Life Expectancy approach average values, Trust and Generosity are consistently below average.

Kruskal-Wallis tests confirmed significant differences in GDP per capita between clusters (p < 0.001). The test for homogeneity of variance was marginally non-significant (p = 0.051), so equal variances could not be rejected at the 5% level. This cluster represents countries where improvements in institutional trust and generosity could have significant impacts on happiness levels.
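A minimal sketch of the two kinds of test mentioned above, on simulated data. `kruskal.test()` is the base-R Kruskal-Wallis test. The homogeneity test used in the analysis is not named, so the Fligner-Killeen test (`fligner.test()`) is used here purely as an assumed stand-in.

```r
set.seed(3)  # reproducible toy data

# Simulated stand-in for GDP per capita in three clusters (NOT the real data).
toy <- data.frame(
  GDP_per_capita = c(rnorm(30, mean = 1.2, sd = 0.2),
                     rnorm(30, mean = 0.9, sd = 0.3),
                     rnorm(30, mean = 0.4, sd = 0.3)),
  cluster        = factor(rep(1:3, each = 30))
)

# Kruskal-Wallis rank sum test: do the clusters differ in location?
kw <- kruskal.test(GDP_per_capita ~ cluster, data = toy)
kw$p.value

# Fligner-Killeen test: do the clusters differ in variance? (assumed stand-in)
fk <- fligner.test(GDP_per_capita ~ cluster, data = toy)
fk$p.value
```

A small Kruskal-Wallis p-value indicates that at least one cluster differs from the others. It does not say which one, so pairwise follow-up comparisons would be needed for that.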

Cluster 3: Low Performers in Happiness

This group includes 49 countries (32.0%) that face significant socioeconomic challenges and exhibit below-average values across all variables, particularly Family and Life Expectancy.

Chi-squared analysis revealed a significant association between region and cluster assignment (p < 0.001), with Sub-Saharan Africa dominating this group. However, chi-squared residuals could not be interpreted due to unmet assumptions, limiting further insights.
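Whether the chi-squared assumptions hold can be checked from the expected frequencies. The sketch below uses a hypothetical region-by-cluster table. The counts, region names, and the rule of thumb in the comments are illustrative assumptions and do not come from the original analysis.

```r
# Hypothetical region-by-cluster counts (NOT the real contingency table).
counts <- matrix(c(12,  3,  1,
                    8, 20,  4,
                    2,  9, 25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Region  = c("Region A", "Region B", "Region C"),
                                 Cluster = c("1", "2", "3")))

# Pearson chi-squared test; warnings about small expected counts are silenced here
# because the expected counts are inspected explicitly below.
test <- suppressWarnings(chisq.test(counts))
test$expected

# Common rule of thumb: most expected counts should be at least 5.
share_ok <- mean(test$expected >= 5)
share_ok

# One alternative when expected counts are small: a simulation-based p-value.
set.seed(4)
chisq.test(counts, simulate.p.value = TRUE, B = 2000)$p.value

# Standardized residuals show which cells drive the association.
round(test$stdres, 2)
```

Merging sparse regions into broader groups before testing is another common way to raise the expected counts.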

Validated Findings:

Differences in Family, Life Expectancy, Freedom, Trust, and Generosity between clusters were statistically significant (p < 0.001). GDP per capita contributions significantly differed between clusters, as confirmed by the Kruskal-Wallis test (p < 0.001).

Limitations:

Normality assumptions for GDP per capita were violated in Cluster 1. Chi-squared residuals could not be interpreted because expected frequencies were too small, so the association between region and cluster assignment should be interpreted with caution.