Introduction

This project focuses on analyzing the World Happiness dataset to understand the factors that influence happiness across different countries. The analysis explores how economic, social, and health-related variables such as GDP, social support, freedom, and life expectancy impact overall well-being.

The goal is to identify patterns, relationships, and trends in the data using statistical techniques and visualizations.

Objectives

The main objectives of this analysis are:

Dataset Description

The dataset used in this project is the World Happiness dataset, which contains information about different countries and their happiness scores.

It includes variables such as:

- Happiness score
- GDP per capita
- Social support
- Healthy life expectancy
- Freedom to make life choices
- Generosity
- Perceptions of corruption
- Suicide rate and unemployment (in some datasets)

The dataset consists of multiple countries across different regions, allowing comparative analysis.

library(readxl)
library(dplyr)
library(ggplot2)
data <- read_excel("C:/Users/Ankur Raj/Downloads/world_happiness_2019_clean.xlsx")

print(data)
## # A tibble: 155 × 12
##    Ranking Country     `Regional indicator`  `Happiness score` `GDP per capita`
##      <dbl> <chr>       <chr>                             <dbl>            <dbl>
##  1       1 Finland     Western Europe                    77689           795824
##  2       2 Denmark     Western Europe                    76001           821474
##  3       3 Norway      Western Europe                    75539           883423
##  4       4 Iceland     Western Europe                    74936           819529
##  5       5 Netherlands Western Europe                    74876           828945
##  6       6 Switzerland Western Europe                    74802            86233
##  7       7 Sweden      Western Europe                    73433           823337
##  8       8 New Zealand North America and ANZ             73075           773464
##  9       9 Canada      North America and ANZ             72781           810463
## 10      10 Austria     Western Europe                     7246           816785
## # ℹ 145 more rows
## # ℹ 7 more variables: `Social support` <dbl>, `Healthy life expectancy` <dbl>,
## #   `Freedom to make life choices` <dbl>, Generosity <dbl>,
## #   `Perceptions of corruption` <dbl>, `Suicide Rate` <dbl>, UnEmployment <dbl>
print(ncol(data))
## [1] 12
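The printed values suggest that decimal points were lost when the spreadsheet was prepared: Finland's score appears as 77689 and Austria's as 7246, whereas the published happiness index runs from 0 to 10. The following is a minimal sketch of one possible repair, assuming every true score has exactly one digit before the decimal point. `restore_decimal` is a hypothetical helper and is not part of the original analysis.

```r
# Hypothetical helper: rescale each value so that exactly one digit precedes
# the decimal point. Assumes every true score lies in [1, 10); this should be
# checked against the source spreadsheet before use.
restore_decimal <- function(x) {
  x / 10^floor(log10(x))
}

# Three scores exactly as printed above (Finland, Norway, Austria)
raw_scores <- c(77689, 75539, 7246)
fixed_scores <- restore_decimal(raw_scores)
print(fixed_scores)
```

The same idea would not transfer directly to columns whose true values can fall below 1, so each variable needs its own check.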
# Question 1: Are there countries with high freedom but still low happiness?

data %>%
  filter(`Freedom to make life choices` > mean(`Freedom to make life choices`, na.rm = TRUE),
         `Happiness score` < mean(`Happiness score`, na.rm = TRUE))
# Interpretation: Yes. Countries such as Australia and Belgium have above-average freedom but below-average happiness, which shows that freedom alone is not enough; other factors such as health and social support are also necessary.
# Question 2: Which region has the strongest social support?

data %>%
  group_by(`Regional indicator`) %>%
  summarise(avg_support = mean(`Social support`)) %>%
  arrange(desc(avg_support))
# Interpretation: Regions with strong social systems, such as North America and ANZ, also show higher happiness, reinforcing the importance of community.
# Question 3: Does freedom clearly separate happy and unhappy countries?

ggplot(data, aes(x = `Freedom to make life choices`, y = `Happiness score`)) +
  geom_point()

# Interpretation: Freedom and happiness show a positive relationship: higher freedom generally corresponds to higher happiness scores. However, the scatter of the points indicates that freedom alone is not enough and that other factors also influence happiness.
# Question 4: How are happiness scores distributed globally?

ggplot(data, aes(x = `Happiness score`)) +
  geom_histogram(binwidth = 5000)

# Interpretation: The histogram shows that most countries have happiness scores concentrated in the mid to higher range (around 40,000–65,000). A few countries have very low scores, creating a tail on the left side. This indicates a slightly negatively skewed distribution, in which most countries are relatively happy.
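The direction of the skew can also be checked numerically rather than by eye. Below is a minimal base-R sketch of the moment-based sample skewness, applied to made-up scores rather than the real data. `skewness_moment` is a hypothetical helper, since base R has no built-in skewness function.

```r
# Hypothetical helper: third central moment divided by the 1.5th power of the
# second central moment. Negative values indicate a longer left tail.
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Illustrative values only: one very low score and a cluster of higher ones
toy_scores <- c(3.0, 5.5, 6.0, 6.2, 6.5, 6.8, 7.0)
sk <- skewness_moment(toy_scores)
print(sk < 0)
```

On the real data the call would be `skewness_moment(data$"Happiness score")`, assuming the column name shown in the tibble above.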
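# Question 5: Which countries have high GDP but lower-than-average happiness?

data %>%
  filter(`GDP per capita` > mean(`GDP per capita`),
         `Happiness score` < mean(`Happiness score`))
# Interpretation: Countries such as Austria and Germany appear in this list, suggesting that money alone is not enough for happiness. Austria's score is printed as 7246 in the table above, about a tenth of its neighbours' values, so its presence here may reflect a scaling problem in the data rather than genuinely low happiness.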
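# Question 6: Which countries are happy despite low GDP?

data %>%
  filter(`GDP per capita` < mean(`GDP per capita`),
         `Happiness score` > mean(`Happiness score`))
# Interpretation: Countries such as Switzerland and Uzbekistan appear among the high-happiness countries despite below-average GDP. Switzerland's GDP is printed as 86233 in the table above, roughly a tenth of its neighbours' values, so its low-GDP classification may be a data-scaling artifact.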
# Question 7: Does higher happiness always mean lower suicide rates?

s_mean <- mean(data$`Suicide Rate`, na.rm = TRUE)

# Group countries relative to the mean: below 80% is "Low", below 120% is "Medium", otherwise "High"
data$`Suicide group` <- ifelse(data$`Suicide Rate` < s_mean * 0.8, "Low",
                               ifelse(data$`Suicide Rate` < s_mean * 1.2, "Medium", "High"))

bar_data <- data %>%
  group_by(`Suicide group`) %>%
  summarise(avg_happiness = mean(`Happiness score`, na.rm = TRUE))

ggplot(bar_data, aes(x = `Suicide group`, y = avg_happiness)) +
  geom_col()

# Interpretation: The bar chart shows a gradual decline in happiness as suicide rates increase, suggesting an inverse relationship. However, the relatively small differences between groups indicate that suicide rate alone does not fully explain variations in happiness.
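One caveat when reading such a bar chart: character labels are drawn in alphabetical order (High, Low, Medium), so a decline across groups is easiest to see once the groups are ordered. The sketch below builds the same 80% / 120% grouping with `cut()`, which returns a factor whose levels follow the intended order. The rates are made up for illustration.

```r
# Illustrative rates only; their mean is exactly 10
toy_rates <- c(2, 5, 9, 10, 11, 15, 18)
s_mean <- mean(toy_rates)

# Breaks mirror the nested ifelse(): below 80% of the mean is "Low",
# from 80% up to (but not including) 120% is "Medium", the rest is "High".
# right = FALSE closes intervals on the left, matching the `<` comparisons.
suicide_group <- cut(toy_rates,
                     breaks = c(-Inf, 0.8 * s_mean, 1.2 * s_mean, Inf),
                     labels = c("Low", "Medium", "High"),
                     right = FALSE)
print(table(suicide_group))
```

Because the result is a factor, plotting functions keep the Low, Medium, High order without further work.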
# Question 8: Do countries with high social support always rank high in happiness?
data %>%
  filter(`Social support` > mean(`Social support`)) %>%
  arrange(desc(`Happiness score`))
# Interpretation: Countries with above-average social support, such as Denmark, Norway, and Sweden, consistently rank among the top in happiness. This shows that strong social support is a key factor contributing to higher happiness, although it works alongside other factors such as GDP and freedom.
# Question 9: Which regions have both high freedom and high happiness?
data %>%
  group_by(`Regional indicator`) %>%
  summarise(avg_freedom = mean(`Freedom to make life choices`),
            avg_happiness = mean(`Happiness score`)) %>%
  arrange(desc(avg_happiness))
# Interpretation: Regions like Western Europe show both high freedom and high happiness, indicating a positive relationship. However, this pattern is not consistent across all regions, suggesting that freedom alone is not sufficient to determine happiness.
# Question 10: Which countries perform well in happiness despite high unemployment?
data %>%
  filter(`UnEmployment` > mean(`UnEmployment`, na.rm = TRUE),
         `Happiness score` > mean(`Happiness score`, na.rm = TRUE))
# Interpretation: Some countries, including Sweden and Costa Rica, show high happiness despite higher unemployment. This indicates that social support and living conditions can reduce the negative impact of unemployment on well-being.
# Question 11: Do the top 10 happiest countries share common characteristics?
top10 <- data %>%
  arrange(desc(`Happiness score`)) %>%
  slice(1:10)

top10 %>%
  summarise(
    avg_gdp = mean(`GDP per capita`),
    avg_support = mean(`Social support`),
    avg_freedom = mean(`Freedom to make life choices`)
  )
print(top10)
## # A tibble: 10 × 13
##    Ranking Country     `Regional indicator`   `Happiness score` `GDP per capita`
##      <dbl> <chr>       <chr>                              <dbl>            <dbl>
##  1       1 Finland     Western Europe                     77689           795824
##  2       2 Denmark     Western Europe                     76001           821474
##  3       3 Norway      Western Europe                     75539           883423
##  4       4 Iceland     Western Europe                     74936           819529
##  5       5 Netherlands Western Europe                     74876           828945
##  6       6 Switzerland Western Europe                     74802            86233
##  7       7 Sweden      Western Europe                     73433           823337
##  8       8 New Zealand North America and ANZ              73075           773464
##  9       9 Canada      North America and ANZ              72781           810463
## 10      12 Costa Rica  Latin America and Car…             71674           614128
## # ℹ 8 more variables: `Social support` <dbl>, `Healthy life expectancy` <dbl>,
## #   `Freedom to make life choices` <dbl>, Generosity <dbl>,
## #   `Perceptions of corruption` <dbl>, `Suicide Rate` <dbl>,
## #   UnEmployment <dbl>, `Suicide group` <chr>
# Interpretation: The happiest countries are mostly from Western Europe and show high GDP, strong social support, and good health conditions. This suggests that balanced development across multiple factors leads to higher happiness.
# Question 12: Which countries have extreme happiness values?
data %>%
  arrange(desc(`Happiness score`)) %>%
  slice(1:3)
# Interpretation: Finland, Denmark, and Norway, all in Western Europe, are the happiest countries and show high values across key factors such as GDP, social support, and life expectancy.
# Question 13: Do countries with both high unemployment and high suicide rates show the lowest happiness?

data %>%
  filter(`UnEmployment` > mean(`UnEmployment`, na.rm = TRUE),
         `Suicide Rate` > mean(`Suicide Rate`, na.rm = TRUE)) %>%
  arrange(`Happiness score`)
# Interpretation: Most countries with high unemployment and high suicide rates show lower happiness, but exceptions such as Sweden exist. This suggests that other factors, such as social support and governance, can offset negative conditions.
# Question 14: Does generosity improve happiness only when combined with social support?

data %>%
  filter(`Generosity` > mean(`Generosity`, na.rm = TRUE),
         `Social support` > mean(`Social support`, na.rm = TRUE)) %>%
  arrange(desc(`Happiness score`))
# Interpretation: Countries with both high generosity and strong social support show very high happiness levels. This suggests that generosity is most effective when combined with strong social systems.
# Question 15: Which countries have the highest healthy life expectancy?

data %>%
  arrange(desc(`Healthy life expectancy`)) %>%
  slice(1:5)
# Interpretation: The countries with the highest healthy life expectancy are generally happier, suggesting that good health improves quality of life.
# Question 16: How does the happiness distribution differ between high-GDP and low-GDP countries?

data$GDP_group <- ifelse(data$`GDP per capita` > mean(data$`GDP per capita`, na.rm = TRUE),
                         "High GDP", "Low GDP")

ggplot(data, aes(x = GDP_group,
                 y = `Happiness score`,
                 fill = GDP_group)) +
  geom_boxplot(outlier.color = "red",
               alpha = 0.8) +
  labs(title = "Happiness Distribution by GDP Level",
       subtitle = "Comparison of happiness between high and low GDP countries",
       x = "GDP Group",
       y = "Happiness Score") +
  theme_minimal()

# Interpretation: The boxplot shows that high-GDP countries have a clearly higher median happiness than low-GDP countries, indicating a positive association between economic conditions and well-being.
# Question 17: How does happiness vary with freedom across different regions?

ggplot(data, aes(x = `Freedom to make life choices`, y = `Happiness score`)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~`Regional indicator`) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

# Interpretation: Western Europe, South Asia, and Middle East & North Africa show a clear positive slope: as freedom increases, the happiness score also increases.
# Question 18: Does GDP significantly influence happiness?
model1 <- lm(`Happiness score` ~ `GDP per capita`, data = data)
summary(model1)
## 
## Call:
## lm(formula = `Happiness score` ~ `GDP per capita`, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50337  -3266   3256   9301  32896 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.041e+04  2.856e+03  14.148  < 2e-16 ***
## `GDP per capita` 1.735e-02  5.162e-03   3.361 0.000982 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17120 on 153 degrees of freedom
## Multiple R-squared:  0.06874,    Adjusted R-squared:  0.06265 
## F-statistic: 11.29 on 1 and 153 DF,  p-value: 0.0009821
plot(data$`GDP per capita`, data$`Happiness score`,
     main = "Regression: GDP vs Happiness",
     xlab = "GDP per capita",
     ylab = "Happiness score",
     pch = 16)
abline(model1, col = "red")

# Interpretation: The regression plot shows a positive relationship between GDP per capita and happiness, as indicated by the upward-sloping line. However, the wide spread of the data points (R-squared of about 0.07 in the summary above) suggests that GDP alone explains little of the variation in happiness, and other factors also play an important role.
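The spread around the fitted line is what R-squared measures: the share of variance in the response that the line accounts for (0.069 in the summary above). The sketch below recomputes it by hand on made-up data to show where the number comes from. The `toy` data frame and its column names are illustrative assumptions.

```r
# Illustrative values only
toy <- data.frame(
  gdp       = c(1, 2, 3, 4, 5),
  happiness = c(2.0, 4.1, 3.9, 6.2, 5.8)
)
fit <- lm(happiness ~ gdp, data = toy)

# R-squared = 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(fit)^2)
tss <- sum((toy$happiness - mean(toy$happiness))^2)
r2_manual <- 1 - rss / tss

# The hand-computed value should agree with the one reported by summary()
print(all.equal(r2_manual, summary(fit)$r.squared))
```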
# Question 19: How do multiple factors together influence happiness?

model2 <- lm(`Happiness score` ~ `GDP per capita` +
               `Social support` +
               `Freedom to make life choices`,
             data = data)

summary(model2)
## 
## Call:
## lm(formula = `Happiness score` ~ `GDP per capita` + `Social support` + 
##     `Freedom to make life choices`, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54547  -2015   2550   8484  36385 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    2.981e+04  4.344e+03   6.863 1.63e-10 ***
## `GDP per capita`               1.310e-02  5.239e-03   2.500   0.0135 *  
## `Social support`               9.834e-02  5.265e-02   1.868   0.0637 .  
## `Freedom to make life choices` 1.125e-01  4.923e-02   2.285   0.0237 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16660 on 151 degrees of freedom
## Multiple R-squared:  0.1293, Adjusted R-squared:  0.112 
## F-statistic: 7.473 on 3 and 151 DF,  p-value: 0.0001067
# Interpretation: The regression model shows that GDP per capita (p = 0.0135) and freedom to make life choices (p = 0.0237) have a statistically significant positive association with happiness at the 5% level. Social support also has a positive coefficient, but it is significant only at the 10% level (p = 0.0637). The R-squared of 0.129 indicates that these factors together explain about 13% of the variation in happiness, so additional variables are needed to fully understand overall well-being.
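The adjusted R-squared in the summary penalises the plain R-squared for the number of predictors. As a worked check, the sketch below reproduces the reported 0.112 from the reported 0.1293, taking n = 155 rows (from the tibble header) and p = 3 predictors.

```r
r2 <- 0.1293   # Multiple R-squared reported in the summary
n  <- 155      # rows in the dataset
p  <- 3        # predictors in the model

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))
```

The denominator n - p - 1 equals 151, which matches the residual degrees of freedom in the output.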
# Question 20: Is the relationship between GDP and happiness non-linear?
ggplot(data, aes(x = `GDP per capita`, y = `Happiness score`)) +
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ poly(x, 2),
              se = FALSE) +
  labs(title = "Polynomial Regression (GDP vs Happiness)",
       x = "GDP per capita",
       y = "Happiness score")

# Interpretation: The polynomial regression curve shows that the relationship between GDP and happiness is not strictly linear. At lower levels of GDP, increases have a smaller impact on happiness, while at higher levels the increase in happiness becomes more pronounced. This indicates a non-linear relationship in which economic growth has a stronger effect on well-being at higher income levels.
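Whether a curve is a genuine improvement over a straight line can be checked by fitting both models and comparing them with an F-test. The sketch below does this on made-up, deliberately curved data. The `toy` data frame is an illustrative assumption, not a result from the dataset.

```r
# Illustrative values only: happiness rises faster at higher GDP
toy <- data.frame(
  gdp       = c(1, 2, 3, 4, 5, 6, 7, 8),
  happiness = c(3.0, 3.1, 3.3, 3.8, 4.5, 5.4, 6.6, 8.0)
)

linear    <- lm(happiness ~ gdp, data = toy)
quadratic <- lm(happiness ~ poly(gdp, 2), data = toy)

# anova() on nested models tests whether the added squared term reduces
# the residual sum of squares by more than chance alone would
comparison <- anova(linear, quadratic)
print(comparison)
```

A small p-value in the second row would support the non-linear reading; a large one would mean the straight line is adequate.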
# Question 21: How much does each region contribute to total global happiness?

region_happiness <- aggregate(`Happiness score` ~ `Regional indicator`, data = data, sum)

pie(region_happiness$`Happiness score`,
    labels = region_happiness$`Regional indicator`,
    main = "Region-wise Contribution to Total Happiness")

# Interpretation: This pie chart shows how different regions contribute to the sum of happiness scores. Regions with larger slices contribute more to that total; because the scores are summed, a slice also grows with the number of countries in the region.
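Each slice corresponds to a region's share of the summed scores. The sketch below computes such shares as percentages with `prop.table()` and sets them beside the per-region mean, using two made-up regions. It illustrates that a region with more countries can match the slice of a region with higher average happiness.

```r
# Illustrative values only: region A has three countries, region B has two
toy <- data.frame(
  region = c("A", "A", "A", "B", "B"),
  score  = c(5, 6, 7, 8, 10)
)

totals <- tapply(toy$score, toy$region, sum)    # what the pie chart draws
shares <- round(100 * prop.table(totals), 1)    # slice size in percent
means  <- tapply(toy$score, toy$region, mean)   # average per country

print(shares)
print(means)
```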
# Question 22: How does happiness vary between countries with high and low unemployment?
data$Unemp_group <- ifelse(data$`UnEmployment` > mean(data$`UnEmployment`, na.rm = TRUE),
                           "High Unemployment", "Low Unemployment")

ggplot(data, aes(x = Unemp_group,
                 y = `Happiness score`,
                 fill = Unemp_group)) +
  geom_boxplot(outlier.color = "red", alpha = 0.8) +
  labs(title = "Happiness Distribution by Unemployment Level",
       subtitle = "Comparison of happiness between high and low unemployment countries",
       x = "Unemployment Group",
       y = "Happiness Score") +
  theme_minimal() +
  coord_flip()

# Interpretation: The boxplot shows that countries with lower unemployment generally have higher median happiness than those with higher unemployment. However, the overlap between the groups indicates that unemployment alone does not fully determine happiness.
# Question 23: How are economic, social, and health factors collectively related to happiness?
pairs(data[, c("Happiness score",
               "GDP per capita",
               "Social support",
               "Healthy life expectancy")],
      main = "Pair Plot of Key Factors Influencing Happiness",
      col = "blue", pch = 16)

# Interpretation: Each row of the pair plot shows how one variable relates to all the others. The first row shows how happiness varies with GDP, social support, and health, all of which show positive relationships. The other rows confirm that these variables are also interrelated, indicating that economic, social, and health factors together influence happiness rather than acting independently.
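The pairwise relationships in a pair plot can be summarised as a correlation matrix, which puts a number on each panel. The sketch below uses made-up values and shortened, hypothetical column names. With the real data, the same four columns passed to `pairs()` would be passed to `cor()`.

```r
# Illustrative values only; column names are shortened for readability
toy <- data.frame(
  happiness = c(4.2, 5.1, 5.9, 6.8, 7.5),
  gdp       = c(0.6, 0.9, 1.0, 1.3, 1.4),
  support   = c(0.9, 1.1, 1.3, 1.4, 1.6),
  health    = c(0.5, 0.7, 0.8, 0.9, 1.0)
)

# Pairwise Pearson correlations; rows containing any NA are dropped first
cor_matrix <- cor(toy, use = "complete.obs")
print(round(cor_matrix, 2))
```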
# Question 24: Is there a statistically noticeable difference in median happiness between high and low social support countries?
data$Support_group <- ifelse(data$`Social support` > mean(data$`Social support`, na.rm = TRUE),
                             "High Support", "Low Support")

ggplot(data, aes(x = Support_group, y = `Happiness score`, fill = Support_group)) +
  geom_boxplot(notch = TRUE) +
  labs(title = "Happiness Comparison by Social Support Level",
       x = "Social Support Group",
       y = "Happiness Score") +
  theme_minimal()

# Interpretation: The notched boxplot shows that countries with high social support have a higher median happiness than those with low social support. Non-overlapping notches indicate that this difference in medians is statistically meaningful (roughly at the 5% level), suggesting that social support plays a significant role in overall well-being.
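A notch spans roughly the median plus or minus 1.58 × IQR / √n, and base R's `boxplot.stats()` reports an interval of this kind in its `conf` element. The sketch below checks whether two notches overlap, using made-up groups. The exact quartile rule may differ slightly from the one ggplot2 uses, so treat it as an approximation of the plotted notches.

```r
# Illustrative values only
high_support <- c(6.1, 6.4, 6.8, 7.0, 7.2, 7.4, 7.6)
low_support  <- c(3.9, 4.2, 4.5, 4.6, 4.9, 5.1, 5.3)

# `conf` holds the lower and upper ends of the notch for each group
notch_high <- boxplot.stats(high_support)$conf
notch_low  <- boxplot.stats(low_support)$conf

# The notches are separate when the lower end of the higher group
# lies above the upper end of the lower group
notches_separate <- notch_high[1] > notch_low[2]
print(notches_separate)
```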