World Happiness Capstone

Data Cleaning and Preparation in Google Sheets

Step 1: Receiving and Cleaning the Data

The raw datasets were provided as CSV files containing various indicators such as GDP, Freedom, Life Expectancy, and Happiness Score.

Tasks Performed in SQL

Data Import and Type Conversion
Imported raw data from multiple years (2015-2019) and converted columns to appropriate data types using CAST and SAFE_CAST.
Data Transformation
Renamed columns for consistency across datasets (e.g., Country to country, Happiness Rank to happiness_rank).
Data Integration
Combined data from multiple years using UNION ALL to create a single comprehensive dataset.
Data Filtering
Removed header rows by filtering out entries where the column values were equal to column names.
Final Output
Sorted the combined dataset by happiness_score in descending order to facilitate further analysis.


```sql
SELECT * FROM (
    -- 2015 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2015 AS year
    FROM capstone-443017.worldhappiness.2015
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2016 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2016 AS year
    FROM capstone-443017.worldhappiness.2016
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2017 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness_Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness_Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy__GDP_per_Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health__Life_Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust__Government_Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2017 AS year
    FROM capstone-443017.worldhappiness.2017
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2018 Data (without Dystopia Residual)
    SELECT
        CAST(`Country or region` AS STRING) AS country,
        CAST(`Overall rank` AS INT64) AS happiness_rank,  -- Correcting column name
        SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
        SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
        SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2018 AS year
    FROM capstone-443017.worldhappiness.2018
    WHERE `Country or region` != 'Country or region'

    UNION ALL

    -- 2019 Data (without Dystopia Residual)
    SELECT
        CAST(`Country or region` AS STRING) AS country,
        CAST(`Overall rank` AS INT64) AS happiness_rank,  -- Correcting column name
        SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
        SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
        SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2019 AS year
    FROM capstone-443017.worldhappiness.2019
    WHERE `Country or region` != 'Country or region'
)
ORDER BY happiness_score DESC

cleaned_data.csv

Step 2: Loading and Exploring the Data in R

The cleaned datasets were imported into R for analysis. In this step, the code calculates the average values for key factors such as happiness score, GDP per capita, social support, life expectancy, freedom, government trust, and generosity for each country. The data is grouped by country, and averages are computed across multiple years. Additionally, the number of years each country appears in the dataset is recorded. Afterward, the countries are ranked in descending order based on their average happiness score.

# Load necessary library
library(dplyr)

# Calculate the average happiness score and other relevant factors for each country
country_averages <- cleanworldhappiness %>%
  group_by(country) %>%
  summarise(
    avg_happiness_score = mean(happiness_score, na.rm = TRUE),
    avg_gdp_per_capita = mean(gdp_per_capita, na.rm = TRUE),
    avg_social_support = mean(social_support, na.rm = TRUE),
    avg_life_expectancy = mean(life_expectancy, na.rm = TRUE),
    avg_freedom = mean(freedom, na.rm = TRUE),
    avg_government_trust = mean(government_trust, na.rm = TRUE),
    avg_generosity = mean(generosity, na.rm = TRUE),
    num_years = n()
  ) 

# Rank countries based on average happiness score
country_averages <- country_averages %>%
  arrange(desc(avg_happiness_score)) %>%
  mutate(rank = row_number())

To visualize the relationships between some of the variables, we created a series of data visualizations through R. These visualizations helped us identify potential trends and correlations within the data.

Step 3: Creating Interactive Dashboards in Tableau

After performing initial analysis in R, we developed more interactive and insightful visualizations in Tableau. Below is an overview of the dashboard:

Maps visualizing global happiness distribution.
Scatter Plots showing relationships between different factors and happiness scores.
Bar Charts ranking factors by how much they impact overall happiness.

View the Interactive Dashboard on Tableau Public where you can click through the visualization sheets

World Map

Scatter Plots

Bar Chart

Key Insights

GDP per Capita, Life Expectancy, and Social Support are the strongest predictors of happiness.
Freedom is also positively correlated with happiness, though not as strongly as social factors.
Trust in Government and Generosity show weaker correlations but remain significant for certain regions.

Conclusion

Investing in social and economic infrastructure is key to improving global happiness levels.

Acknowledgments

I would like to thank the following:

Kaggle for providing the datasets.
Google Sheets, BigQuery, R, and Tableau for the tools used in this project.

World Happiness Capstone

Sarah Froxylia

2024-12-02

Project Overview

Objectives

Specific Objectives: