Project Overview

This project demonstrates the practical application of data analysis skills acquired through the Google Data Analytics Professional Certificate. The workflow follows the structured learning path of the program, covering essential stages of the data analysis process:

Objectives

The goal of this capstone project is to analyze the factors influencing the average happiness scores of countries worldwide. This project demonstrates an end-to-end data analysis workflow, from data acquisition to visualization, by applying key skills learned during the Google Data Analytics Professional Certificate program.

Specific Objectives:

  1. Frame Data-Driven Questions
    Identify and define the key questions that guide the analysis, such as:
    • What are the primary factors influencing global happiness?
    • How do economic, social, and governmental factors correlate with happiness?
    • Which variables have the strongest impact on happiness across different regions?
  2. Clean and Prepare Data
    Source and clean the dataset to ensure data integrity by:
    • Handling missing values and duplicate entries.
    • Standardizing data formats and ensuring consistency across variables.
  3. Analyze Data
    Perform exploratory and statistical analyses to uncover relationships between happiness and influencing factors like:
    • Average GDP per capita
    • Average life expectancy
    • Average social support
    • Average freedom
    • Average generosity
    • Average trust in government
  4. Visualize Data
    Create interactive and static visualizations to communicate insights effectively by:
    • Using scatter plots, bar charts, and dashboards to showcase correlations and trends.
    • Highlighting key takeaways through visual storytelling.
  5. Share Insights
    Present the findings in a professional HTML report and share the final project on platforms like LinkedIn and Tableau Public to demonstrate expertise in:
    • Data cleaning and analysis.
    • Data visualization and storytelling.
    • Communicating data-driven insights effectively.

Data Cleaning and Preparation in Google Sheets

Step 1: Receiving and Cleaning the Data

The raw datasets were provided as CSV files containing various indicators such as GDP, Freedom, Life Expectancy, and Happiness Score.

Tasks Performed in SQL

  1. Data Import and Type Conversion
    Imported raw data from multiple years (2015-2019) and converted columns to appropriate data types using CAST and SAFE_CAST.

  2. Data Transformation
    Renamed columns for consistency across datasets (e.g., Country to country, Happiness Rank to happiness_rank).

  3. Data Integration
    Combined data from multiple years using UNION ALL to create a single comprehensive dataset.

  4. Data Filtering
    Removed header rows by filtering out entries where the column values were equal to column names.

  5. Final Output
    Sorted the combined dataset by happiness_score in descending order to facilitate further analysis.


```sql
SELECT * FROM (
    -- 2015 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2015 AS year
    FROM capstone-443017.worldhappiness.2015
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2016 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2016 AS year
    FROM capstone-443017.worldhappiness.2016
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2017 Data
    SELECT
        CAST(`Country` AS STRING) AS country,
        CAST(`Happiness_Rank` AS INT64) AS happiness_rank,
        SAFE_CAST(`Happiness_Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`Economy__GDP_per_Capita_` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Family` AS FLOAT64) AS social_support,
        SAFE_CAST(`Health__Life_Expectancy_` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
        SAFE_CAST(`Trust__Government_Corruption_` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2017 AS year
    FROM capstone-443017.worldhappiness.2017
    WHERE `Country` != 'Country'

    UNION ALL

    -- 2018 Data (without Dystopia Residual)
    SELECT
        CAST(`Country or region` AS STRING) AS country,
        CAST(`Overall rank` AS INT64) AS happiness_rank,  -- Correcting column name
        SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
        SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
        SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2018 AS year
    FROM capstone-443017.worldhappiness.2018
    WHERE `Country or region` != 'Country or region'

    UNION ALL

    -- 2019 Data (without Dystopia Residual)
    SELECT
        CAST(`Country or region` AS STRING) AS country,
        CAST(`Overall rank` AS INT64) AS happiness_rank,  -- Correcting column name
        SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
        SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
        SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
        SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
        SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
        SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
        SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
        2019 AS year
    FROM capstone-443017.worldhappiness.2019
    WHERE `Country or region` != 'Country or region'
)
ORDER BY happiness_score DESC

Step 2: Loading and Exploring the Data in R

The cleaned datasets were imported into R for analysis. In this step, the code calculates the average values for key factors such as happiness score, GDP per capita, social support, life expectancy, freedom, government trust, and generosity for each country. The data is grouped by country, and averages are computed across multiple years. Additionally, the number of years each country appears in the dataset is recorded. Afterward, the countries are ranked in descending order based on their average happiness score.

# Load necessary library
library(dplyr)

# Calculate the average happiness score and other relevant factors for each country
country_averages <- cleanworldhappiness %>%
  group_by(country) %>%
  summarise(
    avg_happiness_score = mean(happiness_score, na.rm = TRUE),
    avg_gdp_per_capita = mean(gdp_per_capita, na.rm = TRUE),
    avg_social_support = mean(social_support, na.rm = TRUE),
    avg_life_expectancy = mean(life_expectancy, na.rm = TRUE),
    avg_freedom = mean(freedom, na.rm = TRUE),
    avg_government_trust = mean(government_trust, na.rm = TRUE),
    avg_generosity = mean(generosity, na.rm = TRUE),
    num_years = n()
  ) 

# Rank countries based on average happiness score
country_averages <- country_averages %>%
  arrange(desc(avg_happiness_score)) %>%
  mutate(rank = row_number())

To visualize the relationships between some of the variables, we created a series of data visualizations through R. These visualizations helped us identify potential trends and correlations within the data.

Step 3: Creating Interactive Dashboards in Tableau

After performing initial analysis in R, we developed more interactive and insightful visualizations in Tableau. Below is an overview of the dashboard:

  • Maps visualizing global happiness distribution.
  • Scatter Plots showing relationships between different factors and happiness scores.
  • Bar Charts ranking factors by how much they impact overall happiness.

Key Insights

  1. GDP per Capita, Life Expectancy, and Social Support are the strongest predictors of happiness.
  2. Freedom is also positively correlated with happiness, though not as strongly as social factors.
  3. Trust in Government and Generosity show weaker correlations but remain significant for certain regions.

Conclusion

Investing in social and economic infrastructure is key to improving global happiness levels.

Acknowledgments

I would like to thank the following:

  • Kaggle for providing the datasets.
  • Google Sheets, BigQuery, R, and Tableau for the tools used in this project.