This project demonstrates the practical application of data analysis skills acquired through the Google Data Analytics Professional Certificate. The workflow follows the structured learning path of the program, covering essential stages of the data analysis process:
The goal of this capstone project is to analyze the factors influencing the average happiness scores of countries worldwide. This project demonstrates an end-to-end data analysis workflow, from data acquisition to visualization, by applying key skills learned during the Google Data Analytics Professional Certificate program.
The raw datasets were provided as CSV files containing various indicators such as GDP, Freedom, Life Expectancy, and Happiness Score.
Data Import and Type Conversion
Imported raw data from multiple years (2015-2019) and converted columns
to appropriate data types using CAST and
SAFE_CAST.
Data Transformation
Renamed columns for consistency across datasets (e.g.,
Country to country,
Happiness Rank to happiness_rank).
Data Integration
Combined data from multiple years using UNION ALL to create
a single comprehensive dataset.
Data Filtering
Removed header rows by filtering out entries where the column values
were equal to column names.
Final Output
Sorted the combined dataset by happiness_score in
descending order to facilitate further analysis.
```sql
SELECT * FROM (
-- 2015 Data
SELECT
CAST(`Country` AS STRING) AS country,
CAST(`Happiness Rank` AS INT64) AS happiness_rank,
SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
SAFE_CAST(`Family` AS FLOAT64) AS social_support,
SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
2015 AS year
FROM capstone-443017.worldhappiness.2015
WHERE `Country` != 'Country'
UNION ALL
-- 2016 Data
SELECT
CAST(`Country` AS STRING) AS country,
CAST(`Happiness Rank` AS INT64) AS happiness_rank,
SAFE_CAST(`Happiness Score` AS FLOAT64) AS happiness_score,
SAFE_CAST(`Economy _GDP per Capita_` AS FLOAT64) AS gdp_per_capita,
SAFE_CAST(`Family` AS FLOAT64) AS social_support,
SAFE_CAST(`Health _Life Expectancy_` AS FLOAT64) AS life_expectancy,
SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
SAFE_CAST(`Trust _Government Corruption_` AS FLOAT64) AS government_trust,
SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
2016 AS year
FROM capstone-443017.worldhappiness.2016
WHERE `Country` != 'Country'
UNION ALL
-- 2017 Data
SELECT
CAST(`Country` AS STRING) AS country,
CAST(`Happiness_Rank` AS INT64) AS happiness_rank,
SAFE_CAST(`Happiness_Score` AS FLOAT64) AS happiness_score,
SAFE_CAST(`Economy__GDP_per_Capita_` AS FLOAT64) AS gdp_per_capita,
SAFE_CAST(`Family` AS FLOAT64) AS social_support,
SAFE_CAST(`Health__Life_Expectancy_` AS FLOAT64) AS life_expectancy,
SAFE_CAST(`Freedom` AS FLOAT64) AS freedom,
SAFE_CAST(`Trust__Government_Corruption_` AS FLOAT64) AS government_trust,
SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
2017 AS year
FROM capstone-443017.worldhappiness.2017
WHERE `Country` != 'Country'
UNION ALL
-- 2018 Data (without Dystopia Residual)
SELECT
CAST(`Country or region` AS STRING) AS country,
CAST(`Overall rank` AS INT64) AS happiness_rank, -- Correcting column name
SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
2018 AS year
FROM capstone-443017.worldhappiness.2018
WHERE `Country or region` != 'Country or region'
UNION ALL
-- 2019 Data (without Dystopia Residual)
SELECT
CAST(`Country or region` AS STRING) AS country,
CAST(`Overall rank` AS INT64) AS happiness_rank, -- Correcting column name
SAFE_CAST(`Score` AS FLOAT64) AS happiness_score,
SAFE_CAST(`GDP per capita` AS FLOAT64) AS gdp_per_capita,
SAFE_CAST(`Social support` AS FLOAT64) AS social_support,
SAFE_CAST(`Healthy life expectancy` AS FLOAT64) AS life_expectancy,
SAFE_CAST(`Freedom to make life choices` AS FLOAT64) AS freedom,
SAFE_CAST(`Perceptions of corruption` AS FLOAT64) AS government_trust,
SAFE_CAST(`Generosity` AS FLOAT64) AS generosity,
2019 AS year
FROM capstone-443017.worldhappiness.2019
WHERE `Country or region` != 'Country or region'
)
ORDER BY happiness_score DESC
The cleaned datasets were imported into R for analysis. In this step, the code calculates the average values for key factors such as happiness score, GDP per capita, social support, life expectancy, freedom, government trust, and generosity for each country. The data is grouped by country, and averages are computed across multiple years. Additionally, the number of years each country appears in the dataset is recorded. Afterward, the countries are ranked in descending order based on their average happiness score.
# Load necessary library
library(dplyr)
# Calculate the average happiness score and other relevant factors for each country
country_averages <- cleanworldhappiness %>%
group_by(country) %>%
summarise(
avg_happiness_score = mean(happiness_score, na.rm = TRUE),
avg_gdp_per_capita = mean(gdp_per_capita, na.rm = TRUE),
avg_social_support = mean(social_support, na.rm = TRUE),
avg_life_expectancy = mean(life_expectancy, na.rm = TRUE),
avg_freedom = mean(freedom, na.rm = TRUE),
avg_government_trust = mean(government_trust, na.rm = TRUE),
avg_generosity = mean(generosity, na.rm = TRUE),
num_years = n()
)
# Rank countries based on average happiness score
country_averages <- country_averages %>%
arrange(desc(avg_happiness_score)) %>%
mutate(rank = row_number())
To visualize the relationships between some of the variables, we created a series of data visualizations through R. These visualizations helped us identify potential trends and correlations within the data.
After performing initial analysis in R, we developed more interactive and insightful visualizations in Tableau. Below is an overview of the dashboard:
Investing in social and economic infrastructure is key to improving global happiness levels.
I would like to thank the following: