import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='C:/ProgramData/Anaconda3/Library/plugins/platforms'

Introduction

The World Happiness Report is a publication of the United Nations Sustainable Development Solutions Network. The report is published every year and includes articles and rankings of countries’ happiness levels. The data comes from citizens’ ratings of their own lives, which the report then correlates with various quality-of-life factors. The report uses data from the annual reports of the Gallup World Poll.

Dataset

This dataset looks at results from the World Happiness Report conducted in 2019. Taking data from 156 countries, it ranks each country by its happiness score and relates that score to six variables: GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption. This data was collected using various surveys. The ranking of national happiness by country is based on a Cantril ladder survey. In this kind of survey, respondents are asked to think of a ladder, with the best possible life for them at the top, assigned a value of 10. At the other end, respondents who consider themselves to have the worst possible life rate their lives as a 0. The World Happiness Report uses data already available to determine the GDP and life expectancy for each country. However, to determine social support, freedom to make life choices, generosity, and perceptions of corruption, a sample of citizens of each country takes part in a survey. For social support, participants are asked, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them?” For freedom to make life choices, the question is, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?” This question refers to freedom of speech, religion, education, clothing, etc. To determine generosity, participants are asked whether or not they have donated to charity in the last month. Lastly, participants are asked, “Is corruption widespread throughout the government? Is corruption widespread within businesses?” These survey variables are built from binary answers, using 0 for no and 1 for yes. The variables are intended to illustrate potential correlations between different factors and situations in each country and how happy citizens are in their country.
The numbers in the dataset do not directly correspond to the quantity named by each variable. For example, the country with the highest healthy life expectancy, Singapore, has a value of 1.141. This does not mean the life expectancy of Singapore citizens is only 1.141 years. Rather, the value expresses how much of Singapore’s happiness score can be explained by its life expectancy.
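
For the yes/no survey items described above, my understanding is that the raw national figure (before it is converted into the contribution values stored in this dataset) is the average of the 0/1 answers, which is the same as the share of respondents who answered yes. The sketch below only illustrates that arithmetic. The responses are invented and are not Gallup data.

```python
import numpy as np

# Hypothetical 0/1 answers from eight respondents (1 = yes, 0 = no).
# These values are made up for illustration only.
responses = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# The mean of binary answers equals the fraction who answered yes.
share_yes = responses.mean()
print(f"Share answering yes: {share_yes:.3f}")
```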

Findings

Something I found before looking at the relationships between variables was that a high overall score does not mean a country was among the top scorers in each specific category, and vice versa. Some countries with a relatively low happiness score had higher values in areas such as GDP per capita. For example, Kuwait, which had an overall ranking of 51, was the country with the 5th highest GDP per capita. Although GDP is an indicator of how well a country’s economy is doing, it does not by itself determine the happiness of its citizens. This is because GDP can increase drastically over the years for reasons that do not make citizens happier. For example, when a country goes to war, more weapons are produced and more factories are used to manufacture them, so GDP increases as the country produces more goods. However, being at war typically does not increase the happiness of a country’s citizens. Several ongoing civil wars in Middle Eastern countries may help explain Kuwait’s high GDP but relatively low happiness. At the other end of the data, the happiest country, Finland, had one of the highest values for perception of corruption. This suggests that although Finnish people are considered happy, they have some mistrust in their government.

Happiness Score vs GDP Scatter plot

In this graph, I created a scatter plot to show the relationship between a country’s GDP per capita and its happiness score. Each dot represents one of the 156 countries in this dataset and is color-coded by the country’s region. The 10 regions are Australia and New Zealand, Central and Eastern Europe, Eastern Asia, Latin America and Caribbean, Middle East and Northern Africa, North America, Southeastern Asia, Southern Asia, Sub-Saharan Africa, and Western Europe. Looking at the results, there seems to be a strong, positive correlation between GDP and happiness score. Although this does not mean a high GDP causes a country’s citizens to have high happiness scores, the two are clearly related. It was also interesting to see which regions scored generally low and which scored generally high. Towards the bottom left are the yellow dots that represent Sub-Saharan Africa. These markers indicate that countries in the Sub-Saharan Africa region have generally low happiness scores along with a generally low GDP. At the other end of the graph, in the upper right, are most of the values from the Western Europe region. These countries tended to have some of the highest happiness scores and the highest GDPs.
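
To put a number on how strong a relationship like this is, one option is the Pearson correlation coefficient. The sketch below is not part of the original analysis. `pearson_r` is a helper name I made up, and the demo values are invented rather than taken from the 2019 data.

```python
import numpy as np

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()  # deviations from the mean of x
    y_dev = y - y.mean()  # deviations from the mean of y
    return float((x_dev * y_dev).sum()
                 / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum()))

# Invented values that rise together, standing in for GDP and happiness score.
gdp_demo = [0.2, 0.5, 0.9, 1.2, 1.4]
score_demo = [3.5, 4.4, 5.1, 6.3, 7.0]
r_demo = pearson_r(gdp_demo, score_demo)
print(f"r = {r_demo:.3f}")
```

With the data frame loaded below, the same helper could be called as `pearson_r(df['GDPPerCapita'], df['Score'])`.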

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import seaborn as sns 
import plotly.graph_objs as go
import matplotlib.patches as mpatches
import plotly.io as pio
from plotly.subplots import make_subplots
from plotly.offline import iplot

warnings.filterwarnings("ignore")

path = "U:"

filename = path + "2019.csv"

df = pd.read_csv(filename, usecols = ['Score', 'Region', 'GDPPerCapita'])

categories = np.unique(df['Region'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

plt.figure(figsize=(15, 10), dpi= 60, facecolor='w', edgecolor='k');

for i, Region in enumerate(categories):
    plt.scatter('GDPPerCapita', 'Score',
               data=df.loc[df.Region==Region, :],
               s=100, color=colors[i], label=str(Region));

plt.title('Country Happiness Score vs GDP per Capita by Region', fontsize=22);    
plt.xlabel('GDP per Capita', fontsize=18);
plt.ylabel('Happiness Score', fontsize=18);
plt.legend(fontsize=14);
plt.gca().set(xlim=(-0.5, 1.8), ylim=(1.5, 9));

plt.show()

Dual Axis Bar Chart of Average GDP and Life Expectancy

For this graph, the average values of GDP per capita and healthy life expectancy for each region are displayed on the two y-axes, and the x-axis shows the 10 regions in this dataset. I wanted to see whether regions where GDP contributed more to the happiness score were also regions where healthy life expectancy contributed more. In a few of the regions with higher GDPs, the average life expectancy value was also higher. However, when looking at the bars that represent the Latin America and Caribbean region, I found that although this region ranked 8th out of 10 for average GDP with a value of 0.924, it ranked 5th out of 10 for average healthy life expectancy, with a value of 0.812. In general, some countries experience higher life expectancy when they have higher national income overall. However, there are other significant factors that contribute to a healthy life expectancy not displayed here, such as genetics, access to healthcare, diet and nutrition, and crime rates.
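
Rankings such as "8th out of 10" can be read off a table of regional means. The sketch below shows one way to compute such ranks with pandas. The three regions and their values are invented placeholders, not the real regional averages.

```python
import pandas as pd

# Invented rows for three placeholder regions (not the real dataset).
demo = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "C", "C"],
    "GDPPerCapita": [1.2, 1.4, 0.9, 0.7, 0.3, 0.5],
    "HealthyLifeExpectancy": [0.9, 1.0, 0.6, 0.5, 0.7, 0.8],
})

# Average each variable per region, then rank regions (1 = highest average).
region_means = demo.groupby("Region")[["GDPPerCapita", "HealthyLifeExpectancy"]].mean()
region_ranks = region_means.rank(ascending=False).astype(int)
print(region_ranks)
```

In this toy table, region C is last for GDP but second for life expectancy, the same kind of mismatch described for Latin America and Caribbean.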

df2 = pd.read_csv(filename, usecols = ['Region', 'GDPPerCapita', 'HealthyLifeExpectancy'])

# Named aggregation yields flat column names directly; size counts the countries in each region
x = df2.groupby('Region').agg(Count=('GDPPerCapita', 'size'),
                              AverGDP=('GDPPerCapita', 'mean'),
                              AverLife=('HealthyLifeExpectancy', 'mean')).reset_index()

x = x.sort_values('AverGDP', ascending=False)

def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals), 
                    fontsize=11, color='black', ha='center', va='bottom')
        
fig = plt.figure(figsize=(18,10));
ax1 = fig.add_subplot(1, 1, 1);
ax2 = ax1.twinx();
bar_width = 0.4

x_pos = np.arange(len(x))  # one bar position per region instead of a hard-coded 10
aver_gdp_bars = ax1.bar(x_pos-(0.5*bar_width), x.AverGDP, bar_width, color='pink', edgecolor='black', label='Average GDP Per Capita');
aver_life_bars = ax2.bar(x_pos+(0.5*bar_width), x.AverLife, bar_width, color='orange', edgecolor='black', label='Average Healthy Life Expectancy');

ax1.set_xlabel('Region', fontsize=18);
ax1.set_ylabel('Average GDP Per Capita', fontsize=18, labelpad=20);
ax2.set_ylabel('Average Healthy Life Expectancy', fontsize=18, rotation=270, labelpad=20);
ax1.tick_params(axis='y', labelsize=14);
ax2.tick_params(axis='y', labelsize=14);

plt.title('Average GDP and Life Expectancy by Region', fontsize=18);
ax1.set_xticks(x_pos);
ax1.set_xticklabels(x.Region, fontsize=14, rotation=270);

avg1_color, avg1_label = ax1.get_legend_handles_labels();
avg2_color, avg2_label = ax2.get_legend_handles_labels();
legend = ax1.legend(avg1_color + avg2_color, avg1_label + avg2_label, loc='upper right', frameon=True, ncol=1, shadow=True,
                   borderpad=1, fontsize=14);
ax1.set_ylim(0, x.AverGDP.max()*1.50);

autolabel(aver_gdp_bars, ax1, '.3f', '');
autolabel(aver_life_bars, ax2, '.3f', '');

plt.show()

Heatmap of Variable Correlations

For my next visualization, I created a correlogram to visualize the relationships between each pair of variables in the dataset. The colors used were shades of red and blue. The darker the red, the stronger the positive correlation between the two variables. A lighter shade of red indicates that the two variables still have a positive relationship, just not as strong. At the other end, squares in dark blue indicate that the variables have a strong, negative correlation. In other words, as one goes up, the other goes down. The lighter shades of blue mark relationships with little to no correlation. The number on each square also shows how strong the relationship is. The closer the number is to 0, the weaker the relationship. Looking at the chart, one of the strongest correlations is between GDP per capita and healthy life expectancy. With a value of 0.84, there is a strong positive correlation between these two when looking at all 156 countries and their individual values. This suggests that a country with a high GDP may also have a high healthy life expectancy, which contributes to its citizens’ happiness levels. The visualization also provides some insight into variables that are almost unrelated, for example generosity and overall rank. Their correlation value is -0.048. Because rank 1 is the happiest country, a negative value here would mean that more generous countries tend to rank very slightly better, but the value is so close to 0 that too few countries follow this pattern to call it a real relationship.
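
Reading the strongest and weakest pairs off a large heatmap by eye is error-prone, so it can help to list the pairs in order. The sketch below is one possible way to do that. `rank_correlation_pairs` is a hypothetical helper of mine, and the three-column demo frame is invented.

```python
import numpy as np
import pandas as pd

def rank_correlation_pairs(frame):
    """List each pair of numeric columns once, strongest |correlation| first."""
    corr = frame.select_dtypes(include="number").corr()
    # Keep only the upper triangle: drops the diagonal and duplicate pairs.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Invented columns: b falls exactly as a rises, c only loosely follows a.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 3, 2, 1],
    "c": [2, 1, 4, 3, 5],
})
ranked = rank_correlation_pairs(demo)
print(ranked)
```

Applied to the full data frame, the first rows would be the darkest squares of the heatmap and the last rows the palest.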

df_new = pd.read_csv(filename)

fig = plt.figure(figsize=(20,14));
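# Country and Region are text columns, so restrict the correlation to the numeric columns
sns.heatmap(df_new.select_dtypes(include='number').corr(), cmap = 'coolwarm', annot=True,
            cbar_kws={'label': 'Strength of Correlation', 'orientation': 'vertical'});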
sns.set(font_scale=1.5);

plt.title('Correlations of Variables Heatmap', fontsize=20, pad=15);

plt.xticks(size=18);
plt.yticks(size=18);

plt.show()

Horizontal Bar Chart of Most Freedom to Make Life Choices

This next chart displays the 20 countries with the highest values for freedom to make life choices. Citizens of these countries typically said they were satisfied when asked the question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?” In turn, responses to this question contributed greatly to how happy citizens are living in their country. The bars are color-coded by how close they are to the mean freedom to make life choices value of the 20 countries shown. The pink bars are countries whose value is more than 1% above that mean, and the green bars are countries whose value is more than 1% below it. The black bars represent the countries with values that are within 1% of the mean. The mean, 0.57825, is computed over these 20 countries rather than the whole dataset and is shown as the dashed line. From the graph, it is clear Uzbekistan was the country ranked highest for freedom to make life choices, with a value of 0.631.
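
The 1% band around the mean can also be expressed without an explicit loop. The sketch below is an alternative illustration, not the function used for the chart. `classify_against_mean` is a name I made up, and the five values are invented.

```python
import numpy as np

def classify_against_mean(values, tolerance=0.01):
    """Label each value as above, below, or within a band around the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    upper = mean * (1 + tolerance)  # top of the band
    lower = mean * (1 - tolerance)  # bottom of the band
    labels = np.where(values > upper, "above",
                      np.where(values < lower, "below", "within"))
    return mean, labels

# Invented values whose mean is 0.55, so the band runs from 0.5445 to 0.5555.
demo_mean, demo_labels = classify_against_mean([0.50, 0.60, 0.55, 0.551, 0.549])
print(demo_mean, list(demo_labels))
```

The labels could then be mapped to colors with a small dictionary such as `{'above': 'lightcoral', 'within': 'black', 'below': 'green'}`.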


def pick_colors_according_to_mean_freedom(this_data):
    colors=[]
    avg = this_data.FreedomToMakeLifeChoices.mean()
    for each in this_data.FreedomToMakeLifeChoices:
        if each > avg*1.01:
            colors.append('lightcoral')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('black')
    return colors

bottom3 = 0
top3 = 20

df_bar = pd.read_csv(filename, usecols = ['Region', 'Country', 'FreedomToMakeLifeChoices'])
df_bar = df_bar.sort_values('FreedomToMakeLifeChoices', ascending=False)
df_bar.reset_index(inplace=True, drop=True)

d3 = df_bar[bottom3:top3]
d3 = d3.sort_values('FreedomToMakeLifeChoices', ascending=True)
d3.reset_index(inplace=True, drop=True)
my_colors3 = pick_colors_according_to_mean_freedom(d3)

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')

fig = plt.figure(figsize=(18,12));
ax1 = fig.add_subplot(1, 1, 1);
ax1.barh(d3.Country, d3.FreedomToMakeLifeChoices, color=my_colors3);

for row_counter, value_at_row_counter in enumerate(d3.FreedomToMakeLifeChoices):
    ax1.text(value_at_row_counter+.03, row_counter, str(value_at_row_counter), color='black', size=12, fontweight='bold')
plt.xlim(0, d3.FreedomToMakeLifeChoices.max()*1.5);


ax1.legend(loc='upper right', handles=[Above, At, Below], fontsize=16);
plt.axvline(d3.FreedomToMakeLifeChoices.mean(), color='black', linestyle='dashed');
ax1.text(d3.FreedomToMakeLifeChoices.mean()+0.01, -.8, 'Mean = ' + format(d3.FreedomToMakeLifeChoices.mean(), '.5f'), fontsize=14, ha='left', va='center');

ax1.set_title('Top ' + str(top3) + ' Freedom to Make Life Choices', size=20);
ax1.set_xlabel('Freedom to Make Life Choices Value', fontsize=20);
ax1.set_ylabel('Country', fontsize=20);
plt.xticks(fontsize=14);
plt.yticks(fontsize=14);
plt.show()

Line Graph of Generosity and Social Support in Top 100 Happiest Countries

For this graph, I wanted to see if countries whose happiness ranking is closely related to their social support score also had a generosity score that contributed to their happiness. This line graph looks specifically at the 100 happiest countries. The question asked to determine the value of social support was, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them?” To determine generosity, participants were asked whether or not they had donated to charity in the last month. I thought it would be interesting to see if there was any relationship between these two variables, as I speculated that people would be more willing to help others and donate to those in need if they had a good support system of friends and family willing to help them in turn. However, as found in the correlogram earlier, these two variables had a correlation value of -0.048 among all 156 countries, meaning the relationship was very weak and slightly negative. I wanted to see if this trend was different in the 100 happiest countries, or to point out specific countries that supported the negative correlation. One country whose values stuck out to me was Bulgaria, which is ranked 97th. Bulgaria had one of the highest social support values, at 1.513, meaning Bulgarian citizens generally believe they can rely on their friends and family as a support system in hard times, and this contributes greatly to their happiness levels. However, Bulgaria had one of the lowest generosity values, at 0.081, meaning not many Bulgarian citizens had donated to charity recently. Bulgaria’s values were interesting to point out, since they stick out on the graph and are consistent with the negative correlation found between generosity and social support.
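
One way to check whether the relationship differs among the happiest countries is to compute the correlation on that subset alone. This sketch is not part of the original analysis. `correlation_in_top_n` is a hypothetical helper, and the six rows are invented.

```python
import pandas as pd

def correlation_in_top_n(frame, col_a, col_b, n, rank_col="Rank"):
    """Pearson correlation of two columns among the n best-ranked rows."""
    top = frame.nsmallest(n, rank_col)  # rank 1 is the happiest country
    return top[col_a].corr(top[col_b])

# Invented rows: within the top four, generosity rises as social support falls.
demo = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 6],
    "SocialSupport": [1.5, 1.4, 1.3, 1.2, 0.9, 0.6],
    "Generosity": [0.1, 0.2, 0.3, 0.4, 0.1, 0.0],
})
r_top4 = correlation_in_top_n(demo, "SocialSupport", "Generosity", n=4)
print(f"r among top 4 = {r_top4:.3f}")
```

Once the columns are renamed as in the code below, the same call with `n=100` would give the subset value to compare against -0.048.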


df_new.columns = ['Rank', 'Country', 'Score', 'GDPPerCapita', 'SocialSupport',
       'HealthyLifeExpectancy', 'FreedomToMakeLifeChoices', 'Generosity',
       'PerceptionsOfCorruption', 'Region']
       
df_new = df_new.iloc[:100, :]

trace1= go.Scatter(x = df_new.Rank, 
                   y = df_new.Generosity, 
                   mode = 'lines + markers', 
                   name = 'Generosity', 
                   marker = dict(color = 'rgba(0, 128, 0, 1.0)'), 
                   text = df_new.Country)

trace2= go.Scatter(x = df_new.Rank, 
                   y = df_new.SocialSupport, 
                  mode = 'lines + markers', 
                  name = 'Social Support', 
                  marker = dict(color = 'rgba(255, 0, 255, 0.6)'), 
                  text = df_new.Country)

#p1 = go.Figure(data=[trace1])
#p2 = go.Figure(data=[trace2])

from plotly.subplots import make_subplots
fig_grid = make_subplots(rows=2, 
                         cols=1, 
                         x_title='Happiness Rank',
                         shared_xaxes=True);
fig_grid.add_trace(trace1,row=1, col=1);
fig_grid.add_trace(trace2,row=2, col=1);
fig_grid.update_layout(title='Generosity and Social Support in Top 100 Happiest Countries', title_x=0.5);
fig_grid.show()

#data = [trace1, trace2]

#layout = dict(title = 'Generosity and Social Support in Top 100 Happiest Countries',
#             xaxis = dict(title = 'Happiness Rank', ticklen = 5, zeroline = True))

#fig = dict(data = data, layout = layout)
#iplot(fig) this line does not print out in the knitted RMD file
#p1.show()

Geographical Visualization of Happiness Scores

This visual shows a 2D world map that is color-coded by each country’s happiness score. The greener a country appears on the map, the higher its happiness score is. The redder a country appears, the lower its happiness score is. I thought it would be interesting to visualize the general happiness trends among different continents. Although this data can be found in the original dataset, it is much easier to view and understand on a map. From this visualization, it appears Africa has the most countries with low happiness scores, as the majority of the continent is red or orange. At the other end, most of Europe seems to be green, meaning it contains many of the happiest countries. This pattern may be related to the income, health, and freedom of each country.


import plotly.graph_objects as go
from plotly.offline import iplot

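# Reload the full dataset: df_new was cut down to the top 100 countries above,
# which would leave the lowest-ranked countries off the map.
df_map = pd.read_csv(filename, usecols = ['Country', 'Score'])

data = dict(type = 'choropleth',
           locations = df_map['Country'],
           locationmode = 'country names',
           colorscale='RdYlGn',
           z = df_map['Score'],
           colorbar = {'title': 'Happiness Score'})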

# Set the title font through title.font; the older titlefont key is deprecated
layout = dict(title = dict(text = 'Geographical Visualization of Happiness Score', font = dict(size=26)),
             geo = dict(showframe = True, projection = {'type': 'equirectangular'}))

choromap3 = go.Figure(data = [data], layout=layout)
#iplot(choromap3)
pio.show(choromap3)

Conclusion

After reviewing the data, it was interesting to see not only which individual countries ranked high or low in terms of happiness, but also which regions were happy or unhappy overall when countries were grouped together. Across the visualizations, Sub-Saharan Africa appeared to have the most unhappy countries, and Western Europe, specifically the Nordic countries, seemed to be the happiest. There are many key factors that contribute to how happy citizens rate themselves to be. Many African countries lack the infrastructure needed to keep up with rapid population growth. If governments cannot provide their citizens with basic necessities such as food and shelter, it is highly unlikely citizens will feel happy living under that government. Something I found particularly intriguing was the slightly positive relationship between perception of corruption and happiness score. When looking at the data, I found that 6 of the 10 countries with the highest values for perception of corruption were also among the 10 countries with the highest happiness ranking. This was interesting to me, as I thought people who felt their government and local businesses were dishonest would not be happy living in that country. However, because this was presented to participants only as a yes or no question asking whether corruption is widespread throughout the government and businesses, there was not much room for participants to say whether they felt their government was slightly or moderately corrupt. Overall, this data provided rich insight into what makes people happy and makes their lives worth living.
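
An overlap count such as "6 of the 10" can be reproduced by intersecting two top-10 lists. The sketch below shows the idea on invented rows. `top_n_overlap` is a hypothetical helper name, and the real column names would need to be substituted.

```python
import pandas as pd

def top_n_overlap(frame, col_a, col_b, n=10, label_col="Country"):
    """Return the labels that appear in the top n of both columns."""
    top_a = set(frame.nlargest(n, col_a)[label_col])
    top_b = set(frame.nlargest(n, col_b)[label_col])
    return sorted(top_a & top_b)

# Invented rows for five placeholder countries.
demo = pd.DataFrame({
    "Country": ["P", "Q", "R", "S", "T"],
    "Score": [7.0, 6.0, 5.0, 4.0, 3.0],
    "PerceptionsOfCorruption": [0.40, 0.10, 0.30, 0.05, 0.20],
})
shared = top_n_overlap(demo, "Score", "PerceptionsOfCorruption", n=2)
print(shared)
```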