Introduction

This project analyzes the World Happiness Reports of 2015, 2016, and 2017. The data are derived from Gallup poll results. The data sets assign happiness scores to countries and regions, based on six factors: GDP per capita, family or social support, health as defined by years of life expectancy, perceived freedom to make life decisions, trust in government, and generosity measured by recent donations.

I chose to create two visualizations. The first shows change over time for the Brazilian, Russian, Indian, and Chinese economies. I thought it would be interesting to examine these countries, because they are often compared as countries at similar stages of industrial development. My second visualization shows the Happiness Score versus the Freedom Score factor in 2017 for all countries. I also included a third variable, GDP per capita, which is represented by the size of the points on the chart. I wanted to see whether the Freedom variable had an impact on Happiness scores, if at all. If greater perceived freedom to make life decisions meant a higher happiness score, then we would expect the scatter plot to appear roughly linear with a positive slope.

I chose this topic and dataset because I wanted to better understand the World Happiness Report and how certain factors may have impacted the Happiness score. I think it is interesting to compare countries this way, as it gives several variables beyond just quality of life. I was also interested in working with different data sets, so I could practice cleaning and binding them together.

Reading the data sets

First, I imported the three data sets and stored each one in its own variable (happiness2015, happiness2016, and happiness2017), so they are easier to call. I used the readr package.

library("readr")
happiness2015 <-read_csv("happiness2015.csv")
## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Region = col_character(),
##   `Happiness Rank` = col_double(),
##   `Happiness Score` = col_double(),
##   `Standard Error` = col_double(),
##   `Economy (GDP per Capita)` = col_double(),
##   Family = col_double(),
##   `Health (Life Expectancy)` = col_double(),
##   Freedom = col_double(),
##   `Trust (Government Corruption)` = col_double(),
##   Generosity = col_double(),
##   `Dystopia Residual` = col_double()
## )
happiness2016 <-read_csv("happiness2016.csv")
## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Region = col_character(),
##   `Happiness Rank` = col_double(),
##   `Happiness Score` = col_double(),
##   `Lower Confidence Interval` = col_double(),
##   `Upper Confidence Interval` = col_double(),
##   `Economy (GDP per Capita)` = col_double(),
##   Family = col_double(),
##   `Health (Life Expectancy)` = col_double(),
##   Freedom = col_double(),
##   `Trust (Government Corruption)` = col_double(),
##   Generosity = col_double(),
##   `Dystopia Residual` = col_double()
## )
happiness2017 <-read_csv("happiness2017.csv")
## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Happiness.Rank = col_double(),
##   Happiness.Score = col_double(),
##   Whisker.high = col_double(),
##   Whisker.low = col_double(),
##   Economy..GDP.per.Capita. = col_double(),
##   Family = col_double(),
##   Health..Life.Expectancy. = col_double(),
##   Freedom = col_double(),
##   Generosity = col_double(),
##   Trust..Government.Corruption. = col_double(),
##   Dystopia.Residual = col_double()
## )

I used functions from the dplyr, tidyr, ggplot2, and plotly libraries. They can be installed with install.packages() if they are not already on the device.

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.1
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Exploring the data sets

I assumed the imported data would be a data frame, as it contained columns of different data types. I confirmed this using the class() function. I also checked the number of rows and columns in each data frame using the dim() function.

#checking characteristics of the dataset
class(happiness2015)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
class(happiness2016)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
class(happiness2017)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
#checking the dimensions of the data frames
dim(happiness2015)
## [1] 158  12
dim(happiness2016)
## [1] 157  13
dim(happiness2017)
## [1] 155  12

I checked the structure of each data frame with the str() function to see the names of the columns as well as the specific data type of each column.

## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 158 obs. of  12 variables:
##  $ Country                      : chr  "Switzerland" "Iceland" "Denmark" "Norway" ...
##  $ Region                       : chr  "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Happiness Rank               : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Happiness Score              : num  7.59 7.56 7.53 7.52 7.43 ...
##  $ Standard Error               : num  0.0341 0.0488 0.0333 0.0388 0.0355 ...
##  $ Economy (GDP per Capita)     : num  1.4 1.3 1.33 1.46 1.33 ...
##  $ Family                       : num  1.35 1.4 1.36 1.33 1.32 ...
##  $ Health (Life Expectancy)     : num  0.941 0.948 0.875 0.885 0.906 ...
##  $ Freedom                      : num  0.666 0.629 0.649 0.67 0.633 ...
##  $ Trust (Government Corruption): num  0.42 0.141 0.484 0.365 0.33 ...
##  $ Generosity                   : num  0.297 0.436 0.341 0.347 0.458 ...
##  $ Dystopia Residual            : num  2.52 2.7 2.49 2.47 2.45 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   Region = col_character(),
##   ..   `Happiness Rank` = col_double(),
##   ..   `Happiness Score` = col_double(),
##   ..   `Standard Error` = col_double(),
##   ..   `Economy (GDP per Capita)` = col_double(),
##   ..   Family = col_double(),
##   ..   `Health (Life Expectancy)` = col_double(),
##   ..   Freedom = col_double(),
##   ..   `Trust (Government Corruption)` = col_double(),
##   ..   Generosity = col_double(),
##   ..   `Dystopia Residual` = col_double()
##   .. )
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 157 obs. of  13 variables:
##  $ Country                      : chr  "Denmark" "Switzerland" "Iceland" "Norway" ...
##  $ Region                       : chr  "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Happiness Rank               : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Happiness Score              : num  7.53 7.51 7.5 7.5 7.41 ...
##  $ Lower Confidence Interval    : num  7.46 7.43 7.33 7.42 7.35 ...
##  $ Upper Confidence Interval    : num  7.59 7.59 7.67 7.58 7.47 ...
##  $ Economy (GDP per Capita)     : num  1.44 1.53 1.43 1.58 1.41 ...
##  $ Family                       : num  1.16 1.15 1.18 1.13 1.13 ...
##  $ Health (Life Expectancy)     : num  0.795 0.863 0.867 0.796 0.811 ...
##  $ Freedom                      : num  0.579 0.586 0.566 0.596 0.571 ...
##  $ Trust (Government Corruption): num  0.445 0.412 0.15 0.358 0.41 ...
##  $ Generosity                   : num  0.362 0.281 0.477 0.379 0.255 ...
##  $ Dystopia Residual            : num  2.74 2.69 2.83 2.66 2.83 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   Region = col_character(),
##   ..   `Happiness Rank` = col_double(),
##   ..   `Happiness Score` = col_double(),
##   ..   `Lower Confidence Interval` = col_double(),
##   ..   `Upper Confidence Interval` = col_double(),
##   ..   `Economy (GDP per Capita)` = col_double(),
##   ..   Family = col_double(),
##   ..   `Health (Life Expectancy)` = col_double(),
##   ..   Freedom = col_double(),
##   ..   `Trust (Government Corruption)` = col_double(),
##   ..   Generosity = col_double(),
##   ..   `Dystopia Residual` = col_double()
##   .. )
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 155 obs. of  12 variables:
##  $ Country                      : chr  "Norway" "Denmark" "Iceland" "Switzerland" ...
##  $ Happiness.Rank               : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Happiness.Score              : num  7.54 7.52 7.5 7.49 7.47 ...
##  $ Whisker.high                 : num  7.59 7.58 7.62 7.56 7.53 ...
##  $ Whisker.low                  : num  7.48 7.46 7.39 7.43 7.41 ...
##  $ Economy..GDP.per.Capita.     : num  1.62 1.48 1.48 1.56 1.44 ...
##  $ Family                       : num  1.53 1.55 1.61 1.52 1.54 ...
##  $ Health..Life.Expectancy.     : num  0.797 0.793 0.834 0.858 0.809 ...
##  $ Freedom                      : num  0.635 0.626 0.627 0.62 0.618 ...
##  $ Generosity                   : num  0.362 0.355 0.476 0.291 0.245 ...
##  $ Trust..Government.Corruption.: num  0.316 0.401 0.154 0.367 0.383 ...
##  $ Dystopia.Residual            : num  2.28 2.31 2.32 2.28 2.43 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   Happiness.Rank = col_double(),
##   ..   Happiness.Score = col_double(),
##   ..   Whisker.high = col_double(),
##   ..   Whisker.low = col_double(),
##   ..   Economy..GDP.per.Capita. = col_double(),
##   ..   Family = col_double(),
##   ..   Health..Life.Expectancy. = col_double(),
##   ..   Freedom = col_double(),
##   ..   Generosity = col_double(),
##   ..   Trust..Government.Corruption. = col_double(),
##   ..   Dystopia.Residual = col_double()
##   .. )

I then checked the heads and tails of the data. I set the second argument in these functions to 10 in order to see the first 10 and last 10 observations of each data frame.

#head of data
head(happiness2015, 10)
## # A tibble: 10 x 12
##    Country Region `Happiness Rank` `Happiness Scor~ `Standard Error`
##    <chr>   <chr>             <dbl>            <dbl>            <dbl>
##  1 Switze~ Weste~                1             7.59           0.0341
##  2 Iceland Weste~                2             7.56           0.0488
##  3 Denmark Weste~                3             7.53           0.0333
##  4 Norway  Weste~                4             7.52           0.0388
##  5 Canada  North~                5             7.43           0.0355
##  6 Finland Weste~                6             7.41           0.0314
##  7 Nether~ Weste~                7             7.38           0.0280
##  8 Sweden  Weste~                8             7.36           0.0316
##  9 New Ze~ Austr~                9             7.29           0.0337
## 10 Austra~ Austr~               10             7.28           0.0408
## # ... with 7 more variables: `Economy (GDP per Capita)` <dbl>,
## #   Family <dbl>, `Health (Life Expectancy)` <dbl>, Freedom <dbl>, `Trust
## #   (Government Corruption)` <dbl>, Generosity <dbl>, `Dystopia
## #   Residual` <dbl>
head(happiness2016, 10)
## # A tibble: 10 x 13
##    Country Region `Happiness Rank` `Happiness Scor~ `Lower Confiden~
##    <chr>   <chr>             <dbl>            <dbl>            <dbl>
##  1 Denmark Weste~                1             7.53             7.46
##  2 Switze~ Weste~                2             7.51             7.43
##  3 Iceland Weste~                3             7.50             7.33
##  4 Norway  Weste~                4             7.50             7.42
##  5 Finland Weste~                5             7.41             7.35
##  6 Canada  North~                6             7.40             7.34
##  7 Nether~ Weste~                7             7.34             7.28
##  8 New Ze~ Austr~                8             7.33             7.26
##  9 Austra~ Austr~                9             7.31             7.24
## 10 Sweden  Weste~               10             7.29             7.23
## # ... with 8 more variables: `Upper Confidence Interval` <dbl>, `Economy
## #   (GDP per Capita)` <dbl>, Family <dbl>, `Health (Life
## #   Expectancy)` <dbl>, Freedom <dbl>, `Trust (Government
## #   Corruption)` <dbl>, Generosity <dbl>, `Dystopia Residual` <dbl>
head(happiness2017, 10)
## # A tibble: 10 x 12
##    Country Happiness.Rank Happiness.Score Whisker.high Whisker.low
##    <chr>            <dbl>           <dbl>        <dbl>       <dbl>
##  1 Norway               1            7.54         7.59        7.48
##  2 Denmark              2            7.52         7.58        7.46
##  3 Iceland              3            7.50         7.62        7.39
##  4 Switze~              4            7.49         7.56        7.43
##  5 Finland              5            7.47         7.53        7.41
##  6 Nether~              6            7.38         7.43        7.33
##  7 Canada               7            7.32         7.38        7.25
##  8 New Ze~              8            7.31         7.38        7.25
##  9 Sweden               9            7.28         7.34        7.22
## 10 Austra~             10            7.28         7.36        7.21
## # ... with 7 more variables: Economy..GDP.per.Capita. <dbl>, Family <dbl>,
## #   Health..Life.Expectancy. <dbl>, Freedom <dbl>, Generosity <dbl>,
## #   Trust..Government.Corruption. <dbl>, Dystopia.Residual <dbl>
#tail of data
tail(happiness2015, 10)
## # A tibble: 10 x 12
##    Country Region `Happiness Rank` `Happiness Scor~ `Standard Error`
##    <chr>   <chr>             <dbl>            <dbl>            <dbl>
##  1 Chad    Sub-S~              149             3.67           0.0383
##  2 Guinea  Sub-S~              150             3.66           0.0359
##  3 Ivory ~ Sub-S~              151             3.66           0.0514
##  4 Burkin~ Sub-S~              152             3.59           0.0432
##  5 Afghan~ South~              153             3.58           0.0308
##  6 Rwanda  Sub-S~              154             3.46           0.0346
##  7 Benin   Sub-S~              155             3.34           0.0366
##  8 Syria   Middl~              156             3.01           0.0502
##  9 Burundi Sub-S~              157             2.90           0.0866
## 10 Togo    Sub-S~              158             2.84           0.0673
## # ... with 7 more variables: `Economy (GDP per Capita)` <dbl>,
## #   Family <dbl>, `Health (Life Expectancy)` <dbl>, Freedom <dbl>, `Trust
## #   (Government Corruption)` <dbl>, Generosity <dbl>, `Dystopia
## #   Residual` <dbl>
tail(happiness2016, 10)
## # A tibble: 10 x 13
##    Country Region `Happiness Rank` `Happiness Scor~ `Lower Confiden~
##    <chr>   <chr>             <dbl>            <dbl>            <dbl>
##  1 Madaga~ Sub-S~              148             3.70             3.62
##  2 Tanzan~ Sub-S~              149             3.67             3.56
##  3 Liberia Sub-S~              150             3.62             3.46
##  4 Guinea  Sub-S~              151             3.61             3.53
##  5 Rwanda  Sub-S~              152             3.52             3.44
##  6 Benin   Sub-S~              153             3.48             3.40
##  7 Afghan~ South~              154             3.36             3.29
##  8 Togo    Sub-S~              155             3.30             3.19
##  9 Syria   Middl~              156             3.07             2.94
## 10 Burundi Sub-S~              157             2.90             2.73
## # ... with 8 more variables: `Upper Confidence Interval` <dbl>, `Economy
## #   (GDP per Capita)` <dbl>, Family <dbl>, `Health (Life
## #   Expectancy)` <dbl>, Freedom <dbl>, `Trust (Government
## #   Corruption)` <dbl>, Generosity <dbl>, `Dystopia Residual` <dbl>
tail(happiness2017, 10)
## # A tibble: 10 x 12
##    Country Happiness.Rank Happiness.Score Whisker.high Whisker.low
##    <chr>            <dbl>           <dbl>        <dbl>       <dbl>
##  1 Yemen              146            3.59         3.69        3.49
##  2 South ~            147            3.59         3.73        3.46
##  3 Liberia            148            3.53         3.65        3.41
##  4 Guinea             149            3.51         3.58        3.43
##  5 Togo               150            3.49         3.59        3.40
##  6 Rwanda             151            3.47         3.54        3.40
##  7 Syria              152            3.46         3.66        3.26
##  8 Tanzan~            153            3.35         3.46        3.24
##  9 Burundi            154            2.90         3.07        2.74
## 10 Centra~            155            2.69         2.86        2.52
## # ... with 7 more variables: Economy..GDP.per.Capita. <dbl>, Family <dbl>,
## #   Health..Life.Expectancy. <dbl>, Freedom <dbl>, Generosity <dbl>,
## #   Trust..Government.Corruption. <dbl>, Dystopia.Residual <dbl>

Cleaning the data sets

To make the data easier to work with, I had to clean it extensively. All three data sets had different labels for the same columns, and some data sets had columns that others did not. In order to bind them together as one data frame, I renamed the columns consistently and removed columns that did not exist in all three data sets. I replaced spaces with underscores, so that the columns were easier to call. I also added a column to each data frame for the year it represented. Finally, I sorted the columns of every data frame in descending alphabetical order by name, so that they would bind without any errors.

####### cleaning the 2015 data set
happiness2015_cleaned <- happiness2015 %>% select(-(Region), -(`Standard Error`)) %>% dplyr::rename(Happiness_Rank = `Happiness Rank`, Happiness_Score = `Happiness Score`, Economy_GDP_per_Capita = `Economy (GDP per Capita)`, Health_Life_Expectancy = `Health (Life Expectancy)`, Trust_Govt_Corruption = `Trust (Government Corruption)`, Dystopia_Residual = `Dystopia Residual`)

#adding the year column
happiness2015_cleaned['Year'] = '2015'

#reordering the columns
happiness2015_final <- happiness2015_cleaned[,order(colnames(happiness2015_cleaned),decreasing=TRUE)]

#checking the changes for the 2015 data set
head(happiness2015_final)
## # A tibble: 6 x 11
##   Year  Trust_Govt_Corr~ Health_Life_Exp~ Happiness_Score Happiness_Rank
##   <chr>            <dbl>            <dbl>           <dbl>          <dbl>
## 1 2015             0.420            0.941            7.59              1
## 2 2015             0.141            0.948            7.56              2
## 3 2015             0.484            0.875            7.53              3
## 4 2015             0.365            0.885            7.52              4
## 5 2015             0.330            0.906            7.43              5
## 6 2015             0.414            0.889            7.41              6
## # ... with 6 more variables: Generosity <dbl>, Freedom <dbl>,
## #   Family <dbl>, Economy_GDP_per_Capita <dbl>, Dystopia_Residual <dbl>,
## #   Country <chr>
####### cleaning the 2016 data set
happiness2016_cleaned <- happiness2016 %>% select(-(Region), -(`Upper Confidence Interval`), -(`Lower Confidence Interval`)) %>% dplyr::rename(Happiness_Rank = `Happiness Rank`, Happiness_Score = `Happiness Score`, Economy_GDP_per_Capita = `Economy (GDP per Capita)`, Health_Life_Expectancy = `Health (Life Expectancy)`, Trust_Govt_Corruption = `Trust (Government Corruption)`, Dystopia_Residual = `Dystopia Residual`)

#adding the year column
happiness2016_cleaned['Year'] = '2016'

#reordering the columns
happiness2016_final <- happiness2016_cleaned[,order(colnames(happiness2016_cleaned),decreasing=TRUE)]

#checking the changes for the 2016 data set
head(happiness2016_final)
## # A tibble: 6 x 11
##   Year  Trust_Govt_Corr~ Health_Life_Exp~ Happiness_Score Happiness_Rank
##   <chr>            <dbl>            <dbl>           <dbl>          <dbl>
## 1 2016             0.445            0.795            7.53              1
## 2 2016             0.412            0.863            7.51              2
## 3 2016             0.150            0.867            7.50              3
## 4 2016             0.358            0.796            7.50              4
## 5 2016             0.410            0.811            7.41              5
## 6 2016             0.313            0.828            7.40              6
## # ... with 6 more variables: Generosity <dbl>, Freedom <dbl>,
## #   Family <dbl>, Economy_GDP_per_Capita <dbl>, Dystopia_Residual <dbl>,
## #   Country <chr>
####### cleaning the 2017 data set
happiness2017_cleaned <- happiness2017 %>% select(-(Whisker.low), -(Whisker.high)) %>% dplyr::rename(Happiness_Rank = Happiness.Rank, Happiness_Score = Happiness.Score, Economy_GDP_per_Capita = Economy..GDP.per.Capita., Health_Life_Expectancy = Health..Life.Expectancy., Trust_Govt_Corruption = Trust..Government.Corruption., Dystopia_Residual = Dystopia.Residual)

#adding the year column
happiness2017_cleaned['Year'] = '2017'

#reordering the columns
happiness2017_final <- happiness2017_cleaned[,order(colnames(happiness2017_cleaned),decreasing=TRUE)]

#checking the changes for the 2017 data set
head(happiness2017_final)
## # A tibble: 6 x 11
##   Year  Trust_Govt_Corr~ Health_Life_Exp~ Happiness_Score Happiness_Rank
##   <chr>            <dbl>            <dbl>           <dbl>          <dbl>
## 1 2017             0.316            0.797            7.54              1
## 2 2017             0.401            0.793            7.52              2
## 3 2017             0.154            0.834            7.50              3
## 4 2017             0.367            0.858            7.49              4
## 5 2017             0.383            0.809            7.47              5
## 6 2017             0.283            0.811            7.38              6
## # ... with 6 more variables: Generosity <dbl>, Freedom <dbl>,
## #   Family <dbl>, Economy_GDP_per_Capita <dbl>, Dystopia_Residual <dbl>,
## #   Country <chr>
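
The renaming above was done by hand for each year. As a rough sketch of an alternative, the column names could be normalized programmatically and then intersected to find the columns shared by all three years. The helper normalize_names() is hypothetical (it is not part of any package), and the name vectors are copied from the read_csv() output shown earlier. Note that this approach produces Trust_Government_Corruption rather than the hand-picked Trust_Govt_Corruption.

```r
# Column names copied from the read_csv() output for each year.
names_2015 <- c("Country", "Region", "Happiness Rank", "Happiness Score",
                "Standard Error", "Economy (GDP per Capita)", "Family",
                "Health (Life Expectancy)", "Freedom",
                "Trust (Government Corruption)", "Generosity",
                "Dystopia Residual")
names_2016 <- c("Country", "Region", "Happiness Rank", "Happiness Score",
                "Lower Confidence Interval", "Upper Confidence Interval",
                "Economy (GDP per Capita)", "Family",
                "Health (Life Expectancy)", "Freedom",
                "Trust (Government Corruption)", "Generosity",
                "Dystopia Residual")
names_2017 <- c("Country", "Happiness.Rank", "Happiness.Score", "Whisker.high",
                "Whisker.low", "Economy..GDP.per.Capita.", "Family",
                "Health..Life.Expectancy.", "Freedom", "Generosity",
                "Trust..Government.Corruption.", "Dystopia.Residual")

# Hypothetical helper: collapse spaces, dots, and parentheses into underscores.
normalize_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # any run of other characters becomes "_"
  gsub("^_+|_+$", "", x)              # drop leading and trailing underscores
}

cleaned <- lapply(list(names_2015, names_2016, names_2017), normalize_names)

# Columns that exist in all three years after normalization.
shared <- Reduce(intersect, cleaned)
shared
```

With the real data frames, the same idea would amount to assigning normalize_names(names(df)) back to names(df) and then keeping only the columns listed in shared before adding the Year column.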

Creating a new Data Frame

This code created the new data frame utilizing the rbind() function. I checked the heads and tails to ensure that all data was combined accurately.

#stacking the three yearly data frames by row into one data frame
happiness_full <- rbind(happiness2015_final, happiness2016_final, happiness2017_final)
head(happiness_full)
## # A tibble: 6 x 11
##   Year  Trust_Govt_Corr~ Health_Life_Exp~ Happiness_Score Happiness_Rank
##   <chr>            <dbl>            <dbl>           <dbl>          <dbl>
## 1 2015             0.420            0.941            7.59              1
## 2 2015             0.141            0.948            7.56              2
## 3 2015             0.484            0.875            7.53              3
## 4 2015             0.365            0.885            7.52              4
## 5 2015             0.330            0.906            7.43              5
## 6 2015             0.414            0.889            7.41              6
## # ... with 6 more variables: Generosity <dbl>, Freedom <dbl>,
## #   Family <dbl>, Economy_GDP_per_Capita <dbl>, Dystopia_Residual <dbl>,
## #   Country <chr>
tail(happiness_full)
## # A tibble: 6 x 11
##   Year  Trust_Govt_Corr~ Health_Life_Exp~ Happiness_Score Happiness_Rank
##   <chr>            <dbl>            <dbl>           <dbl>          <dbl>
## 1 2017            0.0957           0.247             3.49            150
## 2 2017            0.455            0.326             3.47            151
## 3 2017            0.151            0.501             3.46            152
## 4 2017            0.0660           0.365             3.35            153
## 5 2017            0.0841           0.152             2.90            154
## 6 2017            0.0566           0.0188            2.69            155
## # ... with 6 more variables: Generosity <dbl>, Freedom <dbl>,
## #   Family <dbl>, Economy_GDP_per_Capita <dbl>, Dystopia_Residual <dbl>,
## #   Country <chr>
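
As a side note on the binding step, base R's rbind() method for data frames matches columns by name rather than by position, so the reordering step may not be strictly required. The minimal sketch below illustrates this with two toy data frames whose values are made up for illustration. It also converts Year from text to an integer, which is optional and is not something the analysis above does.

```r
# Toy data frames (made-up values) with the same columns in a different order.
first <- data.frame(Country = "A", Year = "2015", Happiness_Score = 7.1,
                    stringsAsFactors = FALSE)
second <- data.frame(Happiness_Score = 6.4, Country = "B", Year = "2016",
                     stringsAsFactors = FALSE)

# rbind() aligns the columns by name and keeps the first frame's column order.
combined <- rbind(first, second)

# Year was stored as text; converting it is optional.
combined$Year <- as.integer(combined$Year)
combined
```

The dplyr function bind_rows() also matches columns by name, so it would be another option here.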

Creating a chart for GDP of BRIC Economies

This chart shows GDP per capita for the four “BRIC” countries from 2015 to 2017, according to our data set. BRIC is an acronym for Brazil, Russia, India, and China. They are grouped together because they are deemed to be at similar stages of newly advanced economic development, on their way to becoming developed countries. I thought it would be interesting to visualize information about the BRIC countries.

chart1 <- happiness_full %>%
  filter(Country %in% c("Brazil", "Russia", "India", "China")) %>%
  ggplot(aes(x = Year, y = Economy_GDP_per_Capita, color = Country, group = Country)) +
  xlab("year") +
  ylab("GDP (per capita)") +
  ggtitle("Rising GDP of BRIC Countries") +
  scale_color_brewer(palette = "Set1") +
  theme_minimal(base_size = 12) +
  geom_point() +
  geom_line()
chart1

The chart is interesting, because it shows that while GDP per capita increased over these three years in a similar pattern for all four countries, India’s remained the smallest, followed by China and then Brazil, and Russia consistently had the greatest GDP per capita. I am surprised, because I would have assumed that China had the greatest GDP per capita, as it has one of the largest economies in the world; a large total economy spread over a very large population, however, can still mean a modest per capita figure. I also realize that this graph is not very representative of GDP per capita over time, because it is limited to only three years of data. Further, this value was already provided per capita. This made me realize that this data set is lacking, because it provides no raw data. It would be interesting to recreate this graph with other raw data and see if the results still look similar.

Creating a chart for Happiness Score vs. Freedom for Countries in 2017

This chart shows the Happiness Score versus Freedom Score for countries in the dataset in the year 2017. GDP per capita is represented by the size of the blue circles. The reduced opacity enables users to more easily see the overlapping points.

chart2 <- happiness_full %>%
  filter(Year == "2017") %>%
  ggplot(aes(x = Happiness_Score, y = Freedom, size = Economy_GDP_per_Capita,
             text = paste("Country:", Country))) +
  theme_minimal(base_size = 12) +
  geom_point(alpha = 0.3, color = "blue") +
  ggtitle("Happiness Score vs. Freedom Score in 2017", subtitle = "Sizes of circles are proportional to GDP") +
  xlab("Happiness Score") + 
  ylab("Freedom Score")
chart2 <- ggplotly(chart2)
chart2

This freedom score measures perceived freedom to make life decisions. I think the name of the variable in the data set can be a bit misleading, because I had originally assumed that freedom referred to incarceration rates or the extent of human rights. The perceived freedom to make decisions is more ambiguous, and self-reported rankings seem unreliable, because people’s preferences or utility cannot really be compared. I found that my assumption that the freedom and happiness scores would be correlated was correct; overall, countries with higher Freedom scores tended to have higher Happiness scores. However, there is quite a bit of variation, and the relationship is not strictly linear.
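
The visual impression of a positive but noisy trend could be quantified with a correlation coefficient and a simple linear fit. The sketch below uses a small toy data frame whose values are hypothetical and are not taken from the report; only the column names match the cleaned data.

```r
# Hypothetical values for illustration only; these are NOT from the report.
toy <- data.frame(
  Happiness_Score = c(3.0, 4.2, 5.1, 5.9, 6.8, 7.5),
  Freedom         = c(0.20, 0.35, 0.30, 0.45, 0.50, 0.62)
)

# Pearson correlation: sign gives the direction, magnitude the strength.
r <- cor(toy$Happiness_Score, toy$Freedom, method = "pearson")

# Simple linear fit with the same axes as the chart (Freedom on the y axis).
fit <- lm(Freedom ~ Happiness_Score, data = toy)
slope <- coef(fit)[["Happiness_Score"]]

r
slope
```

On the real data, toy would be replaced with the 2017 rows, for example subset(happiness_full, Year == "2017"). A correlation well below 1 would be consistent with the scatter seen in the chart.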

Final Conclusions

I think this data set was flawed in that it failed to provide raw data from which the scores were developed. I am not entirely sure if these visual representations can be fully explanatory, as they only show component scores that were already calculated to make up the total Happiness score. These values may be better explained by a regression, which can show how the happiness score was impacted by each factor; however, this would mean writing a regression for each country or region.
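
The point about the factor columns making up the total score can be checked with simple arithmetic. The sketch below uses the rounded values printed in the str() output for the first four rows of the 2015 data and adds the six factors plus the Dystopia Residual. The short column names are my own shorthand, and the conclusion that the score is the sum of its components is an inference from these few rows rather than something stated in the data set.

```r
# Values copied (rounded) from the str() output for the first four 2015 rows.
top_2015 <- data.frame(
  Country    = c("Switzerland", "Iceland", "Denmark", "Norway"),
  Score      = c(7.59, 7.56, 7.53, 7.52),
  Economy    = c(1.40, 1.30, 1.33, 1.46),
  Family     = c(1.35, 1.40, 1.36, 1.33),
  Health     = c(0.941, 0.948, 0.875, 0.885),
  Freedom    = c(0.666, 0.629, 0.649, 0.670),
  Trust      = c(0.420, 0.141, 0.484, 0.365),
  Generosity = c(0.297, 0.436, 0.341, 0.347),
  Dystopia   = c(2.52, 2.70, 2.49, 2.47)
)

components <- c("Economy", "Family", "Health", "Freedom",
                "Trust", "Generosity", "Dystopia")

# Add up the component columns and compare with the reported score.
top_2015$Component_Sum <- rowSums(top_2015[, components])
top_2015$Difference <- top_2015$Score - top_2015$Component_Sum

top_2015[, c("Country", "Score", "Component_Sum", "Difference")]
```

For these rows the differences are within rounding error, which suggests that a regression of the score on all seven columns would fit almost perfectly by construction and say little about how each factor drives happiness.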

Originally, I wanted to create visualizations by region, but the 2017 data set did not have this information. It was interesting to clean data sets for the same survey over different years, and to see how similarly and differently the data was represented in each data frame. I was surprised at the lack of consistency, and at the use of spaces and parentheses in column names, because this made the data much more difficult to work with.

My recommendation for the surveyors is to provide the raw data for at least some of the variables, such as population and GDP. I also think it would be useful for them to write variable names in the codebook that are easier to call and consistent with previous years.