Introduction

This report examines the connection between average healthy life years and the average household savings rate for countries in the EU.

Data

Procuring Data

Both datasets used in this project were obtained from Eurostat. The data on healthy life years can be found here and the data on gross household savings rate can be found here. I confirmed that the data is open by checking the copyright notice.

Importing Data

The data was loaded from Excel spreadsheets.

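library(tidyverse) # attaches dplyr, tidyr, and ggplot2
library(rsconnect) # loaded to publish on RPubs
library(knitr)     # kable() for the table
library(readxl)    # read_excel()

# Read each spreadsheet, treating the literal string "NA" as missing
AvgHouseholdSavings <- read_excel("AvgHouseholdSavings.xlsx", na = "NA")
HealthyLifeYears <- read_excel("HealthyLifeYears.xlsx", na = "NA")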

Joining Data

First, I changed both datasets from wide to long. I then made a new tibble called ‘HealthySavings’, which combines both datasets by country and year with an inner join. Finally, I dropped rows with missing values and renamed some columns for clarity.

# Wide to long: every column except TIME (which holds the country name) is a year
AvgHouseholdSavings2 <- gather(AvgHouseholdSavings, "year", "savingsrate", -TIME)

HealthyLifeYears2 <- gather(HealthyLifeYears, "year", "age", -TIME)

# Keep only the country-year pairs that appear in both datasets
HealthySavings <- HealthyLifeYears2 %>% inner_join(AvgHouseholdSavings2, by = c("TIME", "year"))

# Drop rows where either measure is missing
HealthySavings2 <- HealthySavings %>% drop_na(any_of(c("savingsrate", "age")))

HealthySavings3 <- HealthySavings2 %>% rename("Country" = "TIME", "livableyears" = "age")
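The reshape-and-join steps above can be sketched on a small example. This is only an illustration: the two toy tables and their values are made up, and it uses pivot_longer(), the current tidyr replacement for gather(), rather than the call used in this report.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy tables in the same wide layout as the spreadsheets:
# one row per country (in a column named TIME), one column per year.
savings_wide <- tibble(
  TIME   = c("A", "B", "C"),
  `2017` = c(10, NA, 5),
  `2018` = c(11, 12, 6)
)
health_wide <- tibble(
  TIME   = c("A", "B", "D"),
  `2017` = c(60, 61, 70),
  `2018` = c(62, NA, 71)
)

# Wide to long: year columns become rows.
savings_long <- pivot_longer(savings_wide, -TIME,
                             names_to = "year", values_to = "savingsrate")
health_long <- pivot_longer(health_wide, -TIME,
                            names_to = "year", values_to = "age")

# Keep country-year pairs present in both tables, then drop incomplete rows.
joined <- health_long %>%
  inner_join(savings_long, by = c("TIME", "year")) %>%
  drop_na(savingsrate, age)

print(joined)
```

In this toy example, countries C and D fall out at the join because each appears in only one table, and both rows for country B are dropped because one of the two measures is missing in each year.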

Plots

Scatterplot

For my first chart, I made a scatterplot. Because there is so much data, I am not concerned with which country is which or what year the data is from. Instead, I am trying to get a broad view of whether livable years and savings rate could be related.

ggplot(data=HealthySavings3, aes(y= savingsrate, x= livableyears)) +
  geom_point()
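One way to put a number on what a scatterplot like this shows is a correlation coefficient. The sketch below is not part of the original analysis, and the toy values are made up; it only shows the base R calls involved.

```r
# Hypothetical toy data standing in for HealthySavings3.
toy <- data.frame(
  livableyears = c(55, 58, 61, 63, 66, 72),
  savingsrate  = c(14, 9, 21, 12, 18, 11)
)

# Pearson correlation: values near 0 indicate little linear association.
r <- cor(toy$livableyears, toy$savingsrate, use = "complete.obs")

# cor.test() reports the same coefficient with a confidence interval and p-value.
result <- cor.test(toy$livableyears, toy$savingsrate)

print(r)
print(result$conf.int)
```

Applied to this report's data, the same calls would take HealthySavings3$livableyears and HealthySavings3$savingsrate as inputs.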

Boxplots

Since there was no apparent trend in the scatterplot, I created two boxplots from the data, one for savings rate and one for livable years, each grouped by year. A boxplot is a valuable visual because it gives a quick view of central tendency while also showing the spread of the data. This was especially helpful since the two variables appeared so weakly correlated.

ggplot(data=HealthySavings3, aes(x= year, y=savingsrate)) +
  geom_boxplot(outlier.color = "red")

ggplot(data=HealthySavings3, aes(x= year, y=livableyears)) +
  geom_boxplot(outlier.color= "red")

Table

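Below is a table of 10 countries for 2018 (the most recent year in the dataset), sorted from most to fewest livable years. I was curious to see whether the countries with the most livable years had similar savings rates. Note, however, that top_n(10) is called without a wt argument, so dplyr selects rows by the last column, savingsrate. The table therefore shows the 10 countries with the highest savings rates rather than the 10 with the most livable years.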

# Note: top_n(10) has no wt argument, so it selects by the last column (savingsrate)
Top10LivableYears <- HealthySavings3 %>%
  filter(year == "2018") %>%
  arrange(desc(livableyears)) %>%
  top_n(10)
kable(Top10LivableYears, padding = 10)
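| Country                                          | year | livableyears | savingsrate |
|--------------------------------------------------|------|--------------|-------------|
| Sweden                                           | 2018 | 72.8         | 16.01       |
| Norway                                           | 2018 | 70.4         | 12.63       |
| Germany (until 1990 former territory of the FRG) | 2018 | 65.8         | 18.32       |
| France                                           | 2018 | 63.9         | 14.11       |
| Czechia                                          | 2018 | 62.7         | 12.12       |
| Hungary                                          | 2018 | 61.1         | 12.68       |
| Luxembourg                                       | 2018 | 60.7         | 21.41       |
| Netherlands                                      | 2018 | 59.2         | 15.64       |
| Austria                                          | 2018 | 56.9         | 13.23       |
| Slovenia                                         | 2018 | 55.5         | 13.52       |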
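When top_n() is called without a wt argument, dplyr ranks by the last column of the table, which here is savingsrate. The sketch below uses made-up toy data (not values from this report) to show how that differs from ranking explicitly by livableyears with slice_max().

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the column order matches HealthySavings3.
toy <- tibble(
  Country      = c("A", "B", "C", "D"),
  year         = "2018",
  livableyears = c(70, 65, 60, 55),
  savingsrate  = c(5, 20, 10, 15)
)

# Rank explicitly by livableyears and keep the top 2 rows.
top_by_health <- toy %>% slice_max(livableyears, n = 2)

# top_n() with no wt argument ranks by the last column (savingsrate here),
# even though the rows were sorted by livableyears first.
top_by_last_col <- toy %>%
  arrange(desc(livableyears)) %>%
  top_n(2)

print(top_by_health$Country)
print(top_by_last_col$Country)
```

In the toy example the two approaches return different countries: slice_max() keeps A and B, while top_n(2) keeps B and D. An equivalent fix that stays with top_n() would be to pass the column explicitly, as in top_n(10, livableyears).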

Conclusion

While it would have been interesting to see that a country’s livable years were related to its savings rate (perhaps meaning that both are related to a population’s broader wealth), this does not appear to be the case. That being said, wealth and savings do not necessarily go hand in hand, and savings behavior could vary culturally. There also does not seem to be any notable change in savings rate or in livable years over the course of this dataset’s timeframe. It would have been interesting to see data that went back a little further so that we could track the changes in savings rate preceding and through the 2008 recession.