This report examines the relationship between average healthy life years and the average household savings rate for countries in the EU.
Both datasets used in this project were obtained from Eurostat. The data on healthy life years can be found here and the data on gross household savings rate can be found here. I confirmed that the data is open by checking the copyright notice.
The data was loaded from Excel spreadsheets.
library(dplyr)
library(tidyverse)
library(rsconnect) # loaded to publish on rpubs
library(knitr)
library(readxl)
AvgHouseholdSavings <- read_excel("AvgHouseholdSavings.xlsx",
                                  na = "NA")
HealthyLifeYears <- read_excel("HealthyLifeYears.xlsx",
                               na = "NA")
First, I reshaped both datasets from wide to long. I then made a new tibble called ‘HealthySavings’, which combines both datasets by country and year with an inner join. Finally, I dropped rows with missing values and renamed some columns for clarity.
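# Wide to long: every column except TIME (which holds the country name) becomes a year/value pair.
# gather() still works but is superseded by pivot_longer() in current tidyr.
AvgHouseholdSavings2 <- gather(AvgHouseholdSavings, "year", "savingsrate", -TIME)
HealthyLifeYears2 <- gather(HealthyLifeYears, "year", "age", -TIME)
# Keep only the country/year pairs present in both datasets.
HealthySavings <- HealthyLifeYears2 %>% inner_join(AvgHouseholdSavings2, by = c("TIME", "year"))
# Drop rows where either measure is missing.
HealthySavings2 <- HealthySavings %>% drop_na(any_of(c("savingsrate", "age")))
HealthySavings3 <- HealthySavings2 %>% rename("Country" = "TIME", "livableyears" = "age")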
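To make the reshape and join easier to follow, here is a minimal sketch of the same steps on two tiny made-up tables. The table names and all values are invented for illustration; only the column names (`TIME`, `year`, `savingsrate`, `age`) follow the code above. It uses `pivot_longer()`, the current tidyr equivalent of `gather()`.

```r
library(dplyr)
library(tidyr)

# Toy wide tables shaped like the spreadsheets: one row per country
# (the TIME column) and one column per year. All values are made up.
savings_wide <- tibble(
  TIME   = c("A", "B"),
  `2017` = c(10.0, 12.5),
  `2018` = c(11.0, NA)
)
health_wide <- tibble(
  TIME   = c("A", "B", "C"),
  `2017` = c(60.1, 65.3, 70.0),
  `2018` = c(60.5, 65.9, 70.2)
)

# Wide to long: every column except TIME becomes a (year, value) pair.
savings_long <- pivot_longer(savings_wide, -TIME,
                             names_to = "year", values_to = "savingsrate")
health_long <- pivot_longer(health_wide, -TIME,
                            names_to = "year", values_to = "age")

# Inner join on country and year, drop incomplete rows, then rename.
toy_joined <- health_long %>%
  inner_join(savings_long, by = c("TIME", "year")) %>%
  drop_na(savingsrate, age) %>%
  rename(Country = TIME, livableyears = age)

print(toy_joined)
```

Country "C" disappears because it has no savings data, and country "B" loses its 2018 row because the savings rate is missing there. The same two effects explain why the joined tibble can have fewer rows than either source.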
For my first chart, I made a scatterplot. Because there is so much data, I am not concerned with which country is which or what year each point is from. Instead, I am trying to get a broad view of whether livable years and savings rate could be related.
ggplot(data = HealthySavings3, aes(x = livableyears, y = savingsrate)) +
  geom_point()
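A scatterplot can be backed up with a single number. The sketch below shows one way to compute a Pearson correlation for the two columns; `pair_correlation` is a hypothetical helper name, and the toy values are made up purely to show the call, not taken from Eurostat.

```r
# Pearson correlation between the two measures, ignoring incomplete pairs.
# The column names match HealthySavings3; the helper name is hypothetical.
pair_correlation <- function(df) {
  cor(df$livableyears, df$savingsrate, use = "complete.obs")
}

# Made-up values, only to demonstrate the call.
toy <- data.frame(
  livableyears = c(55, 60, 65, 70, NA),
  savingsrate  = c(14, 12, 15, 13, 11)
)
r <- pair_correlation(toy)
print(r)
```

Calling `pair_correlation(HealthySavings3)` would give the corresponding figure for the real data. A value near zero would be consistent with a plot that shows no apparent linear trend.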
Since there was no apparent trend in the scatterplot, I created two sets of boxplots from the data, one for each variable, grouped by year. Boxplots are useful here because they give a quick view of central tendency while also showing the spread of the data, with outliers marked in red. This was especially helpful since the two variables appeared so weakly correlated.
ggplot(data = HealthySavings3, aes(x = year, y = savingsrate)) +
  geom_boxplot(outlier.color = "red")
ggplot(data = HealthySavings3, aes(x = year, y = livableyears)) +
  geom_boxplot(outlier.color = "red")
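Below is a table for 2018 (the most recent year in the dataset) showing the 10 countries with the highest savings rates, sorted by livable years from highest to lowest. I was curious to see whether these countries had similar numbers of livable years.
Top10LivableYears <- HealthySavings3 %>%
  filter(year == "2018") %>%
  arrange(desc(livableyears)) %>%
  # top_n(10) without `wt` ranks on the last column, savingsrate, so it is named
  # here explicitly. Use top_n(10, livableyears) to rank by livable years instead.
  top_n(10, savingsrate)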
kable(Top10LivableYears, padding = 10)
| Country | year | livableyears | savingsrate |
|---|---|---|---|
| Sweden | 2018 | 72.8 | 16.01 |
| Norway | 2018 | 70.4 | 12.63 |
| Germany (until 1990 former territory of the FRG) | 2018 | 65.8 | 18.32 |
| France | 2018 | 63.9 | 14.11 |
| Czechia | 2018 | 62.7 | 12.12 |
| Hungary | 2018 | 61.1 | 12.68 |
| Luxembourg | 2018 | 60.7 | 21.41 |
| Netherlands | 2018 | 59.2 | 15.64 |
| Austria | 2018 | 56.9 | 13.23 |
| Slovenia | 2018 | 55.5 | 13.52 |
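The behavior of `top_n()` when `wt` is omitted is easy to miss: it ranks on the last column of the tibble, not on the column used in a preceding `arrange()`. The sketch below illustrates this on made-up rows that share the column order of `HealthySavings3`, and shows `slice_max()` as one way to name the ranking column explicitly. The toy values and object names are assumptions for illustration only.

```r
library(dplyr)

# Made-up rows with the same column order as HealthySavings3.
toy <- tibble(
  Country      = c("A", "B", "C", "D"),
  livableyears = c(70, 65, 60, 55),
  savingsrate  = c(5, 20, 8, 15)
)

# No `wt`: top_n() ranks on the last column (savingsrate) and reports
# this in a "Selecting by savingsrate" message.
by_default <- toy %>%
  arrange(desc(livableyears)) %>%
  top_n(2)

# Explicit ranking column: the two rows with the most livable years.
by_health <- toy %>%
  slice_max(livableyears, n = 2)

print(by_default$Country)
print(by_health$Country)
```

The two selections differ here, so it is worth checking which column a top-N table was actually ranked on before reading anything into it.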
While it would have been interesting to see that a country’s livable years were related to its savings rate (perhaps meaning that both are related to the population’s broader wealth), this does not appear to be the case. That said, wealth and savings do not necessarily go hand in hand, and savings behavior could vary culturally. There also does not seem to be any notable change in savings rate or in livable years over the course of this dataset’s timeframe. It would have been interesting to see data that went back a little further so that we could track changes in the savings rate preceding and through the 2008 recession.