Verify Open Data

I chose to look at the open data source Statistics Iceland. I was curious about Icelandic citizens’ mental health, so I looked at the 2015 Health Interview Survey data. It was not immediately evident whether this data was openly licensed, because the website does not have a “terms of service” section. I did find a page titled Fundamental Principles of Official Statistics, which includes language about transparency and openness of the statistics to the public. The page can be found here: Statistics Iceland Fundamental Principles of Official Statistics

Depression Data Import

The first thing I did was select the variables and values I was interested in: sex, age group, proportion, and total depression symptoms (with the option to see major and minor depression symptoms). It was nice to customize my CSV file before downloading it, because it meant I did not need to do as much filtering or manipulation in Excel or R. I downloaded a CSV file for each data set and imported the files. I did discover some issues with the formatting of the CSV files. For example, one variable was titled “Proportion (%) Total”, which created problems in the code, so I renamed it to Proportion_Total in Excel before importing it. I also encountered issues with symbols in the age variable: in “>=65 years old”, the “>=” characters threw off the ordering of the age groups, so I renamed that age group “65+ years old” in Excel.
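The two spreadsheet fixes described above could also be made in R, which keeps the cleanup reproducible. The sketch below is only an assumption about how that might look, not the code used for this post. The `raw` tibble is a hypothetical stand-in for the unedited download (three values copied from the age-group output), and it assumes the export names its columns `Variables` and `Proportion (%) Total`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the unedited CSV export (column names are assumed)
raw <- tibble(
  Sex = c("Male", "Male", "Male"),
  Variables = c("15-24 years old", "55-64 years old", ">=65 years old"),
  `Proportion (%) Total` = c(10, 5.8, 4.6)
)

cleaned <- raw %>%
  # Backticks let rename() address a column name containing spaces and symbols
  rename(Age_Group = Variables, Proportion_Total = `Proportion (%) Total`) %>%
  mutate(
    # Swap the ">=" label for "65+" using a literal (non-regex) match
    Age_Group = sub(">=65 years old", "65+ years old", Age_Group, fixed = TRUE),
    # Pin the level order to the order of appearance so it is not re-sorted
    Age_Group = factor(Age_Group, levels = unique(Age_Group))
  )

print(cleaned)
```

Storing the age group as a factor with explicit levels means the ordering no longer depends on how the labels sort alphabetically.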

library(knitr)
library(tidyverse)  # attaches readr, tidyr, dplyr, and ggplot2
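# Depression proportion by sex and age group; the export's generic "Variables" column holds the age groups
Depression_by_age <- read_csv("~/Downloads/HER02211 (5).csv") %>% rename(Age_Group = Variables)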
head(Depression_by_age)
## # A tibble: 6 x 3
##   Sex   Age_Group        Proportion_Total
##   <chr> <chr>                       <dbl>
## 1 Male  15-24 years old              10  
## 2 Male  25-34 years old               7.9
## 3 Male  35-44 years olds              6.3
## 4 Male  45-54 years old               5  
## 5 Male  55-64 years old               5.8
## 6 Male  65+ years old                 4.6
Depression_by_urban <- read_csv("~/Downloads/HER02211 (6).csv") %>% rename(Degree_of_Urbanization = Variables)
head(Depression_by_urban)
## # A tibble: 6 x 3
##   Sex    Degree_of_Urbanization Proportion_Total
##   <chr>  <chr>                             <dbl>
## 1 Male   Densely populated                   7.3
## 2 Male   Intermediate density                6.8
## 3 Male   Sparsely populated                  4.7
## 4 Female Densely populated                   9.5
## 5 Female Intermediate density               15  
## 6 Female Sparsely populated                 11.2
Depression_by_country_M <- read_csv("~/Downloads/HER02212 (3).csv") %>% rename(Total_Proportion = "Total Total")
head(Depression_by_country_M)
## # A tibble: 6 x 3
##   Sex   Country        Total_Proportion
##   <chr> <chr>                     <dbl>
## 1 Male  European Union              5.4
## 2 Male  Bulgaria                    6.5
## 3 Male  Czech Republic              2.4
## 4 Male  Denmark                     4.8
## 5 Male  Germany                     7.9
## 6 Male  Estonia                     5.4
Depression_by_country_F <- read_csv("~/Downloads/HER02212 (4).csv") %>% rename(Total_Proportion = "Total Total")
head(Depression_by_country_F)
## # A tibble: 6 x 3
##   Sex    Country        Total_Proportion
##   <chr>  <chr>                     <dbl>
## 1 Female European Union              7.9
## 2 Female Bulgaria                    9.1
## 3 Female Czech Republic              3.9
## 4 Female Denmark                     7.8
## 5 Female Germany                     9.1
## 6 Female Estonia                     8
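# Join the female and male tables by country; clashing column names get .x (female) and .y (male) suffixes
Depression_by_country <- Depression_by_country_F %>% inner_join(Depression_by_country_M, by = "Country")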
head(Depression_by_country)
## # A tibble: 6 x 5
##   Sex.x  Country        Total_Proportion.x Sex.y Total_Proportion.y
##   <chr>  <chr>                       <dbl> <chr>              <dbl>
## 1 Female European Union                7.9 Male                 5.4
## 2 Female Bulgaria                      9.1 Male                 6.5
## 3 Female Czech Republic                3.9 Male                 2.4
## 4 Female Denmark                       7.8 Male                 4.8
## 5 Female Germany                       9.1 Male                 7.9
## 6 Female Estonia                       8   Male                 5.4
# Keep countries whose female proportion exceeds the mean across all rows, highest first
Depression_abv_avg <- Depression_by_country %>%
  rename(Depression_Proportion_female = Total_Proportion.x,
         Depression_Proportion_male = Total_Proportion.y) %>%
  filter(Depression_Proportion_female > mean(Depression_Proportion_female)) %>%
  select(Country, Depression_Proportion_female, Depression_Proportion_male) %>%
  arrange(desc(Depression_Proportion_female))

Table Creation to Compare Depression Proportions in European Countries

The depression rates among Icelandic women appeared high to me, so I was curious how they compared to depression rates in other European countries. I imported the data set of total depression proportion by country for males, then the equivalent data set for females, and combined them with an inner join on country. I then filtered and arranged the joined data to display, in descending order, the countries where the female depression proportion was greater than the mean of the female depression proportions. Finally, I used the kable function from knitr to put the data frame in a nice table format. As the table shows, Iceland comes in at number four (after Portugal, Hungary, and Sweden) among the countries in the data for the highest proportion of depression in women in 2015.
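The join-and-filter step described above can be sketched on a small scale. This is a minimal illustration rather than the exact pipeline used here. It uses only the first six rows of each table (copied from the head() output), so the mean and the filtered result differ from the full data. Dropping `Sex` before joining and passing `suffix` are choices made for this sketch to get readable column names instead of `.x` and `.y`.

```r
library(dplyr)
library(tibble)

countries <- c("European Union", "Bulgaria", "Czech Republic",
               "Denmark", "Germany", "Estonia")

# First six rows of each table, copied from the head() output
females <- tibble(Sex = "Female", Country = countries,
                  Total_Proportion = c(7.9, 9.1, 3.9, 7.8, 9.1, 8))
males <- tibble(Sex = "Male", Country = countries,
                Total_Proportion = c(5.4, 6.5, 2.4, 4.8, 7.9, 5.4))

above_avg <- females %>%
  select(-Sex) %>%
  # suffix names the clashing Total_Proportion columns by sex
  inner_join(select(males, -Sex), by = "Country",
             suffix = c("_female", "_male")) %>%
  # Keep rows above the mean of the female column, computed over the joined rows
  filter(Total_Proportion_female > mean(Total_Proportion_female)) %>%
  arrange(desc(Total_Proportion_female))

print(above_avg)
```

Because `mean()` is evaluated inside `filter()`, the threshold is computed over every joined row, including the European Union aggregate.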

library(knitr)
kable(Depression_abv_avg,
      align = "lcc",
      col.names = c('Country', 'Depression Proportion, Female', 'Depression Proportion, Male'),
      caption = "European Countries with High Prevalence of Depression in Females, 2015")
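European Countries with High Prevalence of Depression in Females, 2015

| Country        | Depression Proportion, Female | Depression Proportion, Male |
|:---------------|:-----------------------------:|:---------------------------:|
| Portugal       | 13.8                          | 5.9                         |
| Hungary        | 12.1                          | 8.4                         |
| Sweden         | 11.0                          | 7.3                         |
| Iceland        | 10.8                          | 6.8                         |
| Bulgaria       | 9.1                           | 6.5                         |
| Germany        | 9.1                           | 7.9                         |
| Luxembourg     | 9.1                           | 7.5                         |
| Spain          | 9.0                           | 4.3                         |
| France         | 9.0                           | 5.0                         |
| United Kingdom | 8.9                           | 7.4                         |
| Turkey         | 8.5                           | 5.0                         |
| Estonia        | 8.0                           | 5.4                         |
| European Union | 7.9                           | 5.4                         |
| Denmark        | 7.8                           | 4.8                         |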
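
Iceland's position can also be computed rather than read off the table. The sketch below is one possible check, not part of the original analysis. It re-enters the female proportions from the table by hand and, as an assumption made for this sketch, drops the European Union aggregate so that only individual countries are ranked.

```r
library(dplyr)
library(tibble)

# Female proportions copied from the table above
top <- tibble(
  Country = c("Portugal", "Hungary", "Sweden", "Iceland", "Bulgaria",
              "Germany", "Luxembourg", "Spain", "France", "United Kingdom",
              "Turkey", "Estonia", "European Union", "Denmark"),
  Female = c(13.8, 12.1, 11.0, 10.8, 9.1, 9.1, 9.1,
             9.0, 9.0, 8.9, 8.5, 8.0, 7.9, 7.8)
)

ranked <- top %>%
  # Assumption for this sketch: rank countries only, not the EU aggregate
  filter(Country != "European Union") %>%
  # min_rank() gives tied values the same (lowest) rank
  mutate(Rank = min_rank(desc(Female)))

iceland_rank <- ranked %>%
  filter(Country == "Iceland") %>%
  pull(Rank)

print(iceland_rank)
```

Using `min_rank()` makes the ties explicit: Bulgaria, Germany, and Luxembourg share a rank instead of being ordered arbitrarily.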