I chose to look at the open data source Statistics Iceland. I was curious about Icelandic citizens’ mental health, so I looked at the 2015 Health Interview Survey data. It was not immediately evident whether this data was openly licensed, and the website does not have a “terms of service” section. I did find a page titled Fundamental Principles of Official Statistics, which included language about transparency and openness of the statistics to the public. The page can be found here: Statistics Iceland Fundamental Principles of Official Statistics
The first thing I did was select the variables and values I was interested in: sex, age group, proportion, and total depression symptoms (with the option to see major and minor depression symptoms separately). It was nice to customize my CSV file before downloading it, because it meant I did not need to do as much filtering or manipulation in Excel or R. I downloaded a CSV file for each data set and imported the files. I did discover some issues with the formatting of the CSV files. For example, one variable was titled “Proportion (%) Total”, which created problems in the code, so I renamed it in Excel before importing it. I also encountered issues with symbols in the age variable: in “>=65 years old”, the “>=” characters threw off the ordering of the age groups, so I renamed that age group “65+ years old” in Excel.
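The same two fixes can also be made in R instead of Excel, which keeps the cleanup reproducible. This is only a minimal sketch: the small `raw` tibble is a hand-typed stand-in for the downloaded file, and it assumes the export has a column literally named `Proportion (%) Total` and an age label `>=65 years old`, as described above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the raw CSV export; column names are assumed from the text.
raw <- tibble(
  Sex = c("Male", "Male"),
  Variables = c("55-64 years old", ">=65 years old"),
  `Proportion (%) Total` = c(5.8, 4.6)
)

cleaned <- raw %>%
  # Backticks let dplyr refer to a column name that contains spaces and symbols.
  rename(Age_Group = Variables, Proportion_Total = `Proportion (%) Total`) %>%
  # Swap the ">=" label for "65+" so the age groups sort in the expected order.
  mutate(Age_Group = if_else(Age_Group == ">=65 years old", "65+ years old", Age_Group))

print(cleaned)
```

Doing the renaming in code means the steps are recorded alongside the rest of the analysis and can be rerun if the file is downloaded again.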
library(knitr)
library(tidyverse)
library(tidyr)
library(dplyr)
library(ggplot2)
Depression_by_age <- read_csv("~/Downloads/HER02211 (5).csv") %>% rename(Age_Group = Variables)
head(Depression_by_age)
## # A tibble: 6 x 3
## Sex Age_Group Proportion_Total
## <chr> <chr> <dbl>
## 1 Male 15-24 years old 10
## 2 Male 25-34 years old 7.9
## 3 Male 35-44 years olds 6.3
## 4 Male 45-54 years old 5
## 5 Male 55-64 years old 5.8
## 6 Male 65+ years old 4.6
Depression_by_urban <- read_csv("~/Downloads/HER02211 (6).csv") %>% rename(Degree_of_Urbanization = Variables)
head(Depression_by_urban)
## # A tibble: 6 x 3
## Sex Degree_of_Urbanization Proportion_Total
## <chr> <chr> <dbl>
## 1 Male Densely populated 7.3
## 2 Male Intermediate density 6.8
## 3 Male Sparsely populated 4.7
## 4 Female Densely populated 9.5
## 5 Female Intermediate density 15
## 6 Female Sparsely populated 11.2
Depression_by_country_M <- read_csv("~/Downloads/HER02212 (3).csv") %>% rename(Total_Proportion = "Total Total")
head(Depression_by_country_M)
## # A tibble: 6 x 3
## Sex Country Total_Proportion
## <chr> <chr> <dbl>
## 1 Male European Union 5.4
## 2 Male Bulgaria 6.5
## 3 Male Czech Republic 2.4
## 4 Male Denmark 4.8
## 5 Male Germany 7.9
## 6 Male Estonia 5.4
Depression_by_country_F <- read_csv("~/Downloads/HER02212 (4).csv") %>% rename(Total_Proportion = "Total Total")
head(Depression_by_country_F)
## # A tibble: 6 x 3
## Sex Country Total_Proportion
## <chr> <chr> <dbl>
## 1 Female European Union 7.9
## 2 Female Bulgaria 9.1
## 3 Female Czech Republic 3.9
## 4 Female Denmark 7.8
## 5 Female Germany 9.1
## 6 Female Estonia 8
Depression_by_country <- Depression_by_country_F %>% inner_join(Depression_by_country_M, by="Country")
head(Depression_by_country)
## # A tibble: 6 x 5
## Sex.x Country Total_Proportion.x Sex.y Total_Proportion.y
## <chr> <chr> <dbl> <chr> <dbl>
## 1 Female European Union 7.9 Male 5.4
## 2 Female Bulgaria 9.1 Male 6.5
## 3 Female Czech Republic 3.9 Male 2.4
## 4 Female Denmark 7.8 Male 4.8
## 5 Female Germany 9.1 Male 7.9
## 6 Female Estonia 8 Male 5.4
Depression_abv_avg <- Depression_by_country %>%
  rename(
    Depression_Proportion_female = Total_Proportion.x,
    Depression_Proportion_male = Total_Proportion.y
  ) %>%
  filter(Depression_Proportion_female > mean(Depression_Proportion_female)) %>%
  select(Country, Depression_Proportion_female, Depression_Proportion_male) %>%
  arrange(desc(Depression_Proportion_female))
The data indicate that women reported higher rates of depression symptoms than men in 2015, so I was curious whether the pattern changed with age group or degree of urbanization. I used the ggplot2 package to create charts for these discrete variables. From the charts, we can see that women have higher rates of depression across all age groups, and that young women (aged 15-24) and older women (aged 65+) have higher rates than both men and women in other age groups. The rate for women aged 15-24 is 1.74 times the rate for men of the same age, and the rate for women aged 65+ is 2.43 times the rate for men aged 65+. The second chart indicates that women living in intermediate-density and sparsely populated areas have a higher proportion of depression than those living in densely populated areas. Interestingly, women in intermediate-density areas had the highest proportion of depression (15%).
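The ratios quoted above are simply the female rate divided by the male rate. As a sketch of that arithmetic, the block below applies it to the six urbanization values printed earlier. The tibble is retyped by hand from that output, and the column name `Female_to_Male` is my own choice for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Values copied from the head(Depression_by_urban) output above.
urban <- tibble(
  Sex = rep(c("Male", "Female"), each = 3),
  Degree_of_Urbanization = rep(
    c("Densely populated", "Intermediate density", "Sparsely populated"),
    times = 2
  ),
  Proportion_Total = c(7.3, 6.8, 4.7, 9.5, 15, 11.2)
)

ratios <- urban %>%
  # One row per area, with Male and Female as separate columns.
  pivot_wider(names_from = Sex, values_from = Proportion_Total) %>%
  # Female rate as a multiple of the male rate, rounded to two decimals.
  mutate(Female_to_Male = round(Female / Male, 2))

print(ratios)
```

For intermediate-density areas this works out to 15 / 6.8, or roughly 2.2 times the male rate.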
ggplot(
  data = Depression_by_age %>%
    filter(Age_Group %in% c("15-24 years old", "25-34 years old", "35-44 years old", "45-54 years old", "55-64 years old", "65+ years old")),
  aes(x = Age_Group, y = Proportion_Total, color = Sex)
) +
  geom_point(size = 3) +
  scale_color_manual(values = c("Female" = "#56B4E9", "Male" = "#D55E00")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
  ggtitle("Prevalence of Depression in Iceland by Age Group, 2015") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5)
  )
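Because `%in%` matches labels exactly, a row whose label differs by even one character is silently dropped from the plot. A quick check is sketched below using the six male age labels as they appear in the `head()` output above. The vector names `labels_in_data` and `labels_expected` are hypothetical, and in practice the first vector would come from `unique(Depression_by_age$Age_Group)`.

```r
# Labels as printed in the head(Depression_by_age) output above.
labels_in_data <- c(
  "15-24 years old", "25-34 years old", "35-44 years olds",
  "45-54 years old", "55-64 years old", "65+ years old"
)

# Labels used in the filter() call of the age-group plot.
labels_expected <- c(
  "15-24 years old", "25-34 years old", "35-44 years old",
  "45-54 years old", "55-64 years old", "65+ years old"
)

# Any label listed here would be excluded by the filter.
unmatched <- setdiff(labels_in_data, labels_expected)
print(unmatched)
```

An empty result means every row survives the filter; anything else points to a label that needs correcting before plotting.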
ggplot(
  data = Depression_by_urban %>%
    filter(Degree_of_Urbanization %in% c("Densely populated", "Intermediate density", "Sparsely populated")),
  aes(x = Degree_of_Urbanization, y = Proportion_Total, fill = Sex)
) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("Female" = "#56B4E9", "Male" = "#D55E00")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
  ggtitle("Prevalence of Depression in Iceland by Degree of Urbanization, 2015") +
  theme(plot.title = element_text(hjust = 0.5))
The depression rates among Icelandic women appeared high to me, so I was curious how they compared with rates in other European countries. I imported the data set of total depression proportion by country for males, then the corresponding data set for females, and did an inner join by country. I then filtered and arranged the result to display, in descending order, the countries where the female depression proportion was greater than the mean of the female proportions. Finally, I used the kable function from knitr to put the data frame in a nice table format. As the table shows, Iceland ranks fourth among these European countries (after Portugal, Hungary, and Sweden) for the highest proportion of depression in women in 2015.
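The join, filter, and sort steps described above can be sketched on just the six rows shown in the earlier `head()` output. This is an illustration under assumptions rather than the code used for the table: it drops the `Sex` column before joining and uses the `suffix` argument of `inner_join()` to name the columns directly, where the original code renames `.x` and `.y` afterwards.

```r
library(dplyr)
library(tibble)

countries <- c("European Union", "Bulgaria", "Czech Republic",
               "Denmark", "Germany", "Estonia")

# First six rows of each country file, copied from the output above.
female <- tibble(Sex = "Female", Country = countries,
                 Total_Proportion = c(7.9, 9.1, 3.9, 7.8, 9.1, 8))
male <- tibble(Sex = "Male", Country = countries,
               Total_Proportion = c(5.4, 6.5, 2.4, 4.8, 7.9, 5.4))

above_avg <- female %>%
  select(-Sex) %>%
  # suffix labels the two Total_Proportion columns instead of .x and .y.
  inner_join(select(male, -Sex), by = "Country", suffix = c("_female", "_male")) %>%
  # Keep countries whose female rate exceeds the mean female rate.
  filter(Total_Proportion_female > mean(Total_Proportion_female)) %>%
  arrange(desc(Total_Proportion_female))

print(above_avg)
```

With only these six rows the mean female rate is about 7.63, so the Czech Republic (3.9) is the only row dropped. The full data set has a different mean, which is why the table contains more countries.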
library(knitr)
kable(
  Depression_abv_avg,
  align = "lcc",
  col.names = c('Country', 'Depression Proportion, Female', 'Depression Proportion, Male'),
  caption = "European Countries with High Prevalence of Depression in Females, 2015"
)
Country | Depression Proportion, Female | Depression Proportion, Male |
---|---|---|
Portugal | 13.8 | 5.9 |
Hungary | 12.1 | 8.4 |
Sweden | 11.0 | 7.3 |
Iceland | 10.8 | 6.8 |
Bulgaria | 9.1 | 6.5 |
Germany | 9.1 | 7.9 |
Luxembourg | 9.1 | 7.5 |
Spain | 9.0 | 4.3 |
France | 9.0 | 5.0 |
United Kingdom | 8.9 | 7.4 |
Turkey | 8.5 | 5.0 |
Estonia | 8.0 | 5.4 |
European Union | 7.9 | 5.4 |
Denmark | 7.8 | 4.8 |