---
title: "Age Dependency Ratio Visualizations"
output:
  html_notebook: default
  pdf_document: default
---

For this project I decided to look at the Age Dependency Ratio (old) variable from the World Bank. This metric measures the size of a country's "old" population, people older than 64, relative to its working-age population, aged 15-64, expressed as the number of dependents per 100 working-age people. I thought this was interesting because it gives insight into how population age distributions vary between countries. I intentionally selected countries that span the entire spectrum, from lowest to highest ratio. Let's begin.
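As a quick worked example of that definition, here is a minimal sketch of the calculation. The population counts are made up for illustration and are not World Bank figures, and `old_age_dependency()` is a hypothetical helper name, not something the World Bank or any package provides.

```r
# Hypothetical helper: people older than 64 per 100 working-age (15-64) people
old_age_dependency <- function(pop_65_plus, pop_15_64) {
  100 * pop_65_plus / pop_15_64
}

# Made-up population counts (in millions) for two imaginary countries
old_age_dependency(pop_65_plus = c(30, 2), pop_15_64 = c(75, 40))
```

With these invented numbers, the first country has 40 dependents per 100 working-age people and the second has 5.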

First, I loaded all the packages and libraries that I'll be needing, set the file path, and imported the data. I went back and added to this list as I went through my project.

```{r}
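# Install packages only if they are missing, so re-running the notebook does not reinstall them
if (!requireNamespace("data.table", quietly = TRUE)) install.packages("data.table")
if (!requireNamespace("prophet", quietly = TRUE)) install.packages("prophet")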
library(data.table)
library(ggplot2)
library(tidyverse)
library(utils)
library(dplyr)
library(tibble)
library(prophet)

path <- file.path("/Users/krgr.df/Downloads/age_dep_old_csv.csv")

age <- read.csv(path)
```

For Visualization 1 (Viz1), I cleaned the data and created a simple time series plot to show the change in dependency ratios over the past 50 years (1968-2017). Even this simple task took way longer than I expected and it threw my timeline out the window. Oh well.

```{r}
#clean-up and filter data (remove columns, rename remaining columns)
head(age)
age_clean <- subset(age, select = -c(Series.Name, Series.Code, Country.Code))
colnames(age_clean) <- c("country", 1968:2017)
age_clean <- age_clean[1:12, ]

#use the gather() function to convert data frame from wide to long format
age_gather <- gather(age_clean, year, value, -1)

#convert variable classifications
age_gather_t <- transform(age_gather,year = as.numeric(year))
str(age_gather_t)

#finally start plotting (Viz1)
plot.1 <- ggplot(age_gather_t, aes(x = year, y = value, color = country)) +
  geom_line()

#Fix Viz1 labels
plot.1 + labs(title = "Age Dependency Ratio", subtitle = "Ratio of dependents (> 64) to working-age (15-64)", 
              x = "Year (1968-2017)", y = "Value (Proportion of dependents per 100 workers)", color = "Country")
```
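The wide-to-long step above uses `gather()` followed by a separate type conversion. As an alternative sketch (not what I ran for this project), `tidyr::pivot_longer()` can do both in one call. The tiny `wide` table below is made up to stand in for `age_clean`.

```r
library(tidyr)

# Toy wide table with made-up values; the real data has one column per year, 1968-2017
wide <- data.frame(
  country = c("A", "B"),
  `1968` = c(10, 5),
  `1969` = c(11, 5.5),
  check.names = FALSE  # keep "1968" and "1969" as column names
)

# Reshape from wide to long and convert the year names to integers in one step
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "value",
  names_transform = list(year = as.integer)
)
long
```

The result has one row per country and year, with `year` already numeric and ready for plotting.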

For Viz2 I decided to normalize the data so that every country begins at the same starting point and the variation over time for each country is more easily identifiable. Same issues with Viz2 but at this point I've accepted that this is all part of the learning curve.

```{r}

#Use sweep() to divide each country's row by its first (1968) value
divide_by <- c(9.572706, 16.089119, 5.939409, 2.736794, 15.793881, 14.942065, 6.014367, 5.410147, 
               9.606593, 9.890223, 5.493581, 16.522577)
#Apply sweep() to the year columns only; the country column is not numeric and cannot be divided
age_normal <- sweep(subset(age_clean, select = -c(country)), 1, divide_by, FUN = `/`)

#Add the country column back (names must be listed in the same row order as age_clean)
age_normal$country <- as.factor(c("Japan", "United.States", "Philippines", "United.Arab.Emirates", "Netherlands", "Iceland", "New.Caledonia", "Dominican.Republic", "Cuba", "Albania", "Togo", "Italy"))
age_normal <- age_normal[, c("country", 1968:2017)]

#Gather to convert from wide to long format
age_gather_normal <- gather(age_normal, year, value, -1)
age_gather_normal <- transform(age_gather_normal,year = as.numeric(year))

#Viz2
plot.2 <- ggplot(age_gather_normal, aes(x = year, y = value, color = country)) +
  geom_line()+
  scale_y_continuous(breaks = c(0, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5))

#Fix Viz2 labels
plot.2 + labs(title = "Age Dependency Ratio (Normalized)", subtitle = "Ratio of dependents (> 64) to working-age (15-64), relative to 1968", 
              x = "Year (1968-2017)", y = "Value (relative to 1968 = 1)", color = "Country")
```
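The normalization above relies on a hand-typed vector of first-year values. A minimal sketch of another way to get the same kind of result, assuming the data is already in long format, is to divide each country's values by its own first value with `dplyr`. The `long` table here is made up, and `value_norm` is just an illustrative column name.

```r
library(dplyr)

# Made-up long-format data standing in for the gathered data frame
long <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(1968:1970, times = 2),
  value = c(10, 12, 15, 4, 5, 8)
)

# Within each country, divide every value by that country's earliest value
normalized <- long %>%
  arrange(country, year) %>%
  group_by(country) %>%
  mutate(value_norm = value / first(value)) %>%
  ungroup()

normalized
```

Every country then starts at 1, and nothing has to be retyped if the list of countries or the starting year changes.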

For Viz3, I decided to do a facet_wrap() to isolate each country and show its pattern individually. This is the only Viz that behaved exactly how DataCamp said it would.

```{r}
#Viz3
plot.3 <- ggplot(age_gather_normal, aes(x = year, y = value, color = country)) +
  geom_line() +
  facet_wrap(~country)+
  theme(axis.text.x = element_text(face = "bold", size = 6, angle = 45))

#Fix Viz3 labels
plot.3 + labs(x = "Year (1968-2017)", y = "Value (relative to 1968 = 1)", color = "Country")
```

Viz4 was more for aesthetics than function but it still highlighted the dependency ratio gap between countries, especially Japan compared to the rest of the world.

```{r}
#set-up data frame to contain ordered variables
age_filter2017 <- age_gather_normal %>%
    filter(year == 2017)
age_filter2017$type <- ifelse(age_filter2017$value < 1, "below", "above")
age_filter2017 <- age_filter2017[order(-age_filter2017$value), ]
age_filter2017$country <- factor(age_filter2017$country, levels = age_filter2017$country[order(-age_filter2017$value)])
str(age_filter2017)

#Plot Viz4
theme_set(theme_bw()) 
plot.4 <- ggplot(age_filter2017, aes(x= country, y= value, label = "", color = country)) +  
    geom_point(stat='identity', size=6) + 
  geom_segment(aes(y = 0, 
                   x = country, 
                   yend = value, 
                   xend = country)) +
  geom_text(color="white", size=2) + 
  coord_flip() + 
  theme(legend.position="none")

#Fix Viz4 labels
plot.4 + labs(title = "Age Dependency Ratio (2017, Normalized)", subtitle = "Dependency ratio of >64 to 15-64 in 2017, relative to 1968", 
              x = "Country", y = "Value (relative to 1968 = 1)", color = "Country")

```
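The ordering for Viz4 is done by sorting the data frame and rebuilding the factor levels by hand. As a hedged alternative sketch, `reorder()` can set the order directly inside `aes()`. The `snapshot` data below is invented and only stands in for the 2017 values.

```r
library(ggplot2)

# Made-up single-year values for three imaginary countries
snapshot <- data.frame(
  country = c("A", "B", "C"),
  value = c(1.2, 3.9, 0.8)
)

# reorder() sorts the country levels by value, smallest first
lollipop <- ggplot(snapshot, aes(x = reorder(country, value), y = value)) +
  geom_segment(aes(xend = reorder(country, value), y = 0, yend = value)) +
  geom_point(size = 6) +
  coord_flip() +
  labs(x = "Country", y = "Value")

lollipop
```

Because `coord_flip()` draws the first level at the bottom, the country with the largest value ends up at the top of the chart.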

Viz5 is a boxplot that gives a different view of the dataset from the line and bar-graph style visualizations I've used in the first four. Viz5 offers more insight into the flux and variation between the dependency ratios. Although time is not shown here, I see that as a benefit because the overlap between the countries is more easily recognizable. It can be insightful to know that maybe the variation in dependency ratios is not so unique to each individual country and that countries at one point or another have had similar dependency ratios.

```{r}
#Viz5
plot.5 <- ggplot(age_gather_normal, aes(y = value, group = country, color = country)) +
  geom_boxplot()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

#Fix Viz5 labels
plot.5 + labs(title = "Age Dependency Ratio (Normalized)", subtitle = "Ratio of dependents (> 64) to working-age (15-64), relative to 1968", 
              y = "Value (relative to 1968 = 1)", color = "Country")

```
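To put numbers behind the overlap visible in the boxplot, one option is to summarise each country's range. This is a minimal sketch on made-up values, not output from the real dataset. The `long` table and the `ranges` name are only for illustration.

```r
library(dplyr)

# Made-up normalized values for two imaginary countries
long <- data.frame(
  country = rep(c("A", "B"), each = 3),
  value = c(1.0, 1.4, 2.0, 1.0, 1.1, 1.5)
)

# Minimum, median, and maximum per country: the key numbers a boxplot draws
ranges <- long %>%
  group_by(country) %>%
  summarise(min = min(value), median = median(value), max = max(value))

ranges
```

Two countries have overlapping ranges when the larger of their minimums is no greater than the smaller of their maximums, which is what overlapping boxes and whiskers show visually.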

Reflections:
Countless times during this project I was tempted to go back to Excel and just do the analysis there, but I figured that would be entirely counter-productive to the purposes of this class. Overall, I think I got really good at taking the R terms I already know and stringing them together into a somewhat coherent phrase that was google-able.