Visualization of three or more variables

This analysis explores the relationship between population characteristics and GDP per capita from 1960 to 2010. Population is described by two measures: the percentage of children aged 0-4 relative to the total population, and life expectancy.

Read the data, then gather and rearrange it using functions from the tidyr library. Finally, audit the data so that every variable covers the same years and countries.
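The reshaping step described above can be sketched on a small toy data frame. This is only an illustration: the country names and values are hypothetical, and the wide layout (one row per country, one `X<year>` column per year) is assumed to mirror the Gapminder files.

```r
library(tidyr)

# Hypothetical wide-format data: one row per country, one column per year
toy_wide <- data.frame(
  Total = c("Aland", "Borduria"),
  X1960 = c(15.2, 17.8),
  X1970 = c(14.1, 16.9)
)

# Gather the year columns (columns 2 and 3) into key/value pairs
toy_long <- gather(toy_wide, "year", "n", 2:3)

# Strip the 'X' prefix that read.csv adds to numeric column names
toy_long$year <- as.numeric(gsub("X", "", toy_long$year))

toy_long
```

Each country-year pair now occupies its own row, which is the shape that `merge()` and ggplot2 expect later on.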

 library(ggplot2)
 library(tidyr)
 library(gridExtra)

## Loading required package: grid

 setwd('C:/Shmuel/Nanodegree/Explorer_Data_R')


 per_chil <- read.csv('indicator_total0-4percen.csv')
 life_exp <- read.csv('life_expectancy_at_birth.csv')
 gdp <- read.csv('GDPC.csv')


# Rearrange the data from wide to long format (year columns are named like X1950, X2015)
Per_chil <- gather(per_chil,'year','n',2:22)
Life_exp <- gather(life_exp,'year','n',2:206)
GDP <- gather(gdp,'year','n',2:53)


# Remove the 'X' prefix from the year and convert it to a number
Per_chil$year <- as.numeric(gsub('X','',Per_chil$year))
Life_exp$year <- as.numeric(gsub('X','',Life_exp$year))
GDP$year <- as.numeric(gsub('X','',GDP$year))
Life_exp$Total <- Life_exp[[1]]
GDP$Total <- GDP[[1]]

# Find the years that are included in each data frame

A1 <- levels(factor(Per_chil$year))
A2 <- levels(factor(Life_exp$year))
A3 <- levels(factor(GDP$year))

# Find the countries that are included in each data frame
B1 <- levels(factor(Per_chil$Total))
B2 <- levels(factor(Life_exp$Total))
B3 <- levels(factor(GDP$Total))

# Find the years and the countries that are common to all three data frames
AT <- Reduce(intersect, list(A1,A3,A2))
BT <- Reduce(intersect, list(B1,B3,B2))

# Subset each data frame to the common years and countries
Per_chil_A <- subset(Per_chil,Total %in% BT & year%in% AT)
Life_exp_A <- subset(Life_exp,Total %in% BT & year%in% AT,select = c(2:4))
GDP_A <-      subset(GDP,     Total %in% BT & year%in% AT,select = c(2:4))

# Merge the three data frames and give the variables descriptive names
total <- merge(Per_chil_A,Life_exp_A ,by=c('year','Total'))
total <- merge(total,GDP_A ,by=c('year','Total'))

total$chil <- total$n.x
total$Life <- total$n.y
total$GDP <- total$n

# Remove rows with a missing GDP value (n is the GDP column after the merge)
total <- subset(total,!is.na(n))
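The alignment and merge steps above can be illustrated on toy data. This is a minimal sketch in base R, with hypothetical country names and values; the `toy_*` names and the `keep()` helper are introduced here for illustration only. It shows how `Reduce(intersect, ...)` finds the shared keys and why the merged columns end up named `n.x`, `n.y` and `n`.

```r
# Hypothetical long-format frames sharing the key columns year and Total
toy_chil <- data.frame(year = c(1960, 1970, 1980), Total = "Aland", n = c(15.2, 14.1, 12.9))
toy_life <- data.frame(year = c(1970, 1980, 1990), Total = "Aland", n = c(63.4, 66.0, 68.2))
toy_gdp  <- data.frame(year = c(1950, 1970, 1980), Total = "Aland", n = c(900, 1200, NA))

# Years that appear in all three frames
common_years <- Reduce(intersect, list(toy_chil$year, toy_life$year, toy_gdp$year))

# Hypothetical helper: keep only rows whose year is shared by every frame
keep <- function(df) df[df$year %in% common_years, ]

# The first merge renames the clashing n columns to n.x and n.y;
# the second merge adds the third n column without a suffix
merged <- merge(keep(toy_chil), keep(toy_life), by = c("year", "Total"))
merged <- merge(merged, keep(toy_gdp), by = c("year", "Total"))
names(merged)

# Drop rows where the third variable is missing
complete <- subset(merged, !is.na(n))
```

Note that `merge()` performs an inner join by default (`all = FALSE`), so it would drop unmatched years and countries even without the explicit intersection; the intersection mainly makes the audit step visible.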

Cut each variable into four buckets bounded by its quartiles (1st through 4th).

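# summary() returns min, Q1, median, mean, Q3, max; index 4 (the mean) is skipped below
gdp_q  <- summary(total$GDP)
chil_q <- summary(total$chil)
life_q <- summary(total$Life)

# include.lowest = TRUE keeps a minimum that falls exactly on the first break
total$GDP.bucket <- cut(total$GDP, c(floor(gdp_q[1]), floor(gdp_q[2]), floor(gdp_q[3]), floor(gdp_q[5]), ceiling(gdp_q[6])), include.lowest = TRUE)
total$chil.bucket <- cut(total$chil, c(floor(chil_q[1]), floor(chil_q[2]), floor(chil_q[3]), floor(chil_q[5]), ceiling(chil_q[6])), include.lowest = TRUE)
total$Life.bucket <- cut(total$Life, c(floor(life_q[1]), floor(life_q[2]), floor(life_q[3]), floor(life_q[5]), ceiling(life_q[6])), include.lowest = TRUE)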

total <- subset(total, !is.na(n))
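As an alternative way to form quartile buckets, the breaks can be taken directly from `quantile()` rather than from rounded `summary()` values. The sketch below uses a hypothetical vector of GDP-like values; the labels `Q1` to `Q4` are an illustrative choice, not part of the original analysis.

```r
# Hypothetical GDP-per-capita values
toy_gdp <- c(300, 800, 1500, 2500, 4000, 9000, 15000, 30000)

# Breaks at the 0%, 25%, 50%, 75% and 100% quantiles
breaks <- quantile(toy_gdp, probs = seq(0, 1, 0.25))

# include.lowest = TRUE ensures the minimum value lands in the first bucket
toy_bucket <- cut(toy_gdp, breaks = breaks, include.lowest = TRUE,
                  labels = c("Q1", "Q2", "Q3", "Q4"))

table(toy_bucket)
```

Because the breaks are exact quantiles, each bucket holds roughly a quarter of the observations, and no value is left unassigned.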

Scatterplot of GDP versus year. Each point represents one country's GDP in a specific year. The GDP quartiles are clearly distinguishable by color.

ggplot(aes(x = year, y = GDP),
       data = subset( total, !is.na(GDP.bucket) )) +
       # Median GDP per year within each GDP bucket
       geom_line(aes(color = GDP.bucket), stat = 'summary', fun.y = median, size = 1)+
       scale_y_log10()+
       # alpha is a fixed setting, so it goes outside aes()
       geom_jitter(aes(color = GDP.bucket), alpha = 1/20)+
       geom_smooth(aes(color = GDP.bucket))

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
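A note on transparency in plots of this kind: in ggplot2, anything placed inside `aes()` is treated as a mapping from data, so a constant such as `alpha = 1/20` inside `aes()` becomes a one-level variable with its own legend entry instead of setting the transparency. Passing `alpha` as a plain argument to the geom sets it directly. The sketch below shows the second form on hypothetical data; the `toy` data frame and its values are invented for illustration.

```r
library(ggplot2)

# Hypothetical data: two groups observed yearly, roughly log-normal values
set.seed(42)
toy <- data.frame(
  year   = rep(1960:2010, times = 2),
  bucket = rep(c("low", "high"), each = 51)
)
toy$value <- ifelse(toy$bucket == "low", 1000, 10000) *
  exp(rnorm(nrow(toy), sd = 0.3))

# color is mapped from the data; alpha is a fixed setting outside aes()
p <- ggplot(toy, aes(x = year, y = value, color = bucket)) +
  geom_jitter(alpha = 1/5) +
  scale_y_log10()

p
```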

Scatterplot of the percentage of children aged 0-4 versus year. Each point represents one country's percentage of children in a specific year. Color encodes the GDP quartile (see legend).

ggplot(aes(x = year, y = chil),
       data =subset( total, !is.na(GDP.bucket) )) +
       # Median percentage of children per year within each GDP bucket
       geom_line(aes(color = GDP.bucket), stat = 'summary', fun.y = median, size = 1)+
       # alpha is a fixed setting, so it goes outside aes()
       geom_jitter(aes(color = GDP.bucket), alpha = 1/20)+
       geom_smooth(aes(color = GDP.bucket))

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Scatterplot of life expectancy versus year. Each point represents one country's life expectancy in a specific year. Color encodes the GDP quartile (see legend).

ggplot(aes(x = year, y = Life),
       data =subset( total, !is.na(GDP.bucket) )) +
       # Median life expectancy per year within each GDP bucket
       geom_line(aes(color = GDP.bucket),stat = 'summary', fun.y = median, size = 1)+
       # alpha is a fixed setting, so it goes outside aes()
       geom_jitter(aes(color = GDP.bucket), alpha = 1/20)+
       geom_smooth(aes(color = GDP.bucket))

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Histograms of the percentage of children aged 0-4 (x-axis). Each panel includes the countries in one GDP quartile, and the fill color encodes the life expectancy bucket (see legend).

ggplot(subset(total, !is.na(GDP.bucket)), aes(x = chil))+
  facet_wrap(~GDP.bucket) +
  geom_histogram(alpha = 1/2, aes(fill = as.factor(Life.bucket )), binwidth = 1)+
  scale_x_continuous(breaks = seq(2, 24, 2))

Histograms of life expectancy (x-axis). Each panel includes the countries in one GDP quartile, and the fill color encodes the bucket for the percentage of children aged 0-4 (see legend).

ggplot(subset(total, !is.na(GDP.bucket)), aes(x = Life))+
  facet_wrap(~GDP.bucket) +
  geom_histogram(alpha = 1/2, aes(fill = as.factor(chil.bucket )), binwidth = 2)+
  scale_x_continuous(breaks = seq( 30,90,10))

Data obtained from http://www.gapminder.org/data/
