Gapminder Uncut

Details

I left some graphs and analysis out of the gapminder report to avoid bloating it. I am collecting some (not all) of them here.

Which countries were the best to live in in 1952, and how do they compare with the best countries to live in in 2007?

For this, I will show a tree-map with a few comments, then add the code after the comments.

Kuwait was the best country to live in in 1952, judged by the combined life expectancy and GDP per capita score computed in the code below. In 2007 Norway holds that position, followed by a few Asian countries and the United States. That's good news if you are a citizen of one of those countries.

What if you are an immigrant? Then other factors come in, such as the human rights record of these countries and perhaps how liberal the society is. In that case, Singapore, Kuwait, and Hong Kong, China should not be in our top 5, or you can make the case that they should not rank above the United States. The United States remains the best country for immigrants (undisputed), followed by Canada or European countries.

The code is below. Note that you could make this shorter by grouping by year, but I didn't take that route. Instead I repeated the same process for each year to make it easier for anyone reading the code to see how it was done.

For 1952

#Best and worst countries to live in 1952
#Prologue: subset 1952 data only from gapminder
#step 1: scale all countries (scale gdpPercap & lifeExp);
#step 2: Assign a score to all countries based on the scaled data (row-wise mean, using apply)
#step 3: bind score to the whole dataset
#step 4: perform quantile -> needed to grade countries
#step 5: Assign a grade to all countries where "A" is best to live in and "F" worst to live in
#step 6: order by scores(in descending)

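#Prologue step
#packages used below: gapminder (data), dplyr (%>%, top_n), ggplot2 and treemapify (tree-map)
library(gapminder)
library(dplyr)
library(ggplot2)
library(treemapify)

gap <- subset(
  gapminder,
  year == 1952,
  select = c(country, continent, year, lifeExp, gdpPercap))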



#step 1: scale all countries
gap_scaled <- scale(gap[,c(4,5)])


#step 2: Assign a score to all countries based on scaled data
gap_scaled_scores <- apply(gap_scaled, 1, mean)

#step 3: bind score to the whole dataset
gap2 <- cbind(gap, gap_scaled_scores)

#step 4: perform quantile

gap_quantile <- quantile(gap_scaled_scores, c(0.8,0.6,0.4,0.2))

#step 5: Assign a grade
gap3 <- within(gap2,{
  grade <- NA
  grade[gap_scaled_scores >= gap_quantile[1]] <- "A"
  grade[gap_scaled_scores < gap_quantile[1] & gap_scaled_scores >= gap_quantile[2]] <- "B"
  grade[gap_scaled_scores < gap_quantile[2] & gap_scaled_scores >= gap_quantile[3]] <- "C"
  grade[gap_scaled_scores < gap_quantile[3] & gap_scaled_scores >= gap_quantile[4]] <- "D"
  grade[gap_scaled_scores < gap_quantile[4]] <- "F"
})

#step 6: Order by scores
gap_1952 <- gap3[order(-gap3$gap_scaled_scores),]


#tree-map of the 30 highest-scoring countries in 1952
p1 <- ggplot(
  gap_1952 %>% top_n(30, gap_scaled_scores),
  aes(area = gap_scaled_scores, fill = log(gdpPercap), label = country)) +
  geom_treemap() +
  geom_treemap_text(color = "white") +
  theme(
    legend.position = "none",
    plot.subtitle = element_text(hjust = .5, lineheight = .20)) +
  labs(subtitle = "1952")

#Kuwait was the best country to live in in 1952. Backed by enormous oil reserves and a very small population - this created a huge gap between its gdpPercap and that of other countries.
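To make the scoring and grading steps easier to follow, here is a minimal sketch on made-up numbers. The data frame `toy` and its values are hypothetical and are not taken from gapminder. It also uses `rowMeans()` and `cut()` as a more compact alternative to the `apply()` and `within()` calls above; this is one possible way to write it, not how the report itself was produced.

```r
# Toy data (hypothetical values, not taken from gapminder)
toy <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  lifeExp   = c(40, 50, 60, 70, 80),
  gdpPercap = c(500, 1000, 4000, 9000, 30000)
)

# Step 1: centre each column on its mean and divide by its standard deviation
toy_scaled <- scale(toy[, c("lifeExp", "gdpPercap")])

# Step 2: the score is the row-wise mean of the two scaled columns
# (rowMeans(x) gives the same values as apply(x, 1, mean))
toy$score <- rowMeans(toy_scaled)

# Steps 4-5: quintile cut-points, then one grade per interval
cuts <- quantile(toy$score, c(0.2, 0.4, 0.6, 0.8))
toy$grade <- cut(
  toy$score,
  breaks = c(-Inf, cuts, Inf),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE  # intervals are [lower, upper), matching the >= comparisons
)

# Step 6: best score first
toy[order(-toy$score), ]
```

Because both columns are scaled before averaging, life expectancy and GDP per capita carry equal weight in the score, whatever their original units.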

For 2007

#Best and worst countries to live in 2007
#Prologue: subset 2007 data only from gapminder
#step 1: scale all countries (scale gdpPercap & lifeExp);
#step 2: Assign a score to all countries based on the scaled data (row-wise mean, using apply)
#step 3: bind score to the whole dataset
#step 4: perform quantile -> needed to grade countries
#step 5: Assign a grade to all countries where "A" is best to live in and "F" worst to live in
#step 6: order by scores(in descending)

#Prologue step
gap <- subset(
  gapminder,
  year == 2007,
  select = c(country, continent, year, lifeExp, gdpPercap))



#step 1: scale all countries
gap_scaled <- scale(gap[,c(4,5)])


#step 2: Assign a score to all countries based on scaled data
gap_scaled_scores <- apply(gap_scaled, 1, mean)

#step 3: bind score to the whole dataset
gap2 <- cbind(gap, gap_scaled_scores)

#step 4: perform quantile

gap_quantile <- quantile(gap_scaled_scores, c(0.8,0.6,0.4,0.2))

#step 5: Assign a grade
gap3 <- within(gap2,{
  grade <- NA
  grade[gap_scaled_scores >= gap_quantile[1]] <- "A"
  grade[gap_scaled_scores < gap_quantile[1] & gap_scaled_scores >= gap_quantile[2]] <- "B"
  grade[gap_scaled_scores < gap_quantile[2] & gap_scaled_scores >= gap_quantile[3]] <- "C"
  grade[gap_scaled_scores < gap_quantile[3] & gap_scaled_scores >= gap_quantile[4]] <- "D"
  grade[gap_scaled_scores < gap_quantile[4]] <- "F"
})

#step 6: Order by scores
gap_2007 <- gap3[order(-gap3$gap_scaled_scores),]

#tree-map of the 29 highest-scoring countries in 2007
p2 <- ggplot(
  gap_2007 %>% top_n(29, gap_scaled_scores),
  aes(area = gap_scaled_scores, fill = log(gdpPercap), label = country)) +
  geom_treemap() +
  geom_treemap_text(color = "white") +
  theme(
    legend.position = "bottom",
    legend.justification = c(0.1, 20),
    plot.subtitle = element_text(hjust = .5, lineheight = .20)) +
  labs(subtitle = "2007")



#combining the two plots with `+` assumes the patchwork package is loaded
#p1 + p2 + 
#  labs(title = "Kuwait with its small population & huge reserve, was the best country to live in 1952") + 
#  theme(plot.title = element_text(hjust = 1,lineheight = .20))
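As mentioned earlier, the two per-year blocks could be shortened by grouping by year. The following is a minimal sketch of that idea in base R, not the route taken in this post. The helper `score_one_year()` and the stand-in data frame `toy_years` are hypothetical names introduced for illustration, and the numbers in `toy_years` are made up; with the real data you would pass `gapminder` (or a subset of its years) instead.

```r
# Score and grade the countries of a single year.
# `df` needs the columns lifeExp and gdpPercap, as in gapminder.
score_one_year <- function(df) {
  # row-wise mean of the two scaled columns
  df$score <- rowMeans(scale(df[, c("lifeExp", "gdpPercap")]))
  # quintile cut-points, then one grade per interval ("A" best, "F" worst)
  cuts <- quantile(df$score, c(0.2, 0.4, 0.6, 0.8))
  df$grade <- cut(df$score, breaks = c(-Inf, cuts, Inf),
                  labels = c("F", "D", "C", "B", "A"), right = FALSE)
  # best score first
  df[order(-df$score), ]
}

# Hypothetical stand-in data; with the real data, use `gapminder` instead.
toy_years <- data.frame(
  country   = rep(c("A", "B", "C", "D", "E"), times = 2),
  year      = rep(c(1952, 2007), each = 5),
  lifeExp   = c(40, 45, 50, 60, 70, 55, 60, 70, 75, 80),
  gdpPercap = c(500, 800, 2000, 5000, 9000, 900, 2000, 8000, 20000, 40000)
)

# Group by year: split() makes one data frame per year,
# lapply() scores and grades each of them separately
by_year <- lapply(split(toy_years, toy_years$year), score_one_year)

by_year[["1952"]]  # ranked table for 1952
by_year[["2007"]]  # ranked table for 2007
```

Scaling inside the helper means each year is standardised against its own mean and standard deviation, which is what the two separate blocks above do as well.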


Lekan Ali

4/8/2021
