Refer to the dataset “gapminder” from package “gapminder”.

Part(a)

Summarize and plot the median life expectancy (MLE) in 1952 for each of the 5 countries with the top MLEs and for each of the 5 countries with the bottom MLEs. You should have one summary table and one plot.

Hint: Use the data in 1952 to find the top 5 countries and bottom 5 countries in MLE. Then, plot the MLE for each of the 10 countries. You should have 10 bars in one graph.

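library(gapminder)
library(dplyr)
library(ggplot2)

# Keep only the 1952 observations, with country and life expectancy
DF <- select(filter(gapminder, year == 1952), country, lifeExp)

top <- arrange(DF, desc(lifeExp))  # highest life expectancy first
bottom <- arrange(DF, lifeExp)     # lowest life expectancy first
NewDF <- rbind(top[1:5, ], bottom[1:5, ])

# One bar per country, ordered from highest to lowest life expectancy
ggplot(NewDF, aes(x = reorder(country, -lifeExp), y = lifeExp)) +
  geom_col(fill = "#006400") +
  labs(title = 'Top and Bottom Five Countries in Life Expectancy in 1952', x = 'Country', y = 'Life Expectancy') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))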

NewDF
## # A tibble: 10 x 2
##    country      lifeExp
##    <fct>          <dbl>
##  1 Norway          72.7
##  2 Iceland         72.5
##  3 Netherlands     72.1
##  4 Sweden          71.9
##  5 Denmark         70.8
##  6 Afghanistan     28.8
##  7 Gambia          30  
##  8 Angola          30.0
##  9 Sierra Leone    30.3
## 10 Mozambique      31.3

As the graph and table above show, there was a huge gap in life expectancy between developed and developing countries in 1952: the top country (Norway, 72.7 years) and the bottom country (Afghanistan, 28.8 years) differ by almost 44 years.
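Because parts (a) and (b) repeat the same steps for different years, one option is to wrap them in a function. The sketch below is a minimal version under the assumption that dplyr 1.0.0 or later is installed (for `slice_max()` and `slice_min()`); `top_bottom_le()` is a hypothetical helper name, not part of the assignment.

```r
library(gapminder)
library(dplyr)

# Hypothetical helper (not from the assignment): the n highest and n lowest
# life expectancies among all countries in year `yr`.
top_bottom_le <- function(data, yr, n = 5) {
  one_year <- data %>%
    filter(year == .env$yr) %>%
    select(country, lifeExp)
  top <- slice_max(one_year, lifeExp, n = n, with_ties = FALSE)     # highest first
  bottom <- slice_min(one_year, lifeExp, n = n, with_ties = FALSE)  # lowest first
  bind_rows(top, bottom)
}

tb_1952 <- top_bottom_le(gapminder, 1952)
tb_2007 <- top_bottom_le(gapminder, 2007)
tb_1952
```

`with_ties = FALSE` guarantees exactly `n` rows even if two countries share a value at the cutoff, so the plot always has 10 bars.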

Part(b)

Summarize and plot the median life expectancy (MLE) in 2007 for each of the 5 countries with the top MLEs and for each of the 5 countries with the bottom MLEs. You should have one summary table and one plot.

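# Keep only the 2007 observations, with country and life expectancy
DF <- select(filter(gapminder, year == 2007), country, lifeExp)

top <- arrange(DF, desc(lifeExp))  # highest life expectancy first
bottom <- arrange(DF, lifeExp)     # lowest life expectancy first
NewDF <- rbind(top[1:5, ], bottom[1:5, ])

# One bar per country, ordered from highest to lowest life expectancy
ggplot(NewDF, aes(x = reorder(country, -lifeExp), y = lifeExp)) +
  geom_col(fill = "#006400") +
  labs(title = 'Top and Bottom Five Countries in Life Expectancy in 2007', x = 'Country', y = 'Life Expectancy') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

NewDF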
## # A tibble: 10 x 2
##    country          lifeExp
##    <fct>              <dbl>
##  1 Japan               82.6
##  2 Hong Kong, China    82.2
##  3 Iceland             81.8
##  4 Switzerland         81.7
##  5 Australia           81.2
##  6 Swaziland           39.6
##  7 Mozambique          42.1
##  8 Zambia              42.4
##  9 Sierra Leone        42.6
## 10 Lesotho             42.6

From the graph and table above, the gap between the highest (Japan, 82.6 years) and lowest (Swaziland, 39.6 years) life expectancy in 2007 is about 43 years. Since 1952 both ends have shifted up by about 10 years (9.9 at the top, 10.8 at the bottom), but the gap itself has shrunk by less than a year, from about 43.9 to 43.0 years.
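The comparison between 1952 and 2007 is simple arithmetic on the extremes of the two printed tables. The sketch below copies those values as printed (rounded to one decimal place), so the results are approximate rather than recomputed from the full data.

```r
# Extremes copied from the printed 1952 and 2007 tables (one decimal place)
le_1952 <- c(top = 72.7, bottom = 28.8)  # Norway, Afghanistan
le_2007 <- c(top = 82.6, bottom = 39.6)  # Japan, Swaziland

gap_1952 <- unname(le_1952["top"] - le_1952["bottom"])  # about 43.9 years
gap_2007 <- unname(le_2007["top"] - le_2007["bottom"])  # about 43.0 years
shift <- le_2007 - le_1952                              # top about 9.9, bottom about 10.8
gap_change <- gap_1952 - gap_2007                       # about 0.9 years

round(c(gap_1952 = gap_1952, gap_2007 = gap_2007, gap_change = gap_change), 1)
round(shift, 1)
```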

Part(c)

Summarize and plot the median life expectancy in each year for the largest 5 countries in terms of 2007 population. You should have one summary table and one plot.

Hint: Use the data in 2007 to find the top 5 countries in population. Then, plot MLE vs year for each of the 5 countries. You should have 5 curves and you should overlay them in one graph.

# Find the five most populous countries in 2007
# (China, India, United States, Indonesia, Brazil)
top5 <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  slice(1:5) %>%
  pull(country) %>%
  as.character()

# Life expectancy in every survey year for those five countries
DFNew <- select(filter(gapminder, country %in% top5), country, year, lifeExp)

ggplot(DFNew, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  labs(title = 'Life Expectancy of the Five Most Populous Countries (2007), 1952 to 2007', x = 'Year', y = 'Life Expectancy', color = 'Country')

DFNew
## # A tibble: 60 x 3
##    country  year lifeExp
##    <fct>   <int>   <dbl>
##  1 Brazil   1952    50.9
##  2 Brazil   1957    53.3
##  3 Brazil   1962    55.7
##  4 Brazil   1967    57.6
##  5 Brazil   1972    59.5
##  6 Brazil   1977    61.5
##  7 Brazil   1982    63.3
##  8 Brazil   1987    65.2
##  9 Brazil   1992    67.1
## 10 Brazil   1997    69.4
## # ... with 50 more rows

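Notice that every country on this graph has had a net increase in life expectancy between 1952 and 2007, especially those that would have been considered developing countries in the 1950s. For example, Brazil's life expectancy rose from 50.9 years in 1952 to 69.4 years in 1997.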
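The printed table shows only Brazil's first ten rows, so it cannot confirm the claim for all five countries on its own. A compact one-row-per-country summary makes the net change explicit; this is a sketch assuming the same five countries used in this part (China, India, United States, Indonesia, Brazil) and dplyr for the grouping.

```r
library(gapminder)
library(dplyr)

# The five most populous countries in 2007, as used in this part
big5 <- c("China", "India", "United States", "Indonesia", "Brazil")

# One row per country: life expectancy at the start and end of the series
change_tbl <- gapminder %>%
  filter(country %in% big5) %>%
  group_by(country) %>%
  summarise(
    lifeExp_1952 = lifeExp[year == 1952],
    lifeExp_2007 = lifeExp[year == 2007],
    change = lifeExp_2007 - lifeExp_1952
  )
change_tbl
```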

Part(d)

Summarize and plot the median life expectancy in each year for each continent. You should have one summary table and one plot.

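# Population-weighted mean life expectancy per continent and year.
# as.numeric() keeps sum(pop) from overflowing R's 32-bit integer range
# if pop is stored as an integer column.
DF <- gapminder %>%
  group_by(continent, year) %>%
  mutate(lifeExp = sum(as.numeric(pop) * lifeExp) / sum(as.numeric(pop))) %>%
  select(continent, year, lifeExp) %>%
  distinct()

ggplot(DF, aes(x = year, y = lifeExp, color = continent)) +
  geom_line() +
  labs(title = 'Population-Weighted Average Life Expectancy by Continent, 1952 to 2007', x = 'Year', y = 'Life Expectancy', color = 'Continent')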

DF
## # A tibble: 60 x 3
## # Groups:   continent, year [60]
##    continent  year lifeExp
##    <fct>     <int>   <dbl>
##  1 Asia       1952    42.9
##  2 Asia       1957    47.3
##  3 Asia       1962    46.6
##  4 Asia       1967    53.9
##  5 Asia       1972    57.5
##  6 Asia       1977    59.6
##  7 Asia       1982    61.6
##  8 Asia       1987    63.5
##  9 Asia       1992    65.1
## 10 Asia       1997    66.8
## # ... with 50 more rows
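The weighted mean in this part divides by `sum(pop)`. If `pop` is stored as an integer column (check with `class(gapminder$pop)`), that sum can exceed R's 32-bit integer limit for Asia, whose combined population passes two billion in the later survey years. A minimal base-R illustration of the pitfall and the usual fix:

```r
# R integers are 32-bit: the largest is .Machine$integer.max (2147483647)
x <- c(.Machine$integer.max, 1L)

s_int <- suppressWarnings(sum(x))  # NA: the integer total does not fit
s_dbl <- sum(as.numeric(x))        # 2147483648: summed as doubles

c(integer_sum = s_int, double_sum = s_dbl)
```

Without `suppressWarnings()`, R also prints a warning suggesting `sum(as.numeric(.))`.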

Note from the graph and table above that the three leading continents (the Americas, Europe, and Oceania) have steadily increased their population-weighted average life expectancy, with Asia close behind. Africa, however, has had more trouble raising its life expectancy: its average appears to have stagnated from roughly the late 1980s to the early 2000s.
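The task asks for the median, while the code in this part computes a population-weighted mean. A sketch of the median version, placed next to the weighted mean for comparison, assuming dplyr 1.0.0 or later for the `.groups` argument; its output is not reproduced here.

```r
library(gapminder)
library(dplyr)

# Per continent and year: the median over countries (each country counts
# once) next to the population-weighted mean.
continent_tbl <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    median_lifeExp = median(lifeExp),
    weighted_lifeExp = weighted.mean(lifeExp, as.numeric(pop)),  # doubles avoid integer overflow
    .groups = "drop"
  )
continent_tbl
```

The two measures answer different questions: the median treats every country equally, whereas the weighted mean lets the most populous countries (for example China and India within Asia) dominate the continent's value.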

Write a comment on your findings, one for each question.

Submission: Submit your R Markdown file to dropbox “Project #4” and provide a link to your project published via RPubs.com.

Grading: parts (a), (b), (c), and (d) are worth 2 points each; R Markdown file and link, 1 point; comments, 1 point.