Refer to the dataset “gapminder” from package “gapminder”.
Summarize and plot the median life expectancy (MLE) in 1952 for each of the 5 countries with the top MLE’s and for each of the 5 countries with the bottom MLE’s. You should have one summary table and one plot.
Hint: Use the data in 1952 to find the top 5 countries and bottom 5 countries in MLE. Then, plot the MLE for each of the 10 countries. You should have 10 bars in one graph.
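library(gapminder)  # gapminder data set
library(dplyr)      # filter(), select(), arrange(), %>%
library(ggplot2)    # plotting
# Keep country and life expectancy for 1952 only
DF = select(filter(gapminder, year == 1952), c(country, lifeExp))
# Sort descending and ascending to find the top and bottom five
top = arrange(DF, desc(lifeExp))
bottom = arrange(DF, lifeExp)
NewDF <- rbind(top[1:5,], bottom[1:5,])
ggplot(NewDF, aes(x = country, y = lifeExp)) +
geom_col(fill = "#006400") +
labs(title = 'Top and Bottom Five Countries in Life Expectancy in 1952', x = 'Country', y = 'Life Expectancy') +
theme(axis.text.x = element_text(angle = 45, hjust = 1))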
NewDF
## # A tibble: 10 x 2
## country lifeExp
## <fct> <dbl>
## 1 Norway 72.7
## 2 Iceland 72.5
## 3 Netherlands 72.1
## 4 Sweden 71.9
## 5 Denmark 70.8
## 6 Afghanistan 28.8
## 7 Gambia 30
## 8 Angola 30.0
## 9 Sierra Leone 30.3
## 10 Mozambique 31.3
As the graph and table above show, there was a huge gap in life expectancy in 1952 between the most and least developed countries: the top country (Norway, 72.7) and the bottom country (Afghanistan, 28.8) differ by almost 44 years.
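Because `country` is a factor, the bars are drawn in factor-level order (alphabetical in gapminder), which mixes the top and bottom groups. The following is a minimal base-R sketch of reordering the levels by life expectancy so the bars would run from lowest to highest. It uses the rounded values printed in the table above (so Gambia and Angola tie at 30.0), and the data frame name `le_1952` is made up for this illustration.

```r
# Values copied from the 1952 summary table (rounded to one decimal)
le_1952 <- data.frame(
  country = c("Norway", "Iceland", "Netherlands", "Sweden", "Denmark",
              "Afghanistan", "Gambia", "Angola", "Sierra Leone", "Mozambique"),
  lifeExp = c(72.7, 72.5, 72.1, 71.9, 70.8, 28.8, 30.0, 30.0, 30.3, 31.3),
  stringsAsFactors = FALSE
)

# Set the factor levels in order of increasing life expectancy
le_1952$country <- factor(le_1952$country,
                          levels = le_1952$country[order(le_1952$lifeExp)])

# Levels now run from the lowest to the highest life expectancy
levels(le_1952$country)
```

Inside the plotting code, `reorder(country, lifeExp)` in `aes()` is the usual shortcut for the same idea.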
Summarize and plot the median life expectancy (MLE) in 2007 for each of the 5 countries with the top MLE’s and for each of the 5 countries with the bottom MLE’s. You should have one summary table and one plot.
# Same steps as for 1952, now for 2007
DF = select(filter(gapminder, year == 2007), c(country, lifeExp))
top = arrange(DF, desc(lifeExp))
bottom = arrange(DF, lifeExp)
NewDF <- rbind(top[1:5,], bottom[1:5,])
ggplot(NewDF, aes(x = country, y = lifeExp)) +
geom_col(fill = "#006400") +
labs(title = 'Top and Bottom Five Countries in Life Expectancy in 2007', x = 'Country', y = 'Life Expectancy') +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
NewDF
## # A tibble: 10 x 2
## country lifeExp
## <fct> <dbl>
## 1 Japan 82.6
## 2 Hong Kong, China 82.2
## 3 Iceland 81.8
## 4 Switzerland 81.7
## 5 Australia 81.2
## 6 Swaziland 39.6
## 7 Mozambique 42.1
## 8 Zambia 42.4
## 9 Sierra Leone 42.6
## 10 Lesotho 42.6
From the graph and table above, the range in life expectancy in 2007 is about 43 years (Japan at 82.6 versus Swaziland at 39.6). Compared with 1952, both ends shifted up by about 10 years, but the gap itself shrank by only about one year.
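The arithmetic behind this comparison can be checked with a few lines of R. This is only a sketch using the rounded top and bottom values printed in the two summary tables; the variable names are made up for illustration.

```r
# Top and bottom values copied from the two summary tables
top_1952 <- 72.7; bottom_1952 <- 28.8   # Norway, Afghanistan
top_2007 <- 82.6; bottom_2007 <- 39.6   # Japan, Swaziland

# Gap between the top and bottom country in each year
gap_1952 <- top_1952 - bottom_1952
gap_2007 <- top_2007 - bottom_2007

# Collect the gaps, the change in the gap, and the shift at each end
summary_gaps <- round(c(gap_1952     = gap_1952,
                        gap_2007     = gap_2007,
                        gap_change   = gap_1952 - gap_2007,
                        top_shift    = top_2007 - top_1952,
                        bottom_shift = bottom_2007 - bottom_1952), 1)
summary_gaps
```

The gap goes from 43.9 to 43.0 years, while the top and bottom values rise by 9.9 and 10.8 years respectively.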
Summarize and plot the median life expectancy in each year for the largest 5 countries in terms of 2007 population. You should have one summary table and one plot.
Hint: Use the data in 2007 to find the top 5 countries in population. Then, plot MLE vs year for each of the 5 countries. You should have 5 curves and you should overlay them in one graph.
# Find the five most populous countries in 2007:
# China, India, United States, Indonesia, Brazil
top5 = arrange(filter(gapminder, year == 2007), desc(pop))$country[1:5]
# Keep every year of data for those five countries
DFNew = select(filter(gapminder, country %in% top5), c(country, year, lifeExp))
ggplot(DFNew, aes(x = year, y = lifeExp, color = country))+
geom_line()+
labs(title = 'Life Expectancy of the Five Most Populous Countries, 1952 to 2007', x = '', y = 'Life Expectancy', color = 'Country')
DFNew
## # A tibble: 60 x 3
## country year lifeExp
## <fct> <int> <dbl>
## 1 Brazil 1952 50.9
## 2 Brazil 1957 53.3
## 3 Brazil 1962 55.7
## 4 Brazil 1967 57.6
## 5 Brazil 1972 59.5
## 6 Brazil 1977 61.5
## 7 Brazil 1982 63.3
## 8 Brazil 1987 65.2
## 9 Brazil 1992 67.1
## 10 Brazil 1997 69.4
## # ... with 50 more rows
Notice that every country on this graph has had a net increase in life expectancy, especially those that would have been considered developing countries in the 1950s.
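As a small check of this observation, the sketch below computes the period-to-period and net changes for Brazil, using only the ten rows printed in the table above (1952 to 1997). The other four countries would need the full `DFNew` data, and the variable names here are made up for illustration.

```r
# Brazil's life expectancy, copied from the rows printed above
years          <- seq(1952, 1997, by = 5)
brazil_lifeExp <- c(50.9, 53.3, 55.7, 57.6, 59.5, 61.5, 63.3, 65.2, 67.1, 69.4)

# Change between consecutive five-year observations
increases <- diff(brazil_lifeExp)

# TRUE if life expectancy rose in every period
all_increasing <- all(increases > 0)

# Net change from the first to the last printed year
net_change <- brazil_lifeExp[length(brazil_lifeExp)] - brazil_lifeExp[1]

all_increasing
round(net_change, 1)
```

For these rows, life expectancy rises in every five-year period, for a net gain of 18.5 years by 1997.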
Summarize and plot the median life expectancy in each year for each continent. You should have one summary table and one plot.
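# Population-weighted average per continent and year; as.numeric() guards against integer overflow in sum(pop)
DF = gapminder %>% group_by(continent, year) %>%
  mutate(lifeExp = sum(as.numeric(pop) * lifeExp) / sum(as.numeric(pop))) %>%
  subset(select = c(continent, year, lifeExp)) %>% unique()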
ggplot(DF, aes(x = year, y = lifeExp, color = continent)) +
geom_line() +
labs(title = 'Population-Weighted Average Life Expectancy by Continent, 1952 to 2007', x = '', y = 'Life Expectancy', color = 'Continent')
DF
## # A tibble: 60 x 3
## # Groups: continent, year [60]
## continent year lifeExp
## <fct> <int> <dbl>
## 1 Asia 1952 42.9
## 2 Asia 1957 47.3
## 3 Asia 1962 46.6
## 4 Asia 1967 53.9
## 5 Asia 1972 57.5
## 6 Asia 1977 59.6
## 7 Asia 1982 61.6
## 8 Asia 1987 63.5
## 9 Asia 1992 65.1
## 10 Asia 1997 66.8
## # ... with 50 more rows
From the graph and table above, the top three continents (Americas, Europe, and Oceania) have steadily increased their average life expectancy, with Asia close behind them. Africa, however, seems to have had trouble raising its life expectancy: on average it appears to have stagnated from the late 1980s to the early 2000s.
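The continent figures above are population-weighted averages rather than medians, and the two can differ noticeably when one large country dominates a continent. The sketch below illustrates the difference on a hypothetical three-country continent; the numbers are invented for illustration and are not gapminder data.

```r
# Hypothetical three-country continent (illustrative numbers only)
lifeExp <- c(40, 60, 80)
pop     <- c(1e6, 2e6, 7e6)

# Population-weighted average, same formula as in the continent code
weighted_avg <- sum(pop * lifeExp) / sum(pop)

# Base R's weighted.mean() gives the same result
weighted_builtin <- weighted.mean(lifeExp, w = pop)

# The median ignores population and just takes the middle country
med <- median(lifeExp)

c(weighted_avg = weighted_avg, weighted_builtin = weighted_builtin, median = med)
```

Here the weighted average is 72 because the largest country pulls it upward, while the median stays at 60.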
Write a comment on your findings, one for each question.
Submission: Submit your R Markdown file to dropbox “Project #4” and provide a link to your project published via RPubs.com.
Grading: a = b = c = d = 2, R Markdown and link = 1 point, comment = 1 point