Introduction: This project uses a compiled dataset of social, economic, health, and political indicators from the World Health Organization and partner organizations, downloaded from [http://www.exploredata.net/Downloads/WHO-Data-Set]. The indicators represent data for the year 2011. The data is used to create visualizations that identify trends and relationships among the indicators. The visualizations are built with the ggplot2 package in R. ggplot2 produces static plots, so they are then converted into interactive plots with Plotly's ggplotly() function. Plotly, also known by its URL, Plot.ly, is an online analytics and data visualization tool. It provides online graphing, analytics, and statistics tools for individual and collaborative use, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.
Visualizations:
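# gdata::read.xls needs a Perl installation to read Excel files
library(gdata)
WHO <- read.xls("WHO_project2.xlsx", sheet = 1, header = TRUE)
# Drop every row (country) that has a missing value in any column
WHO <- na.omit(WHO)
library(ggplot2)
library(plotly)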
The first visualization is a scatterplot of the annual population growth rate (%) against the adult literacy rate. One would expect population growth to decline as literacy rises, and the plot bears this out: the two indicators are strongly negatively correlated. A linear regression line with a 99% confidence band is added to show the relationship.
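model <- lm(PopulationGrowth ~ AdultLiteracyRate, data = WHO)  # linear fit of growth on literacy
summary(model)  # slope, p-value and R-squared of the fitted line
scatterplot1 <- ggplot(WHO, aes(x = AdultLiteracyRate, y = PopulationGrowth)) +
  geom_point() +
  stat_smooth(method = "lm", level = 0.99, color = "orange")  # regression line with 99% confidence band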
scatterplot1
ggplotly(scatterplot1)
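The strength of the negative association can be quantified instead of read off the plot. Base R's `cor.test()` gives the Pearson correlation with a 99% confidence interval, and `confint()` gives a 99% interval for the regression slope. The sketch below is a minimal, self-contained example, so it uses a simulated stand-in for the WHO columns. The data frame `demo` and its values are hypothetical, not the 2011 figures. On the real data, the same calls would be applied to `WHO` and `model`.

```r
# Hypothetical stand-in data: NOT the WHO 2011 values, just a simulated negative trend
set.seed(42)
demo <- data.frame(AdultLiteracyRate = runif(50, min = 30, max = 100))
demo$PopulationGrowth <- 4 - 0.035 * demo$AdultLiteracyRate + rnorm(50, sd = 0.4)

# Pearson correlation between the two indicators, with a 99% confidence interval for r
ct <- cor.test(demo$AdultLiteracyRate, demo$PopulationGrowth, conf.level = 0.99)
ct$estimate  # sample correlation r
ct$conf.int  # 99% confidence interval for r

# Same form of linear model as above, then a 99% confidence interval for the slope only
demo_model <- lm(PopulationGrowth ~ AdultLiteracyRate, data = demo)
slope_ci <- confint(demo_model, "AdultLiteracyRate", level = 0.99)
slope_ci
```

If the whole 99% interval for the slope lies below zero, the negative trend is unlikely to be an artifact of noise at that level.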
The second visualization plots the same variables as the previous plot, with points colored by continent (four colors). This shows how the relationship between the indicators behaves within each continent. The negative correlation holds for every continent except Europe, where population growth declines sharply while the literacy rate stays roughly constant at the highest level.
scatterplot2 <- ggplot(WHO, aes(x = AdultLiteracyRate, y = PopulationGrowth, color = Continent)) +
  geom_point(size = 3, shape = 15) +  # shape 15 = filled squares
  xlab("Adult Literacy Rate (%)") + ylab("Annual Population Growth (%)") +
  ggtitle("Adult Literacy Rate vs Population Growth (Annual) 2011")
scatterplot2
ggplotly(scatterplot2)
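Whether the correlation holds within each continent can also be checked numerically by splitting the rows by continent and computing a correlation per group. This is a minimal sketch on simulated data. The data frame `demo`, its four continent labels and its values are hypothetical stand-ins, and the same `split()`/`sapply()` pattern would be applied to `WHO`.

```r
# Hypothetical stand-in data frame: labels and values are simulated, not WHO 2011
set.seed(1)
demo <- data.frame(
  Continent = rep(c("Africa", "America", "Asia", "Europe"), each = 20),
  AdultLiteracyRate = runif(80, min = 40, max = 100)
)
demo$PopulationGrowth <- 4 - 0.03 * demo$AdultLiteracyRate + rnorm(80, sd = 0.5)

# Split the rows by continent, then compute the Pearson correlation within each group
by_continent <- split(demo, demo$Continent)
within_r <- sapply(by_continent, function(d) cor(d$AdultLiteracyRate, d$PopulationGrowth))
round(within_r, 2)  # one named correlation per continent
```

When a continent's literacy rates sit in a narrow band near the top of the scale, as described for Europe, its within-group correlation can be weak or unstable. Checking group sizes with `table(WHO$Continent)` helps before reading much into any single value.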
The third visualization plots gross national income (GNI) per capita against total fertility rate for each continent. There is an inverse relationship between fertility and income. This inverse relationship has been termed a demographic-economic “paradox” because it runs against the notion, suggested by the influential Thomas Malthus, that greater means would enable the production of more offspring. Roughly speaking, nations or subpopulations with higher GDP per capita are observed to have fewer children, even though a richer population could support more children. Malthus held that, to prevent widespread suffering (from famine, for example), what he called “moral restraint” (which included abstinence) was required. The demographic-economic paradox suggests that reproductive restraint arises naturally as a consequence of economic progress. [https://en.wikipedia.org/wiki/Income_and_fertility]
The plot below shows the same pattern: total fertility rate is higher where per capita income is lower. Among the four continents, Africa has the highest fertility rates and low per capita incomes, whereas Europe and America have low fertility rates and high incomes.
scatterplot3 <- ggplot(WHO, aes(x = FertilityRate, y = GNIperCapita, color = Continent)) +
  geom_point(size = 3) +
  xlab("Total Fertility Rate (per woman)") + ylab("Gross National Income per Capita ($)") +
  ggtitle("Gross National Income vs Fertility Rate 2011 by Continent")
scatterplot3
ggplotly(scatterplot3)
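The inverse relationship and the continent comparison can be summarized with two base R calls. `cor(..., method = "spearman")` gives a rank correlation, which is less sensitive than Pearson's r to the long right tail typical of income data. `aggregate()` gives per-continent medians. This is a minimal sketch on simulated data. The data frame `demo`, the continent labels and all values are hypothetical and are not the WHO 2011 figures.

```r
# Hypothetical stand-in data: simulated values for illustration, not the WHO 2011 figures
set.seed(7)
demo <- data.frame(
  Continent = rep(c("Africa", "America", "Asia", "Europe"), each = 15),
  FertilityRate = c(runif(15, 3.5, 6.5), runif(15, 1.8, 3), runif(15, 1.5, 4), runif(15, 1.3, 2))
)
# Income falls as fertility rises; multiplicative noise makes GNI right-skewed
demo$GNIperCapita <- round(60000 * exp(-0.6 * demo$FertilityRate) * exp(rnorm(60, sd = 0.3)))

# Spearman (rank) correlation between fertility and income
rho <- cor(demo$FertilityRate, demo$GNIperCapita, method = "spearman")
rho

# Median fertility rate and income per continent, to compare the groups directly
medians <- aggregate(cbind(FertilityRate, GNIperCapita) ~ Continent, data = demo, FUN = median)
medians
```

Medians are used rather than means so that a few very high-income countries do not dominate a continent's summary.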
The last visualization plots the same two indicators as the previous plot (with the axes swapped) and colors each country by life expectancy at birth. Red indicates higher life expectancy and blue indicates lower life expectancy. The plot shows that countries with low income per capita tend to have high fertility rates and low life expectancies. These are observed associations only; the plot does not establish cause and effect.
scatterplot4 <- ggplot(WHO, aes(x = GNIperCapita, y = FertilityRate, color = Life_expectancy_at_birth)) +
  geom_point(size = 3) + xlab("Gross National Income per Capita ($)") + ylab("Fertility Rate (per woman)") +
  ggtitle("Gross National Income vs Fertility Rate 2011 by Life Expectancy") +
  scale_color_gradient(low = "blue", high = "red")  # blue = lower, red = higher life expectancy
scatterplot4
ggplotly(scatterplot4)
install.packages("devtools")
devtools::install_github("bheenigarg/packagename")  # install the package from GitHub