

Question

To compare income and life expectancy across different geographic regions.


Data Description

  • The Region and Country fields give the region and country in which life expectancy is measured
  • Population, income, and life expectancy are provided for each country for the years 1800 to 2015
  • Income - Gross domestic product per person, adjusted for differences in purchasing power (GDP/capita, PPP$, inflation adjusted)
  • Life - The average number of years a newborn child would live if current mortality patterns were to stay the same
  • Population - Population of the given country in the given year (Source: GapMinder)

Data Preparation

  1. Importing required libraries
  2. Import the Raw data
  3. Summarize Data
  4. Study the scope

Importing/Calling required libraries

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
## 
##     count

Summary of data

## 'data.frame':    41284 obs. of  6 variables:
##  $ Country   : Factor w/ 197 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Year      : int  1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 ...
##  $ life      : num  28.2 28.2 28.2 28.2 28.2 ...
##  $ population: Factor w/ 15260 levels "","1,005,328,574",..: 7490 1 1 1 1 1 1 1 1 1 ...
##  $ income    : int  603 603 603 603 603 603 603 603 603 603 ...
##  $ region    : Factor w/ 6 levels "America","East Asia & Pacific",..: 5 5 5 5 5 5 5 5 5 5 ...
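The population column in the structure above is a factor whose levels contain comma separators. A minimal sketch of how such a column could be converted to numeric is shown below. It assumes the commas are the only non-numeric characters; the inline CSV and its values are made up for illustration, and only the `population1` name is taken from this report.

```r
# Tiny inline CSV standing in for the raw file (illustrative values only)
csv_text <- 'Country,Year,population
A,1800,"3,280,000"
A,1801,
B,1800,"1,005"'

# stringsAsFactors = TRUE mimics the factor columns shown in the str() output
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Convert factor -> character, strip thousands separators, then -> numeric.
# Empty strings become NA.
raw$population1 <- as.numeric(gsub(",", "", as.character(raw$population)))

str(raw)
```

Converting through `as.character()` first matters: calling `as.numeric()` directly on a factor returns the internal level codes rather than the values.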

Here, we see that Country, population, and region are factors. Life is of num data type, while Year and income are of int data type. Population is really a number, but because its values contain commas it was read as a factor. Hence we clean the data by removing the commas and storing the result in a new numeric field.

Study the scope

##                 Country           Year           life      
##  Afghanistan        :  216   Min.   :1800   Min.   : 1.00  
##  Albania            :  216   1st Qu.:1854   1st Qu.:31.00  
##  Algeria            :  216   Median :1908   Median :35.12  
##  Angola             :  216   Mean   :1907   Mean   :42.88  
##  Antigua and Barbuda:  216   3rd Qu.:1962   3rd Qu.:55.60  
##  Argentina          :  216   Max.   :2015   Max.   :84.10  
##  (Other)            :39988                                 
##    population        income                              region     
##         :25817   Min.   :   142   America                   : 7961  
##  121000 :    6   1st Qu.:   883   East Asia & Pacific       : 6256  
##  14092  :    6   Median :  1450   Europe & Central Asia     :10468  
##  1432000:    6   Mean   :  4571   Middle East & North Africa: 4309  
##  229000 :    6   3rd Qu.:  3483   South Asia                : 1728  
##  2574000:    6   Max.   :182668   Sub-Saharan Africa        :10562  
##  (Other):15437   NA's   :2341
## [1] 197

Looking at the summary, we now know there are 197 countries spread across 6 regions, with up to 216 yearly entries each for the years 1800 to 2015. Life expectancy ranges from 1 to 84.1 years.


Analysis


Income

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     142    2978    6651   12397   14744  182668       1
## [1] 1182
## [1] 3900.356
## Warning: Removed 823 rows containing non-finite values (stat_smooth).
## Warning: Removed 823 rows containing missing values (geom_point).

Looking at the histogram, most income values lie below 50000. In the summary, both the median (6651) and the mean (12397) lie between the first and third quartiles, with a minimum of 142 and a maximum of 182668.
However, things change drastically when we look at the weighted median and weighted mean based on the region: the median drops from 6651 to 1182, while the mean drops from 12397 to 3900.
We can say that the income ranges differ between regions.
From the plot above (population vs income), we can also see that as population increases, income hardly increases for most countries, while it rises roughly linearly for some. However, where the population stays low, income rises much more steeply.
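To illustrate why weighted and unweighted statistics can differ so much, here is a small sketch in base R. It assumes population is used as the weight, which the report does not show explicitly, and the numbers are toy values. The `weighted_median` helper is a hypothetical, simple definition; `matrixStats::weightedMedian` interpolates by default and may return a slightly different value.

```r
# Toy data (illustrative values only, not taken from the dataset)
income     <- c(500, 1000, 20000, 40000)
population <- c(1000, 500, 10, 5)

# Weighted mean: each income counts in proportion to its weight
w_mean <- weighted.mean(income, w = population)

# Simple weighted median: the smallest value at which the cumulative
# weight reaches half of the total weight (hypothetical helper)
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}
w_median <- weighted_median(income, population)

print(c(unweighted_mean = mean(income), weighted_mean = w_mean))
print(c(unweighted_median = median(income), weighted_median = w_median))
```

Because the heavily weighted rows have low incomes, both weighted statistics fall well below their unweighted counterparts, which is the same pattern reported above.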

Life Expectancy

Let’s look at the frequencies of the life expectancy and income ranges for the complete data

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   31.00   35.12   42.88   55.60   84.10
## [1] 33.86702
## [1] 41.45024
## Warning: Removed 823 rows containing non-finite values (stat_smooth).
## Warning: Removed 823 rows containing missing values (geom_point).


There seems to be a high frequency of life expectancy in the range of 25-35 years.
In the next graph, we can see that life expectancy grows regardless of income growth. However, once life expectancy passes 60 years, income grows sharply.
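A frequency claim like the one above can be checked numerically by binning the values. The sketch below uses toy life expectancy values and an assumed bin width of 10 years, aligned so that one bin covers 25 to 35.

```r
# Toy life expectancy values (illustrative only)
life <- c(28.2, 31.0, 33.5, 35.1, 29.9, 42.8, 55.6, 71.3, 84.1)

# Bin into 10-year ranges; right = FALSE makes each bin [lower, upper)
bins <- cut(life, breaks = seq(5, 95, by = 10), right = FALSE)
freq <- table(bins)

# Label of the most frequent bin
most_common <- names(freq)[which.max(freq)]

print(freq)
print(most_common)
```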

Population

## [1] 3358089
## [1] 21187964
## [1] 3474657
## [1] 19078834

The weighted mean is lower than the unweighted mean when countries are considered. This means that there are large differences in population between some countries.

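library(ggplot2)
# Box plot of population by region on a log scale, with regions ordered by mean log10 population
# na.rm = TRUE keeps regions with missing population values from breaking the ordering
p <- ggplot(gapminder1, aes(reorder(region, population1, FUN = function(x) mean(log10(x), na.rm = TRUE)), population1))
p <- p + scale_y_log10()
p + geom_boxplot(outlier.colour = "red") + geom_jitter(alpha = 1/2)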

Working with Country data

India and China after 1950

We can see here that income rose for China and India after the population crossed 1.1 billion. However, the rise in life expectancy slowed once the population crossed 600 million, and life expectancy flattened between 55 and 65 years.
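The country subset used for this comparison could be built along the following lines. This is a sketch with a toy data frame; the `gap` name and its values are assumptions, while the Country and Year column names follow the data summary.

```r
# Toy rows standing in for the full data frame (illustrative values only)
gap <- data.frame(
  Country = c("India", "India", "China", "China", "Brazil"),
  Year    = c(1940, 1960, 1950, 2015, 1960),
  life    = c(30.0, 42.0, 41.0, 76.0, 55.0)
)

# Keep only India and China, and only years after 1950
india_china <- subset(gap, Country %in% c("India", "China") & Year > 1950)

print(india_china)
```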


Working with 2015 data

##                 Country         Year           life      
##  Afghanistan        :  1   Min.   :2015   Min.   :48.50  
##  Albania            :  1   1st Qu.:2015   1st Qu.:65.35  
##  Algeria            :  1   Median :2015   Median :73.50  
##  Andorra            :  1   Mean   :2015   Mean   :71.76  
##  Angola             :  1   3rd Qu.:2015   3rd Qu.:77.97  
##  Antigua and Barbuda:  1   Max.   :2015   Max.   :84.10  
##  (Other)            :172                                 
##          population      income                              region  
##               :  1   Min.   :   624   America                   :31  
##  1,376,048,943:  1   1st Qu.:  3715   East Asia & Pacific       :25  
##  10,349,803   :  1   Median : 11360   Europe & Central Asia     :48  
##  10,954,617   :  1   Mean   : 17717   Middle East & North Africa:19  
##  100699395    :  1   3rd Qu.: 24290   South Asia                : 8  
##  104460       :  1   Max.   :132877   Sub-Saharan Africa        :47  
##  (Other)      :172                                                   
##   population1       
##  Min.   :5.299e+04  
##  1st Qu.:2.235e+06  
##  Median :8.545e+06  
##  Mean   :4.050e+07  
##  3rd Qu.:2.851e+07  
##  Max.   :1.376e+09  
##  NA's   :1


Let us take the case of a single year (2015) and compare the income range and life expectancy across regions.
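One way to produce such a per-region comparison is to filter to 2015 and average by region. The sketch below uses base R `aggregate()` on a toy data frame; the values are illustrative and the `gap` and `d2015` names are assumptions.

```r
# Toy rows standing in for the full data frame (illustrative values only)
gap <- data.frame(
  region = c("America", "America", "South Asia", "South Asia", "America"),
  Year   = c(2015, 2015, 2015, 2015, 2000),
  life   = c(70, 80, 60, 70, 50),
  income = c(10000, 30000, 2000, 4000, 5000)
)

# Keep the 2015 rows only
d2015 <- gap[gap$Year == 2015, ]

# Mean life expectancy and income per region.
# The formula interface drops rows with missing values by default.
by_region <- aggregate(cbind(life, income) ~ region, data = d2015, FUN = mean)

print(by_region)
```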

Life Expectancy in 2015

Life Expectancy for America in 2015

Let us focus on a single region and, within it, a single country. Here, the example region is America and the sample country is the United States.

Calculating the standard deviation for America in 2015:
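The output of this step is not shown in the report. As a rough sketch of the calculation, assuming a 2015 data frame with region and life columns, the regional mean and standard deviation can be computed as below. The data are toy values, and the z-score line is an added illustration of how a single country can be placed relative to its region.

```r
# Toy 2015 rows (illustrative values and placeholder country names)
d2015 <- data.frame(
  region  = c("America", "America", "America", "Europe & Central Asia"),
  Country = c("X", "Y", "Z", "W"),
  life    = c(70, 75, 80, 82)
)

# Restrict to the America region
america <- d2015[d2015$region == "America", ]

america_mean <- mean(america$life)
america_sd   <- sd(america$life)  # sample standard deviation (n - 1 denominator)

# Number of standard deviations each country lies from the regional mean
z <- (america$life - america_mean) / america_sd

print(c(mean = america_mean, sd = america_sd))
print(setNames(z, america$Country))
```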

America and United States in 2015

Plotting Life Expectancy in 2015:

  • America region vs other regions in the world
  • United States vs other countries in America

The graph shows that life expectancy in America was below average in the early years, but since the 1950s it has improved to be above that of most other regions.

Comparing the United States to the rest of America, the life expectancy of the United States has been higher than that of most other countries in the region since the beginning.


Summary


Income
Income in some regions is higher than in others. In 2015, average income was highest in Middle East & North Africa and lowest in South Asia. Income rises when the population remains low. The average income is 12397 across all regions, but when the regions are taken into account (weighted mean) it drops to 3900, meaning that income in some regions is much lower than in others.

Life Expectancy
We find that life expectancy in the given dataset most often falls in the range of 25-35 years. In 2015, the mean life expectancy of the Sub-Saharan Africa region is lower than that of all other regions, while that of Europe & Central Asia is the highest. There is a relationship between life expectancy and income: once life expectancy rises above 60 years, income levels rise sharply. Narrowing the analysis to the America region, life expectancy is nearly 75 years. Although the average life expectancy of America improved from below average to above average between 1800 and 2015, the life expectancy of the United States has been higher than that of most American countries throughout that period.

Population
We see that India and China have larger populations than other countries, with exponential growth in both after 1950. We also saw that life expectancy flattens at very large populations (over 600 million), and that extreme populations (over 1 billion) coincide with faster income growth.