Description

This project relies on accurate data. The Global Health Observatory (GHO) data repository of the World Health Organization (WHO) tracks health status and many related factors for all countries. For this project, we use its data for 193 countries from 2000 to 2015.

Tasks

Working on the dataset

Prepare Phase

We use the packages ‘tidyverse’, ‘ggplot2’, ‘lubridate’, ‘dplyr’, ‘tidyr’, ‘here’, ‘skimr’ and ‘janitor’ to clean, analyze and plot the data. Each package has to be installed once, for example with install.packages("skimr"), before it can be loaded.

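# Load packages (tidyverse already attaches ggplot2, dplyr and tidyr,
# so those three calls are redundant but harmless)

library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)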

Importing datasets

Importing Life Expectancy Data.csv

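# Read the CSV file (adjust the path to your own copy of the file).
# read.csv() converts spaces and symbols in the headers to dots, e.g. Life.expectancy
expectancy <- read.csv("C:/Users/saksh/Desktop/Life Expectancy Data.csv")
View(expectancy)      # opens the data viewer (RStudio)
glimpse(expectancy)   # column names, types and first values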
## Rows: 2,938
## Columns: 22
## $ Country                         <chr> "Afghanistan", "Afghanistan", "Afghani…
## $ Year                            <int> 2015, 2014, 2013, 2012, 2011, 2010, 20…
## $ Status                          <chr> "Developing", "Developing", "Developin…
## $ Life.expectancy                 <dbl> 65.0, 59.9, 59.9, 59.5, 59.2, 58.8, 58…
## $ Adult.Mortality                 <int> 263, 271, 268, 272, 275, 279, 281, 287…
## $ infant.deaths                   <int> 62, 64, 66, 69, 71, 74, 77, 80, 82, 84…
## $ Alcohol                         <dbl> 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.…
## $ percentage.expenditure          <dbl> 71.279624, 73.523582, 73.219243, 78.18…
## $ Hepatitis.B                     <int> 65, 62, 64, 67, 68, 66, 63, 64, 63, 64…
## $ Measles                         <int> 1154, 492, 430, 2787, 3013, 1989, 2861…
## $ BMI                             <dbl> 19.1, 18.6, 18.1, 17.6, 17.2, 16.7, 16…
## $ under.five.deaths               <int> 83, 86, 89, 93, 97, 102, 106, 110, 113…
## $ Polio                           <int> 6, 58, 62, 67, 68, 66, 63, 64, 63, 58,…
## $ Total.expenditure               <dbl> 8.16, 8.18, 8.13, 8.52, 7.87, 9.20, 9.…
## $ Diphtheria                      <int> 65, 62, 64, 67, 68, 66, 63, 64, 63, 58…
## $ HIV.AIDS                        <dbl> 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1…
## $ GDP                             <dbl> 584.25921, 612.69651, 631.74498, 669.9…
## $ Population                      <dbl> 33736494, 327582, 31731688, 3696958, 2…
## $ thinness..1.19.years            <dbl> 17.2, 17.5, 17.7, 17.9, 18.2, 18.4, 18…
## $ thinness.5.9.years              <dbl> 17.3, 17.5, 17.7, 18.0, 18.2, 18.4, 18…
## $ Income.composition.of.resources <dbl> 0.479, 0.476, 0.470, 0.463, 0.454, 0.4…
## $ Schooling                       <dbl> 10.1, 10.0, 9.9, 9.8, 9.5, 9.2, 8.9, 8…

The output lists 2,938 rows and 22 columns with the expected types, so the file was imported correctly.

Process Phase

Here are the cleaning steps I followed:

  • I did not find any spelling errors, extra or blank spaces, or duplicated rows in the data.
  • The data covers 193 distinct countries.

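expectancy <- expectancy[!duplicated(expectancy), ]  # drop exact duplicate rows (none were found)
length(unique(expectancy$Country))                   # number of distinct countries: 193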
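The checks listed above can also be run in code instead of by eye. Below is a minimal base-R sketch built around a hypothetical helper, `check_quality()`, which is not part of the original project. It counts three things:

- exact duplicate rows,
- text cells with leading or trailing spaces,
- missing values per column.

The toy data frame is invented purely for illustration. On the project data, the call would be `check_quality(expectancy)`.

```r
# Hypothetical helper: basic data-quality checks on a data frame
check_quality <- function(df) {
  # Whitespace problems can only occur in character (text) columns
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  # For each text column, count cells that change when trimmed
  ws <- vapply(chr_cols, function(col) {
    x <- df[[col]]
    sum(!is.na(x) & x != trimws(x))
  }, integer(1))
  list(
    duplicate_rows     = sum(duplicated(df)),   # rows identical to an earlier row
    whitespace_cells   = ws,
    missing_per_column = colSums(is.na(df))
  )
}

# Invented toy data (not GHO values): one duplicate row, one trailing space, one NA
toy <- data.frame(
  Country         = c("A", "A", "B ", "C"),
  Year            = c(2014L, 2014L, 2014L, 2014L),
  Life.expectancy = c(70.1, 70.1, NA, 65.3),
  stringsAsFactors = FALSE
)
result <- check_quality(toy)
result$duplicate_rows      # 1: rows 1 and 2 are identical
result$whitespace_cells    # Country: 1 ("B " has a trailing space)
result$missing_per_column  # Life.expectancy: 1
```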

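Since we found no duplicates or formatting problems, we can move on to the analysis. Some columns do contain missing values (NA), as the summary in the Analyze Phase shows.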

Analyze Phase

expectancy %>%
  select(Life.expectancy, infant.deaths, under.five.deaths, Adult.Mortality, Population, BMI) %>%
  summary()
##  Life.expectancy infant.deaths    under.five.deaths Adult.Mortality
##  Min.   :36.30   Min.   :   0.0   Min.   :   0.00   Min.   :  1.0  
##  1st Qu.:63.10   1st Qu.:   0.0   1st Qu.:   0.00   1st Qu.: 74.0  
##  Median :72.10   Median :   3.0   Median :   4.00   Median :144.0  
##  Mean   :69.22   Mean   :  30.3   Mean   :  42.04   Mean   :164.8  
##  3rd Qu.:75.70   3rd Qu.:  22.0   3rd Qu.:  28.00   3rd Qu.:228.0  
##  Max.   :89.00   Max.   :1800.0   Max.   :2500.00   Max.   :723.0  
##  NA's   :10                                         NA's   :10     
##    Population             BMI       
##  Min.   :3.400e+01   Min.   : 1.00  
##  1st Qu.:1.958e+05   1st Qu.:19.30  
##  Median :1.387e+06   Median :43.50  
##  Mean   :1.275e+07   Mean   :38.32  
##  3rd Qu.:7.420e+06   3rd Qu.:56.20  
##  Max.   :1.294e+09   Max.   :87.30  
##  NA's   :652         NA's   :34

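The mean life expectancy across all country-year records is about 69 years (median 72.1). Infant deaths average about 30 per 1,000 population. This column is heavily skewed, though. Its median is only 3, and its maximum of 1,800 cannot be a rate per 1,000, which suggests that some rows hold absolute counts rather than rates.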
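Some of these columns are strongly skewed: infant deaths has a median of 3 but a mean of 30.3. Reporting the median next to the mean therefore gives a fairer picture of a typical value.

Below is a minimal base-R sketch using a hypothetical helper, `mean_median()`, with invented toy values. On the project data, it would be called as `mean_median(expectancy, c("Life.expectancy", "infant.deaths"))`.

```r
# Hypothetical helper: mean and median of the chosen columns, ignoring NAs
mean_median <- function(df, cols) {
  t(sapply(df[cols], function(x) {
    c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
  }))
}

# Invented toy values; one extreme infant.deaths value mimics the skew
toy <- data.frame(
  infant.deaths   = c(0, 1, 3, 5, 100),
  Life.expectancy = c(60, 65, NA, 72, 75)
)
summary_tab <- mean_median(toy, c("infant.deaths", "Life.expectancy"))
summary_tab
# infant.deaths:   mean 21.8, median 3
# Life.expectancy: mean 68,   median 68.5
```

A single large value pulls the mean of `infant.deaths` far above its median. The full dataset shows the same pattern.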

Share Phase

Now let’s visualize some key relationships.

Relationship between life expectancy and adult mortality by country status

ggplot(expectancy, aes(Life.expectancy, Adult.Mortality, colour = Status)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

The plot suggests that developed countries have higher life expectancy and lower adult mortality than developing countries.
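To put numbers behind this visual comparison, we can compute the mean life expectancy and adult mortality for each `Status` group. Below is a minimal sketch using base R's `aggregate()`. The toy rows are invented; on the project data, the same call would use `data = expectancy`.

```r
# Invented toy data for illustration only (not GHO values)
toy <- data.frame(
  Status          = c("Developed", "Developed", "Developing", "Developing", "Developing"),
  Life.expectancy = c(80, 82, 60, 66, NA),
  Adult.Mortality = c(60, 70, 250, 200, 180)
)

# Mean of both variables per Status group. The formula interface drops
# rows with an NA in any listed variable (na.action = na.omit).
group_means <- aggregate(cbind(Life.expectancy, Adult.Mortality) ~ Status,
                         data = toy, FUN = mean)
group_means
```

Because rows with a missing value are dropped, both means in each group are computed on the same rows. In this toy example, the last "Developing" row is excluded from both averages.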

Relationship between life expectancy and alcohol

ggplot(expectancy, aes(Life.expectancy, Alcohol)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

There appears to be a weak positive correlation between alcohol consumption (measured as pure alcohol) and life expectancy.
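Whether a correlation is "weak" can be checked numerically rather than by eye. The sketch below uses base R's `cor.test()`, which runs a Pearson test by default and returns the coefficient with a 95% confidence interval. The toy vectors are invented. On the project data, the call would be `cor.test(expectancy$Life.expectancy, expectancy$Alcohol)`, which skips pairs with a missing value.

```r
# Invented toy values for illustration only
life_exp <- c(55, 60, 62, 68, 70, 74, 78, 81)
alcohol  <- c(0.5, 4.0, 1.0, 3.5, 6.0, 2.5, 8.0, 7.0)

# Pearson correlation test: estimate, p-value and 95% confidence interval
ct <- cor.test(life_exp, alcohol, method = "pearson")
ct$estimate   # correlation coefficient r
ct$conf.int   # 95% confidence interval for r
ct$p.value    # evidence against r = 0
```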

Relationship between life expectancy and schooling

ggplot(expectancy, aes(Life.expectancy, Schooling)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

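There is a positive correlation between schooling and life expectancy: countries with more years of schooling tend to have longer life expectancy. This is consistent with schooling contributing to a longer life span, although a correlation on its own does not prove it.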
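A simple linear regression turns this association into one number: the estimated difference in life expectancy per additional year of schooling. The sketch below uses base R's `lm()` on invented toy values. With the project data, it would be `lm(Life.expectancy ~ Schooling, data = expectancy)`, and `lm()` drops rows with missing values by default. Like a correlation, the slope describes an association, not a causal effect.

```r
# Invented toy values for illustration only
toy <- data.frame(
  Schooling       = c(4, 6, 8, 10, 12, 14, 16),
  Life.expectancy = c(52, 58, 61, 66, 70, 73, 79)
)

# Simple linear regression: Life.expectancy = a + b * Schooling
fit <- lm(Life.expectancy ~ Schooling, data = toy)
coef(fit)["Schooling"]   # estimated change in life expectancy per extra year of schooling
summary(fit)$r.squared   # share of variance explained by the straight line
```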

Relationship between life expectancy and income composition of resources

ggplot(expectancy, aes(Life.expectancy, Income.composition.of.resources)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

This graph shows a positive correlation between life expectancy and the income composition of resources.

Relationship between life expectancy and HIV/AIDS

ggplot(expectancy, aes(Life.expectancy, HIV.AIDS)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

There appears to be a very weak negative correlation between HIV/AIDS and life expectancy.
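Tabulating the correlations side by side makes it easier to compare the relationships in this section and to check labels such as "weak" or "very weak". The sketch below uses a hypothetical helper, `cor_table()`, that reports two coefficients:

- Pearson, which measures linear association;
- Spearman, which is rank-based and less sensitive to skewed columns such as `HIV.AIDS`.

Each coefficient uses only rows where both values are present. The toy data are invented. On the project data, the call would be `cor_table(expectancy, "Life.expectancy", c("Alcohol", "Schooling", "Income.composition.of.resources", "HIV.AIDS"))`.

```r
# Hypothetical helper: correlation of each column in `vars` with `target`
cor_table <- function(df, target, vars) {
  res <- t(sapply(vars, function(v) {
    ok <- complete.cases(df[[target]], df[[v]])   # keep rows where both values exist
    x <- df[[target]][ok]
    y <- df[[v]][ok]
    c(pearson = cor(x, y), spearman = cor(x, y, method = "spearman"))
  }))
  # Sort so the strongest linear relationships come first
  res[order(-abs(res[, "pearson"])), , drop = FALSE]
}

# Invented toy data for illustration only (not GHO values)
toy <- data.frame(
  Life.expectancy = c(50, 55, 61, 67, 72, 78, NA),
  Schooling       = c(5, 7, 9, 11, 13, 15, 12),
  HIV.AIDS        = c(20, 8, 1, 0.3, 0.1, 0.1, 0.1)
)
tab <- cor_table(toy, "Life.expectancy", c("Schooling", "HIV.AIDS"))
round(tab, 3)
```

A large gap between the Pearson and Spearman values for a variable is a hint that its relationship with life expectancy is monotonic but not linear.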

Conclusion

After exploring the data, we draw the following conclusions:

  • Schooling is associated with life expectancy. A plausible explanation is that better-educated people know more about what affects their health and how to live longer.

  • We saw a positive correlation between the income composition of resources and life expectancy. Countries with a higher income composition of resources tend to have higher life expectancy, possibly because people can invest more in themselves.

  • HIV/AIDS was negatively related to life expectancy: the lower the HIV/AIDS figures, the longer the life span.

  • In developed countries, life expectancy was higher and adult mortality was lower than in developing countries.

Thank you very much for your interest!

I would appreciate any comments and recommendations for improvement!