For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command.
Installing the tidyverse package installs many packages used for data import, manipulation, and modeling. library(tidyverse) loads only the core tidyverse packages that you are likely to use in almost every analysis. The other packages in the tidyverse need to be loaded explicitly.
Core tidyverse packages:
ggplot2, for data visualisation.
dplyr, for data manipulation.
tidyr, for data tidying.
readr, for data import.
purrr, for functional programming.
tibble, for tibbles, a modern re-imagining of data frames.
library(gapminder)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
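The difference between installing, loading the core packages, and loading a non-core package can be sketched as follows. This is a minimal illustration, not part of the original analysis: readxl is chosen here only as an example of a package that is installed with the tidyverse but not attached by library(tidyverse), and the exact core set depends on the installed tidyverse version.

```r
# One-time installation (commented out so the script can be re-run safely)
# install.packages("tidyverse")

library(tidyverse)   # attaches the core packages only
library(readxl)      # installed with the tidyverse, but must be attached explicitly

# List every package that belongs to the tidyverse, core or not
all_pkgs <- tidyverse_packages()
"readxl" %in% all_pkgs
```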
Create a working copy of the gapminder_unfiltered dataset so that the original dataset is left untouched.
datagap <- gapminder_unfiltered
datagap
## # A tibble: 3,313 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
## 7 Afghanistan Asia 1982 39.854 12881816 978.0114
## 8 Afghanistan Asia 1987 40.822 13867957 852.3959
## 9 Afghanistan Asia 1992 41.674 16317921 649.3414
## 10 Afghanistan Asia 1997 41.763 22227415 635.3414
## # ... with 3,303 more rows
The dataset has 6 variables and 3,313 observations, sorted by country and then by year (ascending).
?gapminder_unfiltered
The supplemental data frame gapminder_unfiltered was not filtered on year or for complete data and has 3313 rows. Everything else is as documented in gapminder.
Our ambition with Gapminder World is to enable the display of data for all the countries and territories of the world. Therefore, the guiding principle has been to include as many entities as possible for which data might be available.
Please note that the inclusion of any geographical area in this data set is based solely on data availability and convenience for possible users. Our choice of names for any of the included countries and territories is likewise made solely for the convenience of users. The notes on international status are based on Wikipedia. Neither this nor the inclusion/exclusion of a specific country or territory implies a stated opinion of Gapminder regarding the legal or political status of the geographical area in question. Neither do the names imply a stated opinion of Gapminder on the correct naming of an entity.
The number of countries and territories to include is arbitrary, but we have decided to include the following entities:
192 UN members (as of April 2008)
51 other entities listed in the “List of countries” in Wikipedia (2008-05-13). These include the Vatican, dependent territories, special entities and disputed territories. We have excluded the two “sub-dependencies” Ascension Island and Tristan da Cunha, although they are listed by Wikipedia.
4 French overseas territories (Guadeloupe, Martinique, Reunion and French Guiana), although they are considered an integral part of France
10 former states
2 ad-hoc areas: “Serbia excluding Kosovo” and “the Channel Islands”. The latter is the collective name of the two dependent territories Guernsey and Jersey.
country: factor with 187 levels
continent: factor with 6 levels
year: ranges from 1950 to 2007 in gapminder_unfiltered (1952 to 2007 in gapminder). Some countries have data only every fifth year starting in 1952. Other countries have data for every year.
lifeExp: life expectancy at birth, in years. The data in this file was combined from hundreds of sources, in four steps:
a) The period 1990 to 2015 uses data from IHME: Global Burden of Disease Study 2015 (GBD 2015) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2016. Available from http://ghdx.healthdata.org/gbd-results-tool.
b) Data before 1990 uses Gapminder Historic Life Expectancy data,
pop: population
gdpPercap: Gross Domestic Product per capita in constant 2000 US$. Inflation, but not differences in the cost of living between countries, has been taken into account.
GDP per capita measures how much has been produced in a country during a year, divided by the number of people. The data is adjusted for inflation and differences in the cost of living between countries. Cross-country data for 2005 is mainly based on the 2005 round of the International Comparison Program. Real growth rates were linked to the 2005 levels. Several sources are used for these growth rates, such as the data of Angus Maddison. In addition, we utilised a couple of cross-country comparisons for earlier years, which required that we adjust the growth rates. The unit is international dollars at fixed 2005 prices.
Life expectancy at birth: IHME data, downloaded in January 2015 from http://ghdx.healthdata.org/record/global-burden-disease-study-2013-gbd-2013-age-sex-specific-all-cause-and-cause-specific
Contributor: Global Burden of Disease Study 2013
Publication year: 2014
dim(datagap)
## [1] 3313 6
summary(datagap)
##            country        continent         year         lifeExp
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67
##  (Other)       :2965
##       pop               gdpPercap
##  Min.   :5.941e+04   Min.   :   241.2
##  1st Qu.:2.680e+06   1st Qu.:  2505.3
##  Median :7.560e+06   Median :  7825.8
##  Mean   :3.177e+07   Mean   : 11313.8
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8
##  Max.   :1.319e+09   Max.   :113523.1
##
names(datagap)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
str(datagap)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3313 obs. of 6 variables:
## $ country : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
sum(is.na(datagap))
## [1] 0
unique(datagap$continent)
## [1] Asia Europe Africa Americas FSU Oceania
## Levels: Africa Americas Asia Europe FSU Oceania
unique(datagap$year)
## [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007 1950 1951
## [15] 1953 1954 1955 1956 1958 1959 1960 1961 1963 1964 1965 1966 1968 1969
## [29] 1970 1971 1973 1974 1975 1976 1978 1979 1980 1981 1983 1984 1985 1986
## [43] 1988 1989 1990 1991 1993 1994 1995 1996 1998 1999 2000 2001 2003 2004
## [57] 2005 2006
table(datagap$year)
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## 39 24 144 24 24 24 24 144 25 25 26 26 151 26 26
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## 27 27 156 27 27 27 27 168 32 27 27 27 171 27 27
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 27 27 171 27 27 27 27 171 27 27 32 33 183 33 33
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
## 33 33 184 33 33 33 33 187 33 32 30 18 183
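The table above shows far more rows for years ending in 2 or 7 than for the years in between, which reflects that only some countries report annual data. A minimal sketch for seeing which countries those are, assuming the gapminder and dplyr packages are installed; the names obs_per_country and n_years are introduced here for illustration only:

```r
library(gapminder)
library(dplyr)

# Count how many yearly observations each country contributes
obs_per_country <- gapminder_unfiltered %>%
  count(country, name = "n_years") %>%
  arrange(desc(n_years))

# Countries with the most complete annual coverage come first
head(obs_per_country)
```

Countries at the top of this list have close to one row per year, while countries with roughly a dozen rows are the ones observed only every fifth year.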
GDP2007 <- filter(datagap, year == 2007)
GDP2007
## # A tibble: 183 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 2007 43.828 31889923 974.5803
## 2 Albania Europe 2007 76.423 3600523 5937.0295
## 3 Algeria Africa 2007 72.301 33333216 6223.3675
## 4 Angola Africa 2007 42.731 12420476 4797.2313
## 5 Argentina Americas 2007 75.320 40301927 12779.3796
## 6 Armenia FSU 2007 71.965 2971650 4942.5439
## 7 Aruba Americas 2007 74.239 72194 27230.6752
## 8 Australia Oceania 2007 81.235 20434176 34435.3674
## 9 Austria Europe 2007 79.829 8199783 36126.4927
## 10 Azerbaijan Asia 2007 67.487 8017309 7708.6112
## # ... with 173 more rows
# Density-scaled histogram of 2007 GDP per capita, with a density curve and the mean marked
ggplot(GDP2007, aes(x = gdpPercap)) +
  geom_histogram(aes(y = ..density..),
                 binwidth = 2500,
                 colour = "black", fill = "red") +
  geom_density(alpha = .2, fill = "#99CCFF") +
  labs(title = "Histogram of GDP per Capita for All Countries in 2007, with Mean",
       x = "GDP per Capita", y = "Density") +
  geom_vline(aes(xintercept = mean(gdpPercap, na.rm = TRUE)),
             color = "blue", linetype = "dashed", size = 1)
ggplot(GDP2007, aes(x=gdpPercap, fill=continent)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
GDP2007 <- arrange(GDP2007, desc(gdpPercap))
GDP2007
## # A tibble: 183 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Qatar Asia 2007 75.588 907229 82010.98
## 2 Macao, China Asia 2007 80.718 456989 54589.82
## 3 Norway Europe 2007 80.196 4627926 49357.19
## 4 Brunei Asia 2007 77.118 386511 48014.59
## 5 Kuwait Asia 2007 77.588 2505559 47306.99
## 6 Singapore Asia 2007 79.972 4553009 47143.18
## 7 United States Americas 2007 78.242 301139947 42951.65
## 8 Ireland Europe 2007 78.885 4109086 40676.00
## 9 Hong Kong, China Asia 2007 82.208 6980412 39724.98
## 10 Switzerland Europe 2007 81.701 7554661 37506.42
## # ... with 173 more rows
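The ranking above and the histogram coloured by continent both suggest comparing continents directly. One possible way to do that, sketched here under the assumption that the gapminder and dplyr packages are installed, is a grouped summary of the 2007 rows; the object and column names (gdp_by_continent, n_countries, median_gdpPercap, mean_gdpPercap) are illustrative and not part of the original analysis:

```r
library(gapminder)
library(dplyr)

# Summarise 2007 GDP per capita by continent
gdp_by_continent <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    n_countries      = n(),
    median_gdpPercap = median(gdpPercap),
    mean_gdpPercap   = mean(gdpPercap)
  ) %>%
  arrange(desc(median_gdpPercap))

gdp_by_continent
```

The median is reported next to the mean because the distribution is strongly right-skewed, so a few very rich entities can pull the mean well above the typical value.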
Canada <- datagap %>%
  filter(country == "Canada") %>%
  select(year, gdpPercap)
Canada
## # A tibble: 57 × 2
## year gdpPercap
## <int> <dbl>
## 1 1950 10581.27
## 2 1951 10932.47
## 3 1952 11367.16
## 4 1953 11586.61
## 5 1954 11173.26
## 6 1955 11901.51
## 7 1956 12555.55
## 8 1957 12489.95
## 9 1958 12384.41
## 10 1959 12590.80
## # ... with 47 more rows
ggplot(Canada, aes(x = year, y = gdpPercap)) +
  geom_line() +
  geom_point() +
  labs(title = "Canadian GDP per capita from 1950 to 2007",
       x = "Year", y = "GDP per Capita")
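# Relative change from the previous observation. 2006 is missing for Canada,
# so the 2007 value covers the two years since 2005.
GDPChange <- Canada %>%
  mutate(GDPChange = (gdpPercap - lag(gdpPercap)) / lag(gdpPercap))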
tail(GDPChange)
## # A tibble: 6 × 3
## year gdpPercap GDPChange
## <int> <dbl> <dbl>
## 1 2001 32570.57 0.003758522
## 2 2002 33328.97 0.023284784
## 3 2003 33635.25 0.009189884
## 4 2004 34346.97 0.021159677
## 5 2005 35078.00 0.021283816
## 6 2007 36319.24 0.035384999
ggplot(GDPChange, aes(x = year, y = GDPChange)) +
  geom_line() +
  geom_point() +
  labs(title = "Change in Canadian GDP per capita from 1950 to 2007",
       x = "Year", y = "Change in GDP per Capita")
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
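Because lag() compares each row with the previous row rather than the previous calendar year, the 2007 change for Canada spans two years (there is no 2006 row). The sketch below shows one way to annualise the change by the number of years elapsed. It is an illustration rather than part of the original analysis: it uses the last three Canadian values copied from the output above, and the names canada_tail, canada_growth, years_elapsed, total_change and annual_change are introduced here.

```r
library(dplyr)

# Last three Canadian observations, copied from the output above
canada_tail <- tibble(
  year      = c(2004L, 2005L, 2007L),
  gdpPercap = c(34346.97, 35078.00, 36319.24)
)

canada_growth <- canada_tail %>%
  mutate(
    years_elapsed = year - lag(year),                  # 1 for 2005, 2 for 2007
    total_change  = gdpPercap / lag(gdpPercap) - 1,    # same as the GDPChange column
    # compound annual rate over the gap between observations
    annual_change = (gdpPercap / lag(gdpPercap))^(1 / years_elapsed) - 1
  )

canada_growth
```

For rows one year apart the two measures are identical. For 2007 the annualised figure is lower than the total change, which keeps that final point comparable with the rest of the series.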