New packages.

Load packages
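
The loading code itself is not shown here. As a rough sketch, assuming the package set can be inferred from the functions called later (WDI and WDIsearch from WDI, kable from knitr, the pipe, dplyr verbs and ggplot from tidyverse, describe from psych, stat.desc from pastecs, xtable from xtable), loading could look like this. The variable names pkgs, installed and not_installed are illustrative.

```r
# Packages inferred from the functions used in this tutorial (an assumption).
pkgs <- c("WDI", "knitr", "tidyverse", "psych", "pastecs", "xtable")

# Check which ones are installed without attaching anything yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
}

# Attach only the packages that are available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Checking before attaching avoids a hard stop at the first missing package and prints a single install command instead.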

Disable scientific notation

options(scipen=999)

Set the working directory

Getting data with WDI

The WDI package retrieves World Bank data by indicator code. To find the codes, call WDIsearch and pass keywords for the topic of interest in the string argument. Here I search for codes related to life expectancy.

query<-WDIsearch(string = "life expectancy", field = "name", short = TRUE, cache = NULL) 
kable(query)
indicator name
SE.SCH.LIFE School life expectancy, primary to tertiary, both sexes (years)
SE.SCH.LIFE.FE School life expectancy, primary to tertiary, female (years)
SE.SCH.LIFE.MA School life expectancy, primary to tertiary, male (years)
SP.DYN.LE00.FE.IN Life expectancy at birth, female (years)
SP.DYN.LE00.IN Life expectancy at birth, total (years)
SP.DYN.LE00.MA.IN Life expectancy at birth, male (years)
SP.DYN.LE60.FE.IN Life expectancy at age 60, female (years)
SP.DYN.LE60.MA.IN Life expectancy at age 60, male (years)
SP.DYN.LIFE.MF Life Expectancy at Birth(years)
UIS.SLE.02 School life expectancy, pre-primary, both sexes (years)
UIS.SLE.02.F School life expectancy, pre-primary, female (years)
UIS.SLE.02.GPI School life expectancy, pre-primary, gender parity index (GPI)
UIS.SLE.02.M School life expectancy, pre-primary, male (years)
UIS.SLE.1 School life expectancy, primary, both sexes (years)
UIS.SLE.1.F School life expectancy, primary, female (years)
UIS.SLE.1.GPI School life expectancy, primary, gender parity index (GPI)
UIS.SLE.1.M School life expectancy, primary, male (years)
UIS.SLE.12 School life expectancy, primary and lower secondary, both sexes (years)
UIS.SLE.12.F School life expectancy, primary and lower secondary, female (years)
UIS.SLE.12.M School life expectancy, primary and lower secondary, male (years)
UIS.SLE.123 School life expectancy, primary and secondary, both sexes (years)
UIS.SLE.123.F School life expectancy, primary and secondary, female (years)
UIS.SLE.123.GPI School life expectancy, primary and secondary, gender parity index (GPI)
UIS.SLE.123.M School life expectancy, primary and secondary, male (years)
UIS.SLE.1t6.GPI School life expectancy, primary to tertiary, gender parity index (GPI)
UIS.SLE.23 School life expectancy, secondary, both sexes (years)
UIS.SLE.23.F School life expectancy, secondary, female (years)
UIS.SLE.23.GPI School life expectancy, secondary, gender parity index (GPI)
UIS.SLE.23.M School life expectancy, secondary, male (years)
UIS.SLE.4 School life expectancy, post-secondary non-tertiary, both sexes (years)
UIS.SLE.4.F School life expectancy, post-secondary non-tertiary, female (years)
UIS.SLE.4.GPI School life expectancy, post-secondary non-tertiary, gender parity index (GPI)
UIS.SLE.4.M School life expectancy, post-secondary non-tertiary, male (years)
UIS.SLE.56 School life expectancy, tertiary, both sexes (years)
UIS.SLE.56.F School life expectancy, tertiary, female (years)
UIS.SLE.56.GPI School life expectancy, tertiary, gender parity index (GPI)
UIS.SLE.56.M School life expectancy, tertiary, male (years)
UIS.SLEN.12.F School life expectancy, primary and lower secondary (excluding repetition), female (years)
UIS.SLEN.12.GPI School life expectancy, primary and lower secondary (excluding repetition), gender parity index (GPI)
UIS.SLEN.12.M School life expectancy, primary and lower secondary (excluding repetition), male (years)
UIS.SLEN.12.T School life expectancy, primary and lower secondary (excluding repetition), both sexes (years)

I will extract life expectancy at birth for men and women, for all countries, from 1960 to 2014, and store these indicators in a data frame called wdi.

wdi<-WDI(indicator = c("SP.DYN.LE00.FE.IN","SP.DYN.LE00.MA.IN"),
         start = 1960, end = 2014)
kable(head(wdi))
iso2c country year SP.DYN.LE00.FE.IN SP.DYN.LE00.MA.IN
1A Arab World 1960 47.62956 45.50474
1A Arab World 1961 48.22032 46.10173
1A Arab World 1962 48.81102 46.68912
1A Arab World 1963 49.40685 47.26981
1A Arab World 1964 50.01012 47.84458
1A Arab World 1965 50.61657 48.41137

Descriptive statistics and wrangling

Descriptive statistics is the exploration of data through measures of central tendency, dispersion, distribution, and so on. Wrangling, or cleaning, is the way data sets are handled and prepared. In R, the dplyr package, which is installed along with tidyverse, is especially useful for this. First, we will convert the data frames used so far into tibbles. A tibble is an updated version of the data frame that prints more readably.

wdi<-as_tibble(wdi)

Second, we will adapt our function calls to the pipe operator %>%. In RStudio, insert it by pressing Cmd/Ctrl + Shift + M. The pipe lets us chain tasks elegantly. Consider the following example: for a vector x we want to compute the exponential, then the square root, then the natural logarithm. The traditional way to do it is:

x<- c(1,2,3) 
log(sqrt(exp(x)))
## [1] 0.5 1.0 1.5

The traditional way, a single line with nested calls, is hard to work with because it requires getting the order of operations and the parentheses exactly right. With the pipe, the same computation is easier to write and easier to read.

x %>% exp() %>% sqrt() %>% log()
## [1] 0.5 1.0 1.5
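
As a minimal sketch of what the pipe does: x %>% f(y) is evaluated as f(x, y), so the left-hand side becomes the first argument of the next call. Loading magrittr directly is an assumption made to keep the example self-contained; in this tutorial %>% comes with tidyverse. The result equals x / 2 because log(sqrt(exp(x))) = log(exp(x)^(1/2)) = x / 2.

```r
library(magrittr)  # provides %>%; also attached by dplyr/tidyverse

x <- c(1, 2, 3)

# Each step receives the previous result as its first argument.
piped  <- x %>% exp() %>% sqrt() %>% log()
nested <- log(sqrt(exp(x)))
all.equal(piped, nested)

# Extra arguments go after the piped value:
# this is the same call as round(c(1.234, 5.678), digits = 1).
rounded <- c(1.234, 5.678) %>% round(digits = 1)
rounded
```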

Now, using the pipe, we will rename the variables in the wdi data set to friendlier names.

wdi<-wdi %>% rename(le_women=SP.DYN.LE00.FE.IN,le_men=SP.DYN.LE00.MA.IN)
kable(head(wdi))
iso2c country year le_women le_men
1A Arab World 1960 47.62956 45.50474
1A Arab World 1961 48.22032 46.10173
1A Arab World 1962 48.81102 46.68912
1A Arab World 1963 49.40685 47.26981
1A Arab World 1964 50.01012 47.84458
1A Arab World 1965 50.61657 48.41137

Categorical descriptive statistics

A categorical variable has categories that can be counted and expressed as proportions of a total. Let's explore the country variable. To get the number of observations per category, use table().

t<- table(wdi$country)
kable(t)
Var1 Freq
Afghanistan 55
Albania 55
Algeria 55
American Samoa 55
Andorra 55
Angola 55
Antigua and Barbuda 55
Arab World 55
Argentina 55
Armenia 55
Aruba 55
Australia 55
Austria 55
Azerbaijan 55
Bahamas, The 55
Bahrain 55
Bangladesh 55
Barbados 55
Belarus 55
Belgium 55
Belize 55
Benin 55
Bermuda 55
Bhutan 55
Bolivia 55
Bosnia and Herzegovina 55
Botswana 55
Brazil 55
British Virgin Islands 55
Brunei Darussalam 55
Bulgaria 55
Burkina Faso 55
Burundi 55
Cabo Verde 55
Cambodia 55
Cameroon 55
Canada 55
Caribbean small states 55
Cayman Islands 55
Central African Republic 55
Central Europe and the Baltics 55
Chad 55
Channel Islands 55
Chile 55
China 55
Colombia 55
Comoros 55
Congo, Dem. Rep. 55
Congo, Rep. 55
Costa Rica 55
Cote d’Ivoire 55
Croatia 55
Cuba 55
Curacao 55
Cyprus 55
Czech Republic 55
Denmark 55
Djibouti 55
Dominica 55
Dominican Republic 55
Early-demographic dividend 55
East Asia & Pacific 55
East Asia & Pacific (excluding high income) 55
East Asia & Pacific (IDA & IBRD countries) 55
Ecuador 55
Egypt, Arab Rep. 55
El Salvador 55
Equatorial Guinea 55
Eritrea 55
Estonia 55
Eswatini 55
Ethiopia 55
Euro area 55
Europe & Central Asia 55
Europe & Central Asia (excluding high income) 55
Europe & Central Asia (IDA & IBRD countries) 55
European Union 55
Faroe Islands 55
Fiji 55
Finland 55
Fragile and conflict affected situations 55
France 55
French Polynesia 55
Gabon 55
Gambia, The 55
Georgia 55
Germany 55
Ghana 55
Gibraltar 55
Greece 55
Greenland 55
Grenada 55
Guam 55
Guatemala 55
Guinea 55
Guinea-Bissau 55
Guyana 55
Haiti 55
Heavily indebted poor countries (HIPC) 55
High income 55
Honduras 55
Hong Kong SAR, China 55
Hungary 55
IBRD only 55
Iceland 55
IDA & IBRD total 55
IDA blend 55
IDA only 55
IDA total 55
India 55
Indonesia 55
Iran, Islamic Rep. 55
Iraq 55
Ireland 55
Isle of Man 55
Israel 55
Italy 55
Jamaica 55
Japan 55
Jordan 55
Kazakhstan 55
Kenya 55
Kiribati 55
Korea, Dem. People’s Rep. 55
Korea, Rep. 55
Kosovo 55
Kuwait 55
Kyrgyz Republic 55
Lao PDR 55
Late-demographic dividend 55
Latin America & Caribbean 55
Latin America & Caribbean (excluding high income) 55
Latin America & the Caribbean (IDA & IBRD countries) 55
Latvia 55
Least developed countries: UN classification 55
Lebanon 55
Lesotho 55
Liberia 55
Libya 55
Liechtenstein 55
Lithuania 55
Low & middle income 55
Low income 55
Lower middle income 55
Luxembourg 55
Macao SAR, China 55
Madagascar 55
Malawi 55
Malaysia 55
Maldives 55
Mali 55
Malta 55
Marshall Islands 55
Mauritania 55
Mauritius 55
Mexico 55
Micronesia, Fed. Sts. 55
Middle East & North Africa 55
Middle East & North Africa (excluding high income) 55
Middle East & North Africa (IDA & IBRD countries) 55
Middle income 55
Moldova 55
Monaco 55
Mongolia 55
Montenegro 55
Morocco 55
Mozambique 55
Myanmar 55
Namibia 55
Nauru 55
Nepal 55
Netherlands 55
New Caledonia 55
New Zealand 55
Nicaragua 55
Niger 55
Nigeria 55
North America 55
North Macedonia 55
Northern Mariana Islands 55
Norway 55
Not classified 55
OECD members 55
Oman 55
Other small states 55
Pacific island small states 55
Pakistan 55
Palau 55
Panama 55
Papua New Guinea 55
Paraguay 55
Peru 55
Philippines 55
Poland 55
Portugal 55
Post-demographic dividend 55
Pre-demographic dividend 55
Puerto Rico 55
Qatar 55
Romania 55
Russian Federation 55
Rwanda 55
Samoa 55
San Marino 55
Sao Tome and Principe 55
Saudi Arabia 55
Senegal 55
Serbia 55
Seychelles 55
Sierra Leone 55
Singapore 55
Sint Maarten (Dutch part) 55
Slovak Republic 55
Slovenia 55
Small states 55
Solomon Islands 55
Somalia 55
South Africa 55
South Asia 55
South Asia (IDA & IBRD) 55
South Sudan 55
Spain 55
Sri Lanka 55
St. Kitts and Nevis 55
St. Lucia 55
St. Martin (French part) 55
St. Vincent and the Grenadines 55
Sub-Saharan Africa 55
Sub-Saharan Africa (excluding high income) 55
Sub-Saharan Africa (IDA & IBRD countries) 55
Sudan 55
Suriname 55
Sweden 55
Switzerland 55
Syrian Arab Republic 55
Tajikistan 55
Tanzania 55
Thailand 55
Timor-Leste 55
Togo 55
Tonga 55
Trinidad and Tobago 55
Tunisia 55
Turkey 55
Turkmenistan 55
Turks and Caicos Islands 55
Tuvalu 55
Uganda 55
Ukraine 55
United Arab Emirates 55
United Kingdom 55
United States 55
Upper middle income 55
Uruguay 55
Uzbekistan 55
Vanuatu 55
Venezuela, RB 55
Vietnam 55
Virgin Islands (U.S.) 55
West Bank and Gaza 55
World 55
Yemen, Rep. 55
Zambia 55
Zimbabwe 55

To get proportions, use prop.table(). We also use round() to round to four decimal places.

kable(round(prop.table(table(wdi$country)),digits = 4))
Var1 Freq
Afghanistan 0.0038
Albania 0.0038
Algeria 0.0038
American Samoa 0.0038
Andorra 0.0038
Angola 0.0038
Antigua and Barbuda 0.0038
Arab World 0.0038
Argentina 0.0038
Armenia 0.0038
Aruba 0.0038
Australia 0.0038
Austria 0.0038
Azerbaijan 0.0038
Bahamas, The 0.0038
Bahrain 0.0038
Bangladesh 0.0038
Barbados 0.0038
Belarus 0.0038
Belgium 0.0038
Belize 0.0038
Benin 0.0038
Bermuda 0.0038
Bhutan 0.0038
Bolivia 0.0038
Bosnia and Herzegovina 0.0038
Botswana 0.0038
Brazil 0.0038
British Virgin Islands 0.0038
Brunei Darussalam 0.0038
Bulgaria 0.0038
Burkina Faso 0.0038
Burundi 0.0038
Cabo Verde 0.0038
Cambodia 0.0038
Cameroon 0.0038
Canada 0.0038
Caribbean small states 0.0038
Cayman Islands 0.0038
Central African Republic 0.0038
Central Europe and the Baltics 0.0038
Chad 0.0038
Channel Islands 0.0038
Chile 0.0038
China 0.0038
Colombia 0.0038
Comoros 0.0038
Congo, Dem. Rep. 0.0038
Congo, Rep. 0.0038
Costa Rica 0.0038
Cote d’Ivoire 0.0038
Croatia 0.0038
Cuba 0.0038
Curacao 0.0038
Cyprus 0.0038
Czech Republic 0.0038
Denmark 0.0038
Djibouti 0.0038
Dominica 0.0038
Dominican Republic 0.0038
Early-demographic dividend 0.0038
East Asia & Pacific 0.0038
East Asia & Pacific (excluding high income) 0.0038
East Asia & Pacific (IDA & IBRD countries) 0.0038
Ecuador 0.0038
Egypt, Arab Rep. 0.0038
El Salvador 0.0038
Equatorial Guinea 0.0038
Eritrea 0.0038
Estonia 0.0038
Eswatini 0.0038
Ethiopia 0.0038
Euro area 0.0038
Europe & Central Asia 0.0038
Europe & Central Asia (excluding high income) 0.0038
Europe & Central Asia (IDA & IBRD countries) 0.0038
European Union 0.0038
Faroe Islands 0.0038
Fiji 0.0038
Finland 0.0038
Fragile and conflict affected situations 0.0038
France 0.0038
French Polynesia 0.0038
Gabon 0.0038
Gambia, The 0.0038
Georgia 0.0038
Germany 0.0038
Ghana 0.0038
Gibraltar 0.0038
Greece 0.0038
Greenland 0.0038
Grenada 0.0038
Guam 0.0038
Guatemala 0.0038
Guinea 0.0038
Guinea-Bissau 0.0038
Guyana 0.0038
Haiti 0.0038
Heavily indebted poor countries (HIPC) 0.0038
High income 0.0038
Honduras 0.0038
Hong Kong SAR, China 0.0038
Hungary 0.0038
IBRD only 0.0038
Iceland 0.0038
IDA & IBRD total 0.0038
IDA blend 0.0038
IDA only 0.0038
IDA total 0.0038
India 0.0038
Indonesia 0.0038
Iran, Islamic Rep. 0.0038
Iraq 0.0038
Ireland 0.0038
Isle of Man 0.0038
Israel 0.0038
Italy 0.0038
Jamaica 0.0038
Japan 0.0038
Jordan 0.0038
Kazakhstan 0.0038
Kenya 0.0038
Kiribati 0.0038
Korea, Dem. People’s Rep. 0.0038
Korea, Rep. 0.0038
Kosovo 0.0038
Kuwait 0.0038
Kyrgyz Republic 0.0038
Lao PDR 0.0038
Late-demographic dividend 0.0038
Latin America & Caribbean 0.0038
Latin America & Caribbean (excluding high income) 0.0038
Latin America & the Caribbean (IDA & IBRD countries) 0.0038
Latvia 0.0038
Least developed countries: UN classification 0.0038
Lebanon 0.0038
Lesotho 0.0038
Liberia 0.0038
Libya 0.0038
Liechtenstein 0.0038
Lithuania 0.0038
Low & middle income 0.0038
Low income 0.0038
Lower middle income 0.0038
Luxembourg 0.0038
Macao SAR, China 0.0038
Madagascar 0.0038
Malawi 0.0038
Malaysia 0.0038
Maldives 0.0038
Mali 0.0038
Malta 0.0038
Marshall Islands 0.0038
Mauritania 0.0038
Mauritius 0.0038
Mexico 0.0038
Micronesia, Fed. Sts. 0.0038
Middle East & North Africa 0.0038
Middle East & North Africa (excluding high income) 0.0038
Middle East & North Africa (IDA & IBRD countries) 0.0038
Middle income 0.0038
Moldova 0.0038
Monaco 0.0038
Mongolia 0.0038
Montenegro 0.0038
Morocco 0.0038
Mozambique 0.0038
Myanmar 0.0038
Namibia 0.0038
Nauru 0.0038
Nepal 0.0038
Netherlands 0.0038
New Caledonia 0.0038
New Zealand 0.0038
Nicaragua 0.0038
Niger 0.0038
Nigeria 0.0038
North America 0.0038
North Macedonia 0.0038
Northern Mariana Islands 0.0038
Norway 0.0038
Not classified 0.0038
OECD members 0.0038
Oman 0.0038
Other small states 0.0038
Pacific island small states 0.0038
Pakistan 0.0038
Palau 0.0038
Panama 0.0038
Papua New Guinea 0.0038
Paraguay 0.0038
Peru 0.0038
Philippines 0.0038
Poland 0.0038
Portugal 0.0038
Post-demographic dividend 0.0038
Pre-demographic dividend 0.0038
Puerto Rico 0.0038
Qatar 0.0038
Romania 0.0038
Russian Federation 0.0038
Rwanda 0.0038
Samoa 0.0038
San Marino 0.0038
Sao Tome and Principe 0.0038
Saudi Arabia 0.0038
Senegal 0.0038
Serbia 0.0038
Seychelles 0.0038
Sierra Leone 0.0038
Singapore 0.0038
Sint Maarten (Dutch part) 0.0038
Slovak Republic 0.0038
Slovenia 0.0038
Small states 0.0038
Solomon Islands 0.0038
Somalia 0.0038
South Africa 0.0038
South Asia 0.0038
South Asia (IDA & IBRD) 0.0038
South Sudan 0.0038
Spain 0.0038
Sri Lanka 0.0038
St. Kitts and Nevis 0.0038
St. Lucia 0.0038
St. Martin (French part) 0.0038
St. Vincent and the Grenadines 0.0038
Sub-Saharan Africa 0.0038
Sub-Saharan Africa (excluding high income) 0.0038
Sub-Saharan Africa (IDA & IBRD countries) 0.0038
Sudan 0.0038
Suriname 0.0038
Sweden 0.0038
Switzerland 0.0038
Syrian Arab Republic 0.0038
Tajikistan 0.0038
Tanzania 0.0038
Thailand 0.0038
Timor-Leste 0.0038
Togo 0.0038
Tonga 0.0038
Trinidad and Tobago 0.0038
Tunisia 0.0038
Turkey 0.0038
Turkmenistan 0.0038
Turks and Caicos Islands 0.0038
Tuvalu 0.0038
Uganda 0.0038
Ukraine 0.0038
United Arab Emirates 0.0038
United Kingdom 0.0038
United States 0.0038
Upper middle income 0.0038
Uruguay 0.0038
Uzbekistan 0.0038
Vanuatu 0.0038
Venezuela, RB 0.0038
Vietnam 0.0038
Virgin Islands (U.S.) 0.0038
West Bank and Gaza 0.0038
World 0.0038
Yemen, Rep. 0.0038
Zambia 0.0038
Zimbabwe 0.0038

Let's use the pipe to get the same result in a more readable way.

tprop<-table(wdi$country) %>% prop.table() %>% round(digits = 4) 
kable(tprop)
Var1 Freq
Afghanistan 0.0038
Albania 0.0038
Algeria 0.0038
American Samoa 0.0038
Andorra 0.0038
Angola 0.0038
Antigua and Barbuda 0.0038
Arab World 0.0038
Argentina 0.0038
Armenia 0.0038
Aruba 0.0038
Australia 0.0038
Austria 0.0038
Azerbaijan 0.0038
Bahamas, The 0.0038
Bahrain 0.0038
Bangladesh 0.0038
Barbados 0.0038
Belarus 0.0038
Belgium 0.0038
Belize 0.0038
Benin 0.0038
Bermuda 0.0038
Bhutan 0.0038
Bolivia 0.0038
Bosnia and Herzegovina 0.0038
Botswana 0.0038
Brazil 0.0038
British Virgin Islands 0.0038
Brunei Darussalam 0.0038
Bulgaria 0.0038
Burkina Faso 0.0038
Burundi 0.0038
Cabo Verde 0.0038
Cambodia 0.0038
Cameroon 0.0038
Canada 0.0038
Caribbean small states 0.0038
Cayman Islands 0.0038
Central African Republic 0.0038
Central Europe and the Baltics 0.0038
Chad 0.0038
Channel Islands 0.0038
Chile 0.0038
China 0.0038
Colombia 0.0038
Comoros 0.0038
Congo, Dem. Rep. 0.0038
Congo, Rep. 0.0038
Costa Rica 0.0038
Cote d’Ivoire 0.0038
Croatia 0.0038
Cuba 0.0038
Curacao 0.0038
Cyprus 0.0038
Czech Republic 0.0038
Denmark 0.0038
Djibouti 0.0038
Dominica 0.0038
Dominican Republic 0.0038
Early-demographic dividend 0.0038
East Asia & Pacific 0.0038
East Asia & Pacific (excluding high income) 0.0038
East Asia & Pacific (IDA & IBRD countries) 0.0038
Ecuador 0.0038
Egypt, Arab Rep. 0.0038
El Salvador 0.0038
Equatorial Guinea 0.0038
Eritrea 0.0038
Estonia 0.0038
Eswatini 0.0038
Ethiopia 0.0038
Euro area 0.0038
Europe & Central Asia 0.0038
Europe & Central Asia (excluding high income) 0.0038
Europe & Central Asia (IDA & IBRD countries) 0.0038
European Union 0.0038
Faroe Islands 0.0038
Fiji 0.0038
Finland 0.0038
Fragile and conflict affected situations 0.0038
France 0.0038
French Polynesia 0.0038
Gabon 0.0038
Gambia, The 0.0038
Georgia 0.0038
Germany 0.0038
Ghana 0.0038
Gibraltar 0.0038
Greece 0.0038
Greenland 0.0038
Grenada 0.0038
Guam 0.0038
Guatemala 0.0038
Guinea 0.0038
Guinea-Bissau 0.0038
Guyana 0.0038
Haiti 0.0038
Heavily indebted poor countries (HIPC) 0.0038
High income 0.0038
Honduras 0.0038
Hong Kong SAR, China 0.0038
Hungary 0.0038
IBRD only 0.0038
Iceland 0.0038
IDA & IBRD total 0.0038
IDA blend 0.0038
IDA only 0.0038
IDA total 0.0038
India 0.0038
Indonesia 0.0038
Iran, Islamic Rep. 0.0038
Iraq 0.0038
Ireland 0.0038
Isle of Man 0.0038
Israel 0.0038
Italy 0.0038
Jamaica 0.0038
Japan 0.0038
Jordan 0.0038
Kazakhstan 0.0038
Kenya 0.0038
Kiribati 0.0038
Korea, Dem. People’s Rep. 0.0038
Korea, Rep. 0.0038
Kosovo 0.0038
Kuwait 0.0038
Kyrgyz Republic 0.0038
Lao PDR 0.0038
Late-demographic dividend 0.0038
Latin America & Caribbean 0.0038
Latin America & Caribbean (excluding high income) 0.0038
Latin America & the Caribbean (IDA & IBRD countries) 0.0038
Latvia 0.0038
Least developed countries: UN classification 0.0038
Lebanon 0.0038
Lesotho 0.0038
Liberia 0.0038
Libya 0.0038
Liechtenstein 0.0038
Lithuania 0.0038
Low & middle income 0.0038
Low income 0.0038
Lower middle income 0.0038
Luxembourg 0.0038
Macao SAR, China 0.0038
Madagascar 0.0038
Malawi 0.0038
Malaysia 0.0038
Maldives 0.0038
Mali 0.0038
Malta 0.0038
Marshall Islands 0.0038
Mauritania 0.0038
Mauritius 0.0038
Mexico 0.0038
Micronesia, Fed. Sts. 0.0038
Middle East & North Africa 0.0038
Middle East & North Africa (excluding high income) 0.0038
Middle East & North Africa (IDA & IBRD countries) 0.0038
Middle income 0.0038
Moldova 0.0038
Monaco 0.0038
Mongolia 0.0038
Montenegro 0.0038
Morocco 0.0038
Mozambique 0.0038
Myanmar 0.0038
Namibia 0.0038
Nauru 0.0038
Nepal 0.0038
Netherlands 0.0038
New Caledonia 0.0038
New Zealand 0.0038
Nicaragua 0.0038
Niger 0.0038
Nigeria 0.0038
North America 0.0038
North Macedonia 0.0038
Northern Mariana Islands 0.0038
Norway 0.0038
Not classified 0.0038
OECD members 0.0038
Oman 0.0038
Other small states 0.0038
Pacific island small states 0.0038
Pakistan 0.0038
Palau 0.0038
Panama 0.0038
Papua New Guinea 0.0038
Paraguay 0.0038
Peru 0.0038
Philippines 0.0038
Poland 0.0038
Portugal 0.0038
Post-demographic dividend 0.0038
Pre-demographic dividend 0.0038
Puerto Rico 0.0038
Qatar 0.0038
Romania 0.0038
Russian Federation 0.0038
Rwanda 0.0038
Samoa 0.0038
San Marino 0.0038
Sao Tome and Principe 0.0038
Saudi Arabia 0.0038
Senegal 0.0038
Serbia 0.0038
Seychelles 0.0038
Sierra Leone 0.0038
Singapore 0.0038
Sint Maarten (Dutch part) 0.0038
Slovak Republic 0.0038
Slovenia 0.0038
Small states 0.0038
Solomon Islands 0.0038
Somalia 0.0038
South Africa 0.0038
South Asia 0.0038
South Asia (IDA & IBRD) 0.0038
South Sudan 0.0038
Spain 0.0038
Sri Lanka 0.0038
St. Kitts and Nevis 0.0038
St. Lucia 0.0038
St. Martin (French part) 0.0038
St. Vincent and the Grenadines 0.0038
Sub-Saharan Africa 0.0038
Sub-Saharan Africa (excluding high income) 0.0038
Sub-Saharan Africa (IDA & IBRD countries) 0.0038
Sudan 0.0038
Suriname 0.0038
Sweden 0.0038
Switzerland 0.0038
Syrian Arab Republic 0.0038
Tajikistan 0.0038
Tanzania 0.0038
Thailand 0.0038
Timor-Leste 0.0038
Togo 0.0038
Tonga 0.0038
Trinidad and Tobago 0.0038
Tunisia 0.0038
Turkey 0.0038
Turkmenistan 0.0038
Turks and Caicos Islands 0.0038
Tuvalu 0.0038
Uganda 0.0038
Ukraine 0.0038
United Arab Emirates 0.0038
United Kingdom 0.0038
United States 0.0038
Upper middle income 0.0038
Uruguay 0.0038
Uzbekistan 0.0038
Vanuatu 0.0038
Venezuela, RB 0.0038
Vietnam 0.0038
Virgin Islands (U.S.) 0.0038
West Bank and Gaza 0.0038
World 0.0038
Yemen, Rep. 0.0038
Zambia 0.0038
Zimbabwe 0.0038
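
Every entry above is 0.0038 because each of the 264 countries and aggregates contributes 55 of the 264 × 55 = 14520 rows. The sketch below repeats the table → prop.table → round chain on a small hypothetical vector (fruit is not part of the WDI data) where the proportions differ, and then redoes the arithmetic for one country.

```r
# Hypothetical toy vector, not the WDI data.
fruit <- c("apple", "apple", "banana", "cherry")

counts <- table(fruit)        # absolute frequency per category
props  <- prop.table(counts)  # each count divided by the total count
round(props, digits = 2)

# The same arithmetic for one country in wdi: 55 rows out of 14520.
one_country <- round(55 / 14520, digits = 4)
one_country
```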

Basic descriptive statistics for continuous variables

Number of observations

The number of valid observations is the total number of values minus the NA and NULL values.

length(wdi$le_women) # number of observations, including NA
## [1] 14520
is.na(wdi$le_women) %>% sum()  # number of NA values
## [1] 1258
is.null(wdi$le_women) %>% sum() # is.null() tests the whole object, so this is 0 unless the column itself is NULL
## [1] 0
length(wdi$le_women) - is.na(wdi$le_women) %>% sum() # number of valid values
## [1] 13262
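
A small sketch on a hypothetical vector v shows how these counts behave. It also illustrates an assumption worth stating: a numeric vector cannot store NULL elements, so is.null() answers a question about the whole object rather than counting anything.

```r
v <- c(10, NA, 30, NA, 50)   # hypothetical toy vector

n_total <- length(v)         # all elements, NA included
n_na    <- sum(is.na(v))     # TRUE counts as 1, so this is the number of NA values
n_valid <- sum(!is.na(v))    # valid (non-missing) observations

# is.null() returns a single FALSE for the whole vector,
# and c() drops NULL before it could ever be stored as an element.
is.null(v)
length(c(1, NULL, 3))
```

Counting with sum(!is.na(v)) gives the number of valid values directly, without the subtraction.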

Number of variables and observations, using dim()

dim(wdi) # number of observations (rows) and variables (columns)
## [1] 14520     5

Detecting NA values

Some functions do not return a usable result when the data contain NA. Detect NA values with is.na() and add up the resulting logical values with sum().

is.na(wdi$le_women) %>% sum()
## [1] 1258

The le_women variable has NA values, so we will use the argument na.rm = TRUE in all the descriptive statistics functions; otherwise they return NA or raise an error.
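
The effect of na.rm can be seen on a hypothetical four-element vector:

```r
v <- c(2, 4, NA, 6)  # hypothetical toy vector

m_plain <- mean(v)                # NA: one missing value makes the mean unknown
m_narm  <- mean(v, na.rm = TRUE)  # drops the NA first, then averages 2, 4 and 6

m_plain
m_narm
```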

Mean

mean(wdi$le_women, na.rm = TRUE)
## [1] 65.57062

Standard deviation

sd(wdi$le_women,na.rm = TRUE)
## [1] 11.81371

Variance

var(wdi$le_women,na.rm = TRUE)
## [1] 139.5638
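
The variance and the standard deviation are tied together: sd() is the square root of var(), and both use the sample denominator n - 1. A short check on a hypothetical vector, followed by the same relation on the rounded value reported for le_women:

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical toy vector

s2 <- var(v)   # sample variance
s  <- sd(v)    # sample standard deviation

# Both relations hold up to floating-point tolerance.
all.equal(s, sqrt(s2))
all.equal(s2, sum((v - mean(v))^2) / (length(v) - 1))

# Squaring the reported standard deviation recovers the reported variance
# (approximately, because 11.81371 is already rounded).
11.81371^2
```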

Minimum

min(wdi$le_women,na.rm = TRUE)
## [1] 22.394

Maximum

max(wdi$le_women,na.rm = TRUE)
## [1] 86.9

Median

median(wdi$le_women,na.rm = TRUE)
## [1] 68.427

Range

range(wdi$le_women,na.rm = TRUE)
## [1] 22.394 86.900

Quartiles

quantile(wdi$le_women,na.rm = TRUE)
##       0%      25%      50%      75%     100% 
## 22.39400 56.55975 68.42700 74.84500 86.90000

The summary function

summary() returns the minimum, first quartile, median, mean, third quartile, maximum, and the number of NA values.

summary(wdi$le_women) # quartiles, mean, and NA count
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   22.39   56.56   68.43   65.57   74.84   86.90    1258

Interquartile range

IQR(wdi$le_women,na.rm = TRUE)
## [1] 18.28525
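
The interquartile range is the third quartile minus the first. The sketch below checks this on a hypothetical vector and then on the quartiles reported above for le_women.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)  # hypothetical toy vector

q <- quantile(v, probs = c(0.25, 0.75))  # first and third quartiles
iqr_manual <- unname(q[2] - q[1])

all.equal(iqr_manual, IQR(v))

# With the quartiles reported for le_women (75% minus 25%):
iqr_le_women <- 74.84500 - 56.55975
iqr_le_women
```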

Let's visualize how the variable is distributed.

ggplot(wdi) + geom_boxplot(aes(x=le_women)) # box-and-whisker plot
## Warning: Removed 1258 rows containing non-finite values (stat_boxplot).

We can get all the relevant descriptive statistics in a single data frame using describe().

describe(wdi$le_women,na.rm = TRUE)
##    vars     n  mean    sd median trimmed   mad   min  max range  skew kurtosis
## X1    1 13262 65.57 11.81  68.43    66.4 11.88 22.39 86.9 64.51 -0.58    -0.58
##     se
## X1 0.1

To get kurtosis or skewness from stat.desc(), pass the extra argument norm = TRUE; the normality test it runs accepts at most 5000 observations.

The select function

select() picks certain columns of a data frame/tibble. The structure is select(data, var1, var2).

select(wdi,le_women,le_men) 
## # A tibble: 14,520 x 2
##    le_women le_men
##       <dbl>  <dbl>
##  1     47.6   45.5
##  2     48.2   46.1
##  3     48.8   46.7
##  4     49.4   47.3
##  5     50.0   47.8
##  6     50.6   48.4
##  7     51.2   49.0
##  8     51.8   49.5
##  9     52.4   50.0
## 10     52.9   50.5
## # … with 14,510 more rows
wdi %>% select(le_women,le_men)
## # A tibble: 14,520 x 2
##    le_women le_men
##       <dbl>  <dbl>
##  1     47.6   45.5
##  2     48.2   46.1
##  3     48.8   46.7
##  4     49.4   47.3
##  5     50.0   47.8
##  6     50.6   48.4
##  7     51.2   49.0
##  8     51.8   49.5
##  9     52.4   50.0
## 10     52.9   50.5
## # … with 14,510 more rows

Now we want to drop a column from the data frame.

wdi<-wdi %>% select(-iso2c)
wdi
## # A tibble: 14,520 x 4
##    country     year le_women le_men
##    <chr>      <int>    <dbl>  <dbl>
##  1 Arab World  1960     47.6   45.5
##  2 Arab World  1961     48.2   46.1
##  3 Arab World  1962     48.8   46.7
##  4 Arab World  1963     49.4   47.3
##  5 Arab World  1964     50.0   47.8
##  6 Arab World  1965     50.6   48.4
##  7 Arab World  1966     51.2   49.0
##  8 Arab World  1967     51.8   49.5
##  9 Arab World  1968     52.4   50.0
## 10 Arab World  1969     52.9   50.5
## # … with 14,510 more rows

With the same logic we can select a data frame/tibble that contains all columns except certain ones.

wdi %>% select(-year)
## # A tibble: 14,520 x 3
##    country    le_women le_men
##    <chr>         <dbl>  <dbl>
##  1 Arab World     47.6   45.5
##  2 Arab World     48.2   46.1
##  3 Arab World     48.8   46.7
##  4 Arab World     49.4   47.3
##  5 Arab World     50.0   47.8
##  6 Arab World     50.6   48.4
##  7 Arab World     51.2   49.0
##  8 Arab World     51.8   49.5
##  9 Arab World     52.4   50.0
## 10 Arab World     52.9   50.5
## # … with 14,510 more rows

Let's get the descriptive statistics of these variables using stat.desc(). select() can also pick columns by other criteria, such as columns that start with questio_ or that follow a sequence, for example wti00.. Helper functions such as starts_with() or ends_with() exist for this. More information in help(select).

wdi %>% select(le_women,le_men) %>% stat.desc() %>% round(digits = 2) 
##               le_women    le_men
## nbr.val       13262.00  13262.00
## nbr.null          0.00      0.00
## nbr.na         1258.00   1258.00
## min              22.39     16.29
## max              86.90     84.10
## range            64.51     67.81
## sum          869597.52 808686.36
## median           68.43     63.41
## mean             65.57     60.98
## SE.mean           0.10      0.09
## CI.mean.0.95      0.20      0.18
## var             139.56    113.60
## std.dev          11.81     10.66
## coef.var          0.18      0.17
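
The selection helpers mentioned above can be sketched on a small hypothetical tibble (toy and its values are made up; only the column names mimic the renamed wdi):

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble with the same column names as the renamed wdi.
toy <- tibble(
  country  = c("A", "B"),
  year     = c(2000L, 2000L),
  le_women = c(75.1, 80.2),
  le_men   = c(70.3, 76.4)
)

by_prefix <- toy %>% select(starts_with("le_"))  # names beginning with "le_"
by_suffix <- toy %>% select(ends_with("men"))    # le_women and le_men both end in "men"

names(by_prefix)
names(by_suffix)
```

Selecting by prefix keeps the code unchanged when more le_ columns are added later.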

To get this table in LaTeX for use in our own documents, use the xtable() function.

select(wdi,le_women) %>%stat.desc() %>%  round(digits = 2) %>% xtable()
## % latex table generated in R 4.0.2 by xtable 1.8-4 package
## % Fri Aug  7 19:28:14 2020
## \begin{table}[ht]
## \centering
## \begin{tabular}{rr}
##   \hline
##  & le\_women \\ 
##   \hline
## nbr.val & 13262.00 \\ 
##   nbr.null & 0.00 \\ 
##   nbr.na & 1258.00 \\ 
##   min & 22.39 \\ 
##   max & 86.90 \\ 
##   range & 64.51 \\ 
##   sum & 869597.52 \\ 
##   median & 68.43 \\ 
##   mean & 65.57 \\ 
##   SE.mean & 0.10 \\ 
##   CI.mean.0.95 & 0.20 \\ 
##   var & 139.56 \\ 
##   std.dev & 11.81 \\ 
##   coef.var & 0.18 \\ 
##    \hline
## \end{tabular}
## \end{table}

Copy and paste this into a LaTeX document and it is ready.

The filter function

Often we do not want the whole data set, only part of it. We use filter() to create a subset. The structure is filter(data, logical condition). Let's get the data for Colombia, Ecuador, and Peru.

wdi %>% filter(country=="Colombia" | country=="Peru" | country=="Ecuador")
## # A tibble: 165 x 4
##    country   year le_women le_men
##    <chr>    <int>    <dbl>  <dbl>
##  1 Colombia  1960     59.4   55.2
##  2 Colombia  1961     59.9   55.8
##  3 Colombia  1962     60.4   56.3
##  4 Colombia  1963     60.9   56.8
##  5 Colombia  1964     61.4   57.3
##  6 Colombia  1965     61.8   57.7
##  7 Colombia  1966     62.3   58.2
##  8 Colombia  1967     62.8   58.7
##  9 Colombia  1968     63.3   59.1
## 10 Colombia  1969     63.8   59.6
## # … with 155 more rows
wdi %>% filter(year>1990 & country=="Argentina")
## # A tibble: 24 x 4
##    country    year le_women le_men
##    <chr>     <int>    <dbl>  <dbl>
##  1 Argentina  1991     75.3   68.4
##  2 Argentina  1992     75.5   68.6
##  3 Argentina  1993     75.7   68.8
##  4 Argentina  1994     75.9   69.0
##  5 Argentina  1995     76.1   69.2
##  6 Argentina  1996     76.3   69.4
##  7 Argentina  1997     76.5   69.6
##  8 Argentina  1998     76.6   69.8
##  9 Argentina  1999     76.8   69.9
## 10 Argentina  2000     77.0   70.1
## # … with 14 more rows
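
When the condition is a list of allowed values, %in% is a shorter alternative to chaining == with |. A sketch on a hypothetical tibble (toy and andean are illustrative names):

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble, not the WDI data.
toy <- tibble(
  country = c("Colombia", "Peru", "Ecuador", "Chile", "Peru"),
  year    = c(1990L, 1990L, 1995L, 1995L, 2000L)
)

andean <- c("Colombia", "Peru", "Ecuador")

by_or <- toy %>% filter(country == "Colombia" | country == "Peru" | country == "Ecuador")
by_in <- toy %>% filter(country %in% andean)
all.equal(by_or, by_in)

# Conditions separated by commas are combined with AND, like &.
recent_peru <- toy %>% filter(year > 1990, country == "Peru")
recent_peru
```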

The arrange function

arrange() sorts the whole data frame/tibble by a variable. The default is ascending order; wrap the variable in desc() inside arrange() for descending order.

wdi %>% arrange(desc(year))
## # A tibble: 14,520 x 4
##    country                                        year le_women le_men
##    <chr>                                         <int>    <dbl>  <dbl>
##  1 Arab World                                     2014     73.0   69.3
##  2 World                                          2014     74.1   69.6
##  3 East Asia & Pacific (excluding high income)    2014     76.8   71.9
##  4 Europe & Central Asia (excluding high income)  2014     76.6   68.1
##  5 South Asia                                     2014     69.6   67.2
##  6 Andorra                                        2014     NA     NA  
##  7 United Arab Emirates                           2014     78.5   76.4
##  8 Afghanistan                                    2014     64.5   61.6
##  9 Antigua and Barbuda                            2014     77.6   75.1
## 10 Albania                                        2014     80.0   75.7
## # … with 14,510 more rows
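
arrange() also accepts several sort keys, which helps when many rows share the same year, as above. A sketch on a hypothetical tibble:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble, not the WDI data.
toy <- tibble(
  country = c("B", "A", "C", "A"),
  year    = c(2001L, 2000L, 2000L, 2001L)
)

asc <- toy %>% arrange(year)                 # ascending by year (the default)
dsc <- toy %>% arrange(desc(year), country)  # newest first, ties ordered by country

dsc
```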

The mutate function

To add an extra column to a data frame, the generic base R form is:

wdi$half_le_women<-wdi$le_women/2
wdi
## # A tibble: 14,520 x 5
##    country     year le_women le_men half_le_women
##    <chr>      <int>    <dbl>  <dbl>         <dbl>
##  1 Arab World  1960     47.6   45.5          23.8
##  2 Arab World  1961     48.2   46.1          24.1
##  3 Arab World  1962     48.8   46.7          24.4
##  4 Arab World  1963     49.4   47.3          24.7
##  5 Arab World  1964     50.0   47.8          25.0
##  6 Arab World  1965     50.6   48.4          25.3
##  7 Arab World  1966     51.2   49.0          25.6
##  8 Arab World  1967     51.8   49.5          25.9
##  9 Arab World  1968     52.4   50.0          26.2
## 10 Arab World  1969     52.9   50.5          26.5
## # … with 14,510 more rows

This form is less convenient, because it repeats the data frame name and cannot be chained in a pipe. It is better to use mutate() to do the same thing.

wdi %>% mutate(half_le_women_2=le_women/2)
## # A tibble: 14,520 x 6
##    country     year le_women le_men half_le_women half_le_women_2
##    <chr>      <int>    <dbl>  <dbl>         <dbl>           <dbl>
##  1 Arab World  1960     47.6   45.5          23.8            23.8
##  2 Arab World  1961     48.2   46.1          24.1            24.1
##  3 Arab World  1962     48.8   46.7          24.4            24.4
##  4 Arab World  1963     49.4   47.3          24.7            24.7
##  5 Arab World  1964     50.0   47.8          25.0            25.0
##  6 Arab World  1965     50.6   48.4          25.3            25.3
##  7 Arab World  1966     51.2   49.0          25.6            25.6
##  8 Arab World  1967     51.8   49.5          25.9            25.9
##  9 Arab World  1968     52.4   50.0          26.2            26.2
## 10 Arab World  1969     52.9   50.5          26.5            26.5
## # … with 14,510 more rows
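
Note that mutate() returns a new tibble and leaves the original untouched unless the result is assigned. A sketch on a hypothetical tibble (toy, gap and gap_pct are illustrative names):

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble, not the WDI data.
toy <- tibble(le_women = c(70, 80), le_men = c(66, 74))

toy %>% mutate(gap = le_women - le_men)  # printed only; toy itself is unchanged
names(toy)

# Assign the result to keep the new columns. A column created earlier
# in the same mutate() call can be used right away.
toy2 <- toy %>% mutate(gap = le_women - le_men,
                       gap_pct = 100 * gap / le_men)
toy2
```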

The group_by function

group_by() lets us perform operations by group. For example, we can get the mean life expectancy of men by country.

wdi %>% group_by(country) %>% mutate(mean_country=mean(le_men))
## # A tibble: 14,520 x 6
## # Groups:   country [264]
##    country     year le_women le_men half_le_women mean_country
##    <chr>      <int>    <dbl>  <dbl>         <dbl>        <dbl>
##  1 Arab World  1960     47.6   45.5          23.8         59.5
##  2 Arab World  1961     48.2   46.1          24.1         59.5
##  3 Arab World  1962     48.8   46.7          24.4         59.5
##  4 Arab World  1963     49.4   47.3          24.7         59.5
##  5 Arab World  1964     50.0   47.8          25.0         59.5
##  6 Arab World  1965     50.6   48.4          25.3         59.5
##  7 Arab World  1966     51.2   49.0          25.6         59.5
##  8 Arab World  1967     51.8   49.5          25.9         59.5
##  9 Arab World  1968     52.4   50.0          26.2         59.5
## 10 Arab World  1969     52.9   50.5          26.5         59.5
## # … with 14,510 more rows
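
Inside a grouped mutate(), mean() without na.rm = TRUE returns NA for any group that has a missing value. The sketch below, on a hypothetical tibble, compares both versions side by side.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble: group B has one missing value.
toy <- tibble(
  country = c("A", "A", "B", "B"),
  le_men  = c(60, 62, 70, NA)
)

with_means <- toy %>%
  group_by(country) %>%
  mutate(mean_plain = mean(le_men),                # NA for every row of group B
         mean_narm  = mean(le_men, na.rm = TRUE)) %>%
  ungroup()                                        # drop the grouping afterwards

with_means
```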

The summarise function

Often we do not want a new column with the result, but a table with the values.

wdi %>% group_by(country) %>% summarise(n=n(),
    mean_le_men=mean(le_men,na.rm=TRUE),
    sd_le_men=sd(le_men,na.rm=TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 264 x 4
##    country                 n mean_le_men sd_le_men
##    <chr>               <int>       <dbl>     <dbl>
##  1 Afghanistan            55        46.8      9.26
##  2 Albania                55        69.0      3.43
##  3 Algeria                55        61.2      9.76
##  4 American Samoa         55       NaN       NA   
##  5 Andorra                55       NaN       NA   
##  6 Angola                 55        43.8      4.84
##  7 Antigua and Barbuda    55        68.7      4.27
##  8 Arab World             55        59.5      7.47
##  9 Argentina              55        67.3      3.30
## 10 Armenia                55        67.1      1.98
## # … with 254 more rows
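
The NaN and NA entries above appear for countries whose le_men values are all missing: after na.rm = TRUE removes them, nothing is left to average. The sketch below reproduces this on a hypothetical tibble and passes .groups = "drop", the argument named in the message above, to make the ungrouping explicit.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tibble: group B has no valid values at all.
toy <- tibble(
  country = c("A", "A", "B", "B"),
  le_men  = c(60, 62, NA, NA)
)

res <- toy %>%
  group_by(country) %>%
  summarise(n = n(),                                  # rows per group, NA included
            mean_le_men = mean(le_men, na.rm = TRUE), # NaN when nothing remains
            sd_le_men   = sd(le_men, na.rm = TRUE),   # NA when nothing remains
            .groups = "drop")

res
```

Here n() counts rows, not valid values; sum(!is.na(le_men)) inside summarise() would count only the non-missing ones.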