Panchonomist Workshop: Class 4

New packages.

rmarkdown: lets you create notebooks
knitr: formats nice tables in the notebook (kable())
WDI: connects R to the World Bank's World Development Indicators database
psych: descriptive statistics, including skewness and kurtosis
pastecs: comprehensive descriptive statistics
stargazer: well-formatted LaTeX tables
xtable: well-formatted LaTeX tables

Load packages

Turn off scientific notation

options(scipen=999)
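A quick way to see what scipen does, using base R only: format() switches to scientific notation when that representation is shorter, and a large scipen penalty makes fixed notation win. This is a small illustration with a made-up number, not part of the original class code.

```r
# Save the current value so it can be restored afterwards
old_opts <- options(scipen = 0)    # R's default penalty

default_fmt <- format(1e10)         # "1e+10": scientific is shorter than 10000000000

options(scipen = 999)               # heavy penalty against scientific notation
fixed_fmt <- format(1e10)           # "10000000000"

options(old_opts)                   # restore the previous setting
print(c(default_fmt, fixed_fmt))
```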

Set the working directory

Get data with WDI. The WDI package retrieves World Bank data through indicator codes. To look up codes, use WDIsearch() and pass keywords for what you want to find in the string argument. Here I search for codes related to life expectancy.

query<-WDIsearch(string = "life expectancy", field = "name", short = TRUE, cache = NULL) 
kable(query)

indicator	name
SE.SCH.LIFE	School life expectancy, primary to tertiary, both sexes (years)
SE.SCH.LIFE.FE	School life expectancy, primary to tertiary, female (years)
SE.SCH.LIFE.MA	School life expectancy, primary to tertiary, male (years)
SP.DYN.LE00.FE.IN	Life expectancy at birth, female (years)
SP.DYN.LE00.IN	Life expectancy at birth, total (years)
SP.DYN.LE00.MA.IN	Life expectancy at birth, male (years)
SP.DYN.LE60.FE.IN	Life expectancy at age 60, female (years)
SP.DYN.LE60.MA.IN	Life expectancy at age 60, male (years)
SP.DYN.LIFE.MF	Life Expectancy at Birth(years)
UIS.SLE.02	School life expectancy, pre-primary, both sexes (years)
UIS.SLE.02.F	School life expectancy, pre-primary, female (years)
UIS.SLE.02.GPI	School life expectancy, pre-primary, gender parity index (GPI)
UIS.SLE.02.M	School life expectancy, pre-primary, male (years)
UIS.SLE.1	School life expectancy, primary, both sexes (years)
UIS.SLE.1.F	School life expectancy, primary, female (years)
UIS.SLE.1.GPI	School life expectancy, primary, gender parity index (GPI)
UIS.SLE.1.M	School life expectancy, primary, male (years)
UIS.SLE.12	School life expectancy, primary and lower secondary, both sexes (years)
UIS.SLE.12.F	School life expectancy, primary and lower secondary, female (years)
UIS.SLE.12.M	School life expectancy, primary and lower secondary, male (years)
UIS.SLE.123	School life expectancy, primary and secondary, both sexes (years)
UIS.SLE.123.F	School life expectancy, primary and secondary, female (years)
UIS.SLE.123.GPI	School life expectancy, primary and secondary, gender parity index (GPI)
UIS.SLE.123.M	School life expectancy, primary and secondary, male (years)
UIS.SLE.1t6.GPI	School life expectancy, primary to tertiary, gender parity index (GPI)
UIS.SLE.23	School life expectancy, secondary, both sexes (years)
UIS.SLE.23.F	School life expectancy, secondary, female (years)
UIS.SLE.23.GPI	School life expectancy, secondary, gender parity index (GPI)
UIS.SLE.23.M	School life expectancy, secondary, male (years)
UIS.SLE.4	School life expectancy, post-secondary non-tertiary, both sexes (years)
UIS.SLE.4.F	School life expectancy, post-secondary non-tertiary, female (years)
UIS.SLE.4.GPI	School life expectancy, post-secondary non-tertiary, gender parity index (GPI)
UIS.SLE.4.M	School life expectancy, post-secondary non-tertiary, male (years)
UIS.SLE.56	School life expectancy, tertiary, both sexes (years)
UIS.SLE.56.F	School life expectancy, tertiary, female (years)
UIS.SLE.56.GPI	School life expectancy, tertiary, gender parity index (GPI)
UIS.SLE.56.M	School life expectancy, tertiary, male (years)
UIS.SLEN.12.F	School life expectancy, primary and lower secondary (excluding repetition), female (years)
UIS.SLEN.12.GPI	School life expectancy, primary and lower secondary (excluding repetition), gender parity index (GPI)
UIS.SLEN.12.M	School life expectancy, primary and lower secondary (excluding repetition), male (years)
UIS.SLEN.12.T	School life expectancy, primary and lower secondary (excluding repetition), both sexes (years)
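WDIsearch() returns every indicator whose name contains the search string, which is why the school life expectancy codes show up next to the demographic ones. A minimal base-R sketch of narrowing the result, assuming the query is available as a data frame with the indicator and name columns shown above (as.data.frame(query) would do this). To avoid a live call to the World Bank API, a few rows are copied from the output above:

```r
# A few rows copied from the WDIsearch() output above (no network call)
query <- data.frame(
  indicator = c("SE.SCH.LIFE", "SP.DYN.LE00.FE.IN", "SP.DYN.LE00.IN",
                "SP.DYN.LE00.MA.IN", "SP.DYN.LE60.FE.IN"),
  name = c("School life expectancy, primary to tertiary, both sexes (years)",
           "Life expectancy at birth, female (years)",
           "Life expectancy at birth, total (years)",
           "Life expectancy at birth, male (years)",
           "Life expectancy at age 60, female (years)"),
  stringsAsFactors = FALSE
)

# Keep only indicators measured at birth; fixed = TRUE matches the text literally
at_birth <- query[grepl("at birth", query$name, fixed = TRUE), ]
print(at_birth$indicator)
```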

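I will extract female and male life expectancy at birth (SP.DYN.LE00.FE.IN and SP.DYN.LE00.MA.IN) for all countries, plus the World Bank's regional and income aggregates such as Arab World, from 1960 to 2014, and store these indicators in a data frame called wdi.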

wdi<-WDI(indicator = c("SP.DYN.LE00.FE.IN","SP.DYN.LE00.MA.IN"),
         start = 1960, end = 2014)
kable(head(wdi))

iso2c	country	year	SP.DYN.LE00.FE.IN	SP.DYN.LE00.MA.IN
1A	Arab World	1960	47.62956	45.50474
1A	Arab World	1961	48.22032	46.10173
1A	Arab World	1962	48.81102	46.68912
1A	Arab World	1963	49.40685	47.26981
1A	Arab World	1964	50.01012	47.84458
1A	Arab World	1965	50.61657	48.41137

Descriptive statistics and wrangling

Descriptive statistics means exploring data through measures of central tendency, dispersion, distribution, and so on. Wrangling, or cleaning, is how we manipulate datasets. In R the dplyr package, which is installed along with the tidyverse, is especially useful for this. First, we convert the data frames used so far into tibbles. A tibble is a modernized data frame that prints more readably (for example, it shows only the first rows and each column's type).

wdi<-as_tibble(wdi)

Second, we adapt how we call functions to use the pipe operator %>%. In RStudio you can insert it with Cmd/Ctrl + Shift + M. The pipe lets us chain operations cleanly. Consider the following example: given a vector x, we want to take its exponential, then the square root of that result, and finally the natural logarithm. The traditional, nested way is:

x<- c(1,2,3) 
log(sqrt(exp(x)))

## [1] 0.5 1.0 1.5

Nesting several calls on one line is hard to read and easy to get wrong, because the order of the operations and the parentheses must be exactly right, and the code reads from the inside out. With the pipe we can do the same thing in a form that reads left to right, in the order the operations happen:

x %>% exp() %>% sqrt() %>% log()

## [1] 0.5 1.0 1.5
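Since R 4.1, base R also has a native pipe, |>, which needs no package. It is a different operator from magrittr's %>% (for example, the right-hand side must always be a call with parentheses), but for a simple chain like this one it gives the same result. A sketch comparing the two forms:

```r
x <- c(1, 2, 3)

nested <- log(sqrt(exp(x)))               # reads inside out
piped  <- x |> exp() |> sqrt() |> log()   # reads left to right (needs R >= 4.1)

# log(sqrt(exp(x))) simplifies to x / 2, so both should be 0.5, 1.0, 1.5
print(all.equal(nested, piped))
print(all.equal(piped, x / 2))
```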

Now, using the pipe, let's rename the variables in the wdi dataset to friendlier names (rename() takes new_name = old_name).

wdi<-wdi %>% rename(le_women=SP.DYN.LE00.FE.IN,le_men=SP.DYN.LE00.MA.IN)
kable(head(wdi))

iso2c	country	year	le_women	le_men
1A	Arab World	1960	47.62956	45.50474
1A	Arab World	1961	48.22032	46.10173
1A	Arab World	1962	48.81102	46.68912
1A	Arab World	1963	49.40685	47.26981
1A	Arab World	1964	50.01012	47.84458
1A	Arab World	1965	50.61657	48.41137

Descriptive statistics for categorical variables

A categorical variable has categories that can be counted and expressed as proportions of a total. Let's explore the country variable. To get the number of observations in each category, use table().

t<- table(wdi$country)
kable(t)

Var1	Freq
Afghanistan	55
Albania	55
Algeria	55
American Samoa	55
Andorra	55
Angola	55
Antigua and Barbuda	55
Arab World	55
Argentina	55
Armenia	55
Aruba	55
Australia	55
Austria	55
Azerbaijan	55
Bahamas, The	55
Bahrain	55
Bangladesh	55
Barbados	55
Belarus	55
Belgium	55
Belize	55
Benin	55
Bermuda	55
Bhutan	55
Bolivia	55
Bosnia and Herzegovina	55
Botswana	55
Brazil	55
British Virgin Islands	55
Brunei Darussalam	55
Bulgaria	55
Burkina Faso	55
Burundi	55
Cabo Verde	55
Cambodia	55
Cameroon	55
Canada	55
Caribbean small states	55
Cayman Islands	55
Central African Republic	55
Central Europe and the Baltics	55
Chad	55
Channel Islands	55
Chile	55
China	55
Colombia	55
Comoros	55
Congo, Dem. Rep.	55
Congo, Rep.	55
Costa Rica	55
Cote d’Ivoire	55
Croatia	55
Cuba	55
Curacao	55
Cyprus	55
Czech Republic	55
Denmark	55
Djibouti	55
Dominica	55
Dominican Republic	55
Early-demographic dividend	55
East Asia & Pacific	55
East Asia & Pacific (excluding high income)	55
East Asia & Pacific (IDA & IBRD countries)	55
Ecuador	55
Egypt, Arab Rep.	55
El Salvador	55
Equatorial Guinea	55
Eritrea	55
Estonia	55
Eswatini	55
Ethiopia	55
Euro area	55
Europe & Central Asia	55
Europe & Central Asia (excluding high income)	55
Europe & Central Asia (IDA & IBRD countries)	55
European Union	55
Faroe Islands	55
Fiji	55
Finland	55
Fragile and conflict affected situations	55
France	55
French Polynesia	55
Gabon	55
Gambia, The	55
Georgia	55
Germany	55
Ghana	55
Gibraltar	55
Greece	55
Greenland	55
Grenada	55
Guam	55
Guatemala	55
Guinea	55
Guinea-Bissau	55
Guyana	55
Haiti	55
Heavily indebted poor countries (HIPC)	55
High income	55
Honduras	55
Hong Kong SAR, China	55
Hungary	55
IBRD only	55
Iceland	55
IDA & IBRD total	55
IDA blend	55
IDA only	55
IDA total	55
India	55
Indonesia	55
Iran, Islamic Rep.	55
Iraq	55
Ireland	55
Isle of Man	55
Israel	55
Italy	55
Jamaica	55
Japan	55
Jordan	55
Kazakhstan	55
Kenya	55
Kiribati	55
Korea, Dem. People’s Rep.	55
Korea, Rep.	55
Kosovo	55
Kuwait	55
Kyrgyz Republic	55
Lao PDR	55
Late-demographic dividend	55
Latin America & Caribbean	55
Latin America & Caribbean (excluding high income)	55
Latin America & the Caribbean (IDA & IBRD countries)	55
Latvia	55
Least developed countries: UN classification	55
Lebanon	55
Lesotho	55
Liberia	55
Libya	55
Liechtenstein	55
Lithuania	55
Low & middle income	55
Low income	55
Lower middle income	55
Luxembourg	55
Macao SAR, China	55
Madagascar	55
Malawi	55
Malaysia	55
Maldives	55
Mali	55
Malta	55
Marshall Islands	55
Mauritania	55
Mauritius	55
Mexico	55
Micronesia, Fed. Sts.	55
Middle East & North Africa	55
Middle East & North Africa (excluding high income)	55
Middle East & North Africa (IDA & IBRD countries)	55
Middle income	55
Moldova	55
Monaco	55
Mongolia	55
Montenegro	55
Morocco	55
Mozambique	55
Myanmar	55
Namibia	55
Nauru	55
Nepal	55
Netherlands	55
New Caledonia	55
New Zealand	55
Nicaragua	55
Niger	55
Nigeria	55
North America	55
North Macedonia	55
Northern Mariana Islands	55
Norway	55
Not classified	55
OECD members	55
Oman	55
Other small states	55
Pacific island small states	55
Pakistan	55
Palau	55
Panama	55
Papua New Guinea	55
Paraguay	55
Peru	55
Philippines	55
Poland	55
Portugal	55
Post-demographic dividend	55
Pre-demographic dividend	55
Puerto Rico	55
Qatar	55
Romania	55
Russian Federation	55
Rwanda	55
Samoa	55
San Marino	55
Sao Tome and Principe	55
Saudi Arabia	55
Senegal	55
Serbia	55
Seychelles	55
Sierra Leone	55
Singapore	55
Sint Maarten (Dutch part)	55
Slovak Republic	55
Slovenia	55
Small states	55
Solomon Islands	55
Somalia	55
South Africa	55
South Asia	55
South Asia (IDA & IBRD)	55
South Sudan	55
Spain	55
Sri Lanka	55
St. Kitts and Nevis	55
St. Lucia	55
St. Martin (French part)	55
St. Vincent and the Grenadines	55
Sub-Saharan Africa	55
Sub-Saharan Africa (excluding high income)	55
Sub-Saharan Africa (IDA & IBRD countries)	55
Sudan	55
Suriname	55
Sweden	55
Switzerland	55
Syrian Arab Republic	55
Tajikistan	55
Tanzania	55
Thailand	55
Timor-Leste	55
Togo	55
Tonga	55
Trinidad and Tobago	55
Tunisia	55
Turkey	55
Turkmenistan	55
Turks and Caicos Islands	55
Tuvalu	55
Uganda	55
Ukraine	55
United Arab Emirates	55
United Kingdom	55
United States	55
Upper middle income	55
Uruguay	55
Uzbekistan	55
Vanuatu	55
Venezuela, RB	55
Vietnam	55
Virgin Islands (U.S.)	55
West Bank and Gaza	55
World	55
Yemen, Rep.	55
Zambia	55
Zimbabwe	55

To get proportions instead, use prop.table(). We also use round() to round to four decimal places.

kable(round(prop.table(table(wdi$country)),digits = 4))

Var1	Freq
Afghanistan	0.0038
Albania	0.0038
Algeria	0.0038
American Samoa	0.0038
Andorra	0.0038
Angola	0.0038
Antigua and Barbuda	0.0038
Arab World	0.0038
Argentina	0.0038
Armenia	0.0038
Aruba	0.0038
Australia	0.0038
Austria	0.0038
Azerbaijan	0.0038
Bahamas, The	0.0038
Bahrain	0.0038
Bangladesh	0.0038
Barbados	0.0038
Belarus	0.0038
Belgium	0.0038
Belize	0.0038
Benin	0.0038
Bermuda	0.0038
Bhutan	0.0038
Bolivia	0.0038
Bosnia and Herzegovina	0.0038
Botswana	0.0038
Brazil	0.0038
British Virgin Islands	0.0038
Brunei Darussalam	0.0038
Bulgaria	0.0038
Burkina Faso	0.0038
Burundi	0.0038
Cabo Verde	0.0038
Cambodia	0.0038
Cameroon	0.0038
Canada	0.0038
Caribbean small states	0.0038
Cayman Islands	0.0038
Central African Republic	0.0038
Central Europe and the Baltics	0.0038
Chad	0.0038
Channel Islands	0.0038
Chile	0.0038
China	0.0038
Colombia	0.0038
Comoros	0.0038
Congo, Dem. Rep.	0.0038
Congo, Rep.	0.0038
Costa Rica	0.0038
Cote d’Ivoire	0.0038
Croatia	0.0038
Cuba	0.0038
Curacao	0.0038
Cyprus	0.0038
Czech Republic	0.0038
Denmark	0.0038
Djibouti	0.0038
Dominica	0.0038
Dominican Republic	0.0038
Early-demographic dividend	0.0038
East Asia & Pacific	0.0038
East Asia & Pacific (excluding high income)	0.0038
East Asia & Pacific (IDA & IBRD countries)	0.0038
Ecuador	0.0038
Egypt, Arab Rep.	0.0038
El Salvador	0.0038
Equatorial Guinea	0.0038
Eritrea	0.0038
Estonia	0.0038
Eswatini	0.0038
Ethiopia	0.0038
Euro area	0.0038
Europe & Central Asia	0.0038
Europe & Central Asia (excluding high income)	0.0038
Europe & Central Asia (IDA & IBRD countries)	0.0038
European Union	0.0038
Faroe Islands	0.0038
Fiji	0.0038
Finland	0.0038
Fragile and conflict affected situations	0.0038
France	0.0038
French Polynesia	0.0038
Gabon	0.0038
Gambia, The	0.0038
Georgia	0.0038
Germany	0.0038
Ghana	0.0038
Gibraltar	0.0038
Greece	0.0038
Greenland	0.0038
Grenada	0.0038
Guam	0.0038
Guatemala	0.0038
Guinea	0.0038
Guinea-Bissau	0.0038
Guyana	0.0038
Haiti	0.0038
Heavily indebted poor countries (HIPC)	0.0038
High income	0.0038
Honduras	0.0038
Hong Kong SAR, China	0.0038
Hungary	0.0038
IBRD only	0.0038
Iceland	0.0038
IDA & IBRD total	0.0038
IDA blend	0.0038
IDA only	0.0038
IDA total	0.0038
India	0.0038
Indonesia	0.0038
Iran, Islamic Rep.	0.0038
Iraq	0.0038
Ireland	0.0038
Isle of Man	0.0038
Israel	0.0038
Italy	0.0038
Jamaica	0.0038
Japan	0.0038
Jordan	0.0038
Kazakhstan	0.0038
Kenya	0.0038
Kiribati	0.0038
Korea, Dem. People’s Rep.	0.0038
Korea, Rep.	0.0038
Kosovo	0.0038
Kuwait	0.0038
Kyrgyz Republic	0.0038
Lao PDR	0.0038
Late-demographic dividend	0.0038
Latin America & Caribbean	0.0038
Latin America & Caribbean (excluding high income)	0.0038
Latin America & the Caribbean (IDA & IBRD countries)	0.0038
Latvia	0.0038
Least developed countries: UN classification	0.0038
Lebanon	0.0038
Lesotho	0.0038
Liberia	0.0038
Libya	0.0038
Liechtenstein	0.0038
Lithuania	0.0038
Low & middle income	0.0038
Low income	0.0038
Lower middle income	0.0038
Luxembourg	0.0038
Macao SAR, China	0.0038
Madagascar	0.0038
Malawi	0.0038
Malaysia	0.0038
Maldives	0.0038
Mali	0.0038
Malta	0.0038
Marshall Islands	0.0038
Mauritania	0.0038
Mauritius	0.0038
Mexico	0.0038
Micronesia, Fed. Sts.	0.0038
Middle East & North Africa	0.0038
Middle East & North Africa (excluding high income)	0.0038
Middle East & North Africa (IDA & IBRD countries)	0.0038
Middle income	0.0038
Moldova	0.0038
Monaco	0.0038
Mongolia	0.0038
Montenegro	0.0038
Morocco	0.0038
Mozambique	0.0038
Myanmar	0.0038
Namibia	0.0038
Nauru	0.0038
Nepal	0.0038
Netherlands	0.0038
New Caledonia	0.0038
New Zealand	0.0038
Nicaragua	0.0038
Niger	0.0038
Nigeria	0.0038
North America	0.0038
North Macedonia	0.0038
Northern Mariana Islands	0.0038
Norway	0.0038
Not classified	0.0038
OECD members	0.0038
Oman	0.0038
Other small states	0.0038
Pacific island small states	0.0038
Pakistan	0.0038
Palau	0.0038
Panama	0.0038
Papua New Guinea	0.0038
Paraguay	0.0038
Peru	0.0038
Philippines	0.0038
Poland	0.0038
Portugal	0.0038
Post-demographic dividend	0.0038
Pre-demographic dividend	0.0038
Puerto Rico	0.0038
Qatar	0.0038
Romania	0.0038
Russian Federation	0.0038
Rwanda	0.0038
Samoa	0.0038
San Marino	0.0038
Sao Tome and Principe	0.0038
Saudi Arabia	0.0038
Senegal	0.0038
Serbia	0.0038
Seychelles	0.0038
Sierra Leone	0.0038
Singapore	0.0038
Sint Maarten (Dutch part)	0.0038
Slovak Republic	0.0038
Slovenia	0.0038
Small states	0.0038
Solomon Islands	0.0038
Somalia	0.0038
South Africa	0.0038
South Asia	0.0038
South Asia (IDA & IBRD)	0.0038
South Sudan	0.0038
Spain	0.0038
Sri Lanka	0.0038
St. Kitts and Nevis	0.0038
St. Lucia	0.0038
St. Martin (French part)	0.0038
St. Vincent and the Grenadines	0.0038
Sub-Saharan Africa	0.0038
Sub-Saharan Africa (excluding high income)	0.0038
Sub-Saharan Africa (IDA & IBRD countries)	0.0038
Sudan	0.0038
Suriname	0.0038
Sweden	0.0038
Switzerland	0.0038
Syrian Arab Republic	0.0038
Tajikistan	0.0038
Tanzania	0.0038
Thailand	0.0038
Timor-Leste	0.0038
Togo	0.0038
Tonga	0.0038
Trinidad and Tobago	0.0038
Tunisia	0.0038
Turkey	0.0038
Turkmenistan	0.0038
Turks and Caicos Islands	0.0038
Tuvalu	0.0038
Uganda	0.0038
Ukraine	0.0038
United Arab Emirates	0.0038
United Kingdom	0.0038
United States	0.0038
Upper middle income	0.0038
Uruguay	0.0038
Uzbekistan	0.0038
Vanuatu	0.0038
Venezuela, RB	0.0038
Vietnam	0.0038
Virgin Islands (U.S.)	0.0038
West Bank and Gaza	0.0038
World	0.0038
Yemen, Rep.	0.0038
Zambia	0.0038
Zimbabwe	0.0038
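Every proportion is the same because the panel is balanced: each of the 264 countries and aggregates appears once per year for 55 years (1960 to 2014), so each share is 55 / 14520 = 1/264. A quick check of that arithmetic, plus prop.table() on a small hypothetical vector whose categories are not balanced:

```r
# Balanced panel: 264 units observed in each of 55 years
n_units <- 264
n_years <- 2014 - 1960 + 1           # 55 years, both ends included
share   <- n_years / (n_units * n_years)
print(round(share, digits = 4))      # 0.0038, matching every row above

# Hypothetical unbalanced example: proportions now differ by category
regions <- c("A", "A", "A", "B")
props <- round(prop.table(table(regions)), digits = 2)
print(props)                         # A 0.75, B 0.25
```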

Let's use the pipe to get the same result in a more readable way:

tprop<-table(wdi$country) %>% prop.table() %>% round(digits = 4) 
kable(tprop)

Var1	Freq
Afghanistan	0.0038
Albania	0.0038
Algeria	0.0038
American Samoa	0.0038
Andorra	0.0038
Angola	0.0038
Antigua and Barbuda	0.0038
Arab World	0.0038
Argentina	0.0038
Armenia	0.0038
Aruba	0.0038
Australia	0.0038
Austria	0.0038
Azerbaijan	0.0038
Bahamas, The	0.0038
Bahrain	0.0038
Bangladesh	0.0038
Barbados	0.0038
Belarus	0.0038
Belgium	0.0038
Belize	0.0038
Benin	0.0038
Bermuda	0.0038
Bhutan	0.0038
Bolivia	0.0038
Bosnia and Herzegovina	0.0038
Botswana	0.0038
Brazil	0.0038
British Virgin Islands	0.0038
Brunei Darussalam	0.0038
Bulgaria	0.0038
Burkina Faso	0.0038
Burundi	0.0038
Cabo Verde	0.0038
Cambodia	0.0038
Cameroon	0.0038
Canada	0.0038
Caribbean small states	0.0038
Cayman Islands	0.0038
Central African Republic	0.0038
Central Europe and the Baltics	0.0038
Chad	0.0038
Channel Islands	0.0038
Chile	0.0038
China	0.0038
Colombia	0.0038
Comoros	0.0038
Congo, Dem. Rep.	0.0038
Congo, Rep.	0.0038
Costa Rica	0.0038
Cote d’Ivoire	0.0038
Croatia	0.0038
Cuba	0.0038
Curacao	0.0038
Cyprus	0.0038
Czech Republic	0.0038
Denmark	0.0038
Djibouti	0.0038
Dominica	0.0038
Dominican Republic	0.0038
Early-demographic dividend	0.0038
East Asia & Pacific	0.0038
East Asia & Pacific (excluding high income)	0.0038
East Asia & Pacific (IDA & IBRD countries)	0.0038
Ecuador	0.0038
Egypt, Arab Rep.	0.0038
El Salvador	0.0038
Equatorial Guinea	0.0038
Eritrea	0.0038
Estonia	0.0038
Eswatini	0.0038
Ethiopia	0.0038
Euro area	0.0038
Europe & Central Asia	0.0038
Europe & Central Asia (excluding high income)	0.0038
Europe & Central Asia (IDA & IBRD countries)	0.0038
European Union	0.0038
Faroe Islands	0.0038
Fiji	0.0038
Finland	0.0038
Fragile and conflict affected situations	0.0038
France	0.0038
French Polynesia	0.0038
Gabon	0.0038
Gambia, The	0.0038
Georgia	0.0038
Germany	0.0038
Ghana	0.0038
Gibraltar	0.0038
Greece	0.0038
Greenland	0.0038
Grenada	0.0038
Guam	0.0038
Guatemala	0.0038
Guinea	0.0038
Guinea-Bissau	0.0038
Guyana	0.0038
Haiti	0.0038
Heavily indebted poor countries (HIPC)	0.0038
High income	0.0038
Honduras	0.0038
Hong Kong SAR, China	0.0038
Hungary	0.0038
IBRD only	0.0038
Iceland	0.0038
IDA & IBRD total	0.0038
IDA blend	0.0038
IDA only	0.0038
IDA total	0.0038
India	0.0038
Indonesia	0.0038
Iran, Islamic Rep.	0.0038
Iraq	0.0038
Ireland	0.0038
Isle of Man	0.0038
Israel	0.0038
Italy	0.0038
Jamaica	0.0038
Japan	0.0038
Jordan	0.0038
Kazakhstan	0.0038
Kenya	0.0038
Kiribati	0.0038
Korea, Dem. People’s Rep.	0.0038
Korea, Rep.	0.0038
Kosovo	0.0038
Kuwait	0.0038
Kyrgyz Republic	0.0038
Lao PDR	0.0038
Late-demographic dividend	0.0038
Latin America & Caribbean	0.0038
Latin America & Caribbean (excluding high income)	0.0038
Latin America & the Caribbean (IDA & IBRD countries)	0.0038
Latvia	0.0038
Least developed countries: UN classification	0.0038
Lebanon	0.0038
Lesotho	0.0038
Liberia	0.0038
Libya	0.0038
Liechtenstein	0.0038
Lithuania	0.0038
Low & middle income	0.0038
Low income	0.0038
Lower middle income	0.0038
Luxembourg	0.0038
Macao SAR, China	0.0038
Madagascar	0.0038
Malawi	0.0038
Malaysia	0.0038
Maldives	0.0038
Mali	0.0038
Malta	0.0038
Marshall Islands	0.0038
Mauritania	0.0038
Mauritius	0.0038
Mexico	0.0038
Micronesia, Fed. Sts.	0.0038
Middle East & North Africa	0.0038
Middle East & North Africa (excluding high income)	0.0038
Middle East & North Africa (IDA & IBRD countries)	0.0038
Middle income	0.0038
Moldova	0.0038
Monaco	0.0038
Mongolia	0.0038
Montenegro	0.0038
Morocco	0.0038
Mozambique	0.0038
Myanmar	0.0038
Namibia	0.0038
Nauru	0.0038
Nepal	0.0038
Netherlands	0.0038
New Caledonia	0.0038
New Zealand	0.0038
Nicaragua	0.0038
Niger	0.0038
Nigeria	0.0038
North America	0.0038
North Macedonia	0.0038
Northern Mariana Islands	0.0038
Norway	0.0038
Not classified	0.0038
OECD members	0.0038
Oman	0.0038
Other small states	0.0038
Pacific island small states	0.0038
Pakistan	0.0038
Palau	0.0038
Panama	0.0038
Papua New Guinea	0.0038
Paraguay	0.0038
Peru	0.0038
Philippines	0.0038
Poland	0.0038
Portugal	0.0038
Post-demographic dividend	0.0038
Pre-demographic dividend	0.0038
Puerto Rico	0.0038
Qatar	0.0038
Romania	0.0038
Russian Federation	0.0038
Rwanda	0.0038
Samoa	0.0038
San Marino	0.0038
Sao Tome and Principe	0.0038
Saudi Arabia	0.0038
Senegal	0.0038
Serbia	0.0038
Seychelles	0.0038
Sierra Leone	0.0038
Singapore	0.0038
Sint Maarten (Dutch part)	0.0038
Slovak Republic	0.0038
Slovenia	0.0038
Small states	0.0038
Solomon Islands	0.0038
Somalia	0.0038
South Africa	0.0038
South Asia	0.0038
South Asia (IDA & IBRD)	0.0038
South Sudan	0.0038
Spain	0.0038
Sri Lanka	0.0038
St. Kitts and Nevis	0.0038
St. Lucia	0.0038
St. Martin (French part)	0.0038
St. Vincent and the Grenadines	0.0038
Sub-Saharan Africa	0.0038
Sub-Saharan Africa (excluding high income)	0.0038
Sub-Saharan Africa (IDA & IBRD countries)	0.0038
Sudan	0.0038
Suriname	0.0038
Sweden	0.0038
Switzerland	0.0038
Syrian Arab Republic	0.0038
Tajikistan	0.0038
Tanzania	0.0038
Thailand	0.0038
Timor-Leste	0.0038
Togo	0.0038
Tonga	0.0038
Trinidad and Tobago	0.0038
Tunisia	0.0038
Turkey	0.0038
Turkmenistan	0.0038
Turks and Caicos Islands	0.0038
Tuvalu	0.0038
Uganda	0.0038
Ukraine	0.0038
United Arab Emirates	0.0038
United Kingdom	0.0038
United States	0.0038
Upper middle income	0.0038
Uruguay	0.0038
Uzbekistan	0.0038
Vanuatu	0.0038
Venezuela, RB	0.0038
Vietnam	0.0038
Virgin Islands (U.S.)	0.0038
West Bank and Gaza	0.0038
World	0.0038
Yemen, Rep.	0.0038
Zambia	0.0038
Zimbabwe	0.0038
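Within the tidyverse, the same counts and shares can be produced as a tibble with dplyr's count() followed by mutate(), which is an alternative to table() and prop.table() rather than what the class uses. A sketch on a small hypothetical data frame (the column name country matches wdi; the values are made up):

```r
library(dplyr)

# Hypothetical mini-panel: two countries with an unequal number of rows
toy <- tibble(country = c("Peru", "Peru", "Peru", "Chile"))

shares <- toy %>%
  count(country) %>%                          # one row per country, count in column n
  mutate(prop = round(n / sum(n), digits = 4))

print(shares)                                 # Chile 1 0.25, Peru 3 0.75
```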

Basic descriptive statistics for continuous variables

Number of observations. The number of valid observations is the total number of values minus the missing (NA) ones.

length(wdi$le_women) # number of observations, NA included

## [1] 14520

is.na(wdi$le_women) %>% sum()  # number of NA values

## [1] 1258

is.null(wdi$le_women) %>% sum() # is.null() tests whether the whole vector is NULL (one TRUE/FALSE), not each element

## [1] 0

length(wdi$le_women) - is.na(wdi$le_women) %>% sum() # number of non-missing values

## [1] 13262
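The same counts on a small hypothetical vector make the logic easier to follow. Inside a vector, missing values are marked with NA; NULL means "no object at all", which is why is.null() returns a single FALSE here instead of counting anything.

```r
v <- c(70.1, NA, 68.4, NA, 72.0)   # hypothetical values with two NA

n_total   <- length(v)              # 5: all elements, NA included
n_missing <- sum(is.na(v))          # 2: is.na() is TRUE once per NA
n_valid   <- sum(!is.na(v))         # 3: same as n_total - n_missing

is.null(v)                          # FALSE: v exists, it is not NULL
c(n_total, n_missing, n_valid)
```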

The number of observations and variables, with dim():

dim(wdi) # number of rows (observations) and columns (variables)

## [1] 14520     5

Detecting NA values. Some functions return NA, or stop with an error, when the data contain NA. Detect the NA values with is.na() and add up the resulting logical values with sum().

is.na(wdi$le_women) %>% sum()

## [1] 1258

The variable le_women has NA values, so we will use the argument na.rm = TRUE in every descriptive-statistics function; otherwise the functions return NA or, in some cases such as quantile(), stop with an error.
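What na.rm = TRUE changes, shown on a hypothetical vector: without it, mean() propagates the NA, and quantile() stops with an error.

```r
v <- c(60, NA, 80)                 # hypothetical data with one NA

mean(v)                            # NA: one missing value makes the result missing
mean(v, na.rm = TRUE)              # 70: the NA is dropped before averaging

# quantile() refuses NA unless na.rm = TRUE; capture the error instead of stopping
err <- tryCatch(quantile(v), error = function(e) "error")
err                                # "error"
```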

Mean

mean(wdi$le_women, na.rm = TRUE)

## [1] 65.57062

Standard deviation

sd(wdi$le_women,na.rm = TRUE)

## [1] 11.81371

Variance

var(wdi$le_women,na.rm = TRUE)

## [1] 139.5638
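var() and sd() use the sample formulas with denominator n - 1, and the standard deviation is the square root of the variance (here 11.81371^2 is about 139.5638). A hand computation on a small hypothetical vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)    # hypothetical data, mean is 5

n  <- length(x)
ss <- sum((x - mean(x))^2)         # sum of squared deviations: 32
var_manual <- ss / (n - 1)         # sample variance: 32 / 7
sd_manual  <- sqrt(var_manual)     # sample standard deviation

all.equal(var(x), var_manual)      # TRUE
all.equal(sd(x), sd_manual)        # TRUE
```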

Minimum

min(wdi$le_women,na.rm = TRUE)

## [1] 22.394

Maximum

max(wdi$le_women,na.rm = TRUE)

## [1] 86.9

Median

median(wdi$le_women,na.rm = TRUE)

## [1] 68.427

Range

range(wdi$le_women,na.rm = TRUE)

## [1] 22.394 86.900

Quartiles

quantile(wdi$le_women,na.rm = TRUE)

##       0%      25%      50%      75%     100% 
## 22.39400 56.55975 68.42700 74.84500 86.90000
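By default quantile() returns quartiles (0, 25, 50, 75 and 100 percent). For actual quintiles, pass the cut points through the probs argument. IQR() is the 75% quartile minus the 25% quartile. On the hypothetical vector 1:11 the results are easy to check by hand:

```r
x <- 1:11                                              # hypothetical data

quartiles <- quantile(x)                               # 0%, 25%, 50%, 75%, 100%
quintiles <- quantile(x, probs = seq(0, 1, by = 0.2))  # 0%, 20%, ..., 100%

quartiles    # 1, 3.5, 6, 8.5, 11
quintiles    # 1, 3, 5, 7, 9, 11
IQR(x)       # 5 = 8.5 - 3.5

# On wdi the quintile call would be:
# quantile(wdi$le_women, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
```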

The summary function. summary() returns the minimum, first quartile, median, mean, third quartile, maximum, and the number of NA values.

summary(wdi$le_women) # quartiles, mean and NA count

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   22.39   56.56   68.43   65.57   74.84   86.90    1258

Interquartile range

IQR(wdi$le_women,na.rm = TRUE)

## [1] 18.28525

Let's visualize how the variable is distributed.

ggplot(wdi) + geom_boxplot(aes(x=le_women)) # box-and-whisker plot

## Warning: Removed 1258 rows containing non-finite values (stat_boxplot).

We can get all the relevant descriptive statistics in a single table with describe() from psych:

describe(wdi$le_women,na.rm = TRUE)

##    vars     n  mean    sd median trimmed   mad   min  max range  skew kurtosis
## X1    1 13262 65.57 11.81  68.43    66.4 11.88 22.39 86.9 64.51 -0.58    -0.58
##     se
## X1 0.1

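Note that describe() already reports skewness (skew) and kurtosis. The stat.desc() function from pastecs, used below, can add them too through the extra argument norm = TRUE; that option also runs a Shapiro-Wilk normality test, and R's shapiro.test() only accepts samples of 3 to 5000 observations, so it may fail on a variable of this size.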
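A sketch of what those extra statistics involve, on simulated data rather than wdi. The skewness and excess kurtosis below use the simple population-moment definitions; psych and pastecs may use slightly different small-sample formulas. Testing a random subsample of 5000 values is one possible workaround for the size limit of shapiro.test(), an assumption of this sketch rather than part of the original class.

```r
set.seed(123)
x <- rnorm(6000)   # simulated data with more than 5000 values

# Population-moment skewness and excess kurtosis (both near 0 for normal data)
m <- mean(x)
s <- sqrt(mean((x - m)^2))
skewness        <- mean((x - m)^3) / s^3
excess_kurtosis <- mean((x - m)^4) / s^4 - 3

# shapiro.test() refuses samples larger than 5000; keep the error message
sw_full <- tryCatch(shapiro.test(x), error = function(e) conditionMessage(e))
print(sw_full)

# Hypothetical workaround: test a random subsample of 5000 values
sw_sub <- shapiro.test(sample(x, 5000))
print(sw_sub$p.value)
```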

The select function. select() lets you pick specific columns from a data frame/tibble. The syntax is select(data, var1, var2, ...), or data %>% select(var1, var2, ...) with the pipe.

select(wdi,le_women,le_men)

## # A tibble: 14,520 x 2
##    le_women le_men
##       <dbl>  <dbl>
##  1     47.6   45.5
##  2     48.2   46.1
##  3     48.8   46.7
##  4     49.4   47.3
##  5     50.0   47.8
##  6     50.6   48.4
##  7     51.2   49.0
##  8     51.8   49.5
##  9     52.4   50.0
## 10     52.9   50.5
## # … with 14,510 more rows

wdi %>% select(le_women,le_men)

## # A tibble: 14,520 x 2
##    le_women le_men
##       <dbl>  <dbl>
##  1     47.6   45.5
##  2     48.2   46.1
##  3     48.8   46.7
##  4     49.4   47.3
##  5     50.0   47.8
##  6     50.6   48.4
##  7     51.2   49.0
##  8     51.8   49.5
##  9     52.4   50.0
## 10     52.9   50.5
## # … with 14,510 more rows

Now we want to drop a column from the data frame: prefix its name with a minus sign and save the result back into wdi.

wdi<-wdi %>% select(-iso2c)
wdi

## # A tibble: 14,520 x 4
##    country     year le_women le_men
##    <chr>      <int>    <dbl>  <dbl>
##  1 Arab World  1960     47.6   45.5
##  2 Arab World  1961     48.2   46.1
##  3 Arab World  1962     48.8   46.7
##  4 Arab World  1963     49.4   47.3
##  5 Arab World  1964     50.0   47.8
##  6 Arab World  1965     50.6   48.4
##  7 Arab World  1966     51.2   49.0
##  8 Arab World  1967     51.8   49.5
##  9 Arab World  1968     52.4   50.0
## 10 Arab World  1969     52.9   50.5
## # … with 14,510 more rows

With the same logic we can get a data frame/tibble that contains all columns except certain ones. Since this result is not assigned, wdi itself keeps the year column.

wdi %>% select(-year)

## # A tibble: 14,520 x 3
##    country    le_women le_men
##    <chr>         <dbl>  <dbl>
##  1 Arab World     47.6   45.5
##  2 Arab World     48.2   46.1
##  3 Arab World     48.8   46.7
##  4 Arab World     49.4   47.3
##  5 Arab World     50.0   47.8
##  6 Arab World     50.6   48.4
##  7 Arab World     51.2   49.0
##  8 Arab World     51.8   49.5
##  9 Arab World     52.4   50.0
## 10 Arab World     52.9   50.5
## # … with 14,510 more rows

select() can also pick columns by other criteria, for example columns whose names start with question_ or that follow a numbered sequence such as wti00... Helper functions such as starts_with() and ends_with() do this; see help(select) for more. Now let's get descriptive statistics for these two variables with stat.desc() from pastecs:

wdi %>% select(le_women,le_men) %>% stat.desc() %>% round(digits = 2)

##               le_women    le_men
## nbr.val       13262.00  13262.00
## nbr.null          0.00      0.00
## nbr.na         1258.00   1258.00
## min              22.39     16.29
## max              86.90     84.10
## range            64.51     67.81
## sum          869597.52 808686.36
## median           68.43     63.41
## mean             65.57     60.98
## SE.mean           0.10      0.09
## CI.mean.0.95      0.20      0.18
## var             139.56    113.60
## std.dev          11.81     10.66
## coef.var          0.18      0.17
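The select() helpers mentioned above match on column names. A sketch with a hypothetical tibble whose column names are made up to follow the kind of pattern described:

```r
library(dplyr)

# Hypothetical survey-style columns (names made up for illustration)
survey <- tibble(
  id          = 1:2,
  question_1  = c(3, 4),
  question_2  = c(5, 1),
  score_total = c(8, 5)
)

q_cols <- survey %>% select(starts_with("question_")) %>% names()
t_cols <- survey %>% select(ends_with("_total")) %>% names()
c_cols <- survey %>% select(id, contains("2")) %>% names()

q_cols   # "question_1" "question_2"
t_cols   # "score_total"
c_cols   # "id" "question_2"
```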

If we want this table in LaTeX for use in our papers, we use xtable():

select(wdi,le_women) %>%stat.desc() %>%  round(digits = 2) %>% xtable()

## % latex table generated in R 4.0.2 by xtable 1.8-4 package
## % Fri Aug  7 19:28:14 2020
## \begin{table}[ht]
## \centering
## \begin{tabular}{rr}
##   \hline
##  & le\_women \\ 
##   \hline
## nbr.val & 13262.00 \\ 
##   nbr.null & 0.00 \\ 
##   nbr.na & 1258.00 \\ 
##   min & 22.39 \\ 
##   max & 86.90 \\ 
##   range & 64.51 \\ 
##   sum & 869597.52 \\ 
##   median & 68.43 \\ 
##   mean & 65.57 \\ 
##   SE.mean & 0.10 \\ 
##   CI.mean.0.95 & 0.20 \\ 
##   var & 139.56 \\ 
##   std.dev & 11.81 \\ 
##   coef.var & 0.18 \\ 
##    \hline
## \end{tabular}
## \end{table}

Copy this output into a LaTeX document and you are done.

The filter function. Often we do not want the whole dataset, only part of it. We use filter() to create a subset. The syntax is filter(data, logical condition). Let's get the data for Colombia, Ecuador and Peru.

wdi %>% filter(country=="Colombia" | country=="Peru" | country=="Ecuador")

## # A tibble: 165 x 4
##    country   year le_women le_men
##    <chr>    <int>    <dbl>  <dbl>
##  1 Colombia  1960     59.4   55.2
##  2 Colombia  1961     59.9   55.8
##  3 Colombia  1962     60.4   56.3
##  4 Colombia  1963     60.9   56.8
##  5 Colombia  1964     61.4   57.3
##  6 Colombia  1965     61.8   57.7
##  7 Colombia  1966     62.3   58.2
##  8 Colombia  1967     62.8   58.7
##  9 Colombia  1968     63.3   59.1
## 10 Colombia  1969     63.8   59.6
## # … with 155 more rows
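When the list of values grows, chaining == with | gets long. The %in% operator expresses the same condition more compactly. A sketch on hypothetical rows with the same columns as wdi (the values are made up):

```r
library(dplyr)

# Hypothetical rows shaped like wdi
toy <- tibble(
  country  = c("Colombia", "Peru", "Ecuador", "Chile"),
  year     = c(2000L, 2000L, 2000L, 2000L),
  le_women = c(75, 74, 76, 80),
  le_men   = c(68, 69, 71, 74)
)

andean_or <- toy %>% filter(country == "Colombia" | country == "Peru" | country == "Ecuador")
andean_in <- toy %>% filter(country %in% c("Colombia", "Peru", "Ecuador"))

identical(andean_or, andean_in)   # TRUE: same rows in the same order
```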

wdi %>% filter(year>1990 & country=="Argentina")

## # A tibble: 24 x 4
##    country    year le_women le_men
##    <chr>     <int>    <dbl>  <dbl>
##  1 Argentina  1991     75.3   68.4
##  2 Argentina  1992     75.5   68.6
##  3 Argentina  1993     75.7   68.8
##  4 Argentina  1994     75.9   69.0
##  5 Argentina  1995     76.1   69.2
##  6 Argentina  1996     76.3   69.4
##  7 Argentina  1997     76.5   69.6
##  8 Argentina  1998     76.6   69.8
##  9 Argentina  1999     76.8   69.9
## 10 Argentina  2000     77.0   70.1
## # … with 14 more rows

The arrange function. arrange() sorts the whole data frame/tibble by a variable. The default is ascending order; wrap the variable in desc() inside arrange() to sort in descending order.

wdi %>% arrange(desc(year))

## # A tibble: 14,520 x 4
##    country                                        year le_women le_men
##    <chr>                                         <int>    <dbl>  <dbl>
##  1 Arab World                                     2014     73.0   69.3
##  2 World                                          2014     74.1   69.6
##  3 East Asia & Pacific (excluding high income)    2014     76.8   71.9
##  4 Europe & Central Asia (excluding high income)  2014     76.6   68.1
##  5 South Asia                                     2014     69.6   67.2
##  6 Andorra                                        2014     NA     NA  
##  7 United Arab Emirates                           2014     78.5   76.4
##  8 Afghanistan                                    2014     64.5   61.6
##  9 Antigua and Barbuda                            2014     77.6   75.1
## 10 Albania                                        2014     80.0   75.7
## # … with 14,510 more rows
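arrange() also accepts several variables; later ones break ties in earlier ones. A sketch on hypothetical rows that sorts by year in descending order and then by country alphabetically within each year:

```r
library(dplyr)

toy <- tibble(
  country = c("Peru", "Chile", "Peru", "Chile"),
  year    = c(2013L, 2013L, 2014L, 2014L)
)

sorted <- toy %>% arrange(desc(year), country)
sorted   # 2014 Chile, 2014 Peru, 2013 Chile, 2013 Peru
```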

The mutate function. To add an extra column to a data frame, the generic base-R way is:

wdi$half_le_women<-wdi$le_women/2
wdi

## # A tibble: 14,520 x 5
##    country     year le_women le_men half_le_women
##    <chr>      <int>    <dbl>  <dbl>         <dbl>
##  1 Arab World  1960     47.6   45.5          23.8
##  2 Arab World  1961     48.2   46.1          24.1
##  3 Arab World  1962     48.8   46.7          24.4
##  4 Arab World  1963     49.4   47.3          24.7
##  5 Arab World  1964     50.0   47.8          25.0
##  6 Arab World  1965     50.6   48.4          25.3
##  7 Arab World  1966     51.2   49.0          25.6
##  8 Arab World  1967     51.8   49.5          25.9
##  9 Arab World  1968     52.4   50.0          26.2
## 10 Arab World  1969     52.9   50.5          26.5
## # … with 14,510 more rows

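This form works, but it repeats the data frame name and does not fit into a pipe chain. It is better to use mutate() to do the same thing. Here the result is only printed, not assigned back to wdi, so half_le_women_2 does not appear in later outputs.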

wdi %>% mutate(half_le_women_2=le_women/2)

## # A tibble: 14,520 x 6
##    country     year le_women le_men half_le_women half_le_women_2
##    <chr>      <int>    <dbl>  <dbl>         <dbl>           <dbl>
##  1 Arab World  1960     47.6   45.5          23.8            23.8
##  2 Arab World  1961     48.2   46.1          24.1            24.1
##  3 Arab World  1962     48.8   46.7          24.4            24.4
##  4 Arab World  1963     49.4   47.3          24.7            24.7
##  5 Arab World  1964     50.0   47.8          25.0            25.0
##  6 Arab World  1965     50.6   48.4          25.3            25.3
##  7 Arab World  1966     51.2   49.0          25.6            25.6
##  8 Arab World  1967     51.8   49.5          25.9            25.9
##  9 Arab World  1968     52.4   50.0          26.2            26.2
## 10 Arab World  1969     52.9   50.5          26.5            26.5
## # … with 14,510 more rows
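mutate() can create several columns in one call, and a column created earlier in the call can be used by a later one. A sketch on hypothetical values; gap and ratio are illustrative names, not part of the original class:

```r
library(dplyr)

toy <- tibble(le_women = c(80, 75), le_men = c(74, 70))

toy <- toy %>%
  mutate(
    gap   = le_women - le_men,   # female minus male life expectancy
    ratio = gap / le_men         # uses gap, created on the line above
  )
toy
```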

The group_by function. group_by() lets us carry out operations by group. For example, we can compute the mean male life expectancy for each country:

wdi %>% group_by(country) %>% mutate(mean_country=mean(le_men, na.rm=TRUE))

## # A tibble: 14,520 x 6
## # Groups:   country [264]
##    country     year le_women le_men half_le_women mean_country
##    <chr>      <int>    <dbl>  <dbl>         <dbl>        <dbl>
##  1 Arab World  1960     47.6   45.5          23.8         59.5
##  2 Arab World  1961     48.2   46.1          24.1         59.5
##  3 Arab World  1962     48.8   46.7          24.4         59.5
##  4 Arab World  1963     49.4   47.3          24.7         59.5
##  5 Arab World  1964     50.0   47.8          25.0         59.5
##  6 Arab World  1965     50.6   48.4          25.3         59.5
##  7 Arab World  1966     51.2   49.0          25.6         59.5
##  8 Arab World  1967     51.8   49.5          25.9         59.5
##  9 Arab World  1968     52.4   50.0          26.2         59.5
## 10 Arab World  1969     52.9   50.5          26.5         59.5
## # … with 14,510 more rows
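Without na.rm = TRUE, any group that contains a missing value gets NA for the whole group. A sketch with hypothetical data showing the difference, using ungroup() to drop the grouping once it is no longer needed:

```r
library(dplyr)

toy <- tibble(
  country = c("A", "A", "B", "B"),
  le_men  = c(60, 70, NA, 80)
)

out <- toy %>%
  group_by(country) %>%
  mutate(
    mean_raw = mean(le_men),                # B becomes NA because of the missing value
    mean_ok  = mean(le_men, na.rm = TRUE)   # B becomes 80
  ) %>%
  ungroup()
out
```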

The summarise function. Often we do not want a new column holding the result, but a table with one row per group containing the summary values:

wdi %>% group_by(country) %>% summarise(n=n(),
    mean_le_men=mean(le_men,na.rm=TRUE),
    sd_le_men=sd(le_men,na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 264 x 4
##    country                 n mean_le_men sd_le_men
##    <chr>               <int>       <dbl>     <dbl>
##  1 Afghanistan            55        46.8      9.26
##  2 Albania                55        69.0      3.43
##  3 Algeria                55        61.2      9.76
##  4 American Samoa         55       NaN       NA   
##  5 Andorra                55       NaN       NA   
##  6 Angola                 55        43.8      4.84
##  7 Antigua and Barbuda    55        68.7      4.27
##  8 Arab World             55        59.5      7.47
##  9 Argentina              55        67.3      3.30
## 10 Armenia                55        67.1      1.98
## # … with 254 more rows
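Two details in the output above are worth handling explicitly: the message about ungrouping, and the NaN/NA rows for countries with no data at all (mean() of zero values is NaN, and sd() of fewer than two values is NA). A sketch with hypothetical data that sets .groups explicitly and, as an alternative, drops missing values before summarising, so n then counts only observed values:

```r
library(dplyr)

toy <- tibble(
  country = c("A", "A", "B", "B"),
  le_men  = c(60, 70, NA, NA)    # country B has no data
)

summary_all <- toy %>%
  group_by(country) %>%
  summarise(
    n = n(),
    mean_le_men = mean(le_men, na.rm = TRUE),   # NaN for B
    .groups = "drop"                            # no ungrouping message
  )

summary_valid <- toy %>%
  filter(!is.na(le_men)) %>%                    # keep only observed values
  group_by(country) %>%
  summarise(n = n(), mean_le_men = mean(le_men), .groups = "drop")

summary_all
summary_valid   # only country A remains
```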