COMPREHENSIVE EXAM

Adrian Risco Chang_20165898

Question 1: Perform a factor analysis for the Human Development Index and interpret the results. Complement the results with a similarity map of the variables that make up the index.

1. Loading and cleaning the data

The HDI table is loaded following the approach used in the applied project.

library(htmltab)
link1="http://hdr.undp.org/en/composite/HDI"

hdi<- htmltab(doc =link1,
               which ='/html/body/div[2]/div/section/div/div/div/div/div/table',
               encoding = "UTF-8")

## Warning: Columns [V7,V7,V7,V7,V7,V7,V7] seem to have no data and are
## removed. Use rm_nodata_cols = F to suppress this behavior

hdiA=hdi # keep a backup copy for safety

We explore the data to see how the variables behave.

head(hdi)

##    HDI rank Table 1. Human Development Index and its components >> Country
## 6      <NA>                                    VERY HIGH HUMAN DEVELOPMENT
## 7         1                                                         Norway
## 8         2                                                    Switzerland
## 9         3                                                      Australia
## 10        4                                                        Ireland
## 11        5                                                        Germany
##    Human Development Index (HDI) >> Value >> 2017
## 6                     VERY HIGH HUMAN DEVELOPMENT
## 7                                           0.953
## 8                                           0.944
## 9                                           0.939
## 10                                          0.938
## 11                                          0.936
##    SDG 3 >> Life expectancy at birth >> (years) >> 2017
## 6                           VERY HIGH HUMAN DEVELOPMENT
## 7                                                  82.3
## 8                                                  83.5
## 9                                                  83.1
## 10                                                 81.6
## 11                                                 81.2
##    SDG 4.3 >> Expected years of schooling >> (years) >> 2017
## 6                                VERY HIGH HUMAN DEVELOPMENT
## 7                                                       17.9
## 8                                                       16.2
## 9                                                       22.9
## 10                                                      19.6
## 11                                                      17.0
##    SDG 4.6 >> Mean years of schooling >> (years) >> 2017
## 6                            VERY HIGH HUMAN DEVELOPMENT
## 7                                                   12.6
## 8                                                   13.4
## 9                                                   12.9
## 10                                                  12.5
## 11                                                  14.1
##    SDG 8.5 >> Gross national income (GNI) per capita >> (2011 PPP $) >> 2017
## 6                                                VERY HIGH HUMAN DEVELOPMENT
## 7                                                                     68,012
## 8                                                                     57,625
## 9                                                                     43,560
## 10                                                                    53,754
## 11                                                                    46,136
##    GNI per capita rank minus HDI rank >> 2017            HDI rank >> 2016
## 6                 VERY HIGH HUMAN DEVELOPMENT VERY HIGH HUMAN DEVELOPMENT
## 7                                           5                           1
## 8                                           8                           2
## 9                                          18                           3
## 10                                          8                           4
## 11                                         13                           4

tail(hdi)

##     HDI rank
## 220     <NA>
## 221     <NA>
## 223     <NA>
## 224     <NA>
## 226     <NA>
## 228     <NA>
##     Table 1. Human Development Index and its components >> Country
## 220                                                     South Asia
## 221                                             Sub-Saharan Africa
## 223                                      Least developed countries
## 224                                 Small island developing states
## 226         Organisation for Economic Co-operation and Development
## 228                                                          World
##     Human Development Index (HDI) >> Value >> 2017
## 220                                          0.638
## 221                                          0.537
## 223                                          0.524
## 224                                          0.722
## 226                                          0.895
## 228                                          0.728
##     SDG 3 >> Life expectancy at birth >> (years) >> 2017
## 220                                                 <NA>
## 221                                                 <NA>
## 223                                                 <NA>
## 224                                                 <NA>
## 226                                                 <NA>
## 228                                                 <NA>
##     SDG 4.3 >> Expected years of schooling >> (years) >> 2017
## 220                                                      <NA>
## 221                                                      <NA>
## 223                                                      <NA>
## 224                                                      <NA>
## 226                                                      <NA>
## 228                                                      <NA>
##     SDG 4.6 >> Mean years of schooling >> (years) >> 2017
## 220                                                  <NA>
## 221                                                  <NA>
## 223                                                  <NA>
## 224                                                  <NA>
## 226                                                  <NA>
## 228                                                  <NA>
##     SDG 8.5 >> Gross national income (GNI) per capita >> (2011 PPP $) >> 2017
## 220                                                                      <NA>
## 221                                                                      <NA>
## 223                                                                      <NA>
## 224                                                                      <NA>
## 226                                                                      <NA>
## 228                                                                      <NA>
##     GNI per capita rank minus HDI rank >> 2017 HDI rank >> 2016
## 220                                       <NA>             <NA>
## 221                                       <NA>             <NA>
## 223                                       <NA>             <NA>
## 224                                       <NA>             <NA>
## 226                                       <NA>             <NA>
## 228                                       <NA>             <NA>

names(hdi)

## [1] "HDI rank"                                                                 
## [2] "Table 1. Human Development Index and its components >> Country"           
## [3] "Human Development Index (HDI) >> Value >> 2017"                           
## [4] "SDG 3 >> Life expectancy at birth >> (years) >> 2017"                     
## [5] "SDG 4.3 >> Expected years of schooling >> (years) >> 2017"                
## [6] "SDG 4.6 >> Mean years of schooling >> (years) >> 2017"                    
## [7] "SDG 8.5 >> Gross national income (GNI) per capita >> (2011 PPP $) >> 2017"
## [8] "GNI per capita rank minus HDI rank >> 2017"                               
## [9] "HDI rank >> 2016"

str(hdi)

## 'data.frame':    217 obs. of  9 variables:
##  $ HDI rank                                                                 : chr  NA "1" "2" "3" ...
##  $ Table 1. Human Development Index and its components >> Country           : chr  "VERY HIGH HUMAN DEVELOPMENT" "Norway" "Switzerland" "Australia" ...
##  $ Human Development Index (HDI) >> Value >> 2017                           : chr  "VERY HIGH HUMAN DEVELOPMENT" "0.953" "0.944" "0.939" ...
##  $ SDG 3 >> Life expectancy at birth >> (years) >> 2017                     : chr  "VERY HIGH HUMAN DEVELOPMENT" "82.3" "83.5" "83.1" ...
##  $ SDG 4.3 >> Expected years of schooling >> (years) >> 2017                : chr  "VERY HIGH HUMAN DEVELOPMENT" "17.9" "16.2" "22.9" ...
##  $ SDG 4.6 >> Mean years of schooling >> (years) >> 2017                    : chr  "VERY HIGH HUMAN DEVELOPMENT" "12.6" "13.4" "12.9" ...
##  $ SDG 8.5 >> Gross national income (GNI) per capita >> (2011 PPP $) >> 2017: chr  "VERY HIGH HUMAN DEVELOPMENT" "68,012" "57,625" "43,560" ...
##  $ GNI per capita rank minus HDI rank >> 2017                               : chr  "VERY HIGH HUMAN DEVELOPMENT" "5" "8" "18" ...
##  $ HDI rank >> 2016                                                         : chr  "VERY HIGH HUMAN DEVELOPMENT" "1" "2" "3" ...

Next, we rename the variables.

newNames=c('rank','country','hdi','lifExp','expecSchool','timeSchool','income','null1','rankold')
names(hdi)=newNames
head(hdi)

##    rank                     country                         hdi
## 6  <NA> VERY HIGH HUMAN DEVELOPMENT VERY HIGH HUMAN DEVELOPMENT
## 7     1                      Norway                       0.953
## 8     2                 Switzerland                       0.944
## 9     3                   Australia                       0.939
## 10    4                     Ireland                       0.938
## 11    5                     Germany                       0.936
##                         lifExp                 expecSchool
## 6  VERY HIGH HUMAN DEVELOPMENT VERY HIGH HUMAN DEVELOPMENT
## 7                         82.3                        17.9
## 8                         83.5                        16.2
## 9                         83.1                        22.9
## 10                        81.6                        19.6
## 11                        81.2                        17.0
##                     timeSchool                      income
## 6  VERY HIGH HUMAN DEVELOPMENT VERY HIGH HUMAN DEVELOPMENT
## 7                         12.6                      68,012
## 8                         13.4                      57,625
## 9                         12.9                      43,560
## 10                        12.5                      53,754
## 11                        14.1                      46,136
##                          null1                     rankold
## 6  VERY HIGH HUMAN DEVELOPMENT VERY HIGH HUMAN DEVELOPMENT
## 7                            5                           1
## 8                            8                           2
## 9                           18                           3
## 10                           8                           4
## 11                          13                           4

We start cleaning the data.

# rows with a missing rank are not countries (group headers and regional aggregates); we inspect them before removing them
hdi[is.na(hdi$rank),]

##     rank                                                country
## 6   <NA>                            VERY HIGH HUMAN DEVELOPMENT
## 66  <NA>                                 HIGH HUMAN DEVELOPMENT
## 120 <NA>                               MEDIUM HUMAN DEVELOPMENT
## 160 <NA>                                  LOW HUMAN DEVELOPMENT
## 199 <NA>                         OTHER COUNTRIES OR TERRITORIES
## 207 <NA>                               Human development groups
## 208 <NA>                            Very high human development
## 209 <NA>                                 High human development
## 210 <NA>                               Medium human development
## 211 <NA>                                  Low human development
## 213 <NA>                                   Developing countries
## 215 <NA>                                                Regions
## 216 <NA>                                            Arab States
## 217 <NA>                              East Asia and the Pacific
## 218 <NA>                                Europe and Central Asia
## 219 <NA>                        Latin America and the Caribbean
## 220 <NA>                                             South Asia
## 221 <NA>                                     Sub-Saharan Africa
## 223 <NA>                              Least developed countries
## 224 <NA>                         Small island developing states
## 226 <NA> Organisation for Economic Co-operation and Development
## 228 <NA>                                                  World
##                                hdi                         lifExp
## 6      VERY HIGH HUMAN DEVELOPMENT    VERY HIGH HUMAN DEVELOPMENT
## 66          HIGH HUMAN DEVELOPMENT         HIGH HUMAN DEVELOPMENT
## 120       MEDIUM HUMAN DEVELOPMENT       MEDIUM HUMAN DEVELOPMENT
## 160          LOW HUMAN DEVELOPMENT          LOW HUMAN DEVELOPMENT
## 199 OTHER COUNTRIES OR TERRITORIES OTHER COUNTRIES OR TERRITORIES
## 207                           <NA>                           <NA>
## 208                          0.894                           <NA>
## 209                          0.757                           <NA>
## 210                          0.645                           <NA>
## 211                          0.504                           <NA>
## 213                          0.681                           <NA>
## 215                           <NA>                           <NA>
## 216                          0.699                           <NA>
## 217                          0.733                           <NA>
## 218                          0.771                           <NA>
## 219                          0.758                           <NA>
## 220                          0.638                           <NA>
## 221                          0.537                           <NA>
## 223                          0.524                           <NA>
## 224                          0.722                           <NA>
## 226                          0.895                           <NA>
## 228                          0.728                           <NA>
##                        expecSchool                     timeSchool
## 6      VERY HIGH HUMAN DEVELOPMENT    VERY HIGH HUMAN DEVELOPMENT
## 66          HIGH HUMAN DEVELOPMENT         HIGH HUMAN DEVELOPMENT
## 120       MEDIUM HUMAN DEVELOPMENT       MEDIUM HUMAN DEVELOPMENT
## 160          LOW HUMAN DEVELOPMENT          LOW HUMAN DEVELOPMENT
## 199 OTHER COUNTRIES OR TERRITORIES OTHER COUNTRIES OR TERRITORIES
## 207                           <NA>                           <NA>
## 208                           <NA>                           <NA>
## 209                           <NA>                           <NA>
## 210                           <NA>                           <NA>
## 211                           <NA>                           <NA>
## 213                           <NA>                           <NA>
## 215                           <NA>                           <NA>
## 216                           <NA>                           <NA>
## 217                           <NA>                           <NA>
## 218                           <NA>                           <NA>
## 219                           <NA>                           <NA>
## 220                           <NA>                           <NA>
## 221                           <NA>                           <NA>
## 223                           <NA>                           <NA>
## 224                           <NA>                           <NA>
## 226                           <NA>                           <NA>
## 228                           <NA>                           <NA>
##                             income                          null1
## 6      VERY HIGH HUMAN DEVELOPMENT    VERY HIGH HUMAN DEVELOPMENT
## 66          HIGH HUMAN DEVELOPMENT         HIGH HUMAN DEVELOPMENT
## 120       MEDIUM HUMAN DEVELOPMENT       MEDIUM HUMAN DEVELOPMENT
## 160          LOW HUMAN DEVELOPMENT          LOW HUMAN DEVELOPMENT
## 199 OTHER COUNTRIES OR TERRITORIES OTHER COUNTRIES OR TERRITORIES
## 207                           <NA>                           <NA>
## 208                           <NA>                           <NA>
## 209                           <NA>                           <NA>
## 210                           <NA>                           <NA>
## 211                           <NA>                           <NA>
## 213                           <NA>                           <NA>
## 215                           <NA>                           <NA>
## 216                           <NA>                           <NA>
## 217                           <NA>                           <NA>
## 218                           <NA>                           <NA>
## 219                           <NA>                           <NA>
## 220                           <NA>                           <NA>
## 221                           <NA>                           <NA>
## 223                           <NA>                           <NA>
## 224                           <NA>                           <NA>
## 226                           <NA>                           <NA>
## 228                           <NA>                           <NA>
##                            rankold
## 6      VERY HIGH HUMAN DEVELOPMENT
## 66          HIGH HUMAN DEVELOPMENT
## 120       MEDIUM HUMAN DEVELOPMENT
## 160          LOW HUMAN DEVELOPMENT
## 199 OTHER COUNTRIES OR TERRITORIES
## 207                           <NA>
## 208                           <NA>
## 209                           <NA>
## 210                           <NA>
## 211                           <NA>
## 213                           <NA>
## 215                           <NA>
## 216                           <NA>
## 217                           <NA>
## 218                           <NA>
## 219                           <NA>
## 220                           <NA>
## 221                           <NA>
## 223                           <NA>
## 224                           <NA>
## 226                           <NA>
## 228                           <NA>

hdi=hdi[!is.na(hdi$rank),]

library(naniar)
hdi=replace_with_na_all(hdi,condition = ~.x == '..') # replace the '..' placeholder values with NA
hdi$income=gsub(',','',hdi$income) # remove the thousands separators so the column can be converted to numeric

We check how the data looks so far.
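The same two cleaning steps can be sketched in base R on a single column. This is only an illustration: the toy vector `income_raw` is invented here and is not taken from the UNDP table.

```r
# Toy character column mimicking the scraped 'income' values (hypothetical data)
income_raw <- c("68,012", "57,625", "..", "43,560")

# Step 1: treat the ".." placeholder as missing
income_raw[income_raw == ".."] <- NA

# Step 2: drop the thousands separator, then convert to numeric
income_num <- as.numeric(gsub(",", "", income_raw))

print(income_num)
```

Doing the NA replacement first means the later numeric conversion does not raise coercion warnings.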

str(hdi)

## Classes 'tbl_df', 'tbl' and 'data.frame':    195 obs. of  9 variables:
##  $ rank       : chr  "1" "2" "3" "4" ...
##  $ country    : chr  "Norway" "Switzerland" "Australia" "Ireland" ...
##  $ hdi        : chr  "0.953" "0.944" "0.939" "0.938" ...
##  $ lifExp     : chr  "82.3" "83.5" "83.1" "81.6" ...
##  $ expecSchool: chr  "17.9" "16.2" "22.9" "19.6" ...
##  $ timeSchool : chr  "12.6" "13.4" "12.9" "12.5" ...
##  $ income     : chr  "68012" "57625" "43560" "53754" ...
##  $ null1      : chr  "5" "8" "18" "8" ...
##  $ rankold    : chr  "1" "2" "3" "4" ...

head(hdi)

## # A tibble: 6 x 9
##   rank  country    hdi   lifExp expecSchool timeSchool income null1 rankold
##   <chr> <chr>      <chr> <chr>  <chr>       <chr>      <chr>  <chr> <chr>  
## 1 1     Norway     0.953 82.3   17.9        12.6       68012  5     1      
## 2 2     Switzerla… 0.944 83.5   16.2        13.4       57625  8     2      
## 3 3     Australia  0.939 83.1   22.9        12.9       43560  18    3      
## 4 4     Ireland    0.938 81.6   19.6        12.5       53754  8     4      
## 5 5     Germany    0.936 81.2   17.0        14.1       46136  13    4      
## 6 6     Iceland    0.935 82.9   19.3        12.4       45810  13    6

names(hdi)

## [1] "rank"        "country"     "hdi"         "lifExp"      "expecSchool"
## [6] "timeSchool"  "income"      "null1"       "rankold"

# Drop the unneeded columns (rank, null1, rankold) and convert the remaining measures to numeric
hdi[,c(1,8,9)]=NULL
hdi[,c(2:6)]=lapply(hdi[,c(2:6)],as.numeric)
str(hdi)

## Classes 'tbl_df', 'tbl' and 'data.frame':    195 obs. of  6 variables:
##  $ country    : chr  "Norway" "Switzerland" "Australia" "Ireland" ...
##  $ hdi        : num  0.953 0.944 0.939 0.938 0.936 0.935 0.933 0.933 0.932 0.931 ...
##  $ lifExp     : num  82.3 83.5 83.1 81.6 81.2 82.9 84.1 82.6 83.2 82 ...
##  $ expecSchool: num  17.9 16.2 22.9 19.6 17 19.3 16.3 17.6 16.2 18 ...
##  $ timeSchool : num  12.6 13.4 12.9 12.5 14.1 12.4 12 12.4 11.5 12.2 ...
##  $ income     : num  68012 57625 43560 53754 46136 ...

We strip leading and trailing whitespace from the country column.

library(stringr)
hdi$country=trimws(hdi$country,whitespace = "[\\h\\v]")

We drop all rows with missing values.

hdi=hdi[complete.cases(hdi),]

Finally, we set the country names as row names.

row.names(hdi)=hdi$country

## Warning: Setting row names on a tibble is deprecated.

The data is now clean.

2. Factor analysis

names(hdi)

## [1] "country"     "hdi"         "lifExp"      "expecSchool" "timeSchool" 
## [6] "income"

We build a subset with the four component variables, standardize them, and set the countries as row names.

hdi_s=as.data.frame(scale(hdi[-c(1,2)]))
row.names(hdi_s)=hdi$country
names(hdi_s)

## [1] "lifExp"      "expecSchool" "timeSchool"  "income"
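As a small illustration of what `scale()` does, the sketch below standardizes a toy data frame (the values are hypothetical) and checks that each column ends up with mean 0 and standard deviation 1.

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame(years  = c(10, 12, 14, 16),
                  income = c(1000, 5000, 20000, 60000))

# scale() subtracts each column mean and divides by the column standard deviation
toy_z <- as.data.frame(scale(toy))

# After scaling, every column has mean 0 and standard deviation 1
print(round(colMeans(toy_z), 10))
print(apply(toy_z, 2, sd))
```

Standardizing keeps a large-valued variable such as income from dominating the distances and correlations computed later.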

We compute the correlations among the 4 variables.

library(psych)

pearson = cor(hdi_s)

# Now we plot the correlation matrix of the standardized components
cor.plot(pearson, 
         numbers=TRUE, 
         upper=FALSE, 
         main = "Correlation", 
         show.legend = FALSE)

Next, we run the KMO test to see how appropriate it is to combine our indicators.

KMO(hdi_s) # the closer to 1, the better

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = hdi_s)
## Overall MSA =  0.84
## MSA for each item = 
##      lifExp expecSchool  timeSchool      income 
##        0.83        0.80        0.83        0.92
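To make the KMO statistic less of a black box, here is a minimal base-R sketch of the usual formula: squared correlations divided by squared correlations plus squared partial correlations. It uses simulated data (an assumption made for illustration), not the HDI table, so its numbers will differ from the output above.

```r
set.seed(1)
n <- 200
common <- rnorm(n)                      # shared latent dimension (simulated)
X <- sapply(1:4, function(j) 0.8 * common + 0.6 * rnorm(n))

R <- cor(X)                             # observed correlations
Q <- cov2cor(solve(R))                  # partial correlations, up to sign
diag(R) <- 0                            # keep only the off-diagonal terms
diag(Q) <- 0

# Overall MSA: squared correlations relative to
# squared correlations plus squared partial correlations
msa_overall <- sum(R^2) / (sum(R^2) + sum(Q^2))

# Per-variable MSA applies the same ratio column by column
msa_item <- colSums(R^2) / (colSums(R^2) + colSums(Q^2))

print(round(msa_overall, 2))
print(round(msa_item, 2))
```

When the variables share one strong common dimension, the partial correlations are small and the ratio moves toward 1.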

We check how many indices (factors) can be built, based on the cutoff point in the plot.

fa.parallel(pearson, fm="pa", fa="fa", main = "Scree Plot",n.obs = nrow(hdi_s))

## Parallel analysis suggests that the number of factors =  1  and the number of components =  NA
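The idea behind parallel analysis can be sketched in base R: compare the eigenvalues of the observed correlation matrix with the average eigenvalues obtained from random, uncorrelated data of the same size. This simplified version works on principal-component eigenvalues and simulated data, both of which are assumptions made here for illustration; `fa.parallel` with `fa="fa"` works on factor eigenvalues instead.

```r
set.seed(123)
n <- 150
p <- 4
latent <- rnorm(n)                      # one simulated common factor
X <- sapply(seq_len(p), function(j) 0.9 * latent + sqrt(1 - 0.9^2) * rnorm(n))

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(X), symmetric = TRUE)$values

# Eigenvalues from 100 random (uncorrelated) data sets of the same size
rand_eig <- replicate(100,
                      eigen(cor(matrix(rnorm(n * p), n, p)), symmetric = TRUE)$values)
rand_mean <- rowMeans(rand_eig)

# Retain the leading dimensions whose eigenvalue exceeds the random benchmark
n_keep <- sum(cumprod(obs_eig > rand_mean))
print(n_keep)
```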

The scree plot appears to point to 2, but the parallel analysis in R suggests building only 1 index. We therefore request a single factor.

hdi_sF <- fa(hdi_s, # standardized data
                     nfactors=1, # number of factors
                     rotate="varimax") # rotation has no effect with a single factor

Now, using the loadings, we see how the resulting index is composed and how strongly each variable relates to it.

hdi_sF$loadings

## 
## Loadings:
##             MR1  
## lifExp      0.894
## expecSchool 0.900
## timeSchool  0.881
## income      0.700
## 
##                  MR1
## SS loadings    2.874
## Proportion Var 0.718

The loadings output shows that the single factor recovers 71.8% of the total variance of the four variables, which is an acceptable share.

Now we look at a graphical representation of the loadings.
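The "SS loadings" and "Proportion Var" figures can be reproduced by hand from the printed loadings, as in the short sketch below. Because the loadings are entered rounded to three decimals, the results agree with the printed 2.874 and 0.718 only up to rounding.

```r
# Loadings as printed in the output (rounded to three decimals)
loadings <- c(lifExp = 0.894, expecSchool = 0.900, timeSchool = 0.881, income = 0.700)

ss_loadings <- sum(loadings^2)                 # "SS loadings": sum of squared loadings
prop_var    <- ss_loadings / length(loadings)  # "Proportion Var": share of the 4 unit variances

print(round(ss_loadings, 3))
print(round(prop_var, 3))
```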

fa.diagram(hdi_sF)

We extract the communalities to see how much of each variable's variance is explained by the latent variable. The higher the value, the better.

sort(hdi_sF$communalities)

##      income  timeSchool      lifExp expecSchool 
##   0.4902014   0.7759529   0.7983572   0.8093960

All components show an acceptable communality, except ‘income’, at 49%.

We extract the uniquenesses to see which component shares little with the latent variable. The higher the value, the worse.

sort(hdi_sF$uniquenesses)

## expecSchool      lifExp  timeSchool      income 
##   0.1906049   0.2016429   0.2240468   0.5097995

All components have a low uniqueness, except ‘income’, at about 51%.

We extract the complexity to see how closely each component aligns with the factors, that is, whether a component loads on one factor or on more than one. The larger the value, the less cleanly the component belongs to a single factor.
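Communality and uniqueness are two halves of the same standardized variance, so for each variable they should add up to 1. The sketch below checks this using the values printed above.

```r
# Values copied from the communalities and uniquenesses output
communalities <- c(income = 0.4902014, timeSchool = 0.7759529,
                   lifExp = 0.7983572, expecSchool = 0.8093960)
uniquenesses  <- c(expecSchool = 0.1906049, lifExp = 0.2016429,
                   timeSchool = 0.2240468, income = 0.5097995)

# Align by variable name, then add the two pieces
total <- communalities + uniquenesses[names(communalities)]
print(round(total, 4))
```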

sort(hdi_sF$complexity)

## expecSchool  timeSchool      lifExp      income 
##           1           1           1           1

All components have a value of 1, as expected when there is only one factor.

Now we look at the values of our new index.
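The complexity reported by `fa` is Hofmann's index, which for a variable with loadings a across factors is (sum of a^2)^2 divided by the sum of a^4. The sketch below shows why a one-factor solution always gives 1. The helper name `complexity` and the two-factor loadings are hypothetical, introduced only for illustration.

```r
# Hofmann's complexity for one variable: (sum of a^2)^2 / sum of a^4,
# where a holds that variable's loadings across the factors
complexity <- function(a) sum(a^2)^2 / sum(a^4)

c_one <- complexity(0.700)         # single factor (the 'income' loading above)
c_two <- complexity(c(0.6, 0.6))   # hypothetical variable split evenly over two factors

print(c(one_factor = c_one, two_factors = c_two))
```

With a single loading the numerator and denominator are the same number, so the index cannot differ from 1.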

head(hdi_sF$scores)

##                  MR1
## Norway      1.608669
## Switzerland 1.487544
## Australia   2.121101
## Ireland     1.690876
## Germany     1.482233
## Iceland     1.665010

Now we create a data frame holding the standardized components together with the new score.

hdi_nuevo=as.data.frame(cbind(hdi_s,hdi_sF$scores))
head(hdi_nuevo)

##               lifExp expecSchool timeSchool   income      MR1
## Norway      1.327981    1.580902   1.306243 2.546038 1.608669
## Switzerland 1.485127    1.002181   1.564419 2.016730 1.487544
## Australia   1.432745    3.283024   1.403059 1.299995 2.121101
## Ireland     1.236313    2.159623   1.273972 1.819468 1.690876
## Germany     1.183931    1.274520   1.790322 1.431265 1.482233
## Iceland     1.406554    2.057496   1.241700 1.414652 1.665010

names(hdi_nuevo)[5]="hdiF" # rename the column holding the new HDI score

Interpretation

From the factor analysis we find that the ordering of the countries has changed. For example, the top three countries in the ranking went from Norway, Switzerland and Australia to Australia, Ireland and Iceland. Likewise, the bottom three went from South Sudan, the Central African Republic and Niger to South Sudan, Chad and Niger. These changes occur throughout the data and can be seen in more detail by inspecting it in full. On the more technical side, the variable ‘income’ contributes noticeably less to the index than the other variables.

3. Similarity map

We now complement the factor analysis with a similarity map of the variables that make up the new HDI index.
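The re-ranking can be checked on the six factor scores printed earlier. This sketch covers only those six rows, so it confirms the new top three among them but says nothing about the rest of the table.

```r
# Factor scores copied from head(hdi_sF$scores), listed in original HDI order
scores <- c(Norway = 1.608669, Switzerland = 1.487544, Australia = 2.121101,
            Ireland = 1.690876, Germany = 1.482233, Iceland = 1.665010)

# Order the countries from highest to lowest factor score
ranking <- names(sort(scores, decreasing = TRUE))
print(ranking)
```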

names(hdi_nuevo)

## [1] "lifExp"      "expecSchool" "timeSchool"  "income"      "hdiF"

We compute the distances between cases based on the Z scores.

hdi_sD=dist(hdi_nuevo[-c(5)])
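By default `dist()` returns Euclidean distances. As a worked example, the sketch below computes the distance between Norway and Switzerland by hand from the z-scores printed earlier and compares it with `dist()`.

```r
# Standardized values copied from head(hdi_nuevo)
z <- rbind(
  Norway      = c(lifExp = 1.327981, expecSchool = 1.580902,
                  timeSchool = 1.306243, income = 2.546038),
  Switzerland = c(lifExp = 1.485127, expecSchool = 1.002181,
                  timeSchool = 1.564419, income = 2.016730)
)

# Euclidean distance: square root of the summed squared differences
d_manual <- sqrt(sum((z["Norway", ] - z["Switzerland", ])^2))
d_dist   <- as.numeric(dist(z))

print(c(manual = d_manual, dist = d_dist))
```

The result is roughly 0.84 standard-deviation units, which makes these two countries close neighbours on the map.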

Now we create the map.

hdi_map <- cmdscale(hdi_sD,eig=TRUE, k=2) # we request 2 dimensions (k=2)
hdi_map$GOF # GOF is the goodness of fit; the closer to 1, the better

## [1] 0.8983953 0.8983953

Good: the GOF is approximately 0.90.

Now we look at the position of the resulting points.

titulo="Similarity map of countries by HDI"
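The two GOF values are ratios of eigenvalues: the sum of the first k eigenvalues divided by the sum of their absolute values and, for the second entry, by the sum of the positive ones. The sketch below reproduces them on simulated data (an assumption made for illustration). With Euclidean distances the eigenvalues are non-negative up to numerical error, which is why both entries coincide.

```r
set.seed(42)
toy <- matrix(rnorm(30 * 4), nrow = 30)   # 30 hypothetical cases, 4 standardized variables
fit <- cmdscale(dist(toy), eig = TRUE, k = 2)

ev <- fit$eig                             # all eigenvalues, in decreasing order
gof_manual <- c(sum(ev[1:2]) / sum(abs(ev)),
                sum(ev[1:2]) / sum(pmax(ev, 0)))

print(rbind(manual = gof_manual, cmdscale = fit$GOF))
```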
x <- hdi_map$points[,1]
y <- hdi_map$points[,2]
plot(x, y, main=titulo)

We want to show the country names instead of points.

So we first retrieve the row names.

rownames(hdi_map$points)

##   [1] "Norway"                            
##   [2] "Switzerland"                       
##   [3] "Australia"                         
##   [4] "Ireland"                           
##   [5] "Germany"                           
##   [6] "Iceland"                           
##   [7] "Hong Kong, China (SAR)"            
##   [8] "Sweden"                            
##   [9] "Singapore"                         
##  [10] "Netherlands"                       
##  [11] "Denmark"                           
##  [12] "Canada"                            
##  [13] "United States"                     
##  [14] "United Kingdom"                    
##  [15] "Finland"                           
##  [16] "New Zealand"                       
##  [17] "Belgium"                           
##  [18] "Liechtenstein"                     
##  [19] "Japan"                             
##  [20] "Austria"                           
##  [21] "Luxembourg"                        
##  [22] "Israel"                            
##  [23] "Korea (Republic of)"               
##  [24] "France"                            
##  [25] "Slovenia"                          
##  [26] "Spain"                             
##  [27] "Czechia"                           
##  [28] "Italy"                             
##  [29] "Malta"                             
##  [30] "Estonia"                           
##  [31] "Greece"                            
##  [32] "Cyprus"                            
##  [33] "Poland"                            
##  [34] "United Arab Emirates"              
##  [35] "Andorra"                           
##  [36] "Lithuania"                         
##  [37] "Qatar"                             
##  [38] "Slovakia"                          
##  [39] "Brunei Darussalam"                 
##  [40] "Saudi Arabia"                      
##  [41] "Latvia"                            
##  [42] "Portugal"                          
##  [43] "Bahrain"                           
##  [44] "Chile"                             
##  [45] "Hungary"                           
##  [46] "Croatia"                           
##  [47] "Argentina"                         
##  [48] "Oman"                              
##  [49] "Russian Federation"                
##  [50] "Montenegro"                        
##  [51] "Bulgaria"                          
##  [52] "Romania"                           
##  [53] "Belarus"                           
##  [54] "Bahamas"                           
##  [55] "Uruguay"                           
##  [56] "Kuwait"                            
##  [57] "Malaysia"                          
##  [58] "Barbados"                          
##  [59] "Kazakhstan"                        
##  [60] "Iran (Islamic Republic of)"        
##  [61] "Palau"                             
##  [62] "Seychelles"                        
##  [63] "Costa Rica"                        
##  [64] "Turkey"                            
##  [65] "Mauritius"                         
##  [66] "Panama"                            
##  [67] "Serbia"                            
##  [68] "Albania"                           
##  [69] "Trinidad and Tobago"               
##  [70] "Antigua and Barbuda"               
##  [71] "Georgia"                           
##  [72] "Saint Kitts and Nevis"             
##  [73] "Cuba"                              
##  [74] "Mexico"                            
##  [75] "Grenada"                           
##  [76] "Sri Lanka"                         
##  [77] "Bosnia and Herzegovina"            
##  [78] "Venezuela (Bolivarian Republic of)"
##  [79] "Brazil"                            
##  [80] "Azerbaijan"                        
##  [81] "Lebanon"                           
##  [82] "The former Yugoslav Republic of"   
##  [83] "Armenia"                           
##  [84] "Thailand"                          
##  [85] "Algeria"                           
##  [86] "China"                             
##  [87] "Ecuador"                           
##  [88] "Ukraine"                           
##  [89] "Peru"                              
##  [90] "Colombia"                          
##  [91] "Saint Lucia"                       
##  [92] "Fiji"                              
##  [93] "Mongolia"                          
##  [94] "Dominican Republic"                
##  [95] "Jordan"                            
##  [96] "Tunisia"                           
##  [97] "Jamaica"                           
##  [98] "Tonga"                             
##  [99] "Saint Vincent and the Grenadines"  
## [100] "Suriname"                          
## [101] "Botswana"                          
## [102] "Maldives"                          
## [103] "Dominica"                          
## [104] "Samoa"                             
## [105] "Uzbekistan"                        
## [106] "Belize"                            
## [107] "Marshall Islands"                  
## [108] "Libya"                             
## [109] "Turkmenistan"                      
## [110] "Gabon"                             
## [111] "Paraguay"                          
## [112] "Moldova (Republic of)"             
## [113] "Philippines"                       
## [114] "South Africa"                      
## [115] "Egypt"                             
## [116] "Indonesia"                         
## [117] "Viet Nam"                          
## [118] "Bolivia (Plurinational State of)"  
## [119] "Palestine, State of"               
## [120] "Iraq"                              
## [121] "El Salvador"                       
## [122] "Kyrgyzstan"                        
## [123] "Morocco"                           
## [124] "Nicaragua"                         
## [125] "Cabo Verde"                        
## [126] "Guyana"                            
## [127] "Guatemala"                         
## [128] "Tajikistan"                        
## [129] "Namibia"                           
## [130] "India"                             
## [131] "Micronesia (Federated States of)"  
## [132] "Timor-Leste"                       
## [133] "Honduras"                          
## [134] "Bhutan"                            
## [135] "Kiribati"                          
## [136] "Bangladesh"                        
## [137] "Congo"                             
## [138] "Vanuatu"                           
## [139] "Lao People's Democratic Republi"   
## [140] "Ghana"                             
## [141] "Equatorial Guinea"                 
## [142] "Kenya"                             
## [143] "Sao Tome and Principe"             
## [144] "Eswatini (Kingdom of)"             
## [145] "Zambia"                            
## [146] "Cambodia"                          
## [147] "Angola"                            
## [148] "Myanmar"                           
## [149] "Nepal"                             
## [150] "Pakistan"                          
## [151] "Cameroon"                          
## [152] "Solomon Islands"                   
## [153] "Papua New Guinea"                  
## [154] "Tanzania (United Republic of)"     
## [155] "Syrian Arab Republic"              
## [156] "Zimbabwe"                          
## [157] "Nigeria"                           
## [158] "Rwanda"                            
## [159] "Lesotho"                           
## [160] "Mauritania"                        
## [161] "Madagascar"                        
## [162] "Uganda"                            
## [163] "Benin"                             
## [164] "Senegal"                           
## [165] "Comoros"                           
## [166] "Togo"                              
## [167] "Sudan"                             
## [168] "Afghanistan"                       
## [169] "Haiti"                             
## [170] "Côte d'Ivoire"                     
## [171] "Malawi"                            
## [172] "Djibouti"                          
## [173] "Ethiopia"                          
## [174] "Gambia"                            
## [175] "Guinea"                            
## [176] "Congo (Democratic Republic of th"  
## [177] "Guinea-Bissau"                     
## [178] "Yemen"                             
## [179] "Eritrea"                           
## [180] "Mozambique"                        
## [181] "Liberia"                           
## [182] "Mali"                              
## [183] "Burkina Faso"                      
## [184] "Sierra Leone"                      
## [185] "Burundi"                           
## [186] "Chad"                              
## [187] "South Sudan"                       
## [188] "Central African Republic"          
## [189] "Niger"

Now we plot names instead of points

plot(x, y, xlab="Dimensión 1", ylab="Dimensión 2", main=titulo, 
     type="n")

# Add the row names as labels at each point's position
text(x, y,labels = rownames(hdi_map$points),cex=0.5)

Now we plot both points and country names

hdi_map_DF=as.data.frame(hdi_map$points)

We plot

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

library(ggrepel)
base=ggplot(hdi_map_DF,aes(x=V1,y=V2))
base+geom_point() + geom_text_repel(aes(label=row.names(hdi_map_DF)),size=2)

The similarity map is ready
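A minimal sketch of how coordinates like these can be obtained, assuming classical multidimensional scaling with base R's `cmdscale` (the step that created `hdi_map` is not shown in this part, so that choice is an assumption). The toy data and the names `toy`, `toy_dist`, `toy_map` and `toy_map_DF` are hypothetical:

```r
# Hypothetical toy data: four "countries" measured on three indicators.
toy <- data.frame(
  lifExp = c(82, 81, 60, 58),
  school = c(13, 12, 5, 4),
  income = c(60, 55, 2, 1),
  row.names = c("A", "B", "C", "D")
)

# Standardize, compute Euclidean distances, and project to two dimensions.
toy_dist <- dist(scale(toy))
toy_map  <- cmdscale(toy_dist, k = 2, eig = TRUE)

# toy_map$points holds one row of coordinates per case (columns V1, V2
# once converted to a data frame).
toy_map_DF <- as.data.frame(toy_map$points)
toy_map_DF
```

Cases with similar profiles (A and B, C and D) should end up close together on the map, which is how the plot above is meant to be read.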

Question 2: Build a model to explain the index of economic freedom. Justify the choice of model and analyze its results.

The model used to answer this question is a linear regression model.

The following variables were chosen for the model. The dependent variable is the Index of Economic Freedom, loaded in step 1 below. The independent variables are Property Rights, Political Rights and the Human Development Index (HDI).

Justification of the model

This work aims to explain the Index of Economic Freedom through a statistical study. The proposals below stem from a literature review of the most important historical debates on economic freedom. The most influential to date is the one between Tocqueville and Marx. This debate matters because it is framed in political terms: political analysis, and rights in particular, is its central axis.

We chose to test Tocqueville's thesis, which holds that democratic institutions and well-established property rights explain the economic freedom of states. The thesis has had many detractors, but it has also been actively built upon. In that vein, Hernando de Soto isolates one component of Tocqueville's argument: he holds that property alone determines economic freedom and, consequently, a healthy development of capitalism. Acemoglu & Robinson isolate a different component: they argue that progress in political rights is what determines the economic freedom of states and, in turn, a sound development of capitalism.

Thus Tocqueville, Acemoglu & Robinson and Hernando de Soto hold that, in general, developed countries tend to perform better on economic freedom. These propositions run directly against Marx's reasoning from the outset. The philosopher held that development would never arise in states with formal (political and property) rights. He disparaged democracy and, above all, property rights. Even so, within the constraints of a work like this one, we opt to test the general Tocquevillian thesis.

Hypotheses

Hypothesis 1: economic freedom is explained by the quality of property rights in the different states.

Hypothesis 2: economic freedom is explained by the progress of political rights in the different states.

Hypothesis 3: economic freedom is explained by the human development of the different states.
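As a sketch, the three hypotheses can be written as one coefficient each in the linear model estimated later. The Greek symbols are introduced here only for illustration; the variable names follow the ones used in the code (`efiscore`, `ProRscore`, `PolRscore`, `hdiF`):

```latex
\text{efiscore}_i = \beta_0
  + \beta_1\,\text{ProRscore}_i
  + \beta_2\,\text{PolRscore}_i
  + \beta_3\,\text{hdiF}_i
  + \varepsilon_i
```

Under this reading, hypotheses 1, 2 and 3 would be supported by positive and statistically significant estimates of \beta_1, \beta_2 and \beta_3 respectively.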

1. Loading and cleaning the data for the dependent variable

The Index of Economic Freedom (efi) is loaded following the approach used in the applied project.

library(htmltab)
link2="https://docs.google.com/spreadsheets/d/e/2PACX-1vRZQXK_LLaAu_G6t1Fttz63BPTuDUhwRRuzXGChv8jg1jnXt5gog4PCPaZr76tc1W7GbpJrtZ9lf-YD/pub?gid=723085611&single=true&output=csv"

efi=read.csv(link2)
efiA=efi #keep a backup copy for safety

Modifying the data

We explore the data to see how the variables behave

names(efi)

##  [1] "Name"                   "index.year"            
##  [3] "overall.score"          "property.rights"       
##  [5] "government.integrity"   "judicial.effectiveness"
##  [7] "tax.burden"             "government.spending"   
##  [9] "fiscal.health"          "business.freedom"      
## [11] "labor.freedom"          "monetary.freedom"      
## [13] "trade.freedom"          "investment.freedom"    
## [15] "financial.freedom"

head(efi)

##          Name index.year overall.score property.rights
## 1 Afghanistan       2019          51.5            19.6
## 2     Albania       2019          66.5            54.8
## 3     Algeria       2019          46.2            31.6
## 4      Angola       2019          50.6            35.9
## 5   Argentina       2019          52.2            47.8
## 6     Armenia       2019          67.7            57.2
##   government.integrity judicial.effectiveness tax.burden
## 1                 25.2                   29.6       91.7
## 2                 40.4                   30.6       86.3
## 3                 28.9                   36.2       76.4
## 4                 20.5                   26.6       83.9
## 5                 33.5                   44.5       69.3
## 6                 38.6                   46.3       84.7
##   government.spending fiscal.health business.freedom labor.freedom
## 1                80.3          99.3             49.2          60.4
## 2                73.9          80.6             69.3          52.7
## 3                48.7          18.7             61.6          49.9
## 4                80.7          58.2             55.7          58.8
## 5                49.5          33.0             56.4          46.9
## 6                79.0          53.0             78.3          71.4
##   monetary.freedom trade.freedom investment.freedom financial.freedom
## 1             76.7          66.0               10.0              10.0
## 2             81.5          87.8               70.0              70.0
## 3             74.9          67.4               30.0              30.0
## 4             55.4          61.2               30.0              40.0
## 5             60.2          70.0               55.0              60.0
## 6             77.8          80.8               75.0              70.0

tail(efi)

##           Name index.year overall.score property.rights
## 181    Vanuatu       2019          56.4            65.9
## 182 Venezuela        2019          25.9             7.6
## 183    Vietnam       2019          55.3            49.8
## 184      Yemen       2019           N/A            19.6
## 185     Zambia       2019          53.6            45.0
## 186   Zimbabwe       2019          40.4            29.7
##     government.integrity judicial.effectiveness tax.burden
## 181                 51.9                   36.4       97.3
## 182                  7.9                   13.1       74.7
## 183                 34.0                   40.3       79.7
## 184                 20.3                   22.2        N/A
## 185                 32.3                   35.6       72.3
## 186                 15.8                   24.8       62.3
##     government.spending fiscal.health business.freedom labor.freedom
## 181                54.1          15.3             52.4          58.8
## 182                58.1          17.6             33.9          28.0
## 183                74.1          40.7             63.5          62.8
## 184                83.7           0.0             45.1          49.8
## 185                80.1          12.3             71.1          46.0
## 186                74.5          23.7             33.4          43.3
##     monetary.freedom trade.freedom investment.freedom financial.freedom
## 181             75.0          64.4               65.0              40.0
## 182              0.0          60.0                0.0              10.0
## 183             68.9          79.2               30.0              40.0
## 184             61.5          71.4               50.0               N/A
## 185             70.3          72.6               55.0              50.0
## 186             72.4          70.0               25.0              10.0

str(efi)

## 'data.frame':    186 obs. of  15 variables:
##  $ Name                  : Factor w/ 186 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ index.year            : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
##  $ overall.score         : Factor w/ 139 levels "25.9","27.8",..: 27 96 9 24 30 100 134 111 91 95 ...
##  $ property.rights       : Factor w/ 129 levels "10.4","19.6",..: 2 59 10 21 44 66 104 114 73 82 ...
##  $ government.integrity  : Factor w/ 90 levels "14.3","15.8",..: 12 51 21 7 33 45 79 76 54 64 ...
##  $ judicial.effectiveness: Factor w/ 135 levels "10.0","12.3",..: 17 21 37 13 61 67 132 116 90 82 ...
##  $ tax.burden            : Factor w/ 147 levels "0.0","42.0","43.2",..: 130 112 65 100 35 104 21 8 119 145 ...
##  $ government.spending   : Factor w/ 158 levels "0.0","0.9","14.4",..: 118 95 33 119 37 112 58 7 56 64 ...
##  $ fiscal.health         : Factor w/ 148 levels "0.0","10.7","100.0",..: 146 79 18 44 32 41 97 96 108 29 ...
##  $ business.freedom      : Factor w/ 164 levels "17.7","20.0",..: 26 101 73 49 55 136 154 121 102 108 ...
##  $ labor.freedom         : Factor w/ 155 levels "20.0","28.0",..: 80 49 36 73 28 119 146 113 96 118 ...
##  $ monetary.freedom      : Factor w/ 132 levels "0.0","48.3","49.1",..: 66 95 54 5 11 72 128 95 18 96 ...
##  $ trade.freedom         : Factor w/ 112 levels "0.0","45.0","47.0",..: 38 104 42 20 50 83 103 97 61 93 ...
##  $ investment.freedom    : Factor w/ 21 levels "0.0","10.0","15.0",..: 2 15 6 6 12 16 17 19 13 16 ...
##  $ financial.freedom     : Factor w/ 11 levels "0.0","10.0","20.0",..: 2 8 4 5 7 8 10 8 7 9 ...

We clean and modify the whole efi data set.

efi=efi[-c(76,94,95,151,160,184),] #drop the rows with missing data

efi=efi[-c(2)] #drop the column we do not need (index.year)

#normalize the decimal point in each variable; as written the call leaves the
#text unchanged, but gsub also turns each factor into character, which the
#as.numeric conversion below relies on
efi$overall.score=gsub("\\.", ".", efi$overall.score)
efi$property.rights=gsub("\\.", ".", efi$property.rights)
efi$government.integrity=gsub("\\.", ".", efi$government.integrity)
efi$judicial.effectiveness=gsub("\\.", ".", efi$judicial.effectiveness)
efi$tax.burden=gsub("\\.", ".", efi$tax.burden)
efi$government.spending=gsub("\\.", ".", efi$government.spending)
efi$fiscal.health=gsub("\\.", ".", efi$fiscal.health)
efi$business.freedom=gsub("\\.", ".", efi$business.freedom)
efi$labor.freedom=gsub("\\.", ".", efi$labor.freedom)
efi$monetary.freedom=gsub("\\.", ".", efi$monetary.freedom)
efi$trade.freedom=gsub("\\.", ".", efi$trade.freedom)
efi$investment.freedom=gsub("\\.", ".", efi$investment.freedom)
efi$financial.freedom=gsub("\\.", ".", efi$financial.freedom)

#rename the variables
names(efi)=c("country","efiscore","propri","govinte","judeff","taxbu","govspen","fishe","busfre","laborfree","monetfree","tradefree","investfree","finanfree") 

#convert the variables to numeric, except the variable 'country'
efi[,c(2:14)]=lapply(efi[,c(2:14)],as.numeric)

#finally, strip whitespace from the variable 'country'
efi$country=trimws(efi$country,whitespace = "[\\h\\v]")

The data is clean
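The cleaning above depends on two details that are easy to miss: the score columns were read as factors because of the "N/A" entries, and `as.numeric` on a factor returns level codes rather than the printed values. A minimal sketch of both points on hypothetical toy data (the objects `csv_text`, `f`, `toy` and `toy_clean` are illustrative only):

```r
# Hypothetical CSV text mimicking the structure above, with "N/A" for missing.
csv_text <- "Name,overall.score\nZambia,53.6\nYemen,N/A\nVanuatu,56.4"

# Pitfall: as.numeric() on a factor returns the internal level codes.
f <- factor(c("53.6", "25.9", "56.4"))
as.numeric(f)                  # level codes, not the scores
as.numeric(as.character(f))    # the actual scores

# Alternative: declare the missing-value marker when reading, so the
# column is parsed as numeric from the start.
toy <- read.csv(text = csv_text, na.strings = "N/A", stringsAsFactors = FALSE)
str(toy)

# Drop incomplete rows by condition rather than by row position.
toy_clean <- toy[complete.cases(toy), ]
toy_clean
```

Filtering with `complete.cases` would avoid hard-coding row numbers, which stop being valid if the source spreadsheet changes.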

2. Loading and cleaning the data for the independent variables

We start by loading the Property Rights data, following the approach used in the applied project.

library(htmltab)
link3="https://docs.google.com/spreadsheets/d/e/2PACX-1vQQEUiDpE7JB17-qRdyr4Ay75hqvumyWMFgJd5UTauRdVvuuCnXVkQkAYSmEOaDf_Xm_0dCeow7jObh/pub?gid=0&single=true&output=csv"

pror=read.csv(link3)
prorA=pror #keep a backup copy for safety

Exploring the data

head(pror)

##        Name                                  Region Score Annual.Change
## 1   Albania Central Eastern Europe And Central Asia 4.525           703
## 2   Algeria            Middle East And North Africa 4.140           -20
## 3 Argentina               Latin America & Caribbean 5.025           457
## 4   Armenia Central Eastern Europe And Central Asia 4.714           588
## 5 Australia                        Asia And Oceania 8.329            85
## 6   Austria                          Western Europe 8.004            -8
##   Global.Rank Regional.Rank Year Compare
## 1         102            21 2018 Compare
## 2         113            14 2018 Compare
## 3          79            11 2018 Compare
## 4          95            18 2018 Compare
## 5           7             3 2018 Compare
## 6          15             9 2018 Compare

tail(pror)

##            Name                       Region Score Annual.Change
## 120     Uruguay    Latin America & Caribbean 6.191          -221
## 121   Venezuela    Latin America & Caribbean 2.975           -82
## 122     Vietnam             Asia And Oceania 5.075           145
## 123 Yemen, Rep. Middle East And North Africa 2.792            64
## 124      Zambia                       Africa 4.732          -185
## 125    Zimbabwe                       Africa 3.844            84
##     Global.Rank Regional.Rank Year Compare
## 120          43             3 2018 Compare
## 121         123            19 2018 Compare
## 122          76            15 2018 Compare
## 123         124            15 2018 Compare
## 124          92            11 2018 Compare
## 125         117            23 2018 Compare

str(pror)

## 'data.frame':    125 obs. of  8 variables:
##  $ Name         : Factor w/ 125 levels "Albania","Algeria",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Region       : Factor w/ 7 levels "Africa","Asia And Oceania",..: 3 5 4 3 2 7 3 5 2 7 ...
##  $ Score        : num  4.53 4.14 5.03 4.71 8.33 ...
##  $ Annual.Change: num  703 -20 457 588 85 ...
##  $ Global.Rank  : int  102 113 79 95 7 15 78 45 122 18 ...
##  $ Regional.Rank: int  21 14 11 18 3 9 15 7 19 11 ...
##  $ Year         : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
##  $ Compare      : Factor w/ 1 level "Compare": 1 1 1 1 1 1 1 1 1 1 ...

names(pror)

## [1] "Name"          "Region"        "Score"         "Annual.Change"
## [5] "Global.Rank"   "Regional.Rank" "Year"          "Compare"

We clean the data

pror=pror[-c(2,4,5,6,7,8)] #drop the columns we do not need

#rename the variables and strip whitespace
names(pror)=c("country","ProRscore")
library(stringr)
pror$country=trimws(pror$country,whitespace = "[\\h\\v]")

#rename one country so that the merge can match it
pror[119,1]="United States" 

#reset the row index
row.names(pror)=NULL

The data is ready

Now we load the Political Rights data, following the approach used in the applied project.

library(htmltab)
link4="https://docs.google.com/spreadsheets/d/e/2PACX-1vTH3-CvIQK1311LiyuZ0gNVgBuEdZq3JYYj7LiHgGl1Ye--0ZvU0qntUBQohN8gRg/pub?gid=1951874879&single=true&output=csv"

polr=read.csv(link4)
polrA=polr #keep a backup copy for safety

Exploring the data

head(polr)

##              X              X.1
## 1      Country Political Rights
## 2  Afghanistan                5
## 3      Albania                3
## 4      Algeria                6
## 5      Andorra                1
## 6       Angola                6

tail(polr)

##               X X.1
## 191  Uzbekistan   7
## 192     Vanuatu   2
## 193   Venezuela   7
## 194     Vietnam   7
## 195       Yemen   7
## 196      Zambia   4

str(polr)

## 'data.frame':    196 obs. of  2 variables:
##  $ X  : Factor w/ 196 levels "","  Switzerland",..: 191 3 4 5 6 7 8 9 10 11 ...
##  $ X.1: Factor w/ 9 levels "","1","2","3",..: 9 6 4 7 2 7 3 3 5 2 ...

Cleaning and modifying the data

names(polr)=c("country","PolRscore") #rename the variables
polr=polr[-c(1,26),] #drop these 2 unnecessary rows
row.names(polr)=NULL #reset the row index

#strip whitespace from both variables
library(stringr)
polr$country=trimws(polr$country,whitespace = "[\\h\\v]")
polr$PolRscore=trimws(polr$PolRscore,whitespace = "[\\h\\v]")

#convert the variables to their final types
polr$country=as.character(polr$country)
polr$PolRscore=as.numeric(polr$PolRscore)
summary(polr)

##    country            PolRscore    
##  Length:194         Min.   :1.000  
##  Class :character   1st Qu.:1.000  
##  Mode  :character   Median :3.000  
##                     Mean   :3.443  
##                     3rd Qu.:5.000  
##                     Max.   :7.000

#reverse-code the counterintuitive variable PolRscore so that higher values mean more political rights
library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

polr$PolRscore=recode(polr$PolRscore,"1=7;2=6;3=5;4=4;5=3;6=2;7=1")

polr[113,1]="Micronesia" #rename one country so that the merge can match it

The data is ready
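The recoding above maps 1 to 7, 2 to 6, and so on. As a sketch, the same reversal of a 1 to 7 scale can be done with plain arithmetic, without any package; the vector `x` is hypothetical:

```r
# Hypothetical scores on the original 1-7 scale (1 = most rights).
x <- c(1, 3, 7, 4, NA)

# Reversing a 1-7 scale: new = (min + max) - old = 8 - old.
x_rev <- 8 - x
x_rev
```

Missing values stay missing, and the midpoint 4 maps to itself, as in the recode string used above.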

Because a linear regression model will be used, we considered it appropriate to standardize the values of the variables into Z-scores. First, the data sets 'efi', 'polr' and 'pror' are merged and then standardized. Second, we take the data resulting from the factor analysis of 'hdi' (Human Development Index), since it is considered mathematically more precise; that data set is merged at the end.

We prepare the first merges

merge1=merge(efi,polr,by="country",all.x=T)
merge2=merge(merge1,pror,by="country",all.x=T)
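Since the merges match on exact country spellings, it can help to list the names that will not match before joining. A minimal sketch on hypothetical toy tables (the spellings "Viet Nam" and "Yemen, Rep." appear in the outputs above; the scores and the objects `left`, `right`, `unmatched` and `joined` are made up for illustration):

```r
# Hypothetical toy tables that share a "country" key.
left  <- data.frame(country = c("Peru", "Vietnam", "Yemen"),
                    efiscore = c(67.8, 55.3, NA),
                    stringsAsFactors = FALSE)
right <- data.frame(country = c("Peru", "Viet Nam", "Yemen, Rep."),
                    score = c(5.2, 5.1, 2.8),
                    stringsAsFactors = FALSE)

# Countries in the left table with no exact match on the right.
unmatched <- setdiff(left$country, right$country)
unmatched

# A left join keeps every row of `left`; unmatched rows get NA.
joined <- merge(left, right, by = "country", all.x = TRUE)
sum(is.na(joined$score))
```

Each name returned by `setdiff` is either a spelling to harmonize or a country that is truly absent from the second source.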

Because the merge produces a large number of NAs, and in order not to lose so many cases, we decided to impute them with the column mean.

for(i in 2:ncol(merge2)){  # for each column:
  MEDIA=mean(merge2[,i], na.rm = TRUE) # compute the mean of that column.
 merge2[is.na(merge2[,i]), i] <- MEDIA # put the mean wherever that column has an NA 
}

summary(merge2)

##    country             efiscore         propri         govinte     
##  Length:180         Min.   : 5.90   Min.   : 7.60   Min.   : 7.90  
##  Class :character   1st Qu.:53.95   1st Qu.:37.35   1st Qu.:28.10  
##  Mode  :character   Median :60.75   Median :52.25   Median :36.50  
##                     Mean   :60.77   Mean   :53.03   Mean   :42.15  
##                     3rd Qu.:67.80   3rd Qu.:65.90   3rd Qu.:50.35  
##                     Max.   :90.20   Max.   :97.40   Max.   :96.70  
##      judeff          taxbu          govspen          fishe       
##  Min.   : 5.00   Min.   : 0.00   Min.   : 0.00   Min.   :  0.00  
##  1st Qu.:31.45   1st Qu.:70.97   1st Qu.:51.90   1st Qu.: 42.12  
##  Median :43.10   Median :78.05   Median :68.90   Median : 80.60  
##  Mean   :45.54   Mean   :77.21   Mean   :64.52   Mean   : 66.91  
##  3rd Qu.:55.55   3rd Qu.:85.42   3rd Qu.:82.30   3rd Qu.: 91.62  
##  Max.   :92.40   Max.   :99.80   Max.   :96.60   Max.   :100.00  
##      busfre        laborfree       monetfree       tradefree    
##  Min.   : 5.00   Min.   : 5.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.:55.45   1st Qu.:50.70   1st Qu.:72.38   1st Qu.:66.75  
##  Median :65.30   Median :60.20   Median :77.85   Median :76.30  
##  Mean   :64.05   Mean   :59.58   Mean   :75.39   Mean   :74.43  
##  3rd Qu.:75.12   3rd Qu.:68.83   3rd Qu.:81.85   3rd Qu.:84.40  
##  Max.   :96.40   Max.   :91.00   Max.   :88.00   Max.   :95.00  
##    investfree      finanfree       PolRscore       ProRscore    
##  Min.   : 0.00   Min.   : 0.00   Min.   :1.000   Min.   :2.733  
##  1st Qu.:45.00   1st Qu.:30.00   1st Qu.:3.000   1st Qu.:5.056  
##  Median :60.00   Median :50.00   Median :5.000   Median :5.762  
##  Mean   :57.75   Mean   :48.61   Mean   :4.535   Mean   :5.762  
##  3rd Qu.:75.00   3rd Qu.:60.00   3rd Qu.:6.000   3rd Qu.:6.093  
##  Max.   :95.00   Max.   :90.00   Max.   :7.000   Max.   :8.692
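The imputation loop can also be written as a small function, and it is worth checking how much of each column is missing before filling it. A sketch on hypothetical data (`toy` and `impute_mean` are illustrative names, not objects from the analysis):

```r
# Hypothetical toy data with missing values.
toy <- data.frame(country = c("A", "B", "C", "D"),
                  s1 = c(1, NA, 3, 5),
                  s2 = c(NA, 2, 4, NA),
                  stringsAsFactors = FALSE)

# Share of missing values per numeric column, checked before imputing.
colMeans(is.na(toy[-1]))

# Replace each NA with the mean of the observed values in its column.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}
toy[-1] <- lapply(toy[-1], impute_mean)
toy
```

Mean imputation pulls the distribution toward its center. In the summary above, the median of ProRscore equals its mean (5.762), which is consistent with a sizeable share of imputed values in that column.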

As indicated, we now standardize 'merge2'

merge2_s=as.data.frame(scale(merge2[-1]))

Because the country column was dropped in the previous operation, we add it back

countriesm2=c(merge2$country)
merge2_s$country=c(merge2$country)

The data is ready
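As a quick check of what the standardization does, the sketch below compares `scale` with the Z-score formula computed by hand. The values are the first five overall scores printed by `head(efi)`; the object names are illustrative:

```r
# First five overall scores from the head(efi) output.
x <- c(51.5, 66.5, 46.2, 50.6, 52.2)

# Z-score by hand: subtract the mean, divide by the sample standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops the attributes.
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)
```

After this transformation each variable has mean 0 and standard deviation 1, so regression coefficients are expressed in standard-deviation units.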

The HDI data ('hdi') had already been loaded for the previous question; however, its country names do not match those in 'merge2_s', so we modify them.

#First, we make a copy of the data 'hdi_nuevo' for safety and work with it
hdiY=hdi_nuevo 

#Now, we reset the row index
row.names(hdiY)=NULL

#Now, we add the country variable
country=c(hdi$country) #create a vector with the country names
hdiY$country=c(hdi$country) #add the new variable

Now we change the country names so that the merge can be done

hdiY[7,6]="Hong Kong"
hdiY[23,6]="South Korea"
hdiY[27,6]="Czech Republic"
hdiY[49,6]="Russia"
hdiY[60,6]="Iran"
hdiY[78,6]="Venezuela"
hdiY[112,6]="Moldova"
hdiY[118,6]="Bolivia"
hdiY[131,6]="Micronesia"
hdiY[137,6]="Republic of Congo"
hdiY[139,6]="Laos"
hdiY[154,6]="Tanzania"
hdiY[155,6]="Syria"
hdiY[176,6]="Democratic Republic of Congo"
hdiY[190,6]="North Korea"
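Renaming by row and column position works only while the row order stays fixed. As an alternative sketch, the same kind of renaming can be driven by a lookup table keyed on the old spelling; the three long names below appear in the country list printed earlier, while `toy` and `fixes` are hypothetical objects:

```r
# Hypothetical mini version of the country column.
toy <- data.frame(
  country = c("Norway", "Venezuela (Bolivarian Republic of)",
              "Moldova (Republic of)", "Bolivia (Plurinational State of)"),
  stringsAsFactors = FALSE
)

# Lookup table: names = spelling in this data set, values = spelling wanted.
fixes <- c(
  "Venezuela (Bolivarian Republic of)" = "Venezuela",
  "Moldova (Republic of)"              = "Moldova",
  "Bolivia (Plurinational State of)"   = "Bolivia"
)

# Replace only the countries found in the lookup table.
hit <- toy$country %in% names(fixes)
toy$country[hit] <- unname(fixes[toy$country[hit]])
toy$country
```

With this approach a change in the ranking order of the source table would not silently rename the wrong country.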

Merge

We merge the data sets 'merge2_s' and 'hdiY'

merge_total=merge(merge2_s,hdiY,by="country",all.x=T)

Because some cases have missing values, we chose to impute them with the column mean.

for(i in 2:ncol(merge_total)){  # for each column:
  MEDIA=mean(merge_total[,i], na.rm = TRUE) # compute the mean of that column.
 merge_total[is.na(merge_total[,i]), i] <- MEDIA # put the mean wherever that column has an NA 
}

summary(merge_total)

##    country             efiscore             propri        
##  Length:180         Min.   :-4.874705   Min.   :-2.35232  
##  Class :character   1st Qu.:-0.605766   1st Qu.:-0.81195  
##  Mode  :character   Median :-0.001629   Median :-0.04047  
##                     Mean   : 0.000000   Mean   : 0.00000  
##                     3rd Qu.: 0.624719   3rd Qu.: 0.66629  
##                     Max.   : 2.614818   Max.   : 2.29727  
##     govinte            judeff            taxbu             govspen       
##  Min.   :-1.7462   Min.   :-2.2620   Min.   :-5.84577   Min.   :-2.8320  
##  1st Qu.:-0.7164   1st Qu.:-0.7860   1st Qu.:-0.47226   1st Qu.:-0.5538  
##  Median :-0.2882   Median :-0.1359   Median : 0.06339   Median : 0.1925  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.4179   3rd Qu.: 0.5588   3rd Qu.: 0.62175   3rd Qu.: 0.7807  
##  Max.   : 2.7809   Max.   : 2.6151   Max.   : 1.71008   Max.   : 1.4084  
##      fishe             busfre           laborfree         monetfree      
##  Min.   :-2.1447   Min.   :-3.77900   Min.   :-3.7707   Min.   :-6.8628  
##  1st Qu.:-0.7945   1st Qu.:-0.55040   1st Qu.:-0.6137   1st Qu.:-0.2742  
##  Median : 0.4387   Median : 0.07996   Median : 0.0426   Median : 0.2242  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.7921   3rd Qu.: 0.70872   3rd Qu.: 0.6384   3rd Qu.: 0.5884  
##  Max.   : 1.0606   Max.   : 2.07024   Max.   : 2.1703   Max.   : 1.1482  
##    tradefree         investfree        finanfree         PolRscore      
##  Min.   :-6.1220   Min.   :-2.6291   Min.   :-2.5060   Min.   :-1.6733  
##  1st Qu.:-0.6316   1st Qu.:-0.5805   1st Qu.:-0.9594   1st Qu.:-0.7266  
##  Median : 0.1540   Median : 0.1024   Median : 0.0716   Median : 0.2202  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.8202   3rd Qu.: 0.7853   3rd Qu.: 0.5871   3rd Qu.: 0.6936  
##  Max.   : 1.6921   Max.   : 1.6958   Max.   : 2.1337   Max.   : 1.1669  
##    ProRscore           lifExp          expecSchool      
##  Min.   :-2.6241   Min.   :-2.61377   Min.   :-2.67440  
##  1st Qu.:-0.6119   1st Qu.:-0.61670   1st Qu.:-0.52973  
##  Median : 0.0000   Median : 0.14284   Median : 0.04474  
##  Mean   : 0.0000   Mean   : 0.01547   Mean   : 0.04474  
##  3rd Qu.: 0.2869   3rd Qu.: 0.68630   3rd Qu.: 0.66176  
##  Max.   : 2.5381   Max.   : 1.56370   Max.   : 3.28302  
##    timeSchool           income              hdiF         
##  Min.   :-2.27594   Min.   :-0.88598   Min.   :-2.11143  
##  1st Qu.:-0.72689   1st Qu.:-0.66897   1st Qu.:-0.61959  
##  Median : 0.03371   Median :-0.25840   Median : 0.09808  
##  Mean   : 0.01979   Mean   : 0.01642   Mean   : 0.02758  
##  3rd Qu.: 0.86250   3rd Qu.: 0.35928   3rd Qu.: 0.68624  
##  Max.   : 1.79032   Max.   : 5.03313   Max.   : 2.12110

The final merge is ready

3. Linear regression model

We format the index and choose the variables to use in the model: 'efiscore', 'PolRscore', 'ProRscore' and 'hdiF'

names(merge_total)

##  [1] "country"     "efiscore"    "propri"      "govinte"     "judeff"     
##  [6] "taxbu"       "govspen"     "fishe"       "busfre"      "laborfree"  
## [11] "monetfree"   "tradefree"   "investfree"  "finanfree"   "PolRscore"  
## [16] "ProRscore"   "lifExp"      "expecSchool" "timeSchool"  "income"     
## [21] "hdiF"

#use the country names as the row index
row.names(merge_total)=merge_total$country
merge_total=merge_total[-1]

We create the subset linealData and run the regression

names(merge_total)

##  [1] "efiscore"    "propri"      "govinte"     "judeff"      "taxbu"      
##  [6] "govspen"     "fishe"       "busfre"      "laborfree"   "monetfree"  
## [11] "tradefree"   "investfree"  "finanfree"   "PolRscore"   "ProRscore"  
## [16] "lifExp"      "expecSchool" "timeSchool"  "income"      "hdiF"

linealData= merge_total[c(1,14,15,20)]
rLineal=lm(efiscore~.,data=linealData)

Now we look at the summary of the regression

summary(rLineal)

## 
## Call:
## lm(formula = efiscore ~ ., data = linealData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5709 -0.3509  0.0365  0.4662  1.6542 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.008958   0.053401  -0.168  0.86697    
## PolRscore    0.181550   0.061594   2.948  0.00364 ** 
## ProRscore    0.357411   0.069458   5.146 7.07e-07 ***
## hdiF         0.324801   0.075690   4.291 2.93e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7159 on 176 degrees of freedom
## Multiple R-squared:  0.4961, Adjusted R-squared:  0.4875 
## F-statistic: 57.75 on 3 and 176 DF,  p-value: < 2.2e-16

As can be seen, all three independent variables are significant at the 0.01 level, and the model explains about half of the variance of the index (adjusted R-squared of 0.4875).
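To show how the coefficient table can be read and reused, here is a minimal sketch on simulated data. Everything in it (`x1`, `x2`, `y`, `sim`, `fit`, the true slopes 0.5 and 0.3) is hypothetical and unrelated to the real indices:

```r
set.seed(1)
n <- 200
# Simulated predictors and an outcome built from them plus noise.
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(n)

# Standardize everything, as was done before the regression above.
sim <- as.data.frame(scale(data.frame(y, x1, x2)))
fit <- lm(y ~ ., data = sim)

# Coefficient table: estimate, standard error, t value, p-value.
coefs <- coef(summary(fit))
round(coefs, 3)

# With all variables standardized on the same sample,
# the intercept is zero up to rounding error.
coefs["(Intercept)", "Estimate"]
```

In the output above the intercept is about -0.009 rather than exactly zero, presumably because the HDI variables were standardized on a different set of countries and then mean-imputed (their means in `summary(merge_total)` are not zero).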

5. Linear regression diagnostics

Linearity: with this check we see whether the linear model is applicable

plot(rLineal, 1)

The cases keep a linear pattern and do not stray far from the red line, so the linear model is applicable.

Homoscedasticity: with this check we verify whether the spread of the residuals stays constant across the fitted values. In the plot, that means the red line should stay roughly flat instead of climbing steadily.

plot(rLineal, 3)

The red line shows no sharp upward trend.

Next we run the Breusch-Pagan test. Its null hypothesis is that the model is homoscedastic.

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

#Breusch-Pagan test
bptest(rLineal)

## 
##  studentized Breusch-Pagan test
## 
## data:  rLineal
## BP = 10.175, df = 3, p-value = 0.01713

The p-value is 0.01713, below 0.05, so the null hypothesis is rejected: the model is heteroscedastic.
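For reference, the studentized statistic can be reproduced with base R alone. The sketch below assumes the default studentized (Koenker) version that the output header names, and uses simulated data in which the noise grows with the predictor; none of these objects come from the analysis:

```r
set.seed(2)
n <- 300
x <- runif(n, 1, 5)
# Simulated heteroscedastic data: the noise spread grows with x.
y <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the predictors.
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) statistic: n times the auxiliary R-squared,
# compared with a chi-squared with one df per predictor.
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
c(statistic = bp_stat, df = bp_df, p.value = bp_p)
```

A small p-value means the squared residuals are predictable from the regressors, which is evidence against constant variance.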

Normality of the residuals: with this check we see whether the residuals lie close to the straight line of the Q-Q plot. The closer they are to the line, the better.

plot(rLineal, 2)

For the most part, the residuals lie quite close to the line.

Shapiro test: with this test we evaluate the null hypothesis that the residuals are normal.

shapiro.test(rLineal$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  rLineal$residuals
## W = 0.88064, p-value = 8.809e-11

The p-value is 8.809e-11. Therefore, the null hypothesis is rejected: the residuals are not normal.
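The Shapiro-Wilk test is sensitive to a few extreme residuals, and the regression summary above reports a minimum residual of -4.57 against a first quartile of -0.35. A small simulated sketch of that sensitivity (the vectors and the value -6 are hypothetical):

```r
set.seed(3)
# 180 simulated residuals drawn from a normal distribution.
normal_res <- rnorm(180)

# Same residuals with one value swapped for an extreme negative one.
with_outlier <- c(normal_res[-1], -6)

p_normal  <- shapiro.test(normal_res)$p.value
p_outlier <- shapiro.test(with_outlier)$p.value
c(normal = p_normal, with_outlier = p_outlier)
```

A single extreme case can be enough to lower the p-value sharply, so a rejection does not by itself say that most residuals depart from normality.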

No multicollinearity: this check detects whether any variable is redundant. The value the test reports for each variable should be below 5.

library(DescTools)

## 
## Attaching package: 'DescTools'

## The following object is masked from 'package:car':
## 
##     Recode

## The following objects are masked from 'package:psych':
## 
##     AUC, ICC, SD

VIF(rLineal)

## PolRscore ProRscore      hdiF 
##  1.325018  1.684974  1.773479

The values shown are well below 5. Therefore, there is no multicollinearity.
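The variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. A base-R sketch on simulated predictors (all names are hypothetical) that computes it by hand:

```r
set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # correlated with x1 by construction
x3 <- rnorm(n)              # independent of the other two

# VIF of predictor j = 1 / (1 - R^2 of x_j regressed on the other predictors).
preds <- data.frame(x1, x2, x3)
vif_manual <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  r2 <- summary(lm(reformulate(others, response = v), data = preds))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 3)
```

A value of 1 means the predictor is uncorrelated with the rest; the values of 1.3 to 1.8 reported above indicate only mild overlap among the three indices.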

Influential values: this check looks for atypical cases that influence the regression model. The closer a case lies to the Cook's distance line, the more influential it is.

plot(rLineal, 5)

Three atypical cases lie very close to the Cook's distance line: North Korea, Cuba and Venezuela.
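Besides reading the plot, influential cases can be listed numerically. The sketch below uses simulated data with one planted extreme case and the 4/n rule of thumb, which is one common convention rather than a fixed standard; all objects are hypothetical:

```r
set.seed(5)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
# Hypothetical extreme case: far out in x and far below the trend in y.
x[n] <- 4
y[n] <- -6
fit <- lm(y ~ x)

# Cook's distance per case; 4/n is one common rule-of-thumb cutoff.
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)

# The three largest distances, with their case labels.
head(sort(cd, decreasing = TRUE), 3)
```

Applied to a model whose row names are country names, as in `merge_total`, the names of the flagged cases would be the countries themselves.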

6. CONCLUSION

Running the regression model shows that the three hypotheses hold. In this sense, economic freedom can be explained by property rights and political rights. Likewise, countries with a higher human development index have greater economic freedom.

However, the regression diagnostics show that the model fails both the Breusch-Pagan test and the Shapiro test: it is heteroscedastic and its residuals are not normal. This may be explained by the presence of three atypical cases, North Korea, Cuba and Venezuela, which also lie quite close to the Cook's distance line in the influential-values plot. These three countries have left-wing dictatorial regimes, so their values on each variable differ considerably from those of the other countries.

*Accent marks were omitted in the Spanish original