Read the data file and assign it to the object df. Check that it was read correctly.

df <- read.table( "http://gauss.inf.um.es/datos/paisesMundoRedC.csv", 
                  header = TRUE,
                  sep = ";",
                  dec = ".",
                  stringsAsFactors = FALSE )

head(df)
##                  Country                  EPI_regions
## AGO               Angola           Sub-Saharan Africa
## ALB              Albania    Central and Eastern Europ
## ARE United Arab Emirates Middle East and North Africa
## ARG            Argentina    Latin America and Caribbe
## ARM              Armenia Middle East and North Africa
## AUS            Australia    East Asia and the Pacific
##               GEO_subregion Population2005 GDP_capita.MRYA   landarea  EPI
## AGO         Southern Africa        15941.4          2314.4 1251895.62 39.5
## ALB          Central Europe         3129.7          4955.3   28346.12 84.0
## ARE       Arabian Peninsula         4495.8         22698.3   74776.60 64.0
## ARG           South America        38747.2         13652.4 2736296.00 81.8
## ARM          Eastern Europe         3016.3          5011.0   28272.73 77.8
## AUS Australia + New Zealand        20155.1         30677.9 7634643.84 79.8
##     FOREST FISH AGRICULTURE
## AGO   95.4 87.3        61.3
## ALB  100.0 62.5        75.6
## ARE  100.0 50.0        72.3
## ARG   75.9 58.8        79.9
## ARM   70.1   NA        94.2
## AUS  100.0 96.7        78.7
dim(df)
## [1] 149  10
str(df)
## 'data.frame':    149 obs. of  10 variables:
##  $ Country        : chr  "Angola" "Albania" "United Arab Emirates" "Argentina" ...
##  $ EPI_regions    : chr  "Sub-Saharan Africa" "Central and Eastern Europ" "Middle East and North Africa" "Latin America and Caribbe" ...
##  $ GEO_subregion  : chr  "Southern Africa" "Central Europe" "Arabian Peninsula" "South America" ...
##  $ Population2005 : num  15941 3130 4496 38747 3016 ...
##  $ GDP_capita.MRYA: num  2314 4955 22698 13652 5011 ...
##  $ landarea       : num  1251896 28346 74777 2736296 28273 ...
##  $ EPI            : num  39.5 84 64 81.8 77.8 79.8 89.4 72.2 54.7 78.4 ...
##  $ FOREST         : num  95.4 100 100 75.9 70.1 100 100 100 0 100 ...
##  $ FISH           : num  87.3 62.5 50 58.8 NA 96.7 NA NA NA 47.4 ...
##  $ AGRICULTURE    : num  61.3 75.6 72.3 79.9 94.2 78.7 76.4 71.4 95.9 80.8 ...
  1. What do the data look like? What structure do they have? What information does the variable GEO_subregion contain? Comment on it appropriately.

The data set contains 149 observations of 10 variables: 3 are character strings (Country, EPI_regions and GEO_subregion) and the remaining 7 are numeric. Each row is one country, identified by a three-letter code as its row name. GEO_subregion is a character variable that gives the geographic subregion each country belongs to (for example, "Southern Africa" or "South America").
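The structural description above can also be obtained programmatically. The following is a minimal sketch, assuming a small stand-in data frame (`toy`, a hypothetical name) built from a few of the rows printed by head(df), so that it runs without downloading the file.

```r
# Stand-in for df: a few columns and rows copied from the head(df) output.
# `toy` is a hypothetical name used only for this illustration.
toy <- data.frame(
  Country          = c("Angola", "Albania", "Armenia"),
  GEO_subregion    = c("Southern Africa", "Central Europe", "Eastern Europe"),
  EPI              = c(39.5, 84.0, 77.8),
  FISH             = c(87.3, 62.5, NA),
  row.names        = c("AGO", "ALB", "ARM"),
  stringsAsFactors = FALSE
)

# How many columns there are of each class (character vs. numeric)
class_counts <- table(sapply(toy, class))
print(class_counts)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Distinct values of the grouping variable
subregions <- sort(unique(toy$GEO_subregion))
print(subregions)
```

Applied to the real df, the same three calls would summarise the column types, the missing values and the set of subregions without reading through the full str() listing.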

  1. Use the following code to select the rows for the African countries from our data set. This subset (dfA) is the data set we will work with from now on.
indicesAfrica <- grep("Africa", df$GEO_subregion) 
dfA <- df[indicesAfrica,]
str(dfA)
## 'data.frame':    41 obs. of  10 variables:
##  $ Country        : chr  "Angola" "Burundi" "Benin" "Burkina Faso" ...
##  $ EPI_regions    : chr  "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" ...
##  $ GEO_subregion  : chr  "Southern Africa" "Eastern Africa" "Western Africa" "Western Africa" ...
##  $ Population2005 : num  15941 7548 8439 13228 1765 ...
##  $ GDP_capita.MRYA: num  2314 630 1016 1143 11313 ...
##  $ landarea       : num  1251896 25227 115828 275748 559516 ...
##  $ EPI            : num  39.5 54.7 56.1 44.3 68.7 56 65.2 63.8 47.3 69.7 ...
##  $ FOREST         : num  95.4 0 17.8 64.5 79.2 97.2 100 78.4 94.8 98.4 ...
##  $ FISH           : num  87.3 NA 91.5 NA NA NA 91.2 52.4 46.3 74.1 ...
##  $ AGRICULTURE    : num  61.3 95.9 88.2 87.7 72.3 71.8 88.7 69.9 70.8 99.1 ...
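The selection above relies on grep() returning the integer positions of the elements that match the pattern. As a small sketch, using a hypothetical vector of subregion labels copied from the outputs above, the same selection can be written with grepl(), which returns a logical mask instead:

```r
# Hypothetical vector of labels; the values are taken from the outputs above.
subregion <- c("Southern Africa", "Central Europe", "Arabian Peninsula",
               "South America", "Eastern Africa", "Western Africa")

idx  <- grep("Africa", subregion)    # integer positions of the matches
mask <- grepl("Africa", subregion)   # logical vector, one element per input

print(idx)
print(mask)

# Both forms select the same elements
african <- subregion[mask]
print(african)

# fixed = TRUE treats the pattern as a literal string rather than a regex
idx_fixed <- grep("Africa", subregion, fixed = TRUE)
```

The logical form is convenient when the condition has to be combined with others using & or |; for a plain subset like df[indicesAfrica, ], either form gives the same rows.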
  1. Encode the categorical variables of dfA appropriately, for example with the function factor().
dfA[, 2:3] <- lapply(dfA[, 2:3], factor)
str(dfA)
## 'data.frame':    41 obs. of  10 variables:
##  $ Country        : chr  "Angola" "Burundi" "Benin" "Burkina Faso" ...
##  $ EPI_regions    : Factor w/ 2 levels "Middle East and North Africa",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ GEO_subregion  : Factor w/ 5 levels "Central Africa",..: 4 2 5 5 4 1 5 1 1 1 ...
##  $ Population2005 : num  15941 7548 8439 13228 1765 ...
##  $ GDP_capita.MRYA: num  2314 630 1016 1143 11313 ...
##  $ landarea       : num  1251896 25227 115828 275748 559516 ...
##  $ EPI            : num  39.5 54.7 56.1 44.3 68.7 56 65.2 63.8 47.3 69.7 ...
##  $ FOREST         : num  95.4 0 17.8 64.5 79.2 97.2 100 78.4 94.8 98.4 ...
##  $ FISH           : num  87.3 NA 91.5 NA NA NA 91.2 52.4 46.3 74.1 ...
##  $ AGRICULTURE    : num  61.3 95.9 88.2 87.7 72.3 71.8 88.7 69.9 70.8 99.1 ...
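Note that the factors were created after subsetting, so they only contain the levels present in the African rows. The sketch below, on a hypothetical vector of labels, illustrates what would happen in the opposite order: a factor keeps its unused levels after subsetting until droplevels() is applied.

```r
# Hypothetical labels, including one non-African subregion
subregion <- c("Southern Africa", "Central Europe", "Eastern Africa",
               "Western Africa", "Western Africa")

# Convert first, subset afterwards
f_all    <- factor(subregion)
f_africa <- f_all[grepl("Africa", f_all)]

# "Central Europe" is still listed as a level, with zero observations
print(levels(f_africa))

# droplevels() removes the levels that no longer occur
f_clean <- droplevels(f_africa)
print(table(f_clean))
```

Empty levels would otherwise show up as zero-count rows in summary() and as empty groups in tables and plots.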
  1. Summarise the data set with the function summary(). Comment on the results.
summary(dfA)
##    Country                                EPI_regions
##  Length:41          Middle East and North Africa: 5  
##  Class :character   Sub-Saharan Africa          :36  
##  Mode  :character                                    
##                                                      
##                                                      
##                                                      
##                                                      
##          GEO_subregion Population2005     GDP_capita.MRYA  
##  Central Africa : 6    Min.   :   793.1   Min.   :  629.8  
##  Eastern Africa : 7    1st Qu.:  5525.5   1st Qu.: 1008.1  
##  Northern Africa: 5    Median : 12883.9   Median : 1312.8  
##  Southern Africa:10    Mean   : 21030.0   Mean   : 2506.2  
##  Western Africa :13    3rd Qu.: 28816.2   3rd Qu.: 2299.1  
##                        Max.   :131529.7   Max.   :11313.3  
##                                                            
##     landarea            EPI            FOREST            FISH      
##  Min.   :  17410   Min.   :39.10   Min.   :  0.00   Min.   :23.90  
##  1st Qu.: 147882   1st Qu.:51.30   1st Qu.: 73.30   1st Qu.:72.60  
##  Median : 403759   Median :59.40   Median : 86.40   Median :79.10  
##  Mean   : 642219   Mean   :59.16   Mean   : 78.51   Mean   :75.11  
##  3rd Qu.: 968072   3rd Qu.:69.00   3rd Qu.: 98.40   3rd Qu.:87.05  
##  Max.   :2492385   Max.   :78.10   Max.   :100.00   Max.   :91.60  
##                                                     NA's   :14     
##   AGRICULTURE   
##  Min.   :53.00  
##  1st Qu.:69.30  
##  Median :73.90  
##  Mean   :74.87  
##  3rd Qu.:81.60  
##  Max.   :99.10  
## 

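For the variable Country, summary() only reports the length of the vector, that is, the total number of countries (41). For the region and subregion factors it reports the number of countries in each level: 36 of the 41 countries belong to Sub-Saharan Africa, and Western Africa is the largest subregion, with 13 countries.

As for the numeric variables, summary() gives the minimum, the quartiles, the median, the mean and the maximum. Population2005, GDP_capita.MRYA and landarea have means well above their medians (for example, 21030.0 versus 12883.9 for Population2005), which points to right-skewed distributions. FISH is the only variable with missing values (14 NA's).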
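When commenting on the numeric variables, two things are worth checking by hand: how many values are missing and how the mean compares with the median. A minimal sketch, using only the first ten FISH values shown by str(dfA) (so the numbers are not those of the full column):

```r
# First ten FISH values as printed by str(dfA); not the full column
fish <- c(87.3, NA, 91.5, NA, NA, NA, 91.2, 52.4, 46.3, 74.1)

# Count the missing values
n_missing <- sum(is.na(fish))
print(n_missing)

# Without na.rm = TRUE a single NA makes the result NA
print(mean(fish))

# With na.rm = TRUE the statistics use only the observed values
fish_mean   <- mean(fish, na.rm = TRUE)
fish_median <- median(fish, na.rm = TRUE)
print(c(mean = fish_mean, median = fish_median))
```

summary() drops the missing values on its own and reports their count, but functions such as mean(), sd() and median() need na.rm = TRUE whenever a column contains NA.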

  1. Select the variables Population2005, landarea and GDP_capita.MRYA and compute the main descriptive statistics for each level of the factor GEO_subregion.
# Load the libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.3.3
dfA2 <- select(dfA, Population2005, landarea, GDP_capita.MRYA, GEO_subregion)

melted <- melt(dfA2, id.vars="GEO_subregion")

summarise(group_by(melted, GEO_subregion, variable),
          Media = mean(value), SD = sd(value), Mediana = median(value))
## Source: local data frame [15 x 5]
## Groups: GEO_subregion [?]
## 
##      GEO_subregion        variable       Media           SD   Mediana
##             <fctr>          <fctr>       <dbl>        <dbl>     <dbl>
## 1   Central Africa  Population2005   15506.650   21282.7174   6893.30
## 2   Central Africa        landarea  875594.577  786167.9759 544316.82
## 3   Central Africa GDP_capita.MRYA    2037.667    1914.4573   1250.05
## 4   Eastern Africa  Population2005   23183.186   27067.0928   9037.70
## 5   Eastern Africa        landarea  300392.153  413610.3654 121862.90
## 6   Eastern Africa GDP_capita.MRYA    1163.486     417.6242   1104.70
## 7  Northern Africa  Population2005   36940.140   23147.1353  32853.80
## 8  Northern Africa        landarea 1262919.154 1079434.0339 968071.46
## 9  Northern Africa GDP_capita.MRYA    4912.280    2209.5782   4346.40
## 10 Southern Africa  Population2005   16388.480   15486.9692  12946.70
## 11 Southern Africa        landarea  676390.913  418232.7095 761220.37
## 12 Southern Africa GDP_capita.MRYA    4057.450    4095.1023   2026.50
## 13  Western Africa  Population2005   19871.100   34051.0490  11658.20
## 14  Western Africa        landarea  453551.048  453693.8606 245860.06
## 15  Western Africa GDP_capita.MRYA    1326.885     561.7618   1142.90
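The same kind of grouped summary can be sketched in base R with aggregate(), without reshaping the data first. The example below is only an illustration under stated assumptions: `toy` is a hypothetical data frame holding the first four rows of dfA with the rounded values printed by str(dfA), and `stats` is a helper name introduced here.

```r
# First four rows of dfA, using the rounded values printed by str(dfA)
toy <- data.frame(
  GEO_subregion   = c("Southern Africa", "Eastern Africa",
                      "Western Africa", "Western Africa"),
  Population2005  = c(15941, 7548, 8439, 13228),
  GDP_capita.MRYA = c(2314, 630, 1016, 1143)
)

# Helper (hypothetical name) returning the three statistics as a named vector
stats <- function(x) c(Media = mean(x), SD = sd(x), Mediana = median(x))

# One row per subregion; each variable becomes a matrix of statistics
res <- aggregate(cbind(Population2005, GDP_capita.MRYA) ~ GEO_subregion,
                 data = toy, FUN = stats)
print(res)
```

With this toy data, the subregions that have a single country get NA as standard deviation, because sd() needs at least two values.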
  1. For the variable AGRICULTURE, compute the mean, median, standard deviation, minimum and maximum for each level of the factor GEO_subregion. Give the columns meaningful labels.
library(tables)
## Warning: package 'tables' was built under R version 3.3.3
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 3.3.3
## Loading required package: lattice
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.3.3
## Loading required package: Formula
## Warning: package 'Formula' was built under R version 3.3.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.3
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     combine, src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
tabular((Subregión = GEO_subregion) ~ (Agricultura = AGRICULTURE) *
          ((Media = mean) + (Mediana = median) + (SD = sd) + (Mín. = min) + (Máx. = max)),
        data = dfA)
##                                                      
##                  Agricultura                         
##  Subregión       Media       Mediana SD     Mín. Máx.
##  Central Africa  79.28       76.75   11.174 69.9 99.1
##  Eastern Africa  77.41       78.00   12.403 54.4 95.9
##  Northern Africa 66.04       68.40    8.136 53.0 74.8
##  Southern Africa 69.74       71.80    4.681 61.3 74.7
##  Western Africa  78.82       78.80    7.131 65.9 88.7
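For comparison, a table with the same layout can be sketched in base R by splitting the variable by group and applying the five functions to each piece. The vectors below hold only the first ten countries of dfA (values from str(dfA), with the subregions decoded from the factor codes shown there), so the figures differ from the full table above.

```r
# First ten AGRICULTURE values and their subregions, as shown by str(dfA)
agri <- c(61.3, 95.9, 88.2, 87.7, 72.3, 71.8, 88.7, 69.9, 70.8, 99.1)
sub  <- c("Southern Africa", "Eastern Africa", "Western Africa",
          "Western Africa", "Southern Africa", "Central Africa",
          "Western Africa", "Central Africa", "Central Africa",
          "Central Africa")

# One row per subregion, one labelled column per statistic
tab <- t(sapply(split(agri, sub), function(x) {
  c(Mean = mean(x), Median = median(x), SD = sd(x),
    Min = min(x), Max = max(x))
}))

print(round(tab, 2))
```

The names given inside c() become the column labels, which plays the same role as the (Label = function) syntax in the tabular() formula.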
  1. Print the table in a polished format.
tt <- tabular((Subregión = GEO_subregion) ~ (Agricultura = AGRICULTURE) *
                ((Media = mean) + (Mediana = median) + (SD = sd) + (Mín. = min) + (Máx. = max)),
              data = dfA)

html(tt, options = htmloptions(pad = TRUE))
                 Agricultura
Subregión        Media  Mediana  SD      Mín.  Máx.
Central Africa   79.28  76.75    11.174  69.9  99.1
Eastern Africa   77.41  78.00    12.403  54.4  95.9
Northern Africa  66.04  68.40     8.136  53.0  74.8
Southern Africa  69.74  71.80     4.681  61.3  74.7
Western Africa   78.82  78.80     7.131  65.9  88.7
  1. Draw a scatter plot to study the relationship between the population size of each country (Population2005) and its GDP per capita (GDP_capita.MRYA). Colour the points by the factor GEO_subregion.
# Population2005 is recorded in thousands of people
ggplot(data = dfA, mapping = aes(x = Population2005, y = GDP_capita.MRYA)) +
  geom_point(mapping = aes(colour = GEO_subregion)) +
  labs(title = "Population vs. GDP per capita, 2005",
       x = "Population (thousands)",
       y = "GDP per capita ($)")
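Both variables are strongly right-skewed, so on linear axes most points tend to pile up near the origin. One possible variation, shown here only as a sketch, is to put both axes on a logarithmic scale. It assumes ggplot2 is installed and uses a hypothetical data frame `toy` with the first five rows of dfA (rounded values from str(dfA)).

```r
library(ggplot2)

# First five rows of dfA, using the rounded values printed by str(dfA)
toy <- data.frame(
  Population2005  = c(15941, 7548, 8439, 13228, 1765),
  GDP_capita.MRYA = c(2314, 630, 1016, 1143, 11313),
  GEO_subregion   = c("Southern Africa", "Eastern Africa", "Western Africa",
                      "Western Africa", "Southern Africa")
)

# Same scatter plot, with log10 scales on both axes
p <- ggplot(toy, aes(x = Population2005, y = GDP_capita.MRYA,
                     colour = GEO_subregion)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population (thousands, log scale)",
       y = "GDP per capita ($, log scale)",
       colour = "Subregion")

# print(p) would draw the plot on the active graphics device
```

Whether the log scale reads better depends on the audience; the linear version keeps the original units easier to interpret.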

  1. Draw a plot of your choice for the data set using the ggplot2 library.
# Population2005 is in thousands of people, so multiply by 1000 to get inhabitants per km2
dfA3 <- mutate(dfA, population_density = Population2005 * 1000 / landarea)

ggplot(data = dfA3, aes(x = GEO_subregion, y = population_density, fill = GEO_subregion)) +
  geom_boxplot() +
  labs(title = "Population density by subregion, Africa (2005)",
       x = "Geographic subregion",
       y = "Inhabitants per km2") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
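The density calculation can be checked by hand on a few countries. The sketch below uses base R and a hypothetical data frame `toy` holding the first four rows of dfA (rounded values from str(dfA)); as in the code above, it assumes that Population2005 is in thousands of people and landarea in km2.

```r
# First four countries of dfA, using the rounded values printed by str(dfA)
toy <- data.frame(
  Country        = c("Angola", "Burundi", "Benin", "Burkina Faso"),
  Population2005 = c(15941, 7548, 8439, 13228),       # thousands of people
  landarea       = c(1251896, 25227, 115828, 275748)  # km2 (assumed unit)
)

# Convert thousands to people, then divide by the area
toy <- transform(toy, population_density = Population2005 * 1000 / landarea)

print(toy[, c("Country", "population_density")])

# Country with the highest density in this small sample
densest <- toy$Country[which.max(toy$population_density)]
print(densest)
```

A quick check like this helps to catch unit mistakes, such as forgetting the factor of 1000, before they reach the plot.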