Read the data file and assign it to the object df. Check that it has been read correctly.
df <- read.table( "http://gauss.inf.um.es/datos/paisesMundoRedC.csv",
header = TRUE,
sep = ";",
dec = ".",
stringsAsFactors = FALSE )
head(df)
## Country EPI_regions
## AGO Angola Sub-Saharan Africa
## ALB Albania Central and Eastern Europ
## ARE United Arab Emirates Middle East and North Africa
## ARG Argentina Latin America and Caribbe
## ARM Armenia Middle East and North Africa
## AUS Australia East Asia and the Pacific
## GEO_subregion Population2005 GDP_capita.MRYA landarea EPI
## AGO Southern Africa 15941.4 2314.4 1251895.62 39.5
## ALB Central Europe 3129.7 4955.3 28346.12 84.0
## ARE Arabian Peninsula 4495.8 22698.3 74776.60 64.0
## ARG South America 38747.2 13652.4 2736296.00 81.8
## ARM Eastern Europe 3016.3 5011.0 28272.73 77.8
## AUS Australia + New Zealand 20155.1 30677.9 7634643.84 79.8
## FOREST FISH AGRICULTURE
## AGO 95.4 87.3 61.3
## ALB 100.0 62.5 75.6
## ARE 100.0 50.0 72.3
## ARG 75.9 58.8 79.9
## ARM 70.1 NA 94.2
## AUS 100.0 96.7 78.7
dim(df)
## [1] 149 10
str(df)
## 'data.frame': 149 obs. of 10 variables:
## $ Country : chr "Angola" "Albania" "United Arab Emirates" "Argentina" ...
## $ EPI_regions : chr "Sub-Saharan Africa" "Central and Eastern Europ" "Middle East and North Africa" "Latin America and Caribbe" ...
## $ GEO_subregion : chr "Southern Africa" "Central Europe" "Arabian Peninsula" "South America" ...
## $ Population2005 : num 15941 3130 4496 38747 3016 ...
## $ GDP_capita.MRYA: num 2314 4955 22698 13652 5011 ...
## $ landarea : num 1251896 28346 74777 2736296 28273 ...
## $ EPI : num 39.5 84 64 81.8 77.8 79.8 89.4 72.2 54.7 78.4 ...
## $ FOREST : num 95.4 100 100 75.9 70.1 100 100 100 0 100 ...
## $ FISH : num 87.3 62.5 50 58.8 NA 96.7 NA NA NA 47.4 ...
## $ AGRICULTURE : num 61.3 75.6 72.3 79.9 94.2 78.7 76.4 71.4 95.9 80.8 ...
The data frame contains 149 observations of 10 variables, of which 3 are character strings and the remaining 7 are numeric. GEO_subregion is a character variable that holds the geographic subregion each country belongs to, so it can be used to select the African countries.
indicesAfrica <- grep("Africa", df$GEO_subregion)
dfA <- df[indicesAfrica,]
str(dfA)
## 'data.frame': 41 obs. of 10 variables:
## $ Country : chr "Angola" "Burundi" "Benin" "Burkina Faso" ...
## $ EPI_regions : chr "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" ...
## $ GEO_subregion : chr "Southern Africa" "Eastern Africa" "Western Africa" "Western Africa" ...
## $ Population2005 : num 15941 7548 8439 13228 1765 ...
## $ GDP_capita.MRYA: num 2314 630 1016 1143 11313 ...
## $ landarea : num 1251896 25227 115828 275748 559516 ...
## $ EPI : num 39.5 54.7 56.1 44.3 68.7 56 65.2 63.8 47.3 69.7 ...
## $ FOREST : num 95.4 0 17.8 64.5 79.2 97.2 100 78.4 94.8 98.4 ...
## $ FISH : num 87.3 NA 91.5 NA NA NA 91.2 52.4 46.3 74.1 ...
## $ AGRICULTURE : num 61.3 95.9 88.2 87.7 72.3 71.8 88.7 69.9 70.8 99.1 ...
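The subsetting step above relies on grep() returning the positions of the rows whose GEO_subregion contains "Africa". A minimal sketch on a toy data frame (the object `toy` and its four rows are invented for illustration, they are not the real file) shows the same idea next to the equivalent logical form with grepl():

```r
# Toy data, invented for illustration only (not the real file).
toy <- data.frame(
  Country       = c("Angola", "Albania", "Benin", "Argentina"),
  GEO_subregion = c("Southern Africa", "Central Europe",
                    "Western Africa", "South America"),
  stringsAsFactors = FALSE
)

# grep() returns the integer positions of the matching elements ...
idx <- grep("Africa", toy$GEO_subregion)
toy_africa <- toy[idx, ]

# ... while grepl() returns a logical vector with one entry per row.
keep <- grepl("Africa", toy$GEO_subregion, fixed = TRUE)
toy_africa2 <- toy[keep, ]

toy_africa$Country
toy_africa2$Country
```

The logical form is often the safer choice when the complement is also needed: `toy[!keep, ]` always works, whereas `toy[-idx, ]` returns no rows at all if nothing matched.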
dfA[, 2:3] <- lapply(dfA[, 2:3], factor)
str(dfA)
## 'data.frame': 41 obs. of 10 variables:
## $ Country : chr "Angola" "Burundi" "Benin" "Burkina Faso" ...
## $ EPI_regions : Factor w/ 2 levels "Middle East and North Africa",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ GEO_subregion : Factor w/ 5 levels "Central Africa",..: 4 2 5 5 4 1 5 1 1 1 ...
## $ Population2005 : num 15941 7548 8439 13228 1765 ...
## $ GDP_capita.MRYA: num 2314 630 1016 1143 11313 ...
## $ landarea : num 1251896 25227 115828 275748 559516 ...
## $ EPI : num 39.5 54.7 56.1 44.3 68.7 56 65.2 63.8 47.3 69.7 ...
## $ FOREST : num 95.4 0 17.8 64.5 79.2 97.2 100 78.4 94.8 98.4 ...
## $ FISH : num 87.3 NA 91.5 NA NA NA 91.2 52.4 46.3 74.1 ...
## $ AGRICULTURE : num 61.3 95.9 88.2 87.7 72.3 71.8 88.7 69.9 70.8 99.1 ...
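Because the conversion to factors happens after the subsetting, the factors only carry the levels that are actually present in the African rows (2 for EPI_regions and 5 for GEO_subregion). The following sketch, on an invented three-row data frame called `toy`, illustrates the idiom and what would happen if the conversion were done before subsetting:

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  Country       = c("Angola", "Albania", "Benin"),
  EPI_regions   = c("Sub-Saharan Africa", "Central and Eastern Europe",
                    "Sub-Saharan Africa"),
  GEO_subregion = c("Southern Africa", "Central Europe", "Western Africa"),
  stringsAsFactors = FALSE
)
africa <- toy[grep("Africa", toy$GEO_subregion), ]

# Same idiom as above: lapply() returns a list of factors, one per column,
# which is assigned back into the selected columns.
cols <- c("EPI_regions", "GEO_subregion")
africa[, cols] <- lapply(africa[, cols], factor)
levels(africa$EPI_regions)

# Converting BEFORE subsetting keeps levels that no longer have any rows;
# droplevels() removes them afterwards.
early <- factor(toy$EPI_regions)[grep("Africa", toy$GEO_subregion)]
nlevels(early)
nlevels(droplevels(early))
```

Unused levels matter later on, since functions such as summary() or table() report them with a count of zero.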
summary(dfA)
## Country EPI_regions
## Length:41 Middle East and North Africa: 5
## Class :character Sub-Saharan Africa :36
## Mode :character
##
##
##
##
## GEO_subregion Population2005 GDP_capita.MRYA
## Central Africa : 6 Min. : 793.1 Min. : 629.8
## Eastern Africa : 7 1st Qu.: 5525.5 1st Qu.: 1008.1
## Northern Africa: 5 Median : 12883.9 Median : 1312.8
## Southern Africa:10 Mean : 21030.0 Mean : 2506.2
## Western Africa :13 3rd Qu.: 28816.2 3rd Qu.: 2299.1
## Max. :131529.7 Max. :11313.3
##
## landarea EPI FOREST FISH
## Min. : 17410 Min. :39.10 Min. : 0.00 Min. :23.90
## 1st Qu.: 147882 1st Qu.:51.30 1st Qu.: 73.30 1st Qu.:72.60
## Median : 403759 Median :59.40 Median : 86.40 Median :79.10
## Mean : 642219 Mean :59.16 Mean : 78.51 Mean :75.11
## 3rd Qu.: 968072 3rd Qu.:69.00 3rd Qu.: 98.40 3rd Qu.:87.05
## Max. :2492385 Max. :78.10 Max. :100.00 Max. :91.60
## NA's :14
## AGRICULTURE
## Min. :53.00
## 1st Qu.:69.30
## Median :73.90
## Mean :74.87
## 3rd Qu.:81.60
## Max. :99.10
##
For the Country variable, summary() only reports the number of countries (41). For the region and subregion variables, it reports the number of countries in each region or subregion.
As for the numeric variables:
GDP_capita.MRYA is GDP per capita and ranges from 629.8 to 11313.3. Given the median (1312.8) and the third quartile (2299.1), most values lie much closer to the minimum than to the maximum, so the distribution is right-skewed.
landarea ranges from 17410 to 2492385, with a median of 403759 and a mean of 642219; most countries therefore have a comparatively small area, and the maximum is possibly an outlier.
EPI ranges from 39.10 to 78.10, and its mean and median are similar (about 59). This suggests a roughly symmetric distribution, which is compatible with normality but does not by itself establish it.
FOREST, FISH and AGRICULTURE look like percentages or 0-100 scores, since all their observations lie between 0 and 100. On that scale the typical values are fairly high: the means lie between about 75 and 79, and the medians between 73.9 (AGRICULTURE) and 86.4 (FOREST).
For FISH, 14 observations are missing (NA), so those cells have no numeric value.
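The remark about a possible outlier can be checked roughly from the summary alone. The sketch below applies the usual boxplot rule (upper fence = Q3 + 1.5 × IQR) to the quartiles and maxima printed by summary(dfA); the helper `upper_fence` is a hypothetical name introduced here, not part of the original analysis:

```r
# Upper Tukey fence computed from the first and third quartiles.
# `upper_fence` is a hypothetical helper written for this sketch.
upper_fence <- function(q1, q3, k = 1.5) q3 + k * (q3 - q1)

# Quartiles and maxima copied from the summary(dfA) output above.
fence_landarea <- upper_fence(147882, 968072)   # 2198357
fence_gdp      <- upper_fence(1008.1, 2299.1)   # 4235.6

max_landarea <- 2492385
max_gdp      <- 11313.3

# Both maxima lie beyond their fences, so the rule flags them.
max_landarea > fence_landarea
max_gdp > fence_gdp
```

With the full data at hand, `boxplot.stats(dfA$landarea)$out` would list the flagged values directly (it uses hinges, which can differ slightly from the quartiles shown by summary()), and `qqnorm(dfA$EPI)` or `shapiro.test(dfA$EPI)` would be a more direct check of normality than comparing mean and median.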
# Load the libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.3.3
dfA2 <- select(dfA, Population2005, landarea, GDP_capita.MRYA, GEO_subregion)
melted <- melt(dfA2, id.vars="GEO_subregion")
summarise(group_by(melted, GEO_subregion, variable), Media=mean(value), SD=sd(value), Mediana=median(value))
## Source: local data frame [15 x 5]
## Groups: GEO_subregion [?]
##
## GEO_subregion variable Media SD Mediana
## <fctr> <fctr> <dbl> <dbl> <dbl>
## 1 Central Africa Population2005 15506.650 21282.7174 6893.30
## 2 Central Africa landarea 875594.577 786167.9759 544316.82
## 3 Central Africa GDP_capita.MRYA 2037.667 1914.4573 1250.05
## 4 Eastern Africa Population2005 23183.186 27067.0928 9037.70
## 5 Eastern Africa landarea 300392.153 413610.3654 121862.90
## 6 Eastern Africa GDP_capita.MRYA 1163.486 417.6242 1104.70
## 7 Northern Africa Population2005 36940.140 23147.1353 32853.80
## 8 Northern Africa landarea 1262919.154 1079434.0339 968071.46
## 9 Northern Africa GDP_capita.MRYA 4912.280 2209.5782 4346.40
## 10 Southern Africa Population2005 16388.480 15486.9692 12946.70
## 11 Southern Africa landarea 676390.913 418232.7095 761220.37
## 12 Southern Africa GDP_capita.MRYA 4057.450 4095.1023 2026.50
## 13 Western Africa Population2005 19871.100 34051.0490 11658.20
## 14 Western Africa landarea 453551.048 453693.8606 245860.06
## 15 Western Africa GDP_capita.MRYA 1326.885 561.7618 1142.90
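The same kind of per-group table can be obtained without reshaping, using base R only. This is a sketch of an alternative, not the approach used above; the data frame `toy` and its values are invented, and only one variable is summarised to keep it short:

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  GEO_subregion  = c("Central Africa", "Central Africa",
                     "Western Africa", "Western Africa", "Western Africa"),
  Population2005 = c(100, 300, 10, 20, 60),
  stringsAsFactors = TRUE
)

# aggregate() applies FUN to the values of each group; here FUN returns
# three statistics at once, named as in the table above.
stats <- aggregate(
  Population2005 ~ GEO_subregion,
  data = toy,
  FUN  = function(v) c(Media = mean(v), SD = sd(v), Mediana = median(v))
)

# The three statistics come back as a single matrix column; flatten it
# into ordinary columns.
stats <- do.call(data.frame, stats)
stats
```

The melt() and group_by() route scales more comfortably to several variables at once, which is presumably why it is used in the analysis above.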
library(tables)
## Warning: package 'tables' was built under R version 3.3.3
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 3.3.3
## Loading required package: lattice
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.3.3
## Loading required package: Formula
## Warning: package 'Formula' was built under R version 3.3.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.3
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## combine, src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
tabular((Subregión = GEO_subregion) ~ (Agricultura = AGRICULTURE) * ((Media = mean) + (Mediana = median) + (SD = sd) + (Mín. = min) + (Máx. = max)), data = dfA)
##
## Agricultura
## Subregión Media Mediana SD Mín. Máx.
## Central Africa 79.28 76.75 11.174 69.9 99.1
## Eastern Africa 77.41 78.00 12.403 54.4 95.9
## Northern Africa 66.04 68.40 8.136 53.0 74.8
## Southern Africa 69.74 71.80 4.681 61.3 74.7
## Western Africa 78.82 78.80 7.131 65.9 88.7
tt <- tabular((Subregión = GEO_subregion) ~ (Agricultura = AGRICULTURE) * ((Media = mean) + (Mediana = median) + (SD = sd) + (Mín. = min) + (Máx. = max)), data = dfA)
html(tt, options = htmloptions(pad = TRUE))
Subregión | Agricultura: Media | Agricultura: Mediana | Agricultura: SD | Agricultura: Mín. | Agricultura: Máx.
---|---|---|---|---|---
Central Africa | 79.28 | 76.75 | 11.174 | 69.9 | 99.1
Eastern Africa | 77.41 | 78.00 | 12.403 | 54.4 | 95.9
Northern Africa | 66.04 | 68.40 | 8.136 | 53.0 | 74.8
Southern Africa | 69.74 | 71.80 | 4.681 | 61.3 | 74.7
Western Africa | 78.82 | 78.80 | 7.131 | 65.9 | 88.7
# Population2005 is expressed in thousands of people
ggplot(data = dfA, mapping = aes(x = Population2005, y = GDP_capita.MRYA)) +
  geom_point(mapping = aes(colour = GEO_subregion)) +
  labs(title = "Population vs GDP per capita, 2005",
       x = "Population (thousands)",
       y = "GDP per capita ($)")
# Population2005 is in thousands, hence the factor of 1000
dfA3 <- mutate(dfA, population_density = Population2005 * 1000 / landarea)
ggplot(data = dfA3, aes(x = GEO_subregion, y = population_density, fill = GEO_subregion)) +
  geom_boxplot() +
  labs(title = "Population density by region, Africa (2005)",
       x = "Geographic region",
       y = "Inhabitants per km²") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
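As a quick check of the units in the density calculation, the sketch below reproduces it for a single country using the values for Angola shown in the head(df) output. It assumes, as the factor of 1000 suggests, that Population2005 is in thousands of people and that landarea is in km²; the variable names are introduced here for illustration:

```r
# Values for Angola copied from the head(df) output above.
population_thousands <- 15941.4     # assumed unit: thousands of people
landarea_km2         <- 1251895.62  # assumed unit: square kilometres

# Convert thousands to people, then divide by the area.
density <- population_thousands * 1000 / landarea_km2
round(density, 1)  # about 12.7 inhabitants per km^2
```

Without the factor of 1000 the result would be about 0.0127, which is a useful sanity check when the units of a source column are not documented.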