PEC 2

Author

Victoria Bermúdez Ruiz

Last modified

November 6, 2024

Activity 1

If you set the following chunk to eval = T and make sure the file aadd_unit2.png is in your working directory, you will see the COW Trade 4.0 data frame. Look at the data in the image and, if necessary, consult the website to answer the following questions:

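The number of observations and variables must be read from the image itself: the figures 495 observations and 10 variables (dim(euis) [1] 495 10) describe the euis data frame of Activity 2, not the COW Trade table.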
The unit of observation is the dyad-year, that is, a pair of countries in a given year; the pair shown in the table is Canada and the United States of America.
It contains interval variables ("year"), nominal variables ("ccode1", "ccode2", "importer1" and "importer2") and ratio variables (from "flow1" to "smoothtotrade").
Its level of analysis is the third image.
Regarding the trade relationship between Canada and the United States, which country exported more to the other before the 1929 crash?
We could say that Canada was the more export-oriented partner towards the United States, because in every observation the values of "flow1" (Canadian exports to the United States) were larger than those of "flow2" (US exports to Canada).

knitr::include_graphics("aadd_unit2.png")
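The comparison in the answer above reduces to checking, for every year before 1929, whether flow1 is larger than flow2. A minimal base-R sketch, assuming a data frame called trade (a hypothetical name) with the year, flow1 and flow2 columns from the image; the numbers are placeholders, not COW Trade 4.0 values, and flow1/flow2 follow the reading used in the answer.

```r
# Hypothetical dyad-year rows: placeholder values, NOT taken from COW Trade 4.0
trade <- data.frame(
  year  = c(1926, 1927, 1928, 1930),
  flow1 = c(500, 520, 540, 300),  # read in the answer as Canadian exports to the US
  flow2 = c(400, 410, 430, 350)   # read in the answer as US exports to Canada
)

# Keep only the years before the 1929 crash
pre_crash <- trade[trade$year < 1929, ]

# TRUE only if flow1 exceeds flow2 in every pre-crash year
all(pre_crash$flow1 > pre_crash$flow2)

# Share of pre-crash years in which flow1 exceeds flow2
mean(pre_crash$flow1 > pre_crash$flow2)
```

With all(), a single exception flips the answer to FALSE, while mean() shows how often the pattern holds.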

Activity 2

In this activity we will use the data frame eu_ideology_scores.xlsx. We will import it with the read_xlsx() function from the readxl package and name it euis.

library(readxl)  # read_xlsx()
library(dplyr)   # glimpse(), filter(), mutate(), select(), arrange()
euis <- read_xlsx("eu_ideology_scores.xlsx")

First, carry out a general exploration of the data with the functions you already know, identifying the number of observations, the variables, the unit of observation, etc.

dim(euis)

[1] 495  10

glimpse(euis)

Rows: 495
Columns: 10
$ country_name                    <chr> "Belgium", "Belgium", "Belgium", "Belg…
$ region_name                     <chr> "Bruxelles", "Bruxelles", "Bruxelles",…
$ party_name                      <chr> "Front Démocratique des Francophones",…
$ party_abbreviation              <chr> "FDF/DéFI", "FDF/DéFI", "FDF/DéFI", "F…
$ election_year_regional          <dbl> NA, NA, NA, NA, NA, NA, 1989, 1995, 19…
$ elec_date_reg                   <chr> NA, NA, NA, NA, NA, NA, "6/18/1989", "…
$ election_year_national_previous <dbl> 1971, 1974, 1977, 1978, 1981, 1985, 19…
$ elec_date_nat                   <chr> "26125", "27305", "4/17/1977", "12/17/…
$ eu                              <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,…
$ dum_eu                          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
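The glimpse() output already shows NA values in election_year_regional and elec_date_reg. A quick way to quantify them is to count the missing values per column. A base-R sketch on a small hypothetical data frame that mimics three of the euis columns; on the real data the same call would be colSums(is.na(euis)).

```r
# Hypothetical rows mimicking three euis columns
df <- data.frame(
  country_name = c("Belgium", "Belgium", "Spain"),
  election_year_regional = c(NA, 1989, 2012),
  elec_date_reg = c(NA, "6/18/1989", NA)
)

# is.na() flags every missing cell; colSums() adds them up column by column
na_counts <- colSums(is.na(df))
na_counts
```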

euis

# A tibble: 495 × 10
   country_name region_name party_name party_abbreviation election_year_regional
   <chr>        <chr>       <chr>      <chr>                               <dbl>
 1 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 2 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 3 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 4 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 5 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 6 Belgium      Bruxelles   Front Dém… FDF/DéFI                               NA
 7 Belgium      Bruxelles   Front Dém… FDF/DéFI                             1989
 8 Belgium      Bruxelles   Front Dém… FDF/DéFI                             1995
 9 Belgium      Bruxelles   Front Dém… FDF/DéFI                             1999
10 Belgium      Bruxelles   Front Dém… FDF/DéFI                             2004
# ℹ 485 more rows
# ℹ 5 more variables: elec_date_reg <chr>,
#   election_year_national_previous <dbl>, elec_date_nat <chr>, eu <dbl>,
#   dum_eu <dbl>

Using unique(), check how many different countries there are.

unique(euis$country_name)

[1] "Belgium"        "France"         "Germany"        "Italy"         
[5] "Spain"          "Sweden"         "United Kingdom"
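unique() lists the distinct values, seven countries in this case; wrapping it in length() returns the count directly (dplyr::n_distinct() is an equivalent). A small sketch with a hypothetical vector:

```r
# Hypothetical country column with repeated values
countries <- c("Belgium", "Belgium", "France", "Spain", "Spain")

unique(countries)          # the distinct values
length(unique(countries))  # how many distinct values there are
```

Applied to the real data, length(unique(euis$country_name)) returns 7, the number of names printed above.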

Write the code that shows which observation(s) correspond to "Euskal Herria Bildu".

euis |>
  filter(party_name == "Euskal Herria Bildu")

# A tibble: 2 × 10
  country_name region_name  party_name party_abbreviation election_year_regional
  <chr>        <chr>        <chr>      <chr>                               <dbl>
1 Spain        Pais Vasco … Euskal He… EH Bildu                             2012
2 Spain        Pais Vasco … Euskal He… EH Bildu                             2016
# ℹ 5 more variables: elec_date_reg <chr>,
#   election_year_national_previous <dbl>, elec_date_nat <chr>, eu <dbl>,
#   dum_eu <dbl>
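The == comparison only matches the party name spelled exactly as stored. When the exact spelling is uncertain, grepl() can match a fragment instead, at the cost of also catching any other name that contains it. A base-R sketch on hypothetical rows; inside a dplyr pipeline the same condition would read filter(grepl("Bildu", party_name, fixed = TRUE)).

```r
# Hypothetical rows standing in for euis
parties <- data.frame(
  party_name = c("Euskal Herria Bildu", "Partido Nacionalista Vasco", "Euskal Herria Bildu"),
  election_year_regional = c(2012, 2012, 2016)
)

# fixed = TRUE treats "Bildu" as a literal string rather than a regular expression
bildu <- subset(parties, grepl("Bildu", party_name, fixed = TRUE))
bildu
```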

Change "Social Democratic and Labour Party" to "Social Democratic & Labour Party".

# Assign the result back so that the new label is kept in euis
euis <- euis |>
  mutate(party_name = if_else(party_name == "Social Democratic and Labour Party",
                              "Social Democratic & Labour Party",
                              party_name))

# Check the change
euis |>
  select(country_name, party_name) |>
  filter(party_name == "Social Democratic & Labour Party")

# A tibble: 14 × 2
   country_name   party_name                      
   <chr>          <chr>                           
 1 United Kingdom Social Democratic & Labour Party
 2 United Kingdom Social Democratic & Labour Party
 3 United Kingdom Social Democratic & Labour Party
 4 United Kingdom Social Democratic & Labour Party
 5 United Kingdom Social Democratic & Labour Party
 6 United Kingdom Social Democratic & Labour Party
 7 United Kingdom Social Democratic & Labour Party
 8 United Kingdom Social Democratic & Labour Party
 9 United Kingdom Social Democratic & Labour Party
10 United Kingdom Social Democratic & Labour Party
11 United Kingdom Social Democratic & Labour Party
12 United Kingdom Social Democratic & Labour Party
13 United Kingdom Social Democratic & Labour Party
14 United Kingdom Social Democratic & Labour Party
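The same relabelling can also be written in base R by assigning into the vector; either way, the change only persists if the result is assigned back to an object. A sketch on a hypothetical vector of party names:

```r
# Hypothetical vector of party names
party_name <- c("Social Democratic and Labour Party", "Ulster Unionist Party",
                "Social Democratic and Labour Party")
old_name <- "Social Democratic and Labour Party"
new_name <- "Social Democratic & Labour Party"

# which() returns the matching positions; NA comparisons are skipped
party_name[which(party_name == old_name)] <- new_name
party_name
```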

The variable election_year_regional records the year in which regional elections were held. However, we want to create a new variable called dif that computes the number of years elapsed up to the present, so that if an election took place in 1995 and we are in 2024, the result is 29 (2024 - 1995).

euis |>
  mutate(dif = 2024 - election_year_regional) |>
  select(country_name, election_year_regional, dif) |>
  print(n = 10)

# A tibble: 495 × 3
   country_name election_year_regional   dif
   <chr>                         <dbl> <dbl>
 1 Belgium                          NA    NA
 2 Belgium                          NA    NA
 3 Belgium                          NA    NA
 4 Belgium                          NA    NA
 5 Belgium                          NA    NA
 6 Belgium                          NA    NA
 7 Belgium                        1989    35
 8 Belgium                        1995    29
 9 Belgium                        1999    25
10 Belgium                        2004    20
# ℹ 485 more rows
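The calculation above hard-codes 2024, so dif drifts by one year every new year. A minimal sketch of a hypothetical helper, years_since(), that takes the reference year from the system date unless one is supplied:

```r
# Hypothetical helper: years elapsed between each election and a reference year,
# which defaults to the current year taken from the system clock
years_since <- function(year, ref_year = as.integer(format(Sys.Date(), "%Y"))) {
  ref_year - year
}

years_since(c(1989, 1995, NA), ref_year = 2024)
```

Inside the pipeline it would read mutate(dif = years_since(election_year_regional)); passing ref_year = 2024 reproduces the figures shown above, and missing years stay NA.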

Reduce the data so that you keep only the observations from Catalonia with regional elections later than 2010.

euis |>
  # "later than 2010" means strictly greater than 2010, not equal to it
  filter(region_name == "Catalonia" & election_year_regional > 2010)
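The wording "later than 2010" translates to a strict comparison (>), whereas == keeps 2010 only and >= keeps 2010 and later. A base-R sketch on hypothetical years; which() is used so that NA comparisons are dropped, as dplyr's filter() does:

```r
# Hypothetical regional election years, including a missing value
years <- c(2006, 2010, 2012, 2015, NA)

years[which(years == 2010)]  # only 2010
years[which(years > 2010)]   # strictly later than 2010
years[which(years >= 2010)]  # 2010 or later
```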

Finally, create a new variable decade from the values of election_year_national_previous, so that its values are shown by decade. That is, instead of 1971 the value "1970s" should appear, instead of 1981 the value "1980s", and so on.

euis |>
  # Years outside every listed range would become NA, so the 2010s and 2020s are covered too
  mutate(decade = case_when(
    election_year_national_previous >= 1870 & election_year_national_previous < 1880 ~ "1870s",
    election_year_national_previous >= 1880 & election_year_national_previous < 1890 ~ "1880s",
    election_year_national_previous >= 1890 & election_year_national_previous < 1900 ~ "1890s",
    election_year_national_previous >= 1900 & election_year_national_previous < 1910 ~ "1900s",
    election_year_national_previous >= 1910 & election_year_national_previous < 1920 ~ "1910s",
    election_year_national_previous >= 1920 & election_year_national_previous < 1930 ~ "1920s",
    election_year_national_previous >= 1930 & election_year_national_previous < 1940 ~ "1930s",
    election_year_national_previous >= 1940 & election_year_national_previous < 1950 ~ "1940s",
    election_year_national_previous >= 1950 & election_year_national_previous < 1960 ~ "1950s",
    election_year_national_previous >= 1960 & election_year_national_previous < 1970 ~ "1960s",
    election_year_national_previous >= 1970 & election_year_national_previous < 1980 ~ "1970s",
    election_year_national_previous >= 1980 & election_year_national_previous < 1990 ~ "1980s",
    election_year_national_previous >= 1990 & election_year_national_previous < 2000 ~ "1990s",
    election_year_national_previous >= 2000 & election_year_national_previous < 2010 ~ "2000s",
    election_year_national_previous >= 2010 & election_year_national_previous < 2020 ~ "2010s",
    election_year_national_previous >= 2020 & election_year_national_previous < 2030 ~ "2020s"
  )) |>
  select(country_name, decade) |>
  print(n = 10)

# A tibble: 495 × 2
   country_name decade
   <chr>        <chr> 
 1 Belgium      1970s 
 2 Belgium      1970s 
 3 Belgium      1970s 
 4 Belgium      1970s 
 5 Belgium      1980s 
 6 Belgium      1980s 
 7 Belgium      1980s 
 8 Belgium      1990s 
 9 Belgium      1990s 
10 Belgium      2000s 
# ℹ 485 more rows
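A case_when() chain needs one branch per decade and returns NA for any year outside the listed ranges. An alternative, sketched here as a hypothetical helper decade_label(), derives the decade arithmetically by dropping the units digit, so it covers any year:

```r
# Hypothetical helper: turns a year such as 1971 into the label "1970s"
decade_label <- function(year) {
  # floor(year / 10) * 10 drops the units digit: 1971 -> 1970, 2014 -> 2010
  label <- paste0(floor(year / 10) * 10, "s")
  # paste0() would turn NA into "NAs", so restore the missing values
  label[is.na(year)] <- NA
  label
}

decade_label(c(1971, 1981, 1999, 2014, NA))
```

In the pipeline it would be used as mutate(decade = decade_label(election_year_national_previous)).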

Activity 3: The War Of The Five Kings

In this section we will use the pipe operator (%>% or |>) and the dplyr functions filter(), select(), mutate() and arrange(), applying them to The War Of The Five Kings dataset, inspired by the popular series Game of Thrones.

got <- as_tibble(read.csv("https://github.com/chrisalbon/war_of_the_five_kings_dataset/raw/master/5kings_battles_v1.csv"))
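read.csv() keeps empty text fields as "" rather than NA, which is why glimpse() shows "" in attacker_2 and why an empty battle type appears later on. The na.strings argument tells read.csv() which strings to treat as missing. A sketch on a small inline CSV with hypothetical rows:

```r
# Small inline CSV with hypothetical rows and empty fields
csv_text <- "name,attacker_2,battle_type
Battle A,,ambush
Battle B,Tully,"

# Default: empty text fields stay as ""
raw <- read.csv(text = csv_text)

# na.strings lists the strings to be read as NA
clean <- read.csv(text = csv_text, na.strings = c("", "NA"))

raw$attacker_2
clean$attacker_2
```

Whether to apply this to the real file is a judgment call: an empty attacker_2 arguably means "no second attacker" rather than "unknown".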

First, we will reduce the data so that we keep only what interests us. You should first look at the data to get an idea of its contents (we recommend glimpse()). Then create a new object gotr that keeps the following columns: the battle name, the year, the attacking and defending kings, the attacker and defender sizes, the outcome and type of the battle, the location, the region and whether it was summer.

glimpse(got)

Rows: 38
Columns: 25
$ name               <chr> "Battle of the Golden Tooth", "Battle at the Mummer…
$ year               <int> 298, 298, 298, 298, 298, 298, 298, 299, 299, 299, 2…
$ battle_number      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
$ attacker_king      <chr> "Joffrey/Tommen Baratheon", "Joffrey/Tommen Barathe…
$ defender_king      <chr> "Robb Stark", "Robb Stark", "Robb Stark", "Joffrey/…
$ attacker_1         <chr> "Lannister", "Lannister", "Lannister", "Stark", "St…
$ attacker_2         <chr> "", "", "", "", "Tully", "Tully", "", "", "", "", "…
$ attacker_3         <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ attacker_4         <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ defender_1         <chr> "Tully", "Baratheon", "Tully", "Lannister", "Lannis…
$ defender_2         <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ defender_3         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ defender_4         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ attacker_outcome   <chr> "win", "win", "win", "loss", "win", "win", "win", "…
$ battle_type        <chr> "pitched battle", "ambush", "pitched battle", "pitc…
$ major_death        <int> 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ major_capture      <int> 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, …
$ attacker_size      <int> 15000, NA, 15000, 18000, 1875, 6000, NA, NA, 1000, …
$ defender_size      <int> 4000, 120, 10000, 20000, 6000, 12625, NA, NA, NA, N…
$ attacker_commander <chr> "Jaime Lannister", "Gregor Clegane", "Jaime Lannist…
$ defender_commander <chr> "Clement Piper, Vance", "Beric Dondarrion", "Edmure…
$ summer             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ location           <chr> "Golden Tooth", "Mummer's Ford", "Riverrun", "Green…
$ region             <chr> "The Westerlands", "The Riverlands", "The Riverland…
$ note               <chr> "", "", "", "", "", "", "", "", "", "Greyjoy's troo…

gotr <- got |>
  select(name, year, attacker_king, defender_king, attacker_size, defender_size,
         attacker_outcome, battle_type, location, region, summer)
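Since the kings and the sizes come in attacker/defender pairs, the selection can also be written with the tidyselect helper ends_with(). In got, only attacker_king and defender_king end in "_king", and only attacker_size and defender_size end in "_size" (see the glimpse() output), so the result matches the explicit list. A sketch on a hypothetical one-row stand-in for got:

```r
library(dplyr)

# Hypothetical one-row stand-in for got with a subset of its columns
got_toy <- tibble(
  name = "Battle X", year = 299L, battle_number = 1L,
  attacker_king = "King A", defender_king = "King B",
  attacker_1 = "House A", attacker_size = 100L, defender_size = 50L,
  attacker_outcome = "win", battle_type = "siege",
  location = "Somewhere", region = "The North", summer = 1L, note = ""
)

gotr_toy <- got_toy |>
  # ends_with() picks the columns whose names end in the given suffix
  select(name, year, ends_with("_king"), ends_with("_size"),
         attacker_outcome, battle_type, location, region, summer)

names(gotr_toy)
```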

Filter gotr so that only the battles in which Stannis Baratheon was the attacking king are shown.

gotr |>
  filter(attacker_king == "Stannis Baratheon")

# A tibble: 5 × 11
  name              year attacker_king defender_king attacker_size defender_size
  <chr>            <int> <chr>         <chr>                 <int>         <int>
1 Siege of Storm'…   299 Stannis Bara… Renly Barath…          5000         20000
2 Battle of the B…   299 Stannis Bara… Joffrey/Tomm…         21000          7250
3 Battle of Castl…   300 Stannis Bara… Mance Rayder         100000          1240
4 Retaking of Dee…   300 Stannis Bara… Balon/Euron …          4500           200
5 Siege of Winter…   300 Stannis Bara… Joffrey/Tomm…          5000          8000
# ℹ 5 more variables: attacker_outcome <chr>, battle_type <chr>,
#   location <chr>, region <chr>, summer <int>
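The == comparison checks against a single value. To keep the battles of several kings at once, %in% tests membership in a set. A base-R sketch with hypothetical rows; in dplyr the condition would be filter(attacker_king %in% kings).

```r
# Hypothetical battles
battles <- data.frame(
  name = c("B1", "B2", "B3", "B4"),
  attacker_king = c("Stannis Baratheon", "Robb Stark",
                    "Joffrey/Tommen Baratheon", "Stannis Baratheon")
)

# %in% returns TRUE for every row whose king is in the set
kings <- c("Stannis Baratheon", "Robb Stark")
selected <- subset(battles, attacker_king %in% kings)
selected
```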

Filter the data so that only the battles in which Stannis Baratheon was the attacking king and the battle took place in winter (summer == 0) are shown.

gotr |>
  filter(attacker_king == "Stannis Baratheon" & summer == 0)

# A tibble: 3 × 11
  name              year attacker_king defender_king attacker_size defender_size
  <chr>            <int> <chr>         <chr>                 <int>         <int>
1 Battle of Castl…   300 Stannis Bara… Mance Rayder         100000          1240
2 Retaking of Dee…   300 Stannis Bara… Balon/Euron …          4500           200
3 Siege of Winter…   300 Stannis Bara… Joffrey/Tomm…          5000          8000
# ℹ 5 more variables: attacker_outcome <chr>, battle_type <chr>,
#   location <chr>, region <chr>, summer <int>
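In filter(), conditions separated by commas are combined with &, and rows where the combined condition is NA are dropped. The sketch below checks both points on hypothetical rows; the row with summer = NA is invented for illustration.

```r
library(dplyr)

# Hypothetical rows, one of them with an unknown season
battles <- tibble(
  attacker_king = c("Stannis Baratheon", "Stannis Baratheon",
                    "Stannis Baratheon", "Robb Stark"),
  summer = c(0L, 1L, NA, 0L)
)

# Commas between conditions behave like &
a <- battles |> filter(attacker_king == "Stannis Baratheon" & summer == 0)
b <- battles |> filter(attacker_king == "Stannis Baratheon", summer == 0)

# TRUE & NA is NA, so the row with summer = NA is dropped
nrow(a)
```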

Copy the previous code and build a new pipe in which the attacker and defender sizes are expressed in thousands.

gotr |>
  mutate(attacker_size = attacker_size / 1000, defender_size = defender_size / 1000) |>
  filter(attacker_king == "Stannis Baratheon" & summer == 0)

# A tibble: 3 × 11
  name              year attacker_king defender_king attacker_size defender_size
  <chr>            <int> <chr>         <chr>                 <dbl>         <dbl>
1 Battle of Castl…   300 Stannis Bara… Mance Rayder          100            1.24
2 Retaking of Dee…   300 Stannis Bara… Balon/Euron …           4.5          0.2 
3 Siege of Winter…   300 Stannis Bara… Joffrey/Tomm…           5            8   
# ℹ 5 more variables: attacker_outcome <chr>, battle_type <chr>,
#   location <chr>, region <chr>, summer <int>
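When the same transformation applies to several columns, across() avoids repeating it. A sketch using the three attacker and defender sizes from the output above:

```r
library(dplyr)

# Attacker and defender sizes of the three winter battles shown above
sizes <- tibble(attacker_size = c(100000L, 4500L, 5000L),
                defender_size = c(1240L, 200L, 8000L))

sizes_k <- sizes |>
  # Apply the same division to both size columns
  mutate(across(c(attacker_size, defender_size), \(x) x / 1000))

sizes_k
```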

Copy the previous code and build a new pipe in which the data are sorted by the size of the defending army, from largest to smallest.

gotr |>
  mutate(attacker_size = attacker_size / 1000, defender_size = defender_size / 1000) |>
  filter(attacker_king == "Stannis Baratheon" & summer == 0) |>
  arrange(desc(defender_size))

# A tibble: 3 × 11
  name              year attacker_king defender_king attacker_size defender_size
  <chr>            <int> <chr>         <chr>                 <dbl>         <dbl>
1 Siege of Winter…   300 Stannis Bara… Joffrey/Tomm…           5            8   
2 Battle of Castl…   300 Stannis Bara… Mance Rayder          100            1.24
3 Retaking of Dee…   300 Stannis Bara… Balon/Euron …           4.5          0.2 
# ℹ 5 more variables: attacker_outcome <chr>, battle_type <chr>,
#   location <chr>, region <chr>, summer <int>
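arrange(desc()) sorts from largest to smallest, and dplyr always places missing values last, even inside desc(). This is why the NA sizes end up at the bottom of the ambush table that follows. A sketch on hypothetical rows:

```r
library(dplyr)

# Hypothetical rows, one with a missing size
armies <- tibble(name = c("A", "B", "C", "D"),
                 defender_size = c(1240, NA, 8000, 200))

# desc() sorts from largest to smallest; NA is placed last even inside desc()
armies_sorted <- armies |> arrange(desc(defender_size))
armies_sorted$name
```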

Among the battles that were ambushes, which defending armies had the most troops? Answer using, in this order, filter(), select() and arrange(). The resulting data frame should show the answer clearly.

unique(gotr$battle_type)

[1] "pitched battle" "ambush"         "siege"          "razing"        
[5] ""
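The empty string "" among the battle types is a missing value in disguise. dplyr's na_if() converts it into a proper NA; in the pipeline this would be mutate(battle_type = na_if(battle_type, "")). A sketch on the five values printed above:

```r
library(dplyr)

# The battle types printed above, including the empty string
battle_type <- c("pitched battle", "ambush", "siege", "razing", "")

# na_if() replaces every value equal to "" with NA
clean_type <- na_if(battle_type, "")
clean_type
```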

gotr |>
  filter(battle_type == "ambush") |>
  select(defender_king, defender_size) |>
  arrange(desc(defender_size))

# A tibble: 10 × 2
   defender_king            defender_size
   <chr>                            <int>
 1 Joffrey/Tommen Baratheon         12625
 2 Joffrey/Tommen Baratheon         10000
 3 Joffrey/Tommen Baratheon          6000
 4 Robb Stark                        3500
 5 Robb Stark                        2000
 6 Robb Stark                         120
 7 Joffrey/Tommen Baratheon           100
 8 Robb Stark                          NA
 9 Robb Stark                          NA
10 Joffrey/Tommen Baratheon            NA
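The table answers the question: the three largest defending armies in ambushes (12,625, 10,000 and 6,000 troops) were those of Joffrey/Tommen Baratheon. If only the top few rows matter, dplyr's slice_max() sorts and cuts in one step; it is offered only as an alternative, since the exercise asks for arrange(). The sketch rebuilds the ten rows printed above:

```r
library(dplyr)

# The ten ambush rows from the table above
ambush <- tibble(
  defender_king = c(rep("Joffrey/Tommen Baratheon", 3), rep("Robb Stark", 3),
                    "Joffrey/Tommen Baratheon", "Robb Stark", "Robb Stark",
                    "Joffrey/Tommen Baratheon"),
  defender_size = c(12625L, 10000L, 6000L, 3500L, 2000L, 120L, 100L, NA, NA, NA)
)

# Keep the three largest defending armies; NA sizes are sorted to the end
top3 <- ambush |> slice_max(defender_size, n = 3)
top3
```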