Workshop 1 - Statistical R

Data import

library(readr)

## Warning: package 'readr' was built under R version 4.3.2

Pokemon <- read_delim("C:/Users/almun/OneDrive - Universidad Nacional de Colombia/Escritorio/UIFCE/UIFCE 2024/R estadistico/Archivos/Pokemon.csv", 
    delim = ";", escape_double = FALSE, trim_ws = TRUE)

## Rows: 800 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (3): Name, Type1, Type2
## dbl (9): #, Total, HP, Attack, Defense, Sp.Atk, Sp.Def, Speed, Generation
## lgl (1): Legendary
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1 - State the class of the variables Name, Attack, and Legendary.

attach(Pokemon)

variables = c('Name','Attack', 'Legendary')

n = class(Name)
a = class(Attack)
l = class(Legendary)


data.frame('Variable' = variables, clase = c(n,a,l))

##    Variable     clase
## 1      Name character
## 2    Attack   numeric
## 3 Legendary   logical
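
The same check can be done without attach(), by indexing the columns directly. The sketch below is only an illustration: it uses a small made-up data frame (toy, not the real Pokemon.csv) so that it runs on its own.

```r
# Toy stand-in for the Pokemon data (invented values, for illustration only)
toy <- data.frame(
  Name      = c("A", "B", "C"),
  Attack    = c(49, 62, 82),
  Legendary = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Apply class() to each selected column; no attach() needed
clases <- sapply(toy[c("Name", "Attack", "Legendary")], class)

data.frame(Variable = names(clases), clase = unname(clases))
```

Because the variable names come from the data itself, the labels in the first column cannot drift out of sync with the columns that were actually inspected.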

2 - For the variable “Type1”, how many Pokémon are in each category?

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

print( Pokemon %>% group_by(Type1) %>%
  summarise('Número de Pokemons ' = n() ))

## # A tibble: 18 × 2
##    Type1    `Número de Pokemons `
##    <chr>                    <int>
##  1 Bug                         69
##  2 Dark                        31
##  3 Dragon                      32
##  4 Electric                    44
##  5 Fairy                       17
##  6 Fighting                    27
##  7 Fire                        52
##  8 Flying                       4
##  9 Ghost                       32
## 10 Grass                       70
## 11 Ground                      32
## 12 Ice                         24
## 13 Normal                      98
## 14 Poison                      28
## 15 Psychic                     57
## 16 Rock                        44
## 17 Steel                       27
## 18 Water                      112
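
Base R's table() gives the same kind of count without loading dplyr. A minimal sketch, assuming a toy data frame with invented values in place of the real file:

```r
# Toy stand-in for the Type1 column (invented values, for illustration only)
toy <- data.frame(
  Type1 = c("Fire", "Water", "Water", "Grass", "Fire", "Water")
)

# Count how many rows fall in each category
conteo <- table(toy$Type1)

# Largest categories first
conteo_ordenado <- sort(conteo, decreasing = TRUE)
conteo_ordenado
```

Sorting the counts makes the largest and smallest categories easy to spot, which the alphabetical listing does not.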

3 - Compute the mean, median, and standard deviation of the variables Attack, Defense, and HP.

attach(Pokemon)

## The following objects are masked from Pokemon (pos = 4):
## 
##     #, Attack, Defense, Generation, HP, Legendary, Name, Sp.Atk,
##     Sp.Def, Speed, Total, Type1, Type2

variables = c('Attack', 'Defense', 'HP')
promedio = c(mean(Attack),mean(Defense), mean(HP))
mediana = c(median(Attack),median(Defense), median(HP))
desviacion = c(sd(Attack), sd(Defense), sd(HP))

data.frame(variables, promedio, mediana, desviacion)

##   variables promedio mediana desviacion
## 1    Attack 79.00125      75   32.45737
## 2   Defense 73.84250      70   31.18350
## 3        HP 69.25875      65   25.53467

The standard deviations are large relative to the means (roughly 37% to 42% of the mean for each variable), which indicates high dispersion and suggests considerable variety in the Pokémon's characteristics. A box plot of the three variables shows this:

attach(Pokemon)

## The following objects are masked from Pokemon (pos = 3):
## 
##     #, Attack, Defense, Generation, HP, Legendary, Name, Sp.Atk,
##     Sp.Def, Speed, Total, Type1, Type2

## The following objects are masked from Pokemon (pos = 5):
## 
##     #, Attack, Defense, Generation, HP, Legendary, Name, Sp.Atk,
##     Sp.Def, Speed, Total, Type1, Type2

boxplot(Attack, Defense, HP, names = variables)
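
One way to put a number on the dispersion discussed in question 3 is the coefficient of variation, the standard deviation divided by the mean. The sketch below copies the means and standard deviations from the summary table of question 3; the vector names are chosen here for illustration.

```r
# Means and standard deviations copied from the summary table of question 3
promedio   <- c(Attack = 79.00125, Defense = 73.84250, HP = 69.25875)
desviacion <- c(Attack = 32.45737, Defense = 31.18350, HP = 25.53467)

# Coefficient of variation: standard deviation as a share of the mean
cv <- desviacion / promedio

# Expressed as a percentage, rounded to one decimal place
round(100 * cv, 1)
```

Since the three variables have different means, a relative measure like this makes their spreads easier to compare than the raw standard deviations.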

4 - Compute the mean of the variables Attack and Defense, grouped by “Type1”. Also report the number of Pokémon in each “Type1” category, together with any other measure of dispersion you consider appropriate.

library(dplyr)


datos = Pokemon %>% group_by(Type1) %>%
    summarise('Media - Attack' = mean(Attack), 
    'Media - Defense' = mean(Defense),
    'Media - HP' = mean(HP),
    'Rango - Attack' = paste(range(Attack), collapse = '-'),
    'Rango - Defense' = paste(range(Defense), collapse = '-'),
    'Rango - HP' = paste(range(HP), collapse = '-'),
    'Número de Pokemons'  = n() )

print(datos)

## # A tibble: 18 × 8
##    Type1    `Media - Attack` `Media - Defense` `Media - HP` `Rango - Attack`
##    <chr>               <dbl>             <dbl>        <dbl> <chr>           
##  1 Bug                  71.0              70.7         56.9 10-185          
##  2 Dark                 88.4              70.2         66.8 50-150          
##  3 Dragon              112.               86.4         83.3 50-180          
##  4 Electric             69.1              66.3         59.8 30-123          
##  5 Fairy                61.5              65.7         74.1 20-131          
##  6 Fighting             96.8              65.9         69.9 35-145          
##  7 Fire                 84.8              67.8         69.9 30-160          
##  8 Flying               78.8              66.2         70.8 30-115          
##  9 Ghost                73.8              81.2         64.4 30-165          
## 10 Grass                73.2              70.8         67.3 27-132          
## 11 Ground               95.8              84.8         73.8 40-180          
## 12 Ice                  72.8              71.4         72   30-130          
## 13 Normal               73.5              59.8         77.3 5-160           
## 14 Poison               74.7              68.8         67.2 43-106          
## 15 Psychic              71.5              67.7         70.6 20-190          
## 16 Rock                 92.9             101.          65.4 40-165          
## 17 Steel                92.7             126.          65.2 24-150          
## 18 Water                74.2              72.9         72.1 10-155          
## # ℹ 3 more variables: `Rango - Defense` <chr>, `Rango - HP` <chr>,
## #   `Número de Pokemons` <int>
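
The range above is stored as text (for example "10-185"), so it cannot be sorted or plotted. A possible alternative, sketched here on a toy tibble with invented values, keeps the dispersion measure numeric by computing the standard deviation per group with across(); the suffixes media and de are names chosen for this example.

```r
library(dplyr)

# Toy stand-in for the Pokemon data (invented values, for illustration only)
toy <- tibble(
  Type1   = c("Fire", "Fire", "Water", "Water", "Water"),
  Attack  = c(52, 84, 48, 65, 83),
  Defense = c(43, 78, 65, 80, 100)
)

# Mean and standard deviation of each variable per group, plus the group size
resumen <- toy %>%
  group_by(Type1) %>%
  summarise(
    across(c(Attack, Defense), list(media = mean, de = sd)),
    n = n()
  )

resumen
```

The result has one row per type, with columns such as Attack_media and Attack_de that remain numeric and can be passed to arrange() or a plot.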

We can examine the grouped data and identify the Pokémon type with the highest mean for each of Attack, Defense, and HP (without accounting for outliers):

attach(datos)

## The following object is masked from Pokemon (pos = 3):
## 
##     Type1

## The following object is masked from Pokemon (pos = 4):
## 
##     Type1

## The following object is masked from Pokemon (pos = 6):
## 
##     Type1

max_attack = paste(datos[which.max(`Media - Attack`),'Type1'],
                   max(`Media - Attack`),collapse = '-')
max_defense = paste(datos[which.max(`Media - Defense`),'Type1'],
                    max(`Media - Defense`), collapse = '-' )
max_hp = paste(datos[which.max(`Media - HP`),'Type1'],
                    max(`Media - HP`), collapse = '-')

max_df = data.frame(max_attack, max_defense, max_hp)
colnames(max_df) <- c("Attack", "Defense", "HP")

print(max_df)

##           Attack               Defense             HP
## 1 Dragon 112.125 Steel 126.37037037037 Dragon 83.3125
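
The lookup of the top group can also be written without attach(datos), which avoids the masking messages. This is a sketch only: the small table medias holds three rows copied from the printed summary (rounded as displayed), not the full result.

```r
# Three rows taken from the printed table of group means (rounded values)
medias <- data.frame(
  Type1   = c("Fire", "Water", "Rock"),
  Attack  = c(84.8, 74.2, 92.9),
  Defense = c(67.8, 72.9, 101),
  HP      = c(69.9, 72.1, 65.4)
)

# For each numeric column, return the Type1 of the row holding the maximum
mejor <- sapply(
  medias[c("Attack", "Defense", "HP")],
  function(x) medias$Type1[which.max(x)]
)

mejor
```

Within this three-row subset the result differs from the full-data answer (Dragon and Steel are not included), so it only demonstrates the mechanics.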

5 - Compute the mean of “Total” for legendary Pokémon and compare it with the mean of “Total” for non-legendary Pokémon. Which group shows greater variation?

library(dplyr)

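# Filter first, then select: after select(Total) the Legendary column
# is no longer in the data and would only be found through attach()
legendary = Pokemon %>% filter(Legendary == TRUE) %>% select(Total)

not_legendary = Pokemon %>% filter(Legendary == FALSE) %>% select(Total)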

`Media - Legendarios` = mean(legendary$Total)

`Media - No legendarios` = mean(not_legendary$Total)

`Desviacion E. - Legendarios` = sd(legendary$Total)

`Desviacion E. - No legendarios` = sd(not_legendary$Total)

data.frame(`Media - Legendarios`,`Media - No legendarios`,
           `Desviacion E. - Legendarios`,`Desviacion E. - No legendarios` )

##   Media...Legendarios Media...No.legendarios Desviacion.E....Legendarios
## 1            637.3846               417.2136                    60.93739
##   Desviacion.E....No.legendarios
## 1                       106.7604

This comparison shows that the mean Total of legendary Pokémon (637.4) is much higher than that of non-legendary Pokémon (417.2). The standard deviation, however, is higher for the non-legendary group (106.8) than for the legendary group (60.9), so the non-legendary group shows greater dispersion. This may be related to the much larger number of records in that group:

paste('Número de registros Legendarios:', nrow(legendary))

## [1] "Número de registros Legendarios: 65"

paste('Número de registros No Legendarios:', nrow(not_legendary))

## [1] "Número de registros No Legendarios: 735"
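
The two groups can also be summarised in a single pass, so that the means, standard deviations, and counts appear side by side. A minimal sketch with base R's tapply(), assuming a toy data frame with invented values rather than the real file:

```r
# Toy stand-in for the Pokemon data (invented values, for illustration only)
toy <- data.frame(
  Total     = c(300, 450, 520, 405, 680, 600),
  Legendary = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# One statistic per group; the names of each result are "FALSE" and "TRUE"
media <- tapply(toy$Total, toy$Legendary, mean)
de    <- tapply(toy$Total, toy$Legendary, sd)
n     <- tapply(toy$Total, toy$Legendary, length)

# Collect everything in one table, one row per group
comparacion <- data.frame(
  Legendary = names(media),
  media     = as.numeric(media),
  de        = as.numeric(de),
  n         = as.integer(n)
)

comparacion
```

Grouping on the Legendary column directly avoids building two separate filtered data frames and keeps the comparison in one object.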

Lucio Alfonso Muñoz Adarme

2024-03-01