Data handling in R

Data types

Now that we have seen the different kinds of objects in R, we turn to how data are classified by the type of information they contain and how each type can be handled.

R has numeric data (numeric), whole numbers (integer), logical data (logical), and text (character). Categorical data are stored in a separate class, factor, which is distinct from logical.

R usually identifies the class of each variable automatically when a data set is loaded. To check the type of a value ourselves, we can use the function class() or one of the is.CLASS() functions, such as is.numeric().

num <- 2.3
class(num)

## [1] "numeric"

char <- "hola"
class(char)

## [1] "character"

log <- TRUE
class(log)

## [1] "logical"

is.numeric(char)

## [1] FALSE

is.numeric(num)

## [1] TRUE
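The integer type is easy to miss, because R stores a plain number such as 5 as numeric unless told otherwise. A minimal sketch of the difference follows; the variable name `whole` is only illustrative.

```r
# The L suffix asks R to store the value as an integer
whole <- 5L
class(whole)        # "integer"

# Without the suffix, even a whole number is stored as numeric (double)
class(5)            # "numeric"

is.integer(whole)   # TRUE
is.numeric(whole)   # TRUE: integers also count as numeric
```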

To change the class of a value, we use the as.CLASS() functions to “force” R to treat an observation in a certain way. For example, if we have 2.2 but want it as text rather than as a number, we can use:

num <- 2.2
class(num)

## [1] "numeric"

num_char <- as.character(num)
num_char

## [1] "2.2"

class(num)

## [1] "numeric"
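Conversions also work in other directions, and not every conversion can succeed. The sketch below shows a few common cases; the name `not_a_number` is illustrative, and suppressWarnings() is used only to keep the example quiet.

```r
as.numeric("3.14")   # text that looks like a number becomes numeric
as.integer(2.9)      # 2: the decimal part is dropped, not rounded
as.logical("TRUE")   # TRUE
as.numeric(TRUE)     # 1

# Text that cannot be read as a number becomes NA (R also emits a warning)
not_a_number <- suppressWarnings(as.numeric("hola"))
not_a_number         # NA
```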

Data structures

The observations stored in objects can also be organized in different ways. We have already seen some of them, such as vectors created with c() and matrices, but the most common way to organize data is in a data frame.

Data frames are made of rows and columns and can hold different types of data. Each row usually corresponds to an observation and each column to a variable. Each column is a vector of a single data type. For example, to build a data frame we first create the vectors holding each type of data and then combine them with the function data.frame().

ej.altura <- c(1.80, 1.60, 1.50, 1.30)
ej.peso <- c(90, 70, 60, 45)
ej.nombres <- c("Juan", "Pedro", "Ana", "Luisa")

ej.base <- data.frame(Altura = ej.altura, Peso = ej.peso, Nombre = ej.nombres)
ej.base

##   Altura Peso Nombre
## 1    1.8   90   Juan
## 2    1.6   70  Pedro
## 3    1.5   60    Ana
## 4    1.3   45  Luisa

Now that our data frame is assembled, we can inspect it in several ways. For example:

The function dim() returns the number of rows and columns of the data frame.
The function str() gives a summary of its structure.
The function names() shows the names of the variables.

The expression DATAFRAME$VARIABLE lets us view and modify a specific variable of the data frame.

dim(ej.base)

## [1] 4 3

str(ej.base)

## 'data.frame':    4 obs. of  3 variables:
##  $ Altura: num  1.8 1.6 1.5 1.3
##  $ Peso  : num  90 70 60 45
##  $ Nombre: chr  "Juan" "Pedro" "Ana" "Luisa"
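names() is described above but not demonstrated. The sketch below rebuilds the same small data frame so that it runs on its own, and adds nrow(), ncol(), and head() as closely related helpers from base R.

```r
# Same example data frame as above, rebuilt so this block is self-contained
ej.base <- data.frame(Altura = c(1.80, 1.60, 1.50, 1.30),
                      Peso   = c(90, 70, 60, 45),
                      Nombre = c("Juan", "Pedro", "Ana", "Luisa"))

names(ej.base)     # "Altura" "Peso" "Nombre"
nrow(ej.base)      # 4, the first value returned by dim()
ncol(ej.base)      # 3, the second value returned by dim()
head(ej.base, 2)   # only the first two rows
```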

ej.base$Nombre

## [1] "Juan"  "Pedro" "Ana"   "Luisa"

class(ej.base$Nombre)

## [1] "character"

We can see that the names are stored as character data. If we would rather treat them as a factor (categorical data), we can use the conversion function we already know to modify just this column.

ej.base$Nombre <- as.factor(ej.base$Nombre)
str(ej.base)

## 'data.frame':    4 obs. of  3 variables:
##  $ Altura: num  1.8 1.6 1.5 1.3
##  $ Peso  : num  90 70 60 45
##  $ Nombre: Factor w/ 4 levels "Ana","Juan","Luisa",..: 2 4 1 3
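The codes 2 4 1 3 in the str() output are the positions of each name within the factor's levels, which R sorts alphabetically by default. The sketch below makes this visible and also shows $ creating a new column. The column IMC (body mass index) is a hypothetical addition, and it assumes Peso is in kilograms and Altura in metres, which the example does not state.

```r
ej.base <- data.frame(Altura = c(1.80, 1.60, 1.50, 1.30),
                      Peso   = c(90, 70, 60, 45),
                      Nombre = c("Juan", "Pedro", "Ana", "Luisa"))
ej.base$Nombre <- as.factor(ej.base$Nombre)

levels(ej.base$Nombre)       # "Ana" "Juan" "Luisa" "Pedro"
as.integer(ej.base$Nombre)   # 2 4 1 3: position of each name in the levels

# Assigning to a name that does not exist yet creates a new column
# (IMC is hypothetical; units of Peso and Altura are assumed)
ej.base$IMC <- ej.base$Peso / ej.base$Altura^2
names(ej.base)               # "Altura" "Peso" "Nombre" "IMC"
```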

Importing data

To start working with our own data sets stored in Excel, Stata, SPSS, or another spreadsheet-like format, we can import them into R with a function. First, however, the data must be saved in a format that R can read. The most common and universal of these formats is “Comma Separated Values”, or .csv. Any spreadsheet saved in this format can be imported into R with the function read.table(file = "carpeta/documento.csv"). We can add arguments such as header=TRUE to indicate that the first row contains the variable names, sep= to state which character separates the columns (sep="," for a .csv file, since read.table() otherwise splits on whitespace), or stringsAsFactors=TRUE if we want all text columns to be read as factors. A function more specific to .csv files is read.csv("carpeta/documento.csv"). Below we open the example file “flower”.

flowers <- read.table(file = "C:\\Users\\tomas\\OneDrive\\Documents\\Curso de R\\flower.txt", header=TRUE, sep = "\t",  stringsAsFactors = TRUE, dec = ",")

flowers <- read.csv2("C:\\Users\\tomas\\OneDrive\\Documents\\Curso de R\\flower.csv", stringsAsFactors = TRUE, header = TRUE)

flowers

##    treat nitrogen block height weight leafarea shootarea flowers
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6

For this data set we stated that the values are separated by TAB, or \t (which is why the file is a *.txt), that the variable names are in the first row, and that all text cells should be treated as factors. In addition, if decimals are written with commas (2,45) rather than points (2.45), we must say so with the argument dec=",". The function read.csv2() assumes the convention common in many European countries: a comma as the decimal mark and a semicolon as the column separator. With this function there is no need to specify header=, sep=, or dec=. Let us look at the structure of the data set:

str(flowers)

## 'data.frame':    96 obs. of  8 variables:
##  $ treat    : Factor w/ 2 levels "notip","tip": 2 2 2 2 2 2 2 2 2 2 ...
##  $ nitrogen : Factor w/ 3 levels "high","low","medium": 3 3 3 3 3 3 3 3 3 3 ...
##  $ block    : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ height   : num  7.5 10.7 11.2 10.4 10.4 9.8 6.9 9.4 10.4 12.3 ...
##  $ weight   : num  7.62 12.14 12.76 8.78 13.58 ...
##  $ leafarea : num  11.7 14.1 7.1 11.9 14.5 12.2 13.2 14 10.5 16.1 ...
##  $ shootarea: num  31.9 46 66.7 20.3 26.9 72.7 43.1 28.5 57.8 36.9 ...
##  $ flowers  : int  1 10 10 1 4 9 7 6 5 8 ...
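The import arguments can be tried without any external file. The sketch below writes a tiny semicolon-separated file with decimal commas to a temporary path and reads it back in two ways. The file contents and the names `tmp`, `demo_csv2`, and `demo_table` are illustrative; the two data rows are taken from the listing above.

```r
# Write a small file: ";" between columns, "," as the decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("treat;height;weight",
             "tip;7,5;7,62",
             "notip;5,6;11,03"), tmp)

# read.csv2() already assumes header = TRUE, sep = ";" and dec = ","
demo_csv2 <- read.csv2(tmp, stringsAsFactors = TRUE)

# read.table() needs the same information spelled out
demo_table <- read.table(tmp, header = TRUE, sep = ";", dec = ",",
                         stringsAsFactors = TRUE)

str(demo_csv2)                     # treat is a factor, height and weight are num
all.equal(demo_csv2, demo_table)   # compares the two imports

unlink(tmp)                        # remove the temporary file
```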

Analyzing data frames

Now that the data frame “flowers” is in our workspace and we know its structure, we can start analyzing its variables and observations. To access a variable we again use the $ operator, here combined with the function summary(). This function returns the most relevant statistics for each type of variable. If we want one specific statistic, we can call it as its own function (mean(), median(), max(), etc.).

summary(flowers$weight)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.790   9.027  11.395  12.155  14.537  23.890

mean(flowers$weight)

## [1] 12.15458

max(flowers$weight)

## [1] 23.89
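summary() adapts to the type of variable it receives. As a sketch, the block below uses `mini`, a hypothetical six-row data frame whose values are copied from rows 1, 10, 17, 38, 49, and 65 of the flowers listing, so that it runs without the original file.

```r
# Six rows copied from the flowers listing (row names are renumbered 1 to 6)
mini <- data.frame(
  treat    = factor(c("tip", "tip", "tip", "tip", "notip", "notip")),
  nitrogen = factor(c("medium", "medium", "high", "low", "medium", "high")),
  height   = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5),
  weight   = c(7.62, 13.48, 18.66, 11.27, 11.03, 22.53)
)

median(mini$weight)      # 12.375
range(mini$weight)       # 7.62 22.53: smallest and largest value
sd(mini$weight)          # standard deviation

summary(mini$nitrogen)   # for a factor, summary() counts rows per level
summary(mini)            # one summary per column of the data frame
```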

As we did earlier with vectors, we can access specific values and observations with square brackets []. To access a single cell we use DATAFRAME[row, column]; the row and the column are indices, and each one works just like an index into a vector.

flowers[2,4]

## [1] 10.7

# We can also use
flowers$height[2]

## [1] 10.7
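A few more ways to reach the same cell are sketched below, together with one detail that often surprises: selecting a single column with [ , ] returns a plain vector unless drop = FALSE is given. `mini` is a hypothetical small data frame with values copied from the flowers listing.

```r
mini <- data.frame(
  treat  = c("tip", "tip", "tip", "tip", "notip", "notip"),
  height = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5),
  weight = c(7.62, 13.48, 18.66, 11.27, 11.03, 22.53)
)

mini[2, 2]                # 12.3, by position
mini[2, "height"]         # 12.3, column given by name
mini[["height"]][2]       # 12.3, extract the column first, then index it

class(mini[, "height"])                 # "numeric": simplified to a vector
class(mini[, "height", drop = FALSE])   # "data.frame": structure preserved
```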

All the indexing techniques seen earlier for vectors also work for accessing data in a data frame.

flowers[1:8, 1:5]

##   treat nitrogen block height weight
## 1   tip   medium     1    7.5   7.62
## 2   tip   medium     1   10.7  12.14
## 3   tip   medium     1   11.2  12.76
## 4   tip   medium     1   10.4   8.78
## 5   tip   medium     1   10.4  13.58
## 6   tip   medium     1    9.8  10.08
## 7   tip   medium     1    6.9  10.11
## 8   tip   medium     1    9.4  10.28

flowers[c(1, 5, 10, 15, 20), c(3, 4, 7)]

##    block height shootarea
## 1      1    7.5      31.9
## 5      1   10.4      26.9
## 10     2   12.3      36.9
## 15     2    9.0      90.1
## 20     1    8.5      91.4

# To show all rows or all columns, leave that position empty.

flowers[, c(3, 5)]

##    block weight
## 1      1   7.62
## 2      1  12.14
## 3      1  12.76
## 4      1   8.78
## 5      1  13.58
## 6      1  10.08
## 7      1  10.11
## 8      1  10.28
## 9      2  10.48
## 10     2  13.48
## 11     2  13.18
## 12     2  11.56
## 13     2   8.16
## 14     2  11.22
## 15     2  10.20
## 16     2  12.55
## 17     1  18.66
## 18     1  18.07
## 19     1  13.29
## 20     1  14.33
## 21     1  19.12
## 22     1  15.49
## 23     1  17.82
## 24     1  17.13
## 25     2  23.89
## 26     2  14.77
## 27     2  13.60
## 28     2  16.58
## 29     2  13.26
## 30     2  17.32
## 31     2  14.50
## 32     2  19.20
## 33     1   6.88
## 34     1  10.23
## 35     1   5.97
## 36     1  13.05
## 37     1   6.49
## 38     1  11.27
## 39     1   8.96
## 40     1  11.48
## 41     2  10.89
## 42     2   8.74
## 43     2   8.89
## 44     2   9.39
## 45     2   7.16
## 46     2   8.10
## 47     2   8.72
## 48     2   8.04
## 49     1  11.03
## 50     1   9.29
## 51     1  13.60
## 52     1  12.58
## 53     1  12.93
## 54     1  10.04
## 55     1   6.89
## 56     1  14.85
## 57     2  11.36
## 58     2   9.07
## 59     2  10.18
## 60     2  13.68
## 61     2  11.43
## 62     2  10.47
## 63     2  10.70
## 64     2  12.97
## 65     1  22.53
## 66     1  17.33
## 67     1  11.52
## 68     1  18.24
## 69     1  16.57
## 70     1  17.22
## 71     1  15.21
## 72     1  19.15
## 73     2  13.42
## 74     2  16.82
## 75     2  14.00
## 76     2  18.88
## 77     2  13.68
## 78     2  18.75
## 79     2  14.65
## 80     2  17.70
## 81     1   7.17
## 82     1   7.28
## 83     1   5.79
## 84     1   9.97
## 85     1   8.60
## 86     1   6.01
## 87     1   9.93
## 88     1   7.03
## 89     2   9.10
## 90     2   9.05
## 91     2   8.10
## 92     2   7.45
## 93     2   9.19
## 94     2   8.92
## 95     2   8.44
## 96     2  10.60

# We can also exclude values

flowers[-1:-90, -c(5, 6, 7, 8)]

##    treat nitrogen block height
## 91 notip      low     2    3.7
## 92 notip      low     2    3.2
## 93 notip      low     2    3.9
## 94 notip      low     2    3.3
## 95 notip      low     2    5.5
## 96 notip      low     2    4.4

# If we do not remember a variable's number, we can use its name

flowers[1:5, c("height", "weight")]

##   height weight
## 1    7.5   7.62
## 2   10.7  12.14
## 3   11.2  12.76
## 4   10.4   8.78
## 5   10.4  13.58

All of these examples use positional indices. That is, we use the position of the observations and variables to access them. However, we can also use logical indices, built from tests, to extract the data we are interested in. For example, we can extract all rows where the height of the flowers is greater than 12.

flowers[flowers$height>12, ]

##    treat nitrogen block height weight leafarea shootarea flowers
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
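It can help to look at the test on its own: it produces one TRUE or FALSE per row, and the brackets keep only the rows marked TRUE. A minimal sketch, using a hypothetical data frame `mini` with values copied from the flowers listing and an illustrative vector name `tall`:

```r
mini <- data.frame(
  treat  = c("tip", "tip", "tip", "tip", "notip", "notip"),
  height = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5)
)

tall <- mini$height > 12
tall          # FALSE TRUE TRUE TRUE FALSE FALSE
sum(tall)     # 3: TRUE counts as 1, so this is the number of matching rows
which(tall)   # 2 3 4: the positions of the matching rows

mini[tall, ]  # same result as mini[mini$height > 12, ]
```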

Recall that the other comparison operators include:
>= greater than or equal to
== equal to
!= not equal to

We can also combine these tests with the operators & (and) and | (or).

flowers[flowers$height>=11 & flowers$nitrogen != "medium", c(2, 3, 5, 7, 8)]

##    nitrogen block weight shootarea flowers
## 17     high     1  18.66      54.0       9
## 21     high     1  19.12     113.2      13
## 25     high     2  23.89     101.5      12
## 32     high     2  19.20      89.9      14
## 38      low     1  11.27      28.7       5
## 47      low     2   8.72      28.3       6
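The example above uses only &. A sketch of | and of mixing both operators follows; parentheses make the grouping explicit. `mini` is a hypothetical six-row data frame with values copied from the flowers listing, and the result names are illustrative.

```r
mini <- data.frame(
  treat    = c("tip", "tip", "tip", "tip", "notip", "notip"),
  nitrogen = c("medium", "medium", "high", "low", "medium", "high"),
  height   = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5),
  weight   = c(7.62, 13.48, 18.66, 11.27, 11.03, 22.53)
)

# | keeps a row when at least one of the tests is TRUE
short_or_heavy <- mini[mini$height < 6 | mini$weight > 20, ]
short_or_heavy   # rows 5 and 6

# & and | combined: "tip" plants with either high or low nitrogen
tip_not_medium <- mini[mini$treat == "tip" &
                         (mini$nitrogen == "high" | mini$nitrogen == "low"), ]
tip_not_medium   # rows 3 and 4
```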

An alternative to [] is the function subset(). Its advantage is that columns are referred to by name alone, without writing DATAFRAME$ in front of each one. However, it is less flexible to use.

subset(flowers, height>=11 & nitrogen !="medium", c(2, 3, 5, 7, 8))

##    nitrogen block weight shootarea flowers
## 17     high     1  18.66      54.0       9
## 21     high     1  19.12     113.2      13
## 25     high     2  23.89     101.5      12
## 32     high     2  19.20      89.9      14
## 38      low     1  11.27      28.7       5
## 47      low     2   8.72      28.3       6
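The third argument of subset() is called select, and it also accepts bare column names, or a minus sign to drop columns. A short sketch on a hypothetical data frame `mini` (values copied from the flowers listing; result names are illustrative):

```r
mini <- data.frame(
  treat    = c("tip", "tip", "tip", "tip", "notip", "notip"),
  nitrogen = c("medium", "medium", "high", "low", "medium", "high"),
  height   = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5),
  weight   = c(7.62, 13.48, 18.66, 11.27, 11.03, 22.53)
)

# Columns chosen by name instead of by position
by_name <- subset(mini, height >= 11 & nitrogen != "medium",
                  select = c(nitrogen, weight))
by_name      # rows 3 and 4, columns nitrogen and weight

# A minus sign keeps every column except the one named
no_treat <- subset(mini, treat == "notip", select = -treat)
no_treat     # rows 5 and 6, without the treat column
```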

Sorting data frames

Earlier we used the function order() on vectors with several elements. The same can be done with any column of our data frame.

flowers[order(flowers$height, decreasing = TRUE), ]

##    treat nitrogen block height weight leafarea shootarea flowers
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
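Printing every sorted row is rarely needed. As a sketch, order() sorts in ascending order by default, and head() can limit the output to the first few rows. `mini` is a hypothetical small data frame with values copied from the flowers listing; `asc` and `top3` are illustrative names.

```r
mini <- data.frame(
  treat  = c("tip", "tip", "tip", "tip", "notip", "notip"),
  height = c(7.5, 12.3, 12.6, 12.3, 5.6, 8.5)
)

# Ascending order is the default
asc <- mini[order(mini$height), ]
asc$height    # 5.6 7.5 8.5 12.3 12.3 12.6

# The three tallest plants only
top3 <- head(mini[order(mini$height, decreasing = TRUE), ], 3)
top3$height   # 12.6 12.3 12.3
```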

To sort a data frame by several variables, we can use the same function.

flowers[order(flowers$flowers, flowers$block), ]

##    treat nitrogen block height weight leafarea shootarea flowers
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 80 notip     high     2    5.2  17.70     19.1     181.1       8
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 70 notip     high     1   10.9  17.22     49.2     189.6      17

Sorting functions can be applied directly to numeric data. To sort text data in a meaningful order, we must first state the order of the factor levels; if we do not, R sorts the observations alphabetically. We assign that order with the function factor() and its levels argument.

flowers$nitrogen <- factor(flowers$nitrogen, levels = c("low", "medium", "high"))

flowers[order(flowers$nitrogen),]

##    treat nitrogen block height weight leafarea shootarea flowers
## 33   tip      low     1    8.0   6.88      9.3      16.1       4
## 34   tip      low     1    8.0  10.23     11.9      88.1       4
## 35   tip      low     1    6.4   5.97      8.7       7.3       2
## 36   tip      low     1    7.6  13.05      7.2      47.2       8
## 37   tip      low     1    9.7   6.49      8.1      18.0       3
## 38   tip      low     1   12.3  11.27     13.7      28.7       5
## 39   tip      low     1    9.1   8.96      9.7      23.8       3
## 40   tip      low     1    8.9  11.48     11.1      39.4       7
## 41   tip      low     2    7.4  10.89     13.3       9.5       5
## 42   tip      low     2    3.1   8.74     16.1      39.1       3
## 43   tip      low     2    7.9   8.89      8.4      34.1       4
## 44   tip      low     2    8.8   9.39      7.1      38.9       4
## 45   tip      low     2    8.5   7.16      8.7      29.9       4
## 46   tip      low     2    5.6   8.10     10.1       5.8       2
## 47   tip      low     2   11.5   8.72     10.2      28.3       6
## 48   tip      low     2    5.8   8.04      5.8      30.7       7
## 81 notip      low     1    3.9   7.17     13.5      52.8       6
## 82 notip      low     1    2.3   7.28     13.8      32.8       6
## 83 notip      low     1    5.2   5.79     11.0      67.4       5
## 84 notip      low     1    2.2   9.97      9.6      63.1       2
## 85 notip      low     1    4.5   8.60      9.4     113.5       7
## 86 notip      low     1    1.8   6.01     17.6      46.2       4
## 87 notip      low     1    3.0   9.93     12.0      56.6       6
## 88 notip      low     1    3.7   7.03      7.9      36.7       5
## 89 notip      low     2    2.4   9.10     14.5      78.7       8
## 90 notip      low     2    5.7   9.05      9.6      63.2       6
## 91 notip      low     2    3.7   8.10     10.5      60.5       6
## 92 notip      low     2    3.2   7.45     14.1      38.1       4
## 93 notip      low     2    3.9   9.19     12.4      52.6       9
## 94 notip      low     2    3.3   8.92     11.6      55.2       6
## 95 notip      low     2    5.5   8.44     13.5      77.6       9
## 96 notip      low     2    4.4  10.60     16.2      63.3       6
## 1    tip   medium     1    7.5   7.62     11.7      31.9       1
## 2    tip   medium     1   10.7  12.14     14.1      46.0      10
## 3    tip   medium     1   11.2  12.76      7.1      66.7      10
## 4    tip   medium     1   10.4   8.78     11.9      20.3       1
## 5    tip   medium     1   10.4  13.58     14.5      26.9       4
## 6    tip   medium     1    9.8  10.08     12.2      72.7       9
## 7    tip   medium     1    6.9  10.11     13.2      43.1       7
## 8    tip   medium     1    9.4  10.28     14.0      28.5       6
## 9    tip   medium     2   10.4  10.48     10.5      57.8       5
## 10   tip   medium     2   12.3  13.48     16.1      36.9       8
## 11   tip   medium     2   10.4  13.18     11.1      56.8      12
## 12   tip   medium     2   11.0  11.56     12.6      31.3       6
## 13   tip   medium     2    7.1   8.16     29.6       9.7       2
## 14   tip   medium     2    6.0  11.22     13.0      16.4       3
## 15   tip   medium     2    9.0  10.20     10.8      90.1       6
## 16   tip   medium     2    4.5  12.55     13.4      14.4       6
## 49 notip   medium     1    5.6  11.03     18.6      49.9       8
## 50 notip   medium     1    5.3   9.29     11.5      82.3       6
## 51 notip   medium     1    7.5  13.60     13.6     122.2      11
## 52 notip   medium     1    4.1  12.58     13.9     136.6      11
## 53 notip   medium     1    3.5  12.93     16.6     109.3       3
## 54 notip   medium     1    8.5  10.04     12.3     113.6       4
## 55 notip   medium     1    4.9   6.89      8.2      52.9       3
## 56 notip   medium     1    2.5  14.85     17.5      77.8      10
## 57 notip   medium     2    5.4  11.36     17.8     104.6      12
## 58 notip   medium     2    3.9   9.07      9.6      90.4       7
## 59 notip   medium     2    5.8  10.18     15.7      88.8       6
## 60 notip   medium     2    4.5  13.68     14.8     125.5       9
## 61 notip   medium     2    8.0  11.43     12.6      43.2      14
## 62 notip   medium     2    1.8  10.47     11.8     120.8       9
## 63 notip   medium     2    2.2  10.70     15.3      97.1       7
## 64 notip   medium     2    3.9  12.97     17.0      97.5       5
## 17   tip     high     1   12.6  18.66     18.6      54.0       9
## 18   tip     high     1   10.0  18.07     16.9      90.5       3
## 19   tip     high     1   10.0  13.29     15.8     142.7      12
## 20   tip     high     1    8.5  14.33     13.2      91.4       5
## 21   tip     high     1   14.1  19.12     13.1     113.2      13
## 22   tip     high     1   10.1  15.49     12.6      77.2      12
## 23   tip     high     1    8.5  17.82     20.5      54.4       3
## 24   tip     high     1    6.5  17.13     24.1     147.4       6
## 25   tip     high     2   11.5  23.89     14.3     101.5      12
## 26   tip     high     2    7.7  14.77     17.2     104.5       4
## 27   tip     high     2    6.4  13.60     13.6     152.6       7
## 28   tip     high     2    8.8  16.58     16.7     100.1       9
## 29   tip     high     2    9.2  13.26     11.3     108.0       9
## 30   tip     high     2    6.2  17.32     11.6      85.9       5
## 31   tip     high     2    6.3  14.50     18.3      55.6       8
## 32   tip     high     2   17.2  19.20     10.9      89.9      14
## 65 notip     high     1    8.5  22.53     20.8     166.9      16
## 66 notip     high     1    8.5  17.33     19.8     184.4      12
## 67 notip     high     1    6.4  11.52     12.1     140.5       7
## 68 notip     high     1    1.2  18.24     16.6     148.1       7
## 69 notip     high     1    2.6  16.57     17.1     141.1       3
## 70 notip     high     1   10.9  17.22     49.2     189.6      17
## 71 notip     high     1    7.2  15.21     15.9     135.0      14
## 72 notip     high     1    2.1  19.15     15.6     176.7       6
## 73 notip     high     2    4.7  13.42     19.8     124.7       5
## 74 notip     high     2    5.0  16.82     17.3     182.5      15
## 75 notip     high     2    6.5  14.00     10.1     126.5       7
## 76 notip     high     2    2.6  18.88     16.4     181.5      14
## 77 notip     high     2    6.0  13.68     16.2     133.7       2
## 78 notip     high     2    9.3  18.75     18.4     181.1      16
## 79 notip     high     2    4.6  14.65     16.7      91.7      11
## 80 notip     high     2    5.2  17.70     19.1     181.1       8

Now the rows are sorted from “low” to “high” rather than alphabetically (“high”, “low”, “medium”).
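The difference between the two orderings is easier to see on a few rows. The following is a minimal sketch that uses a small made-up data frame (the object `toy` and its `id` column are hypothetical and are not part of the flowers data); it sorts the same column first as plain text and then as a factor with explicit levels.

```r
# Hypothetical toy data, used only to illustrate the idea
toy <- data.frame(
  id = 1:4,
  nitrogen = c("medium", "high", "low", "medium"),
  stringsAsFactors = FALSE  # keep nitrogen as character for the first sort
)

# Character column: order() sorts alphabetically (high, low, medium)
alphabetical <- toy[order(toy$nitrogen), ]
alphabetical$id

# Factor with explicit levels: order() follows the level order (low, medium, high)
toy$nitrogen <- factor(toy$nitrogen, levels = c("low", "medium", "high"))
by_level <- toy[order(toy$nitrogen), ]
by_level$id

# The levels are stored in the order we declared them
levels(toy$nitrogen)
```

Rows that share the same level keep their original relative order, so the two "medium" rows appear in the same sequence in both results.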

Datos en R

Tomás Villescas

2023-03-27
