This tutorial shows how to sort a data frame by the values of one particular column. This is useful, for example, when we want to rank products by sales volume or by profit.
# First of all we load the data
# For this tutorial we are going to use the iris plant dataset
data(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We want to order our dataset from highest Sepal Length to lowest.
# We use the order function like this:
order(iris$Sepal.Length, decreasing = TRUE)
## [1] 132 118 119 123 136 106 131 108 110 126 130 103 51 53 121 140 142
## [18] 77 113 144 66 78 87 109 125 141 145 146 59 76 55 105 111 117
## [35] 148 52 75 112 116 129 133 138 57 73 88 101 104 124 134 137 147
## [52] 69 98 127 149 64 72 74 92 128 135 63 79 84 86 120 139 62
## [69] 71 150 15 68 83 93 102 115 143 16 19 56 80 96 97 100 114
## [86] 65 67 70 89 95 122 34 37 54 81 82 90 91 6 11 17 21
## [103] 32 85 49 28 29 33 60 1 18 20 22 24 40 45 47 99 5
## [120] 8 26 27 36 41 44 50 61 94 2 10 35 38 58 107 12 13
## [137] 25 31 46 3 30 4 7 23 48 42 9 39 43 14
# order() returns row indices, not values: the first element (132) is the row
# with the largest Sepal.Length, the second (118) is the next largest, and so on
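To make the meaning of these indices concrete, here is a minimal sketch on a small made-up vector. The vector `x` and its values are hypothetical and are not taken from the iris data.

```r
# Hypothetical toy vector, not part of the iris dataset
x <- c(10, 30, 20)

# order() gives the positions that would sort x, not the sorted values
idx <- order(x, decreasing = TRUE)
idx       # positions of the largest, second largest, ... element

# Indexing x with those positions yields the sorted values
x[idx]
```

Indexing a data frame's rows with such a vector of positions works the same way, which is what the next step relies on.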
# Now we can use these indices to reorder the rows of the dataset
iris_ordered <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
head(iris_ordered, 10)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4         2.0 virginica
## 118          7.7         3.8          6.7         2.2 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
## 106          7.6         3.0          6.6         2.1 virginica
## 131          7.4         2.8          6.1         1.9 virginica
## 108          7.3         2.9          6.3         1.8 virginica
## 110          7.2         3.6          6.1         2.5 virginica
## 126          7.2         3.2          6.0         1.8 virginica
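Several rows share the same Sepal.Length (for example the four rows with 7.7), and with a single sort key they simply keep their original relative order. If a defined order within ties is wanted, `order()` accepts additional vectors as tie-breakers. The following is a sketch only: the choice of Sepal.Width as second key and the name `iris_two_keys` are assumptions made for illustration.

```r
data(iris)

# Negating the numeric column sorts it from highest to lowest;
# Sepal.Width (lowest first) is then used to break ties
iris_two_keys <- iris[order(-iris$Sepal.Length, iris$Sepal.Width), ]
head(iris_two_keys, 5)
```

Negation only works for numeric columns. For other column types, one option is the `decreasing` argument together with `method = "radix"`, which accepts one value per sort key.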
# We can plot the variable to check that the whole dataset is properly ordered:
# the points should never go up from left to right
plot(iris_ordered$Sepal.Length)
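The plot gives a visual check. As a complement, the same property can be tested programmatically. This is a minimal sketch, and the name `sorted_ok` is a hypothetical helper variable introduced here.

```r
data(iris)
iris_ordered <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]

# In a decreasing sequence, every consecutive difference is <= 0
sorted_ok <- all(diff(iris_ordered$Sepal.Length) <= 0)
sorted_ok
```

A value of `TRUE` means that no row has a larger Sepal.Length than the row before it.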
# The original row names (132, 118, ...) were carried along with the rows.
# If we want them to follow the new order, we can reset them to 1, 2, 3, ... by assigning NULL
rownames(iris_ordered) <- NULL
head(iris_ordered, 10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1           7.9         3.8          6.4         2.0 virginica
## 2           7.7         3.8          6.7         2.2 virginica
## 3           7.7         2.6          6.9         2.3 virginica
## 4           7.7         2.8          6.7         2.0 virginica
## 5           7.7         3.0          6.1         2.3 virginica
## 6           7.6         3.0          6.6         2.1 virginica
## 7           7.4         2.8          6.1         1.9 virginica
## 8           7.3         2.9          6.3         1.8 virginica
## 9           7.2         3.6          6.1         2.5 virginica
## 10          7.2         3.2          6.0         1.8 virginica
In this tutorial we have learnt how to order a data frame by the values of a specific column with order(), and how to reset the row names so that they match the new order.