1 Goal


The goal of this tutorial is to order a data frame by the values of a single column. This is useful when, for example, we want to rank products by sales volume or by profit.


2 Order dataframe


# First of all we load the data
# For this tutorial we are going to use the iris plant dataset
data(iris)
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We want to order our dataset from highest Sepal Length to lowest.
# We use the order function like this:

order(iris$Sepal.Length, decreasing = TRUE)
##   [1] 132 118 119 123 136 106 131 108 110 126 130 103  51  53 121 140 142
##  [18]  77 113 144  66  78  87 109 125 141 145 146  59  76  55 105 111 117
##  [35] 148  52  75 112 116 129 133 138  57  73  88 101 104 124 134 137 147
##  [52]  69  98 127 149  64  72  74  92 128 135  63  79  84  86 120 139  62
##  [69]  71 150  15  68  83  93 102 115 143  16  19  56  80  96  97 100 114
##  [86]  65  67  70  89  95 122  34  37  54  81  82  90  91   6  11  17  21
## [103]  32  85  49  28  29  33  60   1  18  20  22  24  40  45  47  99   5
## [120]   8  26  27  36  41  44  50  61  94   2  10  35  38  58 107  12  13
## [137]  25  31  46   3  30   4   7  23  48  42   9  39  43  14
# order() returns row indices, not values: the first element (132) is the row
# with the largest Sepal.Length and the last element (14) the row with the smallest
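
The difference between positions and values is easier to see on a small vector. The following is a minimal sketch; the vector `x` is a made-up example and not part of the iris data.

```r
# Hypothetical toy vector, used only for illustration
x <- c(10, 30, 20)

# order() returns positions: the largest value (30) sits at position 2
idx <- order(x, decreasing = TRUE)
print(idx)

# Indexing by those positions gives the sorted values, the same as sort()
print(x[idx])
print(identical(x[idx], sort(x, decreasing = TRUE)))
```

This is why `order()` is used to reorder a data frame: the positions can index whole rows, whereas `sort()` only returns the sorted values of one vector.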

# Now we can use these indices to reorder the rows of the dataset
iris_ordered <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
head(iris_ordered, 10)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4         2.0 virginica
## 118          7.7         3.8          6.7         2.2 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
## 106          7.6         3.0          6.6         2.1 virginica
## 131          7.4         2.8          6.1         1.9 virginica
## 108          7.3         2.9          6.3         1.8 virginica
## 110          7.2         3.6          6.1         2.5 virginica
## 126          7.2         3.2          6.0         1.8 virginica
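
The output above contains several rows tied at a Sepal.Length of 7.7. The following is a minimal sketch of one way to control how such ties are arranged, assuming we want to break them by ascending Petal.Length. That second key and the name `iris_tie` are choices made here for illustration.

```r
data(iris)

# Negating Sepal.Length sorts it from highest to lowest;
# Petal.Length (an illustrative second key) breaks ties in ascending order
idx <- order(-iris$Sepal.Length, iris$Petal.Length)
iris_tie <- iris[idx, ]

head(iris_tie[, c("Sepal.Length", "Petal.Length")], 5)
```

Negating a key only works for numeric columns, so this sketch would need adapting for factors or character columns.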
# We can plot the variable to check that the whole dataset is properly ordered
plot(iris_ordered$Sepal.Length)
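
The plot gives a visual check. The same property can also be tested programmatically, which is handy in scripts. The following is a minimal sketch; the variable name `non_increasing` is introduced here for illustration.

```r
data(iris)
iris_ordered <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]

# In a decreasing sequence, no consecutive difference is positive
non_increasing <- all(diff(iris_ordered$Sepal.Length) <= 0)
print(non_increasing)
```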

# The row names still show the original positions. To make the new order
# permanent, we reset the row names so that they are renumbered from 1
rownames(iris_ordered) <- NULL
head(iris_ordered, 10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1           7.9         3.8          6.4         2.0 virginica
## 2           7.7         3.8          6.7         2.2 virginica
## 3           7.7         2.6          6.9         2.3 virginica
## 4           7.7         2.8          6.7         2.0 virginica
## 5           7.7         3.0          6.1         2.3 virginica
## 6           7.6         3.0          6.6         2.1 virginica
## 7           7.4         2.8          6.1         1.9 virginica
## 8           7.3         2.9          6.3         1.8 virginica
## 9           7.2         3.6          6.1         2.5 virginica
## 10          7.2         3.2          6.0         1.8 virginica
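
The same steps apply to the kind of data mentioned in the goal. The following is a minimal sketch on a made-up sales table; the `products` data frame and its values are invented for illustration only.

```r
# Hypothetical sales data, invented for illustration
products <- data.frame(
  product = c("A", "B", "C"),
  sales = c(120, 300, 210),
  stringsAsFactors = FALSE
)

# Order from highest to lowest sales
products_ordered <- products[order(products$sales, decreasing = TRUE), ]

# The row names still reflect the original positions
print(rownames(products_ordered))

# Resetting them renumbers the rows from 1 in the new order
rownames(products_ordered) <- NULL
print(products_ordered)
```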

3 Conclusion


In this tutorial we have learnt how to order a data frame by the values of a specific column and how to reset the row names so that they match the new order.