Stacking and Sorting Data

Often times large datasets are divided into smaller dataframes and stored in separate files. For instance, point of sale data may be stored in such a way that there’s a separate file for each month. Alternatively, subsets of the data are extracted from a large database in smaller sections. In either of these situations, the rows from the different dataframes need to be stacked together to form a single dataframe. You might call this a vertical stack because it makes the dataframe longer.

There are also times when you want to stack columns from one dataframe to another. You might call this a horizontal stack because it makes the dataframe wider.

After stacking dataframes together into a single dataframe, a common task is to sort the data in a meaningful way.

This lesson focuses on three functions from the dplyr package that helps accomplish these tasks. The first two functions, bind_rows and bind_cols, are used for stacking data together.

The third function, arrange, is used for sorting data.

Preliminaries

Load the dplyr, and magrittr packages.

library(dplyr)
library(magrittr)

Make sure that this file and the jan17Items.csv file are in the same folder and that the working directory is set to that folder.

Read in the jan17Items, feb17Items, and mar17Items.csv data.

j17i <- read.csv('jan17Items.csv')
f17i <- read.csv('feb17Items.csv')
m17i <- read.csv('mar17Items.csv')

Stacking Rows of Data Using dplyr’s bind_rows Function

Now that we’ve read in the point-of-sale data for three separate months, let’s stack the rows together using the bind_rows dplyr function. Let’s review a couple of key points from the help documentation for that function by running ?bind_rows in the Console.

Notice that the first argument is ellipses, indicating that you can combine two or more dataframes, lists, or a list of dataframes. It’s nice that we’re not limited to combining two dataframes at a time. Let’s try it out:

allItems <- bind_rows(j17i, f17i, m17i)

We can visually inspect the dataframe and confirm that it has observations from each month, and in the order that we entered them into the function. We can also add up the rows for the three separate dataframes and verify that the sum is equal to the rows of the new allItems dataframe that we created.

A great aspect about this function is that you can have a different number of columns, and the column names don’t have to match up. While this can be a very useful apsect about this function it also means that you have to be very careful because if the column names aren’t spelled exactly the same way then you may unintentionally create new columns.

Stacking Columns of Data Using dplyr’s bind_cols Function

The bind_cols function works similarly to the bind_rows function. You may have noticed that the documentation for the bind_cols function is the same as for the bind_rows function.

To illustrate the bind_cols function, let’s first separate the allItems dataframe into two three separate dataframes of seven columns each.

df1 <- allItems[,1:7]
df2 <- allItems[,8:14]
df3 <- allItems[,15:21]

Now, let’s bind them back together, but in a different order.

allItems2 <- bind_cols(df1, df3, df2)

Comparing the number of rows and columns in the allItems2 dataframe to the allItems dataframe confirms that we have a dataframe that is the same shape. A visual inspection confirms that the order of the columns is different.

What happens if you use the bind_cols to stack columns that have a different number of rows? It will throw an error unless there’s only one row in which case it will fill in every other row with the values from that one row.

It’s important to note that there are easier ways to reorder the columns, without having to separate the columns and then piece them back together again. ## Sorting the Data Using dplyr’s arrange Function A related preprocessing task is ordering the data based on values in a specific column. This is often desirable after joining two or more dataframes together. Let’s assume that once we create the allItems dataframe we want to sort the data by cardholder name, and then by date. Here’s how we can easily accomplish that using the arrange function:

allItems <- allItems %>%
  arrange(CardholderName, Time)

A visual inspection of the data confirms that the data is sorted first by CardholderName and then by Time. If you want to sort in descending order, then you can use the desc function to indicate that’s what you want to do.

One other useful function is the two-way pipe operator, %<>%, which saves you from retyping a dataframe’s name if you’re going to reassign it to the same name.

allItems %<>%
  arrange(desc(CardholderName), Time)

Concluding Comments

In conclusion, these functions to bind and sort datasets are really handy. I find that I use the bind_cols function much less frequently than I use the bind_rows function because I often want to create columns from two totally different dataframes, like point of sale data and weather data. When this is the case, I use a join function to make sure that I’m combining values that should go together. That’s a topic for another lesson, though.

I hope these simple examples illustrate the key aspects of how they work. As you apply them you’ll find that there are optional parameters that you might want to consider using, such as creating a new column to keep track of the id of each dataframe that you stack.

LS0tDQp0aXRsZTogIlN0YWNraW5nIGFuZCBTb3J0aW5nIERhdGEiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KT2Z0ZW4gdGltZXMgbGFyZ2UgZGF0YXNldHMgYXJlIGRpdmlkZWQgaW50byBzbWFsbGVyIGRhdGFmcmFtZXMgYW5kIHN0b3JlZCBpbiBzZXBhcmF0ZSBmaWxlcy4gRm9yIGluc3RhbmNlLCBwb2ludCBvZiBzYWxlIGRhdGEgbWF5IGJlIHN0b3JlZCBpbiBzdWNoIGEgd2F5IHRoYXQgdGhlcmUncyBhIHNlcGFyYXRlIGZpbGUgZm9yIGVhY2ggbW9udGguIEFsdGVybmF0aXZlbHksIHN1YnNldHMgb2YgdGhlIGRhdGEgYXJlIGV4dHJhY3RlZCBmcm9tIGEgbGFyZ2UgZGF0YWJhc2UgaW4gc21hbGxlciBzZWN0aW9ucy4gSW4gZWl0aGVyIG9mIHRoZXNlIHNpdHVhdGlvbnMsIHRoZSByb3dzIGZyb20gdGhlIGRpZmZlcmVudCBkYXRhZnJhbWVzIG5lZWQgdG8gYmUgc3RhY2tlZCB0b2dldGhlciB0byBmb3JtIGEgc2luZ2xlIGRhdGFmcmFtZS4gWW91IG1pZ2h0IGNhbGwgdGhpcyBhIHZlcnRpY2FsIHN0YWNrIGJlY2F1c2UgaXQgbWFrZXMgdGhlIGRhdGFmcmFtZSBsb25nZXIuDQoNClRoZXJlIGFyZSBhbHNvIHRpbWVzIHdoZW4geW91IHdhbnQgdG8gc3RhY2sgY29sdW1ucyBmcm9tIG9uZSBkYXRhZnJhbWUgdG8gYW5vdGhlci4gWW91IG1pZ2h0IGNhbGwgdGhpcyBhIGhvcml6b250YWwgc3RhY2sgYmVjYXVzZSBpdCBtYWtlcyB0aGUgZGF0YWZyYW1lIHdpZGVyLg0KDQpBZnRlciBzdGFja2luZyBkYXRhZnJhbWVzIHRvZ2V0aGVyIGludG8gYSBzaW5nbGUgZGF0YWZyYW1lLCBhIGNvbW1vbiB0YXNrIGlzIHRvIHNvcnQgdGhlIGRhdGEgaW4gYSBtZWFuaW5nZnVsIHdheS4NCg0KVGhpcyBsZXNzb24gZm9jdXNlcyBvbiB0aHJlZSBmdW5jdGlvbnMgZnJvbSB0aGUgZHBseXIgcGFja2FnZSB0aGF0IGhlbHBzIGFjY29tcGxpc2ggdGhlc2UgdGFza3MuIFRoZSBmaXJzdCB0d28gZnVuY3Rpb25zLCBiaW5kX3Jvd3MgYW5kIGJpbmRfY29scywgYXJlIHVzZWQgZm9yIHN0YWNraW5nIGRhdGEgdG9nZXRoZXIuDQoNClRoZSB0aGlyZCBmdW5jdGlvbiwgYXJyYW5nZSwgaXMgdXNlZCBmb3Igc29ydGluZyBkYXRhLg0KDQojIyBQcmVsaW1pbmFyaWVzDQpMb2FkIHRoZSBkcGx5ciwgYW5kIG1hZ3JpdHRyIHBhY2thZ2VzLg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KbGlicmFyeShtYWdyaXR0cikNCmBgYA0KTWFrZSBzdXJlIHRoYXQgdGhpcyBmaWxlIGFuZCB0aGUgamFuMTdJdGVtcy5jc3YgZmlsZSBhcmUgaW4gdGhlIHNhbWUgZm9sZGVyIGFuZCB0aGF0IHRoZSB3b3JraW5nIGRpcmVjdG9yeSBpcyBzZXQgdG8gdGhhdCBmb2xkZXIuDQoNClJlYWQgaW4gdGhlIGphbjE3SXRlbXMsIGZlYjE3SXRlbXMsIGFuZCBtYXIxN0l0ZW1zLmNzdiBkYXRhLg0KYGBge3J9DQpqMTdpIDwtIHJlYWQuY3N2KCdqYW4xN0l0ZW1zLmNzdicpDQpmMTdpIDwtIHJlYWQuY3N2KCdmZWIxN0l0ZW1zLmNzdicpDQptMTdpIDwtIHJlYWQuY3N2KCdtYXIxN0l0ZW1zLmNzdicpDQpgYGANCg0KIyMgU3RhY2tpbmcgUm93cyBvZiBEYXRhIFVzaW5nIGRwbHlyJ3MgYmluZF9yb3dzIEZ1bmN0aW9uDQpOb3cgdGhhdCB3ZSd2ZSByZWFkIGluIHRoZSBwb2ludC1vZi1zYWxlIGRhdGEgZm9yIHRocmVlIHNlcGFyYXRlIG1vbnRocywgbGV0J3Mgc3RhY2sgdGhlIHJvd3MgdG9nZXRoZXIgdXNpbmcgdGhlIGJpbmRfcm93cyBkcGx5ciBmdW5jdGlvbi4gTGV0J3MgcmV2aWV3IGEgY291cGxlIG9mIGtleSBwb2ludHMgZnJvbSB0aGUgaGVscCBkb2N1bWVudGF0aW9uIGZvciB0aGF0IGZ1bmN0aW9uIGJ5IHJ1bm5pbmcgYD9iaW5kX3Jvd3NgIGluIHRoZSBDb25zb2xlLg0KDQpOb3RpY2UgdGhhdCB0aGUgZmlyc3QgYXJndW1lbnQgaXMgZWxsaXBzZXMsIGluZGljYXRpbmcgdGhhdCB5b3UgY2FuIGNvbWJpbmUgdHdvIG9yIG1vcmUgZGF0YWZyYW1lcywgbGlzdHMsIG9yIGEgbGlzdCBvZiBkYXRhZnJhbWVzLiBJdCdzIG5pY2UgdGhhdCB3ZSdyZSBub3QgbGltaXRlZCB0byBjb21iaW5pbmcgdHdvIGRhdGFmcmFtZXMgYXQgYSB0aW1lLiBMZXQncyB0cnkgaXQgb3V0Og0KDQpgYGB7cn0NCmFsbEl0ZW1zIDwtIGJpbmRfcm93cyhqMTdpLCBmMTdpLCBtMTdpKQ0KYGBgDQpXZSBjYW4gdmlzdWFsbHkgaW5zcGVjdCB0aGUgZGF0YWZyYW1lIGFuZCBjb25maXJtIHRoYXQgaXQgaGFzIG9ic2VydmF0aW9ucyBmcm9tIGVhY2ggbW9udGgsIGFuZCBpbiB0aGUgb3JkZXIgdGhhdCB3ZSBlbnRlcmVkIHRoZW0gaW50byB0aGUgZnVuY3Rpb24uIFdlIGNhbiBhbHNvIGFkZCB1cCB0aGUgcm93cyBmb3IgdGhlIHRocmVlIHNlcGFyYXRlIGRhdGFmcmFtZXMgYW5kIHZlcmlmeSB0aGF0IHRoZSBzdW0gaXMgZXF1YWwgdG8gdGhlIHJvd3Mgb2YgdGhlIG5ldyBhbGxJdGVtcyBkYXRhZnJhbWUgdGhhdCB3ZSBjcmVhdGVkLg0KDQpBIGdyZWF0IGFzcGVjdCBhYm91dCB0aGlzIGZ1bmN0aW9uIGlzIHRoYXQgeW91IGNhbiBoYXZlIGEgZGlmZmVyZW50IG51bWJlciBvZiBjb2x1bW5zLCBhbmQgdGhlIGNvbHVtbiBuYW1lcyBkb24ndCBoYXZlIHRvIG1hdGNoIHVwLiBXaGlsZSB0aGlzIGNhbiBiZSBhIHZlcnkgdXNlZnVsIGFwc2VjdCBhYm91dCB0aGlzIGZ1bmN0aW9uIGl0IGFsc28gbWVhbnMgdGhhdCB5b3UgaGF2ZSB0byBiZSB2ZXJ5IGNhcmVmdWwgYmVjYXVzZSBpZiB0aGUgY29sdW1uIG5hbWVzIGFyZW4ndCBzcGVsbGVkIGV4YWN0bHkgdGhlIHNhbWUgd2F5IHRoZW4geW91IG1heSB1bmludGVudGlvbmFsbHkgY3JlYXRlIG5ldyBjb2x1bW5zLg0KDQojIyBTdGFja2luZyBDb2x1bW5zIG9mIERhdGEgVXNpbmcgZHBseXIncyBiaW5kX2NvbHMgRnVuY3Rpb24NClRoZSBiaW5kX2NvbHMgZnVuY3Rpb24gd29ya3Mgc2ltaWxhcmx5IHRvIHRoZSBiaW5kX3Jvd3MgZnVuY3Rpb24uIFlvdSBtYXkgaGF2ZSBub3RpY2VkIHRoYXQgdGhlIGRvY3VtZW50YXRpb24gZm9yIHRoZSBiaW5kX2NvbHMgZnVuY3Rpb24gaXMgdGhlIHNhbWUgYXMgZm9yIHRoZSBiaW5kX3Jvd3MgZnVuY3Rpb24uIA0KDQpUbyBpbGx1c3RyYXRlIHRoZSBiaW5kX2NvbHMgZnVuY3Rpb24sIGxldCdzIGZpcnN0IHNlcGFyYXRlIHRoZSBhbGxJdGVtcyBkYXRhZnJhbWUgaW50byB0d28gdGhyZWUgc2VwYXJhdGUgZGF0YWZyYW1lcyBvZiBzZXZlbiBjb2x1bW5zIGVhY2guDQpgYGB7cn0NCmRmMSA8LSBhbGxJdGVtc1ssMTo3XQ0KZGYyIDwtIGFsbEl0ZW1zWyw4OjE0XQ0KZGYzIDwtIGFsbEl0ZW1zWywxNToyMV0NCmBgYA0KDQpOb3csIGxldCdzIGJpbmQgdGhlbSBiYWNrIHRvZ2V0aGVyLCBidXQgaW4gYSBkaWZmZXJlbnQgb3JkZXIuDQpgYGB7cn0NCmFsbEl0ZW1zMiA8LSBiaW5kX2NvbHMoZGYxLCBkZjMsIGRmMikNCmBgYA0KQ29tcGFyaW5nIHRoZSBudW1iZXIgb2Ygcm93cyBhbmQgY29sdW1ucyBpbiB0aGUgYWxsSXRlbXMyIGRhdGFmcmFtZSB0byB0aGUgYWxsSXRlbXMgZGF0YWZyYW1lIGNvbmZpcm1zIHRoYXQgd2UgaGF2ZSBhIGRhdGFmcmFtZSB0aGF0IGlzIHRoZSBzYW1lIHNoYXBlLiBBIHZpc3VhbCBpbnNwZWN0aW9uIGNvbmZpcm1zIHRoYXQgdGhlIG9yZGVyIG9mIHRoZSBjb2x1bW5zIGlzIGRpZmZlcmVudC4NCg0KV2hhdCBoYXBwZW5zIGlmIHlvdSB1c2UgdGhlIGJpbmRfY29scyB0byBzdGFjayBjb2x1bW5zIHRoYXQgaGF2ZSBhIGRpZmZlcmVudCBudW1iZXIgb2Ygcm93cz8gSXQgd2lsbCB0aHJvdyBhbiBlcnJvciB1bmxlc3MgdGhlcmUncyBvbmx5IG9uZSByb3cgaW4gd2hpY2ggY2FzZSBpdCB3aWxsIGZpbGwgaW4gZXZlcnkgb3RoZXIgcm93IHdpdGggdGhlIHZhbHVlcyBmcm9tIHRoYXQgb25lIHJvdy4NCg0KSXQncyBpbXBvcnRhbnQgdG8gbm90ZSB0aGF0IHRoZXJlIGFyZSBlYXNpZXIgd2F5cyB0byByZW9yZGVyIHRoZSBjb2x1bW5zLCB3aXRob3V0IGhhdmluZyB0byBzZXBhcmF0ZSB0aGUgY29sdW1ucyBhbmQgdGhlbiBwaWVjZSB0aGVtIGJhY2sgdG9nZXRoZXIgYWdhaW4uDQojIyBTb3J0aW5nIHRoZSBEYXRhIFVzaW5nIGRwbHlyJ3MgYXJyYW5nZSBGdW5jdGlvbg0KQSByZWxhdGVkIHByZXByb2Nlc3NpbmcgdGFzayBpcyBvcmRlcmluZyB0aGUgZGF0YSBiYXNlZCBvbiB2YWx1ZXMgaW4gYSBzcGVjaWZpYyBjb2x1bW4uIFRoaXMgaXMgb2Z0ZW4gZGVzaXJhYmxlIGFmdGVyIGpvaW5pbmcgdHdvIG9yIG1vcmUgZGF0YWZyYW1lcyB0b2dldGhlci4gTGV0J3MgYXNzdW1lIHRoYXQgb25jZSB3ZSBjcmVhdGUgdGhlIGFsbEl0ZW1zIGRhdGFmcmFtZSB3ZSB3YW50IHRvIHNvcnQgdGhlIGRhdGEgYnkgY2FyZGhvbGRlciBuYW1lLCBhbmQgdGhlbiBieSBkYXRlLiBIZXJlJ3MgaG93IHdlIGNhbiBlYXNpbHkgYWNjb21wbGlzaCB0aGF0IHVzaW5nIHRoZSBhcnJhbmdlIGZ1bmN0aW9uOg0KYGBge3J9DQphbGxJdGVtcyA8LSBhbGxJdGVtcyAlPiUNCiAgYXJyYW5nZShDYXJkaG9sZGVyTmFtZSwgVGltZSkNCmBgYA0KQSB2aXN1YWwgaW5zcGVjdGlvbiBvZiB0aGUgZGF0YSBjb25maXJtcyB0aGF0IHRoZSBkYXRhIGlzIHNvcnRlZCBmaXJzdCBieSBDYXJkaG9sZGVyTmFtZSBhbmQgdGhlbiBieSBUaW1lLiBJZiB5b3Ugd2FudCB0byBzb3J0IGluIGRlc2NlbmRpbmcgb3JkZXIsIHRoZW4geW91IGNhbiB1c2UgdGhlIGBkZXNjYCBmdW5jdGlvbiB0byBpbmRpY2F0ZSB0aGF0J3Mgd2hhdCB5b3Ugd2FudCB0byBkby4NCg0KT25lIG90aGVyIHVzZWZ1bCBmdW5jdGlvbiBpcyB0aGUgdHdvLXdheSBwaXBlIG9wZXJhdG9yLCBgJTw+JWAsIHdoaWNoIHNhdmVzIHlvdSBmcm9tIHJldHlwaW5nIGEgZGF0YWZyYW1lJ3MgbmFtZSBpZiB5b3UncmUgZ29pbmcgdG8gcmVhc3NpZ24gaXQgdG8gdGhlIHNhbWUgbmFtZS4NCg0KYGBge3J9DQphbGxJdGVtcyAlPD4lDQogIGFycmFuZ2UoZGVzYyhDYXJkaG9sZGVyTmFtZSksIFRpbWUpDQpgYGANCiMjIENvbmNsdWRpbmcgQ29tbWVudHMNCkluIGNvbmNsdXNpb24sIHRoZXNlIGZ1bmN0aW9ucyB0byBiaW5kIGFuZCBzb3J0IGRhdGFzZXRzIGFyZSByZWFsbHkgaGFuZHkuIEkgZmluZCB0aGF0IEkgdXNlIHRoZSBiaW5kX2NvbHMgZnVuY3Rpb24gbXVjaCBsZXNzIGZyZXF1ZW50bHkgdGhhbiBJIHVzZSB0aGUgYmluZF9yb3dzIGZ1bmN0aW9uIGJlY2F1c2UgSSBvZnRlbiB3YW50IHRvIGNyZWF0ZSBjb2x1bW5zIGZyb20gdHdvIHRvdGFsbHkgZGlmZmVyZW50IGRhdGFmcmFtZXMsIGxpa2UgcG9pbnQgb2Ygc2FsZSBkYXRhIGFuZCB3ZWF0aGVyIGRhdGEuIFdoZW4gdGhpcyBpcyB0aGUgY2FzZSwgSSB1c2UgYSBqb2luIGZ1bmN0aW9uIHRvIG1ha2Ugc3VyZSB0aGF0IEknbSBjb21iaW5pbmcgdmFsdWVzIHRoYXQgc2hvdWxkIGdvIHRvZ2V0aGVyLiBUaGF0J3MgYSB0b3BpYyBmb3IgYW5vdGhlciBsZXNzb24sIHRob3VnaC4NCg0KSSBob3BlIHRoZXNlIHNpbXBsZSBleGFtcGxlcyBpbGx1c3RyYXRlIHRoZSBrZXkgYXNwZWN0cyBvZiBob3cgdGhleSB3b3JrLiBBcyB5b3UgYXBwbHkgdGhlbSB5b3UnbGwgZmluZCB0aGF0IHRoZXJlIGFyZSBvcHRpb25hbCBwYXJhbWV0ZXJzIHRoYXQgeW91IG1pZ2h0IHdhbnQgdG8gY29uc2lkZXIgdXNpbmcsIHN1Y2ggYXMgY3JlYXRpbmcgYSBuZXcgY29sdW1uIHRvIGtlZXAgdHJhY2sgb2YgdGhlIGlkIG9mIGVhY2ggZGF0YWZyYW1lIHRoYXQgeW91IHN0YWNrLg0KDQo=