This lesson focuses on four functions that simplify some common data preprocessing tasks, mutate(), rename(), relocate(), and distinct(). There are many other functions in the dplyr package for wrangling data. You should spend some time reviewing them when you want to perform a specific task.

Preliminaries

Load the dplyr and magrittr packages.

library(dplyr)
library(magrittr)

Make sure that this file and the jan17Items.csv file are in the same folder and that the working directory is set to that folder.

Read in the jan17Items data as j17i.

j17i <- read.csv('jan17Items.csv')

Creating New Variables Using dplyr’s Mutate Function

The mutate function in the dplyr package makes it less verbose than using only base R by reducing the number of times that you have to type out the name of the dataframe.

Notice that the j17i dataframe contains a column for Cost, Price, and Quantity. Let’s assume that cost represents the variable cost for one unit of a particular line item. We can calculate the contribution margin per unit for each line item if we subtract the cost from the price. We can calculate the total contribution margin by multiplying the contribution margin per unit by the quantity. Here’s how you could do that using the mutate function.

j17i_2 <- j17i %>%
  mutate(
    contMarginPerUnit = Price - Cost 
    , contMarginTotal = contMarginPerUnit * Quantity
  )

To do the same thing with base R you would have to type out the name of the dataframe much more, which makes it harder to read in my opinion.

j17i_2base <- j17i
j17i_2base$contMarginPerUnit <- j17i_2base$Price - j17i_2base$Cost
j17i_2base$contMarginTotal <- j17i_2base$contMarginPerUnit * j17i_2base$Quantity

Renaming Columns Using dplyr’s Rename Function

Renaming columns is very easy to do using dplyr’s rename function.

j17i_2 <- j17i_2 %>%
  rename(cmu = contMarginPerUnit, cmt = contMarginTotal)

Rearraranging Column Order Using dplyr’s Relocate Function

Rearranging columns in a meaningful order can make analyses simpler. For instance, calculated columns are always added on to the far right of a dataframe, but it may make sense to place them next to the columns that the new columns are based on especially if you want to do a quick visual inspection. In modeling it’s often helpful if the variable of interest, also known as the dependent variable, is one of the farthest columns to the left. It can also make processes more efficient if you arrange the columns based on data type so that all of the numeric columns are next to each other. Here are a couple of examples of how dplyr’s ?relocate function can make that process simpler.

j17i_2 <- j17i_2 %>%
  relocate(cmu, cmt, .after = Quantity)

# Place Numeric columns before character columns
numFirst <- j17i_2 %>% 
  relocate(where(is.numeric), .before = where(is.character))

Identifying Distinct Values Using dplyr’s Distinct Function

The distinct function returns only the unique values from a column.

Let’s assume that once we’ve calculated the total contribution margin for each line item observation we want a list of all line items for which the contribution margin is greater than $40. However, we do not want to see repeat values of line items, only the distinct values.

Here’s how we would do that:

highCmLineItems <- j17i_2 %>%
  filter(cmt > 40) %>%
  select(LineItem) %>%
  distinct()

highCmLineItems

Concluding Comments

Notice how easy it is to read the code using the tidyverse grammar. It’s easy enough that you really don’t need comments once you get used to what the functions do.

LS0tDQp0aXRsZTogIlVzaW5nIGRwbHlyJ3MgTXV0YXRlLCBSZW5hbWUsIFJlbG9jYXRlLCBhbmQgRGlzdGluY3QgRnVuY3Rpb25zIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NClRoaXMgbGVzc29uIGZvY3VzZXMgb24gZm91ciBmdW5jdGlvbnMgdGhhdCBzaW1wbGlmeSBzb21lIGNvbW1vbiBkYXRhIHByZXByb2Nlc3NpbmcgdGFza3MsIGBtdXRhdGUoKWAsIGByZW5hbWUoKWAsIGByZWxvY2F0ZSgpYCwgYW5kIGBkaXN0aW5jdCgpYC4gVGhlcmUgYXJlIG1hbnkgb3RoZXIgZnVuY3Rpb25zIGluIHRoZSBkcGx5ciBwYWNrYWdlIGZvciB3cmFuZ2xpbmcgZGF0YS4gWW91IHNob3VsZCBzcGVuZCBzb21lIHRpbWUgcmV2aWV3aW5nIHRoZW0gd2hlbiB5b3Ugd2FudCB0byBwZXJmb3JtIGEgc3BlY2lmaWMgdGFzay4NCg0KIyMgUHJlbGltaW5hcmllcw0KTG9hZCB0aGUgZHBseXIgYW5kIG1hZ3JpdHRyIHBhY2thZ2VzLg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KbGlicmFyeShtYWdyaXR0cikNCmBgYA0KTWFrZSBzdXJlIHRoYXQgdGhpcyBmaWxlIGFuZCB0aGUgamFuMTdJdGVtcy5jc3YgZmlsZSBhcmUgaW4gdGhlIHNhbWUgZm9sZGVyIGFuZCB0aGF0IHRoZSB3b3JraW5nIGRpcmVjdG9yeSBpcyBzZXQgdG8gdGhhdCBmb2xkZXIuDQoNClJlYWQgaW4gdGhlIGphbjE3SXRlbXMgZGF0YSBhcyBqMTdpLg0KYGBge3J9DQpqMTdpIDwtIHJlYWQuY3N2KCdqYW4xN0l0ZW1zLmNzdicpDQpgYGANCg0KIyMjIENyZWF0aW5nIE5ldyBWYXJpYWJsZXMgVXNpbmcgZHBseXIncyBNdXRhdGUgRnVuY3Rpb24NClRoZSBgbXV0YXRlYCBmdW5jdGlvbiBpbiB0aGUgZHBseXIgcGFja2FnZSBtYWtlcyBpdCBsZXNzIHZlcmJvc2UgdGhhbiB1c2luZyBvbmx5IGJhc2UgUiBieSByZWR1Y2luZyB0aGUgbnVtYmVyIG9mIHRpbWVzIHRoYXQgeW91IGhhdmUgdG8gdHlwZSBvdXQgdGhlIG5hbWUgb2YgdGhlIGRhdGFmcmFtZS4gDQoNCk5vdGljZSB0aGF0IHRoZSBqMTdpIGRhdGFmcmFtZSBjb250YWlucyBhIGNvbHVtbiBmb3IgQ29zdCwgUHJpY2UsIGFuZCBRdWFudGl0eS4gTGV0J3MgYXNzdW1lIHRoYXQgY29zdCByZXByZXNlbnRzIHRoZSB2YXJpYWJsZSBjb3N0IGZvciBvbmUgdW5pdCBvZiBhIHBhcnRpY3VsYXIgbGluZSBpdGVtLiBXZSBjYW4gY2FsY3VsYXRlIHRoZSBjb250cmlidXRpb24gbWFyZ2luIHBlciB1bml0IGZvciBlYWNoIGxpbmUgaXRlbSBpZiB3ZSBzdWJ0cmFjdCB0aGUgY29zdCBmcm9tIHRoZSBwcmljZS4gV2UgY2FuIGNhbGN1bGF0ZSB0aGUgdG90YWwgY29udHJpYnV0aW9uIG1hcmdpbiBieSBtdWx0aXBseWluZyB0aGUgY29udHJpYnV0aW9uIG1hcmdpbiBwZXIgdW5pdCBieSB0aGUgcXVhbnRpdHkuIEhlcmUncyBob3cgeW91IGNvdWxkIGRvIHRoYXQgdXNpbmcgdGhlIGBtdXRhdGVgIGZ1bmN0aW9uLg0KDQpgYGB7cn0NCmoxN2lfMiA8LSBqMTdpICU+JQ0KICBtdXRhdGUoDQogICAgY29udE1hcmdpblBlclVuaXQgPSBQcmljZSAtIENvc3QgDQogICAgLCBjb250TWFyZ2luVG90YWwgPSBjb250TWFyZ2luUGVyVW5pdCAqIFF1YW50aXR5DQogICkNCmBgYA0KDQpUbyBkbyB0aGUgc2FtZSB0aGluZyB3aXRoIGJhc2UgUiB5b3Ugd291bGQgaGF2ZSB0byB0eXBlIG91dCB0aGUgbmFtZSBvZiB0aGUgZGF0YWZyYW1lIG11Y2ggbW9yZSwgd2hpY2ggbWFrZXMgaXQgaGFyZGVyIHRvIHJlYWQgaW4gbXkgb3Bpbmlvbi4NCmBgYHtyfQ0KajE3aV8yYmFzZSA8LSBqMTdpDQpqMTdpXzJiYXNlJGNvbnRNYXJnaW5QZXJVbml0IDwtIGoxN2lfMmJhc2UkUHJpY2UgLSBqMTdpXzJiYXNlJENvc3QNCmoxN2lfMmJhc2UkY29udE1hcmdpblRvdGFsIDwtIGoxN2lfMmJhc2UkY29udE1hcmdpblBlclVuaXQgKiBqMTdpXzJiYXNlJFF1YW50aXR5DQpgYGANCg0KIyMjIFJlbmFtaW5nIENvbHVtbnMgVXNpbmcgZHBseXIncyBSZW5hbWUgRnVuY3Rpb24NClJlbmFtaW5nIGNvbHVtbnMgaXMgdmVyeSBlYXN5IHRvIGRvIHVzaW5nIGRwbHlyJ3MgYHJlbmFtZWAgZnVuY3Rpb24uDQoNCmBgYHtyfQ0KajE3aV8yIDwtIGoxN2lfMiAlPiUNCiAgcmVuYW1lKGNtdSA9IGNvbnRNYXJnaW5QZXJVbml0LCBjbXQgPSBjb250TWFyZ2luVG90YWwpDQpgYGANCg0KIyMjIFJlYXJyYXJhbmdpbmcgQ29sdW1uIE9yZGVyIFVzaW5nIGRwbHlyJ3MgUmVsb2NhdGUgRnVuY3Rpb24NClJlYXJyYW5naW5nIGNvbHVtbnMgaW4gYSBtZWFuaW5nZnVsIG9yZGVyIGNhbiBtYWtlIGFuYWx5c2VzIHNpbXBsZXIuIEZvciBpbnN0YW5jZSwgY2FsY3VsYXRlZCBjb2x1bW5zIGFyZSBhbHdheXMgYWRkZWQgb24gdG8gdGhlIGZhciByaWdodCBvZiBhIGRhdGFmcmFtZSwgYnV0IGl0IG1heSBtYWtlIHNlbnNlIHRvIHBsYWNlIHRoZW0gbmV4dCB0byB0aGUgY29sdW1ucyB0aGF0IHRoZSBuZXcgY29sdW1ucyBhcmUgYmFzZWQgb24gZXNwZWNpYWxseSBpZiB5b3Ugd2FudCB0byBkbyBhIHF1aWNrIHZpc3VhbCBpbnNwZWN0aW9uLiBJbiBtb2RlbGluZyBpdCdzIG9mdGVuIGhlbHBmdWwgaWYgdGhlIHZhcmlhYmxlIG9mIGludGVyZXN0LCBhbHNvIGtub3duIGFzIHRoZSBkZXBlbmRlbnQgdmFyaWFibGUsIGlzIG9uZSBvZiB0aGUgZmFydGhlc3QgY29sdW1ucyB0byB0aGUgbGVmdC4gSXQgY2FuIGFsc28gbWFrZSBwcm9jZXNzZXMgbW9yZSBlZmZpY2llbnQgaWYgeW91IGFycmFuZ2UgdGhlIGNvbHVtbnMgYmFzZWQgb24gZGF0YSB0eXBlIHNvIHRoYXQgYWxsIG9mIHRoZSBudW1lcmljIGNvbHVtbnMgYXJlIG5leHQgdG8gZWFjaCBvdGhlci4gSGVyZSBhcmUgYSBjb3VwbGUgb2YgZXhhbXBsZXMgb2YgaG93IGRwbHlyJ3MgYD9yZWxvY2F0ZWAgZnVuY3Rpb24gY2FuIG1ha2UgdGhhdCBwcm9jZXNzIHNpbXBsZXIuDQoNCmBgYHtyfQ0KajE3aV8yIDwtIGoxN2lfMiAlPiUNCiAgcmVsb2NhdGUoY211LCBjbXQsIC5hZnRlciA9IFF1YW50aXR5KQ0KDQojIFBsYWNlIE51bWVyaWMgY29sdW1ucyBiZWZvcmUgY2hhcmFjdGVyIGNvbHVtbnMNCm51bUZpcnN0IDwtIGoxN2lfMiAlPiUgDQogIHJlbG9jYXRlKHdoZXJlKGlzLm51bWVyaWMpLCAuYmVmb3JlID0gd2hlcmUoaXMuY2hhcmFjdGVyKSkNCmBgYA0KDQoNCiMjIyBJZGVudGlmeWluZyBEaXN0aW5jdCBWYWx1ZXMgVXNpbmcgZHBseXIncyBEaXN0aW5jdCBGdW5jdGlvbg0KVGhlIGBkaXN0aW5jdGAgZnVuY3Rpb24gcmV0dXJucyBvbmx5IHRoZSB1bmlxdWUgdmFsdWVzIGZyb20gYSBjb2x1bW4uIA0KDQpMZXQncyBhc3N1bWUgdGhhdCBvbmNlIHdlJ3ZlIGNhbGN1bGF0ZWQgdGhlIHRvdGFsIGNvbnRyaWJ1dGlvbiBtYXJnaW4gZm9yIGVhY2ggbGluZSBpdGVtIG9ic2VydmF0aW9uIHdlIHdhbnQgYSBsaXN0IG9mIGFsbCBsaW5lIGl0ZW1zIGZvciB3aGljaCB0aGUgY29udHJpYnV0aW9uIG1hcmdpbiBpcyBncmVhdGVyIHRoYW4gJDQwLiBIb3dldmVyLCB3ZSBkbyBub3Qgd2FudCB0byBzZWUgcmVwZWF0IHZhbHVlcyBvZiBsaW5lIGl0ZW1zLCBvbmx5IHRoZSBkaXN0aW5jdCB2YWx1ZXMuDQoNCkhlcmUncyBob3cgd2Ugd291bGQgZG8gdGhhdDoNCmBgYHtyfQ0KaGlnaENtTGluZUl0ZW1zIDwtIGoxN2lfMiAlPiUNCiAgZmlsdGVyKGNtdCA+IDQwKSAlPiUNCiAgc2VsZWN0KExpbmVJdGVtKSAlPiUNCiAgZGlzdGluY3QoKQ0KDQpoaWdoQ21MaW5lSXRlbXMNCmBgYA0KDQojIyMgQ29uY2x1ZGluZyBDb21tZW50cw0KTm90aWNlIGhvdyBlYXN5IGl0IGlzIHRvIHJlYWQgdGhlIGNvZGUgdXNpbmcgdGhlIHRpZHl2ZXJzZSBncmFtbWFyLiBJdCdzIGVhc3kgZW5vdWdoIHRoYXQgeW91IHJlYWxseSBkb24ndCBuZWVkIGNvbW1lbnRzIG9uY2UgeW91IGdldCB1c2VkIHRvIHdoYXQgdGhlIGZ1bmN0aW9ucyBkby4=