This lesson focuses on four functions that simplify some common data
preprocessing tasks, mutate(), rename(),
relocate(), and distinct(). There are many
other functions in the dplyr package for wrangling data. You should
spend some time reviewing them when you want to perform a specific
task.
Preliminaries
Load the dplyr and magrittr packages.
library(dplyr)
library(magrittr)
Make sure that this file and the jan17Items.csv file are in the same
folder and that the working directory is set to that folder.
Read in the jan17Items data as j17i.
j17i <- read.csv('jan17Items.csv')
Creating New Variables Using dplyr’s Mutate Function
The mutate function in the dplyr package makes it less
verbose than using only base R by reducing the number of times that you
have to type out the name of the dataframe.
Notice that the j17i dataframe contains a column for Cost, Price, and
Quantity. Let’s assume that cost represents the variable cost for one
unit of a particular line item. We can calculate the contribution margin
per unit for each line item if we subtract the cost from the price. We
can calculate the total contribution margin by multiplying the
contribution margin per unit by the quantity. Here’s how you could do
that using the mutate function.
j17i_2 <- j17i %>%
mutate(
contMarginPerUnit = Price - Cost
, contMarginTotal = contMarginPerUnit * Quantity
)
To do the same thing with base R you would have to type out the name
of the dataframe much more, which makes it harder to read in my
opinion.
j17i_2base <- j17i
j17i_2base$contMarginPerUnit <- j17i_2base$Price - j17i_2base$Cost
j17i_2base$contMarginTotal <- j17i_2base$contMarginPerUnit * j17i_2base$Quantity
Renaming Columns Using dplyr’s Rename Function
Renaming columns is very easy to do using dplyr’s rename
function.
j17i_2 <- j17i_2 %>%
rename(cmu = contMarginPerUnit, cmt = contMarginTotal)
Rearraranging Column Order Using dplyr’s Relocate Function
Rearranging columns in a meaningful order can make analyses simpler.
For instance, calculated columns are always added on to the far right of
a dataframe, but it may make sense to place them next to the columns
that the new columns are based on especially if you want to do a quick
visual inspection. In modeling it’s often helpful if the variable of
interest, also known as the dependent variable, is one of the farthest
columns to the left. It can also make processes more efficient if you
arrange the columns based on data type so that all of the numeric
columns are next to each other. Here are a couple of examples of how
dplyr’s ?relocate function can make that process
simpler.
j17i_2 <- j17i_2 %>%
relocate(cmu, cmt, .after = Quantity)
# Place Numeric columns before character columns
numFirst <- j17i_2 %>%
relocate(where(is.numeric), .before = where(is.character))
Identifying Distinct Values Using dplyr’s Distinct Function
The distinct function returns only the unique values
from a column.
Let’s assume that once we’ve calculated the total contribution margin
for each line item observation we want a list of all line items for
which the contribution margin is greater than $40. However, we do not
want to see repeat values of line items, only the distinct values.
Here’s how we would do that:
highCmLineItems <- j17i_2 %>%
filter(cmt > 40) %>%
select(LineItem) %>%
distinct()
highCmLineItems
LS0tDQp0aXRsZTogIlVzaW5nIGRwbHlyJ3MgTXV0YXRlLCBSZW5hbWUsIFJlbG9jYXRlLCBhbmQgRGlzdGluY3QgRnVuY3Rpb25zIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NClRoaXMgbGVzc29uIGZvY3VzZXMgb24gZm91ciBmdW5jdGlvbnMgdGhhdCBzaW1wbGlmeSBzb21lIGNvbW1vbiBkYXRhIHByZXByb2Nlc3NpbmcgdGFza3MsIGBtdXRhdGUoKWAsIGByZW5hbWUoKWAsIGByZWxvY2F0ZSgpYCwgYW5kIGBkaXN0aW5jdCgpYC4gVGhlcmUgYXJlIG1hbnkgb3RoZXIgZnVuY3Rpb25zIGluIHRoZSBkcGx5ciBwYWNrYWdlIGZvciB3cmFuZ2xpbmcgZGF0YS4gWW91IHNob3VsZCBzcGVuZCBzb21lIHRpbWUgcmV2aWV3aW5nIHRoZW0gd2hlbiB5b3Ugd2FudCB0byBwZXJmb3JtIGEgc3BlY2lmaWMgdGFzay4NCg0KIyMgUHJlbGltaW5hcmllcw0KTG9hZCB0aGUgZHBseXIgYW5kIG1hZ3JpdHRyIHBhY2thZ2VzLg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KbGlicmFyeShtYWdyaXR0cikNCmBgYA0KTWFrZSBzdXJlIHRoYXQgdGhpcyBmaWxlIGFuZCB0aGUgamFuMTdJdGVtcy5jc3YgZmlsZSBhcmUgaW4gdGhlIHNhbWUgZm9sZGVyIGFuZCB0aGF0IHRoZSB3b3JraW5nIGRpcmVjdG9yeSBpcyBzZXQgdG8gdGhhdCBmb2xkZXIuDQoNClJlYWQgaW4gdGhlIGphbjE3SXRlbXMgZGF0YSBhcyBqMTdpLg0KYGBge3J9DQpqMTdpIDwtIHJlYWQuY3N2KCdqYW4xN0l0ZW1zLmNzdicpDQpgYGANCg0KIyMjIENyZWF0aW5nIE5ldyBWYXJpYWJsZXMgVXNpbmcgZHBseXIncyBNdXRhdGUgRnVuY3Rpb24NClRoZSBgbXV0YXRlYCBmdW5jdGlvbiBpbiB0aGUgZHBseXIgcGFja2FnZSBtYWtlcyBpdCBsZXNzIHZlcmJvc2UgdGhhbiB1c2luZyBvbmx5IGJhc2UgUiBieSByZWR1Y2luZyB0aGUgbnVtYmVyIG9mIHRpbWVzIHRoYXQgeW91IGhhdmUgdG8gdHlwZSBvdXQgdGhlIG5hbWUgb2YgdGhlIGRhdGFmcmFtZS4gDQoNCk5vdGljZSB0aGF0IHRoZSBqMTdpIGRhdGFmcmFtZSBjb250YWlucyBhIGNvbHVtbiBmb3IgQ29zdCwgUHJpY2UsIGFuZCBRdWFudGl0eS4gTGV0J3MgYXNzdW1lIHRoYXQgY29zdCByZXByZXNlbnRzIHRoZSB2YXJpYWJsZSBjb3N0IGZvciBvbmUgdW5pdCBvZiBhIHBhcnRpY3VsYXIgbGluZSBpdGVtLiBXZSBjYW4gY2FsY3VsYXRlIHRoZSBjb250cmlidXRpb24gbWFyZ2luIHBlciB1bml0IGZvciBlYWNoIGxpbmUgaXRlbSBpZiB3ZSBzdWJ0cmFjdCB0aGUgY29zdCBmcm9tIHRoZSBwcmljZS4gV2UgY2FuIGNhbGN1bGF0ZSB0aGUgdG90YWwgY29udHJpYnV0aW9uIG1hcmdpbiBieSBtdWx0aXBseWluZyB0aGUgY29udHJpYnV0aW9uIG1hcmdpbiBwZXIgdW5pdCBieSB0aGUgcXVhbnRpdHkuIEhlcmUncyBob3cgeW91IGNvdWxkIGRvIHRoYXQgdXNpbmcgdGhlIGBtdXRhdGVgIGZ1bmN0aW9uLg0KDQpgYGB7cn0NCmoxN2lfMiA8LSBqMTdpICU+JQ0KICBtdXRhdGUoDQogICAgY29udE1hcmdpblBlclVuaXQgPSBQcmljZSAtIENvc3QgDQogICAgLCBjb250TWFyZ2luVG90YWwgPSBjb250TWFyZ2luUGVyVW5pdCAqIFF1YW50aXR5DQogICkNCmBgYA0KDQpUbyBkbyB0aGUgc2FtZSB0aGluZyB3aXRoIGJhc2UgUiB5b3Ugd291bGQgaGF2ZSB0byB0eXBlIG91dCB0aGUgbmFtZSBvZiB0aGUgZGF0YWZyYW1lIG11Y2ggbW9yZSwgd2hpY2ggbWFrZXMgaXQgaGFyZGVyIHRvIHJlYWQgaW4gbXkgb3Bpbmlvbi4NCmBgYHtyfQ0KajE3aV8yYmFzZSA8LSBqMTdpDQpqMTdpXzJiYXNlJGNvbnRNYXJnaW5QZXJVbml0IDwtIGoxN2lfMmJhc2UkUHJpY2UgLSBqMTdpXzJiYXNlJENvc3QNCmoxN2lfMmJhc2UkY29udE1hcmdpblRvdGFsIDwtIGoxN2lfMmJhc2UkY29udE1hcmdpblBlclVuaXQgKiBqMTdpXzJiYXNlJFF1YW50aXR5DQpgYGANCg0KIyMjIFJlbmFtaW5nIENvbHVtbnMgVXNpbmcgZHBseXIncyBSZW5hbWUgRnVuY3Rpb24NClJlbmFtaW5nIGNvbHVtbnMgaXMgdmVyeSBlYXN5IHRvIGRvIHVzaW5nIGRwbHlyJ3MgYHJlbmFtZWAgZnVuY3Rpb24uDQoNCmBgYHtyfQ0KajE3aV8yIDwtIGoxN2lfMiAlPiUNCiAgcmVuYW1lKGNtdSA9IGNvbnRNYXJnaW5QZXJVbml0LCBjbXQgPSBjb250TWFyZ2luVG90YWwpDQpgYGANCg0KIyMjIFJlYXJyYXJhbmdpbmcgQ29sdW1uIE9yZGVyIFVzaW5nIGRwbHlyJ3MgUmVsb2NhdGUgRnVuY3Rpb24NClJlYXJyYW5naW5nIGNvbHVtbnMgaW4gYSBtZWFuaW5nZnVsIG9yZGVyIGNhbiBtYWtlIGFuYWx5c2VzIHNpbXBsZXIuIEZvciBpbnN0YW5jZSwgY2FsY3VsYXRlZCBjb2x1bW5zIGFyZSBhbHdheXMgYWRkZWQgb24gdG8gdGhlIGZhciByaWdodCBvZiBhIGRhdGFmcmFtZSwgYnV0IGl0IG1heSBtYWtlIHNlbnNlIHRvIHBsYWNlIHRoZW0gbmV4dCB0byB0aGUgY29sdW1ucyB0aGF0IHRoZSBuZXcgY29sdW1ucyBhcmUgYmFzZWQgb24gZXNwZWNpYWxseSBpZiB5b3Ugd2FudCB0byBkbyBhIHF1aWNrIHZpc3VhbCBpbnNwZWN0aW9uLiBJbiBtb2RlbGluZyBpdCdzIG9mdGVuIGhlbHBmdWwgaWYgdGhlIHZhcmlhYmxlIG9mIGludGVyZXN0LCBhbHNvIGtub3duIGFzIHRoZSBkZXBlbmRlbnQgdmFyaWFibGUsIGlzIG9uZSBvZiB0aGUgZmFydGhlc3QgY29sdW1ucyB0byB0aGUgbGVmdC4gSXQgY2FuIGFsc28gbWFrZSBwcm9jZXNzZXMgbW9yZSBlZmZpY2llbnQgaWYgeW91IGFycmFuZ2UgdGhlIGNvbHVtbnMgYmFzZWQgb24gZGF0YSB0eXBlIHNvIHRoYXQgYWxsIG9mIHRoZSBudW1lcmljIGNvbHVtbnMgYXJlIG5leHQgdG8gZWFjaCBvdGhlci4gSGVyZSBhcmUgYSBjb3VwbGUgb2YgZXhhbXBsZXMgb2YgaG93IGRwbHlyJ3MgYD9yZWxvY2F0ZWAgZnVuY3Rpb24gY2FuIG1ha2UgdGhhdCBwcm9jZXNzIHNpbXBsZXIuDQoNCmBgYHtyfQ0KajE3aV8yIDwtIGoxN2lfMiAlPiUNCiAgcmVsb2NhdGUoY211LCBjbXQsIC5hZnRlciA9IFF1YW50aXR5KQ0KDQojIFBsYWNlIE51bWVyaWMgY29sdW1ucyBiZWZvcmUgY2hhcmFjdGVyIGNvbHVtbnMNCm51bUZpcnN0IDwtIGoxN2lfMiAlPiUgDQogIHJlbG9jYXRlKHdoZXJlKGlzLm51bWVyaWMpLCAuYmVmb3JlID0gd2hlcmUoaXMuY2hhcmFjdGVyKSkNCmBgYA0KDQoNCiMjIyBJZGVudGlmeWluZyBEaXN0aW5jdCBWYWx1ZXMgVXNpbmcgZHBseXIncyBEaXN0aW5jdCBGdW5jdGlvbg0KVGhlIGBkaXN0aW5jdGAgZnVuY3Rpb24gcmV0dXJucyBvbmx5IHRoZSB1bmlxdWUgdmFsdWVzIGZyb20gYSBjb2x1bW4uIA0KDQpMZXQncyBhc3N1bWUgdGhhdCBvbmNlIHdlJ3ZlIGNhbGN1bGF0ZWQgdGhlIHRvdGFsIGNvbnRyaWJ1dGlvbiBtYXJnaW4gZm9yIGVhY2ggbGluZSBpdGVtIG9ic2VydmF0aW9uIHdlIHdhbnQgYSBsaXN0IG9mIGFsbCBsaW5lIGl0ZW1zIGZvciB3aGljaCB0aGUgY29udHJpYnV0aW9uIG1hcmdpbiBpcyBncmVhdGVyIHRoYW4gJDQwLiBIb3dldmVyLCB3ZSBkbyBub3Qgd2FudCB0byBzZWUgcmVwZWF0IHZhbHVlcyBvZiBsaW5lIGl0ZW1zLCBvbmx5IHRoZSBkaXN0aW5jdCB2YWx1ZXMuDQoNCkhlcmUncyBob3cgd2Ugd291bGQgZG8gdGhhdDoNCmBgYHtyfQ0KaGlnaENtTGluZUl0ZW1zIDwtIGoxN2lfMiAlPiUNCiAgZmlsdGVyKGNtdCA+IDQwKSAlPiUNCiAgc2VsZWN0KExpbmVJdGVtKSAlPiUNCiAgZGlzdGluY3QoKQ0KDQpoaWdoQ21MaW5lSXRlbXMNCmBgYA0KDQojIyMgQ29uY2x1ZGluZyBDb21tZW50cw0KTm90aWNlIGhvdyBlYXN5IGl0IGlzIHRvIHJlYWQgdGhlIGNvZGUgdXNpbmcgdGhlIHRpZHl2ZXJzZSBncmFtbWFyLiBJdCdzIGVhc3kgZW5vdWdoIHRoYXQgeW91IHJlYWxseSBkb24ndCBuZWVkIGNvbW1lbnRzIG9uY2UgeW91IGdldCB1c2VkIHRvIHdoYXQgdGhlIGZ1bmN0aW9ucyBkby4=