This lesson introduces two functions from the tidyr package for
pivoting dataframes between wide and long formats. The tidyr package is
part of the tidyverse, and it has functions for reshaping
dataframes.
The shape of a dataframe refers to the number of rows and columns.
Many plotting functions and dashboard applications work best with long
dataframes that have few columns and many rows. In contrast, many
algorithms and human readable tables work best with wide dataframes that
have few rows and many columns.
Preliminaries
If you haven’t already installed the tidyr package on the machine
that you’re using, then you can do so by running the next code chunk. If
you have already installed it, then you do not need to run it again.
# install.packages('tidyr')
Load the dplyr, magrittr, and lubridate packages.
library(dplyr)
library(magrittr)
library(lubridate)
library(tidyr)
Make sure that this file and the jan17Items.csv file are in the same
folder and that the working directory is set to that folder.
Read in the jan17Items data as j17i.
j17i <- read.csv('jan17Items.csv')
Aggregate the Data at the Day Level
Aggregate the data into a single observation for each day. Include a
computation for the average cost, the maximum price, and the number of
transactions for each day.
daily <- j17i %>%
mutate(
Time = ymd_hms(Time)
, date = round_date(Time, 'day')
) %>%
group_by(date) %>%
summarise(avgCost = mean(Cost, na.rm = T)
, maxPrice = max(Price, na.rm = T)
, transactionQuantity = n_distinct(TransactionNumber)) %>%
ungroup()
Pivot Longer
Pivoting to a longer dataframe means that we’re going to collapse the
names for some columns into a single column, and the
values from those columns into another single column.
This will require duplicating the column(s) that identify the
observation.
dailyLong <- daily %>%
pivot_longer(cols = c(avgCost, maxPrice, transactionQuantity))
# Alternative approaches that also work. See the <tidy-select> link in the documentation
dailyLong <- daily %>%
pivot_longer(cols = avgCost:transactionQuantity)
dailyLong <- daily %>%
pivot_longer(cols = c('avgCost', 'maxPrice', 'transactionQuantity'))
Notice that the shape of this dataframe is 90 rows long and 3 columns
wide rather than 30 columns long and four columns wide.
Let’s specify what we want the names to be for the names_to and
values_to columns.
dailyLong <- daily %>%
pivot_longer(cols = avgCost:transactionQuantity
, names_to = 'metrics'
, values_to = 'vals')
Pivot Wider
Let’s now pivot the long dataframe back to a wide version of the
dataframe using the pivot_wider function. Let’s read the help
documentation by running ?pivot_wider in the Console. You can see that
it’s a parallel version of the pivot_longer function.
dailyWide <- dailyLong %>%
pivot_wider(names_from = metrics, values_from = vals)
A visual inspection confirms that the dailyWide dataframe appears to
be the same as the daily dataframe, showing that we successfully pivoted
back to the wide format.
LS0tDQp0aXRsZTogIlBpdm90aW5nIERhdGFmcmFtZXMgQmV0d2VlbiBXaWRlIGFuZCBMb25nIFNoYXBlcyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQpUaGlzIGxlc3NvbiBpbnRyb2R1Y2VzIHR3byBmdW5jdGlvbnMgZnJvbSB0aGUgdGlkeXIgcGFja2FnZSBmb3IgcGl2b3RpbmcgZGF0YWZyYW1lcyBiZXR3ZWVuIHdpZGUgYW5kIGxvbmcgZm9ybWF0cy4gVGhlIHRpZHlyIHBhY2thZ2UgaXMgcGFydCBvZiB0aGUgdGlkeXZlcnNlLCBhbmQgaXQgaGFzIGZ1bmN0aW9ucyBmb3IgcmVzaGFwaW5nIGRhdGFmcmFtZXMuIA0KDQpUaGUgc2hhcGUgb2YgYSBkYXRhZnJhbWUgcmVmZXJzIHRvIHRoZSBudW1iZXIgb2Ygcm93cyBhbmQgY29sdW1ucy4gTWFueSBwbG90dGluZyBmdW5jdGlvbnMgYW5kIGRhc2hib2FyZCBhcHBsaWNhdGlvbnMgd29yayBiZXN0IHdpdGggbG9uZyBkYXRhZnJhbWVzIHRoYXQgaGF2ZSBmZXcgY29sdW1ucyBhbmQgbWFueSByb3dzLiBJbiBjb250cmFzdCwgbWFueSBhbGdvcml0aG1zIGFuZCBodW1hbiByZWFkYWJsZSB0YWJsZXMgd29yayBiZXN0IHdpdGggd2lkZSBkYXRhZnJhbWVzIHRoYXQgaGF2ZSBmZXcgcm93cyBhbmQgbWFueSBjb2x1bW5zLg0KDQojIyBQcmVsaW1pbmFyaWVzDQpJZiB5b3UgaGF2ZW4ndCBhbHJlYWR5IGluc3RhbGxlZCB0aGUgdGlkeXIgcGFja2FnZSBvbiB0aGUgbWFjaGluZSB0aGF0IHlvdSdyZSB1c2luZywgdGhlbiB5b3UgY2FuIGRvIHNvIGJ5IHJ1bm5pbmcgdGhlIG5leHQgY29kZSBjaHVuay4gSWYgeW91IGhhdmUgYWxyZWFkeSBpbnN0YWxsZWQgaXQsIHRoZW4geW91IGRvIG5vdCBuZWVkIHRvIHJ1biBpdCBhZ2Fpbi4NCmBgYHtyfQ0KIyBpbnN0YWxsLnBhY2thZ2VzKCd0aWR5cicpDQpgYGANCg0KTG9hZCB0aGUgZHBseXIsIG1hZ3JpdHRyLCBhbmQgbHVicmlkYXRlIHBhY2thZ2VzLg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KbGlicmFyeShtYWdyaXR0cikNCmxpYnJhcnkobHVicmlkYXRlKQ0KbGlicmFyeSh0aWR5cikNCmBgYA0KTWFrZSBzdXJlIHRoYXQgdGhpcyBmaWxlIGFuZCB0aGUgamFuMTdJdGVtcy5jc3YgZmlsZSBhcmUgaW4gdGhlIHNhbWUgZm9sZGVyIGFuZCB0aGF0IHRoZSB3b3JraW5nIGRpcmVjdG9yeSBpcyBzZXQgdG8gdGhhdCBmb2xkZXIuDQoNClJlYWQgaW4gdGhlIGphbjE3SXRlbXMgZGF0YSBhcyBqMTdpLg0KYGBge3J9DQpqMTdpIDwtIHJlYWQuY3N2KCdqYW4xN0l0ZW1zLmNzdicpDQpgYGANCg0KIyMgQWdncmVnYXRlIHRoZSBEYXRhIGF0IHRoZSBEYXkgTGV2ZWwNCkFnZ3JlZ2F0ZSB0aGUgZGF0YSBpbnRvIGEgc2luZ2xlIG9ic2VydmF0aW9uIGZvciBlYWNoIGRheS4gSW5jbHVkZSBhIGNvbXB1dGF0aW9uIGZvciB0aGUgYXZlcmFnZSBjb3N0LCB0aGUgbWF4aW11bSBwcmljZSwgYW5kIHRoZSBudW1iZXIgb2YgdHJhbnNhY3Rpb25zIGZvciBlYWNoIGRheS4NCg0KYGBge3J9DQpkYWlseSA8LSBqMTdpICU+JQ0KICBtdXRhdGUoDQogICAgVGltZSA9IHltZF9obXMoVGltZSkNCiAgICAsIGRhdGUgPSByb3VuZF9kYXRlKFRpbWUsICdkYXknKQ0KICApICU+JQ0KICBncm91cF9ieShkYXRlKSAlPiUNCiAgc3VtbWFyaXNlKGF2Z0Nvc3QgPSBtZWFuKENvc3QsIG5hLnJtID0gVCkNCiAgICAgICAgICAgICwgbWF4UHJpY2UgPSBtYXgoUHJpY2UsIG5hLnJtID0gVCkNCiAgICAgICAgICAgICwgdHJhbnNhY3Rpb25RdWFudGl0eSA9IG5fZGlzdGluY3QoVHJhbnNhY3Rpb25OdW1iZXIpKSAlPiUNCiAgdW5ncm91cCgpDQpgYGANCg0KIyMgUGl2b3QgTG9uZ2VyDQpQaXZvdGluZyB0byBhIGxvbmdlciBkYXRhZnJhbWUgbWVhbnMgdGhhdCB3ZSdyZSBnb2luZyB0byBjb2xsYXBzZSB0aGUgKipuYW1lcyoqIGZvciBzb21lIGNvbHVtbnMgaW50byBhIHNpbmdsZSBjb2x1bW4sIGFuZCB0aGUgKip2YWx1ZXMqKiBmcm9tIHRob3NlIGNvbHVtbnMgaW50byBhbm90aGVyIHNpbmdsZSBjb2x1bW4uIFRoaXMgd2lsbCByZXF1aXJlIGR1cGxpY2F0aW5nIHRoZSBjb2x1bW4ocykgdGhhdCBpZGVudGlmeSB0aGUgb2JzZXJ2YXRpb24uDQoNCmBgYHtyfQ0KZGFpbHlMb25nIDwtIGRhaWx5ICU+JQ0KICBwaXZvdF9sb25nZXIoY29scyA9IGMoYXZnQ29zdCwgbWF4UHJpY2UsIHRyYW5zYWN0aW9uUXVhbnRpdHkpKQ0KDQojIEFsdGVybmF0aXZlIGFwcHJvYWNoZXMgdGhhdCBhbHNvIHdvcmsuIFNlZSB0aGUgPHRpZHktc2VsZWN0PiBsaW5rIGluIHRoZSBkb2N1bWVudGF0aW9uDQpkYWlseUxvbmcgPC0gZGFpbHkgJT4lDQogIHBpdm90X2xvbmdlcihjb2xzID0gYXZnQ29zdDp0cmFuc2FjdGlvblF1YW50aXR5KQ0KZGFpbHlMb25nIDwtIGRhaWx5ICU+JQ0KICBwaXZvdF9sb25nZXIoY29scyA9IGMoJ2F2Z0Nvc3QnLCAnbWF4UHJpY2UnLCAndHJhbnNhY3Rpb25RdWFudGl0eScpKQ0KYGBgDQpOb3RpY2UgdGhhdCB0aGUgc2hhcGUgb2YgdGhpcyBkYXRhZnJhbWUgaXMgOTAgcm93cyBsb25nIGFuZCAzIGNvbHVtbnMgd2lkZSByYXRoZXIgdGhhbiAzMCBjb2x1bW5zIGxvbmcgYW5kIGZvdXIgY29sdW1ucyB3aWRlLg0KDQpMZXQncyBzcGVjaWZ5IHdoYXQgd2Ugd2FudCB0aGUgbmFtZXMgdG8gYmUgZm9yIHRoZSBuYW1lc190byBhbmQgdmFsdWVzX3RvIGNvbHVtbnMuDQoNCmBgYHtyfQ0KZGFpbHlMb25nIDwtIGRhaWx5ICU+JQ0KICBwaXZvdF9sb25nZXIoY29scyA9IGF2Z0Nvc3Q6dHJhbnNhY3Rpb25RdWFudGl0eQ0KICAgICAgICAgICAgICAgLCBuYW1lc190byA9ICdtZXRyaWNzJw0KICAgICAgICAgICAgICAgLCB2YWx1ZXNfdG8gPSAndmFscycpDQpgYGANCg0KIyMgUGl2b3QgV2lkZXINCkxldCdzIG5vdyBwaXZvdCB0aGUgbG9uZyBkYXRhZnJhbWUgYmFjayB0byBhIHdpZGUgdmVyc2lvbiBvZiB0aGUgZGF0YWZyYW1lIHVzaW5nIHRoZSBwaXZvdF93aWRlciBmdW5jdGlvbi4gTGV0J3MgcmVhZCB0aGUgaGVscCBkb2N1bWVudGF0aW9uIGJ5IHJ1bm5pbmcgP3Bpdm90X3dpZGVyIGluIHRoZSBDb25zb2xlLiBZb3UgY2FuIHNlZSB0aGF0IGl0J3MgYSBwYXJhbGxlbCB2ZXJzaW9uIG9mIHRoZSBwaXZvdF9sb25nZXIgZnVuY3Rpb24uDQoNCmBgYHtyfQ0KZGFpbHlXaWRlIDwtIGRhaWx5TG9uZyAlPiUNCiAgcGl2b3Rfd2lkZXIobmFtZXNfZnJvbSA9IG1ldHJpY3MsIHZhbHVlc19mcm9tID0gdmFscykNCmBgYA0KQSB2aXN1YWwgaW5zcGVjdGlvbiBjb25maXJtcyB0aGF0IHRoZSBkYWlseVdpZGUgZGF0YWZyYW1lIGFwcGVhcnMgdG8gYmUgdGhlIHNhbWUgYXMgdGhlIGRhaWx5IGRhdGFmcmFtZSwgc2hvd2luZyB0aGF0IHdlIHN1Y2Nlc3NmdWxseSBwaXZvdGVkIGJhY2sgdG8gdGhlIHdpZGUgZm9ybWF0Lg0KDQoNCiMjIENvbmNsdWRpbmcgQ29tbWVudHMNCkl0J3MgbmljZSB0aGF0IHRoZSBwaXZvdF93aWRlciBhbmQgcGl2b3RfbG9uZ2VyIGZ1bmN0aW9ucyBhcmUgcGFyYWxsZWwgaW4gdGhlaXIgdXNhZ2UuIFRoZXNlIGZ1bmN0aW9ucyBoYXZlIG1hbnkgb3RoZXIgb3B0aW9uYWwgcGFyYW1ldGVycyB0aGF0IHlvdSBjYW4gZXhwbG9yZSwgYnV0IG1hbnkgb2YgdGhvc2Ugb3B0aW9ucyBjYW4gYmUgcGVyZm9ybWVkIGFmdGVyIHRoZSBkYXRhZnJhbWUgaGFzIGJlZW4gcmVzaGFwZWQgdXNpbmcgdGhlIG11dGF0ZSBmdW5jdGlvbi4NCg0KSGF2aW5nIGEgbGl0dGxlIGJpdCBvZiBoaXN0b3J5IHVzaW5nIHRoZSB0aWR5dmVyc2UsIEkgcmVhbGx5IGxpa2UgaG93IGVhc3kgaXQgaGFzIGJlY29tZSB0byB1c2UgdGhlIHBpdm90X3dpZGVyIGFuZCBwaXZvdF9sb25nZXIgZnVuY3Rpb25zLiBCZWNhdXNlIHRoZXkncmUgc28gZWFzeSB0byB1c2UgYW5kIGJlY2F1c2UgcGl2b3RpbmcgdGhlIHNoYXBlIG9mIHRoZSBkYXRhZnJhbWUgaXMgZG9uZSBzbyBmcmVxdWVudGx5LCBJIGNvbnNpZGVyIHRoZW0gZXNzZW50aWFsIHRvIGtub3cgaG93IHRvIHVzZSBpZiB5b3UncmUgZ29pbmcgdG8gZG8gYW55IGtpbmQgb2YgZGF0YSB3cmFuZ2xpbmcgaW4gUi4=