library("tidyverse")
1. What function would you use to read a file where fields were separated with “|”?
Use the read_delim() function with the argument delim = "|".
read_delim(file, delim = "|")
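As a minimal sketch of the call above, the following reads a small pipe-delimited string instead of a file. The data, the column names, and the `col_types` shorthand are made up for illustration, and wrapping the string in `I()` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical pipe-delimited data, passed inline instead of a file path.
pipe_text <- "id|name|score\n1|Ada|9.5\n2|Linus|8.0"

# I() marks the string as literal data rather than a path.
# col_types = "icd" reads the columns as integer, character, double.
pipe_df <- read_delim(I(pipe_text), delim = "|", col_types = "icd")
pipe_df
```

With a real file, the first argument would simply be the path to that file.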
3. What are the most important arguments to read_fwf()?
The most important argument to read_fwf(), which reads “fixed-width formats”, is col_positions, which tells the function where data columns begin and end.
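To illustrate how col_positions can be supplied, here is a small sketch using readr's `fwf_widths()` and `fwf_positions()` helpers. The records, column names, and widths are invented for the example, and `I()` for literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical fixed-width records:
# name in characters 1-5, state in 6-7, code in 8-10.
fwf_text <- "John CA123\nMary NY456"

# Describe the layout by column widths.
fwf_df <- read_fwf(
  I(fwf_text),
  col_positions = fwf_widths(c(5, 2, 3), col_names = c("name", "state", "code")),
  col_types = "cci"
)

# Describe the same layout by explicit start and end positions.
fwf_df2 <- read_fwf(
  I(fwf_text),
  col_positions = fwf_positions(
    start = c(1, 6, 8),
    end = c(5, 7, 10),
    col_names = c("name", "state", "code")
  ),
  col_types = "cci"
)

fwf_df
```

Both calls should describe the same three columns; which helper is more convenient depends on whether the file's documentation lists widths or positions.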
4. Sometimes strings in a CSV file contain commas. To prevent them from causing problems they need to be surrounded by a quoting character, like " or '. By convention, read_csv() assumes that the quoting character will be ", and if you want to change it you’ll need to use read_delim() instead. What arguments do you need to specify to read the following text into a data frame?
"x,y\n1,'a,b'"
For read_delim(), we need to specify a delimiter, in this case ",", and set the quote argument to "'".
x <- "x,y\n1,'a,b'"
read_delim(x, ",", quote = "'")
However, this question is out of date. read_csv() now supports a quote argument, so the following code works.
read_csv(x, quote = "'")
5. Identify what is wrong with each of the following inline CSV files. What happens when you run the code?
read_csv("a,b\n1,2,3\n4,5,6")
2 parsing failures.
row  col  expected   actual     file
1    --   2 columns  3 columns  literal data
2    --   2 columns  3 columns  literal data
Only two columns are specified in the header, “a” and “b”, but each row has three values, so the last value in each row is dropped.
read_csv("a,b,c\n1,2\n1,2,3,4")
2 parsing failures.
row  col  expected   actual     file
1    --   3 columns  2 columns  literal data
2    --   3 columns  4 columns  literal data
The number of values in each data row does not match the number of columns in the header (three). In row one, there are only two values, so column c is set to missing. In row two, there is an extra value, and that value is dropped.
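The parsing failures can also be pulled out as a data frame with readr's `problems()` function. The sketch below reuses the same ragged input; reading every column as character and wrapping the string in `I()` (readr 2.0 or later) are choices made for this example, and the exact rows and wording reported may differ between readr versions.

```r
library(readr)

# Same ragged input as above; I() marks the string as literal data.
ragged <- read_csv(
  I("a,b,c\n1,2\n1,2,3,4"),
  col_types = cols(.default = col_character())
)

# problems() returns one row per parsing issue, with the row, column,
# what was expected, and what was actually found.
issues <- problems(ragged)
issues
```

Checking `problems()` right after reading is a quick way to confirm whether rows were silently padded or truncated.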
read_csv("a,b\n\"1")
2 parsing failures.
row  col  expected                      actual     file
1    a    closing quote at end of file             literal data
1    --   2 columns                     1 columns  literal data
It’s not clear what the intent was here. The opening quote in "1 is dropped because it is not closed, and a is treated as an integer. The row has only one value, so b is set to missing.
read_csv("a,b\n1,2\na,b")
Both “a” and “b” are treated as character vectors since they contain non-numeric strings. This may have been intentional, or the author may have intended the values of the columns to be “1,2” and “a,b”.
read_csv("a;b\n1;3")
The values are separated by “;” rather than “,”. Use read_csv2() instead:
read_csv2("a;b\n1;3")
Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
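As the message suggests, read_delim() gives more control over the same input. The sketch below sets the delimiter explicitly and, in a second call with a made-up value of "1,5", sets the decimal mark through `locale()`. The `col_types` shorthands and the use of `I()` (readr 2.0 or later) are assumptions for this example.

```r
library(readr)

# Semicolon-separated values read with an explicit delimiter.
semi_df <- read_delim(I("a;b\n1;3"), delim = ";", col_types = "ii")

# Hypothetical input that uses "," as the decimal mark.
euro_df <- read_delim(
  I("a;b\n1,5;3"),
  delim = ";",
  locale = locale(decimal_mark = ","),
  col_types = "dd"
)

semi_df
euro_df
```

Spelling out the delimiter and locale makes the parsing rules visible in the code rather than relying on the defaults of read_csv2().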