####Week 4:
####Dataframes:
####What’s a data frame?
You may remember from the chapter about matrices that all the elements that you put in a matrix should be of the same type. Back then, your data set on Star Wars only contained numeric elements.
When doing a market research survey, however, you often have questions such as:
‘Are you married?’ or other ‘yes/no’ questions (logical)
‘How old are you?’ (numeric)
‘What is your opinion on this product?’ or other ‘open-ended’ questions (character) …
The output, namely the respondents’ answers to the questions formulated above, is a data set of different data types. You will often find yourself working with data sets that contain different data types instead of only one.
A data frame has the variables of a data set as columns and the observations as rows.
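As a minimal sketch of how such mixed-type answers fit into a single data frame, the example below builds a small table with data.frame(). The name survey, its column names, and all values are invented for illustration and are not part of the course data.

```r
# Hypothetical survey answers: every name and value here is made up
survey <- data.frame(
  married = c(TRUE, FALSE, TRUE),                        # logical
  age     = c(34, 27, 45),                               # numeric
  opinion = c("Great value", "Too expensive", "It is fine"),  # character
  stringsAsFactors = FALSE                               # keep text as character
)

# Each column (variable) keeps its own type: logical, numeric, character
sapply(survey, class)

# Each row is one observation (one respondent)
nrow(survey)
```

Unlike a matrix, the three columns keep three different types, while every row still describes one respondent.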
####Exercise 4.1:
Print the data from the built-in example data frame “mtcars”:
```r
# Print out the built-in R data frame
mtcars
```
Quick, have a look at your data set
Wow, that is a lot of cars!
Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set.
So how do you do this in R? The function head() shows the first observations of a data frame (six by default). Similarly, the function tail() prints out the last observations in your data set.
Both head() and tail() print a top line called the ‘header’, which contains the names of the different variables in your data set.
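The number of rows shown can be changed with the n argument of both functions. A short sketch on the built-in mtcars data follows; the variable names first_rows and last_rows are only chosen for illustration.

```r
# First rows of mtcars, using the default number of rows (n = 6)
first_rows <- head(mtcars)
first_rows

# Last three rows of mtcars, requested explicitly with n
last_rows <- tail(mtcars, n = 3)
last_rows

# The header (the variable names) is the same in both cases
names(first_rows)
```

Storing the result in a variable is optional; calling head(mtcars) on its own prints the same rows.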
####Exercise 4.2:
Call head() on the mtcars data set to have a look at the header and the first observations.
```r
# Call head() on mtcars
head(mtcars)
```
####Exercise 4.3:
Have a look at the structure
Another method that is often used to get a rapid overview of your data is the function str(). The function str() shows you the structure of your data set. For a data frame it tells you:
The total number of observations (e.g. 32 car types)
The total number of variables (e.g. 11 car features)
A full list of the variable names (e.g. mpg, cyl … )
The data type of each variable (e.g. num)
The first observations
Applying the str() function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis.
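The exercise can be sketched as follows, again on the built-in mtcars data. The extra call to dim() is not part of the exercise; it is added here only as one way to cross-check the number of observations and variables that str() reports.

```r
# Compactly display the structure of mtcars:
# number of observations and variables, then one line per variable
str(mtcars)

# Optional cross-check: rows (observations) and columns (variables)
dim(mtcars)
```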