Data Formats
Content you should have understood before watching this video:
- Number 2, ‘Variables’
- Number 10, ‘Subsetting Data’
Wide vs. long format
- Formatting data correctly is the foundation of any good data analysis!
- The difference between the wide and the long format is best understood through examples
In theory:
- In the wide format, a single continuous variable can be spread over several columns, usually one column per level of a grouping variable
- In the long format, every variable occupies a single column
Wide vs. long format
The ‘$’ symbol is used to access variables contained inside a data frame. Here we extract the variable ‘Sepal Length’ from ‘iris’.
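A minimal sketch of this extraction, using the ‘iris’ data set that ships with R. In the data set itself the column is named ‘Sepal.Length’ (with a dot); the object name ‘sepal_length’ is only chosen here for illustration.

```r
# 'iris' is a built-in data frame; '$' pulls out one column as a vector
sepal_length <- iris$Sepal.Length

# Look at the first few values and check how many there are
head(sepal_length)
length(sepal_length)
```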
Wide vs. long format - an example
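One possible way to see the difference in R is sketched below. The numbers are invented purely for illustration, and the names ‘control’, ‘treatment’, ‘height’ and ‘group’ are assumptions, not taken from the video. The wide table holds one continuous variable (height) in two columns, one per group; the long table holds the same values with every variable in a single column.

```r
# Hypothetical measurements, invented for illustration only
# Wide format: one column of heights per group
wide <- data.frame(
  control   = c(4.2, 5.1, 4.8),
  treatment = c(5.9, 6.3, 6.0)
)

# Long format: stack() puts all values in one column and
# records the original column name in a second column
long <- stack(wide)
names(long) <- c("height", "group")

wide
long
```

The wide table has 3 rows and 2 columns, while the long table has 6 rows and 2 columns: one row per measurement, with the group stored as its own variable.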
The most important points in a nutshell
- Data collection itself is often more practical to do in the wide format
- As soon as you’re done, though, convert your data set to the long format
- For smaller data sets, do it in Excel; for larger ones, use R, which has functions and whole packages that can help with this!
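As a rough sketch of the R route, the base function ‘reshape()’ can convert a wide table to the long format. The data below are hypothetical, and the column names ‘plot’, ‘height_2021’, ‘height_2022’, ‘year’ and ‘height’ are assumptions made for this example. Packages such as tidyr offer alternatives (for example ‘pivot_longer()’), which are not shown here.

```r
# Hypothetical data, invented for illustration only
# Wide format: one height column per year
wide <- data.frame(
  plot        = c("A", "B", "C"),
  height_2021 = c(10.2, 11.5, 9.8),
  height_2022 = c(12.1, 13.0, 11.4)
)

# Convert to long format: one row per plot and year
long <- reshape(
  wide,
  direction = "long",
  varying   = c("height_2021", "height_2022"),  # columns to stack
  v.names   = "height",                         # name of the stacked variable
  timevar   = "year",                           # new column identifying the source column
  times     = c(2021, 2022),                    # values written into 'year'
  idvar     = "plot"                            # column identifying each unit
)
rownames(long) <- NULL  # drop the automatically generated row names

long
```

Each plot now appears once per year, and height occupies a single column.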