Department of Environmental Science, AUT

Data Formats: Prerequisites

Data Formats

Content you should have understood before watching this video:

  • Number 2, ‘Variables’
  • Number 10, ‘Subsetting Data’

Wide vs. long format

  • Formatting data correctly is the foundation of any good data analysis!
  • The difference between the wide and the long format is easiest to understand through examples

In theory:

  • In the wide format, a single continuous variable can be spread over several columns, usually one column per level of a grouping variable
  • In the long format, every variable occupies a single column, including the grouping variable itself
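
As a minimal sketch of the two layouts described above, the same six measurements can be stored either way. The data are made up for illustration: the column names `control`, `fertilised`, `treatment` and `height` and all values are assumptions, not taken from the course material.

```r
# Hypothetical plant heights (cm) for two treatments, three plants each.

# Wide format: the continuous variable 'height' is spread over two columns,
# one per level of the grouping variable.
wide <- data.frame(
  control    = c(10.2, 11.5, 9.8),
  fertilised = c(12.1, 13.4, 12.9)
)

# Long format: one column for the grouping variable, one for the measurements.
long <- data.frame(
  treatment = rep(c("control", "fertilised"), each = 3),
  height    = c(10.2, 11.5, 9.8, 12.1, 13.4, 12.9)
)

print(wide)
print(long)
```

Both data frames hold the same information; only the arrangement differs. The wide version has 3 rows and 2 columns, the long version 6 rows and 2 columns.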

Wide vs. long format

The ‘$’ symbol is used to access variables contained inside a data frame; here we extract the variable ‘Sepal.Length’ from ‘iris’
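
A short sketch of this extraction, using the `iris` data set that ships with R. The object name `sepal_length` is chosen here only for illustration.

```r
# 'iris' is available in every standard R installation (package 'datasets').
# '$' pulls a single variable out of the data frame as a vector.
sepal_length <- iris$Sepal.Length

# Look at the first few values and check what kind of object we got.
head(sepal_length)
class(sepal_length)

# Double square brackets with the column name give the same result.
same_result <- identical(sepal_length, iris[["Sepal.Length"]])
```

Note that the column is called `Sepal.Length` (with a dot) inside the data frame, so that is the name to type after the `$`.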

Wide vs. long format - an example


The most important in a nutshell

  • Collecting the data is often more practical in the wide format
  • As soon as you’re done, though, bring your data set into the long format
  • For smaller data sets, do it in Excel; for larger ones, use R. There are functions and whole packages that can help with this!
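
One possible way to do the conversion in R is the base function `stack()`, sketched below on made-up data. The column names and values are assumptions for illustration, and other approaches (for example `pivot_longer()` from the tidyr package) would work as well.

```r
# Hypothetical wide data: one column of plant heights (cm) per treatment.
wide <- data.frame(
  control    = c(10.2, 11.5, 9.8),
  fertilised = c(12.1, 13.4, 12.9)
)

# stack() puts all values into one column ('values') and records the
# original column name in a second column ('ind').
long <- stack(wide)

# Give the two columns more meaningful names.
names(long) <- c("height", "treatment")
print(long)

# unstack() reverses the operation: values on the left of '~',
# grouping variable on the right.
wide_again <- unstack(long, height ~ treatment)
print(wide_again)
```

`stack()` is enough when every column of the wide data frame holds the same kind of measurement; data sets with additional identifier columns usually need a more flexible tool such as `reshape()` or tidyr.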