It’s 1 p.m and your boss comes to you with some data and asks you to create some Tableau dashboards by day’s end. You open the data in Tableau and find that the data is in ‘wide’ format, preventing you from making some important visualizations. For instance, you wanted a line chart with different line plots for variables and wanted time for the x-axis (a.k.a. time series). What can you do besides manually cut/pasting values?
If you don’t know what I’m talking about, look at this wikipedia entry on Wide and narrow data. For instance, if you were handed data that was in the form of an excel pivot table, it’s likely that the data is in wide format.
Let’s look at a sample dataset that is in wide format. The dataset called airquality comes preloaded with R.
data(airquality)
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
The long format for this dataframe will combine the Ozone, Solar.R, Wind and Temp columns into one column and the ‘primary key’ for the dataset will be the combination of month and day. To do this, we will need the reshape2 package. If you don’t already have this package install it:
install.packages('reshape2')
The conversion from long to wide gets done with the melt function. In the code below, I specify the dataset, the variables that remain the same and also specify the creation of a new variable called Values.
library(reshape2)
aql.long <- melt(airquality, # specify dataset
id.vars = c("Month", "Day"), # ID variables are the variables that remain the same
variable.name = "Factors", # this names the new created variable
value.name = "Values") # and fills it with the values
Now we’re in long format:
subset(aql.long, !duplicated(aql.long$Factors))
## Month Day Factors Values
## 1 5 1 Ozone 41.0
## 154 5 1 Solar.R 190.0
## 307 5 1 Wind 7.4
## 460 5 1 Temp 67.0
What if you wanted to go the other way around? You do so by using the dcast function. Here, you need to specify the following:
# Converting dataframe to wide format
airquality2 <- dcast(aql.long, Month+Day ~ Factors, # left of "~" : variables that stay the same, right of "~" : variables to convert to rows
value.var="Values") # data that goes into new columns
and:
head(airquality2)
## Month Day Ozone Solar.R Wind Temp
## 1 5 1 41 190 7.4 67
## 2 5 2 36 118 8.0 72
## 3 5 3 12 149 12.6 74
## 4 5 4 18 313 11.5 62
## 5 5 5 NA NA 14.3 56
## 6 5 6 28 NA 14.9 66