Transform data from wide to long format in R

Mark Bounthavong

26 May 2025

Introduction

Oftentimes, I have to convert data from the wide format to the long format in order to perform a longitudinal analysis (Figure 1). In the wide format, longitudinal data can be labelled from x_01 to x_12 as depicted in Figure 1. When this is transformed from the wide to the long data format, the number sequence gets its own column called time

\[\begin{aligned} time = \{0, 1, 2, ..., 12\} \end{aligned}\]

where the values for x_01 to x_12 are aligned. See Figure 1 for an example of a wide data format transforming to a long data format.

Figure 1. Wide to Long data.

Figure 1. Wide to Long data.

R has many functions that will allow you to do this transformation. However, I find that that the pivot_longer() function from the tidyverse package is a useful tool to perform this transformation.

In this tutorial, I’ll review a couple of examples of using the pivot_longer() function to transform data from the wide to the long format.

Motivating example

We will use data that is available on my GitHub site.

Step 1: Load data

We can load the data using the following command:

### Load data
url_file1 <- "https://raw.githubusercontent.com/mbounthavong/R-tutorials/refs/heads/main/Data/wide_data1.csv"
data.wide1 <- read.csv(url_file1, header = TRUE)

Once the data is loaded, we can visualize its contents.

### Visualize data
str(data.wide1) # Structure of data
## 'data.frame':    10 obs. of  28 variables:
##  $ patientid: int  1 2 3 4 5 6 7 8 9 10
##  $ age      : int  45 66 73 32 19 21 35 31 22 34
##  $ sex      : int  1 1 1 0 0 1 0 0 1 1
##  $ race     : int  1 2 3 3 2 1 1 1 1 2
##  $ x_01     : int  38 54 45 28 97 16 7 80 71 15
##  $ x_02     : int  39 72 11 98 54 6 40 67 82 52
##  $ x_03     : int  54 51 45 66 54 66 59 85 27 20
##  $ x_04     : int  31 23 8 15 98 19 39 25 44 36
##  $ x_05     : int  86 9 67 7 28 16 34 90 20 49
##  $ x_06     : int  6 3 44 24 17 58 37 24 53 29
##  $ x_07     : int  31 61 73 45 56 80 66 73 37 51
##  $ x_08     : int  50 41 50 80 47 32 13 63 12 33
##  $ x_09     : int  27 98 68 48 79 57 41 82 80 70
##  $ x_10     : int  84 47 91 7 45 66 14 42 25 84
##  $ x_11     : int  11 6 20 8 60 76 33 23 90 72
##  $ x_12     : int  67 87 63 36 88 76 13 6 26 92
##  $ t_01     : int  86 54 60 52 58 36 46 81 87 98
##  $ t_02     : int  66 14 6 42 40 55 25 73 2 68
##  $ t_03     : int  8 62 45 19 72 90 76 45 94 23
##  $ t_04     : int  99 65 25 28 51 41 54 39 8 77
##  $ t_05     : int  56 95 46 62 7 42 24 20 86 33
##  $ t_06     : int  64 31 15 38 11 18 15 14 58 41
##  $ t_07     : int  18 38 11 29 6 17 74 23 44 6
##  $ t_08     : int  3 61 51 33 76 87 4 23 86 11
##  $ t_09     : int  7 49 63 67 59 11 47 85 92 37
##  $ t_10     : int  69 16 20 6 2 15 21 6 94 34
##  $ t_11     : int  6 36 31 67 74 42 12 93 30 87
##  $ t_12     : int  72 95 22 63 46 49 23 50 55 51
head(data.wide1) # Visualize the first six rows
##   patientid age sex race x_01 x_02 x_03 x_04 x_05 x_06 x_07 x_08 x_09 x_10 x_11
## 1         1  45   1    1   38   39   54   31   86    6   31   50   27   84   11
## 2         2  66   1    2   54   72   51   23    9    3   61   41   98   47    6
## 3         3  73   1    3   45   11   45    8   67   44   73   50   68   91   20
## 4         4  32   0    3   28   98   66   15    7   24   45   80   48    7    8
## 5         5  19   0    2   97   54   54   98   28   17   56   47   79   45   60
## 6         6  21   1    1   16    6   66   19   16   58   80   32   57   66   76
##   x_12 t_01 t_02 t_03 t_04 t_05 t_06 t_07 t_08 t_09 t_10 t_11 t_12
## 1   67   86   66    8   99   56   64   18    3    7   69    6   72
## 2   87   54   14   62   65   95   31   38   61   49   16   36   95
## 3   63   60    6   45   25   46   15   11   51   63   20   31   22
## 4   36   52   42   19   28   62   38   29   33   67    6   67   63
## 5   88   58   40   72   51    7   11    6   76   59    2   74   46
## 6   76   36   55   90   41   42   18   17   87   11   15   42   49

Notice the convention used to name the columns. The number sequence is separated from the variable prefix by _. Additionally, each column name has two digits (e.g., 01, 02, …, 12). The naming convention of the column name will be exploited to transform the data from the wide to the long format. For instance, we can separate the prefix and number using the name_sep argument of the pivot_longer() function.

But first, we need to locate the position of the variables we want to transform on the dataframe.

Step 2: Locate position of variables on the dataframe

Since R treats the dataframe like a matrix, we want to locate the position of the variables that we want to transform from the wide to the long format. There are two sequences of variables that we want:

\[\begin{aligned} x = \{0, 1, 2, ..., 12\} \\ t = \{0, 1, 2, ..., 12\} \end{aligned}\]

We want to identify where x_0 and t_12 are located on the dataframe. This will allow us to input the range of variables that will undergo the transformation from wide to long.

### Identify location of variable in the dataframe
grep("x_01", names(data.wide1)) 
## [1] 5
grep("t_12", names(data.wide1)) 
## [1] 28

Using the grep() function we identified the starting and ending positions of x_01 and t_12, respectively, as 5 and 28. We will use this to inform the pivot_long() function which columns to transform from wide to long format.

Step 3: Transform from wide to long using pivot_longer()

Now we can use the pivot_longer() function to transform the data from wide to long. First, we need to load the tidyverse package. Then, we enter the necessary options into the pivot_longer() function such as names_to = and names_sep =.

The cols = 5:28 option allows us to determine which columns will be transformed from wide to long format. The remaining columns will remain constant. The names_to = option will use the digits on the right of the argument names_to = c(".value", "time") as the value. For example, x_02 will be transformed to time = 02. The names_sep = c("\\_") separates the digit by using the _ in the x_02 column name.

Putting this altogether, we can write out the entire pivot_longer() function as:

### Load library
library("tidyverse")

### Reshape from wide to long
data.long1 <- data.wide1 %>%
                pivot_longer(cols = 5:28,
                             names_to = c(".value", "time"),
                             names_sep = c("\\_"))

The data.long1 object contains the transformed data.

Afterwards, you will notice that time is a character or string variable. We want to change this to a numeric.

### Convert time to numeric
data.long1$time <- as.numeric(data.long1$time)

Step 4: Verify the transformation is successful

We can visualize the data using the head() function.

### Convert time to numeric
head(data.long1)
## # A tibble: 6 × 7
##   patientid   age   sex  race  time     x     t
##       <int> <int> <int> <int> <dbl> <int> <int>
## 1         1    45     1     1     1    38    86
## 2         1    45     1     1     2    39    66
## 3         1    45     1     1     3    54     8
## 4         1    45     1     1     4    31    99
## 5         1    45     1     1     5    86    56
## 6         1    45     1     1     6     6    64

Other column naming conventions

In this example, we used the column naming convention x_01 where the digits are separated from the prefix by _. But what is we had a different column naming convention such that there is not _ separating the digit from the prefix?

\[\begin{aligned} x1, x2, x3, ..., x12 \\ t1, t2, t3, ..., t12 \end{aligned}\]
### Load data
url_file2 <- "https://raw.githubusercontent.com/mbounthavong/R-tutorials/refs/heads/main/Data/wide_data2.csv"
data.wide2 <- read.csv(url_file2, header = TRUE)

### Visualize data
str(data.wide2) # Structure of data
## 'data.frame':    10 obs. of  28 variables:
##  $ patientid: int  1 2 3 4 5 6 7 8 9 10
##  $ age      : int  45 66 73 32 19 21 35 31 22 34
##  $ sex      : int  1 1 1 0 0 1 0 0 1 1
##  $ race     : int  1 2 3 3 2 1 1 1 1 2
##  $ x1       : int  38 54 45 28 97 16 7 80 71 15
##  $ x2       : int  39 72 11 98 54 6 40 67 82 52
##  $ x3       : int  54 51 45 66 54 66 59 85 27 20
##  $ x4       : int  31 23 8 15 98 19 39 25 44 36
##  $ x5       : int  86 9 67 7 28 16 34 90 20 49
##  $ x6       : int  6 3 44 24 17 58 37 24 53 29
##  $ x7       : int  31 61 73 45 56 80 66 73 37 51
##  $ x8       : int  50 41 50 80 47 32 13 63 12 33
##  $ x9       : int  27 98 68 48 79 57 41 82 80 70
##  $ x10      : int  84 47 91 7 45 66 14 42 25 84
##  $ x11      : int  11 6 20 8 60 76 33 23 90 72
##  $ x12      : int  67 87 63 36 88 76 13 6 26 92
##  $ t1       : int  86 54 60 52 58 36 46 81 87 98
##  $ t2       : int  66 14 6 42 40 55 25 73 2 68
##  $ t3       : int  8 62 45 19 72 90 76 45 94 23
##  $ t4       : int  99 65 25 28 51 41 54 39 8 77
##  $ t5       : int  56 95 46 62 7 42 24 20 86 33
##  $ t6       : int  64 31 15 38 11 18 15 14 58 41
##  $ t7       : int  18 38 11 29 6 17 74 23 44 6
##  $ t8       : int  3 61 51 33 76 87 4 23 86 11
##  $ t9       : int  7 49 63 67 59 11 47 85 92 37
##  $ t10      : int  69 16 20 6 2 15 21 6 94 34
##  $ t11      : int  6 36 31 67 74 42 12 93 30 87
##  $ t12      : int  72 95 22 63 46 49 23 50 55 51
head(data.wide2) # Visualize the first six rows
##   patientid age sex race x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 t1 t2 t3 t4 t5
## 1         1  45   1    1 38 39 54 31 86  6 31 50 27  84  11  67 86 66  8 99 56
## 2         2  66   1    2 54 72 51 23  9  3 61 41 98  47   6  87 54 14 62 65 95
## 3         3  73   1    3 45 11 45  8 67 44 73 50 68  91  20  63 60  6 45 25 46
## 4         4  32   0    3 28 98 66 15  7 24 45 80 48   7   8  36 52 42 19 28 62
## 5         5  19   0    2 97 54 54 98 28 17 56 47 79  45  60  88 58 40 72 51  7
## 6         6  21   1    1 16  6 66 19 16 58 80 32 57  66  76  76 36 55 90 41 42
##   t6 t7 t8 t9 t10 t11 t12
## 1 64 18  3  7  69   6  72
## 2 31 38 61 49  16  36  95
## 3 15 11 51 63  20  31  22
## 4 38 29 33 67   6  67  63
## 5 11  6 76 59   2  74  46
## 6 18 17 87 11  15  42  49

We can address this using the names_pattern = "(.)(\\d+)" option in pivot_longer() function. The (.) denotes the prefix, and (\\d+) denotes the digits after the prefix.

For example, the column name x12 has the prefix of x and the digit of 12. The d+ is important because it will generate a digit regardless if there is a single digit 1 or multiple digits 12. If you don’t include the + after the d, the pivot_function() will only generate values from 0 to 9 and not for 10 or greater.

#### Reshape from wide to long
data.long2 <- data.wide2 %>%
                pivot_longer(cols = 5:28,
                             names_to = c(".value", "time"),
                             names_pattern = "(.)(\\d+)")

### Convert time to numeric
data.long2$time <- as.numeric(data.long1$time)

Once you’ve completed this part, you can visualize the data:

str(data.long2)
## tibble [120 × 7] (S3: tbl_df/tbl/data.frame)
##  $ patientid: int [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##  $ age      : int [1:120] 45 45 45 45 45 45 45 45 45 45 ...
##  $ sex      : int [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##  $ race     : int [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##  $ time     : num [1:120] 1 2 3 4 5 6 7 8 9 10 ...
##  $ x        : int [1:120] 38 39 54 31 86 6 31 50 27 84 ...
##  $ t        : int [1:120] 86 66 8 99 56 64 18 3 7 69 ...
head(data.long2)
## # A tibble: 6 × 7
##   patientid   age   sex  race  time     x     t
##       <int> <int> <int> <int> <dbl> <int> <int>
## 1         1    45     1     1     1    38    86
## 2         1    45     1     1     2    39    66
## 3         1    45     1     1     3    54     8
## 4         1    45     1     1     4    31    99
## 5         1    45     1     1     5    86    56
## 6         1    45     1     1     6     6    64

Conclusions

Transforming data from the wide to long format is convenient when using the pivot_longer() function. One needs to be careful with naming convention of the columns to simplify the process. I recommend using the same naming convention when constructing your data. It will make transforming from the wide to long format much more convenient.

Next time, we will learn how to transform our data from the long to the wide format.

Acknowledgements

I used the R CRAN project page on Pivoting to help me learn how to transform data from wide to long.

I learned about advanced pivoting from an online reference on Data Wrangling developed by Sara Altman, Bill Behrman, and Hadley Wickham. They provide an excellent explanation of the names_to = and names_pattern = options in excellent detail in the Advance pivoting chapter.

Disclaimers

This is for educational purposes only.

This is a work in progress and updates are expected in the future.