Deriving the numeric differences between two dataframes.

In this scenario, the two data.frames have the same structure, and are intended to convey the same type of informations at two different time points.

For the sake of simplicity, we will use the mtcars data set, taking two equally sized segments of the dataset as exemplars of two snapshots.

It is important to note the structure of the mtcars dataframe: a dataframe comprised entirely of numeric columns, with bespoke rownames.

mtcars1 <- mtcars %>% head(10)
mtcars2 <- mtcars %>% tail(10) 
## adjusted mtcars rownames to be the same for both DFs.
rownames(mtcars2) <- rownames(mtcars1)

mtcars1-mtcars2 
##                     mpg cyl   disp   hp  drat     wt  qsec vs am gear carb
## Mazda RX4           5.8  -2 -144.0  -40  0.75 -0.815 -0.84  0  1    1    2
## Mazda RX4 Wag       7.7  -2 -190.0 -135  0.17 -0.965  1.61  0  1    1    0
## Datsun 710          3.6  -4 -292.0  -82  0.77 -1.525  1.56  1  1    1   -1
## Hornet 4 Drive     -5.9   2  179.0   44 -1.00  1.280  0.54  0 -1   -1    0
## Hornet Sportabout  -7.3   4  239.7   84 -1.28  1.300  0.32  0 -1   -2    0
## Valiant           -12.3   2  129.9   -8 -1.01  1.947  3.32  0 -1   -2   -1
## Duster 360         -1.5   0    9.0  -19 -1.01  0.400  1.34  0 -1   -2    0
## Merc 240D           4.7  -2    1.7 -113  0.07  0.420  4.50  1 -1   -1   -4
## Merc 230            7.8  -4 -160.2 -240  0.38 -0.420  8.30  1 -1   -1   -6
## Merc 280           -2.2   2   46.6   14 -0.19  0.660 -0.30  0 -1    0    2

Presence of an Identifier Column in the DF

For the purposes of demonstration, we will transform the rownames into an identifier column in the dataframe. This inclusion accords with the principles of tidy data. This can be done using the rownames_to_column() in the {tibble} R package. The new column is specified using the var= argument as “car”.

mtcars1 <- mtcars1 %>% tibble::rownames_to_column(var="car")
mtcars2 <- mtcars2 %>% tibble::rownames_to_column(var="car")

The rownames are now columns in the dataframe.

head(mtcars1)
##                 car  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
glimpse(mtcars1)
## Rows: 10
## Columns: 12
## $ car  <chr> "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Ho~
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4

The following code will not work, generating the following error message. This is due to the presence of a character variable in the dataframe (i.e. rownames)

mtcars1-mtcars2 

Error in FUN(left, right) : non-numeric argument to binary operator

Transforming a Column into rownames

The “car” column can be transformed into rownames, using the column_to_rownames() function. The “car” column is specified.

mtcars1 <- mtcars1 %>% tibble::column_to_rownames("car")
mtcars2 <- mtcars2 %>% tibble::column_to_rownames("car")

This will remove mention of “cars” as a previous column name.

mtcars1-mtcars2 
##                     mpg cyl   disp   hp  drat     wt  qsec vs am gear carb
## Mazda RX4           5.8  -2 -144.0  -40  0.75 -0.815 -0.84  0  1    1    2
## Mazda RX4 Wag       7.7  -2 -190.0 -135  0.17 -0.965  1.61  0  1    1    0
## Datsun 710          3.6  -4 -292.0  -82  0.77 -1.525  1.56  1  1    1   -1
## Hornet 4 Drive     -5.9   2  179.0   44 -1.00  1.280  0.54  0 -1   -1    0
## Hornet Sportabout  -7.3   4  239.7   84 -1.28  1.300  0.32  0 -1   -2    0
## Valiant           -12.3   2  129.9   -8 -1.01  1.947  3.32  0 -1   -2   -1
## Duster 360         -1.5   0    9.0  -19 -1.01  0.400  1.34  0 -1   -2    0
## Merc 240D           4.7  -2    1.7 -113  0.07  0.420  4.50  1 -1   -1   -4
## Merc 230            7.8  -4 -160.2 -240  0.38 -0.420  8.30  1 -1   -1   -6
## Merc 280           -2.2   2   46.6   14 -0.19  0.660 -0.30  0 -1    0    2

The differences between dataframes can be captured in a new dataframe, where the rownames have been reverted to a column

diff_df <- (mtcars1-mtcars2) %>% tibble::rownames_to_column("car")
diff_df
##                  car   mpg cyl   disp   hp  drat     wt  qsec vs am gear carb
## 1          Mazda RX4   5.8  -2 -144.0  -40  0.75 -0.815 -0.84  0  1    1    2
## 2      Mazda RX4 Wag   7.7  -2 -190.0 -135  0.17 -0.965  1.61  0  1    1    0
## 3         Datsun 710   3.6  -4 -292.0  -82  0.77 -1.525  1.56  1  1    1   -1
## 4     Hornet 4 Drive  -5.9   2  179.0   44 -1.00  1.280  0.54  0 -1   -1    0
## 5  Hornet Sportabout  -7.3   4  239.7   84 -1.28  1.300  0.32  0 -1   -2    0
## 6            Valiant -12.3   2  129.9   -8 -1.01  1.947  3.32  0 -1   -2   -1
## 7         Duster 360  -1.5   0    9.0  -19 -1.01  0.400  1.34  0 -1   -2    0
## 8          Merc 240D   4.7  -2    1.7 -113  0.07  0.420  4.50  1 -1   -1   -4
## 9           Merc 230   7.8  -4 -160.2 -240  0.38 -0.420  8.30  1 -1   -1   -6
## 10          Merc 280  -2.2   2   46.6   14 -0.19  0.660 -0.30  0 -1    0    2