The code chunk bellow specifies the libraries used in this analysis.

library(tidyverse) # collection of packages for data analysis
library(readxl)    # to read excel files
library(lubridate) # to clean column names in a data frame
library(ggplot2)   # to visualize the data
library(knitr)     # for generating the markdown report
library(kableExtra) # to create tables in the markdown report

The data is stored in the “Data” sheet, which is now loaded to the variable data.

data <- readxl::read_xlsx("Argus Media - Excel Data Developer Technical Test.xlsx",
                          sheet = "Data")

column_names <- colnames(data)[-1] # get the column names excluding the ID column

Note that the data frame consists of 159 observations of 9 variables. See the structure of the table bellow.

dplyr::glimpse(data)
## Rows: 159
## Columns: 9
## $ ID                          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
## $ `DEAL DATE`                 <dttm> 2021-05-10, 2021-05-10, 2021-05-10, 2021-~
## $ COMMODITY                   <chr> "Coal", "Coal", "Coal", "Coal", "Coal", "C~
## $ `COMMODITY SOURCE LOCATION` <chr> "Australia", "Australia", "Australia", "So~
## $ `DELIVERY LOCATION`         <chr> "AMS", "SOT", "ANT", "ANT", "ANT", "ANT", ~
## $ `DELIVERY MONTH`            <chr> "JUL", "JUN", "JUL", "JUN", "MAY", "SEP", ~
## $ `DELIVERY YEAR`             <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, ~
## $ VOLUME                      <dbl> 200, 100, 25, 10, 100, 50, 10, 200, 250, 1~
## $ PRICE                       <dbl> 91.75, 93.49, 93.04, 96.32, 93.49, 93.78, ~

To make it easier to manipulate the data, we use the janitor package to clean all the variable names and the dplyr to pre-process some columns.

data <- data %>% 
  janitor::clean_names() %>% 
  dplyr::mutate(deal_date = lubridate::as_date(deal_date)) %>% 
  dplyr::select(-id)

In the code above, after cleaning the column names, the variable deal_date is coerced to date format, and since the ID column is nothing but the row number, it is disregarded.

Task 1 - price indexes

Now, we create a new table containing the index price for each deal date that is included in the raw data.

In order to check how many unique deal date there are, we can use the unique function.

data %>% 
  dplyr::select(deal_date) %>%
  unique()
DEAL DATE
2021-05-10
2021-05-11
2021-05-12
2021-05-13
2021-05-14

Let us define a function that, given a value of deal_date, returns a \(1 \times 2\) tibble with the indexes Coal2 and Coal4 in one column each.

It is important to remark that since the “Delivery Date” is not available in the raw data, we consider this date as the first day of the Delivery Month and Delivery Year. Thus, as an example take the first row in the raw data. As the delivery month is JUL and delivery year is 2021, the delivery date is considered to be 2021-7-1.

#' Computes the Coal2 and Coal4 indexes for the raw data.
#'
#' @param chosen_date Date for which the indexes are computed.
#'
#' @return 1x2 tibble containing indexes Coal2, and Coal4.
get_indexes <- function(chosen_date){
  # Description of the following command:
  # obtains rows of the raw data where deal_date == chosen_date,
  # unites the columns delivery_month and delivery_year to form
  # a date-type column named delivery_date
  # computes the delivery_period: diff between 
  # delivery_date and deal_date 
  tmp <- data %>% 
    dplyr::filter(deal_date == chosen_date) %>%
    tidyr::unite("delivery_date", delivery_month:delivery_year, sep = "-") %>% 
    dplyr::mutate(delivery_date = lubridate::my(delivery_date),
           delivery_period = delivery_date - deal_date)
    
  # Description of the following command:
  # obtains coal deals that are delivered to Northwest Europe
  # and with delivery period within 180 days of deal date
  # computes the Volume-Weighted Average Price (VWAP)
  # pulls the index as a number
  coal2 <- tmp %>% 
    dplyr::filter(delivery_location %in% c("ARA", "AMS", "ROT", "ANT"),
           delivery_period <= 180) %>% 
    dplyr::summarise(index = sum(volume*price) / sum(volume)) %>% 
    dplyr::pull(index)
    
  # Description of the following command:
  # obtains coal deals that are delivered from South Africa
  # and with delivery period within 180 days of deal date
  # computes the Volume-Weighted Average Price (VWAP)
  # pulls the index as a number
  coal4 <- tmp %>% 
    dplyr::filter(commodity_source_location == "South Africa",
           delivery_period <= 180) %>% 
    dplyr::summarise(index = sum(volume*price) / sum(volume)) %>% 
    dplyr::pull(index)
  
  # creates a tibble with the two indexes
  tidyr::tibble(coal2 = coal2, coal4 = coal4)
}

Then, we can use the function map from package purrr to get the indexes for each date.

indexes <- data %>% 
  dplyr::select(deal_date) %>%
  unique() %>% 
  dplyr::mutate(table = purrr::map(deal_date, get_indexes)) %>% 
  tidyr::unnest(cols = table)

indexes
DEAL DATE Coal2 Coal4
2021-05-10 89.768 90.621
2021-05-11 98.055 96.596
2021-05-12 80.494 88.354
2021-05-13 90.945 96.856
2021-05-14 88.993 93.700

Since this table represents two indexes over the time (the deal date variable), We can visualize it in a graph, see the figure below.

indexes %>%
  tidyr::pivot_longer(cols = coal2:coal4,
               names_to = "index", values_to = "value") %>% 
  ggplot2::ggplot(aes(x = deal_date, y = value, color = index)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::scale_color_manual(labels = c("Coal2", "Coal4"),
                     values = RColorBrewer::brewer.pal(3, "Paired")) +
  ggplot2::scale_x_date(date_labels = "%Y-%m-%d") +
  ggplot2::labs(x = "Deal date", y = "Index", color = "") +
  ggplot2::theme_bw()

Task 2 - exporting the data

First we sort the raw data by COMMODITY SOURCE LOCATION in the following order: South Africa, Australia, and Columbia.

data %>% 
  dplyr::mutate(commodity_source_location = factor(commodity_source_location,
                                            levels = c("South Africa", "Australia", "Columbia"))) %>% 
  dplyr::arrange(commodity_source_location)
DEAL DATE COMMODITY COMMODITY SOURCE LOCATION DELIVERY LOCATION DELIVERY MONTH DELIVERY YEAR VOLUME PRICE
2021-05-10 Coal South Africa ANT JUN 2021 10 96.32
2021-05-10 Coal South Africa ANT MAY 2021 250 93.29
2021-05-10 Coal South Africa AMS JUN 2021 100 93.22
2021-05-10 Coal South Africa ARA MAY 2021 100 94.12
2021-05-10 Coal South Africa AMS MAY 2021 25 94.57
2021-05-10 Coal South Africa ANT JUN 2021 50 10.79
2021-05-10 Coal South Africa ANT MAY 2021 100 93.78
2021-05-10 Coal South Africa ROT JUN 2021 50 96.02
2021-05-10 Coal South Africa ANT JUN 2021 110 98.80
2021-05-10 Coal South Africa AMS JUL 2021 10 99.49
2021-05-10 Coal South Africa ARA OCT 2021 250 94.56
2021-05-11 Coal South Africa SOT MAY 2021 20 97.77
2021-05-11 Coal South Africa ARA JUL 2021 50 99.20
2021-05-11 Coal South Africa ANT JUN 2021 50 98.42
2021-05-11 Coal South Africa ANT JUN 2021 100 94.12
2021-05-11 Coal South Africa ANT JUL 2021 1000 99.97
2021-05-11 Coal South Africa UK MAY 2021 40 10.37
2021-05-11 Coal South Africa ARA JUN 2021 100 97.37
2021-05-12 Coal South Africa AMS JUN 2021 100 10.37
2021-05-12 Coal South Africa ANT JUN 2021 50 97.22
2021-05-12 Coal South Africa ANT JUL 2021 500 94.31
2021-05-12 Coal South Africa ANT JUL 2021 40 10.40
2021-05-12 Coal South Africa UK OCT 2021 100 93.27
2021-05-12 Coal South Africa ARA MAY 2021 250 96.55
2021-05-12 Coal South Africa UK JUN 2021 200 97.77
2021-05-12 Coal South Africa AMS JUN 2021 50 99.20
2021-05-12 Coal South Africa ARA JUN 2021 150 95.31
2021-05-12 Coal South Africa UK MAY 2021 50 97.42
2021-05-12 Coal South Africa UK OCT 2021 100 97.89
2021-05-12 Coal South Africa AMS JUL 2021 10 96.32
2021-05-13 Coal South Africa ANT JUN 2021 40 94.12
2021-05-13 Coal South Africa UK JUN 2021 25 91.76
2021-05-13 Coal South Africa AMS JUN 2021 100 96.95
2021-05-13 Coal South Africa AMS JAN 2022 75 10.24
2021-05-13 Coal South Africa ARA JUN 2021 200 98.25
2021-05-13 Coal South Africa ROT JUN 2021 100 97.89
2021-05-13 Coal South Africa ARA MAY 2021 100 95.31
2021-05-14 Coal South Africa AMS JUN 2021 120 93.35
2021-05-14 Coal South Africa ARA AUG 2021 25 94.46
2021-05-14 Coal South Africa AMS MAY 2021 200 90.97
2021-05-14 Coal South Africa ANT JUN 2021 25 93.56
2021-05-14 Coal South Africa AMS JUL 2021 50 97.22
2021-05-14 Coal South Africa UK SEP 2021 250 94.56
2021-05-14 Coal South Africa ANT JUN 2021 100 90.97
2021-05-14 Coal South Africa ARA JUL 2021 100 94.46
2021-05-14 Coal South Africa ROT JUN 2021 100 94.31
2021-05-14 Coal South Africa ANT MAY 2021 100 94.56
2021-05-14 Coal South Africa ARA JUL 2021 100 96.02
2021-05-10 Coal Australia AMS JUL 2021 200 91.75
2021-05-10 Coal Australia SOT JUN 2021 100 93.49
2021-05-10 Coal Australia ANT JUL 2021 25 93.04
2021-05-10 Coal Australia ANT MAY 2021 100 93.49
2021-05-10 Coal Australia ANT SEP 2021 50 93.78
2021-05-10 Coal Australia ARA MAY 2021 200 90.56
2021-05-10 Coal Australia ARA MAR 2022 100 99.20
2021-05-10 Coal Australia ROT JUN 2021 25 93.78
2021-05-10 Coal Australia ROT JUN 2021 50 95.31
2021-05-10 Coal Australia ANT DEC 2021 100 99.98
2021-05-10 Coal Australia ROT MAY 2021 10 94.50
2021-05-10 Coal Australia AMS JUL 2021 10 93.22
2021-05-10 Coal Australia ANT JUN 2021 40 96.32
2021-05-10 Coal Australia ARA MAY 2021 100 92.66
2021-05-10 Coal Australia ROT OCT 2021 200 94.17
2021-05-10 Coal Australia ANT JUL 2021 25 94.46
2021-05-10 Coal Australia UK AUG 2021 200 90.28
2021-05-10 Coal Australia ARA DEC 2021 110 96.02
2021-05-10 Coal Australia AMS JUN 2021 20 98.45
2021-05-10 Coal Australia AMS JUN 2021 100 93.35
2021-05-10 Coal Australia AMS JUN 2021 50 96.71
2021-05-10 Coal Australia AMS JUN 2021 100 10.37
2021-05-10 Coal Australia AMS SEP 2021 100 96.95
2021-05-11 Coal Australia ROT MAY 2021 100 90.28
2021-05-11 Coal Australia ANT JUN 2021 50 90.28
2021-05-11 Coal Australia ARA JUN 2021 100 94.17
2021-05-11 Coal Australia AMS AUG 2021 100 98.45
2021-05-11 Coal Australia ANT MAY 2021 100 94.57
2021-05-11 Coal Australia ANT JUL 2021 50 93.27
2021-05-11 Coal Australia ARA JUN 2021 90 96.55
2021-05-11 Coal Australia AMS JUN 2021 100 99.85
2021-05-11 Coal Australia ARA MAY 2021 100 99.85
2021-05-11 Coal Australia ARA JUL 2021 100 99.98
2021-05-11 Coal Australia UK MAR 2021 100 98.45
2021-05-11 Coal Australia AMS JUN 2021 90 99.49
2021-05-12 Coal Australia ANT JUN 2021 100 93.49
2021-05-12 Coal Australia ANT JUN 2021 100 10.24
2021-05-12 Coal Australia ANT DEC 2021 100 93.29
2021-05-12 Coal Australia ANT MAY 2021 100 10.79
2021-05-12 Coal Australia ANT JUN 2021 20 94.57
2021-05-12 Coal Australia ANT JUN 2021 100 99.85
2021-05-12 Coal Australia ROT JUL 2021 200 90.56
2021-05-12 Coal Australia AMS JUN 2021 100 97.26
2021-05-12 Coal Australia UK JUL 2021 20 94.12
2021-05-12 Coal Australia ARA JUN 2021 100 92.66
2021-05-12 Coal Australia AMS JUN 2021 50 93.35
2021-05-12 Coal Australia AMS JUN 2021 100 10.40
2021-05-12 Coal Australia SOT JUN 2021 20 97.26
2021-05-13 Coal Australia ARA MAY 2021 1000 93.56
2021-05-13 Coal Australia UK AUG 2021 750 96.71
2021-05-13 Gas Australia ARA MAY 2021 40 92.66
2021-05-13 Coal Australia ARA JUL 2021 250 98.25
2021-05-13 Coal Australia ARA JUN 2021 100 10.79
2021-05-13 Coal Australia ARA MAY 2021 40 94.12
2021-05-14 Coal Australia UK JUN 2021 10 98.65
2021-05-14 Coal Australia ROT JUN 2021 50 96.95
2021-05-14 Coal Australia ANT MAY 2021 50 93.29
2021-05-14 Coal Australia AMS SEP 2021 200 91.75
2021-05-14 Coal Australia ARA JUN 2021 100 94.50
2021-05-14 Coal Australia AMS DEC 2021 25 91.75
2021-05-14 Coal Australia UK JUN 2021 250 97.77
2021-05-14 Coal Australia ROT JUN 2021 100 94.50
2021-05-14 Coal Australia ANT JUN 2021 200 10.48
2021-05-14 Coal Australia ARA JUL 2021 50 98.80
2021-05-14 Coal Australia ARA JUN 2021 100 93.22
2021-05-14 Gas Australia ARA JUN 2021 20 10.24
2021-05-14 Coal Australia ANT JUL 2021 1000 93.56
2021-05-14 Coal Australia ANT JUL 2021 750 98.42
2021-05-14 Coal Australia ARA MAY 2021 50 94.31
2021-05-10 Coal Columbia AMS JUN 2021 10 99.20
2021-05-10 Coal Columbia ARA JUL 2021 50 97.42
2021-05-10 Coal Columbia AMS JUL 2021 25 97.09
2021-05-10 Coal Columbia AMS JUN 2021 100 98.65
2021-05-10 Coal Columbia ANT OCT 2021 100 98.80
2021-05-10 Coal Columbia ARA MAY 2021 1000 97.37
2021-05-10 Coal Columbia ANT OCT 2021 100 10.48
2021-05-10 Coal Columbia ARA MAY 2021 50 91.76
2021-05-10 Coal Columbia UK JUL 2021 20 97.89
2021-05-10 Coal Columbia UK JUN 2021 50 98.65
2021-05-11 Coal Columbia ANT JUN 2021 20 93.27
2021-05-11 Coal Columbia ARA JUN 2021 100 99.98
2021-05-11 Coal Columbia ANT JUN 2021 100 97.37
2021-05-11 Coal Columbia UK JUN 2021 50 93.04
2021-05-12 Coal Columbia ANT MAY 2021 10 94.17
2021-05-12 Coal Columbia AMS MAY 2021 50 10.40
2021-05-12 Coal Columbia ANT JUL 2021 50 97.37
2021-05-12 Coal Columbia AMS JUN 2021 100 99.49
2021-05-12 Coal Columbia ARA AUG 2021 50 93.04
2021-05-12 Coal Columbia ANT DEC 2021 100 98.25
2021-05-12 Coal Columbia AMS JUN 2021 250 97.42
2021-05-12 Coal Columbia ANT JUN 2021 20 96.55
2021-05-12 Coal Columbia ARA JUN 2021 50 99.97
2021-05-12 Coal Columbia AMS JUN 2021 110 90.97
2021-05-13 Coal Columbia AMS JUL 2021 50 94.50
2021-05-13 Coal Columbia AMS JUN 2021 100 90.56
2021-05-13 Coal Columbia AMS JUN 2021 100 97.22
2021-05-13 Coal Columbia ARA JUN 2021 100 97.09
2021-05-13 Coal Columbia ANT JUN 2021 250 97.42
2021-05-13 Coal Columbia ARA JUN 2021 110 97.09
2021-05-13 Coal Columbia AMS JUL 2021 50 10.48
2021-05-13 Coal Columbia ARA JUL 2021 500 91.76
2021-05-14 Gas Columbia ARA JUN 2021 100 98.42
2021-05-14 Coal Columbia ARA JUN 2021 50 97.26
2021-05-14 Coal Columbia AMS AUG 2021 10 99.97
2021-05-14 Coal Columbia ARA JUN 2021 90 96.71
2021-05-14 Coal Columbia AMS MAR 2021 110 97.09
2021-05-14 Coal Columbia AMS JUL 2021 10 94.17
2021-05-14 Coal Columbia ANT SEP 2021 120 99.97
2021-05-14 Coal Columbia AMS JUN 2021 90 96.71
2021-05-14 Coal Columbia ANT SEP 2021 500 90.56
2021-05-14 Coal Columbia AMS MAY 2021 100 10.79

The mutate function above, changes the commodity_source_location variable to be treated as a factor (categorical data type in R) with the ordered levels “South Africa”, “Australia”, “Columbia”. Once it is done, we can export this data to any specified path using the following code.

data %>% 
  dplyr::mutate(commodity_source_location = factor(commodity_source_location,
                                            levels = c("South Africa",
                                                       "Australia",
                                                       "Columbia"))) %>% 
  dplyr::arrange(commodity_source_location) %>% 
  readr::write_csv(file = "path_to_export_the_file/name.csv")