The code chunk bellow specifies the libraries used in this analysis.
library(tidyverse) # collection of packages for data analysis
library(readxl) # to read excel files
library(lubridate) # to clean column names in a data frame
library(ggplot2) # to visualize the data
library(knitr) # for generating the markdown report
library(kableExtra) # to create tables in the markdown report
The data is stored in the “Data” sheet, which is now loaded to the variable data
.
data <- readxl::read_xlsx("Argus Media - Excel Data Developer Technical Test.xlsx",
sheet = "Data")
column_names <- colnames(data)[-1] # get the column names excluding the ID column
Note that the data frame consists of 159 observations of 9 variables. See the structure of the table bellow.
dplyr::glimpse(data)
## Rows: 159
## Columns: 9
## $ ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
## $ `DEAL DATE` <dttm> 2021-05-10, 2021-05-10, 2021-05-10, 2021-~
## $ COMMODITY <chr> "Coal", "Coal", "Coal", "Coal", "Coal", "C~
## $ `COMMODITY SOURCE LOCATION` <chr> "Australia", "Australia", "Australia", "So~
## $ `DELIVERY LOCATION` <chr> "AMS", "SOT", "ANT", "ANT", "ANT", "ANT", ~
## $ `DELIVERY MONTH` <chr> "JUL", "JUN", "JUL", "JUN", "MAY", "SEP", ~
## $ `DELIVERY YEAR` <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, ~
## $ VOLUME <dbl> 200, 100, 25, 10, 100, 50, 10, 200, 250, 1~
## $ PRICE <dbl> 91.75, 93.49, 93.04, 96.32, 93.49, 93.78, ~
To make it easier to manipulate the data, we use the janitor
package to clean all the variable names and the dplyr
to pre-process some columns.
data <- data %>%
janitor::clean_names() %>%
dplyr::mutate(deal_date = lubridate::as_date(deal_date)) %>%
dplyr::select(-id)
In the code above, after cleaning the column names, the variable deal_date
is coerced to date format, and since the ID
column is nothing but the row number, it is disregarded.
Now, we create a new table containing the index price for each deal date that is included in the raw data.
In order to check how many unique deal date there are, we can use the unique
function.
data %>%
dplyr::select(deal_date) %>%
unique()
DEAL DATE |
---|
2021-05-10 |
2021-05-11 |
2021-05-12 |
2021-05-13 |
2021-05-14 |
Let us define a function that, given a value of deal_date
, returns a \(1 \times 2\) tibble with the indexes Coal2 and Coal4 in one column each.
It is important to remark that since the “Delivery Date” is not available in the raw data, we consider this date as the first day of the Delivery Month and Delivery Year. Thus, as an example take the first row in the raw data. As the delivery month is JUL and delivery year is 2021, the delivery date is considered to be 2021-7-1.
#' Computes the Coal2 and Coal4 indexes for the raw data.
#'
#' @param chosen_date Date for which the indexes are computed.
#'
#' @return 1x2 tibble containing indexes Coal2, and Coal4.
get_indexes <- function(chosen_date){
# Description of the following command:
# obtains rows of the raw data where deal_date == chosen_date,
# unites the columns delivery_month and delivery_year to form
# a date-type column named delivery_date
# computes the delivery_period: diff between
# delivery_date and deal_date
tmp <- data %>%
dplyr::filter(deal_date == chosen_date) %>%
tidyr::unite("delivery_date", delivery_month:delivery_year, sep = "-") %>%
dplyr::mutate(delivery_date = lubridate::my(delivery_date),
delivery_period = delivery_date - deal_date)
# Description of the following command:
# obtains coal deals that are delivered to Northwest Europe
# and with delivery period within 180 days of deal date
# computes the Volume-Weighted Average Price (VWAP)
# pulls the index as a number
coal2 <- tmp %>%
dplyr::filter(delivery_location %in% c("ARA", "AMS", "ROT", "ANT"),
delivery_period <= 180) %>%
dplyr::summarise(index = sum(volume*price) / sum(volume)) %>%
dplyr::pull(index)
# Description of the following command:
# obtains coal deals that are delivered from South Africa
# and with delivery period within 180 days of deal date
# computes the Volume-Weighted Average Price (VWAP)
# pulls the index as a number
coal4 <- tmp %>%
dplyr::filter(commodity_source_location == "South Africa",
delivery_period <= 180) %>%
dplyr::summarise(index = sum(volume*price) / sum(volume)) %>%
dplyr::pull(index)
# creates a tibble with the two indexes
tidyr::tibble(coal2 = coal2, coal4 = coal4)
}
Then, we can use the function map
from package purrr
to get the indexes for each date.
indexes <- data %>%
dplyr::select(deal_date) %>%
unique() %>%
dplyr::mutate(table = purrr::map(deal_date, get_indexes)) %>%
tidyr::unnest(cols = table)
indexes
DEAL DATE | Coal2 | Coal4 |
---|---|---|
2021-05-10 | 89.768 | 90.621 |
2021-05-11 | 98.055 | 96.596 |
2021-05-12 | 80.494 | 88.354 |
2021-05-13 | 90.945 | 96.856 |
2021-05-14 | 88.993 | 93.700 |
Since this table represents two indexes over the time (the deal date variable), We can visualize it in a graph, see the figure below.
indexes %>%
tidyr::pivot_longer(cols = coal2:coal4,
names_to = "index", values_to = "value") %>%
ggplot2::ggplot(aes(x = deal_date, y = value, color = index)) +
ggplot2::geom_line() +
ggplot2::geom_point() +
ggplot2::scale_color_manual(labels = c("Coal2", "Coal4"),
values = RColorBrewer::brewer.pal(3, "Paired")) +
ggplot2::scale_x_date(date_labels = "%Y-%m-%d") +
ggplot2::labs(x = "Deal date", y = "Index", color = "") +
ggplot2::theme_bw()
First we sort the raw data by COMMODITY SOURCE LOCATION in the following order: South Africa, Australia, and Columbia.
data %>%
dplyr::mutate(commodity_source_location = factor(commodity_source_location,
levels = c("South Africa", "Australia", "Columbia"))) %>%
dplyr::arrange(commodity_source_location)
DEAL DATE | COMMODITY | COMMODITY SOURCE LOCATION | DELIVERY LOCATION | DELIVERY MONTH | DELIVERY YEAR | VOLUME | PRICE |
---|---|---|---|---|---|---|---|
2021-05-10 | Coal | South Africa | ANT | JUN | 2021 | 10 | 96.32 |
2021-05-10 | Coal | South Africa | ANT | MAY | 2021 | 250 | 93.29 |
2021-05-10 | Coal | South Africa | AMS | JUN | 2021 | 100 | 93.22 |
2021-05-10 | Coal | South Africa | ARA | MAY | 2021 | 100 | 94.12 |
2021-05-10 | Coal | South Africa | AMS | MAY | 2021 | 25 | 94.57 |
2021-05-10 | Coal | South Africa | ANT | JUN | 2021 | 50 | 10.79 |
2021-05-10 | Coal | South Africa | ANT | MAY | 2021 | 100 | 93.78 |
2021-05-10 | Coal | South Africa | ROT | JUN | 2021 | 50 | 96.02 |
2021-05-10 | Coal | South Africa | ANT | JUN | 2021 | 110 | 98.80 |
2021-05-10 | Coal | South Africa | AMS | JUL | 2021 | 10 | 99.49 |
2021-05-10 | Coal | South Africa | ARA | OCT | 2021 | 250 | 94.56 |
2021-05-11 | Coal | South Africa | SOT | MAY | 2021 | 20 | 97.77 |
2021-05-11 | Coal | South Africa | ARA | JUL | 2021 | 50 | 99.20 |
2021-05-11 | Coal | South Africa | ANT | JUN | 2021 | 50 | 98.42 |
2021-05-11 | Coal | South Africa | ANT | JUN | 2021 | 100 | 94.12 |
2021-05-11 | Coal | South Africa | ANT | JUL | 2021 | 1000 | 99.97 |
2021-05-11 | Coal | South Africa | UK | MAY | 2021 | 40 | 10.37 |
2021-05-11 | Coal | South Africa | ARA | JUN | 2021 | 100 | 97.37 |
2021-05-12 | Coal | South Africa | AMS | JUN | 2021 | 100 | 10.37 |
2021-05-12 | Coal | South Africa | ANT | JUN | 2021 | 50 | 97.22 |
2021-05-12 | Coal | South Africa | ANT | JUL | 2021 | 500 | 94.31 |
2021-05-12 | Coal | South Africa | ANT | JUL | 2021 | 40 | 10.40 |
2021-05-12 | Coal | South Africa | UK | OCT | 2021 | 100 | 93.27 |
2021-05-12 | Coal | South Africa | ARA | MAY | 2021 | 250 | 96.55 |
2021-05-12 | Coal | South Africa | UK | JUN | 2021 | 200 | 97.77 |
2021-05-12 | Coal | South Africa | AMS | JUN | 2021 | 50 | 99.20 |
2021-05-12 | Coal | South Africa | ARA | JUN | 2021 | 150 | 95.31 |
2021-05-12 | Coal | South Africa | UK | MAY | 2021 | 50 | 97.42 |
2021-05-12 | Coal | South Africa | UK | OCT | 2021 | 100 | 97.89 |
2021-05-12 | Coal | South Africa | AMS | JUL | 2021 | 10 | 96.32 |
2021-05-13 | Coal | South Africa | ANT | JUN | 2021 | 40 | 94.12 |
2021-05-13 | Coal | South Africa | UK | JUN | 2021 | 25 | 91.76 |
2021-05-13 | Coal | South Africa | AMS | JUN | 2021 | 100 | 96.95 |
2021-05-13 | Coal | South Africa | AMS | JAN | 2022 | 75 | 10.24 |
2021-05-13 | Coal | South Africa | ARA | JUN | 2021 | 200 | 98.25 |
2021-05-13 | Coal | South Africa | ROT | JUN | 2021 | 100 | 97.89 |
2021-05-13 | Coal | South Africa | ARA | MAY | 2021 | 100 | 95.31 |
2021-05-14 | Coal | South Africa | AMS | JUN | 2021 | 120 | 93.35 |
2021-05-14 | Coal | South Africa | ARA | AUG | 2021 | 25 | 94.46 |
2021-05-14 | Coal | South Africa | AMS | MAY | 2021 | 200 | 90.97 |
2021-05-14 | Coal | South Africa | ANT | JUN | 2021 | 25 | 93.56 |
2021-05-14 | Coal | South Africa | AMS | JUL | 2021 | 50 | 97.22 |
2021-05-14 | Coal | South Africa | UK | SEP | 2021 | 250 | 94.56 |
2021-05-14 | Coal | South Africa | ANT | JUN | 2021 | 100 | 90.97 |
2021-05-14 | Coal | South Africa | ARA | JUL | 2021 | 100 | 94.46 |
2021-05-14 | Coal | South Africa | ROT | JUN | 2021 | 100 | 94.31 |
2021-05-14 | Coal | South Africa | ANT | MAY | 2021 | 100 | 94.56 |
2021-05-14 | Coal | South Africa | ARA | JUL | 2021 | 100 | 96.02 |
2021-05-10 | Coal | Australia | AMS | JUL | 2021 | 200 | 91.75 |
2021-05-10 | Coal | Australia | SOT | JUN | 2021 | 100 | 93.49 |
2021-05-10 | Coal | Australia | ANT | JUL | 2021 | 25 | 93.04 |
2021-05-10 | Coal | Australia | ANT | MAY | 2021 | 100 | 93.49 |
2021-05-10 | Coal | Australia | ANT | SEP | 2021 | 50 | 93.78 |
2021-05-10 | Coal | Australia | ARA | MAY | 2021 | 200 | 90.56 |
2021-05-10 | Coal | Australia | ARA | MAR | 2022 | 100 | 99.20 |
2021-05-10 | Coal | Australia | ROT | JUN | 2021 | 25 | 93.78 |
2021-05-10 | Coal | Australia | ROT | JUN | 2021 | 50 | 95.31 |
2021-05-10 | Coal | Australia | ANT | DEC | 2021 | 100 | 99.98 |
2021-05-10 | Coal | Australia | ROT | MAY | 2021 | 10 | 94.50 |
2021-05-10 | Coal | Australia | AMS | JUL | 2021 | 10 | 93.22 |
2021-05-10 | Coal | Australia | ANT | JUN | 2021 | 40 | 96.32 |
2021-05-10 | Coal | Australia | ARA | MAY | 2021 | 100 | 92.66 |
2021-05-10 | Coal | Australia | ROT | OCT | 2021 | 200 | 94.17 |
2021-05-10 | Coal | Australia | ANT | JUL | 2021 | 25 | 94.46 |
2021-05-10 | Coal | Australia | UK | AUG | 2021 | 200 | 90.28 |
2021-05-10 | Coal | Australia | ARA | DEC | 2021 | 110 | 96.02 |
2021-05-10 | Coal | Australia | AMS | JUN | 2021 | 20 | 98.45 |
2021-05-10 | Coal | Australia | AMS | JUN | 2021 | 100 | 93.35 |
2021-05-10 | Coal | Australia | AMS | JUN | 2021 | 50 | 96.71 |
2021-05-10 | Coal | Australia | AMS | JUN | 2021 | 100 | 10.37 |
2021-05-10 | Coal | Australia | AMS | SEP | 2021 | 100 | 96.95 |
2021-05-11 | Coal | Australia | ROT | MAY | 2021 | 100 | 90.28 |
2021-05-11 | Coal | Australia | ANT | JUN | 2021 | 50 | 90.28 |
2021-05-11 | Coal | Australia | ARA | JUN | 2021 | 100 | 94.17 |
2021-05-11 | Coal | Australia | AMS | AUG | 2021 | 100 | 98.45 |
2021-05-11 | Coal | Australia | ANT | MAY | 2021 | 100 | 94.57 |
2021-05-11 | Coal | Australia | ANT | JUL | 2021 | 50 | 93.27 |
2021-05-11 | Coal | Australia | ARA | JUN | 2021 | 90 | 96.55 |
2021-05-11 | Coal | Australia | AMS | JUN | 2021 | 100 | 99.85 |
2021-05-11 | Coal | Australia | ARA | MAY | 2021 | 100 | 99.85 |
2021-05-11 | Coal | Australia | ARA | JUL | 2021 | 100 | 99.98 |
2021-05-11 | Coal | Australia | UK | MAR | 2021 | 100 | 98.45 |
2021-05-11 | Coal | Australia | AMS | JUN | 2021 | 90 | 99.49 |
2021-05-12 | Coal | Australia | ANT | JUN | 2021 | 100 | 93.49 |
2021-05-12 | Coal | Australia | ANT | JUN | 2021 | 100 | 10.24 |
2021-05-12 | Coal | Australia | ANT | DEC | 2021 | 100 | 93.29 |
2021-05-12 | Coal | Australia | ANT | MAY | 2021 | 100 | 10.79 |
2021-05-12 | Coal | Australia | ANT | JUN | 2021 | 20 | 94.57 |
2021-05-12 | Coal | Australia | ANT | JUN | 2021 | 100 | 99.85 |
2021-05-12 | Coal | Australia | ROT | JUL | 2021 | 200 | 90.56 |
2021-05-12 | Coal | Australia | AMS | JUN | 2021 | 100 | 97.26 |
2021-05-12 | Coal | Australia | UK | JUL | 2021 | 20 | 94.12 |
2021-05-12 | Coal | Australia | ARA | JUN | 2021 | 100 | 92.66 |
2021-05-12 | Coal | Australia | AMS | JUN | 2021 | 50 | 93.35 |
2021-05-12 | Coal | Australia | AMS | JUN | 2021 | 100 | 10.40 |
2021-05-12 | Coal | Australia | SOT | JUN | 2021 | 20 | 97.26 |
2021-05-13 | Coal | Australia | ARA | MAY | 2021 | 1000 | 93.56 |
2021-05-13 | Coal | Australia | UK | AUG | 2021 | 750 | 96.71 |
2021-05-13 | Gas | Australia | ARA | MAY | 2021 | 40 | 92.66 |
2021-05-13 | Coal | Australia | ARA | JUL | 2021 | 250 | 98.25 |
2021-05-13 | Coal | Australia | ARA | JUN | 2021 | 100 | 10.79 |
2021-05-13 | Coal | Australia | ARA | MAY | 2021 | 40 | 94.12 |
2021-05-14 | Coal | Australia | UK | JUN | 2021 | 10 | 98.65 |
2021-05-14 | Coal | Australia | ROT | JUN | 2021 | 50 | 96.95 |
2021-05-14 | Coal | Australia | ANT | MAY | 2021 | 50 | 93.29 |
2021-05-14 | Coal | Australia | AMS | SEP | 2021 | 200 | 91.75 |
2021-05-14 | Coal | Australia | ARA | JUN | 2021 | 100 | 94.50 |
2021-05-14 | Coal | Australia | AMS | DEC | 2021 | 25 | 91.75 |
2021-05-14 | Coal | Australia | UK | JUN | 2021 | 250 | 97.77 |
2021-05-14 | Coal | Australia | ROT | JUN | 2021 | 100 | 94.50 |
2021-05-14 | Coal | Australia | ANT | JUN | 2021 | 200 | 10.48 |
2021-05-14 | Coal | Australia | ARA | JUL | 2021 | 50 | 98.80 |
2021-05-14 | Coal | Australia | ARA | JUN | 2021 | 100 | 93.22 |
2021-05-14 | Gas | Australia | ARA | JUN | 2021 | 20 | 10.24 |
2021-05-14 | Coal | Australia | ANT | JUL | 2021 | 1000 | 93.56 |
2021-05-14 | Coal | Australia | ANT | JUL | 2021 | 750 | 98.42 |
2021-05-14 | Coal | Australia | ARA | MAY | 2021 | 50 | 94.31 |
2021-05-10 | Coal | Columbia | AMS | JUN | 2021 | 10 | 99.20 |
2021-05-10 | Coal | Columbia | ARA | JUL | 2021 | 50 | 97.42 |
2021-05-10 | Coal | Columbia | AMS | JUL | 2021 | 25 | 97.09 |
2021-05-10 | Coal | Columbia | AMS | JUN | 2021 | 100 | 98.65 |
2021-05-10 | Coal | Columbia | ANT | OCT | 2021 | 100 | 98.80 |
2021-05-10 | Coal | Columbia | ARA | MAY | 2021 | 1000 | 97.37 |
2021-05-10 | Coal | Columbia | ANT | OCT | 2021 | 100 | 10.48 |
2021-05-10 | Coal | Columbia | ARA | MAY | 2021 | 50 | 91.76 |
2021-05-10 | Coal | Columbia | UK | JUL | 2021 | 20 | 97.89 |
2021-05-10 | Coal | Columbia | UK | JUN | 2021 | 50 | 98.65 |
2021-05-11 | Coal | Columbia | ANT | JUN | 2021 | 20 | 93.27 |
2021-05-11 | Coal | Columbia | ARA | JUN | 2021 | 100 | 99.98 |
2021-05-11 | Coal | Columbia | ANT | JUN | 2021 | 100 | 97.37 |
2021-05-11 | Coal | Columbia | UK | JUN | 2021 | 50 | 93.04 |
2021-05-12 | Coal | Columbia | ANT | MAY | 2021 | 10 | 94.17 |
2021-05-12 | Coal | Columbia | AMS | MAY | 2021 | 50 | 10.40 |
2021-05-12 | Coal | Columbia | ANT | JUL | 2021 | 50 | 97.37 |
2021-05-12 | Coal | Columbia | AMS | JUN | 2021 | 100 | 99.49 |
2021-05-12 | Coal | Columbia | ARA | AUG | 2021 | 50 | 93.04 |
2021-05-12 | Coal | Columbia | ANT | DEC | 2021 | 100 | 98.25 |
2021-05-12 | Coal | Columbia | AMS | JUN | 2021 | 250 | 97.42 |
2021-05-12 | Coal | Columbia | ANT | JUN | 2021 | 20 | 96.55 |
2021-05-12 | Coal | Columbia | ARA | JUN | 2021 | 50 | 99.97 |
2021-05-12 | Coal | Columbia | AMS | JUN | 2021 | 110 | 90.97 |
2021-05-13 | Coal | Columbia | AMS | JUL | 2021 | 50 | 94.50 |
2021-05-13 | Coal | Columbia | AMS | JUN | 2021 | 100 | 90.56 |
2021-05-13 | Coal | Columbia | AMS | JUN | 2021 | 100 | 97.22 |
2021-05-13 | Coal | Columbia | ARA | JUN | 2021 | 100 | 97.09 |
2021-05-13 | Coal | Columbia | ANT | JUN | 2021 | 250 | 97.42 |
2021-05-13 | Coal | Columbia | ARA | JUN | 2021 | 110 | 97.09 |
2021-05-13 | Coal | Columbia | AMS | JUL | 2021 | 50 | 10.48 |
2021-05-13 | Coal | Columbia | ARA | JUL | 2021 | 500 | 91.76 |
2021-05-14 | Gas | Columbia | ARA | JUN | 2021 | 100 | 98.42 |
2021-05-14 | Coal | Columbia | ARA | JUN | 2021 | 50 | 97.26 |
2021-05-14 | Coal | Columbia | AMS | AUG | 2021 | 10 | 99.97 |
2021-05-14 | Coal | Columbia | ARA | JUN | 2021 | 90 | 96.71 |
2021-05-14 | Coal | Columbia | AMS | MAR | 2021 | 110 | 97.09 |
2021-05-14 | Coal | Columbia | AMS | JUL | 2021 | 10 | 94.17 |
2021-05-14 | Coal | Columbia | ANT | SEP | 2021 | 120 | 99.97 |
2021-05-14 | Coal | Columbia | AMS | JUN | 2021 | 90 | 96.71 |
2021-05-14 | Coal | Columbia | ANT | SEP | 2021 | 500 | 90.56 |
2021-05-14 | Coal | Columbia | AMS | MAY | 2021 | 100 | 10.79 |
The mutate
function above, changes the commodity_source_location
variable to be treated as a factor (categorical data type in R
) with the ordered levels “South Africa”, “Australia”, “Columbia”. Once it is done, we can export this data to any specified path using the following code.
data %>%
dplyr::mutate(commodity_source_location = factor(commodity_source_location,
levels = c("South Africa",
"Australia",
"Columbia"))) %>%
dplyr::arrange(commodity_source_location) %>%
readr::write_csv(file = "path_to_export_the_file/name.csv")