These rates, normally quoted as currency units per U.S. dollar, are reported daily to the Fund by the issuing central bank. Rates are normally reported for members whose currencies are used in Fund financial transactions (see Financial Transaction Plan).
From this website, we can easily see that 19 graduate classes are offered. We can copy and paste this untidy dataset into Excel and save it as a CSV file.
The untidydata package also includes several untidy datasets that are useful for practicing data cleaning and preprocessing.
These three datasets contain real-world data. For each one, we first perform some exploratory data analysis (EDA) and then extract the required results. For the IMF dataset, I select a few currencies quoted against the U.S. dollar to observe their trends. The Central Bank dataset is a real-world example of wide-format data. The last dataset, the SPS class schedule, is not strictly in wide format, but it is still an example of untidy data.
  Continent Country    Measurement Value
1 Africa    Algeria    Langs          18
2 Africa    Angola     Langs          42
3 Oceania   Australia  Langs         234
4 Asia      Bangladesh Langs          37
5 Africa    Benin      Langs          52
6 Americas  Bolivia    Langs          38
library(tidyr)
library(dplyr)
With the dataset in wide format, comparing two countries is straightforward and easy for the end user to read. However, to plot the countries in R with ggplot2, we still need to reshape the data back into long format. As an example, I selected Algeria and Costa Rica for comparison.
# A tibble: 14 × 3
   Metric        Country      Value
   <chr>         <chr>        <dbl>
 1 Langs         Algeria         18
 2 Langs         Costa Rica      10
 3 Area          Algeria    2381741
 4 Area          Costa Rica   51100
 5 Population    Algeria      25660
 6 Population    Costa Rica    3064
 7 Stations      Algeria        102
 8 Stations      Costa Rica      38
 9 MGS           Algeria        6.6
10 MGS           Costa Rica    8.92
11 Std           Algeria       2.29
12 Std           Costa Rica    1.78
13 ContinentCode Algeria          1
14 ContinentCode Costa Rica       4
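The step that keeps only the two countries and stacks them into long format can be sketched as follows. This is a minimal sketch, assuming a wide table with a `Metric` column plus one column per country (the layout that `df_ld_reshape` has in the `df_long` code below). `df_wide` and `df_2_compare_sketch` are hypothetical names, and the toy values are copied from the Algeria and Costa Rica output above.

```r
library(tidyr)
library(dplyr)

# Toy wide table: one row per metric, one column per country
# (values copied from the Algeria / Costa Rica output in this post)
df_wide <- tibble(
  Metric       = c("Langs", "Stations"),
  Algeria      = c(18, 102),
  `Costa Rica` = c(10, 38)
)

df_2_compare_sketch <- df_wide %>%
  select(Metric, Algeria, `Costa Rica`) %>%  # keep only the countries of interest
  pivot_longer(
    cols = -Metric,         # every remaining country column
    names_to = "Country",   # former column names become values
    values_to = "Value"     # cell values go here
  )

print(df_2_compare_sketch)
```

On the full wide table, `select()` is what drops every other country; `pivot_longer()` then produces one `Metric`/`Country`/`Value` row per cell, which is the shape `geom_col(position = "dodge")` needs.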
library(tidyverse)
ggplot(df_2_compare, aes(x = Metric, y = Value, fill = Country)) +
  geom_col(position = "dodge") +  # side-by-side bars
  labs(title = "Comparison of Metrics: Algeria vs Costa Rica",
       x = "Metric",
       y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
df_long <- df_ld_reshape %>%
  pivot_longer(
    cols = -Metric,         # all columns except Metric
    names_to = "Country",   # new column storing country names
    values_to = "Value"     # temporary column for metric values
  ) %>%
  pivot_wider(
    names_from = Metric,    # turn metric names into columns
    values_from = Value     # fill cells with the values
  )
df_long
To plot the Chinese yuan against the Japanese yen from this dataset, I first need to reshape it into long format and filter the relevant currencies before creating the plot.
df_imf_rate <- df_imf_rate %>%
  pivot_longer(
    cols = -Currency,        # all columns except Currency
    names_to = "IMF_Date",   # new column storing the former date column names
    values_to = "C_Rate"     # new column storing the exchange rates
  )
# read.csv() prefixes column names that start with a digit with "X"; remove it
df_imf_rate$IMF_Date <- sub("^X", "", df_imf_rate$IMF_Date)
# Parse strings of the form day.month-abbreviation.two-digit-year into Date objects
df_imf_rate$IMF_Date <- as.Date(df_imf_rate$IMF_Date, format = "%d.%b.%y")
df_imf_rate
# A tibble: 720 × 3
   Currency     IMF_Date   C_Rate
   <chr>        <date>     <chr>
1 Chinese yuan 2026-01-02 <NA>
2 Chinese yuan 2026-01-05 6.9806
3 Chinese yuan 2026-01-06 6.9803
4 Chinese yuan 2026-01-07 6.9913
5 Chinese yuan 2026-01-08 6.983
6 Chinese yuan 2026-01-09 6.982
7 Chinese yuan 2026-01-12 6.9746
8 Chinese yuan 2026-01-13 6.9764
9 Chinese yuan 2026-01-14 6.9732
10 Chinese yuan 2026-01-15 6.9714
# ℹ 710 more rows
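The header cleanup in the chunk above relies on how `read.csv()` renames columns: with the default `check.names = TRUE`, names that start with a digit get an `X` prefix and characters such as `-` become `.`. The sketch below assumes a raw IMF header like `02-Jan-26`, which is a guess inferred from the `%d.%b.%y` format string, not taken from the actual file. It sets the `LC_TIME` locale to `C` because `%b` matches month abbreviations in the current locale.

```r
# Make %b match English month abbreviations regardless of the system locale
invisible(Sys.setlocale("LC_TIME", "C"))

raw_header <- "02-Jan-26"               # assumed format of a raw IMF date header
col_name   <- make.names(raw_header)    # what read.csv() produces: "X02.Jan.26"
date_str   <- sub("^X", "", col_name)   # strip the prefix: "02.Jan.26"
parsed     <- as.Date(date_str, format = "%d.%b.%y")

print(col_name)
print(parsed)
```

The two-digit year `26` is read as 2026 because `%y` maps 00 to 68 onto the 2000s.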
For this chart, the dates need to be re-sorted, and a log transformation helps with the difference in magnitude between the two exchange rates: the Chinese yuan generally trades between 6 and 7 per U.S. dollar, whereas the Japanese yen trades at around 150.
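A minimal base-R sketch of both steps, using three yuan rows copied from the output above (deliberately shuffled, with the rate still stored as text, as in the pivoted table). The variable names are illustrative, and the 6.98 and 150 levels come from the paragraph above. The last lines show why a log scale suits this comparison: a 1% move has the same height on a log axis at either level, whereas the raw change is more than 20 times larger for the yen. In ggplot2, `scale_y_log10()` applies the same idea to an axis.

```r
# Toy rows copied from the pivoted IMF table; C_Rate is still text there
rates <- data.frame(
  Currency = "Chinese yuan",
  IMF_Date = as.Date(c("2026-01-07", "2026-01-02", "2026-01-05")),
  C_Rate   = c("6.9913", NA, "6.9806"),
  stringsAsFactors = FALSE
)

rates$C_Rate <- as.numeric(rates$C_Rate)   # text -> numeric
rates <- rates[!is.na(rates$C_Rate), ]     # drop days with no reported rate
rates <- rates[order(rates$IMF_Date), ]    # re-sort chronologically
rates$log_rate <- log(rates$C_Rate)        # natural-log transform
print(rates)

# Why a log scale: a 1% move looks the same at ~7 and at ~150
cny <- 6.98
jpy <- 150
log_move_cny <- log(cny * 1.01) - log(cny)  # about log(1.01)
log_move_jpy <- log(jpy * 1.01) - log(jpy)  # about log(1.01)
raw_move_cny <- cny * 0.01                  # 0.0698 units
raw_move_jpy <- jpy * 0.01                  # 1.5 units
```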
options(max.print = 40)  # limit the total number of values printed
url2 <- "https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/IMF_CentralBankData_US.csv"
df_us_cbs <- read.csv(url2)
head(df_us_cbs)
DATASET SERIES_CODE OBS_MEASURE COUNTRY INDICATOR TYPE_OF_TRANSFORMATION
FREQUENCY SCALE DECIMALS_DISPLAYED SECTOR MFS_COLTN COUNTERPART_SECTOR
MFS_INSTRL MFS_RA MFS_AGGREGATES ACCOUNTING_ENTRY FI_MATURITY CURRENCY
VALUATION MFS_SRF MFS_COMPONENT FR_ADJ EXRATE TRANSFORMATION UNIT OVERLAP
MFS_EAWR IFS_FLAG DOI FULL_DESCRIPTION AUTHOR PUBLISHER DEPARTMENT
CONTACT_POINT TOPIC TOPIC_DATASET KEYWORDS KEYWORDS_DATASET LANGUAGE
PUBLICATION_DATE
[ reached 'max' / getOption("max.print") -- omitted 506 columns ]
[ reached 'max' / getOption("max.print") -- omitted 6 rows ]
df_us_cbs <- df_us_cbs %>%
  pivot_longer(
    cols = -c(COUNTRY, SERIES_CODE),  # all columns except COUNTRY and SERIES_CODE
    names_to = "CBS_Date",            # new column storing the former column names
    values_to = "Amount"              # new column storing the cell values
  )
df_us_cbs
# A tibble: 3,264 × 4
   SERIES_CODE                        COUNTRY       CBS_Date              Amount
   <chr>                              <chr>         <chr>                 <chr>
 1 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States DATASET               IMF.S…
 2 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States OBS_MEASURE           OBS_V…
 3 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States INDICATOR             Liabi…
 4 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States TYPE_OF_TRANSFORMATI… US do…
 5 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States FREQUENCY             Month…
 6 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States SCALE                 Milli…
 7 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States DECIMALS_DISPLAYED    <NA>
 8 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States SECTOR                <NA>
 9 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States MFS_COLTN             <NA>
10 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States COUNTERPART_SECTOR    <NA>
# ℹ 3,254 more rows
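Because `cols = -c(COUNTRY, SERIES_CODE)` also pivots the metadata columns, rows such as `DATASET` and `FREQUENCY` end up in `CBS_Date`, and `Amount` stays character, as the output above shows. One way to avoid that is to pivot only the period columns with a tidyselect helper. This sketch assumes every period column name starts with `X` followed by a four-digit year; `df_toy` and `df_toy_long` are hypothetical names, and the two amounts are copied from the filtered output that follows.

```r
library(tidyr)
library(dplyr)

# Toy version of the wide CBS table: metadata columns plus two period columns
df_toy <- tibble(
  SERIES_CODE = "USA.S121_L_LT_S1311MIXED_CBS.USD.M",
  COUNTRY     = "United States",
  FREQUENCY   = "Monthly",   # a metadata column we do NOT want to pivot
  X2001.M12   = 6645,
  X2002.M01   = 13688
)

df_toy_long <- df_toy %>%
  pivot_longer(
    cols = matches("^X\\d{4}"),  # only period columns (assumed XYYYY... naming)
    names_to = "CBS_Date",
    values_to = "Amount"
  ) %>%
  select(SERIES_CODE, COUNTRY, CBS_Date, Amount) %>%  # drop leftover metadata
  mutate(Amount = as.numeric(Amount))

print(df_toy_long)
```

Selecting the period columns up front keeps `Amount` numeric and removes the need to filter metadata rows out of `CBS_Date` afterwards.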
# A tibble: 570 × 4
   SERIES_CODE                        COUNTRY       CBS_Date  Amount
   <chr>                              <chr>         <chr>      <dbl>
 1 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2001.M12   6645
 2 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M01  13688
 3 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M02   5752
 4 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M03   5692
 5 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M04   5387
 6 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M05   5883
 7 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M06   8116
 8 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M07   6242
 9 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M08   4874
10 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M09   7879
# ℹ 560 more rows
ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color = SERIES_CODE,
                        group = SERIES_CODE, fill = SERIES_CODE)) +
  # geom_col(position = "dodge") +  # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Monthly",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("df_us_cbs_line_plot.png",
       plot = last_plot(),  # the most recent ggplot
       width = 10,          # width in inches
       height = 6,          # height in inches
       dpi = 300)
df_filtered <- df_us_cbs %>%
  filter(str_detect(CBS_Date, "^X\\d{4}\\.Q\\d{1}$")) %>%  # quarterly columns
  drop_na()
# print(df_filtered)
df_filtered <- df_filtered %>%
  mutate(Amount = as.numeric(Amount))
ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color = SERIES_CODE,
                        group = SERIES_CODE, fill = SERIES_CODE)) +
  # geom_col(position = "dodge") +  # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Quarterly",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
df_filtered <- df_us_cbs %>%
  filter(str_detect(CBS_Date, "^X\\d{4}$")) %>%  # annual columns
  drop_na()
# print(df_filtered)
df_filtered <- df_filtered %>%
  mutate(Amount = as.numeric(Amount))
ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color = SERIES_CODE,
                        group = SERIES_CODE, fill = SERIES_CODE)) +
  # geom_col(position = "dodge") +  # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Annual",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
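The quarterly and annual plots differ only in the regular expression used to pick the frequency, and all the plots pass `CBS_Date` to ggplot2 as text, so the x-axis is discrete and every period label is drawn. A minimal base-R sketch of labelling each period by frequency and converting monthly labels to real dates (first day of the month) follows. The monthly pattern `^X\d{4}\.M\d{2}$` is an assumption inferred from labels such as `X2002.M01`; the quarterly and annual patterns are the ones used above, and `X2001.Q4` is an illustrative label. With a `Date` column, ggplot2's `scale_x_date()` can space and label the axis by time.

```r
# Illustrative period labels; the quarterly and annual ones follow the patterns above
cbs_dates <- c("X2001.M12", "X2002.M01", "X2001.Q4", "X2001")

# Classify each period label by frequency
freq <- ifelse(grepl("^X\\d{4}\\.M\\d{2}$", cbs_dates, perl = TRUE), "Monthly",
        ifelse(grepl("^X\\d{4}\\.Q\\d{1}$", cbs_dates, perl = TRUE), "Quarterly",
        ifelse(grepl("^X\\d{4}$", cbs_dates, perl = TRUE), "Annual", NA_character_)))

# Convert monthly labels to dates: "X2002.M01" -> "2002-01" -> 2002-01-01
monthly     <- cbs_dates[freq == "Monthly"]
year_month  <- sub("^X(\\d{4})\\.M(\\d{2})$", "\\1-\\2", monthly, perl = TRUE)
month_start <- as.Date(paste0(year_month, "-01"))

print(data.frame(cbs_dates, freq))
print(month_start)
```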
Conclusion:
Wide and long formats each have their strengths. Wide format is easier for humans to read and lets them compare values across multiple variables at a glance, so it suits tables and reports. Long format, on the other hand, is better suited to data processing, analysis, and plotting in tools such as dplyr and ggplot2, because it makes filtering, grouping, and reshaping easier. In practice, it is common to switch between the two formats depending on whether the focus is readability or data manipulation.
LLMs used:
• OpenAI. (2025). ChatGPT (Version 5.2) [Large language model]. https://chat.openai.com. Accessed Mar 08, 2026.