SPS_Data607_Project2

Author

David Chen

DataSets

IMF exchange rate

https://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=2026-01-01&reportType=REP

These rates, normally quoted as currency units per U.S. dollar, are reported daily to the Fund by the issuing central bank. Rates are normally reported for members whose currencies are used in Fund financial transactions (see Financial Transaction Plan).

https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/IMF_currencies_exchange_rate.csv

Monetary and Financial Statistics (MFS), Central Bank Data

In this dataset, I will filter the data to include only the United States and practice working with wide-format data.

https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/IMF_CentralBankData_US.csv

https://data.imf.org/en/datasets/IMF.STA:MFS_CBS

SPS Spring Graduate Class Schedule (optional)

https://globalsearch.cuny.edu/CFGlobalSearchTool/CFSearchToolController

https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/SPS_Spring2026_Schedule_Search.csv

From this website, we can easily see that 19 graduate classes are offered. We can copy and paste this untidy dataset into Excel and save it as a CSV file.

The Untidydat package also includes several untidy datasets that can be useful for practicing data cleaning and preprocessing.

These 3 datasets represent real-life data that we have obtained. We will first perform some exploratory data analysis (EDA) and then extract useful results based on the requirements. For the IMF dataset, I will select a few currencies against the USD to observe trends. The Central Bank dataset is a real example of wide-format data. The last dataset, the SPS class schedule, may not be considered wide-format, but it is an example of untidy data.

Code base

library(untidydata)
data("language_diversity")
head(language_diversity)

  Continent    Country Measurement Value
1    Africa    Algeria       Langs    18
2    Africa     Angola       Langs    42
3   Oceania  Australia       Langs   234
4      Asia Bangladesh       Langs    37
5    Africa      Benin       Langs    52
6  Americas    Bolivia       Langs    38

library(tidyr)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

df_ld_wide <- pivot_wider(language_diversity,
                          names_from = Measurement,
                          values_from = Value)
df_ld_wide

# A tibble: 74 × 8
   Continent Country      Langs    Area Population Stations   MGS   Std
   <chr>     <chr>        <dbl>   <dbl>      <dbl>    <dbl> <dbl> <dbl>
 1 Africa    Algeria         18 2381741      25660      102  6.6   2.29
 2 Africa    Angola          42 1246700      10303       50  6.22  1.87
 3 Oceania   Australia      234 7713364      17336      134  6     4.17
 4 Asia      Bangladesh      37  143998     118745       20  7.4   0.73
 5 Africa    Benin           52  112622       4889        7  7.14  0.99
 6 Americas  Bolivia         38 1098581       7612       48  6.92  2.5 
 7 Africa    Botswana        27  581730       1348       10  4.6   1.69
 8 Americas  Brazil         209 8511965     153322      245  9.71  5.87
 9 Africa    Burkina Faso    75  274000       9242        6  5.17  1.07
10 Africa    CAR             94  622984       3127       13  8.08  1.21
# ℹ 64 more rows

df_ld_long <- df_ld_wide %>%
    mutate(ContinentCode = case_when(
    Continent == "Africa" ~ 1,
    Continent == "Asia" ~ 2,
    Continent == "Oceania" ~ 3,
    Continent == "Americas" ~ 4
    
  )) %>%
  select(-Continent)
df_ld_long <- df_ld_long %>%
  pivot_longer(cols = -Country, names_to = "Metric", values_to = "Value")
df_ld_long

# A tibble: 518 × 3
   Country Metric             Value
   <chr>   <chr>              <dbl>
 1 Algeria Langs              18   
 2 Algeria Area          2381741   
 3 Algeria Population      25660   
 4 Algeria Stations          102   
 5 Algeria MGS                 6.6 
 6 Algeria Std                 2.29
 7 Algeria ContinentCode       1   
 8 Angola  Langs              42   
 9 Angola  Area          1246700   
10 Angola  Population      10303   
# ℹ 508 more rows

# df_ld_long %>%
#   group_by(ContinentCode) %>%           # group by continent
#   summarise(count = n())

# df_ld_long <- df_ld_long %>%
#   mutate(ContinentCode = case_when(
#     Continent == "Africa" ~ 1,
#     Continent == "Asia" ~ 2,
#     Continent == "Oceania" ~ 3,
#     Continent == "Americas" ~ 4
#     
#   )) %>%
#   select(-Continent)
# 
# df_ld_long

Now Africa=1, Asia=2, Oceania=3, Americas=4.

df_ld_reshape <- df_ld_long %>%
  pivot_wider(names_from =Country, values_from = Value)
df_ld_reshape

# A tibble: 7 × 75
  Metric     Algeria Angola Australia Bangladesh   Benin Bolivia Botswana Brazil
  <chr>        <dbl>  <dbl>     <dbl>      <dbl>   <dbl>   <dbl>    <dbl>  <dbl>
1 Langs       1.8 e1 4.2 e1    2.34e2      37    5.2 e+1  3.8 e1   2.7 e1 2.09e2
2 Area        2.38e6 1.25e6    7.71e6  143998    1.13e+5  1.10e6   5.82e5 8.51e6
3 Population  2.57e4 1.03e4    1.73e4  118745    4.89e+3  7.61e3   1.35e3 1.53e5
4 Stations    1.02e2 5   e1    1.34e2      20    7   e+0  4.8 e1   1   e1 2.45e2
5 MGS         6.6 e0 6.22e0    6   e0       7.4  7.14e+0  6.92e0   4.6 e0 9.71e0
6 Std         2.29e0 1.87e0    4.17e0       0.73 9.9 e-1  2.5 e0   1.69e0 5.87e0
7 Continent…  1   e0 1   e0    3   e0       2    1   e+0  4   e0   1   e0 4   e0
# ℹ 66 more variables: `Burkina Faso` <dbl>, CAR <dbl>, Cambodia <dbl>,
#   Cameroon <dbl>, Chad <dbl>, Colombia <dbl>, Congo <dbl>,
#   `Costa Rica` <dbl>, `Cote d'Ivoire` <dbl>, Cuba <dbl>, Ecuador <dbl>,
#   Egypt <dbl>, Ethiopia <dbl>, `French Guiana` <dbl>, Gabon <dbl>,
#   Ghana <dbl>, Guatemala <dbl>, Guinea <dbl>, Guyana <dbl>, Honduras <dbl>,
#   India <dbl>, Indonesia <dbl>, Kenya <dbl>, Laos <dbl>, Liberia <dbl>,
#   Libya <dbl>, Madagascar <dbl>, Malawi <dbl>, Malaysia <dbl>, Mali <dbl>, …

With the dataset in wide format, comparing two countries becomes straightforward and easy for the end user to read. However, if we want to plot any country in R, we still need to reshape the data back to a longer format. For example, I selected Algeria and Costa Rica for comparison.

df_2_compare <- df_ld_reshape %>%
  select(Metric,Algeria, `Costa Rica`)
df_2_compare <-df_2_compare %>%
  pivot_longer(cols = c(Algeria, `Costa Rica`), 
               names_to = "Country", 
               values_to = "Value")

df_2_compare

# A tibble: 14 × 3
   Metric        Country         Value
   <chr>         <chr>           <dbl>
 1 Langs         Algeria         18   
 2 Langs         Costa Rica      10   
 3 Area          Algeria    2381741   
 4 Area          Costa Rica   51100   
 5 Population    Algeria      25660   
 6 Population    Costa Rica    3064   
 7 Stations      Algeria        102   
 8 Stations      Costa Rica      38   
 9 MGS           Algeria          6.6 
10 MGS           Costa Rica       8.92
11 Std           Algeria          2.29
12 Std           Costa Rica       1.78
13 ContinentCode Algeria          1   
14 ContinentCode Costa Rica       4

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.1     ✔ readr     2.1.6
✔ ggplot2   4.0.1     ✔ stringr   1.6.0
✔ lubridate 1.9.4     ✔ tibble    3.3.1
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

ggplot(df_2_compare, aes(x = Metric, y = Value, fill = Country)) +
  geom_col(position = "dodge") +   # side-by-side bars
  labs(title = "Comparison of Metrics: Algeria vs Costa Rica",
       x = "Metric",
       y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# df_long %>%
#   group_by(Country, Metric) %>%
#   summarise(count = n()) %>%
#   filter(count > 1)

df_long <- df_ld_reshape %>%
  pivot_longer(
    cols = -Metric,           # all columns except Metric
    names_to = "Country",     # new column storing country names
    values_to = "Value"       # temporary column for metric values
  ) %>%
  pivot_wider(
    names_from = Metric,      # turn metric names into columns
    values_from = Value       # fill cells with the values
  )
df_long

# A tibble: 74 × 8
   Country      Langs    Area Population Stations   MGS   Std ContinentCode
   <chr>        <dbl>   <dbl>      <dbl>    <dbl> <dbl> <dbl>         <dbl>
 1 Algeria         18 2381741      25660      102  6.6   2.29             1
 2 Angola          42 1246700      10303       50  6.22  1.87             1
 3 Australia      234 7713364      17336      134  6     4.17             3
 4 Bangladesh      37  143998     118745       20  7.4   0.73             2
 5 Benin           52  112622       4889        7  7.14  0.99             1
 6 Bolivia         38 1098581       7612       48  6.92  2.5              4
 7 Botswana        27  581730       1348       10  4.6   1.69             1
 8 Brazil         209 8511965     153322      245  9.71  5.87             4
 9 Burkina Faso    75  274000       9242        6  5.17  1.07             1
10 CAR             94  622984       3127       13  8.08  1.21             1
# ℹ 64 more rows

Here, we separate each metric indicator into multiple columns to provide more detailed insights for every indicator.

ggplot(df_fileter <- df_long %>% filter(Country %in% c("Algeria", "Costa Rica")), aes(x = Country, y = Langs, fill = Country)) +
  geom_col(position = "dodge") +   # side-by-side bars
  labs(title = "Comparison of Langs: Algeria vs Costa Rica",
       x = "Langs",
       y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

url <- "https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/IMF_currencies_exchange_rate.csv"
df_imf_rate <- read.csv(url)
head(df_imf_rate)

        Currency X2.Jan.26 X5.Jan.26 X6.Jan.26 X7.Jan.26 X8.Jan.26 X9.Jan.26
1   Chinese yuan      <NA>    6.9806    6.9803    6.9913     6.983     6.982
2        Euro(1)    1.1721    1.1664    1.1707    1.1684    1.1675    1.1642
3   Japanese yen      <NA>    157.24    156.67    156.61    156.77    157.06
4  U.K. pound(1)   1.34375     1.346   1.35215    1.3502   1.34455   1.34125
5    U.S. dollar         1         1         1         1         1         1
6 Algerian dinar      <NA>   129.788  129.5562  129.7706  129.8228  129.9849
  X12.Jan.26 X13.Jan.26 X14.Jan.26 X15.Jan.26 X16.Jan.26 X20.Jan.26 X21.Jan.26
1     6.9746     6.9764     6.9732     6.9714     6.9683     6.9606     6.9638
2     1.1692     1.1654     1.1651     1.1624     1.1617     1.1728     1.1739
3       <NA>     158.24      159.2      158.5      158.2     158.15     158.21
4      1.346    1.34715    1.34495    1.34185    1.34085     1.3463    1.34085
5          1          1          1          1          1          1          1
6       <NA>   129.9154   129.9966   130.0776    130.153   129.7996   129.6896
  X22.Jan.26 X23.Jan.26 X26.Jan.26 X27.Jan.26 X28.Jan.26 X29.Jan.26 X30.Jan.26
1     6.9633     6.9647     6.9577     6.9571     6.9454     6.9469     6.9488
2     1.1706     1.1742     1.1836     1.1929     1.1974     1.1968     1.1919
3      158.2     158.55     154.77     154.29     152.58     153.09      153.6
4    1.34215    1.35165     1.3667    1.37115    1.37705    1.37925    1.37795
5          1          1          1          1          1          1          1
6   129.7822   129.5853   129.3217    129.229   129.1428   129.1078   129.2009

To plot the Chinese Yuan against the Japanese Yen from this dataset, I first need to reshape it into long format and filter the relevant countries before creating the plot.

df_imf_rate <- df_imf_rate %>%
  pivot_longer(
    cols = -Currency,           # all columns except Currency
    names_to = "IMF_Date",     # new column storing IMF_Date names
    values_to = "C_Rate" )     # temporary column for C_Rate values
    
   # ) %>%
  # 
  # pivot_wider(
  #   names_from = Metric,      # turn metric names into columns
  #   values_from = Value       # fill cells with the values
  # )
df_imf_rate$IMF_Date <- sub("^X", "",df_imf_rate$IMF_Date)
df_imf_rate$IMF_Date <- as.Date(df_imf_rate$IMF_Date, format = "%d.%b.%y")
df_imf_rate

# A tibble: 720 × 3
   Currency     IMF_Date   C_Rate
   <chr>        <date>     <chr> 
 1 Chinese yuan 2026-01-02 <NA>  
 2 Chinese yuan 2026-01-05 6.9806
 3 Chinese yuan 2026-01-06 6.9803
 4 Chinese yuan 2026-01-07 6.9913
 5 Chinese yuan 2026-01-08 6.983 
 6 Chinese yuan 2026-01-09 6.982 
 7 Chinese yuan 2026-01-12 6.9746
 8 Chinese yuan 2026-01-13 6.9764
 9 Chinese yuan 2026-01-14 6.9732
10 Chinese yuan 2026-01-15 6.9714
# ℹ 710 more rows

ggplot(df_fileter <- df_imf_rate %>% filter(Currency %in% c("Chinese yuan", "Japanese yen"))%>% drop_na(), aes(x = IMF_Date, y = C_Rate, group=Currency, color=Currency,fill = Currency)) +
  #geom_col(position = "dodge") +   # side-by-side bars
   geom_line() +
  labs(title = "Jan 2026 - Chinese yuan vs Japanese yen",
       x = "Date",
       y = "C_Rate") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

For this chart, it’s necessary to re-sort the dates and apply a log transformation to the differences, since the exchange rates vary widely. The Chinese Yuan generally ranges from 6 to 7, whereas the Japanese Yen is around 150.

options(max.print = 40) # to control total number of values printed.
url2<- "https://raw.githubusercontent.com/dyc-sps/-dyc-sps-SPS_Data607_Week6/refs/heads/main/IMF_CentralBankData_US.csv"
df_us_cbs <- read.csv(url2)
head(df_us_cbs)

     DATASET SERIES_CODE OBS_MEASURE COUNTRY INDICATOR TYPE_OF_TRANSFORMATION
     FREQUENCY SCALE DECIMALS_DISPLAYED SECTOR MFS_COLTN COUNTERPART_SECTOR
     MFS_INSTRL MFS_RA MFS_AGGREGATES ACCOUNTING_ENTRY FI_MATURITY CURRENCY
     VALUATION MFS_SRF MFS_COMPONENT FR_ADJ EXRATE TRANSFORMATION UNIT OVERLAP
     MFS_EAWR IFS_FLAG DOI FULL_DESCRIPTION AUTHOR PUBLISHER DEPARTMENT
     CONTACT_POINT TOPIC TOPIC_DATASET KEYWORDS KEYWORDS_DATASET LANGUAGE
     PUBLICATION_DATE
 [ reached 'max' / getOption("max.print") -- omitted 506 columns ]
 [ reached 'max' / getOption("max.print") -- omitted 6 rows ]

This is a wide dataset with over 500 columns.

options(max.print = 40) 
colnames(df_us_cbs)

 [1] "DATASET"                "SERIES_CODE"            "OBS_MEASURE"           
 [4] "COUNTRY"                "INDICATOR"              "TYPE_OF_TRANSFORMATION"
 [7] "FREQUENCY"              "SCALE"                  "DECIMALS_DISPLAYED"    
[10] "SECTOR"                 "MFS_COLTN"              "COUNTERPART_SECTOR"    
[13] "MFS_INSTRL"             "MFS_RA"                 "MFS_AGGREGATES"        
[16] "ACCOUNTING_ENTRY"       "FI_MATURITY"            "CURRENCY"              
[19] "VALUATION"              "MFS_SRF"                "MFS_COMPONENT"         
[22] "FR_ADJ"                 "EXRATE"                 "TRANSFORMATION"        
[25] "UNIT"                   "OVERLAP"                "MFS_EAWR"              
[28] "IFS_FLAG"               "DOI"                    "FULL_DESCRIPTION"      
[31] "AUTHOR"                 "PUBLISHER"              "DEPARTMENT"            
[34] "CONTACT_POINT"          "TOPIC"                  "TOPIC_DATASET"         
[37] "KEYWORDS"               "KEYWORDS_DATASET"       "LANGUAGE"              
[40] "PUBLICATION_DATE"      
 [ reached 'max' / getOption("max.print") -- omitted 506 entries ]

df_us_cbs <- df_us_cbs %>%
  # mutate(across(where(is.numeric), as.character))
  mutate(across(everything(), as.character))

df_us_cbs

     DATASET SERIES_CODE OBS_MEASURE COUNTRY INDICATOR TYPE_OF_TRANSFORMATION
     FREQUENCY SCALE DECIMALS_DISPLAYED SECTOR MFS_COLTN COUNTERPART_SECTOR
     MFS_INSTRL MFS_RA MFS_AGGREGATES ACCOUNTING_ENTRY FI_MATURITY CURRENCY
     VALUATION MFS_SRF MFS_COMPONENT FR_ADJ EXRATE TRANSFORMATION UNIT OVERLAP
     MFS_EAWR IFS_FLAG DOI FULL_DESCRIPTION AUTHOR PUBLISHER DEPARTMENT
     CONTACT_POINT TOPIC TOPIC_DATASET KEYWORDS KEYWORDS_DATASET LANGUAGE
     PUBLICATION_DATE
 [ reached 'max' / getOption("max.print") -- omitted 506 columns ]
 [ reached 'max' / getOption("max.print") -- omitted 6 rows ]

df_us_cbs <- df_us_cbs %>%
  pivot_longer(
    cols = -c(COUNTRY,SERIES_CODE),           # all columns except COUNTRY
    names_to = "CBS_Date",     # new column storing IMF_Date names
    values_to = "Amount" )      # temporary column for C_Rate values
df_us_cbs

# A tibble: 3,264 × 4
   SERIES_CODE                        COUNTRY       CBS_Date              Amount
   <chr>                              <chr>         <chr>                 <chr> 
 1 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States DATASET               IMF.S…
 2 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States OBS_MEASURE           OBS_V…
 3 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States INDICATOR             Liabi…
 4 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States TYPE_OF_TRANSFORMATI… US do…
 5 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States FREQUENCY             Month…
 6 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States SCALE                 Milli…
 7 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States DECIMALS_DISPLAYED    <NA>  
 8 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States SECTOR                <NA>  
 9 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States MFS_COLTN             <NA>  
10 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States COUNTERPART_SECTOR    <NA>  
# ℹ 3,254 more rows

library(stringr)

df_filtered <- df_us_cbs %>%
  #filter(CBS_Date %in% c("USA.S121_L_LT_S1311MIXED_CBS.USD.M", "USA.S121_L_MB_CBS.USD.M")) %>%
  filter(str_detect(CBS_Date, "^X\\d{4}\\.M\\d{2}$")) %>%
  drop_na()

df_filtered <- df_filtered %>%
  mutate(Amount = as.numeric(Amount))
print(df_filtered)

# A tibble: 570 × 4
   SERIES_CODE                        COUNTRY       CBS_Date  Amount
   <chr>                              <chr>         <chr>      <dbl>
 1 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2001.M12   6645
 2 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M01  13688
 3 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M02   5752
 4 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M03   5692
 5 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M04   5387
 6 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M05   5883
 7 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M06   8116
 8 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M07   6242
 9 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M08   4874
10 USA.S121_L_LT_S1311MIXED_CBS.USD.M United States X2002.M09   7879
# ℹ 560 more rows

ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color =SERIES_CODE,group=SERIES_CODE, fill = SERIES_CODE)) +
  #geom_col(position = "dodge") +   # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Monthly",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("df_us_cbs_line_plot.png", 
       plot = last_plot(),  # your last ggplot
       width = 10,          # width in inches
       height = 6,          # height in inches
       dpi = 300)

df_filtered <- df_us_cbs %>%
  filter(str_detect(CBS_Date, "^X\\d{4}\\.Q\\d{1}$")) %>%
  drop_na()
#print(df_filtered)
df_filtered <- df_filtered %>%
  mutate(Amount = as.numeric(Amount))
ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color =SERIES_CODE,group=SERIES_CODE, fill = SERIES_CODE)) +
  #geom_col(position = "dodge") +   # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Quarterly",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

df_filtered <- df_us_cbs %>%
  filter(str_detect(CBS_Date, "^X\\d{4}$")) %>%
  drop_na()
#print(df_filtered)
df_filtered <- df_filtered %>%
  mutate(Amount = as.numeric(Amount))
ggplot(df_filtered, aes(x = CBS_Date, y = Amount, color =SERIES_CODE,group=SERIES_CODE, fill = SERIES_CODE)) +
  #geom_col(position = "dodge") +   # side-by-side bars
  geom_line() +
  labs(title = "Liabilities to Central Government (CBS) vs Liabilities, Monetary base (CBS) Annual",
       x = "Date",
       y = "Millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Conclusion:

Wide and long formats each have their strengths. Wide format is easier for humans to read and quickly compare values across multiple variables, making it ideal for tables and reports. Long format, on the other hand, is better suited for data processing, analysis, and plotting in tools, because it allows for easier filtering, grouping, and reshaping. In practice, it’s common to switch between the two formats depending on whether the focus is on readability or data manipulation.

LLMS used:

• OpenAI. (2025). ChatGPT (Version 5.2) [Large language model]. https://chat.openai.com. Accessed Mar 08, 2026.