Introduction

This data dive is an investigation into the instability of truth. I am going to treat our complete climate dataset as if it represents the entire “population” of possible observations, then sample from it repeatedly, simulating what different research teams might have collected. The conclusions drawn turn out to be shockingly dependent on the luck of the draw.

Preparing for the data dive

library(tidyverse)
library(knitr)
library(ggplot2)

df <- read_csv("climate_change_dataset.csv")

glimpse(df)
## Rows: 1,000
## Columns: 10
## $ Year                          <dbl> 2006, 2019, 2014, 2010, 2007, 2020, 2006…
## $ Country                       <chr> "UK", "USA", "France", "Argentina", "Ger…
## $ `Avg Temperature (°C)`        <dbl> 8.9, 31.0, 33.9, 5.9, 26.9, 32.3, 30.7, …
## $ `CO2 Emissions (Tons/Capita)` <dbl> 9.3, 4.8, 2.8, 1.8, 5.6, 1.4, 11.6, 6.0,…
## $ `Sea Level Rise (mm)`         <dbl> 3.1, 4.2, 2.2, 3.2, 2.4, 2.7, 3.9, 4.5, …
## $ `Rainfall (mm)`               <dbl> 1441, 2407, 1241, 1892, 1743, 2100, 1755…
## $ Population                    <dbl> 530911230, 107364344, 441101758, 1069669…
## $ `Renewable Energy (%)`        <dbl> 20.4, 49.2, 33.3, 23.7, 12.5, 49.4, 41.9…
## $ `Extreme Weather Events`      <dbl> 14, 8, 9, 7, 4, 12, 10, 1, 4, 5, 13, 14,…
## $ `Forest Area (%)`             <dbl> 59.8, 31.0, 35.5, 17.7, 17.4, 47.2, 50.5…

Our dataset contains 1000 observations spanning 24 years across 15 countries. Each observation captures a moment in time: temperature, CO2 emissions, sea level rise, rainfall, population, renewable energy adoption, extreme weather events, and forest coverage. It’s a rich tapestry of our planet’s vital signs.

This is our “population.” In reality, of course, even this dataset is itself just one sample from the infinitely many measurements that could have been taken.

Experiment 1

Let’s start with a moderate sampling scenario. Here I imagine five different research teams, each collecting 25% of the possible observations. This isn’t unrealistic: many climate studies work with limited data because of funding, accessibility, or time constraints.

set.seed(42)  
sample_frac <- 0.25
n_samples <- 5

df_samples_25 <- tibble()

for (sample_i in 1:n_samples) {
  # Draw 250 rows (25% of 1,000) with replacement to simulate one team's dataset
  df_i <- df |>
    sample_n(size = sample_frac * nrow(df), replace = TRUE) |>
    mutate(sample_num = sample_i)
  
  df_samples_25 <- bind_rows(df_samples_25, df_i)
}

The Temperature Paradox

Let’s start with something seemingly straightforward: average global temperature. Surely, with 250 observations per sample, I should get consistent estimates, right?

temp_by_sample <- df_samples_25 |>
  group_by(sample_num) |>
  summarise(
    mean_temp = mean(`Avg Temperature (°C)`),
    sd_temp = sd(`Avg Temperature (°C)`),
    min_temp = min(`Avg Temperature (°C)`),
    max_temp = max(`Avg Temperature (°C)`)
  )

kable(temp_by_sample, digits = 2, 
      caption = "Temperature Statistics Across Five 25% Subsamples")
Temperature Statistics Across Five 25% Subsamples

sample_num   mean_temp   sd_temp   min_temp   max_temp
         1       20.78      8.61        6.2       34.9
         2       18.95      8.39        5.0       34.9
         3       20.27      8.55        5.1       34.8
         4       19.94      8.44        5.0       34.9
         5       20.49      8.65        5.3       34.8
full_pop_temp <- df |>
  summarise(
    mean_temp = mean(`Avg Temperature (°C)`),
    sd_temp = sd(`Avg Temperature (°C)`)
  )

ggplot(df_samples_25, aes(x = factor(sample_num), y = `Avg Temperature (°C)`, 
                           fill = factor(sample_num))) +
  geom_violin(alpha = 0.6) +
  geom_hline(yintercept = full_pop_temp$mean_temp, linetype = "dashed", 
             color = "red", size = 1) +
  labs(title = "Temperature Distributions Across Different Subsamples",
       subtitle = "Red line shows the true population mean",
       x = "Sample Number",
       y = "Average Temperature (°C)") +
  theme_minimal() +
  theme(legend.position = "none")

Here’s what struck me immediately: the mean temperature varies by 1.83°C across our five samples. That might not sound like much, but in climate science, a difference of even 0.5°C can mean the difference between “concerning” and “catastrophic” predictions.
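To put a number on that spread, here is a quick sketch (using the temp_by_sample and full_pop_temp objects defined above) that computes the range of the five sample means and their largest deviation from the population mean:

# How far apart the five sample means are, and how far the worst one misses the population mean
temp_by_sample |>
  summarise(
    range_of_means = max(mean_temp) - min(mean_temp),
    largest_deviation = max(abs(mean_temp - full_pop_temp$mean_temp))
  )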

If you were a researcher working only with sample 2, you’d report a mean temperature of 18.95°C. But if you had been assigned to collect Sample 1 instead, you’d be reporting 20.78°C. These teams would be drawing different conclusions about the severity of warming—not because one is right and the other is wrong, but because they got different slices of reality.

The Extreme Weather Conundrum

Now let’s look at something more discrete and countable: extreme weather events. Surely counts are more stable than continuous measurements?

weather_by_sample <- df_samples_25 |>
  group_by(sample_num) |>
  summarise(
    total_events = sum(`Extreme Weather Events`),
    mean_events = mean(`Extreme Weather Events`),
    obs_with_zero_events = sum(`Extreme Weather Events` == 0),
    obs_with_10plus_events = sum(`Extreme Weather Events` >= 10),
    max_events = max(`Extreme Weather Events`)
  )

kable(weather_by_sample, digits = 2,
      caption = "Extreme Weather Events Across Subsamples")
Extreme Weather Events Across Subsamples

sample_num   total_events   mean_events   obs_with_zero_events   obs_with_10plus_events   max_events
         1           1961          7.84                     20                       95           14
         2           1829          7.32                     12                       89           14
         3           1922          7.69                     10                       91           14
         4           1851          7.40                     20                       94           14
         5           1962          7.85                     16                      106           14
# Full population comparison
full_pop_weather <- df |>
  summarise(
    mean_events = mean(`Extreme Weather Events`),
    total_events = sum(`Extreme Weather Events`),
    pct_zero = sum(`Extreme Weather Events` == 0) / n() * 100
  )

This is where things get genuinely unsettling. Sample 2 recorded a mean of 7.32 extreme weather events per observation, while Sample 5 recorded 7.85.

Think about the policy implications: if you’re a government official reading a report based on Sample 2, you might conclude that extreme weather isn’t as severe as predicted and scale back emergency preparedness funding. But the researchers who happened to collect Sample 5 would be urging you to declare a climate emergency.

Both teams are being honest. Both are reporting what they observed. But they’re telling completely different stories about the same reality.
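As a sanity check, here is a small sketch that compares each sample’s mean against the full-population mean stored in full_pop_weather above:

# Signed error of each sample's mean relative to the population mean
weather_by_sample |>
  select(sample_num, mean_events) |>
  mutate(error_vs_population = mean_events - full_pop_weather$mean_events)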

The Anomaly

Let’s dig into something even more troubling: identifying anomalies. In Sample 1, let’s find observations that would be considered outliers based on CO2 emissions.

anomaly_analysis <- df_samples_25 |>
  group_by(sample_num) |>
  mutate(
    co2_mean = mean(`CO2 Emissions (Tons/Capita)`),
    co2_sd = sd(`CO2 Emissions (Tons/Capita)`),
    co2_z_score = (`CO2 Emissions (Tons/Capita)` - co2_mean) / co2_sd,
    is_anomaly = abs(co2_z_score) > 2
  ) |>
  ungroup()

# Count anomalies per sample
anomaly_counts <- anomaly_analysis |>
  group_by(sample_num) |>
  summarise(
    num_anomalies = sum(is_anomaly),
    pct_anomalies = mean(is_anomaly) * 100,
    highest_co2 = max(`CO2 Emissions (Tons/Capita)`),
    anomaly_threshold_lower = first(co2_mean - 2*co2_sd),
    anomaly_threshold_upper = first(co2_mean + 2*co2_sd)
  )

kable(anomaly_counts, digits = 2,
      caption = "Anomaly Detection Varies Dramatically Across Samples")
Anomaly Detection Varies Dramatically Across Samples

sample_num   num_anomalies   pct_anomalies   highest_co2   anomaly_threshold_lower   anomaly_threshold_upper
         1               0               0          20.0                     -0.12                     21.53
         2               0               0          20.0                     -0.53                     21.52
         3               0               0          20.0                     -0.68                     21.36
         4               0               0          20.0                     -1.70                     20.75
         5               0               0          19.9                     -0.13                     21.52

What counts as an “anomaly” in one sample can be perfectly normal in another. In this run, none of the five samples actually flagged any observations, because the highest CO2 value drawn (20.0 tons per capita) sits below every sample’s upper cutoff. But look at the cutoffs themselves: the upper threshold drifts from 20.75 in Sample 4 to 21.53 in Sample 1, purely because of which observations happened to be drawn. Leon mentioned this phenomenon in his lecture as well.

Imagine you’re investigating a specific country’s CO2 emissions. Against Sample 4’s thresholds, an emission level of 21 tons per capita would be flagged as anomalous (its upper cutoff is 20.75), triggering red flags, international scrutiny, and calls for sanctions. But against Sample 1’s thresholds (upper cutoff 21.53), that same value sits within normal bounds, just business as usual.

The data point hasn’t changed. The country’s actual emissions haven’t changed. But our interpretation, and the real-world consequences of that interpretation, shift dramatically based on which other observations happened to be in our dataset.
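To make that concrete, here is a minimal sketch that checks one hypothetical emission value against each sample’s cutoff (the value 21 is purely illustrative, not any country’s actual figure):

test_value <- 21  # hypothetical CO2 emission level (tons per capita), for illustration only

anomaly_counts |>
  select(sample_num, anomaly_threshold_upper) |>
  mutate(flagged_as_anomaly = test_value > anomaly_threshold_upper)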

Experiment 2

Now let’s simulate what happens when budget cuts force us to work with even less data. What if we could only collect 10% of observations?

set.seed(123)
sample_frac_small <- 0.10
n_samples_small <- 5

df_samples_10 <- tibble()

for (sample_i in 1:n_samples_small) {
  df_i <- df |>
    sample_n(size = sample_frac_small * nrow(df), replace = TRUE) |>
    mutate(sample_num = sample_i)
  
  df_samples_10 <- bind_rows(df_samples_10, df_i)
}

How country representation collapses

With smaller samples, the problem becomes even more acute. Let’s see which countries appear in each subsample and how often.

country_rep <- df_samples_10 |>
  group_by(sample_num, Country) |>
  summarise(count = n(), .groups = 'drop') |>
  pivot_wider(names_from = sample_num, values_from = count, 
              names_prefix = "Sample_", values_fill = 0)

kable(country_rep, 
      caption = "Country Representation in 10% Subsamples - Notice the Zeros")
Country Representation in 10% Subsamples - Notice the Zeros

Country        Sample_1   Sample_2   Sample_3   Sample_4   Sample_5
Argentina             3          2          8          7          8
Australia             1          3          8          6          6
Brazil               11          7          5          7          3
Canada                9          5          5          4          7
China                 6          8         10         11          9
France                9         12          8          5          5
Germany               7          8          0          7          6
India                 5          5         10          4         11
Indonesia            12          7          5         12          7
Japan                 3          7          3          8          6
Mexico                6          5          4          1          3
Russia                7          5          7          6          9
South Africa          4          8          6         10          5
UK                    6          7          4          2          7
USA                  11         11         17         10          8
missing_countries <- df_samples_10 |>
  group_by(sample_num) |>
  summarise(
    unique_countries = n_distinct(Country),
    missing_countries = 15 - n_distinct(Country)  # Total countries in population
  )

kable(missing_countries,
      caption = "Number of Countries Completely Missing from Each Sample")
Number of Countries Completely Missing from Each Sample

sample_num   unique_countries   missing_countries
         1                 15                   0
         2                 15                   0
         3                 14                   1
         4                 15                   0
         5                 15                   0

This is where sampling at small fractions becomes genuinely dangerous for climate research. With only 10% sampling, a country can fail to appear in your dataset at all; here, Germany is entirely absent from Sample 3. Imagine publishing a “global” climate report that accidentally excludes Brazil or Indonesia because they just didn’t happen to show up in your random sample.

Also, a country might appear only once or twice, giving you a wildly skewed picture of its climate reality. If Mexico appears in your sample only during an anomalous drought year, you might conclude that Mexico is water-scarce, when in reality you just got unlucky with your sampling.
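Here is a short sketch, assuming df_samples_10 from above, that counts how many countries each 10% sample sees at most twice (countries it never sees at all are already tallied in the missing-countries table):

# For each sample, count countries represented by two or fewer observations
df_samples_10 |>
  count(sample_num, Country) |>
  group_by(sample_num) |>
  summarise(countries_with_2_or_fewer_obs = sum(n <= 2))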

The renewable energy mirage

Let’s look at renewable energy adoption.

renewable_comparison <- df_samples_10 |>
  group_by(sample_num) |>
  summarise(
    mean_renewable = mean(`Renewable Energy (%)`),
    median_renewable = median(`Renewable Energy (%)`),
    obs_above_30pct = sum(`Renewable Energy (%)` > 30),
    obs_below_10pct = sum(`Renewable Energy (%)` < 10)
  )

kable(renewable_comparison, digits = 2,
      caption = "Renewable Energy: Wildly Different Pictures from Different 10% Samples")
Renewable Energy: Wildly Different Pictures from Different 10% Samples

sample_num   mean_renewable   median_renewable   obs_above_30pct   obs_below_10pct
         1            29.09              29.30                48                 8
         2            28.94              28.95                49                10
         3            27.20              26.75                44                14
         4            27.48              28.50                44                 8
         5            26.71              24.85                39                 8
ggplot(df_samples_10, aes(x = factor(sample_num), y = `Renewable Energy (%)`,
                          fill = factor(sample_num))) +
  geom_boxplot(alpha = 0.7) +
  geom_hline(yintercept = mean(df$`Renewable Energy (%)`), 
             linetype = "dashed", color = "red", size = 1) +
  labs(title = "Renewable Energy Adoption: The 10% Sample Lottery",
       subtitle = "Same population, wildly different stories",
       x = "Sample Number",
       y = "Renewable Energy (%)") +
  theme_minimal() +
  theme(legend.position = "none")

The range of mean renewable energy percentages across our small samples is staggering: 2.38 percentage points.

Experiment 3

What happens when we can collect much more data like 75% of all possible observations?

set.seed(789)
sample_frac_large <- 0.75
n_samples_large <- 5

df_samples_75 <- tibble()

for (sample_i in 1:n_samples_large) {
  df_i <- df |>
    sample_n(size = sample_frac_large * nrow(df), replace = TRUE) |>
    mutate(sample_num = sample_i)
  
  df_samples_75 <- bind_rows(df_samples_75, df_i)
}

The stability of larger samples

With larger samples, something reassuring finally happens: the noise starts to cancel out. This too was something I learned in class, and I was able to see it here.

large_sample_metrics <- df_samples_75 |>
  group_by(sample_num) |>
  summarise(
    mean_temp = mean(`Avg Temperature (°C)`),
    mean_co2 = mean(`CO2 Emissions (Tons/Capita)`),
    mean_sea_level = mean(`Sea Level Rise (mm)`),
    mean_renewable = mean(`Renewable Energy (%)`),
    mean_extreme_events = mean(`Extreme Weather Events`)
  )

metric_stability <- large_sample_metrics |>
  summarise(
    temp_range = max(mean_temp) - min(mean_temp),
    co2_range = max(mean_co2) - min(mean_co2),
    sea_level_range = max(mean_sea_level) - min(mean_sea_level),
    renewable_range = max(mean_renewable) - min(mean_renewable),
    events_range = max(mean_extreme_events) - min(mean_extreme_events)
  )

kable(large_sample_metrics, digits = 2,
      caption = "Metrics from 75% Subsamples - Notice the Stability")
Metrics from 75% Subsamples - Notice the Stability

sample_num   mean_temp   mean_co2   mean_sea_level   mean_renewable   mean_extreme_events
         1       19.83      10.22             3.06            27.66                  7.31
         2       19.68      10.74             3.04            27.21                  7.39
         3       20.42      10.48             3.05            28.12                  7.49
         4       20.34      10.20             3.00            27.33                  7.49
         5       19.74      10.50             3.02            27.72                  7.32
kable(metric_stability, digits = 3,
      caption = "Range Across 75% Subsamples - Dramatically Smaller Variation")
Range Across 75% Subsamples - Dramatically Smaller Variation

temp_range   co2_range   sea_level_range   renewable_range   events_range
     0.743       0.536             0.059             0.913          0.181

The temperature range across our 75% samples is only 0.743°C, compared to 1.83°C in our 25% samples. The extreme events range dropped from 0.53 events in the 25% samples to just 0.18.

Large samples naturally filter out the noise and converge on the truth.
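That convergence is what sampling theory predicts: the standard error of a sample mean shrinks roughly in proportion to 1/sqrt(n), so going from 100 to 750 observations per sample should cut the noise by a factor of around 2.7. Here is a quick sketch that lines up the three sampling fractions for the temperature mean (assuming the three df_samples_* objects created above):

# Spread of the mean-temperature estimate at each sampling fraction
bind_rows(
  df_samples_10 |> mutate(frac = "10%"),
  df_samples_25 |> mutate(frac = "25%"),
  df_samples_75 |> mutate(frac = "75%")
) |>
  group_by(frac, sample_num) |>
  summarise(mean_temp = mean(`Avg Temperature (°C)`), .groups = "drop") |>
  group_by(frac) |>
  summarise(range_of_means = max(mean_temp) - min(mean_temp))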

What remains consistent: the relationships

Even with small samples, some things do hold steady. Let’s investigate whether the relationships between variables remain consistent across sample sizes.

correlations_25 <- df_samples_25 |>
  group_by(sample_num) |>
  summarise(
    cor_co2_temp = cor(`CO2 Emissions (Tons/Capita)`, `Avg Temperature (°C)`),
    cor_renewable_events = cor(`Renewable Energy (%)`, `Extreme Weather Events`),
    sample_size = "25%"
  )

correlations_10 <- df_samples_10 |>
  group_by(sample_num) |>
  summarise(
    cor_co2_temp = cor(`CO2 Emissions (Tons/Capita)`, `Avg Temperature (°C)`),
    cor_renewable_events = cor(`Renewable Energy (%)`, `Extreme Weather Events`),
    sample_size = "10%"
  )

correlations_75 <- df_samples_75 |>
  group_by(sample_num) |>
  summarise(
    cor_co2_temp = cor(`CO2 Emissions (Tons/Capita)`, `Avg Temperature (°C)`),
    cor_renewable_events = cor(`Renewable Energy (%)`, `Extreme Weather Events`),
    sample_size = "75%"
  )

pop_cors <- df |>
  summarise(
    cor_co2_temp = cor(`CO2 Emissions (Tons/Capita)`, `Avg Temperature (°C)`),
    cor_renewable_events = cor(`Renewable Energy (%)`, `Extreme Weather Events`)
  )

all_cors <- bind_rows(correlations_10, correlations_25, correlations_75)

ggplot(all_cors, aes(x = sample_size, y = cor_co2_temp, fill = sample_size)) +
  geom_boxplot(alpha = 0.7) +
  geom_hline(yintercept = pop_cors$cor_co2_temp, linetype = "dashed", 
             color = "red", size = 1) +
  labs(title = "Correlation Between CO2 Emissions and Temperature",
       subtitle = "Relationships are more stable than raw estimates",
       x = "Sample Size",
       y = "Correlation Coefficient") +
  theme_minimal() +
  theme(legend.position = "none")

This is fascinating and somewhat comforting: even when our point estimates are all over the place, the relationships between variables tend to be more consistent. The correlation between CO2 and temperature doesn’t swing as wildly as the raw means do. This suggests that while we might struggle to nail down exact values with small samples, we can still identify meaningful patterns and relationships.
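Here is a quick sketch that quantifies that stability, using the all_cors and pop_cors objects defined above:

# Spread of the CO2-temperature correlation at each sampling fraction,
# and the largest deviation from the population correlation
all_cors |>
  group_by(sample_size) |>
  summarise(
    range_of_correlations = max(cor_co2_temp) - min(cor_co2_temp),
    largest_deviation = max(abs(cor_co2_temp - pop_cors$cor_co2_temp))
  )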

The Temporal Dimension

So far, I have been randomly sampling across all years equally. But in real research, there’s often a temporal bias.

# Creating a sample heavily biased toward recent years (2015-2023)
recent_biased <- df |>
  mutate(
    sampling_weight = if_else(Year >= 2015, 3, 1)  
  ) |>
  slice_sample(n = 250, weight_by = sampling_weight, replace = TRUE) |>
  mutate(sample_type = "Recent Bias")

# Creating a sample heavily biased toward early years (2000-2010)
early_biased <- df |>
  mutate(
    sampling_weight = if_else(Year <= 2010, 3, 1)
  ) |>
  slice_sample(n = 250, weight_by = sampling_weight, replace = TRUE) |>
  mutate(sample_type = "Early Bias")

# Creating a truly random sample for comparison
random_sample <- df |>
  slice_sample(n = 250, replace = TRUE) |>
  mutate(sample_type = "Random")

temporal_samples <- bind_rows(recent_biased, early_biased, random_sample)

temporal_summary <- temporal_samples |>
  group_by(sample_type) |>
  summarise(
    mean_year = mean(Year),
    min_year = min(Year),
    max_year = max(Year),
    mean_temp = mean(`Avg Temperature (°C)`),
    mean_renewable = mean(`Renewable Energy (%)`),
    mean_co2 = mean(`CO2 Emissions (Tons/Capita)`)
  )

kable(temporal_summary, digits = 2,
      caption = "How Temporal Bias Completely Changes Our Climate Story")
How Temporal Bias Completely Changes Our Climate Story

sample_type   mean_year   min_year   max_year   mean_temp   mean_renewable   mean_co2
Early Bias      2008.58       2000       2023       20.01            27.65      10.33
Random          2011.33       2000       2023       19.19            27.76      10.47
Recent Bias     2014.98       2000       2023       20.24            28.53      10.46
ggplot(temporal_samples, aes(x = Year, fill = sample_type)) +
  geom_histogram(alpha = 0.6, position = "identity", bins = 20) +
  facet_wrap(~sample_type, ncol = 1) +
  labs(title = "Year Distribution Across Differently Biased Samples",
       subtitle = "Notice how 'Recent Bias' barely includes early 2000s data",
       x = "Year",
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")

The “Recent Bias” sample shows 0.24°C higher average temperatures than the “Early Bias” sample, not because of random chance but because we’ve systematically oversampled the warmer recent years.

So if you’re a climate change denier looking to downplay warming, you’d strategically collect more data from the cooler early 2000s. If you’re an activist trying to sound the alarm, you’d focus on recent data showing dramatic changes. Both approaches would be technically data-driven, but both would be fundamentally misleading because they’ve violated the principle of random sampling.
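One way to see the bias directly is to check what share of each sample’s observations come from 2015 onward (a small sketch using temporal_samples from above):

# Share of observations from 2015 or later in each sample type
temporal_samples |>
  group_by(sample_type) |>
  summarise(pct_from_2015_onward = mean(Year >= 2015) * 100)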

Last Insight

One more critical insight: let’s see how sample selection affects our ability to compare countries fairly.

country_comparison <- df_samples_25 |>
  filter(Country %in% c("USA", "China")) |>
  group_by(sample_num, Country) |>
  summarise(
    mean_temp = mean(`Avg Temperature (°C)`),
    mean_co2 = mean(`CO2 Emissions (Tons/Capita)`),
    mean_renewable = mean(`Renewable Energy (%)`),
    n_obs = n(),
    .groups = 'drop'
  ) |>
  pivot_wider(
    names_from = Country,
    values_from = c(mean_temp, mean_co2, mean_renewable, n_obs)
  )

kable(country_comparison, digits = 2,
      caption = "USA vs China Comparison - Which Country Is 'Better'? It Depends on Your Sample!")
USA vs China Comparison - Which Country Is ‘Better’? It Depends on Your Sample!

sample_num   mean_temp_China   mean_temp_USA   mean_co2_China   mean_co2_USA   mean_renewable_China   mean_renewable_USA   n_obs_China   n_obs_USA
         1             20.17           20.29            11.16          10.41                  29.95                24.41            19          21
         2             18.00           16.21            11.25          11.92                  31.10                34.27            20          18
         3             22.73           19.73            10.90          11.80                  29.62                26.83            21          16
         4             19.91           19.90            10.81           9.29                  30.76                24.04            14          18
         5             20.27           15.56            10.49           9.57                  27.67                22.29            16          17

In some samples, the USA appears to have higher CO2 emissions than China. In others, it’s reversed. In most samples, China looks like the stronger renewable energy adopter; in one, it’s the USA. The number of observations we have for each country (the n_obs columns) also varies across samples, which means we’re not even comparing apples to apples: in one sample we’re judging China on 21 observations, in another on only 14.

This has real geopolitical consequences. International climate agreements often involve comparisons between nations. If the sample happens to make China look cleaner than the USA, you might advocate for stricter regulations on America. If your sample makes the USA look cleaner, you’d push for Chinese regulation instead. But these conclusions aren’t reflecting reality; they’re reflecting sampling luck.
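To underline how fragile that ranking is, here is a small sketch that counts in how many of the five 25% samples China’s mean per-capita CO2 comes out above the USA’s (using country_comparison from above):

# How often each country 'wins' the emissions comparison across the five samples
country_comparison |>
  summarise(
    samples_where_china_emits_more = sum(mean_co2_China > mean_co2_USA),
    samples_where_usa_emits_more = sum(mean_co2_USA > mean_co2_China)
  )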

Learning Outcomes

After generating dozens of subsamples and analyzing them from every angle, several insights emerge:

First, the instability of point estimates is terrifying. With 25% sampling, our mean temperature estimates varied by enough to change the entire narrative around global warming severity. With 10% sampling, some countries disappeared entirely from our analysis. This means that many published climate studies, working with similarly limited samples, might be reporting numbers that would look completely different if they had collected a different set of observations from the same population.

Second, anomaly detection is frighteningly subjective. What counts as an outlier depends entirely on context, and that context shifts with every sample. This has huge implications for identifying climate tipping points, extreme events, and countries that need intervention. The same data point can be a five-alarm fire in one sample and completely unremarkable in another.

Third, sample size matters more than we want to admit. The jump from 10% to 25% sampling dramatically improves stability, and the jump to 75% sampling makes our estimates converge on truth. But in real research, we rarely have the luxury of massive sample sizes. Many climate studies are probably less certain than they appear, because they’re working with sample fractions closer to 10-25% than to 75%.

Finally, there is consistency we can trust. Despite all this variation, certain patterns held up across every sample, above all the general relationship between CO2 and temperature, which stayed far more stable than the individual means or the country rankings. Those relationship-level findings are where our confidence should lie.

sessionInfo()
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.51      lubridate_1.9.4 forcats_1.0.1   stringr_1.6.0  
##  [5] dplyr_1.1.4     purrr_1.2.1     readr_2.1.6     tidyr_1.3.2    
##  [9] tibble_3.3.1    ggplot2_4.0.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] bit_4.6.0          gtable_0.3.6       jsonlite_2.0.0     crayon_1.5.3      
##  [5] compiler_4.5.2     tidyselect_1.2.1   parallel_4.5.2     jquerylib_0.1.4   
##  [9] scales_1.4.0       yaml_2.3.12        fastmap_1.2.0      R6_2.6.1          
## [13] labeling_0.4.3     generics_0.1.4     bslib_0.10.0       pillar_1.11.1     
## [17] RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.1.7        stringi_1.8.7     
## [21] cachem_1.1.0       xfun_0.56          sass_0.4.10        S7_0.2.1          
## [25] bit64_4.6.0-1      otel_0.2.0         timechange_0.3.0   cli_3.6.5         
## [29] withr_3.0.2        magrittr_2.0.4     digest_0.6.39      grid_4.5.2        
## [33] vroom_1.6.7        rstudioapi_0.18.0  hms_1.1.4          lifecycle_1.0.5   
## [37] vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
## [41] rmarkdown_2.30     tools_4.5.2        pkgconfig_2.0.3    htmltools_0.5.9