# Clear the workspace
rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 537926 28.8 1193701 63.8 686460 36.7
## Vcells 979800 7.5 8388608 64.0 1876069 14.4
# Load required libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'tidyr' was built under R version 4.4.2
## Warning: package 'readr' was built under R version 4.4.2
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'forcats' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# tidyr and dplyr are already attached by tidyverse; these calls are redundant but harmless
library(tidyr)
library(dplyr)
# Simulated data in wide format
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)
# Convert to long format: one row per date and pollutant
air_quality_long <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration")
# Convert back to wide format: one column per pollutant
air_quality_wide_again <- air_quality_long %>%
  pivot_wider(names_from = pollutant, values_from = concentration)
print(air_quality_long)
## # A tibble: 9 × 3
## date pollutant concentration
## <chr> <chr> <dbl>
## 1 2024-02-01 PM2.5 12
## 2 2024-02-01 NO2 25
## 3 2024-02-01 O3 35
## 4 2024-02-02 PM2.5 15
## 5 2024-02-02 NO2 30
## 6 2024-02-02 O3 40
## 7 2024-02-03 PM2.5 18
## 8 2024-02-03 NO2 28
## 9 2024-02-03 O3 38
print(air_quality_wide_again)
## # A tibble: 3 × 4
## date PM2.5 NO2 O3
## <chr> <dbl> <dbl> <dbl>
## 1 2024-02-01 12 25 35
## 2 2024-02-02 15 30 40
## 3 2024-02-03 18 28 38
This R code demonstrates how to reshape data between wide and long formats using the tidyr and dplyr packages. The dataset is a simulated air quality table in which three pollutant measurements (PM2.5, NO2, and O3) are recorded for each of three dates.
First, the data is created in wide format, where each pollutant has its own column. pivot_longer() then converts it to long format: the pollutant names move into a single column (pollutant) and their values into another (concentration), so the 3 × 4 table becomes a 9 × 3 table. Long format is useful for time-series work and for analyses or plots that group by a categorical variable, here the pollutant.
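To illustrate why the long format helps with grouped analysis, here is a minimal sketch that computes the mean concentration per pollutant. The summary step and the names `pollutant_means` and `mean_concentration` are illustrative assumptions, not part of the original code.

```r
library(dplyr)
library(tidyr)

# Same simulated wide data as in the example above
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)

# Reshape to long format: one row per date and pollutant
air_quality_long <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration")

# With a single pollutant column, one group_by() covers all pollutants
# (pollutant_means is a hypothetical name used only for this sketch)
pollutant_means <- air_quality_long %>%
  group_by(pollutant) %>%
  summarise(mean_concentration = mean(concentration), .groups = "drop")

print(pollutant_means)
```

In wide format the same summary would need one `mean()` call per pollutant column, so the long layout scales better as pollutants are added.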
Next, pivot_wider() converts the long data back to wide format, restoring one column per pollutant. Wide format is useful when the data should read like a spreadsheet. Finally, both the long table and the re-widened table are printed to verify that the transformations worked: the second printout matches the original wide data.
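Comparing printed tables by eye works for three rows, but the round trip can also be checked programmatically. The sketch below is one possible way to do that, using base R's `all.equal()` on plain data frames. The names `round_trip` and `round_trip_ok` are hypothetical and do not appear in the original code.

```r
library(dplyr)
library(tidyr)

# Same simulated wide data as in the example above
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)

# Wide -> long -> wide in a single pipeline
round_trip <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration") %>%
  pivot_wider(names_from = pollutant, values_from = concentration)

# all.equal() returns TRUE or a description of the differences,
# so wrap it in isTRUE() to get a plain logical
round_trip_ok <- isTRUE(all.equal(as.data.frame(air_quality_wide),
                                  as.data.frame(round_trip)))
print(round_trip_ok)
```

For this small dataset the check is expected to pass. With real data, duplicate date and pollutant combinations or missing values can make the round trip differ from the original.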