Assignment 3

# Clear the workspace
rm(list = ls())
gc()

##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 537926 28.8    1193701 63.8   686460 36.7
## Vcells 979800  7.5    8388608 64.0  1876069 14.4

# Load required libraries
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.4.2

## Warning: package 'ggplot2' was built under R version 4.4.2

## Warning: package 'tidyr' was built under R version 4.4.2

## Warning: package 'readr' was built under R version 4.4.2

## Warning: package 'dplyr' was built under R version 4.4.2

## Warning: package 'forcats' was built under R version 4.4.2

## Warning: package 'lubridate' was built under R version 4.4.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidyr)
library(dplyr)

# Simulated data in wide format
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)

# Convert to long format
air_quality_long <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration")

# Convert back to wide format
air_quality_wide_again <- air_quality_long %>%
  pivot_wider(names_from = pollutant, values_from = concentration)

print(air_quality_long)

## # A tibble: 9 × 3
##   date       pollutant concentration
##   <chr>      <chr>             <dbl>
## 1 2024-02-01 PM2.5                12
## 2 2024-02-01 NO2                  25
## 3 2024-02-01 O3                   35
## 4 2024-02-02 PM2.5                15
## 5 2024-02-02 NO2                  30
## 6 2024-02-02 O3                   40
## 7 2024-02-03 PM2.5                18
## 8 2024-02-03 NO2                  28
## 9 2024-02-03 O3                   38

print(air_quality_wide_again)

## # A tibble: 3 × 4
##   date       PM2.5   NO2    O3
##   <chr>      <dbl> <dbl> <dbl>
## 1 2024-02-01    12    25    35
## 2 2024-02-02    15    30    40
## 3 2024-02-03    18    28    38

This R code demonstrates how to reshape data between wide and long formats with pivot_longer() and pivot_wider() from the tidyr package, loaded here alongside dplyr as part of the tidyverse. The dataset is a simulated air quality table in which three pollutant measurements (PM2.5, NO2, and O3) are recorded for three dates.

First, the data is created in wide format, where each pollutant has its own column. pivot_longer() then converts it to long format: the pollutant names move into a single column (pollutant) and their values into another (concentration), giving one row per date and pollutant. This layout is useful for time-series data and for statistical analysis that groups by a categorical variable.
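As a small illustration of why the long layout helps with grouped analysis, the sketch below computes the mean concentration per pollutant. It rebuilds the same simulated table so that it runs on its own; the names pollutant_means and mean_concentration are not part of the assignment and are introduced here only for the example.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Same simulated wide-format data as in the assignment
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)

# Reshape to long format, then summarise once per pollutant.
# pollutant_means and mean_concentration are illustrative names.
pollutant_means <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration") %>%
  group_by(pollutant) %>%
  summarise(mean_concentration = mean(concentration))

print(pollutant_means)
```

In the wide layout the same summary would need one mean() call per pollutant column; in the long layout a single group_by() covers all of them, including any pollutant added later.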

Next, pivot_wider() converts the long data back to the original wide layout, restoring a separate column for each pollutant. The wide layout is useful when the data needs to be read like a spreadsheet. Finally, both the long table and the restored wide table are printed to confirm that the transformations worked: the restored table has the same three rows and four columns as the original.
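Printing the tables is a visual check. A programmatic check of the round trip could look like the sketch below, which compares the original and restored tables with base R's all.equal() after converting both to plain data frames. The variable round_trip_ok is a name chosen for this example, and the comparison approach is one reasonable option rather than part of the original assignment.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Same simulated wide-format data as in the assignment
air_quality_wide <- tibble(
  date = c("2024-02-01", "2024-02-02", "2024-02-03"),
  PM2.5 = c(12, 15, 18),
  NO2 = c(25, 30, 28),
  O3 = c(35, 40, 38)
)

# Wide -> long -> wide
air_quality_long <- air_quality_wide %>%
  pivot_longer(cols = c(PM2.5, NO2, O3),
               names_to = "pollutant",
               values_to = "concentration")

air_quality_wide_again <- air_quality_long %>%
  pivot_wider(names_from = pollutant, values_from = concentration)

# Compare as plain data frames; all.equal() returns TRUE on a match
# or a description of the differences otherwise.
round_trip_ok <- isTRUE(all.equal(as.data.frame(air_quality_wide),
                                  as.data.frame(air_quality_wide_again)))

# Stop with an error if the round trip changed the data
stopifnot(round_trip_ok)
```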


Donna Parker

2025-03-04