Reading and writing tabular data in plain-text files (CSV, TSV, etc.)

# Exploring the Parameters
# When I work with CSV files, I find it essential to understand a few key parameters of read.table, which read.csv and read.csv2 wrap with different defaults:
# 
# File (file): This is the name or path of the file I want to read.
# Header (header): I always check whether my file has a header row with column names, because named columns make the data much easier to work with.
# Separator (sep): I use this to specify the character that separates the fields in each row, usually a comma.
# Quote (quote): This is the character used to enclose strings, so that a separator inside quotes (for example "bolt, steel") is not treated as a field break.
# Decimal (dec): I specify this for the decimal separator ("." or ","), which varies with regional formats.
# Fill (fill): When TRUE, rows of unequal length are padded with blank fields instead of causing an error.
# Comment Character (comment.char): Anything after this character on a line is ignored, which lets me skip comments in the file.
# Additional Arguments (...): Sometimes I pass extra arguments, such as colClasses or na.strings, through read.csv to read.table for specific needs.
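# The sketch below shows these parameters side by side. It writes a small made-up file and reads it back with read.csv, setting each parameter explicitly. The column names and values are hypothetical. colClasses stands in for an additional argument that read.csv forwards to read.table.

```r
# Hypothetical sample file: a comment line, a quoted field containing the
# separator, and a short final row that needs filling
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# hypothetical warehouse extract",
  "id,item,cost",
  "1,\"bolt, steel\",0.25",
  "2,washer"
), path)

df <- read.csv(
  file = path,          # File: name or path of the file
  header = TRUE,        # Header: first non-comment row holds the column names
  sep = ",",            # Separator: fields are split on commas
  quote = "\"",         # Quote: "bolt, steel" stays a single field
  dec = ".",            # Decimal: "." marks the decimal point
  fill = TRUE,          # Fill: the short row "2,washer" gets an NA cost
  comment.char = "#",   # Comment Character: the first line is skipped
  colClasses = c("integer", "character", "numeric")  # extra argument passed to read.table
)
print(df)
```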
# Importing CSV Files: My Experience
# Using Base R
# Whenever I need to import CSV files, I often rely on the read.csv function. It’s convenient because it defaults to using commas as separators. Here's what I do:
# 
# I find the file path of the CSV I want to work with.
# I read the CSV file using read.csv, which simplifies my workflow.
# I like that read.csv assumes by default that the first row contains column names (header = TRUE), so I don't have to set the header parameter. I also learned to use as.is = TRUE or stringsAsFactors = FALSE when I don't want strings to be converted into factors. Since R 4.0.0, stringsAsFactors = FALSE is already the default, but setting it explicitly keeps older installations consistent.
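# Here is a minimal sketch of those steps. To keep it self-contained, it first writes a small hypothetical file to a temporary path, then reads it back with read.csv.

```r
# Hypothetical data written to a temporary file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(item = c("bolt", "nut"), qty = c(12L, 30L)),
          path, row.names = FALSE)

# header = TRUE is read.csv's default, so the first row becomes the column names;
# stringsAsFactors = FALSE keeps text columns as character vectors
shipments <- read.csv(path, stringsAsFactors = FALSE)
str(shipments)
```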
# 
# For international datasets, I sometimes switch to read.csv2, which uses semicolons as separators and commas as decimal marks, the convention in regions where the comma is the decimal separator.
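# As a sketch, assuming a hypothetical European-style file, read.csv2 parses "0,25" as the number 0.25 without any extra settings.

```r
# Hypothetical file: ';' separates fields, ',' marks decimals
path <- tempfile(fileext = ".csv")
writeLines(c("item;weight_kg", "bolt;0,25", "plate;12,5"), path)

eu <- read.csv2(path)
eu$weight_kg  # parsed as the numbers 0.25 and 12.5
```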
# 
# Using the readr Package
# For larger files, I turn to the readr package because its read_csv function is faster and offers a progress bar, which I appreciate when working with big datasets. I also like its default setting of not converting strings to factors. This helps me maintain control over my data types.
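# A minimal readr sketch, using a hypothetical file. Spelling out col_types skips the guessing step and its console message, and progress = FALSE turns the bar off for a file this small. read_csv returns a tibble, which is also a data.frame.

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("item,qty", "bolt,12", "nut,30"), path)

tbl <- read_csv(path,
                col_types = cols(item = col_character(), qty = col_integer()),
                progress = FALSE)
tbl
```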
# 
# Importing with data.table
# I discovered that fread from the data.table package is a game-changer. It’s incredibly fast and flexible. I love how it guesses the file’s delimiter automatically, saving me the trouble of specifying it.
# 
# When I use fread, I notice it guesses column types quickly by sampling a subset of rows rather than scanning the whole file, which speeds up the process. While fread lacks some of read.table's options, such as comment.char, it still works great for my needs.
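# To illustrate the delimiter detection, this sketch writes a hypothetical pipe-delimited file and calls fread without a sep argument. fread detects both the "|" separator and the header row on its own.

```r
library(data.table)

path <- tempfile(fileext = ".txt")
writeLines(c("id|item|qty", "1|bolt|12", "2|nut|30"), path)

dt <- fread(path)  # no sep given: fread detects '|' from the sampled rows
dt
```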
# 
# Exporting CSV Files: My Approach
# Using Base R
# To export data, I typically use write.csv. It's straightforward, and I often set row.names = FALSE to leave out the row names and na = "" so that missing values are written as empty fields instead of the default NA.
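# A short sketch of those two settings, using a hypothetical data frame with missing values. The file is written to a temporary path and read back as raw lines so the output is visible.

```r
df <- data.frame(id = 1:2, item = c("bolt", NA), qty = c(12, NA))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE, na = "")  # no row-name column, NA -> empty field

lines <- readLines(path)
lines
```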
# 
# Using the readr Package
# For better performance, I prefer readr::write_csv. It’s faster and doesn’t include row names by default, which aligns with how I usually format my exported files.
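# For comparison, here is the same export with readr, again on hypothetical data. write_csv never writes row names, and it only quotes fields when needed. Its na argument defaults to "NA", so pass na = "" to match the base R example.

```r
library(readr)

df <- data.frame(id = 1:2, item = c("bolt", "nut"), qty = c(12, 30))

path <- tempfile(fileext = ".csv")
write_csv(df, path)

lines <- readLines(path)
lines
```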
# 
# Importing Multiple CSV Files: My Strategy
# When I have multiple CSV files to import, I use list.files to gather the file names and then apply read.csv to each file using lapply. Note that list.files takes a regular expression, not a glob, so I match with "[.]csv$" rather than "*.csv". This gives me a list of data frames, which I combine into one big data frame with do.call(rbind, data_list); rbind requires every file to have the same column names. I find this approach handy when a dataset is split across many files.
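# A self-contained sketch of this strategy. It creates two hypothetical CSV files with identical columns in a temporary folder. full.names = TRUE makes list.files return paths that read.csv can open from any working directory.

```r
# Hypothetical setup: two monthly files with the same columns
csv_dir <- tempfile("shipments_")
dir.create(csv_dir)
write.csv(data.frame(id = 1:2, qty = c(5L, 7L)),
          file.path(csv_dir, "jan.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, qty = c(2L, 9L)),
          file.path(csv_dir, "feb.csv"), row.names = FALSE)

# Regex pattern, full paths, one data frame per file, then stack them
files <- list.files(csv_dir, pattern = "[.]csv$", full.names = TRUE)
data_list <- lapply(files, read.csv, header = TRUE)
df_combined <- do.call(rbind, data_list)
df_combined
```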
# 
# Importing Fixed-Width Files: My Journey
# Using Base R
# Fixed-width files can be tricky, but I manage them with read.fwf. I specify the width of each column, and while it’s not as flexible as CSVs, it works well for structured data.
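# As a minimal sketch, assuming a hypothetical layout of a 4-character ID, an 8-character item name and a 3-character quantity, sprintf builds the padded lines and read.fwf splits them by width. strip.white = TRUE is passed on to read.table so the padding is removed from text fields.

```r
# Hypothetical fixed-width file built with sprintf padding
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%-4s%-8s%3d",
                   c("A001", "A002"), c("bolt", "washer"), c(12L, 7L)), path)

fwf <- read.fwf(path,
                widths = c(4, 8, 3),
                col.names = c("id", "item", "qty"),
                strip.white = TRUE,        # trim padding from text fields
                stringsAsFactors = FALSE)  # keep text as character
fwf
```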
# 
# Using the readr Package
# When I want more speed and flexibility, I switch to read_fwf from the readr package. I like how I can define column widths with fwf_cols or let the function guess them with fwf_empty. This makes dealing with fixed-width files much easier and faster.
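# Here is the same hypothetical file read with readr. fwf_cols takes named widths, and read_fwf trims whitespace by default (trim_ws = TRUE). fwf_empty is not shown, because this layout has no blank column between the ID and the item name for it to detect.

```r
library(readr)

path <- tempfile(fileext = ".txt")
writeLines(sprintf("%-4s%-8s%3d",
                   c("A001", "A002"), c("bolt", "washer"), c(12L, 7L)), path)

fw <- read_fwf(path,
               fwf_cols(id = 4, item = 8, qty = 3),  # named column widths
               col_types = cols(id = col_character(),
                                item = col_character(),
                                qty = col_integer()),
               progress = FALSE)
fw
```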

# Load necessary libraries
library(readr)
library(data.table)
library(knitr)
library(kableExtra)

# Importing .csv Files using Base R
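# read.csv, read_csv and fread parse plain-text files only; an .xlsx workbook must be saved as CSV first (or read with a spreadsheet package such as readxl)
#df_base <- read.csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")

# Importing .csv Files using readr
#df_readr <- read_csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")

# Section 20.2: Importing with data.table
#df_dt <- fread("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")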

# Exporting .csv Files
# Exporting using Base R
#write.csv(df_base, "Exported_Warehouse_Shipping_base.csv", row.names = FALSE)

# Exporting using readr
#write_csv(df_readr, "Exported_Warehouse_Shipping_readr.csv")

# Importing Multiple CSV Files
# Listing all CSV files and reading them into a list
files <- list.files(pattern = "[.]csv$")  # pattern is a regular expression, not a glob
data_list <- lapply(files, read.csv, header = TRUE)

# Combining the list into one data.frame (all files must share the same columns)
df_combined <- do.call(rbind, data_list)

# Importing Fixed-Width Files
# Importing a hypothetical fixed-width file 'Warehouse_Fixed.txt'
#df_fwf_base <- read.fwf('Warehouse_Fixed.txt', widths = c(8, 15, 10, 12), header = FALSE, skip = 1)

# Importing Fixed-Width Files using readr
#df_fwf_readr <- read_fwf('Warehouse_Fixed.txt', fwf_cols(Shipment_ID = 8, Item_Name = 15, Quantity = 10, Shipping_Cost = 12), skip = 1)

# Tabular 
code_steps <- data.frame(
  Section = c(
    "Section 20.1: Importing .csv Files using Base R",
    "Section 20.1: Importing .csv Files using readr",
    "Section 20.2: Importing with data.table",
    "Section 20.3: Exporting .csv Files using Base R",
    "Section 20.3: Exporting .csv Files using readr",
    "Section 20.4: Importing Multiple CSV Files",
    "Section 20.5: Importing Fixed-Width Files using Base R",
    "Section 20.5: Importing Fixed-Width Files using readr"
  ),
  Code = c(
    'df_base <- read.csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")',
    'df_readr <- read_csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")',
    'df_dt <- fread("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")',
    'write.csv(df_base, "Exported_Warehouse_Shipping_base.csv", row.names = FALSE)',
    'write_csv(df_readr, "Exported_Warehouse_Shipping_readr.csv")',
    'files <- list.files(pattern = "[.]csv$"); data_list <- lapply(files, read.csv, header = TRUE); df_combined <- do.call(rbind, data_list)',
    'df_fwf_base <- read.fwf("Warehouse_Fixed.txt", widths = c(8, 15, 10, 12), header = FALSE, skip = 1)',
    'df_fwf_readr <- read_fwf("Warehouse_Fixed.txt", fwf_cols(Shipment_ID = 8, Item_Name = 15, Quantity = 10, Shipping_Cost = 12), skip = 1)'
  )
)

# Create a colorful tabular table
code_steps %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(1:8, background = c("lightcyan", "lavender", "lightpink", "lightyellow", "lightgreen", "lightblue", "lightcoral", "lightgray")) %>%
  column_spec(1, bold = TRUE, background = "lightsteelblue") %>%
  column_spec(2, background = "lightgoldenrodyellow")

Section	Code
Section 20.1: Importing .csv Files using Base R	df_base <- read.csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")
Section 20.1: Importing .csv Files using readr	df_readr <- read_csv("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")
Section 20.2: Importing with data.table	df_dt <- fread("C:/Users/jacob/Downloads/Warehouse_Shipping.csv")
Section 20.3: Exporting .csv Files using Base R	write.csv(df_base, "Exported_Warehouse_Shipping_base.csv", row.names = FALSE)
Section 20.3: Exporting .csv Files using readr	write_csv(df_readr, "Exported_Warehouse_Shipping_readr.csv")
Section 20.4: Importing Multiple CSV Files	files <- list.files(pattern = "[.]csv$"); data_list <- lapply(files, read.csv, header = TRUE); df_combined <- do.call(rbind, data_list)
Section 20.5: Importing Fixed-Width Files using Base R	df_fwf_base <- read.fwf("Warehouse_Fixed.txt", widths = c(8, 15, 10, 12), header = FALSE, skip = 1)
Section 20.5: Importing Fixed-Width Files using readr	df_fwf_readr <- read_fwf("Warehouse_Fixed.txt", fwf_cols(Shipment_ID = 8, Item_Name = 15, Quantity = 10, Shipping_Cost = 12), skip = 1)

Avery Holloman

2024-11-04