File Manipulation in R

Author

Daniel Zhang

Base R functions to manipulate files and directories

Take a look at our working directory

This will be the default location against which relative paths are resolved.

getwd()
[1] "/Users/handingzhang/Desktop/r_notes/useful_functions"

dir.create()

Create a temporary directory at a path relative to the current working directory.
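The notes do not show the call itself, so here is a minimal sketch of how dir.create() might be used. The directory name demo_dir_ is hypothetical, and everything is created under a tempfile() path so that the real working directory is not touched. The recursive = TRUE argument creates any missing parent directories along the way.

```r
# Pick a fresh path inside the session's temporary directory.
# "demo_dir_" is a hypothetical prefix used only for this illustration.
base_dir <- tempfile("demo_dir_")

# recursive = TRUE creates both the parent (base_dir) and the child (temp2).
created <- dir.create(file.path(base_dir, "temp2"), recursive = TRUE)

created                                    # TRUE when the directory was created
dir.exists(file.path(base_dir, "temp2"))   # confirm it is there

# Clean up the demo directory.
unlink(base_dir, recursive = TRUE)
```

Without recursive = TRUE, the same call would fail with a warning, because the parent directory does not exist yet.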

dir.exists()

What if we forgot whether we already created the directory?

We can check with dir.exists() before attempting to create it.

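# recursive = TRUE also creates temp_relative_path if it does not exist yet
if (!dir.exists("temp_relative_path/temp2")) {
  dir.create("temp_relative_path/temp2", recursive = TRUE)
} else {
  print("Directory already exists!")
}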
[1] "Directory already exists!"
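As a small illustration of why the check is useful, the sketch below calls dir.create() twice on the same path. The path comes from tempfile() and the prefix exists_demo_ is hypothetical. The second call returns FALSE because the directory is already there; showWarnings = FALSE only silences the warning that would otherwise be raised.

```r
# A throwaway path under the session's temporary directory.
demo_dir <- tempfile("exists_demo_")

first  <- dir.create(demo_dir)                        # directory is new
second <- dir.create(demo_dir, showWarnings = FALSE)  # directory already exists

first    # TRUE
second   # FALSE

# Clean up.
unlink(demo_dir, recursive = TRUE)
```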

list.files()

Take a look at all files in the current working directory.

list.files()
[1] "base_file_manipulations_files"     "base_file_manipulations.html"     
[3] "base_file_manipulations.qmd"       "base_file_manipulations.rmarkdown"
[5] "base_split.R"                      "temp_relative_path"               

Take a look at the contents of the temporary folder we just created, which so far holds only the temp2 subdirectory.

list.files("temp_relative_path")
[1] "temp2"

file.path()

file.path() constructs a path from its components, so we do not have to paste them together with the proper separator ourselves (the separator defaults to “/”).

The call below is equivalent to list.files("temp_relative_path") above, because relative paths are resolved against the current working directory.

(path1 <- file.path(getwd(), "temp_relative_path"))
[1] "/Users/handingzhang/Desktop/r_notes/useful_functions/temp_relative_path"
list.files(path1)
[1] "temp2"
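A short sketch of two other things file.path() can do: join more than two components, and build several paths at once because it is vectorised. The file names notes.txt, a.csv and b.csv are hypothetical; no files are created here, only path strings.

```r
# Join several components into one path.
single <- file.path("temp_relative_path", "temp2", "notes.txt")
single
# "temp_relative_path/temp2/notes.txt"

# A vector of file names yields a vector of paths.
many <- file.path("temp_relative_path", c("a.csv", "b.csv"))
many
# "temp_relative_path/a.csv" "temp_relative_path/b.csv"
```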

dir()

Note that dir() is an alias for list.files(), so the two behave identically.

dir("temp_relative_path")
[1] "temp2"
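A quick way to convince ourselves is to compare the two listings directly. This is a minimal sketch in a throwaway directory; the prefix alias_demo_ and the file names are hypothetical.

```r
# Build a small demo directory with two files.
demo_dir <- tempfile("alias_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("a.txt", "b.txt")))

# Both functions should return exactly the same character vector.
same_listing <- identical(dir(demo_dir), list.files(demo_dir))
same_listing

# Clean up.
unlink(demo_dir, recursive = TRUE)
```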

file.create()

Now we create some files in the directory.

Note that file.create() truncates (empties) an existing file at the same path, so any previous content is lost.

file.create("temp_relative_path/cool.txt")
[1] TRUE
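To make the overwriting behaviour concrete, the sketch below writes some text to a temporary file, calls file.create() on the same path, and compares the file size before and after. The guard with file.exists() at the end is one possible way to avoid truncating a file by accident, not the only one.

```r
# A throwaway file with some content in it.
demo_file <- tempfile(fileext = ".txt")
writeLines("important data", demo_file)
size_before <- file.size(demo_file)   # greater than 0

# Calling file.create() on an existing path truncates the file.
file.create(demo_file)
size_after <- file.size(demo_file)    # 0: the content is gone

# A defensive pattern: only create the file when it is missing.
if (!file.exists(demo_file)) {
  file.create(demo_file)
}

# Clean up.
file.remove(demo_file)
```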

We can also create multiple files at the same time.

file.create(c("temp_relative_path/cool2.txt",
              "temp_relative_path/cool3.txt"))
[1] TRUE TRUE

If we have many files to create, it is a good idea to build the full paths programmatically by prepending the directory path to each file name.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# let's leverage purrr to create multiple file paths.

target_path <- "temp_relative_path/"

# names of files to create
file_names_to_create <- rep("nice", 10) %>% purrr::imap_chr(~paste0(.x, .y, ".csv"))

# paste the path and the names
(files_to_create <- paste0(target_path, file_names_to_create))
 [1] "temp_relative_path/nice1.csv"  "temp_relative_path/nice2.csv" 
 [3] "temp_relative_path/nice3.csv"  "temp_relative_path/nice4.csv" 
 [5] "temp_relative_path/nice5.csv"  "temp_relative_path/nice6.csv" 
 [7] "temp_relative_path/nice7.csv"  "temp_relative_path/nice8.csv" 
 [9] "temp_relative_path/nice9.csv"  "temp_relative_path/nice10.csv"
# another fun way to do so is by map_chr()
# files_to_create <- map_chr(file_names_to_create, function(x){paste0(target_path, x)})

Now we create them!

file.create(files_to_create)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
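The same ten paths can also be built without purrr. The following is a base R sketch, assuming the same directory name as above; sprintf() generates the numbered file names and file.path() prepends the directory. The variable names target_dir, file_names and files_to_create_base are introduced here only for illustration.

```r
# Directory without a trailing slash; file.path() adds the separator.
target_dir <- "temp_relative_path"

# sprintf() is vectorised over 1:10 and produces nice1.csv ... nice10.csv.
file_names <- sprintf("nice%d.csv", 1:10)

# Prepend the directory to every file name.
files_to_create_base <- file.path(target_dir, file_names)
files_to_create_base
```

Using file.path() instead of paste0() also avoids having to remember the trailing slash in the directory name.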

file.remove()

Cool, now let’s try deleting some files, starting with cool.txt.

file.remove("temp_relative_path/cool.txt")
[1] TRUE

We can also delete multiple files at the same time.

file.remove(c("temp_relative_path/cool2.txt", "temp_relative_path/nice10.csv"))
[1] TRUE TRUE

Wait, this is too slow. Can we simply delete all .csv files in this directory?

file.remove() + list.files()

Yes, we can do so by combining list.files() and file.remove().

Let’s say we want to delete all files in the temp folder whose names end in .html or .csv.

# full.names = TRUE returns paths that include the directory, so the files can
# be located even though they are not directly in the current working directory.
# pattern is a regular expression: escape the dot and anchor at the end of the name.
(files_to_delete <- list.files(path = "temp_relative_path/",
                               pattern = "\\.(html|csv)$",
                               full.names = TRUE))
[1] "temp_relative_path//nice1.csv" "temp_relative_path//nice2.csv"
[3] "temp_relative_path//nice3.csv" "temp_relative_path//nice4.csv"
[5] "temp_relative_path//nice5.csv" "temp_relative_path//nice6.csv"
[7] "temp_relative_path//nice7.csv" "temp_relative_path//nice8.csv"
[9] "temp_relative_path//nice9.csv"

Then deleting them is as simple as:

file.remove(files_to_delete)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
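Because the pattern argument of list.files() is a regular expression, an unescaped dot matches any character. The sketch below uses grepl() on a few hypothetical file names to compare a loose pattern such as ".html|.csv" with an escaped and anchored one.

```r
# Hypothetical file names; none of these files need to exist.
candidates <- c("nice1.csv", "report.html", "not_a_csv_file.txt", "xcsv_notes.md")

# Loose: "." matches any character, and the match may occur anywhere in the name.
loose <- grepl(".html|.csv", candidates)

# Strict: a literal dot, followed by html or csv, at the very end of the name.
strict <- grepl("\\.(html|csv)$", candidates)

data.frame(candidates, loose, strict)
```

With the loose pattern all four names match, including the .txt and .md files, whereas the strict pattern matches only the first two. This matters most when the result is passed straight to file.remove().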

They are gone.

Now let’s try to delete the temp folder we created to wrap up this practice :D.

file.remove("temp_relative_path/")
# Warning: cannot remove file 'temp_relative_path/', reason 'Directory not empty'
[1] FALSE

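Oh no! file.remove() cannot delete a directory that still has content (here, the temp2 subdirectory). To delete a directory together with everything inside it, use unlink() with recursive = TRUE.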
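A minimal sketch of removing a non-empty directory with unlink(), run against a throwaway directory under tempfile() rather than the real temp_relative_path. The prefix unlink_demo_ is hypothetical. unlink() returns 0 on success and 1 on failure.

```r
# Build a small directory tree: one subdirectory and one file.
demo_dir <- tempfile("unlink_demo_")
dir.create(file.path(demo_dir, "temp2"), recursive = TRUE)
file.create(file.path(demo_dir, "cool.txt"))

# file.remove() refuses because the directory is not empty.
removed_by_file_remove <- suppressWarnings(file.remove(demo_dir))
removed_by_file_remove   # FALSE

# unlink() with recursive = TRUE deletes the directory and its contents.
status <- unlink(demo_dir, recursive = TRUE)
status                   # 0 means success

dir.exists(demo_dir)     # FALSE
```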

Warning

Anything deleted by file.remove() or unlink() cannot be recovered, so be extremely careful when using them, especially the file.remove() + list.files() combination and unlink() with the argument recursive = TRUE.
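One way to reduce the risk is to preview what would be deleted before deleting anything. The helper below, remove_matching(), is hypothetical and not part of base R; it is a sketch that lists the matching files by default and only deletes them when dry_run = FALSE is passed explicitly.

```r
# Hypothetical helper: preview by default, delete only on request.
remove_matching <- function(dir, pattern, dry_run = TRUE) {
  targets <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (dry_run) {
    message("Would remove ", length(targets),
            " file(s); rerun with dry_run = FALSE to delete.")
    return(invisible(targets))
  }
  removed <- file.remove(targets)
  invisible(targets[removed])   # paths that were actually deleted
}

# Demo in a throwaway directory.
demo_dir <- tempfile("dry_run_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("a.csv", "b.csv", "keep.txt")))

preview <- remove_matching(demo_dir, "\\.csv$")      # nothing is deleted yet
remaining_after_preview <- list.files(demo_dir)

deleted <- remove_matching(demo_dir, "\\.csv$", dry_run = FALSE)
remaining_after_delete <- list.files(demo_dir)

# Clean up.
unlink(demo_dir, recursive = TRUE)
```

Inspecting preview before the second call gives a chance to catch a pattern that matches more than intended.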