Introduction

Many online .csv files are zipped. Getting that data into R and manipulating it can be troublesome, especially for reproducible research, where every step of the process needs to be documented.

I used to have a lot of problems downloading and unzipping files from various online sources. Then I read this answer on Stack Overflow, and it has been easy ever since.

I wanted to make it more accessible to new R users, so they can try it out and give feedback on how they find it.

Types of zipped files

Zipping a file means compressing it to a smaller size for storage or transfer. There are many compression formats, but the most common ones are:

  - .z
  - .gz
  - .bz2
  - .zip

The first three each hold a single compressed file, so R can open them directly without a separate unzipping step. A .zip file is an archive that can hold several files, so the file you want has to be extracted from it first.
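To see the direct-read behaviour without needing any online file, here is a minimal sketch that writes a small data frame to a .gz and a .bz2 file and reads both back. The data frame and the variable names (df, gz_path, bz_path) are made up for illustration and are not from the original answer.

```r
# Small example data frame to write out and read back (illustrative data).
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Write the same CSV through a gzip and a bzip2 connection.
gz_path <- tempfile(fileext = ".csv.gz")
bz_path <- tempfile(fileext = ".csv.bz2")
write.csv(df, gzfile(gz_path), row.names = FALSE)
write.csv(df, bzfile(bz_path), row.names = FALSE)

# read.csv() opens the compressed files directly; no unzip step is needed.
from_gz <- read.csv(gz_path)
from_bz <- read.csv(bz_path)

# Clean up the temporary files.
unlink(c(gz_path, bz_path))
```

Both from_gz and from_bz should contain the same three rows as df.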

Downloading and unzipping a file

Downloading and unzipping a file takes four steps:

  1. Create a temporary file using tempfile().
  2. Fetch the online file into it using download.file().
  3. Extract the file you need from the temporary file using unz().
  4. Remove the temporary file using unlink().

The code chunk below downloads a zipped file from an online source and reads it into R.

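    # Step 1: create a temporary file to hold the download
    temp <- tempfile()
    # Step 2: fetch the archive (mode = "wb" keeps the binary file intact on Windows)
    download.file("https://perso.telecom-paristech.fr/eagan/class/igr204/data/BabyData.zip", temp, mode = "wb")
    # Step 3: read one file from the archive; "a1.dat" must match a name listed by unzip(temp, list = TRUE)
    carsData <- read.table(unz(temp, "a1.dat"))
    # Step 4: remove the temporary file
    unlink(temp)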
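If you do this often, the four steps can be wrapped in a small function. The sketch below is one possible way to do it, not part of the original answer: the function name read_zipped_table and its member argument are hypothetical, and it assumes the file inside the archive is something read.table() can parse.

```r
# Hypothetical helper (not from the original answer): download a zip archive
# and read one table out of it. If `member` is NULL, the first file listed
# in the archive is used.
read_zipped_table <- function(url, member = NULL, ...) {
  temp <- tempfile(fileext = ".zip")            # step 1: temporary file
  on.exit(unlink(temp), add = TRUE)             # step 4: always remove it on exit
  download.file(url, temp, mode = "wb", quiet = TRUE)  # step 2: fetch the archive
  if (is.null(member)) {
    # List the archive contents and pick the first entry.
    member <- unzip(temp, list = TRUE)$Name[1]
  }
  read.table(unz(temp, member), ...)            # step 3: read from inside the archive
}
```

Registering the cleanup with on.exit() means the temporary file is removed even when the download or the read fails. Extra arguments such as header = TRUE or sep = "," are passed straight through to read.table().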

Files ending in .z, .gz or .bz2 do not need the unz() step. Simply read them directly into R.
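For a compressed file that lives online, the workflow is the same as for a zip archive minus the extraction step. Here is a minimal sketch for the .gz and .bz2 cases, assuming the file is a CSV; the function name read_compressed_csv is hypothetical, and the URL in the usage comment is a placeholder rather than a real data source.

```r
# Hypothetical helper: download a single compressed CSV (.gz or .bz2)
# and read it without an explicit decompression step.
read_compressed_csv <- function(url, ...) {
  temp <- tempfile()
  on.exit(unlink(temp), add = TRUE)   # remove the temp file when done
  download.file(url, temp, mode = "wb", quiet = TRUE)
  # R detects the compression when it opens the file, so no unz() is needed.
  read.csv(temp, ...)
}

# Usage (placeholder URL, replace with a real one):
# dat <- read_compressed_csv("https://example.com/data.csv.gz")
```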

References

Many thanks to Dirk Eddelbuettel for his informative answer to this question on Stack Overflow.