Many online .csv files are zipped, and many people find it troublesome to get that data into R and manipulate it, especially when doing reproducible research where every process needs to be documented.
I previously had a lot of problems downloading and unzipping files from various online sources. But once I read this answer on Stack Overflow, life became much easier.
I wanted to make it more accessible to new R users so they can try it out and give feedback on how they find it.
Zipping a file compresses it to a smaller size for storage and transfer. There are many compression formats, but the most common ones are:

- .z
- .gz
- .bz2
- .zip

The first three are just the compressed file itself, meaning R can read them directly without a separate unzipping step. A .zip file, by contrast, is an archive that can hold several files, so you have to name the file you want to pull out of it. The rest of this post shows how each case is handled.
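As a minimal sketch of the direct-read case, the chunk below writes a small made-up data frame to a .gz and a .bz2 file and reads both back with read.csv(), with no unzipping step in between. The data and file names are illustrative assumptions, not taken from a real source.

```r
# Illustrative data (made up for this sketch)
df <- data.frame(id = 1:3, name = c("Ann", "Ben", "Cat"),
                 stringsAsFactors = FALSE)

gz_path  <- tempfile(fileext = ".csv.gz")
bz2_path <- tempfile(fileext = ".csv.bz2")

# gzfile() and bzfile() create compressed connections; write.csv() opens
# the connection, writes through it, and closes it again.
write.csv(df, gzfile(gz_path),  row.names = FALSE)
write.csv(df, bzfile(bz2_path), row.names = FALSE)

# Read the compressed files directly: R detects the compression on read.
from_gz  <- read.csv(gz_path,  stringsAsFactors = FALSE)
from_bz2 <- read.csv(bz2_path, stringsAsFactors = FALSE)

# Clean up the temporary files
unlink(c(gz_path, bz2_path))
```

Both from_gz and from_bz2 should hold the same rows as df, which is the point: for these formats the compressed file can be treated much like the plain file.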
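To download and unzip a .zip file, these four steps are needed:

1. Create a temporary file name with tempfile().
2. Download the archive into it with download.file().
3. Read the target file out of the archive with unz().
4. Remove the temporary file with unlink().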
The code chunk below downloads a zipped file from an online source and reads it into R.
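temp <- tempfile()  # temporary file to hold the download
# fetch the archive; mode = "wb" keeps the binary file intact on Windows
download.file("https://perso.telecom-paristech.fr/eagan/class/igr204/data/BabyData.zip", temp, mode = "wb")
# read one file from inside the archive; "a1.dat" must match a name listed by unzip(temp, list = TRUE)
carsData <- read.table(unz(temp, "a1.dat"))
unlink(temp)  # remove the temporary file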
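The four steps can also be wrapped in a small function. The sketch below is one possible way to do it: read_zipped_table and its member argument are hypothetical names, not part of R or of the original answer. It lists the archive contents with unzip(list = TRUE) first, so a wrong file name fails with a readable message instead of a connection error.

```r
# Hypothetical helper (an assumption for illustration, not a base R function):
# download a .zip archive and read one file from it with read.table().
read_zipped_table <- function(url, member = NULL, ...) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)  # remove the temp file even on error

  # mode = "wb" downloads in binary mode, which matters on Windows
  download.file(url, temp, mode = "wb", quiet = TRUE)

  # List the files inside the archive without extracting them
  contents <- unzip(temp, list = TRUE)$Name
  if (is.null(member)) member <- contents[1]  # default: first file listed
  if (!member %in% contents) {
    stop("'", member, "' not found; archive contains: ",
         paste(contents, collapse = ", "))
  }

  # unz() opens a connection to a single file inside the archive
  read.table(unz(temp, member), ...)
}

# Usage (needs a network connection, so left commented out here):
# babyData <- read_zipped_table(
#   "https://perso.telecom-paristech.fr/eagan/class/igr204/data/BabyData.zip")
```

Using on.exit() for the cleanup means the temporary file is removed even when the download or the read fails part-way through.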
When reading .z, .gz, or .bz2 files, you don’t need to unzip them. Simply read them directly into R.
Full credit goes to Dirk Eddelbuettel for his informative answer to this question on Stack Overflow.