This section is adapted from Karlijn Willems's 2018 DataCamp tutorial on loading data, titled “This R Data Import Tutorial Is Everything You Need”.
Loading data into R is often frustrating, and it is easy to mix things up whether you are a beginner or a more advanced R user. In this class, we will mostly use .csv files. In your other data science classes, you will use a wider range of file types, and we will talk about some of them here.
First, we must have data in some type of file. The data can be saved on your computer as an Excel file, an SPSS file, or another file type. Saving your data locally allows you to edit the data (or labels) and add more data, all while preserving the formulas that you used in your analysis in R.
Data can also be found on the Internet or can be obtained through other sources such as a client.
We note a few tricks for successful spreadsheets:
We use RStudio for projects in this course. Make sure you know where your working directory is currently set by using the getwd() command. You can change the directory using setwd("location of your dataset"). You can also change the working directory from the Session menu at the top of the RStudio window.
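A minimal sketch of checking and changing the working directory from code. The temporary folder used here only stands in for wherever your data set lives; it is an assumption for illustration.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# tempdir() stands in for the folder that holds your data (illustrative only).
data_dir <- tempdir()
setwd(data_dir)

# Confirm the change; normalizePath() resolves symlinks so the comparison is fair.
in_data_dir <- normalizePath(getwd()) == normalizePath(data_dir)
print(in_data_dir)

# Restore the original working directory.
setwd(old_wd)
```

Restoring the old directory at the end keeps the rest of your script from silently reading files from the wrong place.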
If you have a .txt or a tab-delimited text file, you can easily import it with the basic R function read.table(). For example,
df <- read.table("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.txt",
header = FALSE)
Ideally, you pass in just the file name and extension, because you have set your working directory to the folder in which your data set is located. In the code chunk above, however, the first argument is a URL that points to a data file. The header argument specifies whether the first line of your data file contains column names. The data from the file becomes a data.frame object.
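To see what the header argument changes, here is a small sketch that writes a made-up tab-delimited file to a temporary location and reads it both ways. The file contents and variable names are invented for illustration.

```r
# Write a small tab-delimited file to a temporary location (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "ann\t90", "bob\t85"), path)

# header = TRUE: the first line supplies the column names.
with_header <- read.table(path, header = TRUE, sep = "\t")

# header = FALSE: the first line is treated as data and the columns
# receive default names (V1, V2, ...).
without_header <- read.table(path, header = FALSE, sep = "\t")

print(names(with_header))
print(names(without_header))
print(class(with_header))
```

Reading a file that has column names with header = FALSE gives one extra row and forces every column to hold text, so it is worth checking the first few rows after import.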
For files that are not delimited by tabs, like .csv, you use variants of this basic function. These variants differ from read.table() mainly in three ways: the default separator character, the header argument (TRUE by default), and the fill argument (TRUE by default).
This will be the most common file type for you to use in this course. If your values are separated by a , or a ;, you are usually working with a .csv file. Of course, if your data is in Excel, you can save your file as a .csv file by specifying the file type as .csv.
To successfully load this file into R, you can use the read.table() function in which you specify the separator character, or you can use the read.csv() or read.csv2() functions. The former is used if the separator is a comma (,), the latter if a semicolon (;) separates the values in your data file; read.csv2() also expects a comma as the decimal mark.
Remember that read.csv() and read.csv2() are almost identical to read.table(): apart from their default separator, the main difference is that they have the header and fill arguments set to TRUE by default.
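# read.table() needs the separator spelled out for comma-separated files
df <- read.table("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.csv",
                 header = FALSE,
                 sep = ",")
# read.csv() already assumes sep = ","
df <- read.csv("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.csv",
               header = FALSE)
# read.csv2() assumes sep = ";" and "," as the decimal mark; use it for semicolon-separated files
df <- read.csv2("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.csv",
                header = FALSE)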
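The relationship between the three functions can be sketched with two small made-up files, one comma-separated and one semicolon-separated with decimal commas. The file contents are invented for illustration.

```r
# A comma-separated file with "." as the decimal mark (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,3.5"), csv_path)

# A semicolon-separated file with "," as the decimal mark (made-up data).
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;2,5", "2;3,5"), csv2_path)

# read.table() needs header and sep spelled out; read.csv() has them as defaults.
from_table <- read.table(csv_path, header = TRUE, sep = ",")
from_csv   <- read.csv(csv_path)

# read.csv2() splits on ";" and converts "2,5" to the number 2.5.
from_csv2  <- read.csv2(csv2_path)

print(all.equal(from_table, from_csv))
print(all.equal(from_csv, from_csv2))
```

All three calls should produce the same data frame here, which is a quick way to confirm that you picked the function matching your file's separator and decimal mark.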
Note that if you get a warning message such as “incomplete final line found by readTableHeader”, you can select the cell that contains the last value (c in this case) and press ENTER. This normally fixes the warning, because the message indicates that the last line of the file doesn’t end with an End Of Line (EOL) character, which can be a linefeed or a carriage return followed by a linefeed.
Use a text editor like Notepad to make sure that you add an EOL character without adding new rows or columns to your data.
Also note that if cells outside your data range have been initialized, you’ll see some rows or columns with NA values appear. The best fix is then to remove those rows and columns.
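The same fix can be made from R instead of a text editor. The sketch below creates a made-up file whose last line lacks an EOL character, checks for the missing linefeed, and appends one. The helper name ends_with_newline is hypothetical, not part of base R.

```r
# Create a file whose last line has no end-of-line character (made-up data).
path <- tempfile(fileext = ".csv")
cat("x,y\n1,2\n3,4", file = path)

# Helper (hypothetical name): TRUE if the last byte of the file is a linefeed.
ends_with_newline <- function(p) {
  bytes <- readBin(p, what = "raw", n = file.size(p))
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(0x0A)
}

had_newline <- ends_with_newline(path)

# Append a single line ending only if it is missing, so no empty row is added.
if (!had_newline) {
  cat("\n", file = path, append = TRUE)
}

df <- read.csv(path)
print(df)
```

Checking before appending matters: adding a second line ending to a file that already has one can introduce a blank line at the end.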
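One way to drop such empty rows and columns after import is sketched below on a made-up data frame. It assumes that a row or column counts as empty only when every value in it is NA.

```r
# Made-up data frame with one empty row and one empty column,
# as can happen when extra spreadsheet cells were initialized.
df <- data.frame(
  a = c(1, 2, NA),
  b = c("x", "y", NA),
  c = c(NA, NA, NA)
)

# Keep rows and columns that contain at least one non-missing value.
keep_rows <- rowSums(!is.na(df)) > 0
keep_cols <- colSums(!is.na(df)) > 0
cleaned <- df[keep_rows, keep_cols, drop = FALSE]

print(cleaned)
```

Rows that are only partly missing are kept, so genuine missing values in your data are not thrown away.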
In case you have a file with a separator character that is different from a tab, a comma or a semicolon, you can always use the read.delim() and read.delim2() functions. These are variants of the read.table() function, just like the read.csv() function.
df <- read.delim("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test_delim.txt", sep="$")
df <- read.delim2("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test_delim.txt", sep="$")
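The practical difference between the two functions is the decimal mark they assume. The following sketch uses a made-up file with $ as the separator and decimal commas in one column.

```r
# A "$"-separated file with "," as the decimal mark (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("item$price", "pen$1,50", "book$12,00"), path)

# read.delim() assumes "." as the decimal mark, so price is not parsed as a number.
with_delim <- read.delim(path, sep = "$")

# read.delim2() assumes "," as the decimal mark, so price becomes numeric.
with_delim2 <- read.delim2(path, sep = "$")

print(class(with_delim$price))
print(class(with_delim2$price))
```

If a column you expect to be numeric does not come in as numeric, the decimal mark is one of the first things to check.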
The first way to get Excel files directly into R is by using the XLConnect package. Install the package first; if you’re not sure whether you already have it, check whether it is installed.
Next, you can use the readWorksheetFromFile() function, as shown below:
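A minimal sketch of checking whether a package is installed before installing it. The helper name is_installed is hypothetical, and the install call is left commented out so that nothing is installed by accident.

```r
# Helper (hypothetical name): TRUE if the package can be loaded,
# without attaching it to the search path.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check should succeed on any installation.
print(is_installed("stats"))

# Typical pattern for a contributed package:
# if (!is_installed("XLConnect")) install.packages("XLConnect")
```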
library(XLConnect)
df <- readWorksheetFromFile("<file name and extension>",
sheet = 1)
Note that you need to add the sheet argument to specify which sheet you want to load into R. You can also add more specifications, which are explained in DataCamp's tutorial on reading and importing Excel files into R.
You can also load a whole workbook with the loadWorkbook() function, and then read the worksheets you want as data frames in R through readWorksheet():
wb <- loadWorkbook("<name and extension of your file>")
df <- readWorksheet(wb, sheet=1)
Note again that the sheet argument is not the only argument that you can use in readWorksheetFromFile(). If you want more information about the package or about all the arguments that you can pass to readWorksheetFromFile() or to the two alternative functions mentioned above, you can visit the package’s RDocumentation page.
The readxl package allows R users to easily read in Excel files, just like this:
library(readxl)
df <- read_excel("<name and extension of your file>")
Note that the first argument specifies the path to your .xls or .xlsx file. A bare file name is looked up in your working directory, which you can check and set with the getwd() and setwd() functions. You can also add a sheet argument, just like with the XLConnect package, and many more arguments that are described in the package documentation.
To get JSON files into R, you first need to install and load the rjson package. Once this is done, you can use the fromJSON() function. Here, you have two options:
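As a sketch of the sheet argument and one of the additional arguments, the example below reads from a sample workbook that ships with readxl. It assumes readxl is installed and uses the bundled file only as a stand-in for your own.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl; a stand-in for your own file.
  path <- readxl::readxl_example("datasets.xlsx")

  # List the worksheet names, then read one sheet by position.
  sheets <- readxl::excel_sheets(path)
  df <- readxl::read_excel(path, sheet = 1)

  # Read only the first three data rows of the same sheet.
  df_head <- readxl::read_excel(path, sheet = 1, n_max = 3)

  print(sheets)
  print(dim(df_head))
}
```

The sheet argument also accepts a sheet name as a string, which is often easier to read than a position.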
Your JSON file is stored in your working directory:
library(rjson)
JsonData <- fromJSON(file= "<filename.json>" )
Your JSON file is available through a URL:
library(rjson)
JsonData <- fromJSON(file= "<URL to your JSON file>" )
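To show what fromJSON() returns, here is a small sketch that writes a made-up JSON document to a temporary file and reads it back. It assumes rjson is installed; the file contents are invented for illustration.

```r
if (requireNamespace("rjson", quietly = TRUE)) {
  # Write a small JSON document to a temporary file (made-up content).
  path <- tempfile(fileext = ".json")
  writeLines('{"name": "ann", "scores": [90, 85]}', path)

  # fromJSON() returns nested R lists and vectors, not a data frame.
  json_data <- rjson::fromJSON(file = path)

  print(json_data$name)
  print(json_data$scores)
}
```

Because the result is a list, you usually need an extra step to reshape it into a data frame before analysis.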
This is only a sample of what you can import. Other, less common file types can be imported as well; you simply need to look up how to do so. However, loading your data into R is just the beginning of data analysis, manipulation, and visualization.
Willems, Karlijn. November 20, 2018. This R Data Import Tutorial Is Everything You Need. DataCamp.