Excel is a spreadsheet application that has become hugely popular for data entry and analysis in many fields. You have likely used it, or some analagous spreadsheet program, such as OpenOffice Calc, Numbers, or Google Docs Spreadsheets. All of these spreadsheet applications mimic paper accounting systems: they contain a grid of ‘cells’ arranged in a tabular structure and they can comprise multiple ‘sheets’, e.g. pages. Cells may contain individual pieces of data (e.g. numbers, strings, etc.), or they may contain formulas (e.g. a formula to calculate the average of some other set of cells).

Despite their popularity, spreadsheet-style formats have important limitations: they can’t handle larger data sets, they are difficult to combine with automated or parallelized operations, they are not well-suited for complex statistics, and they tend to encourage cut-and-paste work, which can make debugging extremely difficult.

For these, and other, reasons, some would argue that we should abandon spreadsheets altogether. But sometimes we just want to be able to transfer data from Excel to R, or vice versa.

Objectives

Exporting .csv files from Excel

The easiest way to move data from Excel to R is to save the contents of your data file as a .csv file. CSV stands for comma separated values, meaning that each cell in your spreadsheet will be delimited by commas. Have a look at the Software Carpentry R lesson ‘Getting Started With Data’ under the section ‘Loading Data’ for more details on how to read .csv files into R.

To export an Excel worksheet as a .csv file:

  1. Open your Excel data file and use the ‘Save As’ feature, which you can find in current versions of Excel by clicking on the Microsoft Office button.
  2. In the ‘Save as type’ box, choose CSV for comma delimited (alternatively, you could select Tab to achieve tab delimited format).
  3. Specify the file path location where you would like your .csv file to be stored.
  4. Click ‘Save’.
  5. A dialogue box will appear reminding you that you will only be saving the contents of the current worksheet, i.e. the ‘page’ you are currently working on. Since .csv files don’t support the structure of multiple ‘pages’, you will need to save these separately (or combine them into a single worksheet before saving). Click ‘OK’ if you are happy with the current worksheet.
  6. A second dialogue box will appear to remind you that a .csv version of your spreadsheet will not retain certain formatting elements contained in the Excel version of the file. For example, formulas, formatting, and graphics will be lost. If this is ok, click ‘Yes’.

Congratulations! You now have a .csv version of your data saved that can be imported into R!

Importing data directly from a .xls or .xlsx file

There are a few packages that can help you directly read in data from Excel files, rather than exporting a .csv file. We will look at how to use one, XLConnect; the packages gdata and xlsx are other good alternatives that you may wish to explore on your own.

To read an Excel worksheet directly into R:

  1. Open R and install the XLConnect package using install.packages() and then load it using library().
install.packages("XLConnect")
library(XLConnect)
  1. Use the function loadWorkbook() to read in the .xlsx (or .xls) file of your choosing. We will use a file that contains data on known exoplanets for this demonstration. Note that we have created a ‘workbook’ object, which can contain multiple pages.
wb <- loadWorkbook(filename = '~/Documents/swc/exoplanets.xlsx')
class(wb)
## [1] "workbook"
## attr(,"package")
## [1] "XLConnect"
  1. Load data using readWorksheet() and specifying the workbook object name and the desired sheet (specifying the sheet name, e.g. “Sheet 1” would work equally well). Note that this function produces a dataframe object.
data <- readWorksheet(wb, sheet = 1)
class(data)
## [1] "data.frame"
  1. Take a look at our dataframe using head().
head(data)
##   Planet.Name Pl..Mass Pl..Radius Pl..Period
## 1    55 Cnc e    0.027      0.194     0.7365
## 2   CoRoT-1 b    1.030      1.490     1.5090
## 3  CoRoT-10 b    2.750      0.970    13.2406
## 4  CoRoT-11 b    2.330      1.430     2.9943
## 5  CoRoT-12 b    0.917      1.440     2.8280
## 6  CoRoT-13 b    1.308      0.885     4.0352

An alternative function, readWorksheetFromFile(), combines both operations and produces a dataframe, with the same content as above.

sheet <- readWorksheetFromFile(file = '~/Documents/swc/exoplanets.xlsx', sheet = 1)
class(sheet)
## [1] "data.frame"
head(sheet)
##   Planet.Name Pl..Mass Pl..Radius Pl..Period
## 1    55 Cnc e    0.027      0.194     0.7365
## 2   CoRoT-1 b    1.030      1.490     1.5090
## 3  CoRoT-10 b    2.750      0.970    13.2406
## 4  CoRoT-11 b    2.330      1.430     2.9943
## 5  CoRoT-12 b    0.917      1.440     2.8280
## 6  CoRoT-13 b    1.308      0.885     4.0352

Congratulations! You have just read data into R directly from an Excel spreadsheet file!