Excel is a spreadsheet application that has become hugely popular for data entry and analysis in many fields. You have likely used it, or some analagous spreadsheet program, such as OpenOffice Calc, Numbers, or Google Docs Spreadsheets. All of these spreadsheet applications mimic paper accounting systems: they contain a grid of ‘cells’ arranged in a tabular structure and they can comprise multiple ‘sheets’, e.g. pages. Cells may contain individual pieces of data (e.g. numbers, strings, etc.), or they may contain formulas (e.g. a formula to calculate the average of some other set of cells).
Despite their popularity, spreadsheet-style formats have important limitations: they can’t handle larger data sets, they are difficult to combine with automated or parallelized operations, they are not well-suited for complex statistics, and they tend to encourage cut-and-paste work, which can make debugging extremely difficult.
For these, and other, reasons, some would argue that we should abandon spreadsheets altogether. But sometimes we just want to be able to transfer data from Excel to R, or vice versa.
The easiest way to move data from Excel to R is to save the contents of your data file as a .csv file. CSV stands for comma separated values, meaning that each cell in your spreadsheet will be delimited by commas. Have a look at the Software Carpentry R lesson ‘Getting Started With Data’ under the section ‘Loading Data’ for more details on how to read .csv files into R.
To export an Excel worksheet as a .csv file:
Congratulations! You now have a .csv version of your data saved that can be imported into R!
There are a few packages that can help you directly read in data from Excel files, rather than exporting a .csv file. We will look at how to use one, XLConnect; the packages gdata and xlsx are other good alternatives that you may wish to explore on your own.
To read an Excel worksheet directly into R:
install.packages() and then load it using library().install.packages("XLConnect")
library(XLConnect)
loadWorkbook() to read in the .xlsx (or .xls) file of your choosing. We will use a file that contains data on known exoplanets for this demonstration. Note that we have created a ‘workbook’ object, which can contain multiple pages.wb <- loadWorkbook(filename = '~/Documents/swc/exoplanets.xlsx')
class(wb)
## [1] "workbook"
## attr(,"package")
## [1] "XLConnect"
readWorksheet() and specifying the workbook object name and the desired sheet (specifying the sheet name, e.g. “Sheet 1” would work equally well). Note that this function produces a dataframe object.data <- readWorksheet(wb, sheet = 1)
class(data)
## [1] "data.frame"
head().head(data)
## Planet.Name Pl..Mass Pl..Radius Pl..Period
## 1 55 Cnc e 0.027 0.194 0.7365
## 2 CoRoT-1 b 1.030 1.490 1.5090
## 3 CoRoT-10 b 2.750 0.970 13.2406
## 4 CoRoT-11 b 2.330 1.430 2.9943
## 5 CoRoT-12 b 0.917 1.440 2.8280
## 6 CoRoT-13 b 1.308 0.885 4.0352
An alternative function, readWorksheetFromFile(), combines both operations and produces a dataframe, with the same content as above.
sheet <- readWorksheetFromFile(file = '~/Documents/swc/exoplanets.xlsx', sheet = 1)
class(sheet)
## [1] "data.frame"
head(sheet)
## Planet.Name Pl..Mass Pl..Radius Pl..Period
## 1 55 Cnc e 0.027 0.194 0.7365
## 2 CoRoT-1 b 1.030 1.490 1.5090
## 3 CoRoT-10 b 2.750 0.970 13.2406
## 4 CoRoT-11 b 2.330 1.430 2.9943
## 5 CoRoT-12 b 0.917 1.440 2.8280
## 6 CoRoT-13 b 1.308 0.885 4.0352
Congratulations! You have just read data into R directly from an Excel spreadsheet file!