library(XLConnect)
## XLConnect 1.0.1 by Mirai Solutions GmbH [aut],
## Martin Studer [cre],
## The Apache Software Foundation [ctb, cph] (Apache POI),
## Graph Builder [ctb, cph] (Curvesapi Java library)
## http://www.mirai-solutions.com
## https://github.com/miraisolutions/xlconnect
#Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")
# Print out the class of my_book
class(my_book)
## [1] "workbook"
## attr(,"package")
## [1] "XLConnect"
# Build connection to urbanpop.xlsx
mybook <- loadWorkbook("urbanpop.xlsx")
# List the sheets in my_book
getSheets(my_book)
## [1] "1960-1966" "1967-1974" "1975-2011"
# Import the second sheet in my_book
head(readWorksheet(my_book, sheet=2),n=10)
To get a clear overview about urbanpop.xlsx without having to open up the Excel file, you can execute the following code:
my_book <- loadWorkbook(“urbanpop.xlsx”) sheets <- getSheets(my_book) all <- lapply(sheets, readWorksheet, object = my_book) str(all)
Suppose we’re only interested in urban population data of the years 1968, 1969 and 1970. The data for these years is in the columns 3, 4, and 5 of the second sheet. Only selecting these columns will leave us in the dark about the actual countries the figures belong to.
# Import columns 3, 4, and 5 from second sheet in my_book: urbanpop_sel
urbanpop_sel <- readWorksheet(my_book, sheet = 2, startCol = 3, endCol = 5)
# urbanpop_sel no longer contains information about the countries now. Can you write another
# only the first column
# Import first column from second sheet in my_book: countries
countries <- readWorksheet(my_book, sheet = 2, startCol = 1, endCol = 1)
# Use cbind() to paste together countries and urbanpop_sel, in this order. Store the result as selection.
selection <- cbind(countries, urbanpop_sel)
urbanpop_sel
# result of binding 2 columns: countries&urbanpop_sel
selection
Where readxl and gdata were only able to import Excel data, XLConnect’s approach of providing an actual interface to an Excel file makes it able to edit your Excel files from inside R. In this exercise, you’ll create a new sheet. In the next exercise, you’ll populate the sheet with data, and save the results in a new Excel file.
You’ll continue to work with urbanpop.xlsx. The my_book object that links to this Excel file is already available.
# Add a worksheet to my_book, named "data_summary"
data_summary <- createSheet(my_book, "data_summary")
# Use getSheets() on my_book
getSheets(my_book)
## [1] "1960-1966" "1967-1974" "1975-2011" "data_summary"
The first step of creating a sheet is done; let’s populate it with some data now! summ, a data frame with some summary statistics on the two Excel sheets is already coded so you can take it from there.
# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")
# Create data frame: summ
sheets <- getSheets(my_book)[1:3]
dims <- sapply(sheets, function(x) dim(readWorksheet(my_book, sheet = x)), USE.NAMES = FALSE)
summ <- data.frame(sheets = sheets,
nrows = dims[1, ],
ncols = dims[2, ])
# Add data in summ to "data_summary" sheet
writeWorksheet(my_book, summ, "data_summary")
# Save workbook as summary.xlsx
saveWorkbook(my_book, "summary.xlsx")
dir()
## [1] "chapter1.pdf" "chapter2.pdf"
## [3] "chapter3.pdf" "clean.xlsx"
## [5] "readr_import.R" "readxl_import.R"
## [7] "rename.xlsx" "renamed.xlsx"
## [9] "summary.xlsx" "urbanpop.xlsx"
## [11] "XLConnect_notebook.html" "XLConnect_notebook.nb.html"
## [13] "XLConnect_notebook.Rmd"
Come to think of it, “data_summary” is not an ideal name. As the summary of these excel sheets is always data-related, you simply want to name the sheet “summary”.
The code to build a connection to “urbanpop.xlsx” and create my_book is already provided for you. It refers to an Excel file with 4 sheets: the three data sheets, and the “data_summary” sheet.
# Rename "data_summary" sheet to "summary"
renameSheet(my_book, sheet = "data_summary", newName = "summary")
# Print out sheets of my_book
getSheets(my_book)
## [1] "1960-1966" "1967-1974" "1975-2011" "summary"
# Save workbook to "renamed.xlsx"
saveWorkbook(my_book, "renamed.xlsx")
dir()
## [1] "chapter1.pdf" "chapter2.pdf"
## [3] "chapter3.pdf" "clean.xlsx"
## [5] "readr_import.R" "readxl_import.R"
## [7] "rename.xlsx" "renamed.xlsx"
## [9] "summary.xlsx" "urbanpop.xlsx"
## [11] "XLConnect_notebook.html" "XLConnect_notebook.nb.html"
## [13] "XLConnect_notebook.Rmd"
After presenting the new Excel sheet to your peers, it appears not everybody is a big fan. Why summarize sheets and store the info in Excel if all the information is implicitly available? To hell with it, just remove the entire fourth sheet!
# Build connection to renamed.xlsx: my_book
my_book <- loadWorkbook("renamed.xlsx")
#Use removeSheet() to remove the fourth sheet from my_book. The sheet name is "summary".
#Recall that removeSheet() accepts either the index or the name of the sheet as the second argument.
# Remove the fourth sheet
removeSheet(my_book, sheet = "summary")
# Save workbook to "clean.xlsx"
saveWorkbook(my_book, "clean.xlsx")
dir()
## [1] "chapter1.pdf" "chapter2.pdf"
## [3] "chapter3.pdf" "clean.xlsx"
## [5] "readr_import.R" "readxl_import.R"
## [7] "rename.xlsx" "renamed.xlsx"
## [9] "summary.xlsx" "urbanpop.xlsx"
## [11] "XLConnect_notebook.html" "XLConnect_notebook.nb.html"
## [13] "XLConnect_notebook.Rmd"