List the sheets of an Excel file
Before you can start importing from Excel, you should find out which sheets are available in the workbook. You can use the excel_sheets() function for this.
Import an Excel sheet
Now that you know the names of the sheets in the Excel file you want to import, it is time to import those sheets into R. You can do this with the read_excel() function. Have a look at this recipe:
- data <- read_excel(“data.xlsx”, sheet = “my_sheet”)
This call simply imports the sheet with the name “my_sheet” from the “data.xlsx” file. You can also pass a number to the sheet argument; this will cause read_excel() to import the sheet with the given sheet number. sheet = 1 will import the first sheet, sheet = 2 will import the second sheet, and so on.
# Read the first sheet of latitude.xlsx: latitude_1
latitude_1 <- read_excel("latitude.xlsx", sheet=1)
# Read the second sheet of latitude.xlsx: latitude_2
latitude_2 <- read_excel("latitude.xlsx", sheet="1900")
# Put latitude_1 and latitude_2 in a list: lat_list
lat_list <- list(latitude_1, latitude_2)
# Display the structure of lat_list
str(lat_list)
## List of 2
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 246 obs. of 2 variables:
## ..$ country : chr [1:246] "Afghanistan" "Akrotiri and Dhekelia" "Albania" "Algeria" ...
## ..$ latitude_1700: num [1:246] 34.6 34.6 41.3 36.7 -14.3 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 246 obs. of 2 variables:
## ..$ country : chr [1:246] "Afghanistan" "Akrotiri and Dhekelia" "Albania" "Algeria" ...
## ..$ latitude_1900: num [1:246] 34.6 34.6 41.3 36.7 -14.3 ...
Reading a workbook
In the previous exercise you generated a list of both Excel sheets that you imported. However, loading in every sheet manually and then merging them in a list can be quite tedious. There must be a way to elegantly automate this, right? lapply() comes to the rescue!
Have a look at the example code below:
- my_workbook <- lapply(excel_sheets(“data.xlsx”), read_excel, path = “data.xlsx”)
The read_excel() function is called multiple times on the “data.xlsx” file and each sheet is loaded in one after the other. The result is a list of data frames, each data frame representing one of the sheets in data.xlsx.
# Read all Excel sheets with lapply(): lat_list
lat_list <- lapply(excel_sheets("latitude.xlsx"), read_excel, path="latitude.xlsx")
# Display the structure of lat_list
str(lat_list)
## List of 2
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 246 obs. of 2 variables:
## ..$ country : chr [1:246] "Afghanistan" "Akrotiri and Dhekelia" "Albania" "Algeria" ...
## ..$ latitude_1700: num [1:246] 34.6 34.6 41.3 36.7 -14.3 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 246 obs. of 2 variables:
## ..$ country : chr [1:246] "Afghanistan" "Akrotiri and Dhekelia" "Albania" "Algeria" ...
## ..$ latitude_1900: num [1:246] 34.6 34.6 41.3 36.7 -14.3 ...
The col_names argument
Apart from path and sheet, there are several other arguments you can specify in read_excel(). One of these arguments is called col_names.
By default it is TRUE, denoting that the first row in the Excel sheets denote the column names. If this is not the case, you can either set col_names to FALSE. In this case, R will choose column names for you. You can also choose to set col_names to a character vector with names for each column. It works exactly the same as in the readr package.
You’ll be working with the latitude_nonames.xlsx file. It contains the same data as latitude.xlsx but has no column names in the first row of the excel sheets.
# Import the the first Excel sheet of latitude_nonames.xlsx (R gives names): latitude_3
# Have R set the column names of the resulting data frame itself.
latitude_3 <- read_excel("latitude_nonames.xlsx", sheet=1, col_names=FALSE)
# Import the the second Excel sheet of latitude_nonames.xlsx (specify col_names): latitude_4
# This time, specify the column names to be "country" and "latitude"
latitude_4 <- read_excel("latitude_nonames.xlsx", sheet=1, col_names=c("country", "latitude"))
# Print the summary of latitude_3
summary(latitude_3)
## X0 X1
## Length:247 Min. :-51.750
## Class :character 1st Qu.: 2.557
## Mode :character Median : 16.755
## Mean : 18.026
## 3rd Qu.: 39.051
## Max. : 78.000
## NA's :5
# Print the summary of latitude_4
summary(latitude_4)
## country latitude
## Length:247 Min. :-51.750
## Class :character 1st Qu.: 2.557
## Mode :character Median : 16.755
## Mean : 18.026
## 3rd Qu.: 39.051
## Max. : 78.000
## NA's :5
The skip argument
Another argument that can be very useful when reading in Excel files that are less tidy, is skip. With skip, you can tell R to ignore a specified number of rows inside the Excel sheets you’re trying to pull data from. Have a look at this example:
- read_excel(“data.xlsx”, skip = 15)
Import a local file
In this part of the chapter you’ll learn how to import .xls files of the format .xls using the gdata package. Similar to the readxl package, you can import single Excel sheets from Excel sheets to start your analysis in R.
You’ll be working with the urbanpop.xls dataset; it contains urban population metrics for practically all countries in the world throughout time (Source: Gapminder). It contains three sheets for three different time periods. In each sheet, the first row contains the column names.
# Load the gdata package
#install.packages("gdata")
#library(gdata)
# Import the second sheet of urbanpop.xls: urban_pop
#urban_pop <- read.xls("urbanpop.xls", sheet="1967-1974")
# Print the first 11 observations using head()
#head(urban_pop, n=11)
Import a workbook
The functions you’ve learned about in the video provide a good starting point to experiment with XLConnect package.Typically, the first step will be to load a workbook in your R session with loadWorkbook(); this function will build a “bridge” between your Excel file and your R session.
List and read Excel sheets
Just as readxl and gdata, you can use XLConnect to import data from Excel file into R. To list the sheets in an Excel file, use getSheets(). To actually import data from a sheet, you can use readWorksheet(). Both functions require an XLConnect workbook object as the first argument.
# Build connection to latitude.xlsx
library(XLConnect)
my_book <- loadWorkbook("latitude.xlsx")
# List the sheets in my_book
getSheets(my_book)
## [1] "1700" "1900"
# Import the second sheet in my_book
readWorksheet(my_book, sheet=2)
## country latitude_1900
## 1 Afghanistan 34.5650000
## 2 Akrotiri and Dhekelia 34.6166667
## 3 Albania 41.3120000
## 4 Algeria 36.7200000
## 5 American Samoa -14.3070000
## 6 Andorra 42.5460000
## 7 Angola -8.8430000
## 8 Anguilla 18.2500000
## 9 Antigua and Barbuda 17.0720000
## 10 Argentina -36.6760000
## 11 Armenia 40.2540000
## 12 Aruba 12.5130000
## 13 Australia -32.2190000
## 14 Austria 48.2310000
## 15 Azerbaijan 40.3520000
## 16 Bahamas 24.7000000
## 17 Bahrain 26.0240000
## 18 Bangladesh 23.8800000
## 19 Barbados 13.1790000
## 20 Belarus 53.5470900
## 21 Belgium 50.8370000
## 22 Belize 17.8430000
## 23 Benin 6.3640000
## 24 Bermuda 32.2170000
## 25 Bhutan 27.4790000
## 26 Bolivia -15.1900000
## 27 Bosnia and Herzegovina 44.1750100
## 28 Botswana -21.5360000
## 29 Brazil -19.5570000
## 30 British Virgin Islands 18.5000000
## 31 Brunei 4.5010000
## 32 Bulgaria 42.0730000
## 33 Burkina Faso 12.0490000
## 34 Burundi -3.3650000
## 35 Cambodia 12.0260000
## 36 Cameroon 10.7300000
## 37 Canada 43.7270000
## 38 Cape Verde 15.0910000
## 39 Cayman Islands 19.3190000
## 40 Central African Rep. 4.3310000
## 41 Chad 10.3770000
## 42 Channel Islands 49.2170000
## 43 Chile -33.5540000
## 44 China 29.5610000
## 45 Christmas Island -10.5000000
## 46 Cocos Island -12.5000000
## 47 Colombia 4.7880000
## 48 Comoros -11.6710000
## 49 Congo, Dem. Rep. -2.9202760
## 50 Congo, Rep. -3.6840000
## 51 Cook Islands -21.2333333
## 52 Costa Rica 9.9410000
## 53 Cote d'Ivoire 5.4960000
## 54 Croatia 45.1100800
## 55 Cuba 23.0840000
## 56 Cyprus 35.0810000
## 57 Czech Rep. 49.7792900
## 58 Denmark 55.7180000
## 59 Djibouti 11.5050000
## 60 Dominica 15.4330000
## 61 Dominican Rep. 18.5610000
## 62 East Germany NA
## 63 Ecuador -2.0620000
## 64 Egypt 29.9960000
## 65 El Salvador 13.7750000
## 66 Equatorial Guinea 2.3260000
## 67 Eritrea 15.3120000
## 68 Estonia 58.6850000
## 69 Ethiopia 9.0070000
## 70 Faeroe Islands 62.0000000
## 71 Falkland Islands (Malvinas) -51.7500000
## 72 Fiji -17.8270000
## 73 Finland 60.2120000
## 74 France 48.8570000
## 75 French Guiana 3.9880000
## 76 French Polynesia -17.6660000
## 77 Gabon 0.3720000
## 78 Gambia 13.2570000
## 79 Georgia 42.0340000
## 80 Germany 48.1610000
## 81 Ghana 6.6940000
## 82 Gibraltar 36.1333333
## 83 Greece 38.0580000
## 84 Greenland 64.2170000
## 85 Grenada 12.1150000
## 86 Guadeloupe 16.1630000
## 87 Guam 13.4400000
## 88 Guatemala 14.6220000
## 89 Guernsey 49.4666667
## 90 Guinea 11.6710000
## 91 Guinea-Bissau 12.2620000
## 92 Guyana 5.7610000
## 93 Haiti 18.9320000
## 94 Holy See 41.9000000
## 95 Honduras 14.1940000
## 96 Hong Kong, China 22.7040000
## 97 Hungary 47.4190000
## 98 Iceland 63.8920000
## 99 India 25.2740000
## 100 Indonesia -6.5620000
## 101 Iran 35.3800000
## 102 Iraq 33.3140000
## 103 Ireland 54.6110000
## 104 Isle of Man 54.2250000
## 105 Israel 32.0840000
## 106 Italy 45.4150000
## 107 Jamaica 18.0550000
## 108 Japan 35.7120000
## 109 Jersey 49.2500000
## 110 Jordan 31.6020000
## 111 Kazakhstan 44.3080000
## 112 Kenya -0.5130000
## 113 Kiribati 1.8470000
## 114 Korea, Dem. Rep. 39.5270000
## 115 Korea, Rep. 37.5530000
## 116 Kosovo 42.5833333
## 117 Kuwait 29.3260000
## 118 Kyrgyzstan 41.4894000
## 119 Laos 16.5470000
## 120 Latvia 56.8580000
## 121 Lebanon 34.1110000
## 122 Lesotho -29.5950000
## 123 Liberia 6.3850000
## 124 Libya 32.6070000
## 125 Liechtenstein 47.1410000
## 126 Lithuania 55.3180000
## 127 Luxembourg 49.7800000
## 128 Macao, China 22.5060000
## 129 Macedonia, FYR 41.5738100
## 130 Madagascar -18.9580000
## 131 Malawi -15.8110000
## 132 Malaysia 3.2690000
## 133 Maldives 3.2500000
## 134 Mali 12.5080000
## 135 Malta 35.8870000
## 136 Marshall Islands 9.0000000
## 137 Martinique 14.6540000
## 138 Mauritania 17.9250000
## 139 Mauritius -20.2320000
## 140 Mayotte -12.8333333
## 141 Mexico 16.7590000
## 142 Micronesia, Fed. Sts. 7.3570000
## 143 Moldova 47.1660000
## 144 Monaco 43.7460000
## 145 Mongolia 47.4930000
## 146 Montenegro 42.7889900
## 147 Montserrat 16.7500000
## 148 Morocco 33.5930000
## 149 Mozambique -18.4990000
## 150 Myanmar 17.6790000
## 151 Namibia -17.9790000
## 152 Nauru -0.5333333
## 153 Nepal 27.7120000
## 154 Netherlands 51.8740000
## 155 Netherlands Antilles 12.1920000
## 156 New Caledonia -21.3290000
## 157 New Zealand -36.8920000
## 158 Nicaragua 12.2110000
## 159 Niger 13.8760000
## 160 Nigeria 6.5430000
## 161 Niue -19.0333333
## 162 Norfolk Island -29.0333333
## 163 Northern Mariana Islands 15.1780000
## 164 Norway 59.9770000
## 165 Oman 20.4450000
## 166 Pakistan 31.1730000
## 167 Palau 7.5000000
## 168 Panama 9.2060000
## 169 Papua New Guinea -6.6000000
## 170 Paraguay -25.5830000
## 171 Peru -11.7940000
## 172 Philippines 13.9220000
## 173 Pitcairn -25.0666667
## 174 Poland 50.2440000
## 175 Portugal 38.8160000
## 176 Puerto Rico 18.2250000
## 177 Qatar 25.3090000
## 178 Reunion -20.9540000
## 179 Romania 44.5260000
## 180 Russia 55.6750000
## 181 Rwanda -2.0320000
## 182 Saint Barthélemy 17.9000000
## 183 Saint Helena -15.9500000
## 184 Saint Kitts and Nevis 17.3270000
## 185 Saint Lucia 13.8980000
## 186 Saint Martin 18.0833333
## 187 Saint Vincent and the Grenadines 13.2540000
## 188 Saint-Pierre-et-Miquelon 46.8333333
## 189 Samoa -13.6330000
## 190 San Marino 43.7666667
## 191 Sao Tome and Principe 1.0000000
## 192 Saudi Arabia 23.0690000
## 193 Senegal 14.7720000
## 194 Serbia 44.0467300
## 195 Seychelles -4.6640000
## 196 Sierra Leone 8.7010000
## 197 Singapore 1.3550000
## 198 Slovak Republic 48.7853800
## 199 Slovenia 46.1279000
## 200 Solomon Islands -9.6250000
## 201 Somalia 10.6330000
## 202 South Africa -29.1300000
## 203 Spain 37.3980000
## 204 Sri Lanka 6.8680000
## 205 Sudan 14.0430000
## 206 Suriname 5.6050000
## 207 Svalbard 78.0000000
## 208 Swaziland -26.5450000
## 209 Sweden 59.2780000
## 210 Switzerland 47.4080000
## 211 Syria 33.4580000
## 212 Taiwan 23.6448500
## 213 Tajikistan 37.8060000
## 214 Tanzania -2.1540000
## 215 Thailand 13.7700000
## 216 Timor-Leste -8.8333333
## 217 Togo 6.1940000
## 218 Tokelau -9.0000000
## 219 Tonga -21.1730000
## 220 Trinidad and Tobago 10.4180000
## 221 Tunisia 36.8160000
## 222 Turkey 41.2020000
## 223 Turkmenistan 39.1293800
## 224 Turks and Caicos Islands 21.7500000
## 225 Tuvalu -8.0000000
## 226 Uganda 0.2280000
## 227 Ukraine 50.2810000
## 228 United Arab Emirates 23.3900000
## 229 United Kingdom 51.5100000
## 230 United States 34.3600000
## 231 Uruguay -34.8220000
## 232 USSR NA
## 233 Uzbekistan 41.2720000
## 234 Wallis et Futuna -13.3000000
## 235 Vanuatu -15.2330000
## 236 Venezuela 9.8430000
## 237 West Bank and Gaza 31.4166667
## 238 West Germany NA
## 239 Western Sahara 24.6191300
## 240 Vietnam 10.7980000
## 241 Virgin Islands (U.S.) 17.7360000
## 242 Yemen, Rep. 15.2280000
## 243 Yugoslavia NA
## 244 Zambia -12.9420000
## 245 Zimbabwe -17.8760000
## 246 Åland 60.0000000
Add and populate worksheets. Where readxl and gdata were only able to import Excel data, XLConnect’s approach of providing an actual interface to an Excel file makes it able to edit your Excel files from inside R. In this exercise, you’ll create a new sheet, populate the sheet with data, and save the results in a new Excel file..
# Build connection to latitude.xlsx
library(XLConnect)
my_book <- loadWorkbook("latitude.xlsx")
# Create data frame: summ
dims1 <- dim(readWorksheet(my_book, 1))
dims2 <- dim(readWorksheet(my_book, 2))
summ <- data.frame(sheets = getSheets(my_book),
nrows = c(dims1[1], dims2[1]),
ncols = c(dims1[2], dims2[2]))
# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, name="data_summary")
# Populate "data_summary" with summ
writeWorksheet(my_book, summ, sheet="data_summary")
# Save workbook as latitude_with_summ.xlsx
saveWorkbook(my_book, "latitude_with_summ.xlsx")