The main goal of this project was to integrate a Microsoft Access database into R using the RODBC package. While this appeared very straightforward, we ran into several complex issues. Here, we document the problems, solutions we found, and some results we were able to accomplish.
Access is a database management system using Microsoft’s Jet Database Engine Databases are created and stored in a unique format, making them not readily accessible to everyone
There are two methods to connect with Access databases 1. Using the ODBC (Open DataBase Connectivity) facility 2. Using the DBI (DataBase Interface) package in R ODBC allows connection to a database to be opened Then, SQL is needed to actually import database tables into R
ODBC allows access to a variety of database systems via a common interface ODBC requires a specific driver to be installed (in our case, a Microsoft Access Driver which is already installed with the installation of Office) which serves as a translation layer between ODBC and the database system
Our project was investigating using the R package ‘RODBC’ to establish a connection to a Microsoft Access Database (.mdb , .accdb) through RStudio, in order to query the database using SQL language within R.
https://CRAN.R-project.org/package=RODBC
install.packages("RODBC")
library("RODBC") #load package
db<-file.path("C:/path/to/your/database.accdb") #connect database.
channel<-odbcConnectAccess2007(db) #internal RODBC function
dataSetName<-sqlFetch(channel,"TableName") #read particular table from Access database file.
## Set up driver info and database path
DRIVERINFO <- "Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
MDBPATH <- "C:/path/to/your/database.accdb"
PATH <- paste0(DRIVERINFO, "DBQ=", MDBPATH)
## Establish connection
channel <- odbcDriverConnect(PATH)
If you have Microsoft Access installed, the driver should already be installed. However, you can also install the drivers (32-bit or 64-bit) separately: https://www.microsoft.com/en-us/download/details.aspx?id=54920
The first way is to provide all information from within R each time the connection to the database system is established. To do this, you will need the file path of the database (EX. C:.accdb), and the driver information, as in the two examples above.
Alternatively, you can establish a DSN (Data Source Name) so that the DNS describes the connection and only the DSN needs to be provided each time a connection is made. This is a workaround for establishing the Driver Connection on Windows, that seems to work ~ 50% of the time
Source: https://www.r-bloggers.com/2013/01/getting-access-data-into-r/
#load package
library(RODBC)
#establish connection to DSN that you created manually
#we named our DSN 'ERMNSQL'
con <- odbcConnect("ERMNSQL") #only the name is needed, not the full path
sqlTables(con, tableType = "TABLE")$TABLE_NAME
## [1] "tbl_Discharge" "tbl_Discharge_alternate"
## [3] "tbl_Events" "tbl_Fish"
## [5] "tbl_Fish_Count" "tbl_Habitat"
## [7] "tbl_Habitat_LWD" "tbl_Habitat_LWD_Transect"
## [9] "tbl_Habitat_Transect" "tbl_Habitat_Transect_Fields"
## [11] "tbl_Sites" "tlu_Condition_Category_10"
## [13] "tlu_Condition_Category_20" "tlu_Contacts"
## [15] "tlu_Cover_Percent" "tlu_Diamater"
## [17] "tlu_Riparian_Vegetation" "tlu_Species"
## [19] "tlu_Species_BMI" "tlu_Substrate_Size_Class"
## [21] "tlu_Water_Clarity" "tlu_Water_Color"
## [23] "tlu_Were_Fish_Captured"
sqlColumns(con, "tbl_Events")$COLUMN_NAME
## [1] "Event_ID" "Site_ID"
## [3] "Start_Date" "Visit_Number"
## [5] "Observer_1" "Observer_2"
## [7] "Observer_3" "WQ_Time"
## [9] "Temp_C" "SpecficConductance"
## [11] "DO_mgL" "DO_Percent"
## [13] "pH" "WaterColor"
## [15] "Secchi_Depth_cm" "Discharge_Notes"
## [17] "Event_Notes" "Event_Metadata"
## [19] "DischargeQ_Calculated_m3_sec" "DischargeQ_Calculated_ft3_sec"
## [21] "Verified_Date" "Verified_By"
## [23] "Certified_Date" "Certified_By"
## [25] "SampleID"
sqlColumns(con, "tbl_Sites")$COLUMN_NAME
## [1] "Site_ID" "Type" "GRTS_Num" "Status"
## [5] "LAT_Y" "LON_X" "Site_Name" "Panel"
## [9] "Unit_Code" "State" "Reach_Width" "Reach_Length"
## [13] "Site_Metadata" "IsSiteActive"
sqlColumns(con, "tbl_Fish_Count")$COLUMN_NAME
## [1] "Fish_Count_ID" "Fish_ID" "Scientific_Name"
## [4] "Adult_Count" "YoY_Count" "Total_Count_Calculated"
## [7] "Photo_File_Notes"
sqlFetch(con, "tbl_Events", max = 10)
## Event_ID Site_ID Start_Date Visit_Number
## 1 804B8A3B-45C5-422B-A089-CCC82F48A178 BLUE.1001 2022-05-18 1
## 2 2C6ABDD1-339C-426F-B7C3-E1A3E2BD1086 BLUE.1002 2022-06-10 1
## 3 171826A0-006A-4215-B3D0-CB5FCFE75DE7 BLUE.1003 2022-06-09 1
## 4 608845CD-366A-4BC6-88D9-E8F76F7C72AA DEWA.3001 2022-06-16 1
## 5 E8597FCC-87F1-40EB-A76F-129D4853F069 DEWA.3001 2022-06-24 2
## 6 4D1303D7-A5E0-4595-B37B-48D12617CD9D DEWA.3001 2022-06-27 3
## 7 E4085646-B39D-45DF-A1F9-50B5882F5AEC DEWA.3002 2022-06-17 1
## 8 1AB58A1C-E839-4F2F-BC91-6CB7A56112CE DEWA.3002 2022-06-24 2
## 9 8B8A2B41-B2E5-4047-B513-9FD41AEDF55C DEWA.3002 2022-06-27 3
## 10 A4CC461E-8690-4E01-8BB0-385790A87999 DEWA.3003 2022-06-20 1
## Observer_1 Observer_2 Observer_3 WQ_Time Temp_C
## 1 Stum Haglund Nulton 1899-12-30 10:03:00 12.53
## 2 Stum Haglund Nulton 1899-12-30 09:48:00 15.14
## 3 Stum Haglund Nulton 1899-12-30 10:07:00 18.12
## 4 Stum Nulton Haglund 1899-12-30 11:09:00 14.03
## 5 Stum Haglund Nulton 1899-12-30 09:57:00 13.10
## 6 Stum Haglund Nulton 1899-12-30 12:59:00 15.79
## 7 Stum Haglund Nulton 1899-12-30 08:13:00 16.77
## 8 Stum Haglund Nulton 1899-12-30 12:48:00 15.86
## 9 Stum Haglund Nulton 1899-12-30 15:32:00 18.36
## 10 Stum Haglund Nulton 1899-12-30 09:40:00 12.54
## SpecficConductance DO_mgL DO_Percent pH WaterColor Secchi_Depth_cm
## 1 102.4 10.29 97.9 7.57 Clear 44
## 2 172.6 9.69 97.6 7.59 Clear 120
## 3 105.4 8.94 95.9 7.64 Clear 120
## 4 32.8 10.32 101.5 6.92 Brown 97
## 5 39.0 10.31 99.3 6.97 Clear 120
## 6 40.5 9.54 97.5 6.63 Clear 120
## 7 223.9 9.56 99.8 7.23 Clear 120
## 8 236.7 9.93 101.6 7.44 Clear 120
## 9 242.8 9.20 99.2 7.12 Clear 120
## 10 178.9 10.19 96.9 6.80 Clear 120
## Discharge_Notes
## 1 <NA>
## 2 taken in pool ~ 10m. Upstream of reach endpoint
## 3 <NA>
## 4 Taken ~ 15m upstream of marked midpoint in tailout of deep pool between two LWD piles
## 5 taken at T5
## 6 <NA>
## 7 Taken ~ 5m upstream of midpoint before cutbank pool on River Right.
## 8 <NA>
## 9 <NA>
## 10 Taken ~5m above reach start @ large beech tree
## Event_Notes Event_Metadata
## 1 <NA> NA
## 2 Additional observers: Jennifer Flippin & Haley King NA
## 3 Additional observers: Jennifer Flippin & Haley King NA
## 4 <NA> NA
## 5 <NA> NA
## 6 <NA> NA
## 7 <NA> NA
## 8 <NA> NA
## 9 <NA> NA
## 10 <NA> NA
## DischargeQ_Calculated_m3_sec DischargeQ_Calculated_ft3_sec Verified_Date
## 1 0 0 2022-09-26
## 2 0 0 2022-09-26
## 3 0 0 2022-09-26
## 4 0 0 2022-10-12
## 5 0 0 2022-10-12
## 6 0 0 2022-10-12
## 7 0 0 2022-10-12
## 8 0 0 2022-10-12
## 9 0 0 2022-10-12
## 10 0 0 2022-10-12
## Verified_By Certified_Date Certified_By SampleID
## 1 Thomas 2022-10-28 Stum 8
## 2 Thomas 2022-10-28 Stum 9
## 3 Thomas 2022-10-28 Stum 10
## 4 Thomas <NA> <NA> 45
## 5 Thomas <NA> <NA> 46
## 6 Thomas <NA> <NA> 47
## 7 Thomas <NA> <NA> 48
## 8 Thomas <NA> <NA> 49
## 9 Thomas <NA> <NA> 50
## 10 Thomas <NA> <NA> 51
WQqry <- "SELECT Event_ID, Site_ID, Start_Date, Visit_Number, WQ_Time, Temp_C,
SpecficConductance, DO_mgl, DO_Percent, pH, WaterColor, Secchi_Depth_cm
FROM tbl_Events" #embed the SQL query language in an object
#use the RODBC command 'sqlQuery' to pass your query object to the connected database
SiteWQ <- sqlQuery(con, WQqry)
head(SiteWQ)
## Event_ID Site_ID Start_Date Visit_Number
## 1 804B8A3B-45C5-422B-A089-CCC82F48A178 BLUE.1001 2022-05-18 1
## 2 2C6ABDD1-339C-426F-B7C3-E1A3E2BD1086 BLUE.1002 2022-06-10 1
## 3 171826A0-006A-4215-B3D0-CB5FCFE75DE7 BLUE.1003 2022-06-09 1
## 4 608845CD-366A-4BC6-88D9-E8F76F7C72AA DEWA.3001 2022-06-16 1
## 5 E8597FCC-87F1-40EB-A76F-129D4853F069 DEWA.3001 2022-06-24 2
## 6 4D1303D7-A5E0-4595-B37B-48D12617CD9D DEWA.3001 2022-06-27 3
## WQ_Time Temp_C SpecficConductance DO_mgl DO_Percent pH
## 1 1899-12-30 10:03:00 12.53 102.4 10.29 97.9 7.57
## 2 1899-12-30 09:48:00 15.14 172.6 9.69 97.6 7.59
## 3 1899-12-30 10:07:00 18.12 105.4 8.94 95.9 7.64
## 4 1899-12-30 11:09:00 14.03 32.8 10.32 101.5 6.92
## 5 1899-12-30 09:57:00 13.10 39.0 10.31 99.3 6.97
## 6 1899-12-30 12:59:00 15.79 40.5 9.54 97.5 6.63
## WaterColor Secchi_Depth_cm
## 1 Clear 44
## 2 Clear 120
## 3 Clear 120
## 4 Brown 97
## 5 Clear 120
## 6 Clear 120
#another query from a different table
nameqry <- "SELECT Site_ID, Site_Name From tbl_Sites"
SiteNames <- sqlQuery(con, nameqry)
head(SiteNames)
## Site_ID Site_Name
## 1 ALPO.1001 Millstone Run
## 2 ALPO.2001 Blair Gap Run
## 3 BLUE.1001 Indian Branch
## 4 BLUE.1002 Mountain Creek
## 5 BLUE.1003 Little Bluestone River
## 6 BLUE.1004 Jarrell Branch
pHqry <- "SELECT Site_ID, avg(pH) AS Average_pH, max(pH) AS Max_pH, min(pH) AS Min_pH
FROM tbl_EVENTS
GROUP BY Site_ID"
pHWQ <- sqlQuery(con, pHqry)
head(pHWQ)
## Site_ID Average_pH Max_pH Min_pH
## 1 BLUE.1001 7.570000 7.57 7.57
## 2 BLUE.1002 7.590000 7.59 7.59
## 3 BLUE.1003 7.640000 7.64 7.64
## 4 DEWA.3001 6.840000 6.97 6.63
## 5 DEWA.3002 7.263333 7.44 7.12
## 6 DEWA.3003 6.830000 7.04 6.65
#merging the two dataframes created from the first two queries above
SiteWQ <- merge(SiteNames, SiteWQ, by="Site_ID")
head(SiteWQ)
## Site_ID Site_Name Event_ID
## 1 BLUE.1001 Indian Branch 804B8A3B-45C5-422B-A089-CCC82F48A178
## 2 BLUE.1002 Mountain Creek 2C6ABDD1-339C-426F-B7C3-E1A3E2BD1086
## 3 BLUE.1003 Little Bluestone River 171826A0-006A-4215-B3D0-CB5FCFE75DE7
## 4 DEWA.3001 Caledonia Creek 13 608845CD-366A-4BC6-88D9-E8F76F7C72AA
## 5 DEWA.3001 Caledonia Creek 13 E8597FCC-87F1-40EB-A76F-129D4853F069
## 6 DEWA.3001 Caledonia Creek 13 4D1303D7-A5E0-4595-B37B-48D12617CD9D
## Start_Date Visit_Number WQ_Time Temp_C SpecficConductance DO_mgl
## 1 2022-05-18 1 1899-12-30 10:03:00 12.53 102.4 10.29
## 2 2022-06-10 1 1899-12-30 09:48:00 15.14 172.6 9.69
## 3 2022-06-09 1 1899-12-30 10:07:00 18.12 105.4 8.94
## 4 2022-06-16 1 1899-12-30 11:09:00 14.03 32.8 10.32
## 5 2022-06-24 2 1899-12-30 09:57:00 13.10 39.0 10.31
## 6 2022-06-27 3 1899-12-30 12:59:00 15.79 40.5 9.54
## DO_Percent pH WaterColor Secchi_Depth_cm
## 1 97.9 7.57 Clear 44
## 2 97.6 7.59 Clear 120
## 3 95.9 7.64 Clear 120
## 4 101.5 6.92 Brown 97
## 5 99.3 6.97 Clear 120
## 6 97.5 6.63 Clear 120
Microsoft Access isn’t included in MS Office for macOS This means although you don’t need to have the Microsoft Access file opened to use in R, you won’t have the Microsoft Access Driver needed. You could use a virtual machine on your Mac to run window applications or dual boot Windows and macOS on your machine…
Instead of being able to connect to the Microsoft Access driver
through a line of code as per the package documentation, we had to
manually create a driver as outlined previously. Even then, only 2 of us
were able to get R to connect to the DSN, as the computer is still
unable to find the DSN or there is a mismatch between the Driver, the
Database, and R.
Using odbcConnectAccess() will sometimes give an error message indicating that odbcConnectAccess is only usable with 32-bit Windows. This can sometimes occur if you have 32 bit office installed on your machine and using 64-bit R… or for other unknown causes in our situation… (we got this error even though we had 64-bit R and Microsoft).
One way to potentially fix this is if you have R that has both 32-bit and 64-bit versions so you can switch to 32-bit, or download an older R version with 32-bit. The downside to switching to 32-bit is that you may come across issues such as memory issues. Then, make sure you have the 32-bit version of the Driver and proceed to establish the connection the same way as steps previously mentioned. Although, we tried this workaround and were still unable to get it to work.
If you are familiar with arguments for SQL queries with the ‘DBI’ package, the arguments and syntax look a little different under the ‘RODBC’ package. This made it a little more challenging than we initially thought to just transfer the queries we learned to do in class.