dbconnect

Author

Keith Karani

Overview

Working and connecting with databases in R requires two important packages namely, ‘odbc’ package odbc package documentation. The goal of the odbc package is to provide a DBI-compliant interface to ODBC drivers. This makes it easy to connect databases such as SQL Server, Oracle, Databricks, and Snowflake. Secondly, the ‘DBI’ package DBI package documentation which helps connecting R to database management systems.

#install.packages('odbc')
#install.packages('DBI')

# load the packages
library(odbc)
library(DBI)

connecting to database

con <- DBI::dbConnect(drv = odbc::odbc(),
                      Driver = "driver_name",
                      Server = "server_url",
                      Database = "database_name",
                      user = "user",
                      password = "password" )
  • dbConnect() is a function from the DBI package that creates a connection to a database.

  • odbc::odbc() specifies that the connection is made using the odbc package, which is a wrapper for Open Database Connectivity (ODBC)

    Connection Parameters:

  • Driver = “driver_name”: Specifies the ODBC driver to use. The driver must be installed and configured in your system. for instance “SQL Server for Microsoft SQL Server.

  • Server = “server_url” : specifies the the database server url or IP. eg “localhost” for a locally hosted database or “192.168.1.12” for a hosted database.

  • Database = “database_name” : specifies the name of the database you want to connect to.

  • user = “user”: The correct username to authenticate the connection

  • password = “password”: The corresponding password for authentication

Let’s create a sample database to this in action

we will require these packages to execute this:

  1. RSQL package RSQL Ppackage to generate and process SQL Queries in R,

  2. RSQLite package RSQLite package which embeds the SQLite database engine in R, providing a DBI-compliant interface.

#install.packages('RSQL')
#install.packages('RSQLite')


# load the packages
library(RSQL)
library(RSQLite)
library(tidyverse)

sample data

we are going to use two datasets from the tidyverse package, population and who. Both datasets were created by the World Health Organisation as part of their mandate in tracking TB cases between 1980 and 2013.

preview the data

library(tidyr)
#data("population")
#head(population)


data("who")
head(who)
# A tibble: 6 × 60
  country   iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544
  <chr>     <chr> <chr> <dbl>       <dbl>        <dbl>        <dbl>        <dbl>
1 Afghanis… AF    AFG    1980          NA           NA           NA           NA
2 Afghanis… AF    AFG    1981          NA           NA           NA           NA
3 Afghanis… AF    AFG    1982          NA           NA           NA           NA
4 Afghanis… AF    AFG    1983          NA           NA           NA           NA
5 Afghanis… AF    AFG    1984          NA           NA           NA           NA
6 Afghanis… AF    AFG    1985          NA           NA           NA           NA
# ℹ 52 more variables: new_sp_m4554 <dbl>, new_sp_m5564 <dbl>,
#   new_sp_m65 <dbl>, new_sp_f014 <dbl>, new_sp_f1524 <dbl>,
#   new_sp_f2534 <dbl>, new_sp_f3544 <dbl>, new_sp_f4554 <dbl>,
#   new_sp_f5564 <dbl>, new_sp_f65 <dbl>, new_sn_m014 <dbl>,
#   new_sn_m1524 <dbl>, new_sn_m2534 <dbl>, new_sn_m3544 <dbl>,
#   new_sn_m4554 <dbl>, new_sn_m5564 <dbl>, new_sn_m65 <dbl>,
#   new_sn_f014 <dbl>, new_sn_f1524 <dbl>, new_sn_f2534 <dbl>, …

Create a placeholder for a database and store it in he con object . You can view the database using the dblistTables function which will return a list of tables stored within the database which in our case is currently empty.

library(DBI)

con <- dbConnect(drv = RSQLite::SQLite(),
                 dbname = ":memory:")


dbListTables(con)
character(0)

using dbWriteTable function to store the who and population tables in SQL database. again we’ll use the dbListTables function to verify that they were successfully loaded.

# load
dbWriteTable(conn = con, 
             name = "population",
             value = population)

# load the who table
dbWriteTable(conn = con,
             name = "who",
             value = who)

dbListTables(con)
[1] "population" "who"       

Can we query from our database

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
results <- tbl( src = con, "who" ) |> 
  filter(country %in% c("Kenya", "Nigeria"),
         year >= 2000,
         year <= 2013) |> 
  select(country, year, new_sp_m014) |> 
  left_join(y = tbl(src = con, "population"), 
            by = c("country", "year")) |> 
  collect()

print(results)
# A tibble: 28 × 4
   country  year new_sp_m014 population
   <chr>   <dbl>       <dbl>      <dbl>
 1 Kenya    2000         264   31285050
 2 Kenya    2001         299   32126351
 3 Kenya    2002         299   33000524
 4 Kenya    2003         341   33905011
 5 Kenya    2004         391   34834606
 6 Kenya    2005         359   35785718
 7 Kenya    2006         387   36757498
 8 Kenya    2007         474   37752304
 9 Kenya    2008         451   38773277
10 Kenya    2009         470   39824734
# ℹ 18 more rows