#install.packages('odbc')
#install.packages('DBI')
# load the packages
library(odbc)
library(DBI)dbconnect
Overview
Working and connecting with databases in R requires two important packages namely, ‘odbc’ package odbc package documentation. The goal of the odbc package is to provide a DBI-compliant interface to ODBC drivers. This makes it easy to connect databases such as SQL Server, Oracle, Databricks, and Snowflake. Secondly, the ‘DBI’ package DBI package documentation which helps connecting R to database management systems.
connecting to database
con <- DBI::dbConnect(drv = odbc::odbc(),
Driver = "driver_name",
Server = "server_url",
Database = "database_name",
user = "user",
password = "password" )dbConnect()is a function from the DBI package that creates a connection to a database.odbc::odbc()specifies that the connection is made using the odbc package, which is a wrapper for Open Database Connectivity (ODBC)Connection Parameters:
Driver = “driver_name”: Specifies the ODBC driver to use. The driver must be installed and configured in your system. for instance “SQL Server for Microsoft SQL Server.
Server = “server_url” : specifies the the database server url or IP. eg “localhost” for a locally hosted database or “192.168.1.12” for a hosted database.
Database = “database_name” : specifies the name of the database you want to connect to.
user = “user”: The correct username to authenticate the connection
password = “password”: The corresponding password for authentication
Let’s create a sample database to this in action
we will require these packages to execute this:
RSQL package RSQL Ppackage to generate and process SQL Queries in R,
RSQLite package RSQLite package which embeds the SQLite database engine in R, providing a DBI-compliant interface.
#install.packages('RSQL')
#install.packages('RSQLite')
# load the packages
library(RSQL)
library(RSQLite)
library(tidyverse)sample data
we are going to use two datasets from the tidyverse package, population and who. Both datasets were created by the World Health Organisation as part of their mandate in tracking TB cases between 1980 and 2013.
preview the data
library(tidyr)
#data("population")
#head(population)
data("who")
head(who)# A tibble: 6 × 60
country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghanis… AF AFG 1980 NA NA NA NA
2 Afghanis… AF AFG 1981 NA NA NA NA
3 Afghanis… AF AFG 1982 NA NA NA NA
4 Afghanis… AF AFG 1983 NA NA NA NA
5 Afghanis… AF AFG 1984 NA NA NA NA
6 Afghanis… AF AFG 1985 NA NA NA NA
# ℹ 52 more variables: new_sp_m4554 <dbl>, new_sp_m5564 <dbl>,
# new_sp_m65 <dbl>, new_sp_f014 <dbl>, new_sp_f1524 <dbl>,
# new_sp_f2534 <dbl>, new_sp_f3544 <dbl>, new_sp_f4554 <dbl>,
# new_sp_f5564 <dbl>, new_sp_f65 <dbl>, new_sn_m014 <dbl>,
# new_sn_m1524 <dbl>, new_sn_m2534 <dbl>, new_sn_m3544 <dbl>,
# new_sn_m4554 <dbl>, new_sn_m5564 <dbl>, new_sn_m65 <dbl>,
# new_sn_f014 <dbl>, new_sn_f1524 <dbl>, new_sn_f2534 <dbl>, …
Create a placeholder for a database and store it in he con object . You can view the database using the dblistTables function which will return a list of tables stored within the database which in our case is currently empty.
library(DBI)
con <- dbConnect(drv = RSQLite::SQLite(),
dbname = ":memory:")
dbListTables(con)character(0)
using dbWriteTable function to store the who and population tables in SQL database. again we’ll use the dbListTables function to verify that they were successfully loaded.
# load
dbWriteTable(conn = con,
name = "population",
value = population)
# load the who table
dbWriteTable(conn = con,
name = "who",
value = who)
dbListTables(con)[1] "population" "who"
Can we query from our database
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
results <- tbl( src = con, "who" ) |>
filter(country %in% c("Kenya", "Nigeria"),
year >= 2000,
year <= 2013) |>
select(country, year, new_sp_m014) |>
left_join(y = tbl(src = con, "population"),
by = c("country", "year")) |>
collect()
print(results)# A tibble: 28 × 4
country year new_sp_m014 population
<chr> <dbl> <dbl> <dbl>
1 Kenya 2000 264 31285050
2 Kenya 2001 299 32126351
3 Kenya 2002 299 33000524
4 Kenya 2003 341 33905011
5 Kenya 2004 391 34834606
6 Kenya 2005 359 35785718
7 Kenya 2006 387 36757498
8 Kenya 2007 474 37752304
9 Kenya 2008 451 38773277
10 Kenya 2009 470 39824734
# ℹ 18 more rows