Email             :
Instagram     : https://www.instagram.com/dsciencelabs
RPubs            : https://rpubs.com/dsciencelabs/
Github           : https://github.com/dsciencelabs/
Telegram       : @dsciencelabs
Department  : Business Statistics
Address         : ARA Center, Matana University Tower
                         Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.




1 Introduction

A database is a structured set of data. Terminology is a little bit different when working with a database management system compared to working with data in R.

  • field: variable or quantity
  • record: collection of fields
  • table: collection of records with all the same fields
  • database: collection of tables

The relationship between R terminology and database terminology is explained below.

R terminology Database terminology
column field
row record
data frame table
types of columns table schema
collection of data frames database

SQL (Structured Query Language) allows you to directly interact with a database and perform tasks such as pull data and make updates. R has two packages available that make it easy to run SQL queries: DBI,RMariaDB,RMySQL, and RSQLite.


2 Getting Started

To get started, install the following packages in your R Console pane.

install.packages(c(
                   "RMariaDB",
                   "RMySQL",
                   "RSQLite",
                   "RPostgres"
                   )
                 )

Then, load all these requirement packages:

# install.packages("pacman")
pacman::p_load(
               RMariaDB,             # Database Interface and 'MariaDB' Driver
               RMySQL,               # Database Interface and 'RMySQL' Driver
               RSQLite,              # Database Interface and 'RSQLite' Driver
               RPostgres           # Database Interface and 'RPostgres' Driver
          )

3 Connecting R to SQL

There are many ways to connect your database with R. This article shows you three of the most common ways:

MariaDB <- dbConnect(RMariaDB::MariaDB(), 
                  user='root',
                  password='', 
                  dbname='bakti', 
                  host='localhost')
dbListTables(MariaDB)                             # table list on your database
dbExecute(MariaDB,"CREATE DATABASE new_MariaDB")  # Create a new Database
dbExecute(MariaDB,"DROP DATABASE new_MariaDB")    # Drop a Database
MySQL <- dbConnect(MySQL(), 
                  user='root',
                  password='', 
                  dbname='bakti', 
                  host='localhost')
dbListTables(MySQL)                               # table list on your database
dbExecute(MySQL,"CREATE DATABASE new_MySQL")      # Create a new Database
dbExecute(MySQL,"DROP DATABASE new_MySQL")        # Drop a Database
postgres <- dbConnect(RPostgres::Postgres(), 
                  user='postgres',
                  password='1234', 
                  dbname='postgres', 
                  host='localhost')
dbListTables(postgres)                               # table list on your database
dbExecute(postgres,"CREATE DATABASE new_MySQL")      # Create a new Database
dbExecute(postgres,"DROP DATABASE new_MySQL")        # Drop a Database
RSQLite <- dbConnect(RSQLite::SQLite(), "folder_db/mydb3.sqlite")
dbListTables(RSQLite)                             # table list on your database
  • Notes: RSQLite will store the database you created in your current working directory.

4 Import Data

This section can be ignored if the data (table) that you need is already registered in your database. If not, then it is necessary to import data set according to your available files (here, I guide you with files CSV and XLSX). In your practice, you can choose either one.

4.1 CSV Files

Customers   <-read.csv("data/Customers.csv")
Categories  <-read.csv("data/Categories.csv")  
Employees   <-read.csv("data/Employees.csv")  
OrderDetails<-read.csv("data/OrderDetails.csv")  
Orders      <-read.csv("data/Orders.csv")  
Products    <-read.csv("data/Products.csv")  
Shippers    <-read.csv("data/Shippers.csv")
Suppliers   <-read.csv("data/Suppliers.csv")  

4.2 XLSX Files

library("readxl")                                  
Customers   <-read_excel("data/RawDatabase.xlsx",sheet=1)      
Categories  <-read_excel("data/RawDatabase.xlsx",sheet=2) 
Employees   <-read_excel("data/RawDatabase.xlsx",sheet=3)
OrderDetails<-read_excel("data/RawDatabase.xlsx",sheet=4) 
Orders      <-read_excel("data/RawDatabase.xlsx",sheet=5)
Products    <-read_excel("data/RawDatabase.xlsx",sheet=6) 
Shippers    <-read_excel("data/RawDatabase.xlsx",sheet=7)  
Suppliers   <-read_excel("data/RawDatabase.xlsx",sheet=8)  

5 Write Dataframe to Database

The key here is the dbWriteTable function which allows us to write an R data frame directly to a database table. The data frame’s column names will be used as the database table’s fields. In the following example I use RMariaDB connection, you can apply another driver as you like.

new_con <- dbConnect(MariaDB(), 
                  user='root',
                  password='', 
                  dbname='new_mariadb', 
                  host='localhost')

dbWriteTable(new_con, "Customers", Customers, append=T) 
dbWriteTable(new_con, "Categories", Categories, append=T) 
dbWriteTable(new_con, "Employees", Employees, append=T) 
dbWriteTable(new_con, "OrderDetails", OrderDetails, append=T) 
dbWriteTable(new_con, "Orders", Orders, append=T) 
dbWriteTable(new_con, "Products", Products, append=T) 
dbWriteTable(new_con, "Shippers", Shippers, append=T) 
dbWriteTable(new_con, "Suppliers", Suppliers, append=T) 

Note: Some important things that must be considered when storing table data are as follows:

  • Data Structure adjustments
  • Changes Data type (especially, Date and Time)

In this case, we have a problem with the data table Employees and Orders. When you consider these Table (Employees and Orders), you will find there is no date are written correctly in the database. In order to handle this problem, just type the following code in your R console:

dbRemoveTable(new_con, "Orders")
Orders["OrderDate"] <-as.Date(Orders$OrderDate, format = "%Y-%m-%d") 
dbWriteTable(new_con, "Orders", Orders, append=T) 

Your Exercise: Do the same thing to update data table Employees

6 General Query

6.1 SELECT FROM

The SELECT statement is used to select data from a database.

library(DT)
df1<-dbGetQuery(new_con,'SELECT City, Address,Country
                         FROM Customers')
datatable(df1)

6.2 WHERE

The WHERE clause is used to filter records, extract only those records that fulfill a specified condition.

df2<-dbGetQuery(new_con,"SELECT * 
                         FROM Customers 
                         WHERE Country='Germany'")
datatable(df2)

6.3 INSERT INTO

If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure the order of the values is in the same order as the columns in the table. The INSERT INTO syntax would be as follows:

dbExecute(new_con,"INSERT INTO Customers(CustomerName,ContactName,Address,City,PostalCode, Country)
                   VALUES('Bakti','Siregar','Jl.Bahagia Selalu','Tangerang','081369','Indonesia')")

6.4 DELETE FROM

The DELETE statement is used to delete existing records in a table.

dbExecute(new_con,"DELETE FROM Customers
                   WHERE CustomerName ='Bakti' ")  

6.5 UPDATE

The UPDATE statement is used to modify the existing records in a table.

dbExecute(new_con,"UPDATE Customers
                   SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
                   WHERE CustomerID = 1")

6.6 Disconnect Database

If you are done with the query process and you don’t want to use it anymore, you should disconnect the connection from your database.

dbDisconnect(new_con)                                        # disconnect from your database