Email : dsciencelabs@outlook.com
Instagram : https://www.instagram.com/dsciencelabs
RPubs
: https://rpubs.com/dsciencelabs/
Github :
https://github.com/dsciencelabs/
Telegram :
@dsciencelabs
Department : Business
Statistics
Address : ARA Center, Matana University
Tower
Jl. CBD Barat Kav, RT.1, Curug
Sangereng, Kelapa Dua, Tangerang, Banten 15810.
A database is a structured set of data. Terminology is a little bit different when working with a database management system compared to working with data in R.
The relationship between R terminology and database terminology is explained below.
R terminology | Database terminology |
---|---|
column | field |
row | record |
data frame | table |
types of columns | table schema |
collection of data | frames database |
SQL (Structured Query Language) allows you to directly interact with a database and perform tasks such as pull data and make updates. R has two packages available that make it easy to run SQL queries: DBI,RMariaDB,RMySQL, and RSQLite.
To get started, install the following packages in your R Console pane.
install.packages(c(
"RMariaDB",
"RMySQL",
"RSQLite",
"RPostgres"
) )
Then, load all these requirement packages:
# install.packages("pacman")
::p_load(
pacman# Database Interface and 'MariaDB' Driver
RMariaDB, # Database Interface and 'RMySQL' Driver
RMySQL, # Database Interface and 'RSQLite' Driver
RSQLite, # Database Interface and 'RPostgres' Driver
RPostgres )
There are many ways to connect your database with R. This article shows you three of the most common ways:
<- dbConnect(RMariaDB::MariaDB(),
MariaDB user='root',
password='',
dbname='bakti',
host='localhost')
dbListTables(MariaDB) # table list on your database
dbExecute(MariaDB,"CREATE DATABASE new_MariaDB") # Create a new Database
dbExecute(MariaDB,"DROP DATABASE new_MariaDB") # Drop a Database
<- dbConnect(MySQL(),
MySQL user='root',
password='',
dbname='bakti',
host='localhost')
dbListTables(MySQL) # table list on your database
dbExecute(MySQL,"CREATE DATABASE new_MySQL") # Create a new Database
dbExecute(MySQL,"DROP DATABASE new_MySQL") # Drop a Database
<- dbConnect(RPostgres::Postgres(),
postgres user='postgres',
password='1234',
dbname='postgres',
host='localhost')
dbListTables(postgres) # table list on your database
dbExecute(postgres,"CREATE DATABASE new_MySQL") # Create a new Database
dbExecute(postgres,"DROP DATABASE new_MySQL") # Drop a Database
<- dbConnect(RSQLite::SQLite(), "folder_db/mydb3.sqlite")
RSQLite dbListTables(RSQLite) # table list on your database
This section can be ignored if the data (table) that you need is already registered in your database. If not, then it is necessary to import data set according to your available files (here, I guide you with files CSV and XLSX). In your practice, you can choose either one.
<-read.csv("data/Customers.csv")
Customers <-read.csv("data/Categories.csv")
Categories <-read.csv("data/Employees.csv")
Employees <-read.csv("data/OrderDetails.csv")
OrderDetails<-read.csv("data/Orders.csv")
Orders <-read.csv("data/Products.csv")
Products <-read.csv("data/Shippers.csv")
Shippers <-read.csv("data/Suppliers.csv") Suppliers
library("readxl")
<-read_excel("data/RawDatabase.xlsx",sheet=1)
Customers <-read_excel("data/RawDatabase.xlsx",sheet=2)
Categories <-read_excel("data/RawDatabase.xlsx",sheet=3)
Employees <-read_excel("data/RawDatabase.xlsx",sheet=4)
OrderDetails<-read_excel("data/RawDatabase.xlsx",sheet=5)
Orders <-read_excel("data/RawDatabase.xlsx",sheet=6)
Products <-read_excel("data/RawDatabase.xlsx",sheet=7)
Shippers <-read_excel("data/RawDatabase.xlsx",sheet=8) Suppliers
The key here is the dbWriteTable
function which allows
us to write an R data frame directly to a database table. The data
frame’s column names will be used as the database table’s fields. In the
following example I use RMariaDB
connection, you can apply
another driver as you like.
<- dbConnect(MariaDB(),
new_con user='root',
password='',
dbname='new_mariadb',
host='localhost')
dbWriteTable(new_con, "Customers", Customers, append=T)
dbWriteTable(new_con, "Categories", Categories, append=T)
dbWriteTable(new_con, "Employees", Employees, append=T)
dbWriteTable(new_con, "OrderDetails", OrderDetails, append=T)
dbWriteTable(new_con, "Orders", Orders, append=T)
dbWriteTable(new_con, "Products", Products, append=T)
dbWriteTable(new_con, "Shippers", Shippers, append=T)
dbWriteTable(new_con, "Suppliers", Suppliers, append=T)
Note: Some important things that must be considered when storing table data are as follows:
In this case, we have a problem with the data table
Employees
and Orders
. When you consider these
Table (Employees and Orders), you will find there is no date are written
correctly in the database. In order to handle this problem, just type
the following code in your R console:
dbRemoveTable(new_con, "Orders")
"OrderDate"] <-as.Date(Orders$OrderDate, format = "%Y-%m-%d") Orders[
dbWriteTable(new_con, "Orders", Orders, append=T)
Your Exercise: Do the same thing to update data table
Employees
The SELECT statement is used to select data from a database.
library(DT)
<-dbGetQuery(new_con,'SELECT City, Address,Country
df1 FROM Customers')
datatable(df1)
The WHERE clause is used to filter records, extract only those records that fulfill a specified condition.
<-dbGetQuery(new_con,"SELECT *
df2 FROM Customers
WHERE Country='Germany'")
datatable(df2)
If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure the order of the values is in the same order as the columns in the table. The INSERT INTO syntax would be as follows:
dbExecute(new_con,"INSERT INTO Customers(CustomerName,ContactName,Address,City,PostalCode, Country)
VALUES('Bakti','Siregar','Jl.Bahagia Selalu','Tangerang','081369','Indonesia')")
The DELETE statement is used to delete existing records in a table.
dbExecute(new_con,"DELETE FROM Customers
WHERE CustomerName ='Bakti' ")
The UPDATE statement is used to modify the existing records in a table.
dbExecute(new_con,"UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1")
If you are done with the query process and you don’t want to use it anymore, you should disconnect the connection from your database.
dbDisconnect(new_con) # disconnect from your database