Email             :
Instagram     : https://www.instagram.com/dsciencelabs
RPubs            : https://rpubs.com/dsciencelabs/
Github           : https://github.com/dsciencelabs/
Telegram       : @dsciencelabs
Department  : Business Statistics
Address         : ARA Center, Matana University Tower
                         Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.




In the previous section, we learned how to connect R to a Database System (SQL) Such as RMariaDB, RMySQL, and RSQLite. In this section, we continue to cover all that you have to know about fundamental operations in SQL (Here, focus on RMySQL).

1 Connecting MySQL

Connecting to MySQL is made very easy with the RMySQL package. To connect to a MySQL database simply install the package and load the library.

# install.packages("pacman")
pacman::p_load(DBI,                  # DBI package helps connecting R to DBMS
               odbc,                 # Connect to ODBC Compatible Databases
               RMariaDB,             # Database Interface and 'MariaDB' Driver
               RMySQL,               # Database Interface and 'RMySQL' Driver
               RSQLite               # Database Interface and 'RSQLite' Driver
          )
MySQL <- dbConnect(MySQL(), 
                  user='root',
                  password='', 
                  dbname='mysql', 
                  host='localhost')
dbListTables(MySQL)                  # a list of the tables in our connection

Note: Open and your XAMPP, click start on Apache and MySQL. Then, make sure you have the admin privilege before creating any database.


1.1 Create DB

If you want to create a new database, then the CREATE DATABASE statement would be as shown below:

dbExecute(MySQL, "CREATE DATABASE factory_db")

The result show us 1, means that you have succeeded to create a database.


1.2 Drop DB

If you want to delete an existing database, then the DROP DATABASE statement would be as shown below:

dbExecute(MySQL, "DROP DATABASE factory_db")

The result show us 0, means that you have succeeded to remove (Drop) a database.


2 Create Table

Once you have a database, you can continue to create table as shown below:

dbExecute(MySQL, "CREATE TABLE Persons(
                 PersonID int,
                 LastName varchar(255),
                 FirstName varchar(255),
                 Address varchar(255),
                 City varchar(255))")

2.1 Insert Value

If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure the order of the values is in the same order as the columns in the table. The INSERT INTO syntax would be as follows:

dbExecute(MySQL,"INSERT INTO Persons(PersonID,LastName,FirstName, Address,City)
                 VALUES(1,'Siregar','Bakti', 'Jl.Bahagia','Tangerang')")

2.2 Truncate Table

The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.

dbExecute(MySQL, "TRUNCATE TABLE Persons")

2.3 Drop Table

The DROP TABLE statement is used to drop an existing table in a database.

dbExecute(MySQL, "DROP TABLE Persons")

2.4 Write Table

The key here is the dbWriteTable function which allows us to write an R data frame directly to a database table. The data frame’s column names will be used as the database table’s fields.

Orders      <-read.csv("data/Orders.csv")  
dbWriteTable(MySQL, "Orders", Orders, append=T) 

3 Alter Table

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table. The ALTER TABLE statement is also used to add and drop various constraints on an existing table.


3.1 Add Column

To add a column in a table, use the following syntax:

dbExecute(MySQL, "ALTER TABLE Orders
                 ADD Email varchar(255)")

3.2 Drop Column

To delete a column in a table, use the following syntax (notice that some database systems don’t allow deleting a column):

dbSendQuery(MySQL, "ALTER TABLE Orders
                   DROP COLUMN Email")

3.3 Modify Column

dbSendQuery(MySQL," ALTER TABLE Orders
                  MODIFY COLUMN OrderDate date")

4 Constraints

SQL constraints are used to specify rules for the data in a table. Constraints are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the table. If there is any violation between the constraint and the data action, the action is aborted.

Constraints can be column level or table level. Column level constraints apply to a column, and table level constraints apply to the whole table. The following constraints are commonly used in SQL:

  • NOT NULL: Ensures that a column cannot have a NULL value
  • UNIQUE: Ensures that all values in a column are different
  • PRIMARY KEY: A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table
  • FOREIGN KEY: Uniquely identifies a row/record in another table
  • CHECK: Ensures that all values in a column satisfies a specific condition
  • DEFAULT: Sets a default value for a column when no value is specified
  • INDEX: Used to create and retrieve data from the database very quickly

4.1 Not Null

The following SQL ensures that the “ID”, “LastName”, and “FirstName” columns will NOT accept NULL values when the “Persons_NotNull” table is created:

dbSendQuery(MySQL,"CREATE TABLE Person_NotNull (
                   ID int NOT NULL,
                   LastName varchar(255) NOT NULL,
                   FirstName varchar(255) NOT NULL,
                   Age int)")

4.2 Unique

The following SQL creates a UNIQUE constraint on the “ID” column when the “Persons” table is created:

dbSendQuery(MySQL,"CREATE TABLE Persons_Unique (ID int NOT NULL UNIQUE,
                   LastName varchar(255) NOT NULL,
                   FirstName varchar(255) NOT NULL,
                   Age int)")

To create a UNIQUE constraint on the “ID” column when the table is already created, use the following SQL:

dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
                   ADD UNIQUE (ID)")

To define a UNIQUE constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
                  ADD CONSTRAINT UNIQUE (ID,LastName)"

To drop a UNIQUE constraint, use the following SQL:

dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
                  DROP INDEX ID")

4.3 Primary Key

The PRIMARY KEY constraint uniquely identifies each record in a table. Primary keys must contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).

dbSendQuery(MySQL,"CREATE TABLE Persons_PK (ID int NOT NULL PRIMARY KEY,
                   LastName varchar(255) NOT NULL,
                   FirstName varchar(255),
                   Age int)")

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,"CREATE TABLE Persons_PK (ID int NOT NULL,
                  LastName varchar(255) NOT NULL,
                  FirstName varchar(255),
                  Age int,
                  CONSTRAINT Persons_PK PRIMARY KEY (ID,LastName))")

To create a PRIMARY KEY constraint on the “ID” column when the table is already created, use the following SQL:

dbSendQuery(MySQL, "ALTER TABLE Persons_PK
                   ADD PRIMARY KEY (ID)")

4.4 Foreign Key

A FOREIGN KEY is a key used to link two tables together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.

Look at the following two tables:

  • “Persons” table:
PersonID LastName FirstName Age
1 Xi Bakti 28
2 Li Chong 23
3 Gou Mei 20
  • “Orders” table:
OrderID OrderNumber PersonID
1 77895 3
2 44678 3

Notice that the “PersonID” column in the “Orders” table points to the “PersonID” column in the “Persons” table.

  • The “PersonID” column in the “Persons” table is the PRIMARY KEY in the “Persons” table.
  • The “PersonID” column in the “Orders” table is a FOREIGN KEY in the “Orders” table.
  • The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
  • The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the table it points to.

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,
"CREATE TABLE Orders (OrderID int NOT NULL,
                     OrderNumber int NOT NULL,
                     PersonID int,
                     CONSTRAINT FOREIGN KEY (PersonID))")

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,
"CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons_pk (PersonID))")

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,
"ALTER TABLE Orders
ADD CONSTRAINT FK_PersonOrder
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)")

4.5 Check

The CHECK constraint is used to limit the value range that can be placed in a column. If you define a CHECK constraint on a single column it allows only certain values for this column. If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row. The following SQL creates a CHECK constraint on the “Age” column when the “Persons” table is created. The CHECK constraint ensures that the age of a person must be 18, or older:

dbSendQuery(MySQL,
"CREATE TABLE Persons (ID int NOT NULL,
                      LastName varchar(255) NOT NULL,
                      FirstName varchar(255),
                      Age int,
                      CHECK (Age>=18))")

To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL,
"CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    City varchar(255),
    CONSTRAINT CHK_Person CHECK (Age>=18 AND City='Sandnes'))")

To create a CHECK constraint on the “Age” column when the table is already created, use the following SQL:

dbSendQuery(MySQL, "ALTER TABLE Persons
                         ADD CHECK (Age>=18)")

To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use the following SQL syntax:

dbSendQuery(MySQL, "ALTER TABLE Persons
                    ADD CONSTRAINT CHK_PersonAge 
                    CHECK (Age>=18 AND City='Sandnes')")

4.6 Default

The DEFAULT constraint is used to provide a default value for a column. The default value will be added to all new records IF no other value is specified. The following SQL sets a DEFAULT value for the “City” column when the “Persons” table is created:

dbSendQuery(MySQL,
"CREATE TABLE Persons_default (ID int NOT NULL,
                              LastName varchar(255) NOT NULL,
                              FirstName varchar(255),
                              Age int,
                              City varchar(255) DEFAULT 'Sandnes')")

To create a DEFAULT constraint on the “City” column when the table is already created, use the following SQL:

dbSendQuery(MySQL,"ALTER TABLE Persons
                        ALTER City SET DEFAULT 'Sandnes'")

4.7 Index

The CREATE INDEX statement is used to create indexes in tables. Indexes are used to retrieve data from the database more quickly than otherwise. The users cannot see the indexes, they are just used to speed up searches/queries. Creates an index on a table. Duplicate values are allowed:

dbSendQuery(MySQL, "CREATE INDEX idx_pname
                   ON Persons (LastName, FirstName)")

Note: Updating a table with indexes takes more time than updating a table without (because the indexes also need an update). So, only create indexes on columns that will be frequently searched against.


4.8 Auto Increment

Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table. Often this is the primary key field that we would like to be created automatically every time a new record is inserted. The following SQL statement defines the “Personid” column to be an auto-increment primary key field in the “Persons” table:

dbSendQuery(MySQL,
"CREATE TABLE Persons_ai (
    Personid int NOT NULL AUTO_INCREMENT,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (Personid))")