In the previous section, we learned how to connect R to a Database
System (SQL) Such as RMariaDB, RMySQL, and RSQLite. In this section, we
continue to cover all that you have to know about fundamental operations
in SQL (Here, focus on RMySQL).
Connecting MySQL
Connecting to MySQL is made very easy with the RMySQL
package. To connect to a MySQL database simply install the package and
load the library.
# install.packages("pacman")
pacman::p_load(DBI, # DBI package helps connecting R to DBMS
odbc, # Connect to ODBC Compatible Databases
RMariaDB, # Database Interface and 'MariaDB' Driver
RMySQL, # Database Interface and 'RMySQL' Driver
RSQLite # Database Interface and 'RSQLite' Driver
)
MySQL <- dbConnect(MySQL(),
user='root',
password='',
dbname='mysql',
host='localhost')
dbListTables(MySQL) # a list of the tables in our connection
Note: Open and your XAMPP, click start on Apache and MySQL.
Then, make sure you have the admin privilege before creating any
database.
Create DB
If you want to create a new database, then the CREATE DATABASE
statement would be as shown below:
dbExecute(MySQL, "CREATE DATABASE factory_db")
The result show us 1
, means that you have succeeded to
create a database.
Drop DB
If you want to delete an existing database, then the DROP DATABASE
statement would be as shown below:
dbExecute(MySQL, "DROP DATABASE factory_db")
The result show us 0
, means that you have succeeded to
remove (Drop) a database.
Create Table
Once you have a database, you can continue to create table as shown
below:
dbExecute(MySQL, "CREATE TABLE Persons(
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255))")
Insert Value
If you are adding values for all the columns of the table, you do not
need to specify the column names in the SQL query. However, make sure
the order of the values is in the same order as the columns in the
table. The INSERT INTO syntax would be as follows:
dbExecute(MySQL,"INSERT INTO Persons(PersonID,LastName,FirstName, Address,City)
VALUES(1,'Siregar','Bakti', 'Jl.Bahagia','Tangerang')")
Truncate Table
The TRUNCATE TABLE statement is used to delete the data inside a
table, but not the table itself.
dbExecute(MySQL, "TRUNCATE TABLE Persons")
Drop Table
The DROP TABLE statement is used to drop an existing table in a
database.
dbExecute(MySQL, "DROP TABLE Persons")
Write Table
The key here is the dbWriteTable
function which allows
us to write an R data frame directly to a database table. The data
frame’s column names will be used as the database table’s fields.
Orders <-read.csv("data/Orders.csv")
dbWriteTable(MySQL, "Orders", Orders, append=T)
Alter Table
The ALTER TABLE statement is used to add, delete, or modify columns
in an existing table. The ALTER TABLE statement is also used to add and
drop various constraints on an existing table.
Add Column
To add a column in a table, use the following syntax:
dbExecute(MySQL, "ALTER TABLE Orders
ADD Email varchar(255)")
Drop Column
To delete a column in a table, use the following syntax (notice that
some database systems don’t allow deleting a column):
dbSendQuery(MySQL, "ALTER TABLE Orders
DROP COLUMN Email")
Modify Column
dbSendQuery(MySQL," ALTER TABLE Orders
MODIFY COLUMN OrderDate date")
Constraints
SQL constraints are used to specify rules for the data in a table.
Constraints are used to limit the type of data that can go into a table.
This ensures the accuracy and reliability of the data in the table. If
there is any violation between the constraint and the data action, the
action is aborted.
Constraints can be column level or table level. Column level
constraints apply to a column, and table level constraints apply to the
whole table. The following constraints are commonly used in SQL:
- NOT NULL: Ensures that a column cannot have a NULL
value
- UNIQUE: Ensures that all values in a column are
different
- PRIMARY KEY: A combination of a NOT NULL and UNIQUE.
Uniquely identifies each row in a table
- FOREIGN KEY: Uniquely identifies a row/record in another
table
- CHECK: Ensures that all values in a column satisfies a
specific condition
- DEFAULT: Sets a default value for a column when no value is
specified
- INDEX: Used to create and retrieve data from the database
very quickly
Not Null
The following SQL ensures that the “ID”, “LastName”, and “FirstName”
columns will NOT accept NULL values when the “Persons_NotNull” table is
created:
dbSendQuery(MySQL,"CREATE TABLE Person_NotNull (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL,
Age int)")
Unique
The following SQL creates a UNIQUE constraint on the “ID” column when
the “Persons” table is created:
dbSendQuery(MySQL,"CREATE TABLE Persons_Unique (ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL,
Age int)")
To create a UNIQUE constraint on the “ID” column when the table is
already created, use the following SQL:
dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
ADD UNIQUE (ID)")
To define a UNIQUE constraint on multiple columns, use the following
SQL syntax:
dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
ADD CONSTRAINT UNIQUE (ID,LastName)"
To drop a UNIQUE constraint, use the following SQL:
dbSendQuery(MySQL,"ALTER TABLE Persons_Unique
DROP INDEX ID")
Primary Key
The PRIMARY KEY constraint uniquely identifies each record in a
table. Primary keys must contain UNIQUE values, and cannot contain NULL
values. A table can have only ONE primary key; and in the table, this
primary key can consist of single or multiple columns (fields).
dbSendQuery(MySQL,"CREATE TABLE Persons_PK (ID int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int)")
To allow naming of a PRIMARY KEY constraint, and for defining a
PRIMARY KEY constraint on multiple columns, use the following SQL
syntax:
dbSendQuery(MySQL,"CREATE TABLE Persons_PK (ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT Persons_PK PRIMARY KEY (ID,LastName))")
To create a PRIMARY KEY constraint on the “ID” column when the table
is already created, use the following SQL:
dbSendQuery(MySQL, "ALTER TABLE Persons_PK
ADD PRIMARY KEY (ID)")
Foreign Key
A FOREIGN KEY is a key used to link two tables together. A FOREIGN
KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table. The table containing the foreign key is
called the child table, and the table containing the candidate key is
called the referenced or parent table.
Look at the following two tables:
1 |
Xi |
Bakti |
28 |
2 |
Li |
Chong |
23 |
3 |
Gou |
Mei |
20 |
Notice that the “PersonID” column in the “Orders” table points to the
“PersonID” column in the “Persons” table.
- The “PersonID” column in the “Persons” table is the PRIMARY KEY in
the “Persons” table.
- The “PersonID” column in the “Orders” table is a FOREIGN KEY in the
“Orders” table.
- The FOREIGN KEY constraint is used to prevent actions that would
destroy links between tables.
- The FOREIGN KEY constraint also prevents invalid data from being
inserted into the foreign key column, because it has to be one of the
values contained in the table it points to.
To allow naming of a FOREIGN KEY constraint, and for defining a
FOREIGN KEY constraint on multiple columns, use the following SQL
syntax:
dbSendQuery(MySQL,
"CREATE TABLE Orders (OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
CONSTRAINT FOREIGN KEY (PersonID))")
To allow naming of a FOREIGN KEY constraint, and for defining a
FOREIGN KEY constraint on multiple columns, use the following SQL
syntax:
dbSendQuery(MySQL,
"CREATE TABLE Orders (
OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (OrderID),
FOREIGN KEY (PersonID) REFERENCES Persons_pk (PersonID))")
To allow naming of a FOREIGN KEY constraint, and for defining a
FOREIGN KEY constraint on multiple columns, use the following SQL
syntax:
dbSendQuery(MySQL,
"ALTER TABLE Orders
ADD CONSTRAINT FK_PersonOrder
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)")
Check
The CHECK constraint is used to limit the value range that can be
placed in a column. If you define a CHECK constraint on a single column
it allows only certain values for this column. If you define a CHECK
constraint on a table it can limit the values in certain columns based
on values in other columns in the row. The following SQL creates a CHECK
constraint on the “Age” column when the “Persons” table is created. The
CHECK constraint ensures that the age of a person must be 18, or
older:
dbSendQuery(MySQL,
"CREATE TABLE Persons (ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CHECK (Age>=18))")
To allow naming of a CHECK constraint, and for defining a CHECK
constraint on multiple columns, use the following SQL syntax:
dbSendQuery(MySQL,
"CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
City varchar(255),
CONSTRAINT CHK_Person CHECK (Age>=18 AND City='Sandnes'))")
To create a CHECK constraint on the “Age” column when the table is
already created, use the following SQL:
dbSendQuery(MySQL, "ALTER TABLE Persons
ADD CHECK (Age>=18)")
To allow naming of a CHECK constraint, and for defining a CHECK
constraint on multiple columns, use the following SQL syntax:
dbSendQuery(MySQL, "ALTER TABLE Persons
ADD CONSTRAINT CHK_PersonAge
CHECK (Age>=18 AND City='Sandnes')")
Default
The DEFAULT constraint is used to provide a default value for a
column. The default value will be added to all new records IF no other
value is specified. The following SQL sets a DEFAULT value for the
“City” column when the “Persons” table is created:
dbSendQuery(MySQL,
"CREATE TABLE Persons_default (ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
City varchar(255) DEFAULT 'Sandnes')")
To create a DEFAULT constraint on the “City” column when the table is
already created, use the following SQL:
dbSendQuery(MySQL,"ALTER TABLE Persons
ALTER City SET DEFAULT 'Sandnes'")
Index
The CREATE INDEX statement is used to create indexes in tables.
Indexes are used to retrieve data from the database more quickly than
otherwise. The users cannot see the indexes, they are just used to speed
up searches/queries. Creates an index on a table. Duplicate values are
allowed:
dbSendQuery(MySQL, "CREATE INDEX idx_pname
ON Persons (LastName, FirstName)")
Note: Updating a table with indexes takes more time than
updating a table without (because the indexes also need an update). So,
only create indexes on columns that will be frequently searched
against.
Auto Increment
Auto-increment allows a unique number to be generated automatically
when a new record is inserted into a table. Often this is the primary
key field that we would like to be created automatically every time a
new record is inserted. The following SQL statement defines the
“Personid” column to be an auto-increment primary key field in the
“Persons” table:
dbSendQuery(MySQL,
"CREATE TABLE Persons_ai (
Personid int NOT NULL AUTO_INCREMENT,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (Personid))")