Email             :
Instagram     : https://www.instagram.com/dsciencelabs
RPubs            : https://rpubs.com/dsciencelabs/
Github           : https://github.com/dsciencelabs/
Telegram       : @dsciencelabs
Department  : Business Statistics
Address         : ARA Center, Matana University Tower
                         Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.




1 Introduction

The real power of a relational database lies in its ability to quickly retrieve and analyze your data by running a query. Queries allow you to pull information from one or more tables based on a set of search conditions you define. In this section, you will learn how to create a simple one-table query.

First, we need to connect to our database. Please type the following code in your R console:

# set up the connection and save it into the workspace
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
library(RMySQL)
library(DBI)
bakti <- dbConnect(RMySQL::MySQL(),
                  dbname='factory_db',
                  username='root',
                  password='', 
                  host='localhost',
                  port=3306)
knitr::opts_chunk$set(connection = "bakti") # to set up the connection in your Rmarkdown chunk

2 SELECT

The SQL SELECT statement is used to fetch the data from a database table which returns this data in the form of a result table. These result tables are called result-sets. The basic syntax of the SELECT statement is as follows:

SELECT column1, column2, columnN 
  FROM table_name;

Assume, column1, column2… are the fields of a table whose values you want to fetch. If you want to fetch some of the fields available in the field, then you can use the following syntax.

SELECT CustomerName, Address, City, Country 
  FROM CUSTOMERS;
Displaying records 1 - 10
CustomerName Address City Country
Alfreds Futterkiste Obere Str. 57 Berlin Germany
Ana Trujillo Emparedados y helados Avda. de la Constitucin 2222 Mxico D.F. Mexico
Antonio Moreno Taquera Mataderos 2312 Mxico D.F. Mexico
Around the Horn 120 Hanover Sq. London UK
Berglunds snabbkp Berguvsvgen 8 Lule Sweden
Blauer See Delikatessen Forsterstr. 57 Mannheim Germany
Blondel pre et fils 24, place Klber Strasbourg France
Blido Comidas preparadas C/ Araquil, 67 Madrid Spain
Bon app’ 12, rue des Bouchers Marseille France
Bottom-Dollar Marketse 23 Tsawassen Blvd. Tsawassen Canada

If you want to fetch all the fields of the CUSTOMERS table, then you should use the following query.

SELECT *
  FROM CUSTOMERS;
Displaying records 1 - 10
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitucin 2222 Mxico D.F. 5021 Mexico
3 Antonio Moreno Taquera Antonio Moreno Mataderos 2312 Mxico D.F. 5023 Mexico
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbkp Christina Berglund Berguvsvgen 8 Lule S-958 22 Sweden
6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 Blondel pre et fils Frdrique Citeaux 24, place Klber Strasbourg 67000 France
8 Blido Comidas preparadas Martn Sommer C/ Araquil, 67 Madrid 28023 Spain
9 Bon app’ Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
10 Bottom-Dollar Marketse Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen T2F 8M4 Canada

3 DISTINCT

The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate all the duplicate records and fetching only unique records. There may be a situation when you have multiple duplicate records in a table. While fetching such records, it makes more sense to fetch only those unique records instead of fetching duplicate records.

The basic syntax of DISTINCT keyword to eliminate the duplicate records is as follows:

SELECT DISTINCT column_name
  FROM table_name

Now, let us use the DISTINCT keyword with the above SELECT query and then see the result.

SELECT DISTINCT Country   
  FROM customers;
Displaying records 1 - 10
Country
Germany
Mexico
UK
Sweden
France
Spain
Canada
Argentina
Switzerland
Brazil

4 WHERE

The WHERE clause is used to filter records. The WHERE clause is used to extract only those records that fulfill a specified condition.

SELECT column1, column2, ...
  FROM table_name
    WHERE [condition];

Note: The WHERE clause is not only used in SELECT statement, it is also used in UPDATE, DELETE statement, etc.!

The following SQL statement selects all the customers from the country “Mexico”, in the “Customers” table:

SELECT * 
  FROM Customers
    WHERE Country='Mexico';
5 records
CustomerID CustomerName ContactName Address City PostalCode Country
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitucin 2222 Mxico D.F. 5021 Mexico
3 Antonio Moreno Taquera Antonio Moreno Mataderos 2312 Mxico D.F. 5023 Mexico
13 Centro comercial Moctezuma Francisco Chang Sierras de Granada 9993 Mxico D.F. 5022 Mexico
58 Pericles Comidas clsicas Guillermo Fernndez Calle Dr. Jorge Cash 321 Mxico D.F. 5033 Mexico
80 Tortuga Restaurante Miguel Angel Paolino Avda. Azteca 123 Mxico D.F. 5033 Mexico

The following operators can be used in the WHERE clause, please try it by your self!

Operator Description
= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> Not equal. Note: In some versions of SQL this operator may be written as !=
IS NULL or IS NOT NUL A field with a NULL value is a field with no value.
BETWEEN Between a certain range
LIKE Search for a pattern
IN To specify multiple possible values for a column

5 BETWEEN

The BETWEEN operator selects values within a given range. The values can be numbers, text, or dates. The BETWEEN operator is inclusive: begin and end values are included.

SELECT column_name(s)
  FROM table_name
    WHERE column_name 
      BETWEEN value1 AND value2;

The following SQL statement selects all products with a price BETWEEN 10 and 20:

SELECT * 
  FROM Products
    WHERE Price 
      BETWEEN 10 AND 20;
Displaying records 1 - 10
ProductID ProductName SupplierID CategoryID Unit Price
1 Chais 1 1 10 boxes x 20 bags 18.00
2 Chang 1 1 24 - 12 oz bottles 19.00
3 Aniseed Syrup 1 2 12 - 550 ml bottles 10.00
15 Genen Shouyu 6 2 24 - 250 ml bottles 15.50
16 Pavlova 7 3 32 - 500 g boxes 17.45
21 Sir Rodney’s Scones 8 3 24 pkgs. x 4 pieces 10.00
25 NuNuCa Nu-Nougat-Creme 11 3 20 - 450 g glasses 14.00
31 Gorgonzola Telino 14 4 12 - 100 g pkgs 12.50
34 Sasquatch Ale 16 1 24 - 12 oz bottles 14.00
35 Steeleye Stout 16 1 24 - 12 oz bottles 18.00

The following SQL statement selects all orders with an OrderDate BETWEEN ‘01-July-1996’ and ‘31-July-1996’:

SELECT * 
  FROM Orders
    WHERE OrderDate 
      BETWEEN '1996-07-01' AND '1996-07-31';
Displaying records 1 - 10
OrderID CustomerID EmployeeID OrderDate ShipperID
10248 90 5 1996-07-04 3
10249 81 6 1996-07-05 1
10250 34 4 1996-07-08 2
10251 84 3 1996-07-08 1
10252 76 4 1996-07-09 2
10253 34 3 1996-07-10 2
10254 14 5 1996-07-11 2
10255 68 9 1996-07-12 3
10256 88 3 1996-07-15 2
10257 35 4 1996-07-16 3

6 IN

The IN operator allows you to specify multiple values in a WHERE clause. The IN operator is a shorthand for multiple OR conditions.

SELECT column_name(s)
  FROM table_name
    WHERE column_name 
      IN (SELECT STATEMENT);

The following SQL statement selects all customers that are located in “Germany”, “France” or “UK”:

SELECT * 
  FROM Customers
    WHERE Country 
      IN ('Germany', 'France', 'UK');
Displaying records 1 - 10
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 Blondel pre et fils Frdrique Citeaux 24, place Klber Strasbourg 67000 France
9 Bon app’ Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
11 B’s Beverages Victoria Ashworth Fauntleroy Circus London EC2 5NT UK
16 Consolidated Holdings Elizabeth Brown Berkeley Gardens 12 Brewery London WX1 6LT UK
17 Drachenblut Delikatessend Sven Ottlieb Walserweg 21 Aachen 52066 Germany
18 Du monde entier Janine Labrune 67, rue des Cinquante Otages Nantes 44000 France
19 Eastern Connection Ann Devon 35 King George London WX3 6FW UK

The following SQL statement selects all customers that are from the same countries as the suppliers:

SELECT * 
  FROM Customers
    WHERE Country 
      IN (SELECT Country FROM Suppliers);
Displaying records 1 - 10
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbkp Christina Berglund Berguvsvgen 8 Lule S-958 22 Sweden
6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 Blondel pre et fils Frdrique Citeaux 24, place Klber Strasbourg 67000 France
8 Blido Comidas preparadas Martn Sommer C/ Araquil, 67 Madrid 28023 Spain
9 Bon app’ Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
10 Bottom-Dollar Marketse Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen T2F 8M4 Canada
11 B’s Beverages Victoria Ashworth Fauntleroy Circus London EC2 5NT UK
15 Comrcio Mineiro Pedro Afonso Av. dos Lusadas, 23 So Paulo 05432-043 Brazil

7 LIKE

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. There are two wildcards often used in conjunction with the LIKE operator:

SELECT column1, column2, ...
  FROM table_name
    WHERE columnN LIKE pattern;
  • % : The percent sign represents zero, one, or multiple characters
  • _ : The underscore represents a single character

The following SQL statement selects all customers with a CustomerName starting with “a”:

SELECT * 
  FROM Customers
    WHERE CustomerName 
      LIKE 'a%';
4 records
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitucin 2222 Mxico D.F. 5021 Mexico
3 Antonio Moreno Taquera Antonio Moreno Mataderos 2312 Mxico D.F. 5023 Mexico
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK

Here are some examples showing different LIKE operators with ‘%’ and ’_’ wildcards:

LIKE Operator Description
WHERE CustomerName LIKE ‘a%’ Finds any values that start with “a”
WHERE CustomerName LIKE ‘%a’ Finds any values that end with “a”
WHERE CustomerName LIKE ‘%or%’ Finds any values that have “or” in any position
WHERE CustomerName LIKE ’_r%’ Finds any values that have “r” in the second position
WHERE CustomerName LIKE ‘a_%’ Finds any values that start with “a” and are at least 2 characters in length
WHERE CustomerName LIKE ’a__%’ Finds any values that start with “a” and are at least 3 characters in length
WHERE ContactName LIKE ‘a%o’

8 AND, OR and NOT

The WHERE clause can be combined with AND, OR, and NOT operators. The AND and OR operators are used to filter records based on more than one condition:

  • The AND operator displays a record if all the conditions separated by AND are TRUE.
  • The OR operator displays a record if any of the conditions separated by OR is TRUE.
  • The NOT operator displays a record if the condition(s) is NOT TRUE.
SELECT column1, column2, ...
  FROM table_name
    WHERE condition1 AND condition2 OR condition3 NOT condition4;

The following SQL statement selects all fields from “Customers” where country is “Germany” AND city must be “Berlin” OR “München” (use parenthesis to form complex expressions):

SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München');
1 records
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany

Let see one more example, the following SQL statement selects all fields from “Customers” where country is NOT “Germany” and NOT “USA”:

SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München');
1 records
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany

9 ORDER BY

The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more columns. Some databases sort the query results in an ascending order by default. The basic syntax of the ORDER BY clause is as follows:

SELECT column-list 
  FROM table_name 
    [WHERE condition] 
      [ORDER BY column1, column2, .. columnN] [ASC | DESC];
  • By default ORDER BY sorts the data in ascending order.
  • We can use the keyword DESC to sort the data in descending order and the keyword ASC to sort in ascending order.

You can use more than one column in the ORDER BY clause. Make sure whatever column you are using to sort that column should be in the column-list.The following code block has an example, which would sort the result in an ascending order by the City and the Country:

SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München')
  ORDER BY Country, City;
1 records
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany

10 LIMIT

If there are a large number of tuples satisfying the query conditions, it might be resourceful to view only a handful of them at a time.

  • The LIMIT clause is used to set an upper limit on the number of tuples returned by SQL.
  • It is important to note that this clause is not supported by all SQL versions.
  • The LIMIT clause can also be specified using the SQL 2008 OFFSET/FETCH FIRST clauses.
  • The limit/offset expressions must be a non-negative integer.
SELECT column-list 
  FROM table_name 
    [WHERE condition] 
      [ORDER BY column1, column2, .. columnN] [ASC | DESC]
        LIMIT rows_to_skip, next_rows_to_skip;
          

The following illustrates the LIMIT clauses to collect TOP 3 rows:

SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München')
  ORDER BY Country, City
      LIMIT 3;
1 records
CustomerID CustomerName ContactName Address City PostalCode Country
1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany

Next, the following illustrates the LIMIT clauses to collect TOP 5 rows after TOP 3 rows:

SELECT CustomerName, Address, City, Country 
  FROM customers
    ORDER BY City, Country DESC
      LIMIT 3, 5;
5 records
CustomerName Address City Country
Old World Delicatessen 2743 Bering St. Anchorage USA
Galera del gastrnomo Rambla de Catalua, 23 Barcelona Spain
LILA-Supermercado Carrera 52 con Ave. Bolvar #65-98 Llano Largo Barquisimeto Venezuela
Magazzini Alimentari Riuniti Via Ludovico il Moro 22 Bergamo Italy
Alfreds Futterkiste Obere Str. 57 Berlin Germany

11 MIN and MAX

The MIN() function returns the smallest value of the selected column. The MAX() function returns the largest value of the selected column.

SELECT MIN/MAX(column_name)
  FROM table_name
    WHERE condition;

The following SQL statement finds the price of the cheapest product:

SELECT MIN(Price) AS SmallestPrice
  FROM Products;
1 records
SmallestPrice
2.5

The following SQL statement finds the price of the most expensive product:

SELECT MAX(Price) AS LargestPrice
  FROM Products;
1 records
LargestPrice
263.5

12 COUNT, SUM, and AVG

The COUNT() function returns the number of rows that matches a specified criterion. The AVG() function returns the average value of a numeric column. The SUM() function returns the total sum of a numeric column.

SELECT COUNT/SUM/AVG(column_name)
FROM table_name
WHERE condition;

The following SQL statement finds the average price of all products:

SELECT AVG(Price)
  FROM Products;
1 records
AVG(Price)
28.86636

Note: Please try other functions, to get more convenient with SQL!

13 HAVING

The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.

SELECT column_name(s)
  FROM table_name
    WHERE condition
      GROUP BY column_name(s)
        HAVING condition
          ORDER BY column_name(s);

The following SQL statement lists the number of customers in each country, sorted high to low (Only include countries with more than 5 customers):

SELECT COUNT(CustomerID), Country
  FROM Customers
    GROUP BY Country
      HAVING COUNT(CustomerID) > 5
        ORDER BY COUNT(CustomerID) DESC;
5 records
COUNT(CustomerID) Country
13 USA
11 Germany
11 France
9 Brazil
7 UK

14 CASE

The CASE statement goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause.

If there is no ELSE part and no conditions are true, it returns NULL.

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN conditionN THEN resultN
    ELSE result
END;

The following SQL goes through conditions and returns a value when the first condition is met:

SELECT OrderID, Quantity,
CASE
    WHEN Quantity > 30 THEN 'The quantity is greater than 30'
    WHEN Quantity = 30 THEN 'The quantity is 30'
    ELSE 'The quantity is under 30'
END AS QuantityText
FROM OrderDetails;
Displaying records 1 - 10
OrderID Quantity QuantityText
10248 12 The quantity is under 30
10248 10 The quantity is under 30
10248 5 The quantity is under 30
10249 9 The quantity is under 30
10249 40 The quantity is greater than 30
10250 10 The quantity is under 30
10250 35 The quantity is greater than 30
10250 15 The quantity is under 30
10251 6 The quantity is under 30
10251 15 The quantity is under 30

The following SQL will order the customers by City. However, if City is NULL, then order by Country:

SELECT CustomerName, City, Country
FROM Customers
ORDER BY
(CASE
    WHEN City IS NULL THEN Country
    ELSE City
END);
Displaying records 1 - 10
CustomerName City Country
Vaffeljernet rhus Denmark
Drachenblut Delikatessend Aachen Germany
Rattlesnake Canyon Grocery Albuquerque USA
Old World Delicatessen Anchorage USA
Galera del gastrnomo Barcelona Spain
LILA-Supermercado Barquisimeto Venezuela
Magazzini Alimentari Riuniti Bergamo Italy
Alfreds Futterkiste Berlin Germany
Chop-suey Chinese Bern Switzerland
Save-a-lot Markets Boise USA

15 Exercise

  • Select Some attributes of suppliers in alphabetical order!
  • Some attributes of suppliers in reverse alphabetical order!
  • Some attributes of suppliers ordered by country, then by city!
  • All atributes of suppliers and reverse alphabetical ordered by country, then by city!
  • All orders, sorted by total amount, the largest first!
  • Get all but the 10 most expensive products sorted by price!
  • Get the 10th to 15th most expensive products sorted by price!
  • List all supplier countries in alphabetical order!
  • Find the cheapest product and Expensive Orders!
  • Find the number of Supplier USA!
  • Compute the total Quantity of orderitem!
  • Compute the average UnitPrice of all product!
  • Get all information about customer named Thomas Hardy!
  • List all customers from Spain or France!
  • List all customers that are not from the USA!
  • List all orders that not between $50 and $15000!
  • List all products between $10 and $20
  • List all products not between $10 and $100 sorted by price!
  • Get the list of orders and amount sold between 1996 Jan 01 and 1996 Des 31!
  • List all suppliers from the USA, UK, OR Japan!
  • List all products that are not exactly $10, $20, $30, $40, or $50!
  • List all customers that are from the same countries as the suppliers!
  • List all products that start with ‘Cha’ or ‘Chan’ and have one more character!
  • List all suppliers that do have a fax number!
  • List all customer with average orders between $1000 and $1200 !
  • List total customers in each country.
  • Display results with easy to understand column headers.
  • Measure the average order of product names from each country and order it from max to min.
  • Compare the average order of product names from each country in the year 1996 vs 1997 order it from max to min.