Introduction

This is by no means a comprhensive guide to learning SQL. Rather, it gets straight to the point of learning the syntax (i.e. how to retrieve data, sorting data, concepts, etc.). This is a great resource for those who want to learn SQL relatively quickly. If you wish to truly master SQL, then I recommend purchasing more cmomprehnsive books. In this tutorial, you will learn

1. Different SQL syntaxes.
2. Brief overview of concepts.
3. Best practices.

I have read quite a few SQL books, and they can be daunting to those who have never worked with SQL before. This is a perfect guidie, in my opinion, to get straight to the point, and I firmly believe that the concepts covered in this tutorial are sufficient to understanding the basics of SQL and even enough hopefully land you a job that requires the skill.

NOTE: The syntax used in the examples correspond to Oracle MySQL.The syntax may differ depending on which SQL vendor you use.

1.Overview

What is SQL? SQL stans for structured query language, and it is the language that is commonly used in relatinal database management systems (RDBMS). As the name implies, SQL is a language, and it is by far the most easiest to understand programming language, as it is very intuitive.

There are a few concepts one must know about SQL and relational databases. I won’t go into too much detail on relational databases, as it is not within the scope of this tutorial. Essesntially, a relational database is a databae management system that connects logically connects tables. To get a basic understanding of SQL, it is imparative that we go over a few concepts in order to better understand the language.

1.1 The Relational Database

In its most simplest terms, a relational database is a database in which tables are linked together using some form of logic. A typical relational database mnight look something like

The RDBMS Design

The RDBMS Design

where each square represents a table and the arrows that point to different squares indicates a relationship between one or more tables.

1.2 Relational Database Concepts

As previously mentioned, SQL is the language used for relational databases. A database is essentially a container of file(s) used to store organized data.

A relational database management system (RDBMS) consists of a few concepts. An RDMS typically consists of

A table is a strucutred list of data or specific data type. Tables have properties that describe how data is stored in them. These range from informatin about what data may be stored, how it is broken up, how individiual peices of information are named, and many more.

A schema is the set of ifnormation that describes a table and they are usued to describe the specifci tables within a database, as well as entire databases. In short, a schema is information about database and table layout and properties.

A column or (field) is a single field in a table. Every table is made up of one ore more columns. One thing to note about a column is that it must be of the same data type. That is, if a column is of a numeric type (e.g. int), then SQL will not allow a string type for that column.

A row is a record in a table. A row, or record, is where data is stored.

A prmary key is a column, or set of columns, whose values uniquely identifies every row in a table. As a general rule and best practice, one shoudl aways define primary keys. Some notes about primary keys

  1. The primary key is used to refer to a specific row/record. 
  2. No two rows can have the same primary key value. That is, each primary key for a row must be unique. 
  3. Values in primary key columns should never be modified or updated.
  4. Primary key values should never be reused.
  

A clause usually consists of a keyword and supplied data. An example of a clause is the FROM clause, which, as we we will see, comes after the SELECT statement. Note that some clauses are optional while otheres are required. The following is an exmaple of a FROM clause

  SELECT prod_id
  FROM Products;    -- "FROM" is a SQL clause that is required. Without it, SQL does not know where to retrieve the data.
  

2. Retrieving Data

Retrieving data is the core of any SQL job. To retrieve data, one uses the SELECT statement. Theh SELECT statement, at the very minimum, specifies two pieces of information – what you want to select (i.e. the columns or fields) and where you want to select it from (e.g. a table or view).

2.1 Retrieving Individual Columns

The most simple SELECT statement retrieves individual columns. The syntaxd for doing so is as follows

  SELECT column_name 
  FROM table_name;

A more concrete example might be to select the product names from the Products table, in which case the syntax is

  SELECT prod_name
  FROM Products;
  

2.2 Retrieving Multiple Columns

Retrieving multiple columns is very similar to retrieving a single column. Rather than specifying only one column, all you need to do is specify the columns you want to retrieve. For example, the following command retrieves multiple columns from a single table (theh Products table).

  SELECT prod_id, prod_name, prod_price
  FROM Products;
  

2.3 Retrieving All Columns From a Single Table

Retrieving all columns from a table requires the use of the “*" wildcard. The “*" wildcard tells SQL to retrieve all columns from a single table, as shown below

  SELECT * 
  FROM Products;
  

2.4 Retrieviing Distinct Rows

When using the SELECT statement, all records are retrieved. However, if you want only want unique records, you can use the DISTINCT keyword to specify that only distinct values should be returned (NOTE: the syntax might differ depending on the SQL vendor you’re using).

  SELECT prod_id
  FROM Products;
  

The above statement, as we have seen before, will return all product IDs. For example, there could be two product IDs with a product ID of BRS01. If, however, you wanted to retrieve only distinct product IDs, the syntax would be

  SELECT DISTINCT prod_id
  FROM Products;
  

The above command retrieves only distinct records. Thus, only one product ID of BRS01 will be returned. There is one quick note about retrieving distinct records. Using the DISTINCT statement can’t be used partially. That is, if more than one column is retrieved with the DISTINCT statement, say prod_id and prod_price, both distinct records of prod_id and prod_price will be retrieved; not just distinct records of prod_id.

2.5 Limiting Results

Sometimes you may not want SQL to retrieve all records. To do so, you can use the LIMIT statement to limit the number of results to a specified number (note: the syntax for limiting results may differ depending on the SQL vendor you’re using). Generally speaking, in MySQL, to limit the number of results returned, you can use

  SELECT prod_name
  FROM Products
  LIMIT 5;
  

This limits the results to five records. Note that the LIMIT statement usually comes at the end of a query.

# Wrong. Will result in error.
SELECT prod_price
FROM Products
LIMIT 10
ORDER BY prod_price;

# Correct
SELECT prod_price
FROM Products
ORDER BY prod_price
LIMIT 10;

2.6 Using Comments

Comments can be extremely useful especially if you’re working on a team or with more than one person. It reduces the ambiguity of your code. There are multiple ways to make comments in SQL (MySQL in particular)

  SELECT prod_name    -- this is a comment
  FROM Products;
  
  # This is a comment
  SELECT prod_name
  FROM Products;
  
  /* This is also a comment */
  SELECT prod_name
  FROM Products;
  
  

3. Sorting Results

When using the SELECT statement, SQL retrieves data in no particular order. To sort results, one must use the ORDER BY statement to explicitly tell SQL how you want the results to be sorted. For example, if you want to retrieve data and sort them in alphabetical order, the syntax would be as follows

  SELECT prod_name
  FROM Products
  ORDER BY prod_name;
  

You can also sort results in descending order using the DESC statement after the ORDER BY statement, as in the following case

  SELECT prod_name
  FROM Products
  ORDER BY prod_name DESC;
  

The abaove command differs from the previous one in that it sorts the results from Z to A, as opposed to A to Z. In dealing with numerics, using the DESC statement will sort by largest to smallest, and omitting the DESC statement will sort the results from smallest to largest.

  # Sorts by price from smallest to largest
  SELECT prod_price
  FROM Products
  ORDER BY prod_price
  
  # Sorts by price from largest to smallest
  SELECT prod_price
  FROM Products
  ORDER BY prod_price DESC;
  

3.1 Sorting by Multiple Columns

In the same way you can order results using multiple columns, you can also do the same when sorting results. The syntax is as follows

  SELECT prod_id, prod_price, prod_name
  FROM Products
  ORDER BY prod_price, prod_name;
  

3.2 Sorting Columns by Column Position

Another way of sorting columns is by sorting them by position. To understand how this works, let’s first take a look at the syntax

  SELECT prod_id, prod_price, prod_name
  FROM Products
  ORDER BY 2, 3;
  

The above command is the exact same command as explicitly stating which columns are to be sorted. However, isntead of explicitly stating which columns to sort by, we specify the relative position of each column. Thus, ORDER BY 2, 3 means that you want to order the results by prod_price and prod_name, since prod_price is in position 2 and prod_name is in position 3. The primary advantage of using this technique is that it saves retyping the column names. However, there are also some disadvantages in using this technique. For one, not explicitly listingn the column names increases the probability of mistakenly specifying the wrong column. Second, it is all too easy to mistakenly reorder data when making changes to the SELECT list. Finally, you cannot obviousuly use this technique when sorting by columns that are not in the SELECT list (i.e. you wouldn’t be able to do ORDER BY 4, 5, since there are only three columns in the SELECT list). Generally speaking, try to avoid sorting columns by column position (although this is just a matter of preference).

4. Filtering Data

Database tables typically contain large amounts of data, and seldom does one need to retrieve all the rows within a table. Retrieving just the data you want involves specifying search criteria, or what is also known as a filter condition.

4.1 Using the WHERE clause

You can filter data using the WHERE clause after the SELECT statement as shown below

  SELECT prod_name, prod_price
  FROM Products
  WHERE prod_price = 3.49;
  

In the above command, the results that are retrieved are specified by the WHERE clause, which states that only product prices equal to 3.49 be retrieved.

4.2 Checking Against a Single Value

You can also check against a signle value using mathematical operators.

  SELECT prod_name, prod_price
  FROM Products
  WHERE prod_price < 10;
  

The above command retrieves any record where the product price is less than ten.

4.3 Checking for a Range of Values

In addition to checking against a single value, you can also check for a range of values using the BETWEEN operator.

  SELECT prod_name, prod_price
  FROM Products
  WHERE prod_price BETWEEN 5 AND 10;
  

From the syntax above, the resulting query set will include all product prices between 5 and 10.

4.4 Checking for NULL values

A NULL value is defined as having no value. It is not to be confused with a field containing 0 or an empty string, or just spaces. To check for NULL values,

  SELECT prod_name
  FROM Products
  WHERE prod_price IS NULL;
  

The above command selects product names where the corresponding product price is NULL (i.e. contains no value).

5. Advanced Data Filtering

5.1 Combining WHERE clauses

SQL allows you to specify more than just one WHERE clause. These clauses are typically used in two ways: using the AND operator or the OR operator.

5.2 The AND operator

To filter by more than one column, you use the AND operator to append conditions to your WHERE clause, such as the following

  SELECT prod_id, prod_price, prod_name
  FROM Products
  WHERE vend_id = 'DLL01' AND prod_price <= 4;
  

The above syntax tells SQL to retrieve results where the vendor ID is DLL01 and the product price is less than or equal to 4.

5.3 The OR operator

The OR operator is the opposite of the AND operator. Whereas the AND operator will only retrieve results if both conditions are met, the OR operator will retrieve results if only one of the conditions are met. For example

  SELECT prod_name, prod_price
  FROM Products
  WHERE vend_id = 'DLL01' OR vend_id = 'BRS01';
  

Here, if SQL finds either vendor ID DLL01 or BRS01, it will retrieve the results. If one were to usue the AND operator, no results might be retrieved if neither DLL01 or BRS01 vendor ID exists. In other words, when using the OR operator, SQL only needs to satisfy one part of the OR condition in order to retrieve the results.

5.4 The Order of Evaluation

While it is perfectly legal to combine both the AND operator and OR operator in your query, one must be careful in writing the query. That is, order matters when combining the two operators. When using both operators, a condition surrounded by parantheses has a higher order of evaluation than either AND or OR operators. The database management system filters the OR condition within the parantheses first. When using both operators, it is best practice to always use parantheses in your WHERE clauses so as to avoid mistakes. For example

  # Using the AND and OR operator without parantheses
  SELECT prod_name, prod_price
  FROM Products
  WHERE 
      vend_id = 'DLL01' OR vend_id = 'BRS01' AND
      prod_price >= 10;
      
  # Using parantheses
  SELECT prod_name, prod_price
  FROM Products
  WHERE
      (vend_id = 'DLL01' OR vend_id = 'BRS01') AND
      prod_price >= 10;
      

The two syntaxes will retrieve different results. Note that the columns and tables we have been using so far are actual tables and columns. Thus, in the first syntax, the result query will be

prod_name prod_price
Fish bean bag toy 3.49
Bird bean bag toy 3.49
Rabbit bean bag toy 3.49
18 inch teddy bear 11.99
Raggedy Ann 4.99

In the second syntax however, the resulting query set will look something like the following

prod_name prod_price
18 inch teddy bear 11.99

Depending on what you are wanting to query, the first result maybe the result you were looking for. However, as best practice, use parantheses whenever you use the AND and OR operators simultaneously, especially if the query tends to get trickier and longer. This can save you a lot of time and headaches.

5.5 The IN operator

The IN operator is used to specify a range of conditions. Any of these conditions can be matched. The IN operator takes a commma-delimited list of valid values, all enclosed within parantheses like so

  SELECT prod_name, prod_price
  FROM Products
  WHERE
      vend_id IN ('DLL01', 'BRS01')
  ORDER BY prod_name;  -- The ORDER statement isn't necessary. 
  

For the astute, or experienced SQL programmer, you’ll notice that the above syntax is not only similar to using the OR operator, but it can also be re-written as such.

  SELECT prod_name, prod_price
  FROM Products
      WHERE vend_id = 'DLL01' OR vend_id = 'BRS01'
  ORDER BY prod_name;
  

The above syntax accomplishes the same thing as the IN operator.

5.6 Wildcard Filtering

Suppose you wanted to find any song with the word “baby” in it. Based on what we’ve covered so far, you could technically create a query such as

  SELECT song
  FROM Songs
  WHERE
      song = 'baby';
      

Depending on the schema, this approach isn’t necessarily wrong. However, the limitation to using this kind of search is that it searches for songs where the is exactly titled “baby.” What if there were multiple songs with the word “baby” in them, and you wanted to retrieve all such songs? Obviously the above syntax would not work. This is where wildcard filtering can fix the issue. A wildcard filter, unlike a regular filter, can search for any word that is partial or not partial. A wildcard uses special characters to match parts of a value. That is, if there is a song called, “Damn, you a fine ass baby,” you can use wildcard filtering to include that song in your results. When using wildcard filtering, the LIKE operator is often used. The LIKE operator is equivalent to partial string matching. Thus, using the wildcard filtering option, you could technically find all songs that contain the word “baby” by using the following syntax

  SELECT song
  FROM Songs
  WHERE
      song LIKE '%baby%';
      

In the above example, any song containing the word “baby” would be returned. Notice that a % is typically used during wildcard filtering. If one % is only used, and it appears after the word to be searched, then the wildcard searches any song that contains the letters preceding the % symbol. Likewise, any letter after the wildcard symbol searches for any song before the wildcard symbol. For example, using the same example anove, we could also write our query as

THE above query searches for any song containing the word “baby.” Because the letters b,a,b, and y appear before the wildcard, SQL will retrieve any record that begins with those letters. For example,

  SELECT song
  FROM songs
  WHERE
      song LIKE 'baby%';
      

This approach may be useful if you’re unsure of how to correctly spell a particular item. One thing to note in using the above method is that using the LIKE operator only searches for words that start with the expression you give it. Thus, if “Kanye” does not appear first, the query will not return any results. Using two %’s WILL search for any record in which the word you’re looking for appears. As an example, let’s look for a product name called “Raggedy Ann” using the LIKE operator:

# Does not return any ressults
SELECT prod_name
FROM Products
WHERE
    prod_name LIKE 'aggedy%';
    
# Will return results
SELECT prod_name
FROM Products
WHERE
    prod_name LIKE '%aggedy%';
    

In the first example, the query will not return any results for products named Raggedy Ann because SQL searches for product names that begin with “aggedy.” However, in the second example, SQL will search for any record that contains the word “aggedy.”

In the same way you can search for records that begin with a particular set of letters, you can also search for records that end with a particular set of letters. You can do achieve this by putting the % in front of the expression, as shown below

  SELECT prod_name
  FROM Products
  WHERE
      prod_name LIKE '%ann';
      

Using our “Raggedy Ann” example, SQL will return a result because “ann” appears at the end of the product name. And much like placing the % after the expression, placing it before will only search for records that end in that particular expression. Thus, if the product you were searching for was “Raggedy Ann Doll,” SQL would not return any results, since the product name ends in the word “Doll.”

Another use of the % operator is to search for a record that begins and ends with a particular letter. For example, suppose you wanted to search for the product “Fish bean bag toy.” The product name begins with “F” and ends in “y.” You already know that a % at the end of an expression searches for any record that matches the letters in front of the operator, and conversely, anything after a % searches for all the letters that match after the operator. Using our “fish bean bag toy” example, let’s see how we can use the % operator to search for this product name.

  SELECT prod_name
  FROM Products
  WHERE
      prod_name LIKE 'F%y';
      

The result set will consist of any product that starts with an F and ends in a y.

Lastly, there are two important considerations regarding the use of %. For one, the % can also match zero characters. This is because it represents zero, one, or more characters at the specified location in your search pattern. Secondly, You cannot use % to search for NULL values. For example, take a look at the following code

  # The wrong way to retrieve NULL values.
  SELECT cust_name, cust_email
  FROM Customers
  WHERE 
      cust_email LIKE '%';
      
  # The correct way to retrieve NULL values
  SELECT cust_name, cust_email
  FROM Customers
  WHERE
      cust_email IS NULL;
      

Each syntax will retrieve different results. For the first query, the result set is

cust_name cust_email
Village Toys sales@villagetoys.com
Fun4All jjones@fun4all.com
Fun4All dstephens@fun4all.com

Notice that the query result did not return emails in which the values were NULL. However, if the second syntax produces different results.

cust_name cust_email
Kids Place NULL
The Toy Store NULL

This is obviously the correct syntax if your intention is to retrieve customers who have a NULL value in the email field.

5.7 The Underscore (_) Wildcard

The underscore wildicard is another useful wildcard that can be used to retrieve specific records. It operates very similar to the % wildcard, except that instead of matching multiple characters, the underscore wildcard only matches a single character. Let us examine how this works.

  SELECT prod_id, prod_name
  FROM Products
  WHERE 
    prod_name LIKE '_ inch teddy bear';
  

The query result will return the product titled “8 inch teddy bear.” Why? Because only one underscore was used in the condition. What if the teddy bear you were looking for was ten inches or greater? Well, all you need to do is add an extra underscore, like so

  SELECT prod_id, prod_name
  FROM Products
  WHERE
    prod_name LIKE '__ inch teddy bear';
    

Because you are now using two underscores versus one, SQL will search for any character(s) that are of length two. Let’s compare the resulting outputs. For the first query which uses only one underscore, the result is

prod_id prod_name
BR01 8 inch teddy bear

For the second query, the resulting output is

prod_id prod_name
BR02 12 inch teddy bear
BR03 18 inch teddy bear

The reason that the 8 inch teddy bear did not appear in the second ressults is because we explicitly used two underscores for our search pattern. Because 8 contains only one digit, it is omitted from the results. Similarly, anything greater than two digits will also be omitted.

Tips on Using Wildcards

Hopefully you’ve noticed just how powerful and useful wildcards can be. However, there is no such thing as a free lunch. That is, one mustn’t be too liberal in using wildcards because they typically take a longer time to process than the other search types that we’ve discussed so far. Here are a few tips on how to efficiently use wildcards.

  1. Do not overuse them. If there is another search operator that can accomplish the same thing, use that one instead.
  2. If and when you do use wildcards, be careful to not use them at the beginning of the search pattern unless it is absolutely necessary. The reason being is that search patterns that begin with wildcards are the slowest to process.
  3. Pay close attention to the placement of your wildcard symbols. If they are misplaced, you might not get the results you intended to retrieve.

7. Creating Calculated Fields

One of the nice things about SQL is the ability to calculate fields. Say you had an employee table with a column for each employee’s first name and last name. Your boss wants a report where the full name is included. You can do this by concatenating the first and last name column. Other common usues for calculating fields are taking the sums, averages, or other mathematical calculations. It is also important to note that calculated fields don’t actually exist in database tables. Rather, they are created on the fly within a SQL select statement, like the following

  SELECT cust_id, cust_name CONCAT(cust_city, " ", ",", cust_state) AS city_state
  FRROM Customers;
  

Here the calculated field is the city_state field. We used the CONCAT() function to concatenate the city and the state of the customer.

7.1 Mathematical Operations

Calculating mathematicaloperations is very common in SQL.As previously mentioned, you can take the sum, average, median, and a bunch of mathematical operations. Let’s take a look at a calculated field that takes advantage of a very simple mathematical operation of multiplying two columns.

  SELECT
      prod_id,
      quantity,
      item_price,
      (quntity * item_price) AS expanded_price
  FROM OrderItems
  WHERE
      order_num = 20008;
      

Here, the calculated field is expanded_price, which is calculated by multiplying the quantity column by the item_price column. The resulting set is

prod_id quantity item_price expanded_price
RGAN01 5 4.99 24.95
BR03 5 11.99 59.95
BNBG01 10 3.49 34.90
BNBG02 10 3.49 34.90
BNBG03 10 3.49 34.90

8. Data Manipulation Functions

Much like other programming languages, SQL also supports the use of functions to manipulate data. These functions are operations that are usually performed on data with the general purpose of facilitating conversion and manipulation. One very important thing to note is that data manipulation functions are quite DBMS specific. That is, depending on the SQL vendor you are using, functions might vary by name, or in some cases, they may not even be supported at all. SQL supports four main types of functions:

  1. Text functions are generally used to manipulate strings of text. For example, if you wanted an entire string of characters to be all uppercase, you could call the UPPER() function on that field.
  2. Numeric functions are primarily used for mathematical operations. For example, the AVG() function will take the average of a given numerical field.
  3. Date functions are used to manipulate date and time data types. For example, if you wanted to extract the year from a date field, you could call the DATE_FORMAT() function and specify that you only want the year.
  4. System functions return specific information pertaining to the DBMS being used (e.g. user login information).

8.1 Concatenating Fields

A commonly used function in SQL is the CONCAT() function, which concatenates (i.e. combines) multiple fields into one single field. The following concatenates the city and state fields from the Customers table.

  SELECT cust_name, CONCAT(cust_city, " ", ",", cust_state) AS location
  FROM Customers;
  
The results of the query
cust_name location
Village Toys Detroit ,MI
Kids Place Columbus ,OH
Fun4All Muncie ,IN
Fun4All Phoenix ,AZ
The Toy Store Chicago ,IL

8.2 Text Manipulation Functions

Let’s take a look at some text manipulation functions. We’ve already seen one example in the CONCAT() function.

  SELECT vend_name, UPPER(vend_name), AS vendor_uppercase
  FROM Vendors
  ORDER BY vend_name;
  

As you might expect, the UPPER() function uppercases every character in a string. Conversely, the LOWER() function lowercases every character in a string. A simple example is shown below.

  SELECT AVG(prod_price) AS average_price
  FROM Products;
  

Had we not used the alias “average_price,” the field name would have defaulted to “AVG(prod_price),” which in this example isn’t too grave of a sin, but the more complex your functions, the more difficult it will become to determine what that calculated field represents.

8.3 Using Aliases

An alias in SQL is an alternative name that can be given to a calculated field. Without aliases, the field name for a calculated field would be whatever function you called on it. For example, using our concatenation example, we used an alias for the concatenated field of city and state and called it “location.” However, had we not used an alias, the name of the field would have been “CONCAT(…).” This is obviously not as descriptive as location. Aliases can be extremely useful when manipulating or calculating fields because you can rename a field that is more readable and logical.

8.4 Date and Time Functions

Date and time functions are some of the most extremely useful functions in SQL. A very common application of a date and time function is reformatting a date. For example, say you only wanted to extract the moth and year from a date. In MySQL, the statement would be as follows

  SELECT order_num, DATE_FORMAT(order_date, '%Y-%m') AS date
  FROM Orders;
  
The resulting output is
order_num date
20005 2012-May
20006 2012-Jan
20007 2012-Jan
20008 2012-Feb
20009 2012-Feb

One thing to note about date time functions is that they tend to differ depending on which SQL vendor you are using. Thus, function name for formatting a date in MySQL might differ from that of another SQL vendor.

8.5 Mathematical Functions

Mathematical functions are extremely useful and common in SQL. We’ve already covered some of the math functions that SQL has at its disposal, such as the average, total, and sum. Let’s go over some common math functions you’ll probably end up using at least once when programming in SQL.

The AVG() function The following statements takes the average of a single column from a table. This takes the average price of all of the products listed in the Products table.

  SELECT AVG(prod_price) AS avg_price
  FROM Products;
  
The resulting query set is
avg_price
-6.823333

The average price for all products in the Products table is $6.82. Notice that we only selected one column in our query. When using mathematical operations on more than one column, it must always be accompanied by a GROUP BY statement. The exceeption to this is when the fields you are selecting are all calculated fields, in which case the GROUP BY statement is not needed. In generalThe GROUP BY statement groups the query set by a specific field(s). Let’s take the average price of each vendor ID.

  SELECT vend_id, AVG(prod_price) AS avg_price
  FROM Products
  GROUP BY vend_id;
  
The resulting output is
vend_id avg_price
BRS01 8.899
DLL01 3.865
FNG01 9.490

Here’s an example of selecting all calculated fields with no GROUP BY to accompany it.

  SELECT
    COUNT(*) AS num_items,
    MIN(prod_price) AS price_min,
    MAX(prod_price) AS price_max,
    AVG(prod_price) AS price_avg
  FROM Products;
  
The resulting output contains only one row which can be thought of as summary statistics of the Products table based on the selected calculated fields.
num_items price_min price_max price_avg
9 3.49 11.99 6.823333

Another common and very useful math function is the COUNT() function. The COUNT() function returns the total records for a given field. For example, suppose you wanted to know how many records existed in the Vendors table. The statement would be

  SELECT COUNT(*) AS 'count'
  FROM Vendors;
  

I’m not going to bother you with the results of the query, but in running it through MySQL, there are a total of six records in the Vendors table. Like, the AVG() function, you can also use COUNT() using multiple columns. Furthermore, much like the AVG() function, the COUNT() function always has to have a GROUP BY statement when selecting multiple fields. Suppose you wanted a count of how many products each vendor sold in the Products table. The syntax would be

  SELECT vend_id, COUNT(prod_id) AS num_products
  FROM Products
  GROUP BY vend_id;
  
The resulting query set is
vend_id num_products
BRS01 3
DLL01 4
FNG01 2

The SUM() Function The SUM() function is exactly what you think it is. It takes the sum of values in a numerical field. Don’t confuse it with count, which merely counts the number of records for a given field. Using the Products table, let’s see the sum of product prices for each vendor.

  SELECT vend_id, SUM(prod_price)
  FROM Products
  GROUP BY vend_id;
  
The resulting set is
vend_id sum_prod_price
BRS01 26.97
DLL01 15.46
FNG01 18.98

If you really wanted to get fancy, you can calculate the total revenue for each vendor by multiplying the quantity of products by their price.

  SELECT vend_id (COUNT(vend_id) * SUM(prod_price)) AS total_sales
  FROM Products
  GROUP BY vend_id;
  
The resulting query set…
vend_id total_sales
BRS01 80.91
DLL01 61.84
FNG01 37.96

9. Grouping Data

We’ve already seen in the previous section how to group data usuing the GROUP BY clause. Grouping allows you to divide data into logical sets that you can perform aggregate calculations on each group. Grouping is accompilshed using the GROUP BY clause. Let’s revisit a query with a GROUP BY clause.

  SELECT vend_id, COUNT(*) AS num_prods
  FROM Products
  GROUP BY vend_id;
  

We already know what this does, as we performed the same query in the previous section. There are a couple things to furthernote regarding the GROUP BY clause.

  1. You can group by more than one column.
  2. Every field listed in the GROUP BY clause must be a retrieved column or valid expression (not withstanding aggregate functions).
  3. With the exception of aggregate functions, every column in your SELECT statement must also be present in your GROUP BY clause. Also, GROUP BY does not allow you to use aliases when grouping a column. Thus, if you have a field like ROUND(AVG(prod_price), 2) AS avg_price, you would not be able to use avg_price as the column name in your GROUP BY clause. You would have to use the calculated field.
  4. NULL values, if present, will be returned as a group.

9.1 Filtering Grouped Data

SQL allows for the filtering of grouped data. This is done by using the HAVING clause. The HAVING clause is similar, if not the exact same, as the WHERE clause, except HAVING obviously only applies to grouped columns. Also, one very important thing to note is that if you have both a WHERE clause and a HAVINV clause, the HAVING clause comes after the WHERE clause. Here’s an example of filtering grouped data using the HAVING clause.

  SELECT vend_id, AVG(prod_price)
  FROM Products
  GROUP BY vend_id
  HAVING AVG(prod_price) > 4;
  

Without the HAVING clause the result set would have contained three records. However, since we filtered only vendor IDs with an average product price greater than 4, one vendor was excluded from the result set, since that vendor’s average price was 3.833.

10. Subqueries

So far, our example queries have been simple and single SELECT queries. However, SQL has enables you to also write subqueries, which are queries nested inside a query. The following is an example of a SQL subquery that returns all customers who ordered item RGAN01.

  SELECT cust_name, cust_contact
  FROM Customers
  WHERE cust_id IN
    (SELECT cust_id
     FROM Orders
     WHERE order_num IN 
      (SELECT order_num
       FROM OrderItems
       WHERE prod_id = 'RGAN01'));
       

The above subquery actually has two subqueries. Subqueries can sometimes be difficult to wrap your brain around. Formulating a strategy on how to write your query will often times require a bit of trial and error. As we will see in the next section, using JOINs over subqueries is a more intuitive and efficient approach to accomplishing the same thing.

SQL is actually performing three SELECT statements in the above statement. As you can see each SELECT statement is selecting fields from a different table (another reason to use JOINs over subqueries). The outermost query returns all orderr numbers with a product idi of RGAN01. The innermost returns the customer ID; in particular, the customer ID that appears in the outermost query. The top-level query returns the customer name and customer contact fields where the customer ID is in the subsequent SELECT statements. As you can see, this can sometimes be mentally excruciating; not to mention that it isn’t the most elegant piece of code to look at (not that aesthetic matters, but just sayin’).

The question is, what are the steps one should take in performing a subquery? A common way of approaching subqueries is to write each SELECT statement individually to see what it returns. From there, you can build on top of that. However, you have to first understand what it is you are looking for, so you know which tables to query from and so on. In the subquery above, we are looking for customers who bought product ID RGAN01. Customers, purchases, and product ID should all come to your mind, since, after all, that is what you are supposed to be returning. Once you know what you’re looking for, you then have to figure out which table housues each field.

Let’s walk through this by looking at the subquery with the question in mind. The Orders and OrderItems tables obviously both contain information regarding orders. That’s two thirds of what we are looking for. As mentioned earlier, a common approach is to a trial and error approach of writing each SELECT statement individually, and then building on top of that. The OrderItems table contains information about the ordered, or purchased, items. The Orders table contains information regarding transactions by customers. The Customers table contains information regarding the customers. Let’s look at what each SELECT statement returns when we run it individually.

  SELECT order_num
  FROM OrderItems
  WHERE prod_id = 'RGAN01';
  

This returns the order numbers relating to product ID RGAN01. Terrific.

  SELECT cust_id
  FROM Orders;

The preceding statement simply returns the field cust_id from the Orders table. However, when we combine it with the statement that queries from the OrderItems table, we get a list of all customer IDs who ordered product ID RGAN01. We could actually have stopped there, but maybe your boss was looking for the actual customer name, and not just his or her ID. The last step is to query from the customers table and use the combined SELECT statements above as your subqueries. Even though there are three separate SELECT statements, you can now think of this as querying from a table where a field exists in another table. That is, you can think of the following query as a single table.

  SELECT cust_id
  FROM Orders
  WHERE order_num IN
    (SELECT order_num
     FROM OrderItems
     WHERE prod_id = 'RGAN01');
     

This returns two records of customer IDs who ordered RGAN01. Now, all you have to do is query from the Customers table and write the appropriate condition in your WHERE clause. Simple, right?

As mentioned earlier, using a JOIN is often preferred since it is generally easier to read, and for the most part, more efficient than subqueries. So, how would we achieve the same results using a JOIN? Well, we know that we have to join three tables. We also know from our final subquery that the field cust_id can be found in the Orders table, and the field order_num can also be found in the OrderItems table. Thus, we know by which fields to link the tables. Here’s how you would write subquery as a JOIN

  SELECT c.cust_name, c.cust_contact
  FROM Customers c
  INNER JOIN Orders o
  ON
    c.cust_id = o.cust_id
  INNER JOIN OrderItems oi
  ON
    o.order_num = oi.order_num
  WHERE oi.prod_id = 'RGAN01';
  

Not only is the above statement more elegant, but the advantage of JOINs is that it’s relatively easier to formulate a strategy on how to return fields from two or more tables when you think about them in terms of a JOIN statement as opposed to a subquery. For most SQL programmers, JOINs are more intuitive than subqueries.

10. Joining Tables

The ability to JOIN tables is one of the most, if not the most, important concept one must understand to become a proficient SQL programmer In an enterprise setting, it is very rare, and quite unrealistic that you will be writing queries from one table only. Neirly all database schemas involve multiple tables that are logically linked together, and this is the very crux of relational database maangement systems. If you were to go into an interview for a data analyst position that required SQL skills, there is a good chance that they will test you on your ability to join multiple tables. That is why this section is probably one of the most important sections of this tutorial.

10a. Understanding Relational Databases and Joins

Recall at the beginning of the section that a relational database is a DBMS that breaks down data into tables that are logically linked to one another. The advantage of breaking down data into multiple tables is scalability, which in the case lf RDBMS, means that it is able to handle an increasing load without failing.

10b. The EER Diagram

First, let’s break down what relational database looks like using an enhanced-entity relationship (EER) diagram. In its most simplest terms, an EER diagram is a visual representation of how tables are linked in a given database schema. Using our Orders, Products, etc. examples we’ve been using throughout this tutorial, the EER diagram for our schema looks something like the following

The EER diagram is very useful in understanding how each table is linked. It is beyond the scope of this tutorial to go into detail regarding the different types of relationships between tables (e.g. one-to-many, one-to-one, and many-to-many). However, in looking at the EER diagram above, we can see that the vend_id column in the Vendors table has a one-to-many relationship with the vend_id column in the Products table. EER diagrams can be extremely useful when dealing with multiple tables, as is usually the case in enterprise systems. Knowing that the Vendors table and the Products table are linked by the vend_id column, we can do a hypothetical join such as the following

  SELECT v.vend_name, p.prod_name, p.prod_price
  FROM Products p
  INNER JOIN Vendors v
  ON
    p.vend_id = vend_id;
    

The above JOIN statement returns vendors and their associated products. You might notice the prefix in front of each column name being selected. These are aliases, and we’ll get into further discussion regardding the importance of using them when making JOIN statements. For now, here is the following output from the above statement.

vend_name prod_name prod_price
Doll House Inc. Fish bean bag toy 3.49
Doll House Inc. Bird bean bag toy 3.49
Doll House Inc. Rabbit bean bag toy 3.49
Bears R Us 8 inch teddy bear 5.99
Bears R Us 12 inch teddy bear 8.99
Bears R Us 18 inch teddy bear 11.99
Doll House Inc. Raggedy Ann 4.99
Fun and Games King doll 9.49
Fun and Games Queen doll 9.49

10.1 Creating a JOIN

Let’s put the JOIN statement into practice. Creating a simple join between two tables is relatively simple. You must specify all the tables to be included and how they are related to each other, as in the following

    SELECT vend_name, prod_name, prod_price
    FROM Vendors, Products
    WHERE Vendors.vend_id = Products.vend_id;
    

The above query is exactly the same as the query preceding it, except the syntax is obviously very different. While this is obviously perfectly acceptable in most SQL vendors, most SQL programmers generally do not write their JOIN statements this way.

10.2 Types of JOINS

10.2a The INNER JOIN

The INNER JOIN is the most commonly-used join in SQL. It is also called an equijoin, which is a join based on the testing of equality between two tables. Here is an example of an INNER JOIN that we’ve already used in our beginning example.

  SELECT vend_name, prod_name, prod_price
  FROM Vendors
  INNER JOIN Products
  ON
    Vendors.vend_id = Products.vend_id;
    

The above statement will return the exact same results as the beginng example of our JOIN statement. What differes in the statement above is the exlcusion of aliases. Aliases aren’t necessary when your join is simple, as is the case in the example above. However, as we will see down the road, the use of aliases is almost necessary so as to avoid confusion, especially when joining more than two tables.

10.2a.1 Joining Multiple Tables

JOINS are not restricted to two tables. You can theoretically join as many tables as you wish (although this obviously comes at a cost, namely performance). Here is an example of a multiple join

  SELECT p.prod_name, v.vend_name, p.prod_price, oi.quantity
  FROM OrderItems oi
  INNER JOIN Products p
  ON
    oi.prod_id = p.prod_id
  INNER JOIN Vendors v
  ON
    p.vend_id = p.vend_id
  WHERE oi.order_num = 20007;
  
The resulting query set returned is
prod_name vend_name prod_price quantity
18 inch teddy bear Bears R Us 11.99 50
Fish bean bag toy Doll House Inc. 3.49 100
Bird bean bag toy Doll House Inc. 3.49 100
Rabbit bean bag toy Doll House Inc. 3.49 100
Raggedy Ann Doll House Inc. 4.99 50

In the above statement, we joined three tables: the OrderItems table, the Products table, and the Vendors table. One important consideration a SQL programmer must always take into consideration is performance considerations. The more tables you join, the slower the performance will be. One way to mitigate this is through the use of INDEXes, which we will cover later on.

Creating Aliases

Before we move on to other types of JOINs, it is imparative we cover the use of aliases in JOIN statements. One obvious advantage we have already seen in using aliases is that it lessens confusion, especially when dealing with multiple joins. It also reduces your SQL syntax. As a general rule of thumb, regardless of how simple or complicated your join statements are, using aliases is always considered good practice. While it isn’t required, it makes for more readable code.

10.2b The Self Join

A self join is a type of JOIN in which a table is joined with itself. Suppose you wanated to send an email to all customer contacts who work for the same company for which Jim Jones works. Here’s how one might be able to accomplish that using a self-join.

  SELECT c1.cust_id, c1.cust_name, c1.cust_contact
  FROM Customers c1
  INNER JOIN Customers c2
  ON
    c1.cust_name = c2.cust_name
  WHERE
    c2.cust_contact = 'Jim Jones';
    
The resulting output is
cust_id cust_name cust_contact
1e+09 Fun4All Jim Jones
1e+09 Fun4All Denise L. Stephens

Self-joins are often used to reeplace statements using subqueries that retrieve data from the same table as the outer statement.

10.2c The Natural Join

In order to join two tables, a field in one table must be linked to the same field in the other table. A natural join is a join that combines two tables based on such columns. In essence, you can think of it as a shortcut for an INNER JOIN without having to explicitly state the condition in which to match the two tables. Let’s revisit our INNER JOIN example where we combined the Vendors table to the Products table based on the vend_id fields.

  -- Using the INNER JOIN
  SELECT vend_name, prod_name, prod_price
  FROM Vendors
  INNER JOIN Products
  ON
    Vendors.vend_id = Products.vend_id;
    
  -- Using the NATURAL JOIN
  SELECT vend_name, prod_name, prod_price
  FROM Vendors
  NATURAL JOIN Products;
  

The two statements result in the exact same query set. The difference is that the NATURAL JOIN does not require a condition in which to link the two tables. In a way, SQL looks for columns in each table that link the two tables together. In this case, it’s the vend_id column. Essentially, every INNER JOIN can be thought of as a NATURAL JOIN.

10.2d Outer Joins

When performing an INNER JOIN, SQL retrieves only rows that have a match. However, there will be multiple occasions in which you may want to retrieve a result set with records that don’t have a match in one table. This is where the OUTER JOIN comes in handy. The LEFT OUTER JOIN is one of the more commonly used OUTER JOINs in SQL. Let’s compare the results retrieved by an INNER JOIN to a LEFT OUTER JOIN

  SELECT c.cust_id, o.order_num
  FROM Customers c
  INNER JOIN Orders o
  ON
    c.cust_id = o.cust_id;
    
The resulting query set is
cust_id order_num
1000000001 20005
1000000001 20009
1000000003 20006
1000000004 20007
1000000005 20008

Now, let’s take a look at the query results for a LEFT OUTER JOIN (or LEFT JOIN for short).

  SELECT c.cust_id, o.order_num
  FROM Customers c
  LEFT OUTER JOIN Orders o
  ON
    c.cust_id = o.cust_id;
    
The resulting query set…
cust_id order_num
1000000001 20005
1000000001 20009
1000000002 NULL
1000000003 20006
1000000004 20007
1000000005 20008

Notice that the LEFT OUTER JOIN includes a NULL record in the order_num field. As mentioned earlier, OUTER JOINS include all records, regardless if there is a match in the table that it is being joined with.

There are other types of outer joins, such as RIGHT OUTER JOIN and FULL OUTER JOIN. The distinction between LEFT, RIGHT, and FULL refers to the table from which to include all rows. For example, the LEFT OUTER JOIN will include all rows on the LEFT, or first table specified, and RIGHT will include all rows for the right, or second table specified. Below is a diagram for the different types of joins.

Types of SQL Joins

Types of SQL Joins

A special type of join is the FULL OUTER JOIN. The FULL OUTER JOIN includes unrelated rows from both tables. That is, it retrieves all records from both joined tables, as shown in the above diagram. Unfortunately, not all SQL vendors support FULL OUTER JOINs. MySQL, for example, does not currently support FULL OUTER JOINS. However, there are alterrnative workarounds. A useful tip on joins: it pays to experiment. Depending on how well the DBA(s) designed the schema and how well you know the schme design (i.e. which tables contains certain fields, how tables are related other tables, etc.), many SQL programmers will often go through a trial and error approach when creating joins, especially mnore complex ones.

10.3 Using Joins with Aggregate Functions

JOIN statements can also be used with aggregate functions. To demonstrate, take a look at the following query and resulting query set.

  SELECT c.cust_id, COUNT(o.order_num) AS num_orders
  FROM Customers c
  INNER JOIN Orders o
  ON
    c.cust_id = o.cusut_id
  GROUP BY c.cust_id;
The resulting query set is
cust_id num_orders
1000000001 2
1000000003 1
1000000004 1
1000000005 1

11. Combing Queries

Many SQL queries contain a single SELECT statement that returns data from one or more tables. sQL also enables you to execute multiple queries and return the results as a single query result set. These combined queries are what are known as unions or compound queries. When would one might have to use a UNION statement? Generally speaking, the two most common scenarios are

  • If you want to return similarly structured data from differetn tables in a single query.
  • If you wnt to execute multiple queries against a single table returning the data as one query.

11.1 Using the UNION Operator to Create Combined Queries

Combined queries can be accomplished using the UNION operator. The UNION operator uses multiple SELECT statements that are specified, and their results an be combined into a single result set. To illustrate how the UNION operator works, recall that combined queries are queries that consist of multiple select statements. Thus, let’s take a look at the two following queries.

  # The first SELECT statement.
  SELECT cust_name, cust_contact, cust_email
  FROM Customers
  WHERE cust_state IN ('IL' 'IN', 'MI');
  
  # The second SELECT statement.
  SELECT cust_name, cust_contact, cust_email
  FROM Customers
  WHERE cust_name = 'Fun4All';
  
The first query returns the following result set.
cust_name cust_contact cust_email
Village Toys John Smith sales@villagetoys.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
While the second result set returns
cust_name cust_contact cust_email
Fun4All Jim Jones jjones@fun4all.com
Fun4All Denise L. Stephens dstephens@fun4all.com

Using the UNION operator, for the most part, is fairly straight forward. Using the example we just used, to combine the two query results, just add the UNION operator between the two SELECT statements, as shown below

  SELECT cust_name, cust_contact, cust_email
  FROM Customers
  WHERE cust_state IN ('IL' 'IN', 'MI')
  UNION
  SELECT cust_name, cust_contact, cust_email
  FROM Customers
  WHERE cust_name = 'Fun4All';
  
The resulting query set essentially binds the rows from the second SELECT statement to the first.
cust_name cust_contact cust_email
Village Toys John Smith sales@villagetoys.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
Fun4All Denise L Stephens dstephens@fun4all.com
For the astute reader, you may have noticed that the UNION operator only returned a total of four records. Shouldn’t it return the records returend from the first statement and the records from the second statement, which would total to six records instead of five? That’s because the UNION operator only returns unique records. If you want all records to be returned, you can use the UNION ALL operator, which wuld result in
cust_name cust_contact cust_email
Village Toys John Smith sales@villagetoys.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
Fun4All Jim Jones jjones@fun4all.com
Fun4All Denise L. Stephens dstephens@fun4all.com