Course Description

The role of a data scientist is to turn raw data into actionable insights. Much of the world’s raw data—from electronic medical records to customer transaction histories—lives in organized collections of tables called relational databases. To be an effective data scientist, you must know how to wrangle and extract data from these databases using a language called SQL . This course teaches syntax in SQL shared by many types of databases, such as PostgreSQL, MySQL, SQL Server, and Oracle. This course teaches you everything you need to know to begin working with databases today!

Selecting columns

This chapter provides a brief introduction to working with relational databases. You’ll learn about their structure, how to talk about them using database lingo, and how to begin an analysis using simple SQL commands to select and summarize columns from database tables.

Onboarding | Tables

The DataCamp interface for SQL courses contains a few unique features you should be aware of.

For this course, you’ll be using a database containing information on almost 5000 films. To the right, underneath the editor, you can see the data in this database by clicking through the tabs.

From looking at the tabs, who is the first person listed in the people table?

SELECT * FROM people;
Displaying records 1 - 10
id name birthdate deathdate
1 50 Cent 1975-07-06T00:00:00.000Z null
2 A. Michael Baldwin 1963-04-04T00:00:00.000Z null
3 A. Raven Cruz null null
4 A.J. Buckley 1978-02-09T00:00:00.000Z null
5 A.J. DeLucia null null
6 A.J. Langer 1974-05-22T00:00:00.000Z null
7 Aaliyah 1979-01-16T00:00:00.000Z 2001-08-25T00:00:00.000Z
8 Aaron Ashmore 1979-10-07T00:00:00.000Z null
9 Aaron Hann null null
10 Aaron Hill 1983-04-23T00:00:00.000Z null
  • Kanye West
  • Biggie Smalls
  • 50 Cent
  • Jay Z
  • That’s correct! Head over to the next exercise to see how we can answer a similar question with code!

    Onboarding | Query Result

    Notice the query result tab in the bottom right corner of your screen. This is where the results of your SQL queries will be displayed.

    Run this query in the editor and check out the resulting table in the query result tab!

    SELECT name FROM people;
    

    Who is the second person listed in the query result?

    SELECT name FROM people;
    Displaying records 1 - 10
    name
    50 Cent
    A. Michael Baldwin
    A. Raven Cruz
    A.J. Buckley
    A.J. DeLucia
    A.J. Langer
    Aaliyah
    Aaron Ashmore
    Aaron Hann
    Aaron Hill
  • Kanye West
  • A. Michael Baldwin
  • 50 Cent
  • Jay Z
  • Correct!

    Onboarding | Errors

    If you submit the code to the right, you’ll see that you get two types of errors.

    SQL errors are shown below the editor. These are errors returned by the SQL engine. You should see:

    syntax error at or near "'DataCamp <3 SQL'" LINE 2: 'DataCamp <3 SQL' ^
    


    DataCamp errors are shown in the Instructions box. These will let you know in plain English where you went wrong in your code! You should see:

    You need to add SELECT at the start of line 2!
    

    Submit the code to the right, check out the errors, then fix them!

    -- Try running me!
    SELECT 'DataCamp <3 SQL'
    1 records
    ‘DataCamp <3 SQL’
    DataCamp <3 SQL
    -- Try running me!
    SELECT 'DataCamp <3 SQL'
    AS result;
    1 records
    result
    DataCamp <3 SQL

    Excellent error editing! You can feel safe experimenting with code in the editor – you’ll always get feedback if something goes wrong.

    Onboarding | Multi-step

    The following multi-step exercise allows you to practice a new concept through repetition. Check it out!

    Submit the query in the editor! Don’t worry, you’ll learn how it works soon.

    SELECT 'SQL'
    AS result;
    1 records
    result
    SQL

    Now change ‘SQL’ to ‘SQL is’ and click Submit!

    SELECT 'SQL is'
    AS result;
    1 records
    result
    SQL is

    Finally, change ‘SQL is’ to ‘SQL is cool’ and click Submit!

    SELECT 'SQL is cool'
    AS result;
    1 records
    result
    SQL is cool

    Well done! The time has come to actually fetch information from tables now!

    Beginning your SQL journey

    Now that you’re familiar with the interface, let’s get straight into it.

    SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

    You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

    Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

    The table of employees might look something like this:

    id name age nationality
    1 Jessica 22 Ireland
    2 Gabriel 48 France
    3 Laura 36 USA

    How many fields does the employees table above contain?

  • 1
  • 2
  • 3
  • 4
  • Correct! The table contains four columns, or fields.

    SELECTing single columns

    While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

    In SQL, you can select data from a table using a SELECT statement. For example, the following query selects the name column from the people table:

    SELECT name
    FROM people;
    

    In this query, SELECT and FROM are called keywords. In SQL, keywords are not case-sensitive, which means you can write the same query as:

    select name
    from people;
    

    That said, it’s good practice to make SQL keywords uppercase to distinguish them from other parts of your query, like column and table names.

    It’s also good practice (but not necessary for the exercises in this course) to include a semicolon at the end of your query. This tells SQL where the end of your query is!

    Remember, you can see the results of executing your query in the query tab!

    Select the title column from the films table.

    SELECT title
    FROM films;
    Displaying records 1 - 10
    title
    Steamboy
    Summer Storm
    Super Size Me
    Superbabies: Baby Geniuses 2
    Suspect Zero
    Tae Guk Gi: The Brotherhood of War
    Teacher’s Pet
    Team America: World Police
    The Alamo
    The Alamo

    Select the release_year column from the films table.

    SELECT release_year
    FROM films;
    Displaying records 1 - 10
    release_year
    2004
    2004
    2004
    2004
    2004
    2004
    2004
    2004
    2004
    2004

    Select the name of each person in the people table.

    SELECT name 
    FROM people;
    Displaying records 1 - 10
    name
    50 Cent
    A. Michael Baldwin
    A. Raven Cruz
    A.J. Buckley
    A.J. DeLucia
    A.J. Langer
    Aaliyah
    Aaron Ashmore
    Aaron Hann
    Aaron Hill

    Great work!

    SELECTing multiple columns

    Well done! Now you know how to select single columns.

    In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

    For example, this query selects two columns, name and birthdate, from the people table:

    SELECT name, birthdate
    FROM people;
    

    Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there’s a handy shortcut:

    SELECT *
    FROM people;
    

    If you only want to return a certain number of results, you can use the LIMIT keyword to limit the number of rows returned:

    SELECT *
    FROM people
    LIMIT 10;
    

    Before getting started with the instructions below, check out the column names in the films table!

    Get the title of every film from the films table.

    SELECT title
    FROM films;
    Displaying records 1 - 10
    title
    Steamboy
    Summer Storm
    Super Size Me
    Superbabies: Baby Geniuses 2
    Suspect Zero
    Tae Guk Gi: The Brotherhood of War
    Teacher’s Pet
    Team America: World Police
    The Alamo
    The Alamo

    Get the title and release year for every film.

    SELECT title, release_year
    FROM films;
    Displaying records 1 - 10
    title release_year
    Steamboy 2004
    Summer Storm 2004
    Super Size Me 2004
    Superbabies: Baby Geniuses 2 2004
    Suspect Zero 2004
    Tae Guk Gi: The Brotherhood of War 2004
    Teacher’s Pet 2004
    Team America: World Police 2004
    The Alamo 2004
    The Alamo 2004

    Get the title, release year and country for every film.

    SELECT title, release_year, country
    FROM films;
    Displaying records 1 - 10
    title release_year country
    Steamboy 2004 Japan
    Summer Storm 2004 Germany
    Super Size Me 2004 USA
    Superbabies: Baby Geniuses 2 2004 Germany
    Suspect Zero 2004 USA
    Tae Guk Gi: The Brotherhood of War 2004 South Korea
    Teacher’s Pet 2004 USA
    Team America: World Police 2004 USA
    The Alamo 2004 USA
    The Alamo 2004 USA

    Get all columns from the films table.

    SELECT *
    FROM films;
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2223 Steamboy 2004 Japan 103 Japanese PG-13 410388 2127519898
    2224 Summer Storm 2004 Germany 98 German R 95016 2700000
    2225 Super Size Me 2004 USA 100 English PG 11529368 65000
    2226 Superbabies: Baby Geniuses 2 2004 Germany 88 English PG 9109322 20000000
    2227 Suspect Zero 2004 USA 99 English R 8712564 27000000
    2228 Tae Guk Gi: The Brotherhood of War 2004 South Korea 148 Korean R 1110186 12800000
    2229 Teacher’s Pet 2004 USA 74 English PG 6491350 10000000
    2230 Team America: World Police 2004 USA 98 English R 32774834 30000000
    2231 The Alamo 2004 USA 137 English PG-13 22406362 107000000
    2232 The Alamo 2004 USA 137 English PG-13 22406362 107000000

    Great work!

    SELECT DISTINCT

    Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the DISTINCT keyword.

    This might be useful if, for example, you’re interested in knowing which languages are represented in the films table:

    SELECT DISTINCT language
    FROM films;
    

    Remember, you can check out the data in the tables by clicking on the table name!

    Get all the unique countries represented in the films table.

    SELECT DISTINCT country
    FROM films;
    Displaying records 1 - 10
    country
    Japan
    Germany
    USA
    South Korea
    Canada
    UK
    France
    Argentina
    Spain
    India

    Get all the different film certifications from the films table.

    SELECT DISTINCT certification
    FROM films;
    8 records
    certification
    PG-13
    R
    PG
    G
    Unrated
    Not Rated
    NC-17

    Get the different types of film roles from the roles table.

    SELECT DISTINCT role
    FROM roles;
    2 records
    role
    director
    actor

    Great work!

    Learning to COUNT

    What if you want to count the number of employees in your employees table? The COUNT() function lets you do this by returning the number of rows in one or more columns.

    For example, this code gives the number of rows in the people table:

    SELECT COUNT(*)
    FROM people;
    


    How many records are contained in the reviews table?
    SELECT COUNT(*)
    FROM reviews;
    1 records
    COUNT(*)
    4968
  • 9,468
  • 8,397
  • 4,968
  • 9,837
  • 9,864
  • Correct!

    Practice with COUNT

    As you’ve seen, COUNT(*) tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call COUNT() on just that column.

    For example, to count the number of birth dates present in the people table:

    SELECT COUNT(birthdate)
    FROM people;
    

    It’s also common to combine COUNT() with DISTINCT to count the number of distinct values in a column.

    For example, this query counts the number of distinct birth dates contained in the people table:

    SELECT COUNT(DISTINCT birthdate)
    FROM people;
    

    Let’s get some practice with COUNT()!

    Count the number of rows in the people table.

    SELECT COUNT(*)
    FROM people;
    1 records
    COUNT(*)
    8397

    Count the number of (non-missing) birth dates in the people table.

    SELECT COUNT(birthdate)
    FROM people;
    1 records
    COUNT(birthdate)
    8397

    Count the number of unique birth dates in the people table.

    SELECT COUNT(DISTINCT birthdate)
    FROM people;
    1 records
    COUNT(DISTINCT birthdate)
    5399

    Count the number of unique languages in the films table.

    SELECT COUNT(DISTINCT language)
    FROM films;
    1 records
    COUNT(DISTINCT language)
    47

    Count the number of unique countries in the films table.

    SELECT COUNT(DISTINCT country)
    FROM films;
    1 records
    COUNT(DISTINCT country)
    57

    Great work!

    Filtering rows

    This chapter builds on the first by teaching you how to filter tables for rows satisfying some criteria of interest. You’ll learn how to use basic comparison operators, combine multiple criteria, match patterns in text, and much more.

    Filtering results

    Congrats on finishing the first chapter! You now know how to select columns and perform basic counts. This chapter will focus on filtering your results.

    In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

    • = equal
    • <> not equal
    • < less than
    • > greater than
    • <= less than or equal to
    • >= greater than or equal to

    For example, you can filter text records such as title. The following code returns all films with the title ‘Metropolis’:

    SELECT title
    FROM films
    WHERE title = 'Metropolis';
    

    Notice that the WHERE clause always comes after the FROM statement!

    Note that in this course we will use <> and not != for the not equal operator, as per the SQL standard.


    What does the following query return?

    SELECT title
    FROM films
    WHERE release_year > 2000;
    
  • Films released before the year 2000
  • Films released after the year 2000
  • Films released after the year 2001
  • Films released in 2000
  • Correct!

    Simple filtering of values

    As you learned in the previous exercise, the WHERE clause can also be used to filter numeric records, such as years or ages.

    For example, the following query selects all details for films with a budget over ten thousand dollars:

    SELECT *
    FROM films
    WHERE budget > 10000;
    

    Now it’s your turn to use the WHERE clause to filter numeric values!

    Get all details for all films released in 2016.

    SELECT *
    FROM films
    WHERE release_year = 2016;
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    4821 10 Cloverfield Lane 2016 USA 104 English PG-13 71897215 1.5e+07
    4822 13 Hours 2016 USA 144 English R 52822418 5.0e+07
    4823 A Beginner’s Guide to Snuff 2016 USA 87 English NA NA
    4824 Airlift 2016 India 130 Hindi NA 4.4e+06
    4825 Alice Through the Looking Glass 2016 USA 113 English PG 76846624 1.7e+08
    4826 Allegiant 2016 USA 120 English PG-13 66002193 1.1e+08
    4827 Alleluia! The Devil’s Carnival 2016 USA 97 English NA 5.0e+05
    4828 Antibirth 2016 USA 94 English NA 3.5e+06
    4829 Bad Moms 2016 USA 100 English R 55461307 2.0e+07
    4830 Bad Moms 2016 USA 100 English R 55461307 2.0e+07

    Get the number of films released before 2000.

    SELECT COUNT(*)
    FROM films
    WHERE release_year < 2000;
    1 records
    COUNT(*)
    0

    Get the title and release year of films released after 2000.

    SELECT title, release_year
    FROM films
    WHERE release_year > 2000;
    Displaying records 1 - 10
    title release_year
    Steamboy 2004
    Summer Storm 2004
    Super Size Me 2004
    Superbabies: Baby Geniuses 2 2004
    Suspect Zero 2004
    Tae Guk Gi: The Brotherhood of War 2004
    Teacher’s Pet 2004
    Team America: World Police 2004
    The Alamo 2004
    The Alamo 2004

    Great job! After filtering of numeric values, it’s time to explore filtering of text!

    Simple filtering of text

    Remember, the WHERE clause can also be used to filter text results, such as names or countries.

    For example, this query gets the titles of all films which were filmed in China:

    SELECT title
    FROM films
    WHERE country = 'China';
    

    Now it’s your turn to practice using WHERE with text values!

    Important: in PostgreSQL (the version of SQL we’re using), you must use single quotes with WHERE.

    Get all details for all French language films.

    SELECT *
    FROM films
    WHERE language = 'French';
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2239 The Case of the Grinning Cat 2004 France 59 French NA 1.5e+05
    2240 The Chorus 2004 France 97 French PG-13 3629758 5.5e+06
    2307 Camping sauvage 2005 France 79 French NA NA
    2368 Joyeux Noel 2005 France 116 French PG-13 1050445 2.2e+07
    2387 March of the Penguins 2005 France 80 French G 77413017 8.0e+06
    2475 The Rocket: The Legend of Rocket Richard 2005 Canada 124 French PG NA 8.0e+06
    2641 One to Another 2006 France 95 French Unrated 18435 1.4e+06
    2647 “Paris, je t’aime” 2006 France 120 French R 4857376 1.3e+07
    2843 Molière 2007 France 120 French PG-13 634277 1.6e+07
    2859 Persepolis 2007 France 89 French PG-13 4443403 7.3e+06

    Get the name and birth date of the person born on November 11th, 1974. Remember to use ISO date format (‘1974-11-11’)!

    SELECT name, birthdate
    FROM people
    WHERE birthdate = '1974-11-11';
    0 records
    name birthdate

    Get the number of Hindi language films.

    SELECT COUNT(*)
    FROM films
    WHERE language = 'Hindi';
    1 records
    COUNT(*)
    24

    Get all details for all films with an R certification.

    SELECT *
    FROM films
    WHERE certification = 'R';
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2224 Summer Storm 2004 Germany 98 German R 95016 2700000
    2227 Suspect Zero 2004 USA 99 English R 8712564 27000000
    2228 Tae Guk Gi: The Brotherhood of War 2004 South Korea 148 Korean R 1110186 12800000
    2230 Team America: World Police 2004 USA 98 English R 32774834 30000000
    2238 The Butterfly Effect 2004 USA 120 English R 23947 13000000
    2244 The Girl Next Door 2004 USA 110 English R 14589444 25000000
    2246 The Holy Girl 2004 Argentina 106 Spanish R 304124 1400000
    2247 The I Inside 2004 UK 90 English R NA 8800000
    2249 The Ladykillers 2004 USA 104 English R 39692139 35000000
    2250 The Last Shot 2004 USA 93 English R 463730 NA

    Wonderful! Let’s look at combining different conditions now!

    WHERE AND

    Often, you’ll want to select data based on multiple conditions. You can build up your WHERE queries by combining multiple conditions with the AND keyword.

    For example,

    SELECT title
    FROM films
    WHERE release_year > 1994
    AND release_year < 2000;
    

    gives you the titles of films released between 1994 and 2000.

    Note that you need to specify the column name separately for every AND condition, so the following would be invalid:

    SELECT title
    FROM films
    WHERE release_year > 1994 AND < 2000;
    

    You can add as many AND conditions as you need!

    Get the title and release year for all Spanish language films released before 2000.

    SELECT title, release_year
    FROM films
    WHERE release_year < 2000
    AND language = 'Spanish';
    0 records
    title release_year

    Get all details for Spanish language films released after 2000.

    SELECT *
    FROM films
    WHERE release_year > 2000
    AND language = 'Spanish';
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2246 The Holy Girl 2004 Argentina 106 Spanish R 304124 1400000
    2263 The Sea Inside 2004 Spain 125 Spanish PG-13 2086345 10000000
    2458 The Legend of Zorro 2005 USA 129 Spanish PG 45356386 75000000
    2542 Captain Alatriste: The Spanish Musketeer 2006 Spain 145 Spanish NA 24000000
    2646 Pan’s Labyrinth 2006 Spain 112 Spanish R 37623143 13500000
    2655 Quinceañera 2006 USA 90 Spanish R 1689999 400000
    2738 Volver 2006 Spain 121 Spanish R 12899702 NA
    2748 [Rec] 2007 Spain 78 Spanish R NA 1500000
    2921 The Orphanage 2007 Spain 105 Spanish R 7159147 3400000
    2932 Timecrimes 2007 Spain 66 Spanish R 38108 2600000

    Get all details for Spanish language films released after 2000, but before 2010.

    SELECT *
    FROM films
    WHERE release_year > 2000
    AND release_year < 2010
    AND language = 'Spanish';
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2246 The Holy Girl 2004 Argentina 106 Spanish R 304124 1400000
    2263 The Sea Inside 2004 Spain 125 Spanish PG-13 2086345 10000000
    2458 The Legend of Zorro 2005 USA 129 Spanish PG 45356386 75000000
    2542 Captain Alatriste: The Spanish Musketeer 2006 Spain 145 Spanish NA 24000000
    2646 Pan’s Labyrinth 2006 Spain 112 Spanish R 37623143 13500000
    2655 Quinceañera 2006 USA 90 Spanish R 1689999 400000
    2738 Volver 2006 Spain 121 Spanish R 12899702 NA
    2748 [Rec] 2007 Spain 78 Spanish R NA 1500000
    2921 The Orphanage 2007 Spain 105 Spanish R 7159147 3400000
    2932 Timecrimes 2007 Spain 66 Spanish R 38108 2600000

    Great work! Being able to combine conditions with AND will prove to be very useful if you only want your query to return a specific subset of records!

    WHERE AND OR

    What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the OR operator.

    For example, the following returns all films released in either 1994 or 2000:

    SELECT title
    FROM films
    WHERE release_year = 1994
    OR release_year = 2000;
    

    Note that you need to specify the column for every OR condition, so the following is invalid:

    SELECT title
    FROM films
    WHERE release_year = 1994 OR 2000;
    

    When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:

    SELECT title
    FROM films
    WHERE (release_year = 1994 OR release_year = 1995)
    AND (certification = 'PG' OR certification = 'R');
    

    Otherwise, due to SQL’s precedence rules, you may not get the results you’re expecting!


    What does the OR operator do?

  • Display only rows that meet at least one of the specified conditions.
  • Display only rows that meet all of the specified conditions.
  • Display only rows that meet none of the specified conditions.
  • Correct!

    WHERE AND OR (2)

    You now know how to select rows that meet some but not all conditions by combining AND and OR.

    For example, the following query selects all films that were released in 1994 or 1995 which had a rating of PG or R.

    SELECT title
    FROM films
    WHERE (release_year = 1994 OR release_year = 1995)
    AND (certification = 'PG' OR certification = 'R');
    

    Now you’ll write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.

    It looks like a lot, but you can build the query up one step at a time to get comfortable with the underlying concept in each step. Let’s go!

    Get the title and release year for films released in the 90s.

    SELECT title, release_year
    FROM films
    WHERE release_year >= 1990 AND release_year < 2000;
    0 records
    title release_year

    Now, build on your query to filter the records to only include French or Spanish language films.

    SELECT title, release_year
    FROM films
    WHERE (release_year >= 1990 AND release_year < 2000)
    AND (language = 'French' OR language = 'Spanish');
    0 records
    title release_year

    Finally, restrict the query to only return films that took in more than $2M gross.

    SELECT title, release_year
    FROM films
    WHERE (release_year >= 1990 AND release_year < 2000)
    AND (language = 'French' OR language = 'Spanish')
    AND gross > 2000000;
    0 records
    title release_year

    That was pretty involved!

    BETWEEN

    As you’ve learned, you can use the following query to get titles of all films released in and between 1994 and 2000:

    SELECT title
    FROM films
    WHERE release_year >= 1994
    AND release_year <= 2000;
    

    Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:

    SELECT title
    FROM films
    WHERE release_year
    BETWEEN 1994 AND 2000;
    

    It’s important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!


    What does the BETWEEN keyword do?

  • Filter numeric values
  • Filter text values
  • Filter values in a specified list
  • Filter values in a specified range
  • Correct!

    BETWEEN (2)

    Similar to the WHERE clause, the BETWEEN clause can be used with multiple AND and OR operators, so you can build up your queries and make them even more powerful!

    For example, suppose we have a table called kids. We can get the names of all kids between the ages of 2 and 12 from the United States:

    SELECT name
    FROM kids
    WHERE age BETWEEN 2 AND 12
    AND nationality = 'USA';
    

    Take a go at using BETWEEN with AND on the films data to get the title and release year of all Spanish language films released between 1990 and 2000 (inclusive) with budgets over $100 million. We have broken the problem into smaller steps so that you can build the query as you go along!

    Get the title and release year of all films released between 1990 and 2000 (inclusive).

    SELECT title, release_year
    FROM films
    WHERE release_year BETWEEN 1990 AND 2000;
    0 records
    title release_year

    Now, build on your previous query to select only films that have budgets over $100 million.

    SELECT title, release_year
    FROM films
    WHERE release_year BETWEEN 1990 AND 2000
    AND budget > 100000000;
    0 records
    title release_year

    Now restrict the query to only return Spanish language films.

    SELECT title, release_year
    FROM films
    WHERE release_year BETWEEN 1990 AND 2000
    AND budget > 100000000
    AND language = 'Spanish';
    0 records
    title release_year

    Finally, modify to your previous query to include all Spanish language or French language films with the same criteria as before. Don’t forget your parentheses!

    SELECT title, release_year
    FROM films
    WHERE release_year BETWEEN 1990 AND 2000
    AND budget > 100000000
    AND (language = 'Spanish' OR language = 'French');
    0 records
    title release_year

    Well done! Off to the next filtering operator!

    WHERE IN

    As you’ve seen, WHERE is very useful for filtering results. However, if you want to filter based on many conditions, WHERE can get unwieldy. For example:

    SELECT name
    FROM kids
    WHERE age = 2
    OR age = 4
    OR age = 6
    OR age = 8
    OR age = 10;
    

    Enter the IN operator! The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! Neat, right?

    So, the above example would become simply:

    SELECT name
    FROM kids
    WHERE age IN (2, 4, 6, 8, 10);
    

    Try using the IN operator yourself!

    Get the title and release year of all films released in 1990 or 2000 that were longer than two hours. Remember, duration is in minutes!

    SELECT title, release_year
    FROM films
    WHERE release_year IN (1990, 2000)
    AND duration > 120;
    0 records
    title release_year

    Get the title and language of all films which were in English, Spanish, or French.

    SELECT title, language
    FROM films
    WHERE language IN ('English', 'Spanish', 'French');
    Displaying records 1 - 10
    title language
    Super Size Me English
    Superbabies: Baby Geniuses 2 English
    Suspect Zero English
    Teacher’s Pet English
    Team America: World Police English
    The Alamo English
    The Alamo English
    The Aviator English
    The Big Bounce English
    The Blue Butterfly English

    Get the title and certification of all films with an NC-17 or R certification.

    SELECT title, certification
    FROM films
    WHERE certification IN ('NC-17', 'R');
    Displaying records 1 - 10
    title certification
    Summer Storm R
    Suspect Zero R
    Tae Guk Gi: The Brotherhood of War R
    Team America: World Police R
    The Butterfly Effect R
    The Girl Next Door R
    The Holy Girl R
    The I Inside R
    The Ladykillers R
    The Last Shot R

    Your SQL vocabulary is growing by the minute!

    Introduction to NULL

    In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL. For example, to count the number of missing birth dates in the people table:

    SELECT COUNT(*)
    FROM people
    WHERE birthdate IS NULL;
    

    As you can see, IS NULL is useful when combined with WHERE to figure out what data you’re missing.

    Sometimes, you’ll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator.

    For example, this query gives the names of all people whose birth dates are not missing in the people table.

    SELECT name
    FROM people
    WHERE birthdate IS NOT NULL;
    


    What does NULL represent?

  • A corrupt entry
  • A missing value
  • An empty string
  • An invalid value
  • Correct! NULL is used to represent unknown values.

    NULL and IS NULL

    Now that you know what NULL is and what it’s used for, it’s time for some practice!

    Get the names of people who are still alive, i.e. whose death date is missing.

    SELECT name
    FROM people
    WHERE deathdate IS NULL;
    0 records
    name

    Get the title of every film which doesn’t have a budget associated with it.

    SELECT title
    FROM films
    WHERE budget IS NULL;
    Displaying records 1 - 10
    title
    The Cookout
    The Last Shot
    The Libertine
    The Prince and Me
    What the #$*! Do We (K)now!?
    Woman Thou Art Loosed
    51 Birch Street
    Beauty Shop
    Camping sauvage
    Ice Princess

    Get the number of films which don’t have a language associated with them.

    SELECT COUNT(*)
    FROM films
    WHERE language IS NULL;
    1 records
    COUNT(*)
    0

    Alright! Are you ready for a last type of operator?

    LIKE and NOT LIKE

    As you’ve seen, the WHERE clause can be used to filter text data. However, so far you’ve only been able to filter by specifying the exact text you’re interested in. In the real world, often you’ll want to search for a pattern rather than a specific text string.

    In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:

    The % wildcard will match zero, one, or many characters in text. For example, the following query matches companies like ‘Data’, ‘DataC’ ‘DataCamp’, ‘DataMind’, and so on:

    SELECT name
    FROM companies
    WHERE name LIKE 'Data%';
    

    The _ wildcard will match a single character. For example, the following query matches companies like ‘DataCamp’, ‘DataComp’, and so on:

    SELECT name
    FROM companies
    WHERE name LIKE 'DataC_mp';
    

    You can also use the NOT LIKE operator to find records that don’t match the pattern you specify.

    Got it? Let’s practice!

    Get the names of all people whose names begin with ‘B’. The pattern you need is ‘B%’.

    SELECT name
    FROM people
    WHERE name LIKE 'B%';
    Displaying records 1 - 10
    name
    B.J. Novak
    Babak Najafi
    Babar Ahmed
    Bahare Seddiqi
    Bai Ling
    Bailee Madison
    Balinese Tari Legong Dancers
    Bálint Péntek
    Baltasar Korm√°kur
    Balthazar Getty

    Get the names of people whose names have ‘r’ as the second letter. The pattern you need is ’_r%’.

    SELECT name
    FROM people
    WHERE name LIKE '_r%';
    Displaying records 1 - 10
    name
    Ara Celi
    Aramis Knight
    Arben Bajraktaraj
    Arcelia Ramírez
    Archie Kao
    Archie Panjabi
    Aretha Franklin
    Ari Folman
    Ari Gold
    Ari Graynor

    Get the names of people whose names don’t start with A. The pattern you need is ‘A%’.

    SELECT name
    FROM people
    WHERE name NOT LIKE 'A%';
    Displaying records 1 - 10
    name
    50 Cent
    Álex Angulo
    Álex de la Iglesia
    Ángela Molina
    B.J. Novak
    Babak Najafi
    Babar Ahmed
    Bahare Seddiqi
    Bai Ling
    Bailee Madison

    This concludes the second chapter of the intro to SQL course. Rush over to chapter 3 if you want to learn more about aggregate functions!

    Aggregate Functions

    This chapter teaches you how to use aggregate functions to summarize data and gain useful insights. You’ll also learn about arithmetic in SQL and how to use aliases to make your results more readable.

    Aggregate functions

    Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

    For example,

    SELECT AVG(budget)
    FROM films;
    

    gives you the average value from the budget column of the films table. Similarly, the MAX() function returns the highest budget:

    SELECT MAX(budget)
    FROM films;
    

    The SUM() function returns the result of adding up the numeric values in a column:

    SELECT SUM(budget)
    FROM films;
    

    You can probably guess what the MIN() function does! Now it’s your turn to try out some SQL functions.

    Use the SUM() function to get the total duration of all films.

    SELECT SUM(duration)
    FROM films;
    1 records
    SUM(duration)
    289871

    Get the average duration of all films.

    SELECT AVG(duration)
    FROM films;
    1 records
    AVG(duration)
    106.0245

    Get the duration of the shortest film.

    SELECT MIN(duration)
    FROM films;
    1 records
    MIN(duration)
    7

    Get the duration of the longest film.

    SELECT MAX(duration)
    FROM films;
    1 records
    MAX(duration)
    334

    Sequelistic!

    Aggregate functions practice

    Good work. Aggregate functions are important to understand, so let’s get some more practice!

    Use the SUM() function to get the total amount grossed by all films.

    SELECT SUM(gross)
    FROM films;
    1 records
    SUM(gross)
    113392619293

    Get the average amount grossed by all films.

    SELECT AVG(gross)
    FROM films;
    1 records
    AVG(gross)
    51054759

    Get the amount grossed by the worst performing film.

    SELECT MIN(gross)
    FROM films;
    1 records
    MIN(gross)
    162

    Get the amount grossed by the best performing film.

    SELECT MAX(gross)
    FROM films;
    1 records
    MAX(gross)
    936627416

    Well done! Don’t forget about these functions. You’ll find yourself using them over and over again to get a quick grasp of the data in a SQL database.

    Combining aggregate functions

    Aggregate functions can be combined with the WHERE clause to gain further insights from your data.

    For example, to get the total budget of movies made in the year 2010 or later:

    SELECT SUM(budget)
    FROM films
    WHERE release_year >= 2010;
    

    Now it’s your turn!

    Use the SUM() function to get the total amount grossed by all films made in the year 2000 or later.

    SELECT SUM(gross)
    FROM films
    WHERE release_year >= 2000;
    1 records
    SUM(gross)
    113392474175

    Get the average amount grossed by all films whose titles start with the letter ‘A’.

    SELECT AVG(gross)
    FROM films
    where title LIKE 'A%';
    1 records
    AVG(gross)
    49647086

    Get the amount grossed by the worst performing film in 1994.

    SELECT MIN(gross)
    FROM films
    WHERE release_year = 1994;
    1 records
    MIN(gross)
    NA

    Get the amount grossed by the best performing film between 2000 and 2012, inclusive.

    SELECT MAX(gross)
    FROM films
    WHERE release_year BETWEEN 2000 AND 2012;
    1 records
    MAX(gross)
    760505847

    Nice. Can you see how SQL basically provides you a bunch of building blocks that you can combine in all kinds of ways? Hence the name: Structured Query Language.

    A note on arithmetic

    In addition to using aggregate functions, you can perform basic arithmetic with symbols like +, -, *, and /.

    So, for example, this gives a result of 12:

    SELECT (4 * 3);
    

    However, the following gives a result of 1:

    SELECT (4 / 3);
    

    What’s going on here?

    SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing!

    If you want more precision when dividing, you can add decimal places to your numbers. For example,

    SELECT (4.0 / 3.0) AS result;
    

    gives you the result you would expect: 1.333.


    What is the result of SELECT (10 / 3);?
    SELECT (10 / 3);
    1 records
    (10 / 3)
    3
  • 2.333
  • 3.333
  • 3
  • 3.0
  • Correct!

    It’s AS simple AS aliasing

    You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,

    SELECT MAX(budget)
    FROM films;
    

    gives you a result with one column, named max. But what if you use two functions like this?

    SELECT MAX(budget), MAX(duration)
    FROM films;
    

    Well, then you’d have two columns named max, which isn’t very useful!

    To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword, which you’ve already seen earlier in this course.

    For example, in the above example we could use aliases to make the result clearer:

    SELECT MAX(budget) AS max_budget,
           MAX(duration) AS max_duration
    FROM films;
    

    Aliases are helpful for making results more readable!

    Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.

    SELECT title, gross - budget AS net_profit
    FROM films;
    Displaying records 1 - 10
    title net_profit
    Steamboy -2127109510
    Summer Storm -2604984
    Super Size Me 11464368
    Superbabies: Baby Geniuses 2 -10890678
    Suspect Zero -18287436
    Tae Guk Gi: The Brotherhood of War -11689814
    Teacher’s Pet -3508650
    Team America: World Police 2774834
    The Alamo -84593638
    The Alamo -84593638

    Get the title and duration in hours for all films. The duration is in minutes, so you’ll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.

    SELECT title, duration / 60.0 AS duration_hours
    FROM films;
    Displaying records 1 - 10
    title duration_hours
    Steamboy 1.716667
    Summer Storm 1.633333
    Super Size Me 1.666667
    Superbabies: Baby Geniuses 2 1.466667
    Suspect Zero 1.650000
    Tae Guk Gi: The Brotherhood of War 2.466667
    Teacher’s Pet 1.233333
    Team America: World Police 1.633333
    The Alamo 2.283333
    The Alamo 2.283333

    Get the average duration in hours for all films, aliased as avg_duration_hours.

    SELECT AVG(duration) / 60.0 AS avg_duration_hours  
    FROM films;
    1 records
    avg_duration_hours
    1.767075

    Boooom!

    Even more aliasing

    Let’s practice your newfound aliasing skills some more before moving on!

    Recall: SQL assumes that if you divide an integer by an integer, you want to get an integer back.

    This means that the following will erroneously result in 400.0:

    SELECT 45 / 10 * 100.0;
    

    This is because 45 / 10 evaluates to an integer (4), and not a decimal number like we would expect.

    So when you’re dividing make sure at least one of your numbers has a decimal place:

    SELECT 45 * 100.0 / 10;
    

    The above now gives the correct answer of 450.0 since the numerator (45 * 100.0) of the division is now a decimal!

    Get the percentage of people who are no longer alive. Alias the result as percentage_dead. Remember to use 100.0 and not 100!

    -- get the count(deathdate) and multiply by 100.0
    -- then divide by count(*) 
    SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
    FROM people;
    1 records
    percentage_dead
    100

    Get the number of years between the newest film and oldest film. Alias the result as difference.

    SELECT MAX(release_year) - MIN(release_year)
    AS difference
    FROM films;
    1 records
    difference
    12

    Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses.

    SELECT (MAX(release_year) - MIN(release_year)) / 10.0
    AS number_of_decades
    FROM films;
    1 records
    number_of_decades
    1.2

    We’re at the end of chapter 3! In chapter 4, you will learn about sorting, grouping and joins. Head over there quickly!

    Sorting and grouping

    This chapter provides a brief introduction to sorting and grouping your results.

    ORDER BY

    Congratulations on making it this far! You now know how to select and filter your results.

    In this chapter you’ll learn how to sort and group your results to gain further insight. Let’s go!

    In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.

    By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword. For example,

    SELECT title
    FROM films
    ORDER BY release_year DESC;
    

    gives you the titles of films sorted by release year, from newest to oldest.


    How do you think ORDER BY sorts a column of text values by default?

  • Alphabetically (A-Z)
  • Reverse alphabetically (Z-A)
  • There’s no natural ordering to text data
  • By number of characters (fewest to most)
  • Correct!

    Sorting single columns

    Now that you understand how ORDER BY works, give these exercises a go!

    Get the names of people from the people table, sorted alphabetically.

    SELECT name
    FROM people
    ORDER BY name;
    Displaying records 1 - 10
    name
    50 Cent
    A. Michael Baldwin
    A. Raven Cruz
    A.J. Buckley
    A.J. DeLucia
    A.J. Langer
    AJ Michalka
    Aaliyah
    Aaron Ashmore
    Aaron Hann

    Get the names of people, sorted by birth date.

    SELECT name
    FROM people
    ORDER BY birthdate;
    Displaying records 1 - 10
    name
    Robert Shaw
    Lucille La Verne
    Mary Carr
    D.W. Griffith
    Finlay Currie
    Lionel Barrymore
    Billy Gilbert
    Cecil B. DeMille
    Leopold Stokowski
    Éric Tessier

    Get the birth date and name for every person, in order of when they were born.

    sqldf("SELECT birthdate, name
    FROM people
    ORDER BY birthdate;") %>% head
    ##                  birthdate             name
    ## 1 1837-10-10T00:00:00.000Z      Robert Shaw
    ## 2 1872-11-07T00:00:00.000Z Lucille La Verne
    ## 3 1874-03-14T00:00:00.000Z        Mary Carr
    ## 4 1875-01-22T00:00:00.000Z    D.W. Griffith
    ## 5 1878-01-20T00:00:00.000Z    Finlay Currie
    ## 6 1878-04-28T00:00:00.000Z Lionel Barrymore

    Great work!

    Sorting single columns (2)

    Let’s get some more practice with ORDER BY!

    Get the title of films released in 2000 or 2012, in the order they were released.

    SELECT title
    FROM films
    WHERE release_year IN (2000, 2012)
    ORDER BY release_year;
    Displaying records 1 - 10
    title
    2016: Obama’s America
    21 Jump Street
    A Perfect Plan
    A Thousand Words
    Abraham Lincoln: Vampire Hunter
    Act of Valor
    After
    Alex Cross
    Alien Uprising
    American Reunion

    Get all details for all films except those released in 2015 and order them by duration.

    SELECT *
    FROM films
    WHERE release_year <> 2015
    ORDER BY duration;
    Displaying records 1 - 10
    id title release_year country duration language certification gross budget
    2326 Dil Jo Bhi Kahey… 2005 India NA English 129319 7.0e+07
    2712 The Naked Ape 2006 USA NA English NA NA
    3208 Black Water Transit 2009 USA NA English NA 2.3e+07
    3504 Harry Potter and the Deathly Hallows: Part I 2010 UK NA English NA NA
    3552 N-Secure 2010 USA NA English R 2592808 NA
    3728 Harry Potter and the Deathly Hallows: Part II 2011 UK NA English NA NA
    4018 Should’ve Been Romeo 2012 USA NA English NA 5.0e+06
    4138 Barfi 2013 India NA Kannada NA NA
    4396 Destiny 2014 USA NA English NA NA
    2926 The Touch 2007 USA 7 English NA 1.3e+04

    Get the title and gross earnings for movies which begin with the letter ‘M’ and order the results alphabetically.

    SELECT title, gross
    FROM films
    WHERE title LIKE 'M%'
    ORDER BY title;
    Displaying records 1 - 10
    title gross
    MacGruber 8460995
    Machete 26589953
    Machete Kills 7268659
    Machine Gun Preacher 537580
    Mad Hot Ballroom 8044906
    Mad Max: Fury Road 153629485
    Mad Money 20668843
    Madagascar 193136719
    Madagascar 3: Europe’s Most Wanted 216366733
    Madagascar: Escape 2 Africa 179982968

    Can you feel the SQL power dawn on you?!

    Sorting single columns (DESC)

    To order results in descending order, you can put the keyword DESC after your ORDER BY. For example, to get all the names in the people table, in reverse alphabetical order:

    SELECT name
    FROM people
    ORDER BY name DESC;
    

    Now practice using ORDER BY with DESC to sort single columns in descending order!

    Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.

    SELECT imdb_score, film_id
    FROM reviews
    ORDER BY imdb_score DESC;
    Displaying records 1 - 10
    imdb_score film_id
    9.5 4960
    9.3 742
    9.2 178
    9.1 4866
    9.0 3110
    9.0 192
    8.9 120
    8.9 676
    8.9 2045
    8.9 723

    Get the title for every film, in reverse order.

    SELECT title
    FROM films
    ORDER BY title DESC;
    Displaying records 1 - 10
    title
    Æon Flux
    xXx: State of the Union
    [Rec] 2
    [Rec]
    Zulu
    Zoom
    Zoolander 2
    Zookeeper
    Zombieland
    Zombie Hunter

    Get the title and duration for every film, in order of longest duration to shortest.

    SELECT title, duration
    FROM films
    ORDER BY duration DESC;
    Displaying records 1 - 10
    title duration
    Carlos 334
    The Company 286
    Arn: The Knight Templar 270
    The Wolf of Wall Street 240
    Emma 240
    Watchmen 215
    King Kong 201
    King Kong 201
    King Kong 201
    Deadline Gallipoli 197

    Nice. Let’s explore how you can sort multiple columns!

    Sorting multiple columns

    ORDER BY can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example,

    SELECT birthdate, name
    FROM people
    ORDER BY birthdate, name;
    

    sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!

    Try using ORDER BY to sort multiple columns! Remember, to specify multiple columns you separate the column names with a comma.

    Get the birth date and name of people in the people table, in order of when they were born and alphabetically by name.

    sqldf("SELECT birthdate, name
    FROM people
    ORDER BY birthdate, name;") %>% head
    ##                  birthdate             name
    ## 1 1837-10-10T00:00:00.000Z      Robert Shaw
    ## 2 1872-11-07T00:00:00.000Z Lucille La Verne
    ## 3 1874-03-14T00:00:00.000Z        Mary Carr
    ## 4 1875-01-22T00:00:00.000Z    D.W. Griffith
    ## 5 1878-01-20T00:00:00.000Z    Finlay Currie
    ## 6 1878-04-28T00:00:00.000Z Lionel Barrymore

    Get the release year, duration, and title of films ordered by their release year and duration.

    sqldf("SELECT release_year, duration, title
    FROM films
    ORDER BY release_year, duration;") %>% head
    ##   release_year duration                   title
    ## 1           NA       NA              Wolf Creek
    ## 2           NA       22           "10,000 B.C."
    ## 3           NA       22        Anger Management
    ## 4           NA       24                Lovesick
    ## 5           NA       24 Yu-Gi-Oh! Duel Monsters
    ## 6           NA       30                Fired Up

    Get certifications, release years, and titles of films ordered by certification (alphabetically) and release year.

    SELECT certification, release_year, title
    FROM films
    ORDER BY certification, release_year;
    Displaying records 1 - 10
    certification release_year title
    NA “10,000 B.C.”
    NA A Touch of Frost
    NA Anger Management
    NA Animal Kingdom
    NA BrainDead
    NA Creature
    NA Deadline Gallipoli
    NA Del 1 - M√§n som hatar kvinnor
    NA Emma
    NA Fired Up

    Get the names and birthdates of people ordered by name and birth date.

    SELECT name, birthdate
    FROM people
    ORDER BY name, birthdate;
    Displaying records 1 - 10
    name birthdate
    50 Cent 1975-07-06T00:00:00.000Z
    A. Michael Baldwin 1963-04-04T00:00:00.000Z
    A. Raven Cruz null
    A.J. Buckley 1978-02-09T00:00:00.000Z
    A.J. DeLucia null
    A.J. Langer 1974-05-22T00:00:00.000Z
    AJ Michalka 1991-04-10T00:00:00.000Z
    Aaliyah 1979-01-16T00:00:00.000Z
    Aaron Ashmore 1979-10-07T00:00:00.000Z
    Aaron Hann null

    Well done. Notice how the second column you order on only steps in when the first column is not decisive to tell the order. The second column acts as a tie breaker.

    GROUP BY

    Now you know how to sort results! Often you’ll need to aggregate results. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY allows you to group a result by one or more columns, like so:

    SELECT sex, count(*)
    FROM employees
    GROUP BY sex;
    

    This might give, for example:

    sex count
    male 15
    female 19


    Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause!


    What is GROUP BY used for?

  • Performing operations by column
  • Performing operations all at once
  • Performing operations in a particular order
  • Performing operations by group
  • Correct! GROUP BY is for performing operations within groups.

    GROUP BY practice

    As you’ve just seen, combining aggregate functions with GROUP BY can yield some powerful results!

    A word of warning: SQL will return an error if you try to SELECT a field that is not in your GROUP BY clause without using it to calculate some kind of value about the entire group.

    Note that you can combine GROUP BY with ORDER BY to group your results, calculate something about them, and then order your results. For example,

    SELECT sex, count(*)
    FROM employees
    GROUP BY sex
    ORDER BY count DESC;
    

    might return something like

    sex count
    female 19
    male 15


    because there are more females at our company than males. Note also that ORDER BY always goes after GROUP BY. Let’s try some exercises!

    Get the release year and count of films released in each year.

    SELECT release_year, COUNT(*)
    FROM films
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year COUNT(*)
    NA 42
    2004 66
    2005 221
    2006 238
    2007 203
    2008 225
    2009 260
    2010 227
    2011 224
    2012 220

    Get the release year and average duration of all films, grouped by release year.

    SELECT release_year, AVG(duration)
    FROM films
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year AVG(duration)
    NA 77.43902
    2004 111.62121
    2005 107.50455
    2006 107.18565
    2007 106.87685
    2008 105.38222
    2009 105.12741
    2010 105.43556
    2011 105.11659
    2012 106.00000

    Get the release year and largest budget for all films, grouped by release year.

    SELECT release_year, MAX(budget)
    FROM films
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year MAX(budget)
    NA 1.50000e+07
    2004 2.12752e+09
    2005 4.20000e+09
    2006 1.22000e+14
    2007 3.00000e+08
    2008 5.53632e+08
    2009 2.50000e+08
    2010 6.00000e+08
    2011 2.50000e+08
    2012 2.63700e+08

    Get the IMDB score and count of film reviews grouped by IMDB score in the reviews table.

    SELECT imdb_score, COUNT(*)
    FROM reviews
    GROUP BY imdb_score;
    Displaying records 1 - 10
    imdb_score COUNT(*)
    1.6 1
    1.7 1
    1.9 3
    2.0 2
    2.1 3
    2.2 3
    2.3 3
    2.4 2
    2.5 2
    2.6 2

    Now that you’ve accustomed yourself with GROUP BY, let’s throw it in the mix with other SQL constructs you already know!

    GROUP BY practice (2)

    Now practice your new skills by combining GROUP BY and ORDER BY with some more aggregate functions!

    Make sure to always put the ORDER BY clause at the end of your query. You can’t sort values that you haven’t calculated yet!

    Get the release year and lowest gross earnings per release year.

    SELECT release_year, MIN(gross)
    FROM films
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year MIN(gross)
    NA 145118
    2004 23947
    2005 12667
    2006 721
    2007 5561
    2008 3478
    2009 828
    2010 12561
    2011 2245
    2012 1332

    Get the language and total gross amount films in each language made.

    SELECT language, SUM(gross)
    FROM films
    GROUP BY language;
    Displaying records 1 - 10
    language SUM(gross)
    1319281
    Aboriginal 72515360
    Arabic 1681831
    Aramaic 499263
    Bosnian 301305
    Cantonese 2558828
    Chinese 50000
    Czech 617228
    Danish 756077
    Dari 15797907

    Get the country and total budget spent making movies in each country.

    SELECT country, SUM(budget)
    FROM films
    GROUP BY country;
    Displaying records 1 - 10
    country SUM(budget)
    3500000
    Argentina 3400000
    Australia 1037800000
    Bahamas 5000000
    Belgium 49000000
    Brazil 23000000
    Bulgaria 7000000
    Cambodia NA
    Cameroon NA
    Canada 991440000

    Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.

    sqldf("SELECT release_year, country, MAX(budget)
    FROM films
    GROUP BY release_year, country
    ORDER BY release_year, country;") %>% head
    ##   release_year   country MAX(budget)
    ## 1           NA                    NA
    ## 2           NA Australia     1.5e+07
    ## 3           NA    Canada          NA
    ## 4           NA    France          NA
    ## 5           NA   Iceland          NA
    ## 6           NA     Japan          NA

    Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.

    SELECT country, release_year, MIN(gross)
    FROM films
    GROUP BY country, release_year
    ORDER BY country, release_year;
    Displaying records 1 - 10
    country release_year MIN(gross)
    NA NA
    2014 NA
    Argentina 2004 304124
    Argentina 2009 20167424
    Australia NA NA
    Australia 2005 1900725
    Australia 2006 13751
    Australia 2008 1027749
    Australia 2009 110029
    Australia 2010 NA

    Off to the next statement!

    HAVING a great time

    In SQL, aggregate functions can’t be used in WHERE clauses. For example, the following query is invalid:

    SELECT release_year
    FROM films
    GROUP BY release_year
    WHERE COUNT(title) > 10;
    

    This means that if you want to filter based on the result of an aggregate function, you need another way! That’s where the HAVING clause comes in. For example,

    SELECT release_year
    FROM films
    GROUP BY release_year
    HAVING COUNT(title) > 10;
    

    shows only those years in which more than 10 films were released.


    In how many different years were more than 200 movies released?
    SELECT release_year
    FROM films
    GROUP BY release_year
    HAVING COUNT(release_year) > 200;
    Displaying records 1 - 10
    release_year
    2005
    2006
    2007
    2008
    2009
    2010
    2011
    2012
    2013
    2014
  • 2
  • 13
  • 44
  • 63
  • That’s correct! We’re nearing the end of the course, so it’s time to combine everything that you’ve learned and answer some interesting questions!

    All together now

    Time to practice using ORDER BY, GROUP BY and HAVING together.

    Now you’re going to write a query that returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.

    This is going to be a big query, but you can handle it!

    Get the release year, budget and gross earnings for each film in the films table.

    SELECT release_year, budget, gross
    FROM films;
    Displaying records 1 - 10
    release_year budget gross
    2004 2127519898 410388
    2004 2700000 95016
    2004 65000 11529368
    2004 20000000 9109322
    2004 27000000 8712564
    2004 12800000 1110186
    2004 10000000 6491350
    2004 30000000 32774834
    2004 107000000 22406362
    2004 107000000 22406362

    Modify your query so that only records with a release_year after 1990 are included.

    SELECT release_year, budget, gross
    FROM films
    WHERE release_year > 1990;
    Displaying records 1 - 10
    release_year budget gross
    2004 2127519898 410388
    2004 2700000 95016
    2004 65000 11529368
    2004 20000000 9109322
    2004 27000000 8712564
    2004 12800000 1110186
    2004 10000000 6491350
    2004 30000000 32774834
    2004 107000000 22406362
    2004 107000000 22406362

    Remove the budget and gross columns, and group your results by release year.

    SELECT release_year
    FROM films
    WHERE release_year > 1990
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year
    2004
    2005
    2006
    2007
    2008
    2009
    2010
    2011
    2012
    2013

    Modify your query to include the average budget and average gross earnings for the results you have so far. Alias the average budget as avg_budget; alias the average gross earnings as avg_gross.

    SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
    FROM films
    WHERE release_year > 1990
    GROUP BY release_year;
    Displaying records 1 - 10
    release_year avg_budget avg_gross
    2004 76476582 35899988
    2005 70323938 41159143
    2006 572806572216 39237856
    2007 35271131 46267501
    2008 41804886 44573509
    2009 37073287 46207440
    2010 46094665 49908326
    2011 37775252 45785837
    2012 41331819 62873528
    2013 40519045 56158358

    Modify your query so that only years with an average budget of greater than $60 million are included.

    SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
    FROM films
    WHERE release_year > 1990
    GROUP BY release_year
    HAVING AVG(budget) > 60000000;
    3 records
    release_year avg_budget avg_gross
    2004 76476582 35899988
    2005 70323938 41159143
    2006 572806572216 39237856

    Finally, modify your query to order the results from highest average gross earnings to lowest.

    SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
    FROM films
    WHERE release_year > 1990
    GROUP BY release_year
    HAVING AVG(budget) > 60000000
    ORDER BY avg_gross DESC;
    3 records
    release_year avg_budget avg_gross
    2005 70323938 41159143
    2006 572806572216 39237856
    2004 76476582 35899988

    Wooooow! Let’s do another one!

    All together now (2)

    Great work! Now try another large query. This time, all in one go!

    Remember, if you only want to return a certain number of results, you can use the LIMIT keyword to limit the number of rows returned

    Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as avg_budget and avg_gross respectively.

    -- select country, average budget, 
    -- and average gross
    SELECT country, AVG(budget) AS avg_budget, 
           AVG(gross) AS avg_gross
    -- from the films table
    FROM films
    -- group by country 
    GROUP BY country
    -- where the country has more than 10 titles
    HAVING COUNT(title) > 10
    -- order by country
    ORDER BY country
    -- limit to only show 5 results
    LIMIT 5;
    5 records
    country avg_budget avg_gross
    Australia 41512000 49731026
    Canada 15021818 23838592
    China 65580895 14234826
    France 24209136 19736688
    Germany 33800196 26597907

    Superb work on a selection saga! SELECT queries can get rather long, but breaking them down into individual clauses makes them easier to write.

    A taste of things to come

    Congrats on making it to the end of the course! By now you should have a good understanding of the basics of SQL.

    There’s one more concept we’re going to introduce. You may have noticed that all your results so far have been from just one table, e.g., films or people.

    In the real world however, you will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie?

    In this case, you’d want to get the ID of the movie from the films table and then use it to get IMDB information from the reviews table. In SQL, this concept is known as a join, and a basic join is shown in the editor to the right.

    The query in the editor gets the IMDB score for the film To Kill a Mockingbird! Cool right?

    As you can see, joins are incredibly useful and important to understand for anyone using SQL.

    We have a whole follow-up course dedicated to them called Joining Data in SQL for you to hone your database skills further!

    Submit the code in the editor and inspect the results.

    SELECT title, imdb_score
    FROM films
    JOIN reviews
    ON films.id = reviews.film_id
    WHERE title = 'To Kill a Mockingbird';
    0 records
    title imdb_score

    What is the IMDB score for the film To Kill a Mockingbird?

  • 8.1
  • 8.4
  • 7.7
  • 9.3