Course Description
The role of a data scientist is to turn raw data into actionable insights. Much of the world’s raw data—from electronic medical records to customer transaction histories—lives in organized collections of tables called relational databases. To be an effective data scientist, you must know how to wrangle and extract data from these databases using a language called SQL . This course teaches syntax in SQL shared by many types of databases, such as PostgreSQL, MySQL, SQL Server, and Oracle. This course teaches you everything you need to know to begin working with databases today!
This chapter provides a brief introduction to working with relational databases. You’ll learn about their structure, how to talk about them using database lingo, and how to begin an analysis using simple SQL commands to select and summarize columns from database tables.
The DataCamp interface for SQL courses contains a few unique features you should be aware of.
For this course, you’ll be using a database containing information on almost 5000 films. To the right, underneath the editor, you can see the data in this database by clicking through the tabs.
From looking at the tabs, who is the first person listed in the
people
table?
SELECT * FROM people;
id | name | birthdate | deathdate |
---|---|---|---|
1 | 50 Cent | 1975-07-06T00:00:00.000Z | null |
2 | A. Michael Baldwin | 1963-04-04T00:00:00.000Z | null |
3 | A. Raven Cruz | null | null |
4 | A.J. Buckley | 1978-02-09T00:00:00.000Z | null |
5 | A.J. DeLucia | null | null |
6 | A.J. Langer | 1974-05-22T00:00:00.000Z | null |
7 | Aaliyah | 1979-01-16T00:00:00.000Z | 2001-08-25T00:00:00.000Z |
8 | Aaron Ashmore | 1979-10-07T00:00:00.000Z | null |
9 | Aaron Hann | null | null |
10 | Aaron Hill | 1983-04-23T00:00:00.000Z | null |
That’s correct! Head over to the next exercise to see how we can answer a similar question with code!
Notice the query result tab in the bottom right corner of your screen. This is where the results of your SQL queries will be displayed.
Run this query in the editor and check out the resulting table in the query result tab!
SELECT name FROM people;
Who is the second person listed in the query result?
SELECT name FROM people;
name |
---|
50 Cent |
A. Michael Baldwin |
A. Raven Cruz |
A.J. Buckley |
A.J. DeLucia |
A.J. Langer |
Aaliyah |
Aaron Ashmore |
Aaron Hann |
Aaron Hill |
Correct!
If you submit the code to the right, you’ll see that you get two types of errors.
SQL errors are shown below the editor. These are errors returned by the SQL engine. You should see:
syntax error at or near "'DataCamp <3 SQL'" LINE 2: 'DataCamp <3 SQL' ^
DataCamp errors are shown in the
Instructions box. These will let you know in plain
English where you went wrong in your code! You should see:
You need to add SELECT at the start of line 2!
Submit the code to the right, check out the errors, then fix them!
-- Try running me!
SELECT 'DataCamp <3 SQL'
‘DataCamp <3 SQL’ |
---|
DataCamp <3 SQL |
-- Try running me!
SELECT 'DataCamp <3 SQL'
AS result;
result |
---|
DataCamp <3 SQL |
Excellent error editing! You can feel safe experimenting with code in the editor – you’ll always get feedback if something goes wrong.
The following multi-step exercise allows you to practice a new concept through repetition. Check it out!
Submit the query in the editor! Don’t worry, you’ll learn how it works soon.
SELECT 'SQL'
AS result;
result |
---|
SQL |
Now change ‘SQL’
to ‘SQL is’
and click Submit!
SELECT 'SQL is'
AS result;
result |
---|
SQL is |
Finally, change ‘SQL is’
to ‘SQL is cool’
and
click Submit!
SELECT 'SQL is cool'
AS result;
result |
---|
SQL is cool |
Well done! The time has come to actually fetch information from tables now!
Now that you’re familiar with the interface, let’s get straight into it.
SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.
You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.
Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.
The table of employees might look something like this:
id | name | age | nationality |
---|---|---|---|
1 | Jessica | 22 | Ireland |
2 | Gabriel | 48 | France |
3 | Laura | 36 | USA |
How many fields does the employees table above contain?
While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.
In SQL, you can select data from a table using a SELECT
statement. For example, the following query selects the
name
column from the people
table:
SELECT name
FROM people;
In this query, SELECT
and FROM
are called
keywords. In SQL, keywords are not case-sensitive, which means you can
write the same query as:
select name
from people;
That said, it’s good practice to make SQL keywords uppercase to distinguish them from other parts of your query, like column and table names.
It’s also good practice (but not necessary for the exercises in this course) to include a semicolon at the end of your query. This tells SQL where the end of your query is!
Remember, you can see the results of executing your query in the query tab!
Select the title
column from the films
table.
SELECT title
FROM films;
title |
---|
Steamboy |
Summer Storm |
Super Size Me |
Superbabies: Baby Geniuses 2 |
Suspect Zero |
Tae Guk Gi: The Brotherhood of War |
Teacher’s Pet |
Team America: World Police |
The Alamo |
The Alamo |
Select the release_year
column from the films
table.
SELECT release_year
FROM films;
release_year |
---|
2004 |
2004 |
2004 |
2004 |
2004 |
2004 |
2004 |
2004 |
2004 |
2004 |
Select the name
of each person in the people
table.
SELECT name
FROM people;
name |
---|
50 Cent |
A. Michael Baldwin |
A. Raven Cruz |
A.J. Buckley |
A.J. DeLucia |
A.J. Langer |
Aaliyah |
Aaron Ashmore |
Aaron Hann |
Aaron Hill |
Great work!
Well done! Now you know how to select single columns.
In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!
For example, this query selects two columns, name
and
birthdate
, from the people
table:
SELECT name, birthdate
FROM people;
Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there’s a handy shortcut:
SELECT *
FROM people;
If you only want to return a certain number of results, you can use the
LIMIT
keyword to limit the number of rows returned:
SELECT *
FROM people
LIMIT 10;
Before getting started with the instructions below, check out the column
names in the films
table!
Get the title of every film from the films
table.
SELECT title
FROM films;
title |
---|
Steamboy |
Summer Storm |
Super Size Me |
Superbabies: Baby Geniuses 2 |
Suspect Zero |
Tae Guk Gi: The Brotherhood of War |
Teacher’s Pet |
Team America: World Police |
The Alamo |
The Alamo |
Get the title and release year for every film.
SELECT title, release_year
FROM films;
title | release_year |
---|---|
Steamboy | 2004 |
Summer Storm | 2004 |
Super Size Me | 2004 |
Superbabies: Baby Geniuses 2 | 2004 |
Suspect Zero | 2004 |
Tae Guk Gi: The Brotherhood of War | 2004 |
Teacher’s Pet | 2004 |
Team America: World Police | 2004 |
The Alamo | 2004 |
The Alamo | 2004 |
Get the title, release year and country for every film.
SELECT title, release_year, country
FROM films;
title | release_year | country |
---|---|---|
Steamboy | 2004 | Japan |
Summer Storm | 2004 | Germany |
Super Size Me | 2004 | USA |
Superbabies: Baby Geniuses 2 | 2004 | Germany |
Suspect Zero | 2004 | USA |
Tae Guk Gi: The Brotherhood of War | 2004 | South Korea |
Teacher’s Pet | 2004 | USA |
Team America: World Police | 2004 | USA |
The Alamo | 2004 | USA |
The Alamo | 2004 | USA |
Get all columns from the films
table.
SELECT *
FROM films;
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2223 | Steamboy | 2004 | Japan | 103 | Japanese | PG-13 | 410388 | 2127519898 |
2224 | Summer Storm | 2004 | Germany | 98 | German | R | 95016 | 2700000 |
2225 | Super Size Me | 2004 | USA | 100 | English | PG | 11529368 | 65000 |
2226 | Superbabies: Baby Geniuses 2 | 2004 | Germany | 88 | English | PG | 9109322 | 20000000 |
2227 | Suspect Zero | 2004 | USA | 99 | English | R | 8712564 | 27000000 |
2228 | Tae Guk Gi: The Brotherhood of War | 2004 | South Korea | 148 | Korean | R | 1110186 | 12800000 |
2229 | Teacher’s Pet | 2004 | USA | 74 | English | PG | 6491350 | 10000000 |
2230 | Team America: World Police | 2004 | USA | 98 | English | R | 32774834 | 30000000 |
2231 | The Alamo | 2004 | USA | 137 | English | PG-13 | 22406362 | 107000000 |
2232 | The Alamo | 2004 | USA | 137 | English | PG-13 | 22406362 | 107000000 |
Great work!
Often your results will include many duplicate values. If you want to
select all the unique values from a column, you can use the
DISTINCT
keyword.
This might be useful if, for example, you’re interested in knowing which
languages are represented in the films
table:
SELECT DISTINCT language
FROM films;
Remember, you can check out the data in the tables by clicking on the table name!
Get all the unique countries represented in the films
table.
SELECT DISTINCT country
FROM films;
country |
---|
Japan |
Germany |
USA |
South Korea |
Canada |
UK |
France |
Argentina |
Spain |
India |
Get all the different film certifications from the films
table.
SELECT DISTINCT certification
FROM films;
certification |
---|
PG-13 |
R |
PG |
G |
Unrated |
Not Rated |
NC-17 |
Get the different types of film roles from the roles
table.
SELECT DISTINCT role
FROM roles;
role |
---|
director |
actor |
Great work!
What if you want to count the number of employees in your employees
table? The COUNT()
function lets you do this by returning
the number of rows in one or more columns.
For example, this code gives the number of rows in the
people
table:
SELECT COUNT(*)
FROM people;
reviews
table?
SELECT COUNT(*)
FROM reviews;
COUNT(*) |
---|
4968 |
Correct!
As you’ve seen, COUNT(*)
tells you how many rows are in a
table. However, if you want to count the number of non-missing
values in a particular column, you can call COUNT()
on just
that column.
For example, to count the number of birth dates present in the
people
table:
SELECT COUNT(birthdate)
FROM people;
It’s also common to combine COUNT()
with
DISTINCT
to count the number of distinct values in
a column.
For example, this query counts the number of distinct birth dates
contained in the people
table:
SELECT COUNT(DISTINCT birthdate)
FROM people;
Let’s get some practice with COUNT()
!
Count the number of rows in the people
table.
SELECT COUNT(*)
FROM people;
COUNT(*) |
---|
8397 |
Count the number of (non-missing) birth dates in the people
table.
SELECT COUNT(birthdate)
FROM people;
COUNT(birthdate) |
---|
8397 |
Count the number of unique birth dates in the people
table.
SELECT COUNT(DISTINCT birthdate)
FROM people;
COUNT(DISTINCT birthdate) |
---|
5399 |
Count the number of unique languages in the films
table.
SELECT COUNT(DISTINCT language)
FROM films;
COUNT(DISTINCT language) |
---|
47 |
Count the number of unique countries in the films
table.
SELECT COUNT(DISTINCT country)
FROM films;
COUNT(DISTINCT country) |
---|
57 |
Great work!
This chapter builds on the first by teaching you how to filter tables for rows satisfying some criteria of interest. You’ll learn how to use basic comparison operators, combine multiple criteria, match patterns in text, and much more.
Congrats on finishing the first chapter! You now know how to select columns and perform basic counts. This chapter will focus on filtering your results.
In SQL, the WHERE
keyword allows you to filter based on
both text and numeric values in a table. There are a few different
comparison operators you can use:
=
equal
<>
not equal
<
less than
>
greater than
<=
less than or equal to
>=
greater than or equal to
For example, you can filter text records such as title
. The
following code returns all films with the title
‘Metropolis’
:
SELECT title
FROM films
WHERE title = 'Metropolis';
Notice that the WHERE
clause always comes after the
FROM
statement!
Note that in this course we will use <>
and
not !=
for the not equal operator, as per the SQL
standard.
SELECT title
FROM films
WHERE release_year > 2000;
As you learned in the previous exercise, the WHERE
clause
can also be used to filter numeric records, such as years or ages.
For example, the following query selects all details for films with a budget over ten thousand dollars:
SELECT *
FROM films
WHERE budget > 10000;
Now it’s your turn to use the WHERE
clause to filter
numeric values!
Get all details for all films released in 2016.
SELECT *
FROM films
WHERE release_year = 2016;
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
4821 | 10 Cloverfield Lane | 2016 | USA | 104 | English | PG-13 | 71897215 | 1.5e+07 |
4822 | 13 Hours | 2016 | USA | 144 | English | R | 52822418 | 5.0e+07 |
4823 | A Beginner’s Guide to Snuff | 2016 | USA | 87 | English | NA | NA | |
4824 | Airlift | 2016 | India | 130 | Hindi | NA | 4.4e+06 | |
4825 | Alice Through the Looking Glass | 2016 | USA | 113 | English | PG | 76846624 | 1.7e+08 |
4826 | Allegiant | 2016 | USA | 120 | English | PG-13 | 66002193 | 1.1e+08 |
4827 | Alleluia! The Devil’s Carnival | 2016 | USA | 97 | English | NA | 5.0e+05 | |
4828 | Antibirth | 2016 | USA | 94 | English | NA | 3.5e+06 | |
4829 | Bad Moms | 2016 | USA | 100 | English | R | 55461307 | 2.0e+07 |
4830 | Bad Moms | 2016 | USA | 100 | English | R | 55461307 | 2.0e+07 |
Get the number of films released before 2000.
SELECT COUNT(*)
FROM films
WHERE release_year < 2000;
COUNT(*) |
---|
0 |
Get the title and release year of films released after 2000.
SELECT title, release_year
FROM films
WHERE release_year > 2000;
title | release_year |
---|---|
Steamboy | 2004 |
Summer Storm | 2004 |
Super Size Me | 2004 |
Superbabies: Baby Geniuses 2 | 2004 |
Suspect Zero | 2004 |
Tae Guk Gi: The Brotherhood of War | 2004 |
Teacher’s Pet | 2004 |
Team America: World Police | 2004 |
The Alamo | 2004 |
The Alamo | 2004 |
Great job! After filtering of numeric values, it’s time to explore filtering of text!
Remember, the WHERE
clause can also be used to filter text
results, such as names or countries.
For example, this query gets the titles of all films which were filmed in China:
SELECT title
FROM films
WHERE country = 'China';
Now it’s your turn to practice using WHERE
with text
values!
Important: in PostgreSQL (the version of SQL we’re using), you
must use single quotes with WHERE
.
Get all details for all French language films.
SELECT *
FROM films
WHERE language = 'French';
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2239 | The Case of the Grinning Cat | 2004 | France | 59 | French | NA | 1.5e+05 | |
2240 | The Chorus | 2004 | France | 97 | French | PG-13 | 3629758 | 5.5e+06 |
2307 | Camping sauvage | 2005 | France | 79 | French | NA | NA | |
2368 | Joyeux Noel | 2005 | France | 116 | French | PG-13 | 1050445 | 2.2e+07 |
2387 | March of the Penguins | 2005 | France | 80 | French | G | 77413017 | 8.0e+06 |
2475 | The Rocket: The Legend of Rocket Richard | 2005 | Canada | 124 | French | PG | NA | 8.0e+06 |
2641 | One to Another | 2006 | France | 95 | French | Unrated | 18435 | 1.4e+06 |
2647 | “Paris, je t’aime” | 2006 | France | 120 | French | R | 4857376 | 1.3e+07 |
2843 | Molière | 2007 | France | 120 | French | PG-13 | 634277 | 1.6e+07 |
2859 | Persepolis | 2007 | France | 89 | French | PG-13 | 4443403 | 7.3e+06 |
Get the name and birth date of the person born on November 11th, 1974.
Remember to use ISO date format (‘1974-11-11’
)!
SELECT name, birthdate
FROM people
WHERE birthdate = '1974-11-11';
name | birthdate |
---|
Get the number of Hindi language films.
SELECT COUNT(*)
FROM films
WHERE language = 'Hindi';
COUNT(*) |
---|
24 |
Get all details for all films with an R certification.
SELECT *
FROM films
WHERE certification = 'R';
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2224 | Summer Storm | 2004 | Germany | 98 | German | R | 95016 | 2700000 |
2227 | Suspect Zero | 2004 | USA | 99 | English | R | 8712564 | 27000000 |
2228 | Tae Guk Gi: The Brotherhood of War | 2004 | South Korea | 148 | Korean | R | 1110186 | 12800000 |
2230 | Team America: World Police | 2004 | USA | 98 | English | R | 32774834 | 30000000 |
2238 | The Butterfly Effect | 2004 | USA | 120 | English | R | 23947 | 13000000 |
2244 | The Girl Next Door | 2004 | USA | 110 | English | R | 14589444 | 25000000 |
2246 | The Holy Girl | 2004 | Argentina | 106 | Spanish | R | 304124 | 1400000 |
2247 | The I Inside | 2004 | UK | 90 | English | R | NA | 8800000 |
2249 | The Ladykillers | 2004 | USA | 104 | English | R | 39692139 | 35000000 |
2250 | The Last Shot | 2004 | USA | 93 | English | R | 463730 | NA |
Wonderful! Let’s look at combining different conditions now!
Often, you’ll want to select data based on multiple conditions. You can
build up your WHERE
queries by combining multiple
conditions with the AND
keyword.
For example,
SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;
gives you the titles of films released between 1994 and 2000.
Note that you need to specify the column name separately for every
AND
condition, so the following would be invalid:
SELECT title
FROM films
WHERE release_year > 1994 AND < 2000;
You can add as many AND
conditions as you need!
Get the title and release year for all Spanish language films released before 2000.
SELECT title, release_year
FROM films
WHERE release_year < 2000
AND language = 'Spanish';
title | release_year |
---|
Get all details for Spanish language films released after 2000.
SELECT *
FROM films
WHERE release_year > 2000
AND language = 'Spanish';
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2246 | The Holy Girl | 2004 | Argentina | 106 | Spanish | R | 304124 | 1400000 |
2263 | The Sea Inside | 2004 | Spain | 125 | Spanish | PG-13 | 2086345 | 10000000 |
2458 | The Legend of Zorro | 2005 | USA | 129 | Spanish | PG | 45356386 | 75000000 |
2542 | Captain Alatriste: The Spanish Musketeer | 2006 | Spain | 145 | Spanish | NA | 24000000 | |
2646 | Pan’s Labyrinth | 2006 | Spain | 112 | Spanish | R | 37623143 | 13500000 |
2655 | Quinceañera | 2006 | USA | 90 | Spanish | R | 1689999 | 400000 |
2738 | Volver | 2006 | Spain | 121 | Spanish | R | 12899702 | NA |
2748 | [Rec] | 2007 | Spain | 78 | Spanish | R | NA | 1500000 |
2921 | The Orphanage | 2007 | Spain | 105 | Spanish | R | 7159147 | 3400000 |
2932 | Timecrimes | 2007 | Spain | 66 | Spanish | R | 38108 | 2600000 |
Get all details for Spanish language films released after 2000, but before 2010.
SELECT *
FROM films
WHERE release_year > 2000
AND release_year < 2010
AND language = 'Spanish';
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2246 | The Holy Girl | 2004 | Argentina | 106 | Spanish | R | 304124 | 1400000 |
2263 | The Sea Inside | 2004 | Spain | 125 | Spanish | PG-13 | 2086345 | 10000000 |
2458 | The Legend of Zorro | 2005 | USA | 129 | Spanish | PG | 45356386 | 75000000 |
2542 | Captain Alatriste: The Spanish Musketeer | 2006 | Spain | 145 | Spanish | NA | 24000000 | |
2646 | Pan’s Labyrinth | 2006 | Spain | 112 | Spanish | R | 37623143 | 13500000 |
2655 | Quinceañera | 2006 | USA | 90 | Spanish | R | 1689999 | 400000 |
2738 | Volver | 2006 | Spain | 121 | Spanish | R | 12899702 | NA |
2748 | [Rec] | 2007 | Spain | 78 | Spanish | R | NA | 1500000 |
2921 | The Orphanage | 2007 | Spain | 105 | Spanish | R | 7159147 | 3400000 |
2932 | Timecrimes | 2007 | Spain | 66 | Spanish | R | 38108 | 2600000 |
Great work! Being able to combine conditions with AND
will
prove to be very useful if you only want your query to return a specific
subset of records!
What if you want to select rows based on multiple conditions where some
but not all of the conditions need to be met? For this, SQL has
the OR
operator.
For example, the following returns all films released in either 1994 or 2000:
SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;
Note that you need to specify the column for every OR
condition, so the following is invalid:
SELECT title
FROM films
WHERE release_year = 1994 OR 2000;
When combining AND
and OR
, be sure to enclose
the individual clauses in parentheses, like so:
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
Otherwise, due to SQL’s precedence rules, you may not get the results you’re expecting!
OR
operator do?
You now know how to select rows that meet some but not
all conditions by combining AND
and
OR
.
For example, the following query selects all films that were released in 1994 or 1995 which had a rating of PG or R.
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
Now you’ll write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.
It looks like a lot, but you can build the query up one step at a time to get comfortable with the underlying concept in each step. Let’s go!
Get the title and release year for films released in the 90s.
SELECT title, release_year
FROM films
WHERE release_year >= 1990 AND release_year < 2000;
title | release_year |
---|
Now, build on your query to filter the records to only include French or Spanish language films.
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish');
title | release_year |
---|
Finally, restrict the query to only return films that took in more than $2M gross.
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND gross > 2000000;
title | release_year |
---|
That was pretty involved!
As you’ve learned, you can use the following query to get titles of all films released in and between 1994 and 2000:
SELECT title
FROM films
WHERE release_year >= 1994
AND release_year <= 2000;
Checking for ranges like this is very common, so in SQL the
BETWEEN
keyword provides a useful shorthand for filtering
values within a specified range. This query is equivalent to the one
above:
SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;
It’s important to remember that BETWEEN
is
inclusive, meaning the beginning and end values are included in
the results!
BETWEEN
keyword do?
Similar to the WHERE
clause, the BETWEEN
clause can be used with multiple AND
and OR
operators, so you can build up your queries and make them even more
powerful!
For example, suppose we have a table called kids
. We can
get the names of all kids between the ages of 2 and 12 from the United
States:
SELECT name
FROM kids
WHERE age BETWEEN 2 AND 12
AND nationality = 'USA';
Take a go at using BETWEEN
with AND
on the
films data to get the title and release year of all Spanish language
films released between 1990 and 2000 (inclusive) with budgets over $100
million. We have broken the problem into smaller steps so that you can
build the query as you go along!
Get the title and release year of all films released between 1990 and 2000 (inclusive).
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000;
title | release_year |
---|
Now, build on your previous query to select only films that have budgets over $100 million.
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000;
title | release_year |
---|
Now restrict the query to only return Spanish language films.
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND language = 'Spanish';
title | release_year |
---|
Finally, modify to your previous query to include all Spanish language or French language films with the same criteria as before. Don’t forget your parentheses!
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND (language = 'Spanish' OR language = 'French');
title | release_year |
---|
Well done! Off to the next filtering operator!
As you’ve seen, WHERE
is very useful for filtering results.
However, if you want to filter based on many conditions,
WHERE
can get unwieldy. For example:
SELECT name
FROM kids
WHERE age = 2
OR age = 4
OR age = 6
OR age = 8
OR age = 10;
Enter the IN
operator! The IN
operator allows
you to specify multiple values in a WHERE
clause, making it
easier and quicker to specify multiple OR
conditions! Neat,
right?
So, the above example would become simply:
SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);
Try using the IN
operator yourself!
Get the title and release year of all films released in 1990 or 2000 that were longer than two hours. Remember, duration is in minutes!
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120;
title | release_year |
---|
Get the title and language of all films which were in English, Spanish, or French.
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');
title | language |
---|---|
Super Size Me | English |
Superbabies: Baby Geniuses 2 | English |
Suspect Zero | English |
Teacher’s Pet | English |
Team America: World Police | English |
The Alamo | English |
The Alamo | English |
The Aviator | English |
The Big Bounce | English |
The Blue Butterfly | English |
Get the title and certification of all films with an NC-17 or R certification.
SELECT title, certification
FROM films
WHERE certification IN ('NC-17', 'R');
title | certification |
---|---|
Summer Storm | R |
Suspect Zero | R |
Tae Guk Gi: The Brotherhood of War | R |
Team America: World Police | R |
The Butterfly Effect | R |
The Girl Next Door | R |
The Holy Girl | R |
The I Inside | R |
The Ladykillers | R |
The Last Shot | R |
Your SQL vocabulary is growing by the minute!
In SQL, NULL
represents a missing or unknown value. You can
check for NULL
values using the expression IS
NULL
. For example, to count the number of missing birth dates in
the people
table:
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
As you can see, IS NULL
is useful when combined with
WHERE
to figure out what data you’re missing.
Sometimes, you’ll want to filter out missing values so you only get
results which are not NULL
. To do this, you can use the
IS NOT NULL
operator.
For example, this query gives the names of all people whose birth dates
are not missing in the people
table.
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
NULL
represent?
Now that you know what NULL
is and what it’s used for, it’s
time for some practice!
Get the names of people who are still alive, i.e. whose death date is missing.
SELECT name
FROM people
WHERE deathdate IS NULL;
name |
---|
Get the title of every film which doesn’t have a budget associated with it.
SELECT title
FROM films
WHERE budget IS NULL;
title |
---|
The Cookout |
The Last Shot |
The Libertine |
The Prince and Me |
What the #$*! Do We (K)now!? |
Woman Thou Art Loosed |
51 Birch Street |
Beauty Shop |
Camping sauvage |
Ice Princess |
Get the number of films which don’t have a language associated with them.
SELECT COUNT(*)
FROM films
WHERE language IS NULL;
COUNT(*) |
---|
0 |
Alright! Are you ready for a last type of operator?
As you’ve seen, the WHERE
clause can be used to filter text
data. However, so far you’ve only been able to filter by specifying the
exact text you’re interested in. In the real world, often you’ll want to
search for a pattern rather than a specific text string.
In SQL, the LIKE
operator can be used in a
WHERE
clause to search for a pattern in a column. To
accomplish this, you use something called a wildcard as a
placeholder for some other values. There are two wildcards you can use
with LIKE
:
The %
wildcard will match zero, one, or many characters in
text. For example, the following query matches companies like
‘Data’
, ‘DataC’
‘DataCamp’
,
‘DataMind’
, and so on:
SELECT name
FROM companies
WHERE name LIKE 'Data%';
The _
wildcard will match a single character. For
example, the following query matches companies like
‘DataCamp’
, ‘DataComp’
, and so on:
SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';
You can also use the NOT LIKE
operator to find records that
don’t match the pattern you specify.
Got it? Let’s practice!
Get the names of all people whose names begin with ‘B’. The pattern you
need is ‘B%’
.
SELECT name
FROM people
WHERE name LIKE 'B%';
name |
---|
B.J. Novak |
Babak Najafi |
Babar Ahmed |
Bahare Seddiqi |
Bai Ling |
Bailee Madison |
Balinese Tari Legong Dancers |
Bálint Péntek |
Baltasar Korm√°kur |
Balthazar Getty |
Get the names of people whose names have ‘r’ as the second letter. The
pattern you need is ’_r%’
.
SELECT name
FROM people
WHERE name LIKE '_r%';
name |
---|
Ara Celi |
Aramis Knight |
Arben Bajraktaraj |
Arcelia Ramírez |
Archie Kao |
Archie Panjabi |
Aretha Franklin |
Ari Folman |
Ari Gold |
Ari Graynor |
Get the names of people whose names don’t start with A. The pattern you
need is ‘A%’
.
SELECT name
FROM people
WHERE name NOT LIKE 'A%';
name |
---|
50 Cent |
Álex Angulo |
Álex de la Iglesia |
Ángela Molina |
B.J. Novak |
Babak Najafi |
Babar Ahmed |
Bahare Seddiqi |
Bai Ling |
Bailee Madison |
This concludes the second chapter of the intro to SQL course. Rush over to chapter 3 if you want to learn more about aggregate functions!
This chapter teaches you how to use aggregate functions to summarize data and gain useful insights. You’ll also learn about arithmetic in SQL and how to use aliases to make your results more readable.
Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.
For example,
SELECT AVG(budget)
FROM films;
gives you the average value from the budget
column of the
films
table. Similarly, the MAX()
function
returns the highest budget:
SELECT MAX(budget)
FROM films;
The SUM()
function returns the result of adding up the
numeric values in a column:
SELECT SUM(budget)
FROM films;
You can probably guess what the MIN()
function does! Now
it’s your turn to try out some SQL functions.
Use the SUM()
function to get the total duration of all
films.
SELECT SUM(duration)
FROM films;
SUM(duration) |
---|
289871 |
Get the average duration of all films.
SELECT AVG(duration)
FROM films;
AVG(duration) |
---|
106.0245 |
Get the duration of the shortest film.
SELECT MIN(duration)
FROM films;
MIN(duration) |
---|
7 |
Get the duration of the longest film.
SELECT MAX(duration)
FROM films;
MAX(duration) |
---|
334 |
Sequelistic!
Good work. Aggregate functions are important to understand, so let’s get some more practice!
Use the SUM()
function to get the total amount grossed by
all films.
SELECT SUM(gross)
FROM films;
SUM(gross) |
---|
113392619293 |
Get the average amount grossed by all films.
SELECT AVG(gross)
FROM films;
AVG(gross) |
---|
51054759 |
Get the amount grossed by the worst performing film.
SELECT MIN(gross)
FROM films;
MIN(gross) |
---|
162 |
Get the amount grossed by the best performing film.
SELECT MAX(gross)
FROM films;
MAX(gross) |
---|
936627416 |
Well done! Don’t forget about these functions. You’ll find yourself using them over and over again to get a quick grasp of the data in a SQL database.
Aggregate functions can be combined with the WHERE
clause
to gain further insights from your data.
For example, to get the total budget of movies made in the year 2010 or later:
SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;
Now it’s your turn!
Use the SUM()
function to get the total amount grossed by
all films made in the year 2000 or later.
SELECT SUM(gross)
FROM films
WHERE release_year >= 2000;
SUM(gross) |
---|
113392474175 |
Get the average amount grossed by all films whose titles start with the letter ‘A’.
SELECT AVG(gross)
FROM films
where title LIKE 'A%';
AVG(gross) |
---|
49647086 |
Get the amount grossed by the worst performing film in 1994.
SELECT MIN(gross)
FROM films
WHERE release_year = 1994;
MIN(gross) |
---|
NA |
Get the amount grossed by the best performing film between 2000 and 2012, inclusive.
SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012;
MAX(gross) |
---|
760505847 |
Nice. Can you see how SQL basically provides you a bunch of building blocks that you can combine in all kinds of ways? Hence the name: Structured Query Language.
In addition to using aggregate functions, you can perform basic
arithmetic with symbols like +
, -
,
*
, and /
.
So, for example, this gives a result of 12
:
SELECT (4 * 3);
However, the following gives a result of 1
:
SELECT (4 / 3);
What’s going on here?
SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing!
If you want more precision when dividing, you can add decimal places to your numbers. For example,
SELECT (4.0 / 3.0) AS result;
gives you the result you would expect: 1.333
.
SELECT (10 / 3);
?
SELECT (10 / 3);
(10 / 3) |
---|
3 |
Correct!
You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,
SELECT MAX(budget)
FROM films;
gives you a result with one column, named max
. But what if
you use two functions like this?
SELECT MAX(budget), MAX(duration)
FROM films;
Well, then you’d have two columns named max
, which isn’t
very useful!
To avoid situations like this, SQL allows you to do something called
aliasing. Aliasing simply means you assign a temporary name to
something. To alias, you use the AS
keyword, which you’ve
already seen earlier in this course.
For example, in the above example we could use aliases to make the result clearer:
SELECT MAX(budget) AS max_budget,
MAX(duration) AS max_duration
FROM films;
Aliases are helpful for making results more readable!
Get the title and net profit (the amount a film grossed, minus its
budget) for all films. Alias the net profit as net_profit
.
SELECT title, gross - budget AS net_profit
FROM films;
title | net_profit |
---|---|
Steamboy | -2127109510 |
Summer Storm | -2604984 |
Super Size Me | 11464368 |
Superbabies: Baby Geniuses 2 | -10890678 |
Suspect Zero | -18287436 |
Tae Guk Gi: The Brotherhood of War | -11689814 |
Teacher’s Pet | -3508650 |
Team America: World Police | 2774834 |
The Alamo | -84593638 |
The Alamo | -84593638 |
Get the title and duration in hours for all films. The duration is in
minutes, so you’ll need to divide by 60.0 to get the duration in hours.
Alias the duration in hours as duration_hours
.
SELECT title, duration / 60.0 AS duration_hours
FROM films;
title | duration_hours |
---|---|
Steamboy | 1.716667 |
Summer Storm | 1.633333 |
Super Size Me | 1.666667 |
Superbabies: Baby Geniuses 2 | 1.466667 |
Suspect Zero | 1.650000 |
Tae Guk Gi: The Brotherhood of War | 2.466667 |
Teacher’s Pet | 1.233333 |
Team America: World Police | 1.633333 |
The Alamo | 2.283333 |
The Alamo | 2.283333 |
Get the average duration in hours for all films, aliased as
avg_duration_hours
.
SELECT AVG(duration) / 60.0 AS avg_duration_hours
FROM films;
avg_duration_hours |
---|
1.767075 |
Let’s practice your newfound aliasing skills some more before moving on!
Recall: SQL assumes that if you divide an integer by an integer, you want to get an integer back.
This means that the following will erroneously result in
400.0
:
SELECT 45 / 10 * 100.0;
This is because 45 / 10
evaluates to an integer
(4
), and not a decimal number like we would expect.
So when you’re dividing make sure at least one of your numbers has a decimal place:
SELECT 45 * 100.0 / 10;
The above now gives the correct answer of 450.0
since the
numerator (45 * 100.0
) of the division is now a decimal!
Get the percentage of people
who are no longer alive. Alias
the result as percentage_dead
. Remember to use
100.0
and not 100
!
-- get the count(deathdate) and multiply by 100.0
-- then divide by count(*)
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;
percentage_dead |
---|
100 |
Get the number of years between the newest film and oldest film. Alias
the result as difference
.
SELECT MAX(release_year) - MIN(release_year)
AS difference
FROM films;
difference |
---|
12 |
Get the number of decades the films
table covers. Alias the
result as number_of_decades
. The top half of your fraction
should be enclosed in parentheses.
SELECT (MAX(release_year) - MIN(release_year)) / 10.0
AS number_of_decades
FROM films;
number_of_decades |
---|
1.2 |
We’re at the end of chapter 3! In chapter 4, you will learn about sorting, grouping and joins. Head over there quickly!
This chapter provides a brief introduction to sorting and grouping your results.
Congratulations on making it this far! You now know how to select and filter your results.
In this chapter you’ll learn how to sort and group your results to gain further insight. Let’s go!
In SQL, the ORDER BY
keyword is used to sort results in
ascending or descending order according to the values of one or more
columns.
By default ORDER BY
will sort in ascending order. If you
want to sort the results in descending order, you can use the
DESC
keyword. For example,
SELECT title
FROM films
ORDER BY release_year DESC;
gives you the titles of films sorted by release year, from newest to oldest.
ORDER BY
sorts a column of text values by
default?
Now that you understand how ORDER BY
works, give these
exercises a go!
Get the names of people from the people
table, sorted
alphabetically.
SELECT name
FROM people
ORDER BY name;
name |
---|
50 Cent |
A. Michael Baldwin |
A. Raven Cruz |
A.J. Buckley |
A.J. DeLucia |
A.J. Langer |
AJ Michalka |
Aaliyah |
Aaron Ashmore |
Aaron Hann |
Get the names of people, sorted by birth date.
SELECT name
FROM people
ORDER BY birthdate;
name |
---|
Robert Shaw |
Lucille La Verne |
Mary Carr |
D.W. Griffith |
Finlay Currie |
Lionel Barrymore |
Billy Gilbert |
Cecil B. DeMille |
Leopold Stokowski |
Éric Tessier |
Get the birth date and name for every person, in order of when they were born.
sqldf("SELECT birthdate, name
FROM people
ORDER BY birthdate;") %>% head
## birthdate name
## 1 1837-10-10T00:00:00.000Z Robert Shaw
## 2 1872-11-07T00:00:00.000Z Lucille La Verne
## 3 1874-03-14T00:00:00.000Z Mary Carr
## 4 1875-01-22T00:00:00.000Z D.W. Griffith
## 5 1878-01-20T00:00:00.000Z Finlay Currie
## 6 1878-04-28T00:00:00.000Z Lionel Barrymore
Great work!
Let’s get some more practice with ORDER BY
!
Get the title of films released in 2000 or 2012, in the order they were released.
SELECT title
FROM films
WHERE release_year IN (2000, 2012)
ORDER BY release_year;
title |
---|
2016: Obama’s America |
21 Jump Street |
A Perfect Plan |
A Thousand Words |
Abraham Lincoln: Vampire Hunter |
Act of Valor |
After |
Alex Cross |
Alien Uprising |
American Reunion |
Get all details for all films except those released in 2015 and order them by duration.
SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration;
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
2326 | Dil Jo Bhi Kahey… | 2005 | India | NA | English | 129319 | 7.0e+07 | |
2712 | The Naked Ape | 2006 | USA | NA | English | NA | NA | |
3208 | Black Water Transit | 2009 | USA | NA | English | NA | 2.3e+07 | |
3504 | Harry Potter and the Deathly Hallows: Part I | 2010 | UK | NA | English | NA | NA | |
3552 | N-Secure | 2010 | USA | NA | English | R | 2592808 | NA |
3728 | Harry Potter and the Deathly Hallows: Part II | 2011 | UK | NA | English | NA | NA | |
4018 | Should’ve Been Romeo | 2012 | USA | NA | English | NA | 5.0e+06 | |
4138 | Barfi | 2013 | India | NA | Kannada | NA | NA | |
4396 | Destiny | 2014 | USA | NA | English | NA | NA | |
2926 | The Touch | 2007 | USA | 7 | English | NA | 1.3e+04 |
Get the title and gross earnings for movies which begin with the letter ‘M’ and order the results alphabetically.
SELECT title, gross
FROM films
WHERE title LIKE 'M%'
ORDER BY title;
title | gross |
---|---|
MacGruber | 8460995 |
Machete | 26589953 |
Machete Kills | 7268659 |
Machine Gun Preacher | 537580 |
Mad Hot Ballroom | 8044906 |
Mad Max: Fury Road | 153629485 |
Mad Money | 20668843 |
Madagascar | 193136719 |
Madagascar 3: Europe’s Most Wanted | 216366733 |
Madagascar: Escape 2 Africa | 179982968 |
Can you feel the SQL power dawn on you?!
To order results in descending order, you can put the keyword
DESC
after your ORDER BY
. For example, to get
all the names in the people
table, in reverse alphabetical
order:
SELECT name
FROM people
ORDER BY name DESC;
Now practice using ORDER BY
with DESC
to sort
single columns in descending order!
Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.
SELECT imdb_score, film_id
FROM reviews
ORDER BY imdb_score DESC;
imdb_score | film_id |
---|---|
9.5 | 4960 |
9.3 | 742 |
9.2 | 178 |
9.1 | 4866 |
9.0 | 3110 |
9.0 | 192 |
8.9 | 120 |
8.9 | 676 |
8.9 | 2045 |
8.9 | 723 |
Get the title for every film, in reverse order.
SELECT title
FROM films
ORDER BY title DESC;
title |
---|
Æon Flux |
xXx: State of the Union |
[Rec] 2 |
[Rec] |
Zulu |
Zoom |
Zoolander 2 |
Zookeeper |
Zombieland |
Zombie Hunter |
Get the title and duration for every film, in order of longest duration to shortest.
SELECT title, duration
FROM films
ORDER BY duration DESC;
title | duration |
---|---|
Carlos | 334 |
The Company | 286 |
Arn: The Knight Templar | 270 |
The Wolf of Wall Street | 240 |
Emma | 240 |
Watchmen | 215 |
King Kong | 201 |
King Kong | 201 |
King Kong | 201 |
Deadline Gallipoli | 197 |
Nice. Let’s explore how you can sort multiple columns!
ORDER BY
can also be used to sort on multiple columns. It
will sort by the first column specified, then sort by the next, then the
next, and so on. For example,
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!
Try using ORDER BY
to sort multiple columns! Remember, to
specify multiple columns you separate the column names with a comma.
Get the birth date and name of people in the people
table,
in order of when they were born and alphabetically by name.
sqldf("SELECT birthdate, name
FROM people
ORDER BY birthdate, name;") %>% head
## birthdate name
## 1 1837-10-10T00:00:00.000Z Robert Shaw
## 2 1872-11-07T00:00:00.000Z Lucille La Verne
## 3 1874-03-14T00:00:00.000Z Mary Carr
## 4 1875-01-22T00:00:00.000Z D.W. Griffith
## 5 1878-01-20T00:00:00.000Z Finlay Currie
## 6 1878-04-28T00:00:00.000Z Lionel Barrymore
Get the release year, duration, and title of films ordered by their release year and duration.
sqldf("SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration;") %>% head
## release_year duration title
## 1 NA NA Wolf Creek
## 2 NA 22 "10,000 B.C."
## 3 NA 22 Anger Management
## 4 NA 24 Lovesick
## 5 NA 24 Yu-Gi-Oh! Duel Monsters
## 6 NA 30 Fired Up
Get certifications, release years, and titles of films ordered by certification (alphabetically) and release year.
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year;
certification | release_year | title |
---|---|---|
NA | “10,000 B.C.” | |
NA | A Touch of Frost | |
NA | Anger Management | |
NA | Animal Kingdom | |
NA | BrainDead | |
NA | Creature | |
NA | Deadline Gallipoli | |
NA | Del 1 - M√§n som hatar kvinnor | |
NA | Emma | |
NA | Fired Up |
Get the names and birthdates of people ordered by name and birth date.
SELECT name, birthdate
FROM people
ORDER BY name, birthdate;
name | birthdate |
---|---|
50 Cent | 1975-07-06T00:00:00.000Z |
A. Michael Baldwin | 1963-04-04T00:00:00.000Z |
A. Raven Cruz | null |
A.J. Buckley | 1978-02-09T00:00:00.000Z |
A.J. DeLucia | null |
A.J. Langer | 1974-05-22T00:00:00.000Z |
AJ Michalka | 1991-04-10T00:00:00.000Z |
Aaliyah | 1979-01-16T00:00:00.000Z |
Aaron Ashmore | 1979-10-07T00:00:00.000Z |
Aaron Hann | null |
Well done. Notice how the second column you order on only steps in when the first column is not decisive to tell the order. The second column acts as a tie breaker.
Now you know how to sort results! Often you’ll need to aggregate
results. For example, you might want to count the number of male and
female employees in your company. Here, what you want is to group all
the males together and count them, and group all the females together
and count them. In SQL, GROUP BY
allows you to group a
result by one or more columns, like so:
SELECT sex, count(*)
FROM employees
GROUP BY sex;
This might give, for example:
sex | count |
---|---|
male | 15 |
female | 19 |
Commonly, GROUP BY
is used with aggregate
functions like COUNT()
or MAX()
. Note
that GROUP BY
always goes after the FROM
clause!
GROUP BY
used for?
As you’ve just seen, combining aggregate functions with GROUP
BY
can yield some powerful results!
A word of warning: SQL will return an error if you try to
SELECT
a field that is not in your GROUP BY
clause without using it to calculate some kind of value about the entire
group.
Note that you can combine GROUP BY
with ORDER
BY
to group your results, calculate something about them, and
then order your results. For example,
SELECT sex, count(*)
FROM employees
GROUP BY sex
ORDER BY count DESC;
might return something like
sex | count |
---|---|
female | 19 |
male | 15 |
because there are more females at our company than males. Note also
that ORDER BY
always goes after GROUP BY
.
Let’s try some exercises!
Get the release year and count of films released in each year.
SELECT release_year, COUNT(*)
FROM films
GROUP BY release_year;
release_year | COUNT(*) |
---|---|
NA | 42 |
2004 | 66 |
2005 | 221 |
2006 | 238 |
2007 | 203 |
2008 | 225 |
2009 | 260 |
2010 | 227 |
2011 | 224 |
2012 | 220 |
Get the release year and average duration of all films, grouped by release year.
SELECT release_year, AVG(duration)
FROM films
GROUP BY release_year;
release_year | AVG(duration) |
---|---|
NA | 77.43902 |
2004 | 111.62121 |
2005 | 107.50455 |
2006 | 107.18565 |
2007 | 106.87685 |
2008 | 105.38222 |
2009 | 105.12741 |
2010 | 105.43556 |
2011 | 105.11659 |
2012 | 106.00000 |
Get the release year and largest budget for all films, grouped by release year.
SELECT release_year, MAX(budget)
FROM films
GROUP BY release_year;
release_year | MAX(budget) |
---|---|
NA | 1.50000e+07 |
2004 | 2.12752e+09 |
2005 | 4.20000e+09 |
2006 | 1.22000e+14 |
2007 | 3.00000e+08 |
2008 | 5.53632e+08 |
2009 | 2.50000e+08 |
2010 | 6.00000e+08 |
2011 | 2.50000e+08 |
2012 | 2.63700e+08 |
Get the IMDB score and count of film reviews grouped by IMDB score in
the reviews
table.
SELECT imdb_score, COUNT(*)
FROM reviews
GROUP BY imdb_score;
imdb_score | COUNT(*) |
---|---|
1.6 | 1 |
1.7 | 1 |
1.9 | 3 |
2.0 | 2 |
2.1 | 3 |
2.2 | 3 |
2.3 | 3 |
2.4 | 2 |
2.5 | 2 |
2.6 | 2 |
Now that you’ve accustomed yourself with GROUP BY
, let’s
throw it in the mix with other SQL constructs you already know!
Now practice your new skills by combining GROUP BY
and
ORDER BY
with some more aggregate functions!
Make sure to always put the ORDER BY
clause at the end of
your query. You can’t sort values that you haven’t calculated yet!
Get the release year and lowest gross earnings per release year.
SELECT release_year, MIN(gross)
FROM films
GROUP BY release_year;
release_year | MIN(gross) |
---|---|
NA | 145118 |
2004 | 23947 |
2005 | 12667 |
2006 | 721 |
2007 | 5561 |
2008 | 3478 |
2009 | 828 |
2010 | 12561 |
2011 | 2245 |
2012 | 1332 |
Get the language and total gross amount films in each language made.
SELECT language, SUM(gross)
FROM films
GROUP BY language;
language | SUM(gross) |
---|---|
1319281 | |
Aboriginal | 72515360 |
Arabic | 1681831 |
Aramaic | 499263 |
Bosnian | 301305 |
Cantonese | 2558828 |
Chinese | 50000 |
Czech | 617228 |
Danish | 756077 |
Dari | 15797907 |
Get the country and total budget spent making movies in each country.
SELECT country, SUM(budget)
FROM films
GROUP BY country;
country | SUM(budget) |
---|---|
3500000 | |
Argentina | 3400000 |
Australia | 1037800000 |
Bahamas | 5000000 |
Belgium | 49000000 |
Brazil | 23000000 |
Bulgaria | 7000000 |
Cambodia | NA |
Cameroon | NA |
Canada | 991440000 |
Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.
sqldf("SELECT release_year, country, MAX(budget)
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;") %>% head
## release_year country MAX(budget)
## 1 NA NA
## 2 NA Australia 1.5e+07
## 3 NA Canada NA
## 4 NA France NA
## 5 NA Iceland NA
## 6 NA Japan NA
Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.
SELECT country, release_year, MIN(gross)
FROM films
GROUP BY country, release_year
ORDER BY country, release_year;
country | release_year | MIN(gross) |
---|---|---|
NA | NA | |
2014 | NA | |
Argentina | 2004 | 304124 |
Argentina | 2009 | 20167424 |
Australia | NA | NA |
Australia | 2005 | 1900725 |
Australia | 2006 | 13751 |
Australia | 2008 | 1027749 |
Australia | 2009 | 110029 |
Australia | 2010 | NA |
Off to the next statement!
In SQL, aggregate functions can’t be used in WHERE
clauses.
For example, the following query is invalid:
SELECT release_year
FROM films
GROUP BY release_year
WHERE COUNT(title) > 10;
This means that if you want to filter based on the result of an
aggregate function, you need another way! That’s where the
HAVING
clause comes in. For example,
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
shows only those years in which more than 10 films were released.
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(release_year) > 200;
release_year |
---|
2005 |
2006 |
2007 |
2008 |
2009 |
2010 |
2011 |
2012 |
2013 |
2014 |
That’s correct! We’re nearing the end of the course, so it’s time to combine everything that you’ve learned and answer some interesting questions!
Time to practice using ORDER BY
, GROUP BY
and
HAVING
together.
Now you’re going to write a query that returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.
This is going to be a big query, but you can handle it!
Get the release year, budget and gross earnings for each film in the
films
table.
SELECT release_year, budget, gross
FROM films;
release_year | budget | gross |
---|---|---|
2004 | 2127519898 | 410388 |
2004 | 2700000 | 95016 |
2004 | 65000 | 11529368 |
2004 | 20000000 | 9109322 |
2004 | 27000000 | 8712564 |
2004 | 12800000 | 1110186 |
2004 | 10000000 | 6491350 |
2004 | 30000000 | 32774834 |
2004 | 107000000 | 22406362 |
2004 | 107000000 | 22406362 |
Modify your query so that only records with a release_year
after 1990 are included.
SELECT release_year, budget, gross
FROM films
WHERE release_year > 1990;
release_year | budget | gross |
---|---|---|
2004 | 2127519898 | 410388 |
2004 | 2700000 | 95016 |
2004 | 65000 | 11529368 |
2004 | 20000000 | 9109322 |
2004 | 27000000 | 8712564 |
2004 | 12800000 | 1110186 |
2004 | 10000000 | 6491350 |
2004 | 30000000 | 32774834 |
2004 | 107000000 | 22406362 |
2004 | 107000000 | 22406362 |
Remove the budget and gross columns, and group your results by release year.
SELECT release_year
FROM films
WHERE release_year > 1990
GROUP BY release_year;
release_year |
---|
2004 |
2005 |
2006 |
2007 |
2008 |
2009 |
2010 |
2011 |
2012 |
2013 |
Modify your query to include the average budget and average gross
earnings for the results you have so far. Alias the average budget as
avg_budget
; alias the average gross earnings as
avg_gross
.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;
release_year | avg_budget | avg_gross |
---|---|---|
2004 | 76476582 | 35899988 |
2005 | 70323938 | 41159143 |
2006 | 572806572216 | 39237856 |
2007 | 35271131 | 46267501 |
2008 | 41804886 | 44573509 |
2009 | 37073287 | 46207440 |
2010 | 46094665 | 49908326 |
2011 | 37775252 | 45785837 |
2012 | 41331819 | 62873528 |
2013 | 40519045 | 56158358 |
Modify your query so that only years with an average budget of greater than $60 million are included.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;
release_year | avg_budget | avg_gross |
---|---|---|
2004 | 76476582 | 35899988 |
2005 | 70323938 | 41159143 |
2006 | 572806572216 | 39237856 |
Finally, modify your query to order the results from highest average gross earnings to lowest.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY avg_gross DESC;
release_year | avg_budget | avg_gross |
---|---|---|
2005 | 70323938 | 41159143 |
2006 | 572806572216 | 39237856 |
2004 | 76476582 | 35899988 |
Wooooow! Let’s do another one!
Great work! Now try another large query. This time, all in one go!
Remember, if you only want to return a certain number of results, you
can use the LIMIT
keyword to limit the number of rows
returned
Get the country, average budget, and average gross take of countries
that have made more than 10 films. Order the result by country name, and
limit the number of results displayed to 5. You should alias the
averages as avg_budget
and avg_gross
respectively.
-- select country, average budget,
-- and average gross
SELECT country, AVG(budget) AS avg_budget,
AVG(gross) AS avg_gross
-- from the films table
FROM films
-- group by country
GROUP BY country
-- where the country has more than 10 titles
HAVING COUNT(title) > 10
-- order by country
ORDER BY country
-- limit to only show 5 results
LIMIT 5;
country | avg_budget | avg_gross |
---|---|---|
Australia | 41512000 | 49731026 |
Canada | 15021818 | 23838592 |
China | 65580895 | 14234826 |
France | 24209136 | 19736688 |
Germany | 33800196 | 26597907 |
Superb work on a selection saga! SELECT
queries can get
rather long, but breaking them down into individual clauses makes them
easier to write.
Congrats on making it to the end of the course! By now you should have a good understanding of the basics of SQL.
There’s one more concept we’re going to introduce. You may have noticed
that all your results so far have been from just one table, e.g.,
films
or people
.
In the real world however, you will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie?
In this case, you’d want to get the ID of the movie from the
films
table and then use it to get IMDB information from
the reviews
table. In SQL, this concept is known as a
join, and a basic join is shown in the editor to the
right.
The query in the editor gets the IMDB score for the film To Kill a Mockingbird! Cool right?
As you can see, joins are incredibly useful and important to understand for anyone using SQL.
We have a whole follow-up course dedicated to them called Joining Data in SQL for you to hone your database skills further!
Submit the code in the editor and inspect the results.
SELECT title, imdb_score
FROM films
JOIN reviews
ON films.id = reviews.film_id
WHERE title = 'To Kill a Mockingbird';
title | imdb_score |
---|
What is the IMDB score for the film To Kill a Mockingbird?