However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row:
FUNCTION_NAME() OVER()
ORDER BY
PARTITION BY
ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
library(tidyverse)
<-tribble(~ year , ~country , ~product , ~profit ,~Own ,~Time,
sales2000 , "Finland", "Computer" , 1500 ,"Y" ,"D",
2000 , "Finland" ,"Phone" , 100 ,"Y" ,"D",
2001 , "Finland" ,"Phone" , 10 ,"N" ,"D",
2000 , "India" ,"Calculator" , 75 ,"Y" ,"E",
2000 , "India" ,"Calculator" , 75 ,"N" ,"E",
2000 , "India" ,"Computer" , 1200 ,"N" ,"E",
2000 , "USA" , "Calculator", 75 ,"Y" ,"E",
2000 , "USA" , "Computer" , 1500 ,"N" ,"E",
2001 , "USA" ,"Calculator" , 50 ,"Y" ,"E",
2001 , "USA" ,"Computer" , 1500 ,"Y" ,"E",
2001 , "USA" ,"Computer" , 1200 ,"Y" ,"D",
2001 , "USA" , "TV" , 150 ,"Y" ,"D",
2001 , "USA" , "TV" , 100 ,"Y" ,"D")
sales
library(odbc)
library(DBI)
library(RSQLite)
<-dbConnect(RSQLite::SQLite(), ":memory:")
windcopy_to(wind,sales)
ROW_NUMBER() OVER()
SELECT ROW_NUMBER() OVER () AS row_id , year, country, product, profit
FROM sales;
row_id | year | country | product | profit |
---|---|---|---|---|
1 | 2000 | Finland | Computer | 1500 |
2 | 2000 | Finland | Phone | 100 |
3 | 2001 | Finland | Phone | 10 |
4 | 2000 | India | Calculator | 75 |
5 | 2000 | India | Calculator | 75 |
6 | 2000 | India | Computer | 1200 |
7 | 2000 | USA | Calculator | 75 |
8 | 2000 | USA | Computer | 1500 |
9 | 2001 | USA | Calculator | 50 |
10 | 2001 | USA | Computer | 1500 |
partitions
based on a
column’s unique values and results aren’t rolled into one columntherefore
ROW_NUMBER()
WithPARTITION BY
produces the row number of each row within its partition. In this case, rows are numbered per country. By default, partition rows are unordered and row numbering is nondeterministic.
SELECT ROW_NUMBER() OVER(PARTITION BY country) AS row_num1,
year, country, product, profit
FROM sales;
row_num1 | year | country | product | profit |
---|---|---|---|---|
1 | 2000 | Finland | Computer | 1500 |
2 | 2000 | Finland | Phone | 100 |
3 | 2001 | Finland | Phone | 10 |
1 | 2000 | India | Calculator | 75 |
2 | 2000 | India | Calculator | 75 |
3 | 2000 | India | Computer | 1200 |
1 | 2000 | USA | Calculator | 75 |
2 | 2000 | USA | Computer | 1500 |
3 | 2001 | USA | Calculator | 50 |
4 | 2001 | USA | Computer | 1500 |
its also possible to use
ROW_NUMBER()
WithORDER BY
to determine ranks
SELECT ROW_NUMBER() OVER(ORDER BY profit DESC) AS row_num1,
year, country, product, profit
FROM sales;
row_num1 | year | country | product | profit |
---|---|---|---|---|
1 | 2000 | Finland | Computer | 1500 |
2 | 2000 | USA | Computer | 1500 |
3 | 2001 | USA | Computer | 1500 |
4 | 2000 | India | Computer | 1200 |
5 | 2001 | USA | Computer | 1200 |
6 | 2001 | USA | TV | 150 |
7 | 2000 | Finland | Phone | 100 |
8 | 2001 | USA | TV | 100 |
9 | 2000 | India | Calculator | 75 |
10 | 2000 | India | Calculator | 75 |
in the above example , we see that we have ranked our data according to
profits
implying that each rowid also signifies the rank of profit
in such a case we use
RANK()
SELECT RANK() OVER(ORDER BY profit DESC) AS RANK,
year, country, product
profit ,FROM sales;
RANK | profit | year | country | product |
---|---|---|---|---|
1 | 1500 | 2000 | Finland | Computer |
1 | 1500 | 2000 | USA | Computer |
1 | 1500 | 2001 | USA | Computer |
4 | 1200 | 2000 | India | Computer |
4 | 1200 | 2001 | USA | Computer |
6 | 150 | 2001 | USA | TV |
7 | 100 | 2000 | Finland | Phone |
7 | 100 | 2001 | USA | TV |
9 | 75 | 2000 | India | Calculator |
9 | 75 | 2000 | India | Calculator |
DENSE_RANK()
we can use
DENSE_RANK()
if we need ranks to be ordered or require a further partition
SELECT DENSE_RANK() OVER(PARTITION BY country ORDER BY profit DESC) AS RANK,
year, country, product
profit ,FROM sales;
RANK | profit | year | country | product |
---|---|---|---|---|
1 | 1500 | 2000 | Finland | Computer |
2 | 100 | 2000 | Finland | Phone |
3 | 10 | 2001 | Finland | Phone |
1 | 1200 | 2000 | India | Computer |
2 | 75 | 2000 | India | Calculator |
2 | 75 | 2000 | India | Calculator |
1 | 1500 | 2000 | USA | Computer |
1 | 1500 | 2001 | USA | Computer |
2 | 1200 | 2001 | USA | Computer |
3 | 150 | 2001 | USA | TV |
now we have ranked our profits by country starting with the highest profit in each country
GROUP BY
aggregate
function but all rows stay in the outputusing the sales information table, these two queries perform aggregate operations that produce a single global sum for all rows taken as a group, and sums grouped per country:
SELECT SUM(profit) AS total_profit
FROM sales;
total_profit |
---|
7535 |
lets us see a
GROUP BY
in action
SELECT country, SUM(profit) AS country_profit
FROM sales
GROUP BY country
ORDER BY country;
country | country_profit |
---|---|
Finland | 1610 |
India | 1350 |
USA | 4575 |
On the other hand , window operations do not collapse groups of query rows to a single output row. Instead, they produce a result for each row. Like the preceding queries, the following query uses
SUM()
, but this time as a window function:
SELECT
year, country, product, profit,
SUM(profit) OVER() AS total_profit,
SUM(profit) OVER(PARTITION BY country) AS country_profit
FROM sales
ORDER BY country, year, product, profit;
year | country | product | profit | total_profit | country_profit |
---|---|---|---|---|---|
2000 | Finland | Computer | 1500 | 7535 | 1610 |
2000 | Finland | Phone | 100 | 7535 | 1610 |
2001 | Finland | Phone | 10 | 7535 | 1610 |
2000 | India | Calculator | 75 | 7535 | 1350 |
2000 | India | Calculator | 75 | 7535 | 1350 |
2000 | India | Computer | 1200 | 7535 | 1350 |
2000 | USA | Calculator | 75 | 7535 | 4575 |
2000 | USA | Computer | 1500 | 7535 | 4575 |
2001 | USA | Calculator | 50 | 7535 | 4575 |
2001 | USA | Computer | 1200 | 7535 | 4575 |
we have included
OVER
clause that specifies how to partition query rows into groups for processing by the window function:
Window functions are permitted only in the select list and
ORDER BY
clause. Query result rows are determined from theFROM
clause, afterWHERE, GROUP BY
, andHAVING
processing, and windowing execution occurs beforeORDER BY
,LIMIT
, andSELECT DISTINCT
.
WITH
clause. for instance the following
table can be created as a temporary tableFrames allow you to “peek” forwards or backward without first using the relative fetching functions,
LAG
andLEAD
, to fetch previous rows’ values into the current row.
ROWS BETWEEN [START] AND [FINISH]
n PRECEDING
: n
rows before current
rowCURRENT ROW
: the current rown FOLLOWING
: n
rows after the current
rowthe following query calculates the number of sales per year and per country(i.e grouped by year and country)
SELECT
year, country, COUNT(*) AS num_sales
FROM sales
GROUP BY year, country;
year | country | num_sales |
---|---|---|
2000 | Finland | 2 |
2000 | India | 3 |
2000 | USA | 2 |
2001 | Finland | 1 |
2001 | USA | 5 |
like soo…
WITH
and a FRAME
country_sales
WITH Country_sales AS (
SELECT
year, country, COUNT(*) AS num_sales
FROM sales
GROUP BY year, country)
SELECT
year, country, num_sales,
-- Calculate each country's 3s-ales moving total
SUM(num_sales) OVER
PARTITION BY country
(ORDER BY year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS sales_MA
FROM Country_sales
ORDER BY country ASC, year ASC;
year | country | num_sales | sales_MA |
---|---|---|---|
2000 | Finland | 2 | 2 |
2001 | Finland | 1 | 3 |
2000 | India | 3 | 3 |
2000 | USA | 2 | 2 |
2001 | USA | 5 | 7 |