data science

24 SQL functions every analytics engineer should know

If you work with data, SQL functions are often your key to unlocking the power of your data. SQL helps data doers in every industry uncover insights and become a data-driven business.

In this article, we'll cover 15 SQL functions every analytics engineer should know. Plus, we’ve added a few honorable mention functions to help take your data analysis to the next level. Whether you're a seasoned SQL pro or just starting out, these functions are essential tools that will help you extract the most value from your data. Let's dive in!

Part 1: Basic SQL functions

1. SELECT

The SELECT statement in SQL is used to retrieve data from one or more tables. Here's an example of how to use it:

Let's say you have a table called "customers" with columns "customer_id", "customer_name", and "customer_email". You want to retrieve all rows from this table. You can use the SELECT statement to achieve this:

SELECT * FROM customers;

In this example, the SELECT statement is used to select all columns from the "customers" table by using the asterisk (*) symbol. The FROM clause specifies the name of the table that the data will be retrieved from.

The result of this query will be a table with three columns: "customer_id", "customer_name", and "customer_email". Each row in the table represents a unique customer with their ID, name, and email.

You can also specify individual columns that you want to select by listing their names after the SELECT keyword, separated by commas. For example, to select only the "customer_name" and "customer_email" columns from the "customers" table, you can use the following query:

SELECT customer_name, customer_email 
FROM customers;

In this case, the result will be a table with two columns: "customer_name" and "customer_email". Each row in the table represents a unique customer with their name and email.

2. WHERE

The WHERE clause in SQL is used to filter records based on certain conditions. It allows you to specify a condition that must be met for a row to be included in the result set. Here's an example of how to use it:

Let's say you have a table called "employees" with columns "employee_id", "first_name", "last_name", "department_id", and "salary". You want to find all employees who work in the "Sales" department and earn more than $50,000 per year. You can use the WHERE clause to specify these conditions:

SELECT employee_id, first_name, last_name, salary FROM employees 
WHERE department_id = 3 AND salary > 50000;

In this example, the SELECT statement selects the "employee_id", "first_name", "last_name", and "salary" columns. The WHERE clause specifies two conditions using the AND operator: "department_id = 3" and "salary > 50000". The first condition filters the records to only include employees who work in the "Sales" department (assuming department_id 3 is Sales). The second condition filters the records to only include employees who earn more than $50,000 per year.

The result of this query will be a table with four columns: "employee_id", "first_name", "last_name", and "salary". Each row will represent an employee who meets the specified conditions.

3. ORDER BY

The ORDER BY clause in SQL is used to sort the result set of a SELECT statement by one or more columns in ascending or descending order. Here's an example of how to use it:

Let's say you have a table called "employees" with columns "employee_id", "first_name", "last_name", and "salary". You want to retrieve all rows from this table, sorted by the "salary" column in descending order. You can use the ORDER BY clause to achieve this:

SELECT * FROM employees
ORDER BY salary DESC;

In this example, the SELECT statement is used to select all columns from the "employees" table. The ORDER BY clause specifies the column to sort by ("salary") and the direction of the sort (DESC for descending order).

The result of this query will be a table with all the rows from the "employees" table, sorted in descending order by the "salary" column. This means that the employees with the highest salaries will be listed first, and the employees with the lowest salaries will be listed last.

You can also specify multiple columns to sort by, separated by commas. For example, to sort the "employees" table by both the "salary" column in descending order and the "last_name" column in ascending order, you can use the following query:

SELECT *
FROM employees
ORDER BY salary DESC, last_name ASC;

In this case, the result will be a table with all the rows from the "employees" table, sorted first by salary in descending order, and then by last name in ascending order.

4. LIKE

The LIKE operator in SQL is used to search for a specific pattern within a string. It is typically used in conjunction with the WHERE clause to filter records based on a specific pattern match. Here's an example of how to use it:

Let's say you have a table called "employees" with columns "employee_id", "first_name", and "last_name". You want to find all employees whose last name starts with the letter "S". You can use the LIKE operator in the WHERE clause to achieve this:

SELECT employee_id, first_name, last_name
FROM employees
WHERE last_name LIKE 'S%';

In this example, the SELECT statement selects the "employee_id", "first_name", and "last_name" columns. The WHERE clause uses the LIKE operator and the pattern 'S%' to filter the records. The percent sign (%) is used as a wildcard character to match any characters that come after the letter "S". This means that any last name that starts with the letter "S" will be included in the result set.

The result of this query will be a table with three columns: "employee_id", "first_name", and "last_name". Each row will represent an employee whose last name starts with the letter "S".

5. IN

The IN operator in SQL is used to filter data based on a set of specified values. Here's an example of how to use it:

Let's say you have a table called "customers" with columns "customer_id", "customer_name", and "customer_city". You want to retrieve all rows from this table where the customers are located in either New York, London, or Paris. You can use the IN operator to achieve this:

SELECT * FROM customers
WHERE customer_city IN ('New York', 'London', 'Paris');

In this example, the SELECT statement is used to select all columns from the "customers" table. The WHERE clause specifies the condition for filtering the data, which is that the "customer_city" column must be one of the values specified in the set of values in the parentheses following the IN operator.

The result of this query will be a table with all the rows from the "customers" table where the "customer_city" is either New York, London, or Paris.

You can also use subqueries with the IN operator to retrieve data from another table. For example, to retrieve all rows from a table called "orders" where the "customer_id" column is equal to any of the customer IDs in the "customers" table located in New York, London, or Paris, you can use the following query:

SELECT *
FROM orders
WHERE customer_id IN (
    SELECT customer_id
    FROM customers
    WHERE customer_city IN ('New York', 'London', 'Paris')
);

In this case, the subquery retrieves all the customer IDs for customers located in New York, London, or Paris, and the IN operator is used to filter the "orders" table by only selecting rows where the "customer_id" is equal to any of those IDs.

Part 2: Aggregate SQL functions

6. COUNT()

The COUNT function in SQL is used to count the number of rows that meet a specified condition. It can be used in combination with the SELECT statement and the WHERE clause to count the number of records that satisfy a particular condition. Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_id", "customer_id", and "order_date". You want to count the number of orders placed by a specific customer, let's say customer ID 100. You can use the COUNT function to achieve this:

SELECT COUNT(order_id) as order_count
FROM orders
WHERE customer_id = 100;

In this example, the SELECT statement uses the COUNT function to count the number of orders placed by customer ID 100. The "as" keyword is used to give a name to the calculated column, which is "order_count" in this case. The WHERE clause is used to specify the condition that the customer ID must be 100.

The result of this query will be a table with one column: "order_count". The value in this column will be the number of orders placed by customer ID 100.

7. SUM()

The SUM function in SQL is used to calculate the sum of values in a column. Here's an example of how to use it:

Let's say you have a table called "sales" with columns "sale_id", "product_name", "quantity", and "price". You want to calculate the total revenue generated by all sales. You can use the SUM function to achieve this:

SELECT SUM(quantity * price) AS total_revenue
FROM sales;

In this example, the SELECT statement is used to calculate the sum of the product of "quantity" and "price" columns for all rows in the "sales" table. The AS keyword is used to give the result a column alias "total_revenue" for better readability.

The result of this query will be a table with one row and one column, which is the total revenue generated by all sales.

You can also use the SUM function with the GROUP BY clause to calculate the sum of values for each group. For example, to calculate the total revenue for each product in the "sales" table, you can use the following query:

SELECT product_name, SUM(quantity * price) AS total_revenue
FROM sales
GROUP BY product_name;

In this case, the GROUP BY clause is used to group the rows by the "product_name" column, and the SUM function is used to calculate the total revenue for each group. The result of this query will be a table with one row for each product and its total revenue generated by all sales.

8. AVG()

The AVG function in SQL is used to calculate the average value of a set of values in a column. It can be used in combination with the SELECT statement and the WHERE clause to calculate the average of a particular set of values that satisfy a specific condition. Here's an example of how to use it:

Let's say you have a table called "sales" with columns "sale_id", "customer_id", "product_id", and "sale_amount". You want to calculate the average sale amount for a specific product, let's say product ID 100. You can use the AVG function to achieve this:

SELECT AVG(sale_amount) as average_sale_amount
FROM sales
WHERE product_id = 100;

In this example, the SELECT statement uses the AVG function to calculate the average sale amount for product ID 100. The "as" keyword is used to give a name to the calculated column, which is "average_sale_amount" in this case. The WHERE clause is used to specify the condition that the product ID must be 100.

The result of this query will be a table with one column: "average_sale_amount". The value in this column will be the average sale amount for product ID 100.

9. MIN()

The MIN function in SQL is used to retrieve the smallest value in a column. Here's an example of how to use it:

Let's say you have a table called "products" with columns "product_id", "product_name", and "product_price". You want to retrieve the product with the lowest price. You can use the MIN function to achieve this:

SELECT MIN(product_price) AS lowest_price
FROM products;

In this example, the SELECT statement is used to retrieve the smallest value in the "product_price" column for all rows in the "products" table. The AS keyword is used to give the result a column alias "lowest_price" for better readability.

The result of this query will be a table with one row and one column, which is the lowest price among all products.

You can also use the MIN function with the GROUP BY clause to retrieve the smallest value for each group. For example, to retrieve the lowest price for each product category in the "products" table, you can use the following query:

SELECT product_category, MIN(product_price) AS lowest_price
FROM products
GROUP BY product_category;

In this case, the GROUP BY clause is used to group the rows by the "product_category" column, and the MIN function is used to retrieve the smallest value in the "product_price" column for each group. The result of this query will be a table with one row for each product category and its lowest price.

10. MAX()

Similarly, the MAX function in SQL is used to find the maximum value in a column. It can be used in combination with the SELECT statement and the WHERE clause to find the maximum value that satisfies a particular condition. Here's an example of how to use it:

Let's say you have a table called "products" with columns "product_id", "product_name", and "product_price". You want to find the highest priced product in the table. You can use the MAX function to achieve this:

SELECT MAX(product_price) as max_price
FROM products;

In this example, the SELECT statement uses the MAX function to find the highest value in the "product_price" column. The "as" keyword is used to give a name to the calculated column, which is "max_price" in this case.

The result of this query will be a table with one column: "max_price". The value in this column will be the highest price in the "product_price" column of the "products" table.

Part 3: Advanced SQL functions

11. DISTINCT()

The DISTINCT keyword in SQL is used to remove duplicate values from the result set. Here's an example of how to use it:

Let's say you have a table called "customers" with columns "customer_id", "customer_name", and "customer_city". You want to retrieve the list of all unique cities where your customers are located. You can use the DISTINCT keyword to achieve this:

SELECT DISTINCT customer_city
FROM customers;

In this example, the SELECT statement is used to retrieve the "customer_city" column for all rows in the "customers" table. The DISTINCT keyword is used to remove duplicate cities from the result set.

The result of this query will be a table with one row for each unique city where your customers are located.

You can also use the DISTINCT keyword with multiple columns to remove duplicates based on multiple columns. For example, to retrieve the list of all unique customer names and cities in the "customers" table, you can use the following query:

SELECT DISTINCT customer_name, customer_city
FROM customers;

In this case, the DISTINCT keyword is used to remove duplicate rows based on both "customer_name" and "customer_city" columns. The result of this query will be a table with one row for each unique combination of customer name and city.

12. GROUP BY

Explanation

The GROUP BY clause in SQL is used to group rows based on one or more columns and aggregate functions can be used on each group. The aggregate function performs calculations on each group and returns a single result. Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_id", "customer_id", "order_date", and "order_total". You want to know the total amount spent by each customer. You can use the GROUP BY clause along with the SUM function to achieve this:

SELECT customer_id, SUM(order_total) as total_spent
FROM orders
GROUP BY customer_id;

In this example, the SELECT statement uses the SUM function to calculate the total amount spent by each customer. The "as" keyword is used to give a name to the calculated column, which is "total_spent" in this case. The GROUP BY clause is used to group the rows based on the "customer_id" column.

The result of this query will be a table with two columns: "customer_id" and "total_spent". Each row in the table represents a unique customer ID with the total amount spent by that customer.

13. HAVING

The HAVING clause in SQL is used in conjunction with the GROUP BY clause to filter the result set based on a condition applied to the aggregated values. Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_id", "customer_id", and "order_total". You want to retrieve the list of customers who have placed orders with a total value greater than $1000. You can use the HAVING clause to achieve this:

SELECT customer_id, SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING total_spent > 1000;

In this example, the SELECT statement is used to retrieve the "customer_id" column and the aggregated value of "order_total" for all rows in the "orders" table. The GROUP BY clause is used to group the rows by the "customer_id" column. The HAVING clause is used to filter the result set based on the condition that the aggregated value of "order_total" should be greater than $1000.

The result of this query will be a table with one row for each customer who has placed orders with a total value greater than $1000 and the total amount they have spent.

You can also use multiple conditions in the HAVING clause using logical operators like AND and OR. For example, to retrieve the list of customers who have placed orders with a total value greater than $1000 and who have placed at least 3 orders, you can use the following query:

SELECT customer_id, COUNT(*) AS total_orders, SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING total_spent > 1000 AND total_orders >= 3;

In this case, the HAVING clause is used to filter the result set based on two conditions: the aggregated value of "order_total" should be greater than $1000 and the count of orders for each customer should be greater than or equal to 3. The result of this query will be a table with one row for each customer who satisfies both conditions, along with their total spent and total orders.

14. JOIN

The JOIN operation in SQL is used to combine rows from two or more tables based on a related column between them. There are different types of JOINs available in SQL, such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN. Here's an example of how to use INNER JOIN:

Let's say you have two tables called "customers" and "orders". The "customers" table has columns "customer_id", "customer_name", and "customer_email", while the "orders" table has columns "order_id", "order_date", "customer_id", and "order_total". You want to join these two tables to get the name and email of customers who have placed orders. You can use INNER JOIN to achieve this:

SELECT customers.customer_name, customers.customer_email, orders.order_id, orders.order_date, orders.order_total
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;

In this example, the SELECT statement selects the columns "customer_name", "customer_email", "order_id", "order_date", and "order_total" from both the "customers" and "orders" tables. The INNER JOIN clause is used to combine rows from both tables based on the "customer_id" column, which is common to both tables.

The result of this query will be a table with five columns: "customer_name", "customer_email", "order_id", "order_date", and "order_total". Each row in the table represents a unique combination of customer details and order details where the customer has placed an order.

15. UNION

The UNION operator in SQL is used to combine the result sets of two or more SELECT statements into a single result set. Here's an example of how to use it:

Let's say you have two tables called "customers" and "employees", both with columns "id" and "name". You want to retrieve a list of all the unique names from both tables. You can use the UNION operator to achieve this:

SELECT name FROM customers
UNION
SELECT name FROM employees;

In this example, the SELECT statement is used twice, once for each table. The UNION operator is used to combine the result sets of the two SELECT statements into a single result set, removing any duplicates.

The result of this query will be a table with one row for each unique name in both the "customers" and "employees" tables.

Note that the number and data types of the columns in the SELECT statements must match for the UNION operator to work properly. If the columns in the SELECT statements do not match, you may need to use aliases to ensure that they do.

You can also use the UNION operator with more than two SELECT statements, by simply adding additional SELECT statements separated by the UNION keyword. For example:

SELECT name FROM customers
UNION
SELECT name FROM employees
UNION
SELECT name FROM vendors;

In this case, the result set will contain the unique names from all three tables, "customers", "employees", and "vendors".

Part 4: Honorable mentions

Even with an understanding of these core SQL functions, you may run into other scenarios where you need additional tools. While these may not be the functions you use every day, these honorable mention functions are handy to keep bookmarked for your time of need.

16. COALESCE

The COALESCE function in SQL is used to return the first non-null value in a list of values. Here's an example of how to use it:

Let's say you have a table called "employees" with columns "id", "name", "email", and "phone". You want to retrieve a list of employees along with their contact information, but if an employee does not have an email address, you want to use their phone number instead. You can use the COALESCE function to achieve this:

SELECT id, name, COALESCE(email, phone) AS contact
FROM employees;

In this example, the SELECT statement is used to retrieve the "id" and "name" columns from the "employees" table, along with a new column called "contact". The COALESCE function is used to return the first non-null value from the "email" and "phone" columns. If an employee has an email address, the value from the "email" column will be used; otherwise, the value from the "phone" column will be used.

The result of this query will be a table with one row for each employee, containing their id, name, and contact information.

You can also use more than two values with the COALESCE function. For example, if you have a table with columns "a", "b", and "c", you can use the following query to retrieve the first non-null value from these three columns:

SELECT COALESCE(a, b, c) AS result
FROM my_table;

In this case, the result column will contain the first non-null value from the "a", "b", and "c" columns for each row in the table. If all three columns are null, the result column will be null as well.

17. CONCAT

The CONCAT function in SQL is used to concatenate two or more strings into a single string. Here's an example of how to use it:

Let's say you have a table called "customers" with columns "first_name" and "last_name". You want to retrieve a list of customers with their full names concatenated together. You can use the CONCAT function to achieve this:

SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;

In this example, the SELECT statement is used to retrieve the "first_name" and "last_name" columns from the "customers" table, concatenated together with a space character in between. The CONCAT function is used to concatenate the two columns into a single string.

The result of this query will be a table with one row for each customer, containing their full name as a single string in the "full_name" column.

You can also use the CONCAT function with more than two values. For example, if you have a table with columns "first_name", "middle_name", and "last_name", you can use the following query to retrieve the full name for each customer:

SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name) AS full_name
FROM customers;

In this case, the result column will contain the full name for each customer, with spaces in between the first name, middle name (if present), and last name. If a customer does not have a middle name, the CONCAT function will still work, but the result will not contain a middle name.

18. DATEPART

The DATEPART function in SQL is used to extract a specific part of a date or time value, such as the year, month, or day. Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_date" and "total_amount". You want to retrieve a list of orders and group them by year and month, so you can see the total amount of orders for each month over time. You can use the DATEPART function to achieve this:

SELECT DATEPART(year, order_date) AS order_year,
       DATEPART(month, order_date) AS order_month,
       SUM(total_amount) AS total_sales
FROM orders
GROUP BY DATEPART(year, order_date), DATEPART(month, order_date);

In this example, the SELECT statement is used to retrieve the year and month parts of the "order_date" column, along with the total amount of each order from the "total_amount" column. The DATEPART function is used to extract the year and month parts of the date value. The GROUP BY clause is used to group the orders by year and month.

The result of this query will be a table with one row for each combination of year and month, containing the total amount of sales for each month over time.

You can also use the DATEPART function with other date or time parts, such as "day", "hour", "minute", "second", "millisecond", "microsecond", "nanosecond", and others, depending on the precision of your date or time values. For example, if you have a column with datetime values, you can use the following query to retrieve the hour part of each value:

SELECT DATEPART(hour, datetime_column) AS hour_part
FROM my_table;

In this case, the result column will contain the hour part of each datetime value in the table.

19. DATEADD

The DATEADD function in SQL is used to add a specified number of intervals (such as days, months, or years) to a date or time value. Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_date" and "ship_date". You want to retrieve a list of orders where the ship date is one week after the order date. You can use the DATEADD function to achieve this:

SELECT *
FROM orders
WHERE ship_date = DATEADD(week, 1, order_date);

In this example, the WHERE clause is used to filter the orders based on the ship date being one week after the order date. The DATEADD function is used to add one week to the "order_date" value, and then compare it to the "ship_date" value.

The result of this query will be a table with one row for each order where the ship date is one week after the order date.

You can also use the DATEADD function to subtract intervals from a date or time value, or to add or subtract intervals of other types (such as hours, minutes, or seconds). For example, if you have a column with datetime values, you can use the following query to add one hour to each value:

SELECT DATEADD(hour, 1, datetime_column) AS new_datetime
FROM my_table;

In this case, the result column will contain a new datetime value for each row, with one hour added to the original value.

20. DATEDIFF

The DATEDIFF function in SQL is used to calculate the difference between two date or time values, in terms of a specified interval (such as days, months, or years). Here's an example of how to use it:

Let's say you have a table called "orders" with columns "order_date" and "ship_date". You want to retrieve a list of orders where the ship date is at least 7 days after the order date. You can use the DATEDIFF function to achieve this:

SELECT *
FROM orders
WHERE DATEDIFF(day, order_date, ship_date) >= 7;

In this example, the WHERE clause is used to filter the orders based on the ship date being at least 7 days after the order date. The DATEDIFF function is used to calculate the difference between the "order_date" value and the "ship_date" value, in terms of days. The result is then compared to the value 7, to ensure that the difference is at least 7 days.

The result of this query will be a table with one row for each order where the ship date is at least 7 days after the order date.

You can also use the DATEDIFF function to calculate the difference between two date or time values in terms of other intervals, such as "month", "year", "hour", "minute", or "second". For example, if you have a column with datetime values, you can use the following query to calculate the difference in hours between two values:

SELECT DATEDIFF(hour, datetime_column1, datetime_column2) AS hour_diff
FROM my_table;

In this case, the result column will contain the difference in hours between the two datetime values in each row.

21. CAST

The CAST function in SQL is used to convert a value of one data type to another data type. Here's an example of how to use it:

Let's say you have a column called "price" in a table called "products". The "price" column is stored as a varchar, but you want to retrieve the data as a numeric data type. You can use the CAST function to achieve this:

SELECT CAST(price AS decimal(10,2)) AS numeric_price
FROM products;

In this example, the CAST function is used to convert the "price" column to a decimal data type with 10 total digits and 2 decimal places. The result of the CAST function is then assigned to a new column called "numeric_price".

The result of this query will be a table with one row for each product, and a new column called "numeric_price" containing the price value as a decimal data type.

You can also use the CAST function to convert values between other data types, such as varchar to date, date to datetime, or int to float. Here's an example of how to use the CAST function to convert a date value to a datetime value:

SELECT CAST(date_column AS datetime) AS datetime_column
FROM my_table;

In this case, the result column will contain the date value from the "date_column" column, converted to a datetime data type.

22. PARTITION BY

In SQL, the PARTITION BY clause is used to divide the result set of a query into partitions based on one or more columns. This is useful when you want to perform calculations on subsets of a larger result set.

Here's an example of how to use the PARTITION BY clause in SQL:

Let's say you have a table called "sales" with columns "region", "year", and "sales_amount". You want to calculate the percentage of sales amount for each region in each year. You can use the PARTITION BY clause to achieve this:

SELECT region, year, sales_amount, SUM(sales_amount) OVER (PARTITION BY region, year) / SUM(sales_amount) OVER (PARTITION BY year) AS sales_percent
FROM sales;

In this example, the PARTITION BY clause is used to divide the result set into partitions based on the "region" and "year" columns. The SUM function is then used with the OVER clause to calculate the total sales amount for each partition.

The result of this query will be a table with one row for each combination of region and year, and a new column called "sales_percent" containing the percentage of sales amount for each region in each year.

The PARTITION BY clause can be used with a wide range of SQL functions, including SUM, AVG, MAX, MIN, and RANK, to name a few. It is a powerful tool for performing complex calculations on subsets of a larger result set.

23. RANK

The RANK function in SQL is used to assign a rank to each row within a result set based on the value of one or more columns. The rank indicates the relative position of each row within the result set, based on the ordering specified in the query.

Here's an example of how to use the RANK function in SQL:

Let's say you have a table called "sales" with columns "region", "year", and "sales_amount". You want to assign a rank to each region based on their sales amount in a given year. You can use the RANK function to achieve this:

SELECT region, year, sales_amount, RANK() OVER (PARTITION BY year ORDER BY sales_amount DESC) AS sales_rank
FROM sales;

In this example, the RANK function is used to assign a rank to each row within the result set, based on the sales amount in descending order, for each year. The PARTITION BY clause is used to partition the result set into groups based on the "year" column. The ORDER BY clause is used to order the rows within each partition by the "sales_amount" column in descending order.

The result of this query will be a table with one row for each region and year combination, and a new column called "sales_rank" containing the rank of each region based on their sales amount in that year. The rank values will start from 1 for the highest sales amount and increase by 1 for each subsequent row.

You can also use the RANK function to assign a rank based on multiple columns. For example, if you want to assign a rank to each region based on their sales amount in a given year, and then break ties based on their sales amount in the previous year, you can use the following query:

SELECT region, year, sales_amount, RANK() OVER (PARTITION BY year ORDER BY sales_amount DESC, LAG(sales_amount) OVER (PARTITION BY region ORDER BY year) DESC) AS sales_rank
FROM sales;

In this case, the LAG function is used to retrieve the sales amount for the previous year for each region. The result set is partitioned by year and ordered by sales amount in descending order, with ties being broken by the sales amount in the previous year.

24. LEAD/LAG

The LEAD and LAG functions in SQL are used to retrieve the value of a column from a row that is a specified number of rows ahead or behind the current row within a result set.

The LAG function retrieves the value of a column from the previous row, while the LEAD function retrieves the value of a column from the next row.

Here's an example of how to use the LAG function in SQL:

Let's say you have a table called "sales" with columns "region", "year", and "sales_amount". You want to calculate the percentage change in sales amount from the previous year for each region. You can use the LAG function to achieve this:

SELECT region, year, sales_amount, (sales_amount - LAG(sales_amount) OVER (PARTITION BY region ORDER BY year)) / LAG(sales_amount) OVER (PARTITION BY region ORDER BY year) AS sales_percent_change
FROM sales;

In this example, the LAG function is used to retrieve the sales amount for the previous year for each region. The PARTITION BY clause is used to partition the result set into groups based on the "region" column. The ORDER BY clause is used to order the rows within each partition by the "year" column.

The result of this query will be a table with one row for each region and year combination, and a new column called "sales_percent_change" containing the percentage change in sales amount from the previous year for each region.

You can use the LEAD function in a similar way to retrieve the value of a column from the next row. The syntax is the same, except you replace "LAG" with "LEAD" in the function call.

The LEAD and LAG functions are useful for calculating running totals, moving averages, and other types of time-based calculations in SQL.

Looking for ways to write less SQL?

Regardless of if you’re a data analyst, data engineer, or an analytics engineer, knowing the fundamentals of SQL is key to your success. But writing and optimizing complex SQL queries for every one-off data request can be a major drag on your bandwidth. Understanding SQL query optimization best practices can significantly reduce this burden.

That was the case for Wellthy’s Head of Data Science and Analytics, Kelly Burdine. Their legacy BI platform required Python and SQL input for every data request—leaving no opportunities to proactively implement larger data initiatives.

Quote from kelly burdine, head of data science and analytics at Wellthy

But within six months of starting ThoughtSpot, Kelly’s team was able to 2x their velocity and output—generating +350% ROI solely based on their analysts ability to meet high-growth demand. What could you do with your time if you didn’t spend a majority of your work day writing complex SQL queries? 

Curious to find out? Start a risk-free, 14-day free trial of ThoughtSpot today.