How the Snowflake Qualify Clause Changed My Life

I know what you’re thinking: either this is a click-bait title or Alisa doesn’t have a lot going on in her life. Well, two things can be true at once. This post is about the relatively new Snowflake function Qualify. It is basically a WHERE clause for window functions. I love it.

Below is my life before and after Qualify came into it. You will notice there were dark times before, but now, life is better.

Deduplicating a table before Qualify

In order to deduplicate cats_table based on the most recent adoption for each cat_id, I used to have to do something like this:

CREATE OR REPLACE TABLE cat_adopted_home AS
    SELECT * FROM (
        SELECT 
            *,
            RANK() OVER (PARTITION BY cat_id ORDER BY cat_adopted_date DESC) AS adoption_rank
        FROM cats_table
) WHERE adoption_rank = 1;

Deduplicating a table after Qualify

CREATE OR REPLACE TABLE cat_adopted_home AS
    SELECT * FROM cats_table
    QUALIFY RANK() OVER (PARTITION BY cat_id ORDER BY cat_adopted_date DESC) = 1

Note we are using one less query, but more importantly we do not actually have to materialize the adoption_rank column. This means that our table cat_adopted_home has the same fields as cats_table. This is important in case we want to do something like:

CREATE OR REPLACE TABLE old_adoption_homes AS
    SELECT * FROM cats_table
    MINUS 
    SELECT * FROM cat_adopted_home;

Before we would have had to either list the columns (a good practice, unless you really want all the columns and don’t want to edit 5 queries when you add a field to a table) or manually drop the column like:

ALTER TABLE cat_adopted_home DROP COLUMN adoption_rank;

Both methods are more code to maintain and more code that can go wrong. It’s cleaner to use Qualify.

But is it more performant?

While I could not find any Snowflake documentation on this, likely there is little to no performance improvement from using Qualify. This is really just a syntax change. In the backend, a Qualify is evaluated after a WINDOW function, which means the entire table with the WINDOW function has to be created before the Qualify is performed. So basically, it’s the same thing as we were doing before, but with fewer letters.

Finding duplicates before Qualify

Before Qualify came into my life, I had to do this a lot:

SELECT 
    cat_id,
    COUNT(*) AS cat_count 
FROM cat_table
GROUP BY cat_id
HAVING cat_count>1;

This isn’t so bad except the only data you can get is the primary key (cat_id); nothing other than the PK and the aggregate can be in the SELECT clause (fun fact: the aggregate is optional).

In order to get all the cat data to figure out what fields are causing the dupes, we have to do a temp table, subquery or CTE:

WITH cat_dupes AS 
    (SELECT 
        cat_id FROM cat_table
    GROUP BY cat_id
    HAVING COUNT(*) >1)
SELECT * FROM cat_table
WHERE cat_id IN (SELECT cat_id FROM cat_dupes);

Finding duplicate after Qualify

Now that life is better, we can do:

SELECT * FROM cat_table
QUALIFY COUNT(*) OVER (PARTITION BY cat_id) > 1;

Now we get all duplicates and all their information in one query! In two lines! Think of the applications!

Well that’s it for me. Back to my boring life.