Select Statements in Snowflake are Having a brat Summer

Technically, these features were released before this summer, but I am trying hard to stay relevant here.

What cool new things can you do in a Select statement?

The select statement has been the same since the dawn of SQL. I don’t know if that’s true, but we can agree it’s basic and simple and probably hasn’t changed much. Until now! Snowflake has introduced some cool new features that you can use in your analysis, ETL or dinner discussions.

EXCLUDE

Syntax

SELECT * EXCLUDE cat_id FROM cat_table

SELECT * EXCLUDE cat_id, last_modified_dt FROM cat_table

Good for

Seeing all columns in a table in order except a few

You may use this because you are trying to do a set statement, such as minus or union all, but one or two fields aren’t important to your findings (I’m looking right at you, last_modified_dt), or you may want to INSERT or INSERT OVERWRITE into another table that has all the same columns except a few

REPLACE

Syntax

SELECT * REPLACE (cat_cuteness * 100 || '%' AS cat_cuteness) FROM cat_table

Good for

Selecting all columns from a table in order, but slightly altering the contents of one column in place

You may use this if you are trying to do a set operation (minus, union all, etc) on two sources and one varies slightly or if you want to do an INSERT or INSERT OVERWRITE into another table but slightly change the contents. In the example above, I am changing cat_cuteness to a percent. If there are 100 columns and cat_cuteness is ordinal 50, that select statement will return cat_cuteness as percent in the 50th ordinal position.

RENAME

Syntax

SELECT * RENAME breed AS cat_breed FROM cat_table

Good for

Renaming a column name to match or not match another table. For instance, if you want all column names to match before you do a merge you may make a temp table that renames any mismatched names so you don’t have merge statements like:

MERGE INTO target USING source
ON target.cat_id = source.cat_id
WHEN MATCHED THEN UPDATE SET
target.cat_breed = source.breed

More likely though, you may want to intentionally change the name of the column so when you do a join, your two tables will have distinct column names. If breed exists in both tables, this will give you an error:

SELECT breed
FROM cat_table
LEFT JOIN animal_table
ON cat_table.cat_id  = animal_table.animal_id

While I do think you should alias in this case, if you want both fields, you will need to alias and rename:

SELECT 
    cat_table.breed AS cat_breed,
    animal_table.breed AS animal_breed,
    COALESCE(cat_breed, animal_breed) AS breed
FROM cat_table
LEFT JOIN animal_table
ON cat_table.cat_id = animal_table.animal_id

This can be wordy if there is more than one field like this, so you may do:

WITH cat AS 
(SELECT * RENAME breed AS cat_breed FROM cat_table),
animal AS 
(SELECT * RENAME breed AS animal_breed FROM animal_table)

SELECT 
    cat_breed, 
    animal_breed, 
    COALESCE(cat_breed, animal_breed) AS breed
FROM cat
LEFT JOIN animal
ON cat.cat_id = animal.animal_id

ILIKE

Syntax

SELECT * ILIKE 'cat%' FROM animal_table

Good for

When you want to select only certain columns from a table. I probably wouldn’t use this for ETL, but if you are doing analysis on a wide table, it may be help you see only what it is relevant very quickly.

Warning

You can combine these keywords into one select statement, but as we can see, this can get hard to read. In the case of ETL, you may end up writing out the columns rather than trying to save some characters.

SELECT * EXCLUDE cat_id REPLACE (cat_cuteness * 100 || '%' AS cat_cuteness) FROM cat_table