Select Statements in Snowflake are Having a brat Summer
Technically, these features were released before this summer, but I am trying hard to stay relevant here.
What cool new things can you do in a Select statement?
The select
statement has been the same since the dawn of SQL. I don’t know if that’s true, but we can agree it’s basic and simple and probably hasn’t changed much. Until now!
Snowflake has introduced some cool new features that you can use in your analysis, ETL or dinner discussions.
EXCLUDE
Syntax
SELECT * EXCLUDE cat_id FROM cat_table
SELECT * EXCLUDE cat_id, last_modified_dt FROM cat_table
Good for
Seeing all columns in a table in order except a few
You may use this because you are trying to do a set
statement, such as minus
or union all
, but one or two fields aren’t important to your findings (I’m looking right at you, last_modified_dt
), or you may want to INSERT
or INSERT OVERWRITE
into another table that has all the same columns except a few
REPLACE
Syntax
SELECT * REPLACE (cat_cuteness * 100 || '%' AS cat_cuteness) FROM cat_table
Good for
Selecting all columns from a table in order, but slightly altering the contents of one column in place
You may use this if you are trying to do a set
operation (minus
, union all
, etc) on two sources and one varies slightly or if you want to do an INSERT
or INSERT OVERWRITE
into another table but slightly change the contents. In the example above, I am changing cat_cuteness to a percent. If there are 100 columns and cat_cuteness is ordinal 50, that select
statement will return cat_cuteness as percent in the 50th ordinal position.
RENAME
Syntax
SELECT * RENAME breed AS cat_breed FROM cat_table
Good for
Renaming a column name to match or not match another table. For instance, if you want all column names to match before you do a merge
you may make a temp table that renames any mismatched names so you don’t have merge statements like:
MERGE INTO target USING source
ON target.cat_id = source.cat_id
WHEN MATCHED THEN UPDATE SET
target.cat_breed = source.breed
More likely though, you may want to intentionally change the name of the column so when you do a join
, your two tables will have distinct column names. If breed
exists in both tables, this will give you an error:
SELECT breed
FROM cat_table
LEFT JOIN animal_table
ON cat_table.cat_id = animal_table.animal_id
While I do think you should alias in this case, if you want both fields, you will need to alias and rename:
SELECT
cat_table.breed AS cat_breed,
animal_table.breed AS animal_breed,
COALESCE(cat_breed, animal_breed) AS breed
FROM cat_table
LEFT JOIN animal_table
ON cat_table.cat_id = animal_table.animal_id
This can be wordy if there is more than one field like this, so you may do:
WITH cat AS
(SELECT * RENAME breed AS cat_breed FROM cat_table),
animal AS
(SELECT * RENAME breed AS animal_breed FROM animal_table)
SELECT
cat_breed,
animal_breed,
COALESCE(cat_breed, animal_breed) AS breed
FROM cat
LEFT JOIN animal
ON cat.cat_id = animal.animal_id
ILIKE
Syntax
SELECT * ILIKE 'cat%' FROM animal_table
Good for
When you want to select only certain columns from a table. I probably wouldn’t use this for ETL, but if you are doing analysis on a wide table, it may be help you see only what it is relevant very quickly.
Warning
You can combine these keywords into one select statement, but as we can see, this can get hard to read. In the case of ETL, you may end up writing out the columns rather than trying to save some characters.
SELECT * EXCLUDE cat_id REPLACE (cat_cuteness * 100 || '%' AS cat_cuteness) FROM cat_table