Views in Snowflake
Someone asked this question on Reddit
trying to build a data mart basically and most of the times it will be file source that will be transformed through ETL and eventually loaded into snowflake. So after that plan will be to build some views on top of these transformed tables which will be later used for data science purposes so wanted to know how to first approach building tables on snowflake (transient, permanent) i heard from some people suggesting to use transient tables in development environment So was getting confused what should be used when and how and when things like time travel should be enabled Along with that if there are best practices for roles accesses, data security, designing of tables and views will be great
I previously posted about tables here. As for views, I don’t really have strong opinions on them. I have so many strong opinions on other things, like cats, that there’s no room left for views. This post will again be derivative of Snowflake documentation, with a bit of advice.
There are four types of views according to Snowflake’s documentation:
- Views
- Materialized views
- Secure views
- Recursive views
Note that I will not be covering recursive views because, to be honest, I’ve never used them so I can’t add anymore than the Snowflake documentation includes. They can always be replaced with recursive CTEs, and the complication is in the recursion itself, not the creation of the view. So we will focus on:
Views, regular old views
What they are
I always tell people to think about views as just saved queries. Let’s say you wrote a query to find the number of cats per day:
SELECT
cat_name,
SUM(cats)
FROM cat_info
GROUP BY cat_name
If you need to rerun that query often, you may save it in a Snowflake tab or a file and call it cat_count. All you are saving is the query that gets the data, not the data. Wouldn’t it be easy if you could name that save query and invoke it with one line? That is effectively a view. You would make that view like this:
CREATE OR REPLACE VIEW cat_view AS
SELECT
cat_name,
SUM(cats)
FROM cat_info
GROUP BY cat_name
And you would call it like this:
Select * from cat_view
Pros
- Remember this is just a saved query so the benefit is just convenience; instead of finding your old query and running it, you can now get the same results easier.
- The data is always up to date. Calling cat_view always refers back to cat_info.
- For a data engineer, the benefit is there is very little code to maintain. Although all tables and views that are available in the production database should be stored in version control and therefore part of the ETL, storing the definition of a view is simpler than storing DDL and a merge.
Cons
- Since cat_view is not truly stored, it can take some time to return. This is especially true if your view definition has joins or references other views. Selecting from the view will only be as fast as running the underlying query, and that can be slow. If multiple people are querying it throughout the day, it may cost a lot of compute for something that, if materialized, would only be paid for once.
- For a data engineer, the other con is that a view requires more maintenance than intuition would suggest. Even if a field was changed in cat_info that was not cat_name or cats, the view would have to be manually dropped and recreated or else it would give an
invalid
error. As a result, I suggest keeping your view creation statements close to their underlying table(s), so that way you’ll remember to drop and recreate them if you add or change a field. Alternatively, you could have all your views CREATE OR REPLACE overnight, but make sure your permissions account for this!
Materialized views
What they are
Materialized views are…well, materialized. They are views that, instead of updating (running) whenever they are selected from, they update whenever there is an DML action to the underlying table. So if cat_info
is updated with a MERGE or UPDATE at 10 am, cat_view
will then update and materialize. When cat_view
is queried throughout the day, it will produce the same materialized view.
Pros
- Unlike views, they only update when the underlying table updates, so there’s no waste in compute if it’s queried frequently during the day
- Similar to views, you do not have to write full DDL and merge statements to update them so it’s less code to maintain
Cons
- Similar to views, any change to the underlying table will require a recreation of the view definition.
- You can’t have a join (or, at the time of this writing, an analytical function) in the definition of a materialized view. This greatly reduces their usefulness
Secure views
What they are
Secure views can be either regular or materialized; the difference is that they protect the underlying data and the view definition from end users. If the latter isn’t important to you, then don’t use them.
To be clear, users in Snowflake do not need access to cat_info
in order to query cat_view
, whether you use a secure view or not. The difference is subtler than that; the Snowflake website gives a particularly terrifying example of a user forcing an error to see if there is any data in the underlying table:
SELECT *
FROM widgets_view
WHERE 1/iff(color = 'Purple', 0, 1) = 1;
According to their documentation, his query can be used to determine if there are any purple widgets even if widgets_view is filtered to red widgets only.
Additionally, a secure view means that the end user can’t use GET_DDL
or SHOW VIEWS
to find out the view definition.
Pros
- The reason to use these is simply because of the data they hide; both in the widget example and in the view definition, you can see how secure views can help you restrict access to certain end users. If you need this restriction, this view is a great option.
Cons
- If you do not need to leverage the security of secure views, then don’t use them because they can be slower and take away users’s ability to see their definition. There is no benefit to them if not security.