Author: Nikhil Pai
Introduction
This blog talks about how we can connect DBT with a popular cloud data warehousing platform like Snowflake. It gives you an overview of how you can quickly get started with using DBT and Snowflake and why DBT is a good option when it comes to Snowflake.
We will be looking at how to set up the DBT cloud IDE to connect to Snowflake.
What is DBT?
DBT or Data Build Tool is an ETL tool that allows for a more efficient way of transforming data in your warehouses. In this case, it leverages the computing power of Snowflake to perform transformations on your data.
Now DBT doesn’t extract or load data but transforms data that has been already loaded into your warehouse. This brings up a new term called ‘Analytics Engineers’ to help transform data in their warehouses by simply writing select statements.
DBT offers two ways of leveraging its tool:
- The ‘DBT cloud platform’ which is an IDE hosted on the web and
- The ‘DBT CLI’ is an open-source tool that can be downloaded onto your personal workspace.
Where does DBT fit in the modern BI stack?
DBT quite nicely fits in the stack along with some products like Snowflake, Redshift, Stitch, Fivetran, and more.
Why DBT?
Using DBT, you will be focusing on only the core business logic of writing select queries or models to transform the data according to your needs. You won’t be required to write code to create tables, views or set up a chain of order to execute your models. DBT handles creating objects in the warehouse for you.
DBT has quite a few useful features which makes it that much more powerful
- Code compiler: DBT leverages jinja, which is a lightweight templating language that allows using control structures like the ‘if’ and ‘for’ statements in your SQL queries.
- Documentation: DBT’s feature for documentation allows you to write, source-control, or version control and sharing of this documentation for the models.
- Test: It allows you to test your SQL code which can be quite difficult to test due to the frequent change in underlying data. But with DBT, you can easily test whether a specified column has Non-null values or unique values and more.
- Package manager: DBT comes with a package manager, which helps users to publish both public and private repositories of their DBT codes which in turn can be leveraged by other users.
DBT comes along with many other such powerful features which allow analysts to easily work on just the core business logic of their projects.
Prerequisites before you get started with DBT
- A snowflake Instance that is up and running(any edition).
- A GitHub account for version control.
Steps on how to connect DBT and Snowflake
Step 1: Create an account on the DBT cloud by filling the fields with the necessary information and verifying your Email. You will then land upon a welcome page. Click on continue.
Step 2: Click on Snowflake. Fill up the fields with the Snowflake account details and the development credentials, i.e., your Snowflake login credentials.
Note: If the URL for your Snowflake account is xyz12345.snowflakecomputing.com, then you should enter xyz12345.If the URL for your Snowflake account is something like xyz12345.east-us-2.azure.snowflakecomputing.com, then you should enter xyz12345.east-us-2.azure.
Once done, click on test, and once the test passes, you can move to the next page by clicking continue.
Step 3: Connect to your GitHub repository. Create a new repository on your GitHub account using the same email id used for creating your DBT cloud account. Click on Github on the setup repository page.
On the integrations, page click on the link to your GitHub account.
Click on only the select repositories and select the repository newly created for DBT and click on install. After installation, you will be redirected to the Integrations page. Navigate to the hamburger menu on the top left corner of your DBT page and click on the home page to return to set up the rest of your account.
Click on continue.
Click on GitHub again, and the link to your repository will be highlighted below. Click on that repository, and once successfully connected, click on continue.
Step 4: Click on start developing to jump into your IDE.
The initial page shows an error. Click on initialize your project to connect to your GitHub repository on your IDE.
Once done, your IDE is ready to be run with some sample codes based on your requirements for transformations.
Summary
DBT overall provides you with a powerful transformation tool that goes well with Snowflake. It gives you the ability to version control your code and data and provides Analysts the core features required to run their BI tools on top of the transformed data.
It is an easy-to-use and transforming tool among many ETL tools out there.