dbt has become the de facto standard for the SQL transformation layer — and for good reason. But "using dbt" and "using dbt well" are two different things. Projects that start clean tend to develop entropy as teams grow. This guide documents the organizational patterns that scale.
## Folder Structure: The Foundation
dbt's recommended folder structure maps to the layers of your data architecture:
```
models/
├── staging/                  # 1:1 with source tables, light casting only
│   ├── salesforce/
│   │   ├── _salesforce__sources.yml
│   │   ├── _salesforce__models.yml
│   │   ├── stg_salesforce__accounts.sql
│   │   └── stg_salesforce__opportunities.sql
│   └── stripe/
│       ├── _stripe__sources.yml
│       └── stg_stripe__charges.sql
├── intermediate/             # Business logic, not exposed to BI
│   └── int_orders__joined.sql
├── marts/                    # Business domain groupings
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── fct_orders.sql
│   │   └── dim_customers.sql
│   └── marketing/
│       ├── fct_campaigns.sql
│       └── dim_channels.sql
└── utilities/                # Shared macros, date spines, etc.
```
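For reference, the `_salesforce__sources.yml` file in the tree above might look like this. The `raw` database, the Salesforce table names, and the `_fivetran_synced` loaded-at column are illustrative assumptions; adjust them to your warehouse and loader:

```yaml
# models/staging/salesforce/_salesforce__sources.yml
version: 2

sources:
  - name: salesforce
    database: raw              # landing database (assumption)
    schema: salesforce
    loaded_at_field: _fivetran_synced   # assumes a Fivetran-style sync column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: account
      - name: opportunity
```

Declaring sources here is what makes the `{{ source('salesforce', 'opportunity') }}` calls in staging models resolve, and gives you source freshness checks for free.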
Naming conventions follow the layers: `stg_` for staging, `int_` for intermediate, `fct_` for fact tables, `dim_` for dimensions. Include the source system in staging names: `stg_salesforce__accounts` (the double underscore separates source from entity).
## Staging Models: The Right Level of Abstraction
Staging models should do exactly one thing: faithfully represent the source table with minimal, non-destructive transformations. That means:
```sql
-- stg_salesforce__opportunities.sql
-- GOOD: light transforms only
WITH source AS (
    SELECT * FROM {{ source('salesforce', 'opportunity') }}
),

renamed AS (
    SELECT
        id AS opportunity_id,
        account_id,
        name AS opportunity_name,
        amount AS opportunity_amount_usd,
        stagename AS stage_name,
        closedate AS close_date,
        isclosed AS is_closed,
        iswon AS is_won,
        createddate AS created_at,
        lastmodifieddate AS updated_at
    FROM source
)

SELECT * FROM renamed
```
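For contrast, here is a hypothetical anti-pattern (column names are made up for illustration). Joins and aggregations change the grain and embed business logic, both of which belong in intermediate or mart models, not staging:

```sql
-- BAD: business logic smuggled into a staging model
SELECT
    o.account_id,
    SUM(o.amount) AS open_pipeline_usd        -- aggregation changes the grain
FROM {{ source('salesforce', 'opportunity') }} o
JOIN {{ source('salesforce', 'account') }} a  -- joins belong downstream
    ON o.account_id = a.id
WHERE o.isclosed = false                      -- filtering hides source rows
GROUP BY o.account_id
```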
## Testing Strategy
dbt's built-in tests cover a lot of ground. A good baseline:
```yaml
# _finance__models.yml
version: 2

models:
  - name: fct_orders
    description: "One row per order. Grain: order_id."
    columns:
      - name: order_id
        description: "Primary key from the orders source system."
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_status
        tests:
          - accepted_values:
              values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
      - name: total_amount_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```
### Custom Tests for Business Logic
```sql
-- tests/assert_orders_have_valid_totals.sql
-- This test should return zero rows to pass
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE total_amount_usd < 0
   OR (order_status = 'delivered' AND total_amount_usd = 0)
```
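Singular tests like this live in `tests/` and run alongside the generic YAML tests. The selectors below are standard dbt CLI syntax:

```shell
dbt test --select fct_orders            # run every test attached to one model
dbt test --select test_type:singular    # run only the custom SQL tests
dbt build --select fct_orders+          # build and test the model plus everything downstream
```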
## Materializations: Choosing Wisely
Your choice of materialization has major performance and cost implications:
- Staging models → `view` (always fresh, no storage cost)
- Intermediate models → `ephemeral` (inlined as CTEs, no table created) or `view`
- Fact tables → `incremental` (append only new/changed rows)
- Dimension tables → `table` (full refresh, usually fast enough)
- Large historical tables → `incremental` with `unique_key` for upsert behavior
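Rather than configuring each model individually, these defaults can be set per folder in `dbt_project.yml`; a sketch, where `my_project` stands in for your project name:

```yaml
# dbt_project.yml (excerpt)
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

Individual models can still override the folder default with a `{{ config(...) }}` block, as the incremental example below does.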
```sql
-- fct_orders.sql (incremental example)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    cluster_by=['order_date', 'customer_id']
) }}

SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount_usd,
    c.region,
    CURRENT_TIMESTAMP() AS dbt_updated_at
FROM {{ ref('stg_orders__orders') }} o
JOIN {{ ref('dim_customers') }} c USING (customer_id)

{% if is_incremental() %}
WHERE o.updated_at > (SELECT MAX(dbt_updated_at) FROM {{ this }})
{% endif %}
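Incremental models drift when upstream logic changes, since already-loaded rows are never revisited. The standard remedy is a periodic full rebuild:

```shell
dbt run --full-refresh --select fct_orders   # drop and rebuild the table from scratch
```

Many teams schedule this weekly, or trigger it whenever the model's SQL changes.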
## Documentation That Stays Current
The hardest part of dbt documentation is keeping it accurate. Two practices that help:
1. **Use doc blocks for shared definitions.** Define key business terms once in `docs/` and reference them across models:
```jinja
{# docs/definitions.md #}
{% docs order_status %}
The current fulfillment status of the order. Values: pending (payment received, not yet
picked), processing (in warehouse), shipped (with carrier), delivered (confirmed receipt),
cancelled (order voided).
{% enddocs %}
```
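The block is then referenced from any model's YAML via `doc()`, so the definition lives in exactly one place:

```yaml
# _finance__models.yml (excerpt)
models:
  - name: fct_orders
    columns:
      - name: order_status
        description: '{{ doc("order_status") }}'
```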
2. **Run `dbt docs generate` in CI.** Make documentation generation part of your deployment pipeline so the catalog stays in sync with your models. Publish the generated site somewhere the team can find it, such as a Netlify deploy or an S3 static site.
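A minimal CI step, sketched here as a GitHub Actions fragment; the target name and S3 bucket are placeholders, and the publish command assumes AWS credentials are already configured in the job:

```yaml
# .github/workflows/dbt-docs.yml (sketch)
- name: Generate dbt docs
  run: |
    dbt deps
    dbt docs generate --target prod
- name: Publish static site
  run: aws s3 sync target/ s3://internal-dbt-docs/   # bucket name is illustrative
```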
## Packages Worth Adding
- `dbt-utils` — surrogate keys, date spines, generic tests
- `dbt-expectations` — Great Expectations-style tests in YAML
- `dbt-audit-helper` — compare model outputs between branches
- `dbt-meta-testing` — enforce that all models have tests and descriptions
## The "Definition of Done" for a dbt Model
A model isn't done until it has:

- a grain statement in its description
- a primary key with `unique` + `not_null` tests
- at least one relationship test linking it to upstream models
- `accepted_values` tests on all enum/status columns
- a business owner documented in its `.yml` `meta` section

Encode this as a dbt-meta-testing check in CI and it enforces itself.
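As a sketch of what that enforcement can look like: dbt-meta-testing reads `+required_tests` and `+required_docs` configs from `dbt_project.yml`. The exact keys and regex syntax should be verified against the package's current README, and `my_project` is a placeholder:

```yaml
# dbt_project.yml (excerpt, assuming dbt-meta-testing's documented config keys)
models:
  my_project:
    marts:
      +required_tests: {"unique": 1, "not_null": 1}
      +required_docs: true
```

The checks then run as dbt operations in CI and fail the build when a mart model is missing its tests or descriptions.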