dbt has become the de facto standard for the SQL transformation layer — and for good reason. But "using dbt" and "using dbt well" are two different things. Projects that start clean tend to develop entropy as teams grow. This guide documents the organizational patterns that scale.

Folder Structure: The Foundation

dbt's recommended folder structure maps to the layers of your data architecture:

models/
├── staging/          # 1:1 with source tables, light casting only
│   ├── salesforce/
│   │   ├── _salesforce__sources.yml
│   │   ├── _salesforce__models.yml
│   │   ├── stg_salesforce__accounts.sql
│   │   └── stg_salesforce__opportunities.sql
│   └── stripe/
│       ├── _stripe__sources.yml
│       └── stg_stripe__charges.sql
├── intermediate/     # Business logic, not exposed to BI
│   └── int_orders__joined.sql
├── marts/            # Business domain groupings
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── fct_orders.sql
│   │   └── dim_customers.sql
│   └── marketing/
│       ├── fct_campaigns.sql
│       └── dim_channels.sql
└── utilities/        # Date spines and other shared helper models (macros live in macros/)
💡 Naming convention: prefix model names with their type (stg_ for staging, int_ for intermediate, fct_ for fact tables, dim_ for dimensions), and include the source system in staging names: stg_salesforce__accounts (the double underscore separates source from entity).

Staging Models: The Right Level of Abstraction

Staging models should do exactly one thing: faithfully represent the source table with minimal, non-destructive transformations. That means:

-- stg_salesforce__opportunities.sql
-- GOOD: light transforms only
WITH source AS (
  SELECT * FROM {{ source('salesforce', 'opportunity') }}
),

renamed AS (
  SELECT
    id                          AS opportunity_id,
    account_id,
    name                        AS opportunity_name,
    amount                      AS opportunity_amount_usd,
    stagename                   AS stage_name,
    closedate                   AS close_date,
    isclosed                    AS is_closed,
    iswon                       AS is_won,
    createddate                 AS created_at,
    lastmodifieddate            AS updated_at
  FROM source
)

SELECT * FROM renamed
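
The source() call at the top of that model resolves against a source definition. A minimal sketch of what _salesforce__sources.yml might look like (the database and schema names are placeholders; point them at wherever your ingestion tool lands raw data):

# _salesforce__sources.yml
version: 2

sources:
  - name: salesforce
    database: raw                    # placeholder
    schema: salesforce               # placeholder
    tables:
      - name: account
      - name: opportunity
        loaded_at_field: lastmodifieddate
        freshness:
          warn_after: {count: 24, period: hour}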

Testing Strategy

dbt's built-in tests cover a lot of ground. A good baseline:

# _finance__models.yml
version: 2

models:
  - name: fct_orders
    description: "One row per order. Grain: order_id."
    columns:
      - name: order_id
        description: "Primary key from the orders source system."
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_status
        tests:
          - accepted_values:
              values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
      - name: total_amount_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"

Custom Tests for Business Logic

Built-in and package tests can't express every rule. For anything more specific, write a singular test: a SQL file in tests/ that passes only when the query returns zero rows.

-- tests/assert_orders_have_valid_totals.sql
-- This test should return zero rows to pass
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE total_amount_usd < 0
   OR (order_status = 'delivered' AND total_amount_usd = 0)

Materializations: Choosing Wisely

Materialization choices have major performance and cost implications. A sensible default mapping (with a project-level config sketch after the list):

  • Staging models: view (always fresh, no storage cost)
  • Intermediate models: ephemeral (inlined as CTEs, no table created) or view
  • Fact tables: incremental (append only new/changed rows)
  • Dimension tables: table (full refresh, usually fast enough)
  • Large historical tables: incremental with unique_key for upsert behavior
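
These defaults are easiest to set per folder in dbt_project.yml rather than model by model. A sketch, assuming the project is named my_project:

# dbt_project.yml (excerpt; project name is a placeholder)
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table    # individual facts override this with incremental configs
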
-- fct_orders.sql (incremental example)
{{ config(
  materialized='incremental',
  unique_key='order_id',
  incremental_strategy='merge',
  cluster_by=['order_date', 'customer_id']
) }}

SELECT
  o.order_id,
  o.customer_id,
  o.order_date,
  o.total_amount_usd,
  o.updated_at,
  c.region,
  CURRENT_TIMESTAMP() AS dbt_updated_at
FROM {{ ref('stg_orders__orders') }} o
JOIN {{ ref('dim_customers') }} c USING (customer_id)

{% if is_incremental() %}
  -- Filter on the source's own updated_at high-water mark in the target;
  -- comparing against dbt_updated_at (a run timestamp) sets the bar too high
  -- and can silently skip rows that landed after the last run started
  WHERE o.updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

Documentation That Stays Current

The hardest part of dbt documentation is keeping it accurate. Two practices that help:

1. Use doc blocks for shared definitions. Define key business terms once in docs/ and reference them across models:

-- docs/definitions.md
{% docs order_status %}
The current fulfillment status of the order. Values: pending (payment received, not yet 
picked), processing (in warehouse), shipped (with carrier), delivered (confirmed receipt), 
cancelled (order voided).
{% enddocs %}
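
Then reference the block wherever a description is needed; a sketch of the corresponding .yml entry:

# _finance__models.yml (excerpt)
models:
  - name: fct_orders
    columns:
      - name: order_status
        description: '{{ doc("order_status") }}'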

2. Run dbt docs generate in CI. Make documentation generation part of your deployment pipeline so the catalog stays in sync with your models, then publish the generated site (the contents of target/) somewhere the team can reach it, such as Netlify or an S3 static site.
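
A rough sketch of that step in GitHub Actions (the adapter, Python version, and publishing approach are assumptions; warehouse credentials and profiles.yml setup are omitted):

# .github/workflows/dbt-docs.yml (illustrative sketch)
name: dbt-docs
on:
  push:
    branches: [main]

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake        # assumption: swap in your adapter
      - run: dbt deps
      - run: dbt docs generate --target prod
      # target/ now contains index.html, catalog.json, and manifest.json;
      # add a step here to publish that directory to your static host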

Packages Worth Adding

  • dbt-utils — surrogate keys, date spines, generic tests
  • dbt-expectations — Great Expectations-style tests in YAML
  • dbt-audit-helper — compare model outputs between branches
  • dbt-meta-testing — enforce that all models have tests and descriptions
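
Installing them is a packages.yml entry plus dbt deps. The version ranges below are illustrative only; check hub.getdbt.com for current releases:

# packages.yml (version ranges are illustrative)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.12.0", "<0.13.0"]
  - package: tnightengale/dbt_meta_testing
    version: [">=0.3.0", "<0.4.0"]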

The "Definition of Done" for a dbt Model

A model isn't done until it has:

  • a grain statement in its description
  • a primary key with unique + not_null tests
  • at least one relationship test linking it to upstream models
  • accepted_values tests on all enum/status columns
  • a business owner documented in its .yml meta section

Encode this as a dbt-meta-testing check in CI and it enforces itself.
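
The business-owner requirement is just a meta entry on the model; a minimal sketch (the owner key name is a team convention, not something dbt enforces):

# _finance__models.yml (excerpt)
models:
  - name: fct_orders
    description: "One row per order. Grain: order_id."
    meta:
      owner: "finance-analytics"    # team or person accountable for this model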