The word "governance" makes data engineers reach for the exit. It conjures images of approval committees, endless policy documents, and bureaucratic processes that slow everything down. But ungoverned data has its own costs: nobody knows who to trust, critical decisions get made on wrong numbers, and compliance audits turn into all-hands emergencies.

The solution isn't less governance or more governance. It's right-sized governance — just enough structure to maintain trust, applied where it matters most, automated wherever possible.

The Minimal Viable Governance Stack

If you're starting from zero, focus on four things: data catalog, data ownership, access control, and change management. Everything else is optional until you're larger.

1. Data Catalog: Make Data Discoverable

Before you can govern data, people need to know it exists. A data catalog is a searchable inventory of your data assets — tables, columns, lineage, owners, and descriptions. Without this, governance is impossible because people don't know what they're supposed to be governing.

Start simple. Even a well-maintained README or a dbt docs site is better than nothing. Purpose-built catalogs (DataHub, Atlan, Alation, Collibra) add lineage tracking, business glossaries, and usage analytics, which become valuable once you're operating at scale.
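At its core, a catalog answers two questions: "does a dataset about X exist?" and "who owns it?" The sketch below is a toy in-memory version in Python. It assumes each asset records a name, owner, description, and column list. `CatalogEntry` and `search_catalog` are hypothetical names, not the API of DataHub or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset in a minimal, hypothetical in-memory catalog."""
    name: str
    owner: str
    description: str
    columns: list[str] = field(default_factory=list)


def search_catalog(entries: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Return entries whose name, description, or column names contain every query word."""
    words = query.lower().split()
    results = []
    for entry in entries:
        # One lowercase haystack per entry, so a query word can match any field.
        haystack = " ".join([entry.name, entry.description, *entry.columns]).lower()
        if all(word in haystack for word in words):
            results.append(entry)
    return results


catalog = [
    CatalogEntry("orders_daily", "data-platform-team",
                 "Daily order totals by region", ["order_date", "region", "order_count"]),
    CatalogEntry("user_events_raw", "product-analytics-team",
                 "Raw clickstream events", ["user_id", "event_type", "ip_address"]),
]

for hit in search_catalog(catalog, "order region"):
    print(hit.name, "->", hit.owner)  # orders_daily -> data-platform-team
```

Real catalogs layer ranking, lineage, and usage signals on top. The basic contract stays the same: every asset can be found by what it contains, and it points to an owner.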

2. Data Ownership: Assign Accountability

Every dataset needs a named owner — a specific person or team who is accountable for its quality, freshness, and documentation. Without explicit ownership, governance questions go unanswered and quality declines by default.

# Ownership manifest — track this in version control
datasets:
  - id: orders_daily
    owner: data-platform-team
    steward: jane.smith@company.com
    classification: business_critical
    consumers: [finance, analytics, ml-platform]
    sla:
      freshness_hours: 2
      availability_pct: 99.9

  - id: user_events_raw
    owner: product-analytics-team
    steward: bob.jones@company.com
    classification: pii_sensitive
    pii_fields: [user_id, email, ip_address]
    retention_days: 365
    access_policy: restricted
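Because the manifest lives in version control, CI can reject any change that leaves a dataset without an owner. Below is a minimal sketch. It assumes the YAML has already been parsed into a list of dicts (for example with PyYAML's `yaml.safe_load`). `validate_manifest` and the PII rule it enforces are illustrative, not a standard.

```python
REQUIRED_FIELDS = ("id", "owner", "steward", "classification")


def validate_manifest(datasets: list[dict]) -> list[str]:
    """Return human-readable problems found in an ownership manifest."""
    problems = []
    for ds in datasets:
        ds_id = ds.get("id", "<missing id>")
        # Every dataset needs a named owner, steward, and classification.
        for key in REQUIRED_FIELDS:
            if not ds.get(key):
                problems.append(f"{ds_id}: missing '{key}'")
        # Hypothetical rule: PII datasets must list PII fields and a retention period.
        if ds.get("classification") == "pii_sensitive":
            if not ds.get("pii_fields"):
                problems.append(f"{ds_id}: pii_sensitive but no pii_fields listed")
            if "retention_days" not in ds:
                problems.append(f"{ds_id}: pii_sensitive but no retention_days set")
    return problems


# The manifest above after parsing, plus one deliberately incomplete entry.
manifest = [
    {"id": "orders_daily", "owner": "data-platform-team",
     "steward": "jane.smith@company.com", "classification": "business_critical"},
    {"id": "user_events_raw", "owner": "product-analytics-team",
     "steward": "bob.jones@company.com", "classification": "pii_sensitive",
     "pii_fields": ["user_id", "email", "ip_address"], "retention_days": 365},
    {"id": "scratch_table", "classification": "pii_sensitive"},
]

for problem in validate_manifest(manifest):
    print(problem)
```

In CI, a non-empty result fails the build. That turns "every dataset needs an owner" from a policy into a check.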

3. Access Control: Protect What Needs Protection

Not all data needs the same access controls. Classify your datasets into tiers and apply access policies accordingly. A practical three-tier model sorts every dataset into Open, Restricted, or Confidential, with access narrowing at each step.

Implement access control through your data platform's native RBAC, not through manual grants. When every access request requires a Jira ticket and a three-day wait, you create shadow IT: people find workarounds rather than going through proper channels.
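The Open/Restricted/Confidential tiers can be expressed as a mapping from each tier to the roles allowed to read it. Access decisions then follow from classification instead of from a ticket queue. This is a sketch in Python. The role names, `TIER_READERS`, and the Snowflake-style `GRANT ... TO ROLE` output are all assumptions; adapt them to your platform's RBAC syntax.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    CONFIDENTIAL = "confidential"


# Hypothetical policy: which roles may read each tier. The platform's RBAC
# enforces it; this mapping is only the source the grants are generated from.
TIER_READERS = {
    Tier.OPEN: {"employee", "analyst", "data_engineer", "privacy_officer"},
    Tier.RESTRICTED: {"analyst", "data_engineer", "privacy_officer"},
    Tier.CONFIDENTIAL: {"privacy_officer"},
}


def can_read(role: str, tier: Tier) -> bool:
    """Decide an access request from the dataset's tier alone."""
    return role in TIER_READERS[tier]


def grant_statements(dataset: str, tier: Tier) -> list[str]:
    """Render Snowflake-style GRANTs so automation, not people, applies the policy."""
    return [f"GRANT SELECT ON {dataset} TO ROLE {role};"
            for role in sorted(TIER_READERS[tier])]


print(can_read("analyst", Tier.RESTRICTED))    # True
print(can_read("analyst", Tier.CONFIDENTIAL))  # False
print(grant_statements("user_events_raw", Tier.CONFIDENTIAL))
```

Generating grants from the policy, rather than writing them by hand, keeps the policy and the platform in sync. Reclassifying a dataset changes who can read it on the next deploy.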

4. Change Management: Prevent Breaking Changes

The most common governance failure isn't malicious — it's a well-intentioned engineer renaming a column without realizing that 12 downstream dashboards depend on it. Implement lightweight change management for high-impact datasets.
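Finding those dashboards before a rename ships is a graph traversal over lineage. Below is a minimal sketch. It assumes table-level lineage exported from your catalog as an adjacency list of direct consumers. The asset names are hypothetical, and column-level lineage would narrow the result further.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it directly.
LINEAGE = {
    "orders_daily": ["fct_revenue", "dash_ops_overview"],
    "fct_revenue": ["dash_board_deck", "dash_finance_weekly"],
    "dash_ops_overview": [],
    "dash_board_deck": [],
    "dash_finance_weekly": [],
}


def downstream_of(asset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every asset transitively downstream of `asset`."""
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guards against diamonds and cycles
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(downstream_of("orders_daily", LINEAGE)))
# ['dash_board_deck', 'dash_finance_weekly', 'dash_ops_overview', 'fct_revenue']
```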

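# Schema change detection in CI/CD (GitHub Actions)
# Assumptions: scripts/check_schema_changes.py is your own script and exits
# non-zero on a breaking change; a deprecation notice is a new file under
# deprecations/ (hypothetical convention).
name: Schema Change Check
on: [pull_request]

jobs:
  schema-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main exists for git diff

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Detect breaking schema changes
        id: schema
        run: |
          # Compare the files changed in this PR against registered data contracts
          CHANGED="$(git diff --name-only origin/main...HEAD)"
          if python scripts/check_schema_changes.py \
               --contract-dir contracts/ \
               --changed-models "$CHANGED"; then
            echo "breaking=false" >> "$GITHUB_OUTPUT"
          else
            echo "breaking=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Block if breaking change without deprecation notice
        if: steps.schema.outputs.breaking == 'true'
        run: |
          # Pass only if this PR adds at least one file under deprecations/
          if ! git diff --name-only --diff-filter=A origin/main...HEAD -- deprecations/ | grep -q .; then
            echo "Breaking schema change detected."
            echo "File a deprecation notice and notify consumers."
            exit 1
          fi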
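The article doesn't show the contents of `scripts/check_schema_changes.py`. One plausible core for such a script is a diff like the one below. It assumes each contract stores a mapping from column name to type. `breaking_changes` and the contract format are hypothetical.

```python
def breaking_changes(contract: dict[str, str], current: dict[str, str]) -> list[str]:
    """Compare a contract's column->type map with the model's current schema.

    Removed columns and changed types break consumers; added columns do not.
    A rename shows up as a removal plus an addition, so it is caught too.
    """
    problems = []
    for column, expected_type in contract.items():
        if column not in current:
            problems.append(f"removed column: {column}")
        elif current[column] != expected_type:
            problems.append(
                f"type change: {column} {expected_type} -> {current[column]}")
    return problems


contract = {"order_id": "bigint", "order_date": "date", "amount_usd": "numeric"}
current = {"order_id": "bigint", "order_dt": "date", "amount_usd": "varchar",
           "region": "text"}

for problem in breaking_changes(contract, current):
    print(problem)
# removed column: order_date
# type change: amount_usd numeric -> varchar
```

A wrapper would exit non-zero when the list is non-empty, so the CI step fails on a breaking change. Additive changes such as the new `region` column pass freely. The check only gets in the way when a consumer could actually break.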

The Business Glossary: Solving the "One Number" Problem

Every organization has a version of this problem: Finance says revenue was $12.4M last quarter. Sales says it was $13.1M. Both are technically correct — they're just measuring different things with the same word.

A business glossary defines canonical terms with agreed-upon business logic. You don't need every term, only the ones that matter most: terms that appear in executive presentations, that multiple teams measure independently, or that have caused confusion in the past.

# Business glossary entry example
term: Monthly Recurring Revenue (MRR)
alias: [mrr, monthly_revenue, subscription_revenue]
definition: >
  The total normalized monthly subscription revenue from active paying
  customers. Includes only recurring subscription components.

  EXCLUDES: one-time setup fees, professional services, usage overages,
  trials (even paid trials in first 30 days), and churned customers.

  CALCULATION: SUM(subscription_amount / billing_period_months)
  for all active subscriptions as of the last day of the month.

canonical_table: marts.finance.fct_mrr_monthly
canonical_column: mrr_amount_usd
owner: finance-analytics
approved_by: [CFO, Head of Finance, Head of Data]
last_reviewed: 2025-03-01
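The CALCULATION line translates almost directly into code. Below is a minimal Python sketch. It assumes a hypothetical `Subscription` record that already holds only recurring components and carries an explicit trial flag. In practice this logic would live in the model behind `marts.finance.fct_mrr_monthly`. The treatment of a subscription as active at month end is also an assumed convention.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Subscription:
    # Assumed to hold only the recurring component: setup fees, services,
    # and usage overages are never loaded into this record.
    amount: Decimal              # amount billed per billing period, in USD
    billing_period_months: int   # 1 = monthly, 3 = quarterly, 12 = annual
    start: date
    end: Optional[date] = None   # None means the subscription has not ended
    is_trial: bool = False       # trials are excluded, paid or not


def mrr_as_of(subs: list[Subscription], month_end: date) -> Decimal:
    """SUM(amount / billing_period_months) over non-trial subs active at month_end."""
    total = Decimal("0")
    for s in subs:
        # Assumed convention: active if started on or before month_end
        # and not yet ended on that day.
        active = s.start <= month_end and (s.end is None or s.end > month_end)
        if active and not s.is_trial:
            total += s.amount / s.billing_period_months
    return total


subs = [
    Subscription(Decimal("99"), 1, date(2025, 1, 15)),                     # monthly plan
    Subscription(Decimal("1200"), 12, date(2024, 6, 1)),                   # annual plan
    Subscription(Decimal("49"), 1, date(2025, 3, 20), is_trial=True),      # paid trial
    Subscription(Decimal("300"), 3, date(2024, 1, 1), date(2025, 2, 28)),  # churned
]

print(mrr_as_of(subs, date(2025, 3, 31)))  # 99 + 1200/12 = 199
```

Encoding the exclusions as data (the trial flag, the end date) rather than as ad hoc filters in each dashboard is what lets Finance and Sales get the same number.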

Governance That Works in Practice

The governance programs that succeed share a few characteristics. They are automated first — manual processes rot. Schema validation in CI/CD beats a "please check with the data team" Slack reminder. They are enforced at the platform level, not by asking people nicely. And they are proportional to risk — a table used by one analyst for exploratory work needs different governance than the revenue table that feeds the board deck.
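Freshness SLAs from the ownership manifest show all three traits. They are checked by machine, enforced through an alert rather than a reminder, and applied only where an SLA was declared. Below is a minimal sketch. `stale_datasets` is a hypothetical function, and the load timestamps are assumed to come from your warehouse or orchestrator. dbt's `source freshness` command implements a similar check for dbt sources.

```python
from datetime import datetime, timedelta, timezone


def stale_datasets(sla_hours: dict[str, float],
                   last_loaded: dict[str, datetime],
                   now: datetime) -> list[str]:
    """Return datasets whose latest load is older than their freshness SLA.

    Only datasets listed in sla_hours are checked; everything else is ignored.
    """
    stale = []
    for dataset, hours in sla_hours.items():
        loaded_at = last_loaded.get(dataset)
        # No recorded load at all also counts as a breach.
        if loaded_at is None or now - loaded_at > timedelta(hours=hours):
            stale.append(dataset)
    return stale


# Hypothetical inputs: SLAs from the ownership manifest, load times from the warehouse.
sla_hours = {"orders_daily": 2}
now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "orders_daily": now - timedelta(hours=3),         # breaches its 2-hour SLA
    "scratch_exploration": now - timedelta(days=30),  # no SLA, never checked
}

print(stale_datasets(sla_hours, last_loaded, now))  # ['orders_daily']
```

Datasets without an SLA, such as an analyst's scratch table, are never checked. That is the proportionality the paragraph describes.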

🚀 30-Day Governance Kickstart

Week 1: Inventory your top 20 datasets and assign owners. Week 2: Classify each as Open/Restricted/Confidential and set access policies. Week 3: Document the top 10 most-contested business terms in a glossary. Week 4: Add schema change detection to CI/CD for your 5 most critical datasets. That's a governance program you can actually maintain.

Governance is not a project you complete. It's a practice you maintain. Start small, make it easy to comply, and expand as your organization grows. The goal isn't perfect governance — it's enough governance that people trust the data and can find what they need.