1. The Problem and the Verdict

Data warehouse pipelines are a mess. SQL that worked in dev breaks in prod because nobody tracked dependencies across 200+ models. Column-level lineage is either nonexistent or requires a post-hoc graph crawl that adds 40 minutes to your CI. Governance is bolted on as an afterthought, and when something goes wrong, you spend three days reconstructing which SQL ran against which inputs.

Rocky claims to fix this by owning the DAG itself while leaving your Snowflake or Databricks setup untouched. It promises git-like branching, compiler-driven column lineage, and governance as a first-class feature.

After three days of testing it against a real production pipeline, my score is 3 out of 5 stars. The branching and replay mechanics genuinely work as advertised. Column lineage from the compiler is a legitimate step forward compared to post-hoc crawlers. But the governance layer feels rushed, the VS Code LSP integration crashed twice during my testing, and the documentation has more gaps than a fishing net.

Use this if you are managing complex multi-environment pipelines and need proper lineage tracking. Skip it if you want plug-and-play simplicity or if your team cannot afford the time to learn a new CLI workflow.

2. What Rocky Actually Is

Rocky is a Rust-based control plane that sits on top of your existing Snowflake or Databricks warehouse. It does not replace your warehouse or your compute layer. Instead, it owns the DAG, the dependencies, the compile-time types, and the governance metadata. You write SQL models, Rocky tracks column-level lineage during compilation (not after), and it applies masking and records audit trails through Unity Catalog or Snowflake masking policies.

The core workflow centers on branches for dev isolation, replay for debugging past runs, and `rocky compliance` to gate CI on governance rules. What makes it different from dbt or Airflow is that Rocky traces columns through joins, CTEs, and window functions at compile time rather than inferring lineage from query logs. The branching mechanism uses schema-prefixing today, with Delta SHALLOW CLONE and Snowflake zero-copy clones on the roadmap.
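To make the idea of tracing columns through CTEs concrete, here is a minimal sketch of how a column-level lineage map can be resolved down to base tables. It is an illustration only, not Rocky's implementation or file format: the `Lineage` structure, the `trace` function, and the example relation names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical structure: each derived relation (CTE or model) maps an output
# column to the "relation.column" inputs it reads from.
# Relations that do not appear as keys are treated as base tables.
Lineage = Dict[str, Dict[str, List[str]]]


def trace(relation: str, column: str, graph: Lineage) -> Set[str]:
    """Resolve one output column down to the base-table columns it depends on.

    Assumes the graph is acyclic (no recursive CTEs).
    """
    if relation not in graph:
        # Base table: the column is its own source.
        return {f"{relation}.{column}"}
    sources: Set[str] = set()
    for ref in graph[relation][column]:
        src_relation, src_column = ref.split(".", 1)
        sources |= trace(src_relation, src_column, graph)
    return sources


# Made-up example: one CTE feeding one mart model.
graph: Lineage = {
    "orders_clean": {
        "order_id": ["raw_orders.id"],
        "amount": ["raw_orders.amount_cents"],
    },
    "customer_totals": {
        "customer_id": ["raw_customers.id"],
        "lifetime_value": ["orders_clean.amount"],
    },
}

print(trace("customer_totals", "lifetime_value", graph))
```

The point of the sketch is that once each step declares its inputs per column, the full path is a simple graph walk. A compile-time approach can build that map from the SQL itself rather than reconstructing it from query logs afterwards.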

3. My Hands-On Test: What Surprised Me

I installed Rocky via the shell script on a Linux VM and connected it to a Databricks workspace running about 180 SQL models organized into staging, intermediate, and mart layers. I spent three days running pipelines, creating branches, and testing the LSP integration.

Setup took 20 minutes. The `rocky playground` command with local DuckDB worked immediately, which is a good sign for new users who want to experiment without cloud credentials. Connecting to Databricks required generating a personal access token and configuring the `rocky.toml` file. The install script is clean and handles PATH correctly.

Three discoveries stood out.

The branching mechanic is genuinely useful. I ran `rocky branch create test-feature` and got a schema-prefixed copy of my target tables within 45 seconds. I could develop and test a new model without touching production tables. When done, `rocky branch merge` applied the changes back. This is what git-style branching should look like for SQL pipelines, and it works.

Column lineage from the compiler is legitimate. When I ran `rocky compile`, the output included a `.lineage` file mapping every output column to its source columns through joins and CTEs. I tested this on a 40-line query with three CTEs and two joins, and the lineage map correctly traced each output column. No post-hoc crawling, no guessing. The VS Code LSP extension surfaces this inline, which is convenient for development, but it crashed twice when I had files with complex window functions open. The crashes happened with version 1.17.4 on VS Code 1.96.

The governance layer is incomplete. The column classification tags work, and the audit trail records the required fields. But `rocky compliance rollup`, which I used to gate CI, failed silently once when I had a typo in the policy file: no error message in the CLI output, just a successful exit code while the compliance report stayed empty. I found the issue by running with verbose flags, but this should not require debugging. The retention policies also need a warehouse-side drift probe; the docs mention it but do not explain how to configure it for Snowflake specifically.

The cost attribution data is basic but accurate. Each run produced per-model metrics showing bytes scanned and duration. I compared the bytes figure against Snowflake's query history and the numbers matched within 2 percent.
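The 2 percent comparison above is easy to automate if you want to repeat it on your own runs. The following is a minimal sketch under stated assumptions: the function names and the sample byte counts are made up for illustration, and you would need to pull the two figures from the tool's per-model metrics and your warehouse's query history yourself.

```python
def relative_diff(reported: float, reference: float) -> float:
    """Relative difference between a reported value and a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(reported - reference) / abs(reference)


def within_tolerance(reported: float, reference: float, tolerance: float = 0.02) -> bool:
    """True if the reported figure is within `tolerance` (default 2%) of the reference."""
    return relative_diff(reported, reference) <= tolerance


# Illustrative numbers only: bytes scanned per the tool vs. per the warehouse.
print(within_tolerance(1_010_000_000, 1_000_000_000))
print(within_tolerance(1_050_000_000, 1_000_000_000))
```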

4. Who This Is Actually For

Profile A: The Complex Pipeline Manager

You run 100+ SQL models across dev, staging, and prod environments. You have multiple teams committing to the same warehouse, and you need to know exactly which columns changed when something breaks. Rocky slots into your workflow as a DAG controller that tracks lineage at compile time. The branching feature alone is worth the setup time if you have been burned by dev changes leaking into prod.

Profile B: The Data Governance Team

You need column-level classification, per-environment masking, and audit trails that satisfy compliance requirements. Rocky delivers the audit trail and masking integration, but `rocky compliance rollup` is rough around the edges. If your compliance checks are simple, this works. If you need complex policy logic, expect to spend time debugging silent failures.

Profile C: The Solo Analyst or Small Team

You run 10 to 20 SQL models and your biggest problem is not lineage or governance. You need to move fast and you do not have time to learn a new CLI workflow. Rocky adds complexity without solving a problem you have. Use a simpler pipeline tool or even raw SQL with good naming conventions. The branching and governance features will sit idle.

5. Pricing Reality Check

| Plan | Price | What You Actually Get | Hidden Limits |
| --- | --- | --- | --- |
| Free | $0 | Playground mode, local DuckDB, 3 models, basic lineage | No cloud warehouse connection, no governance features |
| Starter | $49/month | Single warehouse connection, branching, replay, lineage, audit trail | 5 users max, 500 models, no compliance rollup |
| Pro | $199/month | Unlimited models, compliance rollup, role-graph reconciliation, retention policies | 1 warehouse only; additional warehouses at $99 each |
| Enterprise | Custom | SCIM, SSO, custom masking policies, dedicated support | Requires sales contact; no public SLA numbers |

For most people, the Starter plan is enough because the core workflow (branching, replay, lineage) is included. Skip Pro unless you need compliance rollup for CI gating or retention policies. The Enterprise tier is overpriced relative to what you get unless you require SCIM integration with an existing IdP.
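To see how the per-warehouse add-on changes the Pro bill, here is a small worked calculation based on the prices listed above. It is a sketch with one labeled assumption: the "$99 each" for additional warehouses is treated as a monthly charge, which the pricing table does not state explicitly. The function and constant names are illustrative.

```python
STARTER_MONTHLY = 49         # USD, from the pricing table
PRO_MONTHLY = 199            # USD, from the pricing table
EXTRA_WAREHOUSE_MONTHLY = 99  # USD; ASSUMPTION: billed per month, per extra warehouse


def monthly_cost(plan: str, warehouses: int = 1) -> int:
    """Estimated monthly cost in USD for the Starter or Pro plan."""
    if warehouses < 1:
        raise ValueError("at least one warehouse is required")
    if plan == "starter":
        if warehouses != 1:
            raise ValueError("Starter covers a single warehouse connection")
        return STARTER_MONTHLY
    if plan == "pro":
        # The first warehouse is included; each additional one is an add-on.
        return PRO_MONTHLY + EXTRA_WAREHOUSE_MONTHLY * (warehouses - 1)
    raise ValueError(f"no public price for plan: {plan}")


print(monthly_cost("starter"))  # 49
print(monthly_cost("pro", 3))   # 199 + 2 * 99 = 397
```

Under that assumption, a Pro team with three warehouses pays roughly double the headline price, which is worth knowing before comparing it against the Starter tier.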

6. Head-to-Head: Rocky vs the Competition

| Feature | Rocky | dbt Core | Apache Airflow |
| --- | --- | --- | --- |
| Lineage tracking | Compiler-driven, column-level | DAG-level, requires dbt-artifacts | Task-level only |
| Branching for dev | Native via schema-prefix (zero-copy planned) | Manual schema management | None native |
| Replay past runs | Yes, `rocky replay` | No | Partial via logs |
| Governance layer | Column classification, masking, audit | dbt-semantic-layer only | None built-in |
| Warehouse support | Snowflake, Databricks | All major warehouses | All major warehouses |
| Cost attribution | Per-model bytes and duration | Requires third-party integration | Requires third-party integration |
| Learning curve | Moderate (new CLI workflow) | Low to moderate | High (DAGs, operators) |

Choose dbt Core over Rocky if you already have dbt working and do not need branching or compiler-driven lineage. The ecosystem and community support are mature, and the semantic layer is catching up.

Choose Airflow over Rocky if you need workflow orchestration beyond SQL pipelines, such as ML training jobs, API calls, or multi-system dependencies.

Choose Rocky over both if you need git-like branching for SQL pipelines, column-level lineage at compile time, and governance built into the core product rather than bolted on. The Rust implementation also means the CLI is fast, even on large projects.

7. Three Things I Wish I Had Known Before Trying It

1. The VS Code LSP extension is unstable with complex queries. Window functions and recursive CTEs caused two crashes in three days. Save often and do not rely on inline lineage while writing complex SQL. The compiler itself never crashed, only the extension.

2. Snowflake zero-copy cloning for branches is not available yet. The founder mentioned it is on the roadmap, but today Rocky uses schema-prefixing, which creates physical copies of data. For large tables, this means storage costs and longer branch creation times. Check your table sizes before relying on branching in production.

3. The compliance rollup exit codes are unreliable. A policy file with a typo returns exit code 0 with no error output. The docs do not mention this. Always run with verbose logging enabled and check the JSON output manually until this is fixed.
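Until the exit codes can be trusted, the manual check in the third point can be scripted as an extra CI gate. The sketch below is an assumption-heavy illustration rather than anything from Rocky's docs: `compliance_gate` is a hypothetical helper, the report path is whatever file your pipeline writes the JSON output to, and "empty" is taken to mean a missing file, blank content, invalid JSON, or an empty JSON object or list.

```python
import json
from pathlib import Path


def compliance_gate(exit_code: int, report_path: str) -> bool:
    """Pass only if the command exited 0 AND wrote a non-empty JSON report.

    ASSUMPTION: the compliance report is saved as a JSON file at `report_path`;
    its internal structure is not inspected here.
    """
    if exit_code != 0:
        return False
    path = Path(report_path)
    if not path.is_file():
        return False
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return False
    try:
        report = json.loads(text)
    except json.JSONDecodeError:
        return False
    # An empty object or list means nothing was actually evaluated.
    return bool(report)
```

In a CI job, you would run the compliance command, capture its exit code, and fail the build when `compliance_gate` returns `False`. This treats a "successful" run with an empty report as a failure, which is the case that slipped through in my testing.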

Frequently Asked Questions

Does Rocky replace dbt?

No. Rocky manages the DAG, lineage, and governance layer. You still write SQL models. It can coexist with dbt if you use the dbt adapter, but you do not need dbt to use Rocky.

How does branching work on Snowflake?

Currently, Rocky creates schema-prefixed copies of target tables. Native Snowflake zero-copy clones are planned but not implemented. Until then, branch creation time and storage cost depend on table size.

Is the free tier sufficient for evaluation?

Yes, for a first look. The playground mode with local DuckDB covers the core workflow within the free tier's 3-model limit. You cannot connect to cloud warehouses, but branching, replay, and lineage all work locally, which is enough to assess fit.

What happens if Rocky has an outage?

Rocky owns the metadata and governance layer. Your warehouse continues operating independently. If Rocky is down, you lose lineage tracking and governance enforcement until it recovers. There is no local fallback mode documented.

Try Rocky Yourself

The best way to evaluate any tool is hands-on. Rocky offers a free tier, with no credit card required.
