Starlake + DuckLake: start from your laptop, scale to the cloud without changing your stack.
The Post-Modern Data Stack is about removing friction. Declarative YAML from Starlake unifies ingestion, transformation, and orchestration. DuckLake gives you an open, SQL-backed lake format with ACID guarantees on Parquet. Together they keep your pipelines portable, composable, and cloud ready.
TPC-H queries answered in under 1 second on open Parquet
Transactional integrity, snapshots, and schema evolution
SQL + Parquet. No hidden runtimes or vendor lock-in.
Declarative Pipelines
Starlake YAML + SQL
- Ingestion, validation, transformation in one source
- Automated DAGs for Airflow, Dagster & more
- Git-style branching for every dataset
Open Lake Format
DuckLake SQL Catalog
- Parquet storage with ACID transactions & snapshots
- Catalog in DuckDB, PostgreSQL, MySQL or SQLite
- Time-travel, schema evolution, multi-writer safety
Why Move Beyond the Modern Data Stack
The Post-Modern Data Stack removes friction instead of adding tools
Starlake + DuckLake deliver declarative automation and an open, transactional lake format so you can build once, run anywhere, and scale without lock-in.
Fewer tools, more momentum
The Modern Data Stack added cloud agility but also fragmentation and brittle glue code. Starlake collapses ingestion, transformation, validation, and orchestration into declarative YAML so teams focus on shipping value, not stitching platforms.
Open lake format with warehouse guarantees
DuckLake coordinates Parquet tables through a real SQL catalog—DuckDB for local dev, PostgreSQL or MySQL in production. ACID transactions, schema evolution, and time travel keep data trustworthy without sacrificing openness.
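A minimal DuckDB SQL sketch of the idea (the catalog path, table, and column names here are illustrative, not taken from the configs later on this page):
-- Load the DuckLake extension and attach a catalog whose metadata lives in a local DuckDB file.
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_files/');
USE lake;
-- Tables are stored as Parquet files, yet writes are transactional and snapshotted.
CREATE TABLE orders (order_id INTEGER, amount DECIMAL(10,2));
INSERT INTO orders VALUES (1, 42.00), (2, 13.37);
-- Schema evolution is ordinary DDL; earlier snapshots stay queryable.
ALTER TABLE orders ADD COLUMN status VARCHAR;
SELECT * FROM orders;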
From laptop to cloud without rewrites
Develop locally against DuckDB, then point the same configuration at cloud object storage. Starlake’s YAML definitions and DuckLake’s metadata stay identical, letting you scale without migrations or duplicated logic.
Start Small, Scale Big
One configuration unlocks local development and cloud deployment
Keep identical semantics from your laptop to production. DuckLake’s SQL catalog and Starlake’s declarative configs make environments a switch, not a rewrite.
Develop with DuckDB catalogs on day zero
Bootstrap a Starlake project, point at a DuckLake catalog that lives inside DuckDB, and iterate fast without provisioning infra. Every dataset keeps ACID guarantees, snapshots, and schema evolution even in local mode.
Set ACTIVE_CONNECTION=ducklake_local and iterate with the exact SQL and YAML you will ship.
connectionRef: "{{ACTIVE_CONNECTION}}"
connections:
  ducklake_local:
    type: jdbc
    options:
      url: "jdbc:duckdb:"
      driver: "org.duckdb.DuckDBDriver"
      preActions: >
        INSTALL ducklake;
        LOAD ducklake;
        ATTACH IF NOT EXISTS 'ducklake:/local/path/metadata.ducklake' AS my_ducklake (DATA_PATH '/local/path/');
        USE my_ducklake;
Promote to PostgreSQL or MySQL without rewrites
Flip your ACTIVE_CONNECTION and DuckLake attaches to a shared SQL catalog with object storage paths. Starlake keeps orchestrations, data contracts, and quality policies unchanged.
Set ACTIVE_CONNECTION=ducklake_cloud and reuse the same models against shared catalogs and object storage.
connections:
  ducklake_cloud:
    type: jdbc
    options:
      url: "jdbc:postgresql://your_postgres_host/ducklake_catalog"
      driver: "org.postgresql.Driver"
      preActions: >
        INSTALL postgres;
        INSTALL ducklake;
        LOAD postgres;
        LOAD ducklake;
        CREATE OR REPLACE SECRET (
          TYPE gcs,
          KEY_ID '{{DUCKLAKE_HMAC_ACCESS_KEY_ID}}',
          SECRET '{{DUCKLAKE_HMAC_SECRET_ACCESS_KEY}}',
          SCOPE 'gs://ducklake_bucket/data_files/');
        ATTACH IF NOT EXISTS 'ducklake:postgres:dbname=ducklake_catalog host=your_postgres_host port=5432 user=dbuser password={{DUCKLAKE_PASSWORD}}' AS my_ducklake (DATA_PATH 'gs://ducklake_bucket/data_files/');
Performance Without Trade-offs
Operate at warehouse-grade speed while staying fully open
DuckLake proves that openness and performance are not mutually exclusive. With Starlake orchestrating transformations, you get governed pipelines and instant query turnaround in one streamlined platform.
TPC-H SF100 at sub-second latency
DuckLake inherits DuckDB’s vectorized execution, delivering sub-second response times on benchmark-grade datasets while operating fully on open Parquet.
Benchmark-backed speed without warehouse lock-in.
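To get a feel for this yourself, DuckDB's tpch extension can generate data straight into an attached DuckLake catalog; the sketch below is not the benchmark setup behind the SF100 figure and assumes dbgen creates its tables in the catalog currently in use:
-- Generate TPC-H data into the attached DuckLake catalog, then query the open Parquet files.
INSTALL tpch;
LOAD tpch;
USE my_ducklake;     -- a DuckLake catalog attached as in the configs above
CALL dbgen(sf = 1);  -- scale factor 1 for a quick local run; the headline figure uses SF100
PRAGMA tpch(1);      -- run TPC-H query 1 against the DuckLake-managed tables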
Transactional rigor across every write
ACID semantics, snapshot isolation, and schema evolution mean your lake behaves like a warehouse—even with multiple users writing concurrently.
Time-travel debug sessions back to any version safely.
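For example, with DuckDB as the query engine a prior snapshot can be read back with the AT clause; the table name, version, and offset below are illustrative:
-- Query the table as of an earlier snapshot version, or as of a point in time.
SELECT * FROM orders AT (VERSION => 3);
SELECT * FROM orders AT (TIMESTAMP => NOW() - INTERVAL '1 day');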
Composable with any engine or BI layer
Use DuckLake tables across engines: DuckDB, Trino, Spark, Snowflake external tables, or direct BI connections. Metadata is stored in plain SQL, not proprietary blobs.
Freedom to choose the right compute for every workload.
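Because the catalog is just tables in a SQL database, any engine or tool that speaks SQL can inspect it directly. A rough sketch against the catalog database (table names follow the DuckLake specification; the exact schema may vary by version):
-- Inspect DuckLake metadata straight from the catalog database (e.g. PostgreSQL).
SELECT * FROM ducklake_snapshot;    -- every committed snapshot
SELECT * FROM ducklake_table;       -- tables tracked by the catalog
SELECT * FROM ducklake_data_file;   -- the Parquet files behind them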
Why Starlake + DuckLake Works
Every pillar of the Post-Modern Data Stack, solved as one system
Pair declarative automation with an open, transactional lake format and deliver analytics that scale with confidence. Here is how the combination maps to the needs of modern data teams.
DuckLake vs. Cloud Data Warehouses
Choose openness when you can, managed services when you must
DuckLake pairs with Starlake to minimize friction and costs, while cloud data warehouses still excel when you need fully managed elasticity. Many teams adopt both—open lakehouse foundations with managed services where it makes sense.
Why teams choose DuckLake
- Lower, predictable costs. Store data as Parquet on affordable object storage and run analytical compute only when you need it—no per-query or per-user surprises.
- Full data control and privacy. Keep data within your own perimeter, apply custom security controls, and satisfy regulatory requirements without handing telemetry to a vendor.
- Optimized performance. Leverage DuckDB’s vectorized execution directly on columnar files to get warehouse-class latency without shipping data to a proprietary engine.
- Open and transparent. DuckLake is open source and SQL-native, so you can audit, extend, and integrate without proprietary lock-in.
- Vibrant community and ecosystem. An active OSS community evolves DuckLake rapidly and keeps interoperability with the tools you already use.
When cloud warehouses add value
- Fully managed operations. Vendors handle infrastructure, maintenance, and automatic upgrades, reducing operational overhead.
- Elastic scalability. Instantly dial compute and storage up or down to match workload spikes with usage-based pricing.
- Rich cloud integrations. Out-of-the-box connectors for analytics, ML, ingestion, and visualization streamline adoption.
- Built-in reliability. High availability, backups, and global accessibility are included, supporting distributed teams.
- Enterprise security & compliance. Providers offer fine-grained governance and compliance certifications maintained on your behalf.
Launch your first DuckLake-backed pipeline in under an hour.
Schedule a strategy session with our team to see how Starlake + DuckLake modernize your data platform without a re-platforming marathon. We'll walk through your stack, map the migration path, and spin up a proof of value anchored in your requirements.