Post-Modern Data Stack

Start small. Scale big.

Starlake + DuckLake: start from your laptop, scale to the cloud without changing your stack.

The Post-Modern Data Stack is about removing friction. Declarative YAML from Starlake unifies ingestion, transformation, and orchestration. DuckLake gives you an open, SQL-backed lake format with ACID guarantees on Parquet. Together they keep your pipelines portable, composable, and cloud ready.

600M+

TPC-H rows processed in under 1 second on open Parquet

ACID

Transactional integrity, snapshots, and schema evolution

100%

SQL + Parquet. No hidden runtimes or vendor lock-in.

Declarative Pipelines

Starlake YAML + SQL

  • Ingestion, validation, and transformation in one source
  • Automated DAGs for Airflow, Dagster & more
  • Git-style branching for every dataset

Open Lake Format

DuckLake SQL Catalog

  • Parquet storage with ACID transactions & snapshots
  • Catalog in DuckDB, PostgreSQL, MySQL or SQLite
  • Time-travel, schema evolution, multi-writer safety

Why Move Beyond the Modern Data Stack

The Post-Modern Data Stack removes friction instead of adding tools

Starlake + DuckLake deliver declarative automation and an open, transactional lake format so you can build once, run anywhere, and scale without lock-in.

Replace brittle pipelines with a single declarative layer.

Fewer tools, more momentum

The Modern Data Stack added cloud agility but also fragmentation and brittle glue code. Starlake collapses ingestion, transformation, validation, and orchestration into declarative YAML so teams focus on shipping value, not stitching platforms.

Run on SQL + Parquet. Keep your freedom to choose engines.

Open lake format with warehouse guarantees

DuckLake coordinates Parquet tables through a real SQL catalog—DuckDB for local dev, PostgreSQL or MySQL in production. ACID transactions, schema evolution, and time travel keep data trustworthy without sacrificing openness.

Flip ACTIVE_CONNECTION, keep the same pipelines.

From laptop to cloud without rewrites

Develop locally against DuckDB, then point the same configuration at cloud object storage. Starlake’s YAML definitions and DuckLake’s metadata stay identical, letting you scale without migrations or duplicated logic.

Start Small, Scale Big

One configuration unlocks local development and cloud deployment

Keep identical semantics from your laptop to production. DuckLake’s SQL catalog and Starlake’s declarative configs make environments a switch, not a rewrite.

Local Momentum

Develop with DuckDB catalogs on day zero

Bootstrap a Starlake project, point at a DuckLake catalog that lives inside DuckDB, and iterate fast without provisioning infra. Every dataset keeps ACID guarantees, snapshots, and schema evolution even in local mode.

Set ACTIVE_CONNECTION=ducklake_local and iterate with the exact SQL and YAML you will ship.

connectionRef: "{{ACTIVE_CONNECTION}}"
connections:
  ducklake_local:
    type: jdbc
    options:
      url: "jdbc:duckdb:"
      driver: "org.duckdb.DuckDBDriver"
      preActions: >
        INSTALL ducklake;
        LOAD ducklake;
        ATTACH IF NOT EXISTS 'ducklake:/local/path/metadata.ducklake' AS my_ducklake
          (DATA_PATH '/local/path/');
        USE my_ducklake;
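
Once the catalog is attached, ordinary SQL against it picks up DuckLake's ACID guarantees, even in local mode. The sketch below is illustrative only: raw_orders is a placeholder table name, and the statements assume the preActions above have already run.

-- Illustrative sketch: regular DuckDB SQL against the attached DuckLake catalog.
CREATE TABLE IF NOT EXISTS my_ducklake.raw_orders (order_id INTEGER, amount DECIMAL(10,2));
BEGIN TRANSACTION;
INSERT INTO my_ducklake.raw_orders VALUES (1, 42.00), (2, 17.50);
COMMIT;                                        -- committed changes are recorded as DuckLake snapshots
SELECT count(*) FROM my_ducklake.raw_orders;   -- reads see a consistent snapshot
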
Cloud Scale

Promote to PostgreSQL or MySQL without rewrites

Flip your ACTIVE_CONNECTION and DuckLake attaches to a shared SQL catalog with object storage paths. Starlake keeps orchestrations, data contracts, and quality policies unchanged.

Set ACTIVE_CONNECTION=ducklake_cloud and reuse the same models against shared catalogs and object storage.

connections:
  ducklake_cloud:
    type: jdbc
    options:
      url: "jdbc:postgresql://your_postgres_host/ducklake_catalog"
      driver: "org.postgresql.Driver"
      preActions: >
        INSTALL POSTGRES;
        INSTALL ducklake;
        LOAD POSTGRES;
        LOAD ducklake;
        CREATE OR REPLACE SECRET (
          type gcs,
          key_id '{{DUCKLAKE_HMAC_ACCESS_KEY_ID}}',
          secret '{{DUCKLAKE_HMAC_SECRET_ACCESS_KEY}}',
          SCOPE 'gs://ducklake_bucket/data_files/');
        ATTACH IF NOT EXISTS 'ducklake:postgres:
          dbname=ducklake_catalog
          host=your_postgres_host
          port=5432
          user=dbuser
          password={{DUCKLAKE_PASSWORD}}' AS my_ducklake
          (DATA_PATH 'gs://ducklake_bucket/data_files/');
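
After the attach succeeds, nothing else changes: the statements from the local sketch run as-is against the shared catalog. A quick, hypothetical sanity check might look like this:

-- Same SQL as in local mode; only the catalog and storage behind it differ.
-- Data files land under gs://ducklake_bucket/data_files/, metadata in the
-- ducklake_catalog PostgreSQL database. raw_orders is a placeholder name.
USE my_ducklake;
SELECT count(*) FROM raw_orders;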

Performance Without Trade-offs

Operate at warehouse-grade speed while staying fully open

DuckLake proves that openness and performance are not mutually exclusive. With Starlake orchestrating transformations, you get governed pipelines and instant query turnaround in one streamlined platform.

TPC-H SF100 at sub-second latency

DuckLake inherits DuckDB’s vectorized execution, delivering sub-second response times on benchmark-grade datasets while operating fully on open Parquet.

Benchmark-backed speed without warehouse lock-in.
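
For a concrete sense of the workload behind that headline number, the queries involved are scan-heavy aggregates in the style of TPC-H Q1. The sketch below is illustrative and assumes a lineitem table has already been loaded into the attached DuckLake catalog.

-- TPC-H Q1-style aggregate over open Parquet via the DuckLake catalog.
SELECT
  l_returnflag,
  l_linestatus,
  sum(l_quantity)      AS sum_qty,
  sum(l_extendedprice) AS sum_base_price,
  avg(l_discount)      AS avg_disc,
  count(*)             AS count_order
FROM my_ducklake.lineitem
WHERE l_shipdate <= DATE '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;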

Transactional rigor across every write

ACID semantics, snapshot isolation, and schema evolution mean your lake behaves like a warehouse—even with multiple users writing concurrently.

Time-travel debug sessions back to any version safely.
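
A minimal sketch of what that looks like in practice, assuming DuckLake's documented snapshot and time-travel syntax (check the DuckLake docs for the exact functions available in your version); raw_orders is a placeholder table name.

SELECT * FROM ducklake_snapshots('my_ducklake');          -- list committed snapshots
SELECT * FROM my_ducklake.raw_orders AT (VERSION => 3);   -- read the table as of snapshot 3
SELECT * FROM my_ducklake.raw_orders
  AT (TIMESTAMP => now() - INTERVAL 1 DAY);               -- or as of a point in time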

Composable with any engine or BI layer

Use DuckLake tables across engines: DuckDB, Trino, Spark, Snowflake external tables, or direct BI connections. Metadata is stored in plain SQL, not proprietary blobs.

Freedom to choose the right compute for every workload.
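
Because the catalog really is plain SQL, any client can inspect it directly. For example, against the PostgreSQL catalog from the cloud configuration above, something like the following lists recent commits; table names follow the published DuckLake metadata spec and may differ between versions.

-- Run with any PostgreSQL client against ducklake_catalog.
SELECT * FROM ducklake_snapshot ORDER BY snapshot_id DESC LIMIT 5;   -- recent commits
SELECT count(*) FROM ducklake_data_file;                             -- Parquet files tracked by the catalog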

Why Starlake + DuckLake Works

Every pillar of the Post-Modern Data Stack, solved as one system

Pair declarative automation with an open, transactional lake format and deliver analytics that scale with confidence. Here is how the combination maps to the needs of modern data teams.

Quality-first ingestion
  • Starlake delivers: Build data contracts, schema checks, and freshness SLAs straight into YAML ingestion specs.
  • DuckLake enables: Catalog metadata, snapshots, and audit trails guarantee trust across every write.

SQL-only, portable transformations
  • Starlake delivers: Keep business logic expressed as first-class SQL that Starlake orchestrates automatically.
  • DuckLake enables: Operate on Parquet via ANSI SQL so every engine and BI layer stays fully interoperable.

Local dev, global deployment
  • Starlake delivers: Prototype against DuckDB on your laptop and promote the exact same DAGs to production.
  • DuckLake enables: Swap catalogs between DuckDB, PostgreSQL, MySQL, or SQLite with zero refactors.

Git-style data branching
  • Starlake delivers: Spin up dataset branches for experiments, approvals, or hotfixes with a single command.
  • DuckLake enables: Time-travel and snapshot isolation keep every branch consistent and rewindable.

Orchestration-agnostic pipelines
  • Starlake delivers: Generate Airflow, Dagster, and other DAGs from lineage-aware SQL.
  • DuckLake enables: Unified metadata powers dependency resolution and reproducible scheduling anywhere.

Semantic model freedom
  • Starlake delivers: Publish metrics layers for the BI tool of your choice without reauthoring transforms.
  • DuckLake enables: Open tables keep semantic layers portable, audit-friendly, and vendor neutral.

DuckLake vs. Cloud Data Warehouses

Choose openness when you can, managed services when you must

DuckLake pairs with Starlake to minimize friction and costs, while cloud data warehouses still excel when you need fully managed elasticity. Many teams adopt both—open lakehouse foundations with managed services where it makes sense.

Why teams choose DuckLake

  • Lower, predictable costs. Store data as Parquet on affordable object storage and run analytical compute only when you need it—no per-query or per-user surprises.
  • Full data control and privacy. Keep data within your own perimeter, apply custom security controls, and satisfy regulatory requirements without handing telemetry to a vendor.
  • Optimized performance. Leverage DuckDB’s vectorized execution directly on columnar files to get warehouse-class latency without shipping data to a proprietary engine.
  • Open and transparent. DuckLake is open source and SQL-native, so you can audit, extend, and integrate without proprietary lock-in.
  • Vibrant community and ecosystem. An active OSS community evolves DuckLake rapidly and keeps interoperability with the tools you already use.

When cloud warehouses add value

  • Fully managed operations. Vendors handle infrastructure, maintenance, and automatic upgrades, reducing operational overhead.
  • Elastic scalability. Instantly dial compute and storage up or down to match workload spikes with usage-based pricing.
  • Rich cloud integrations. Out-of-the-box connectors for analytics, ML, ingestion, and visualization streamline adoption.
  • Built-in reliability. High availability, backups, and global accessibility are included, supporting distributed teams.
  • Enterprise security & compliance. Providers offer fine-grained governance and compliance certifications maintained on your behalf.

Ready to build the Post-Modern Data Stack?

Launch your first DuckLake-backed pipeline in under an hour.

Schedule a strategy session with our team to see how Starlake + DuckLake modernize your data platform without a re-platforming marathon. We'll walk through your stack, map the migration path, and spin up a proof of value anchored in your requirements.