From brittle pipelines to an AI Ready Lakehouse.
We help enterprise teams modernize ingestion and transformations with declarative YAML + pure SQL, optimize cloud-native lakehouse performance, integrate real-time AI inference, and ship governance and FinOps as code.
Pipeline Modernization
YAML + SQL, not spaghetti code
- Universal JDBC/ODBC connectivity
- Automated schema evolution policies
- Validation, privacy & lineage built in
AI Ready Lakehouse
Real-time features & inference paths
- Merge-on-read upserts for low-latency updates
- Feature definitions reused for training & serving
- Cost governance via spot/serverless automation
Starlake integrates effortlessly for maximum flexibility.
Who we are
One team. Three responsibilities.
We build
The declarative data stack
We design and maintain Starlake, the YAML + SQL framework that unifies ingestion, transformation, and orchestration across BigQuery, Snowflake, Redshift, Databricks, and DuckDB.
We steward
The Pragmatic Open Source Data Stack
An independent team committed to keeping Starlake Apache-licensed, contributor-friendly, and roadmap-driven, alongside DuckDB, DuckLake, and Quack On Demand.
We support
Teams running it in production
Direct access to the engineers who build the framework. Architecture reviews, custom implementations, and long-term collaboration for organizations on the hook for outcomes.
Service tiers
Named engagement formats, sized to where you are.
From a focused advisory sprint to a long-term embedded team, every tier is staffed by senior engineers who work on the open-source projects you depend on.
Architecture & Advisory
Reviews, audits, blueprints
A senior architect spends one to four weeks with your team. You walk away with a written assessment, a target topology, and a phased migration plan you can hand to engineering.
Deliverables
- Current-state audit and risk register
- Reference architecture for ingestion, lakehouse, and AI paths
- Migration plan with milestones and decision points
- Cost and governance baselines
Platform Implementation
Build the lakehouse end-to-end
We deliver a production-grade platform: declarative pipelines, lakehouse topology, governance and FinOps, hardened CI/CD. Your team learns the patterns as we ship.
Deliverables
- Production ingestion + transformation pipelines (YAML + SQL)
- Lakehouse layout with partitioning and upserts tuned
- Lineage, privacy controls, and audit trails as code
- CI/CD gates, runbooks, and observability
Embedded Engineering
Long-term collaboration with your team
Senior Starlake engineers join your team on a multi-quarter basis. We pair on hard problems, build features into the open-source core when it makes sense, and accelerate your roadmap.
Deliverables
- Dedicated engineers integrated into your workflow
- Custom feature development aligned with the OSS roadmap
- Performance and cost optimization on a recurring cadence
- Knowledge transfer baked into every cycle
Enterprise Support
SLA-backed support for production Starlake
Priority access to the engineers who maintain Starlake, DuckLake, and Quack On Demand. We help you run them reliably, diagnose incidents fast, and stay aligned with upstream.
Deliverables
- Defined response and resolution SLAs
- Direct Slack / shared channel with the core team
- Quarterly upgrade and roadmap reviews
- Hotfix paths for production-critical bugs
Not sure which tier fits?
Tell us about your stack and constraints. We'll propose the shape of an engagement that makes sense.
Competency Framework
Four pillars that cover the modern data stack end-to-end
Each services engagement is structured around the critical failure points in today’s data platforms, so modernization is measurable, repeatable, and scalable.
Declarative Data Engineering
Modernize ELT with configuration as code
Replace brittle scripts and opaque ETL tools with YAML-driven ingestion, quality checks, and SQL-first transformations that are portable across clouds.
- Zero/low-code ingestion patterns
- Automated schema evolution policies
- SQL transpilation (write once, run anywhere)
- Auto lineage + dependency graphing
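To make "schema evolution policies" concrete, here is a minimal sketch in Python. It is an illustration only: the policy names (`allow_add`, `strict`) and the `evolve` helper are hypothetical and are not Starlake configuration keys or APIs. The idea is that drift between the current and incoming schema is classified first, and a declared policy then decides whether to accept or reject it.

```python
from dataclasses import dataclass

# Hypothetical policy names, chosen for illustration only.
ALLOW_ADD = "allow_add"  # accept new columns; reject drops and type changes
STRICT = "strict"        # reject any difference from the current schema


@dataclass(frozen=True)
class SchemaDiff:
    added: dict    # column -> type, present only in the incoming schema
    removed: list  # columns missing from the incoming schema
    retyped: dict  # column -> (current type, incoming type)


def diff_schemas(current: dict, incoming: dict) -> SchemaDiff:
    """Classify the drift between two {column: type} mappings."""
    added = {c: t for c, t in incoming.items() if c not in current}
    removed = [c for c in current if c not in incoming]
    retyped = {
        c: (current[c], incoming[c])
        for c in current
        if c in incoming and current[c] != incoming[c]
    }
    return SchemaDiff(added, removed, retyped)


def evolve(current: dict, incoming: dict, policy: str) -> dict:
    """Return the schema to use next, or raise if the policy forbids the drift."""
    diff = diff_schemas(current, incoming)
    if policy == STRICT:
        if diff.added or diff.removed or diff.retyped:
            raise ValueError(f"schema drift rejected: {diff}")
        return dict(current)
    if policy == ALLOW_ADD:
        if diff.removed or diff.retyped:
            raise ValueError(f"breaking schema change rejected: {diff}")
        return {**current, **diff.added}
    raise ValueError(f"unknown policy: {policy}")
```

Under these assumptions, a new `email` column is accepted with `allow_add` and rejected with `strict`, which is the kind of decision a declarative policy moves out of hand-written scripts.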
Cloud-Native Lakehouse
Performance and elasticity at petabyte scale
Design cloud-native lakehouse topologies that separate compute and storage, solve small-files and upsert bottlenecks, and optimize IO for object stores.
- Compute/storage separation with Kubernetes
- Hybrid partitioning to prevent skew
- Merge-on-read upserts for near real time
- Metadata scaling strategies
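The merge-on-read pattern named above can be sketched in a few lines of Python. This is a toy in-memory model, not a real table format: the `MergeOnReadTable` class and its method names are assumptions made for illustration. Writes append to a small delta log instead of rewriting base data, readers merge base and deltas on the fly, and a periodic compaction folds the log back into the base.

```python
from dataclasses import dataclass, field


@dataclass
class MergeOnReadTable:
    # Compacted rows, keyed by primary key.
    base: dict = field(default_factory=dict)
    # Append-only log of (key, row) pairs; a row of None marks a delete.
    deltas: list = field(default_factory=list)

    def upsert(self, key, row: dict) -> None:
        """Cheap write: append to the log, leave base data untouched."""
        self.deltas.append((key, row))

    def delete(self, key) -> None:
        self.deltas.append((key, None))

    def read(self) -> dict:
        """Merge base and deltas at read time; later deltas win."""
        merged = dict(self.base)
        for key, row in self.deltas:
            if row is None:
                merged.pop(key, None)
            else:
                merged[key] = row
        return merged

    def compact(self) -> None:
        """Fold the delta log into the base so reads stay fast."""
        self.base = self.read()
        self.deltas = []
```

The trade-off this illustrates: writes stay cheap and low-latency, reads pay a merge cost that grows with the delta log, and compaction is what keeps that cost bounded.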
AI & ML Infrastructure
Close the loop between data and inference
Unify feature engineering across training and serving to eliminate skew, support large sparse feature spaces, and enable real-time inference paths.
- Unified feature definitions via SQL
- Real-time inference integration paths
- Support for sparse / large feature spaces
- Online-learning-ready architectures
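One way to picture "unified feature definitions via SQL" is a single set of SQL expressions evaluated both over historical data (training) and over a single request (serving). The sketch below uses Python's built-in `sqlite3` purely as a stand-in engine; the table names, column names, and feature names are hypothetical, and a real deployment would target the warehouse and serving store of choice.

```python
import sqlite3

# Hypothetical feature definitions: one SQL expression per feature.
FEATURES = {
    "order_value_per_item": "total_amount * 1.0 / NULLIF(item_count, 0)",
    "is_large_order": "CASE WHEN total_amount >= 100 THEN 1 ELSE 0 END",
}


def feature_select(source: str) -> str:
    """Build the same SELECT for any source table with the expected columns."""
    cols = ", ".join(f"{expr} AS {name}" for name, expr in FEATURES.items())
    return f"SELECT order_id, {cols} FROM {source} ORDER BY order_id"


def training_features(conn: sqlite3.Connection) -> list:
    """Batch path: compute features over the full historical table."""
    return conn.execute(feature_select("orders")).fetchall()


def serving_features(row: dict) -> tuple:
    """Online path: compute the same features for one incoming request."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE request "
            "(order_id INTEGER, total_amount REAL, item_count INTEGER)"
        )
        conn.execute(
            "INSERT INTO request VALUES (:order_id, :total_amount, :item_count)",
            row,
        )
        return conn.execute(feature_select("request")).fetchone()
    finally:
        conn.close()
```

Because both paths render their query from the same `FEATURES` mapping, a change to a definition reaches training and serving together, which is the property that removes training/serving skew.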
FinOps & Governance
Cost, security, and compliance by design
Ship cloud cost controls, lineage, and privacy controls as code so production platforms remain auditable, secure, and economically sustainable.
- Policy-as-code for privacy and controls
- Column-level encryption and masking
- Auditability and automated lineage
- Spot-ready + serverless cost strategies
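As a rough illustration of policy-as-code for column masking, the sketch below keeps the policy as plain data and applies it uniformly to every row. The policy format, strategy names, and `mask_row` helper are hypothetical. The sketch covers masking only, not encryption, and an unsalted hash is used just to keep the example short.

```python
import hashlib

# Hypothetical policy: column name -> masking strategy.
# Columns not listed pass through unchanged.
POLICY = {
    "email": "hash",
    "ssn": "redact",
}


def mask_value(value, strategy: str):
    """Apply one masking strategy to a single value."""
    if value is None:
        return None
    if strategy == "hash":
        # Unsalted SHA-256 for brevity; a real control would add a secret salt or key.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if strategy == "redact":
        return "***"
    raise ValueError(f"unknown masking strategy: {strategy}")


def mask_row(row: dict, policy: dict) -> dict:
    """Return a copy of the row with the policy applied column by column."""
    return {
        col: mask_value(val, policy[col]) if col in policy else val
        for col, val in row.items()
    }
```

Keeping the policy in a reviewable, versioned structure means a privacy change is a diff in source control and not an edit buried in pipeline code.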
Services
A complete catalog of capabilities from pipelines to AI to governance
Choose a targeted engagement or combine modules into an end-to-end modernization program.
What you get
Production-grade architecture, implementation, and enablement, delivered with the same declarative principles we advocate.
- Architecture blueprints and reference implementations
- Production deployment and hardening
- Team enablement (YAML + SQL + CI workflows)
- Performance and cost optimization playbooks
Engagement Model
A predictable path from assessment to production and continuous optimization
We structure work in phases so you can deliver value early, reduce risk, and scale confidently.
Assess & Align
Understand the current platform, constraints, and target outcomes. Define success metrics and a migration plan.
Design
Create reference architectures for ingestion, lakehouse layout, AI integration, and governance/FinOps controls.
Implement
Deliver production-grade pipelines and platform components with security, testing, and operational runbooks.
Enable
Train teams on declarative engineering (YAML + SQL), local dev with DuckDB, and CI/CD workflows.
Optimize
Tune performance and cost continuously: lakehouse compaction/partitioning, spot/serverless strategy, and governance automation.
Technology Coverage
Broad interoperability, delivered with a cloud-agnostic approach
We design for portability: logic lives in YAML + SQL, and deployments adapt to your cloud and runtime choices.
Cloud Platforms
- AWS
- Google Cloud Platform (GCP)
- Microsoft Azure
Warehouses & Lakehouses
- BigQuery
- Snowflake
- Redshift
- Azure Synapse
- Databricks
- DuckDB / DuckLake
Orchestration & CI/CD
- Airflow
- Dagster
- Prefect
- GitHub Actions
- GitLab CI
Local Dev & Testing
- DuckDB
- Docker
- CI runners with ephemeral test datasets
Ready to modernize?
Let’s map your path to an AI Ready Lakehouse.
Share your current stack and constraints, and our team will propose an actionable blueprint and a phased delivery plan.
Typical outcomes
Concrete improvements you can operationalize within weeks, not quarters.
- Standardized pipelines that are self-documenting and audit-ready
- Faster delivery via declarative configuration changes
- Lower cloud spend with spot/serverless-ready architecture patterns
- Governance and privacy controls embedded in the data lifecycle
Start with a strategy call
We’ll review your pipelines, lakehouse layout, and AI/governance requirements and identify the fastest path to production value.
Prefer async? Email us at [email protected].