Load | Declarative Data Pipelines

Tutorial

Load and validate JSON, XML and CSV files into your data warehouse, in one shot or incrementally, using different write strategies.

Data warehouses are organized around schemas, which group tables.

You'll use Load instead of Autoload when:

File load configuration

Loading a JSON file is similar to loading a CSV file except that JSON attributes may have nested attributes.

Loading XML files is similar to loading JSON files: attributes may contain nested or repeated attributes.

To load fixed width files, you need to know the width of each column.
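To make the role of column widths concrete, here is a minimal Python sketch of slicing a fixed-width record into named fields. It is a generic illustration and not Starlake's configuration syntax or implementation; the `LAYOUT` list, the column names and the `parse_fixed_width` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, width in characters), in file order.
LAYOUT: List[Tuple[str, int]] = [("id", 5), ("name", 10), ("amount", 8)]


def parse_fixed_width(line: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Slice one record into named fields using the declared column widths."""
    record: Dict[str, str] = {}
    offset = 0
    for name, width in layout:
        # Each field occupies exactly `width` characters; padding is trimmed.
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


# Build a sample record that matches the layout (5 + 10 + 8 characters).
sample_line = "00042" + "Alice".ljust(10) + "12.50".rjust(8)
row = parse_fixed_width(sample_line, LAYOUT)
```

Without the widths there is no delimiter to fall back on, which is why the layout has to be declared up front.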

Standard strategies

When loading a file into a database table, you can specify how the data is written to the table.
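As a rough illustration of what a write strategy decides, the sketch below applies three common behaviors to an in-memory table. The strategy names (`append`, `overwrite`, `upsert`), the `write` function and the `id` key are assumptions chosen for this example; they are not taken from this page and may not match the names Starlake uses.

```python
from typing import Dict, List

Row = Dict[str, object]


def write(table: List[Row], incoming: List[Row], strategy: str, key: str = "id") -> List[Row]:
    """Return the new table contents after applying the chosen strategy."""
    if strategy == "append":
        # Keep existing rows and add the incoming ones after them.
        return table + incoming
    if strategy == "overwrite":
        # Discard existing rows entirely.
        return list(incoming)
    if strategy == "upsert":
        # Replace rows that share a key, insert the rest.
        merged = {row[key]: row for row in table}
        merged.update({row[key]: row for row in incoming})
        return list(merged.values())
    raise ValueError(f"unknown strategy: {strategy}")


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
upserted = write(existing, incoming, "upsert")
```

The same incoming file therefore yields four, two or three rows depending on the strategy, which is why the choice belongs in the load configuration and not in the data.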

Some data warehouses are designed to store data in a way that makes clustering and partitioning easy.

When loading files into your data warehouse, Starlake uses Spark for advanced data validation and

Starlake allows you to validate the types of the data you are loading.
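The idea of type validation at load time can be sketched as follows: each attribute is parsed with its declared type, and rows that fail are set aside instead of being loaded. This is a plain Python approximation, not how Starlake performs validation; the `SCHEMA` mapping, the attribute names and the `validate` helper are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: attribute name -> parser that raises ValueError on bad input.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "id": int,
    "amount": float,
    "signup": date.fromisoformat,
}


def validate(
    rows: List[Dict[str, str]],
    schema: Dict[str, Callable[[str], object]],
) -> Tuple[List[Dict[str, object]], List[Dict[str, str]]]:
    """Split raw rows into typed accepted rows and untouched rejected rows."""
    accepted: List[Dict[str, object]] = []
    rejected: List[Dict[str, str]] = []
    for row in rows:
        try:
            accepted.append({name: parse(row[name]) for name, parse in schema.items()})
        except (KeyError, ValueError):
            # A missing attribute or an unparsable value rejects the whole row.
            rejected.append(row)
    return accepted, rejected


raw = [
    {"id": "1", "amount": "9.99", "signup": "2024-01-15"},
    {"id": "x", "amount": "9.99", "signup": "2024-01-15"},
    {"id": "3", "amount": "1.00", "signup": "2024-13-01"},
]
accepted, rejected = validate(raw, SCHEMA)
```

Here only the first row is accepted; the second has a non-integer `id` and the third an invalid month.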

Add new attributes

Table Level Security

Expectations let you test whether the resulting table contains the expected data.
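Conceptually, an expectation is a named check evaluated against the loaded table. The sketch below shows that idea with plain Python predicates; it does not reflect Starlake's expectation syntax, and `check_expectations` and the check names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], bool]


def check_expectations(table: List[Row], expectations: Dict[str, Check]) -> Dict[str, bool]:
    """Evaluate every named check and report pass or fail for each."""
    return {name: bool(check(table)) for name, check in expectations.items()}


table = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]

expectations: Dict[str, Check] = {
    # The table must not be empty.
    "not_empty": lambda t: len(t) > 0,
    # The id attribute must be unique across rows.
    "unique_id": lambda t: len({r["id"] for r in t}) == len(t),
    # No negative amounts are allowed.
    "amount_non_negative": lambda t: all(r["amount"] >= 0 for r in t),
}

results = check_expectations(table, expectations)
```

A failing entry in `results` signals that the load produced data that does not match what was expected, even though every row may have passed type validation.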

During ingestion, Starlake may produce metrics for any attribute in the dataset. Currently, only top-level attributes are supported.
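To give a sense of what per-attribute metrics can look like, the following sketch computes a few simple statistics for each top-level attribute and skips nested ones. The choice of metrics (count, nulls, distinct, min, max, mean) and the `attribute_metrics` helper are assumptions made for this example, not the set Starlake actually produces.

```python
from statistics import mean
from typing import Dict, List

Row = Dict[str, object]


def attribute_metrics(rows: List[Row]) -> Dict[str, Dict[str, object]]:
    """Compute simple metrics for each top-level, non-nested attribute."""
    metrics: Dict[str, Dict[str, object]] = {}
    names = list(rows[0].keys()) if rows else []
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        if any(isinstance(v, (dict, list)) for v in present):
            # Nested or repeated attributes are skipped in this sketch.
            continue
        stats: Dict[str, object] = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
        numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if numeric:
            stats.update(min=min(present), max=max(present), mean=mean(present))
        metrics[name] = stats
    return metrics


rows = [
    {"id": 1, "city": "Paris", "address": {"zip": "75001"}},
    {"id": 3, "city": None, "address": {"zip": "69001"}},
]
metrics = attribute_metrics(rows)
```

In this example `id` gets numeric statistics, `city` gets counts only, and the nested `address` attribute is left out.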