📄️ Tutorial
Load and validate, in one shot or incrementally, JSON, XML and CSV files into your datawarehouse using different write strategies.
📄️ Autoload
Datawarehouses are organized around schemas where tables are grouped.
📄️ Load
You'll use Load instead of Autoload when:
📄️ Load DSV files
File load configuration
📄️ Load JSON files
Loading a JSON file is similar to loading a CSV file except that JSON attributes may have nested attributes.
📄️ Load XML files
Loading XML files is similar to loading JSON files: in both, attributes may contain nested or repeated attributes.
📄️ Load fixed width files
To load fixed width files, you need to know the width of each column.
📄️ Load strategies
Standard strategies
📄️ Write strategies
When loading a file into a database table, you can specify how the data is written to the table.
📄️ Clustering and Partitioning
Some datawarehouses are designed to store data in a way that makes it easy to perform clustering and partitioning.
📄️ Native load
When loading files into your datawarehouse, Starlake uses Spark for advanced data validation and
📄️ Type validation
Starlake allows you to validate the types of the data you are loading.
📄️ Transform on load
Add new attributes
📄️ Access control
Table Level Security
📄️ Expectations
Expectations allow you to test whether the resulting table contains the expected data.
📄️ Orchestration
📄️ About Metrics
During ingestion, Starlake may produce metrics for any attribute in the dataset. Currently, only top-level attributes are supported.