
Load strategies

Standard strategies

For each domain and table, the starlake load command looks for the files that match the pattern given in the table.pattern attribute of the metadata/load/<domain>/<table>.sl.yml file. It searches the directory given in the load.metadata.directory attribute of that same file or, if the attribute is not set there, the one given in the <domain>/_config.sl.yml file.
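
The matching step can be pictured with a minimal sketch. It assumes the pattern is treated as a Java regular expression that must match the whole file name; the pattern value, the file names and the `PatternMatchSketch` object are hypothetical and not part of starlake.

```scala
import java.util.regex.Pattern

object PatternMatchSketch {
  // Hypothetical pattern, as it might appear in a table.pattern attribute.
  val tablePattern: Pattern = Pattern.compile("orders-.*\\.csv")

  // Keep only the file names that match the pattern in full.
  def matching(fileNames: List[String]): List[String] =
    fileNames.filter(name => tablePattern.matcher(name).matches())

  def main(args: Array[String]): Unit = {
    val incoming = List(
      "orders-2024-01-01.csv",
      "customers-2024-01-01.csv",
      "orders-2024-01-02.json"
    )
    println(matching(incoming)) // List(orders-2024-01-01.csv)
  }
}
```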

starlake comes with two load strategies:

| Load Strategy | Description |
| --- | --- |
| ai.starlake.job.load.IngestionTimeStrategy | Load the files in chronological order, based on each file's last modification time. This is the default. |
| ai.starlake.job.load.IngestionNameStrategy | Load the files in lexicographical order, based on the file name. |
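
The difference between the two orderings shows up when files arrive out of order. The sketch below is only an illustration: `FileEntry`, `byTime` and `byName` are hypothetical stand-ins, not starlake classes, and the sample data is made up.

```scala
import java.time.Instant

object OrderingSketch {
  // Hypothetical stand-in for a listed file: a name and a last modification time.
  final case class FileEntry(name: String, lastModified: Instant)

  // Time-based ordering: oldest modification time first.
  def byTime(files: List[FileEntry]): List[FileEntry] =
    files.sortBy(_.lastModified.toEpochMilli)

  // Name-based ordering: lexicographical order of the file names.
  def byName(files: List[FileEntry]): List[FileEntry] =
    files.sortBy(_.name)

  // The file for January 1st was modified last, so the two orderings disagree.
  val sample: List[FileEntry] = List(
    FileEntry("orders-2024-01-02.csv", Instant.parse("2024-01-03T08:00:00Z")),
    FileEntry("orders-2024-01-01.csv", Instant.parse("2024-01-03T09:00:00Z")),
    FileEntry("orders-2024-01-03.csv", Instant.parse("2024-01-03T07:00:00Z"))
  )

  def main(args: Array[String]): Unit = {
    println(byTime(sample).map(_.name)) // 01-03, then 01-02, then 01-01
    println(byName(sample).map(_.name)) // 01-01, then 01-02, then 01-03
  }
}
```

A name-based ordering is the safer choice when file names carry the business date and files may be re-delivered or touched after the fact.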

To use a load strategy, you need to specify the loadStrategyClass attribute in the metadata/application.sl.yml file.


metadata/application.sl.yml: to switch from a time-based load to a name-based load
application:
  ...
  loadStrategyClass: ai.starlake.job.load.IngestionNameStrategy
  ...

Custom strategies

You can define your own load strategy by implementing the ai.starlake.job.load.LoadStrategy interface.


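src/main/scala/my/own/CustomLoadStrategy.scala
package my.own

import ai.starlake.job.load.LoadStrategy
// The package of StorageHandler and FileInfo is assumed here; check it against your starlake version.
import ai.starlake.schema.handlers.{FileInfo, StorageHandler}
import com.typesafe.scalalogging.StrictLogging
import org.apache.hadoop.fs.Path
import java.time.LocalDateTime

object CustomLoadStrategy extends LoadStrategy with StrictLogging {

  // Return the files to load, in the order in which they should be loaded.
  def list(
    storageHandler: StorageHandler,
    path: Path,
    extension: String = "",
    since: LocalDateTime = LocalDateTime.MIN,
    recursive: Boolean
  ): List[FileInfo] = ???
}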
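
As an illustration of the kind of ordering a custom strategy might apply, the sketch below sorts files by a date embedded in their names. It is an assumption-laden example: it works on plain file names rather than on starlake's FileInfo values, and `CustomOrderingSketch`, `embeddedDate` and `ordered` are hypothetical names introduced here.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object CustomOrderingSketch {
  private val eightDigits = """\d{8}""".r
  private val format = DateTimeFormatter.BASIC_ISO_DATE // yyyyMMdd

  // First 8-digit run in the name that parses as a calendar date, if any.
  def embeddedDate(fileName: String): Option[LocalDate] =
    eightDigits
      .findAllIn(fileName)
      .toList
      .flatMap(digits => Try(LocalDate.parse(digits, format)).toOption)
      .headOption

  // Oldest embedded date first; names without a date go last; ties broken by name.
  def ordered(fileNames: List[String]): List[String] =
    fileNames.sortBy { name =>
      (embeddedDate(name).map(_.toEpochDay).getOrElse(Long.MaxValue), name)
    }

  def main(args: Array[String]): Unit = {
    val incoming = List(
      "20240103_orders.csv",
      "orders_20240101.csv",
      "readme.txt",
      "b_20240102.csv"
    )
    ordered(incoming).foreach(println)
  }
}
```

Inside a real strategy, the same sort key would be applied to the list of files returned by the storage handler before the list is returned.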

metadata/application.sl.yml: to use a custom load strategy

application:
  ...
  loadStrategyClass: my.own.CustomLoadStrategy
  ...