
Parallel extraction

To make extraction faster, we can parallelize it by setting the numPartitions attribute on a table, which sets the level of parallelism: the table is read as several partitions that are extracted concurrently.

metadata/extract/my_extract_config.sl.yml
```yaml
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order" # Table name, or "*" to extract all tables
          fullExport: true # (optional) set to false to use incremental extraction
          partitionColumn: "order_id" # (optional) column to use for partitioning
          numPartitions: 4 # (optional) level of parallelism; defaults to 1 (no parallelism)
...
```

Note that parallel extraction requires both of the following attributes:

  • partitionColumn, set to the column used to split the table into partitions, and
  • numPartitions, set to the number of partitions to extract in parallel.
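This page does not spell out exactly how Starlake splits a table. As a rough mental model only, the sketch below assumes an integer partitionColumn whose MIN..MAX range is divided into numPartitions contiguous, equal-width ranges, each read by its own worker, similar to how Spark's JDBC source partitions a read. The helper names (partition_bounds, partition_queries, extract_in_parallel) and the in-memory list standing in for the database are hypothetical, not Starlake's API.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_bounds(min_value: int, max_value: int, num_partitions: int) -> list[tuple[int, int]]:
    """Split min_value..max_value (inclusive) into at most num_partitions
    contiguous half-open ranges [lower, upper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    span = max_value - min_value + 1
    if span <= 0:
        return []  # empty table: nothing to split
    n = min(num_partitions, span)  # avoid empty ranges on tiny tables
    return [
        (min_value + span * i // n, min_value + span * (i + 1) // n)
        for i in range(n)
    ]


def partition_queries(table: str, column: str, min_value: int, max_value: int,
                      num_partitions: int) -> list[str]:
    """Build one SELECT per partition; each could run on its own connection."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lower} AND {column} < {upper}"
        for lower, upper in partition_bounds(min_value, max_value, num_partitions)
    ]


def extract_in_parallel(order_ids: list[int], num_partitions: int) -> list[list[int]]:
    """Simulate parallel extraction over an in-memory list of partition-column values."""
    if not order_ids:
        return []
    # A real extractor would obtain these bounds with SELECT MIN(col), MAX(col).
    ranges = partition_bounds(min(order_ids), max(order_ids), num_partitions)

    def fetch(bounds: tuple[int, int]) -> list[int]:
        lower, upper = bounds
        return [v for v in order_ids if lower <= v < upper]

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        return list(pool.map(fetch, ranges))  # one chunk per partition, in range order


chunks = extract_in_parallel(list(range(1, 101)), num_partitions=4)
print([len(c) for c in chunks])  # [25, 25, 25, 25]
```

Because the ranges in this model are equal in width rather than in row count, a column with large gaps or skewed values would produce unbalanced partitions, so a dense, evenly distributed key such as an auto-increment ID is usually a better fit for partitionColumn.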

If incremental extraction is used (fullExport set to false), the partitionColumn plays two roles: it splits the table for parallel extraction, and it is also the column used for incremental extraction:

metadata/extract/my_extract_config.sl.yml
```yaml
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order" # Table name, or "*" to extract all tables
          fullExport: false # false enables incremental extraction
          partitionColumn: "order_id" # column used for partitioning and for incremental extraction
          numPartitions: 4 # (optional) level of parallelism; defaults to 1 (no parallelism)
...
```
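The following is a minimal sketch of how the two roles could combine. It assumes, hypothetically, that incremental extraction remembers the highest partitionColumn value reached by the previous run (a watermark) and splits only the newer range across numPartitions workers. The watermark handling and every function name here (split, incremental_partitions, run_incremental) are illustrative assumptions, not Starlake's documented behavior or API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def split(lower: int, upper: int, n: int) -> list[tuple[int, int]]:
    """Split lower..upper (inclusive) into at most n half-open ranges [lo, hi)."""
    span = upper - lower + 1
    if span <= 0:
        return []  # no rows beyond the watermark
    n = min(n, span)
    return [(lower + span * i // n, lower + span * (i + 1) // n) for i in range(n)]


def incremental_partitions(table_min: int, table_max: int,
                           last_extracted: Optional[int],
                           num_partitions: int) -> list[tuple[int, int]]:
    # Hypothetical watermark: the largest partitionColumn value saved by the previous run.
    # On the first run (no watermark) the whole range is extracted.
    start = table_min if last_extracted is None else last_extracted + 1
    return split(start, table_max, num_partitions)


def run_incremental(order_ids: list[int], last_extracted: Optional[int],
                    num_partitions: int) -> tuple[list[int], Optional[int]]:
    """Extract only values above the watermark, in parallel; return them and the new watermark."""
    if not order_ids:
        return [], last_extracted
    ranges = incremental_partitions(min(order_ids), max(order_ids),
                                    last_extracted, num_partitions)

    def fetch(bounds: tuple[int, int]) -> list[int]:
        lo, hi = bounds
        return [v for v in order_ids if lo <= v < hi]

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        rows = [v for chunk in pool.map(fetch, ranges) for v in chunk]
    return rows, (max(rows) if rows else last_extracted)


rows, watermark = run_incremental(list(range(1, 101)), last_extracted=60, num_partitions=4)
print(len(rows), watermark)  # 40 100
```

In this model only rows whose partitionColumn value exceeds the watermark are fetched, which suits monotonically increasing keys such as order_id but would not pick up rows updated in place.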