Parallel extraction
To speed up extraction, we can run it in parallel.
This is done by setting the numPartitions attribute.
metadata/extract/my_extract_config.sl.yml
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order" # table names or "*" to extract all tables
          fullExport: true # (optional) set to false to use incremental extraction
          partitionColumn: "order_id" # (optional) column to use for partitioning
          numPartitions: 4 # Level of parallelism (defaults to 1 aka no parallelism)
      ...
Note that when using parallel extraction:
- the partitionColumn attribute must be set to the column to use for partitioning, and
- the numPartitions attribute must be set to the number of partitions to use.
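To give an intuition for how these two attributes work together, here is a minimal Python sketch of range partitioning. It is not Starlake's implementation: it assumes the partition column is an integer key whose minimum and maximum are known, and the helper names split_range and partition_queries are hypothetical. Each resulting query could then be run by a separate worker.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous inclusive sub-ranges.

    Hypothetical helper: returns at most num_partitions ranges of near-equal size.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if upper < lower:
        return []  # empty range, nothing to extract
    total = upper - lower + 1
    parts = min(num_partitions, total)  # never create empty partitions
    base, extra = divmod(total, parts)
    bounds = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        end = start + size - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


def partition_queries(table: str, column: str, lower: int, upper: int,
                      num_partitions: int) -> List[str]:
    """Build one SELECT per sub-range (illustrative SQL, assumed integer column)."""
    return [
        f'SELECT * FROM "{table}" WHERE "{column}" BETWEEN {lo} AND {hi}'
        for lo, hi in split_range(lower, upper, num_partitions)
    ]


if __name__ == "__main__":
    # Assumed example values: order_id ranges from 1 to 100, numPartitions is 4.
    for query in partition_queries("order", "order_id", 1, 100, 4):
        print(query)
```

With numPartitions left at 1, this sketch yields a single query covering the whole range, which matches the "no parallelism" default described above.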
If incremental extraction is used (fullExport set to false), the partitionColumn used for parallel extraction is also the column used for incremental extraction:
metadata/extract/my_extract_config.sl.yml
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order" # table names or "*" to extract all tables
          fullExport: false # (optional) set to false to use incremental extraction
          partitionColumn: "order_id" # (optional) column to use for partitioning
          numPartitions: 4 # Level of parallelism (defaults to 1 aka no parallelism)
      ...
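The dual role of the partition column in incremental mode can be illustrated with a short Python sketch. It rests on assumptions that are not stated above: that the column is an increasing integer key, and that the highest value already exported is remembered between runs. The function name incremental_ranges and the sample values are hypothetical, and this is not Starlake's actual code.

```python
from typing import List, Tuple


def incremental_ranges(last_exported: int, current_max: int,
                       num_partitions: int) -> List[Tuple[int, int]]:
    """Return inclusive key ranges covering only rows newer than last_exported.

    Hypothetical helper: the same column provides both the incremental lower
    bound (last_exported + 1) and the key used to split the work.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    lower = last_exported + 1  # skip everything already extracted
    if current_max < lower:
        return []  # no new rows since the previous run
    total = current_max - lower + 1
    parts = min(num_partitions, total)  # avoid empty partitions
    base, extra = divmod(total, parts)
    ranges = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Assumed values: the previous run stopped at order_id 100, the table now reaches 180.
    print(incremental_ranges(100, 180, 4))
```

Under these assumptions, only the new keys are split across the four partitions, so a run with no new rows produces no work at all.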