Parallel extraction

info

Parallel extraction is available only on databases that support concurrent read-write connections.

To make extraction faster, we can parallelize the extraction process. This is done by setting the numPartitions attribute.

metadata/extract/my_extract_config.sl.yml
version: 1
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order"               # table names or  "*" to extract all tables
          fullExport: true            # (optional) set to false to use incremental extraction
          partitionColumn: "order_id" # (optional) column to use for partitioning
          numPartitions: 4            # Level of parallelism (defaults to 1 aka no parallelism)
  ...

Note that when using parallel extraction,

the partitionColumn attribute must be set to the column to use for partitioning and
the numPartitions attribute must be set to the number of partitions to use.

If incremental extraction is used (fullExport set to false), the partitionColumn used for parallel extraction is also the column used for incremental extraction:

metadata/extract/my_extract_config.sl.yml
version: 1
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order"               # table names or  "*" to extract all tables
          fullExport: false            # (optional) set to false to use incremental extraction
          partitionColumn: "order_id" # (optional) column to use for partitioning
          numPartitions: 4            # Level of parallelism (defaults to 1 aka no parallelism)
  ...