Skip to main content

Test Load Tasks

Test your load task locally before deploying to production.

Load tasks are the first step in the data pipeline. They are responsible for loading data from the source to the destination. In this tutorial, we will test the load tasks.

Load tests are located in the metadata/tests/load directory.

Each test is a directory located in the domain/table subdirectory and contains the following files:

  • a CSV or JSONL file that will contain the initial data that will be loaded into the table before the unit test is run. This file should be named after the domain and table names. For example, the file for the starbake.product table should be named starbake.product.json or starbake.product.csv.
  • one or multiple data files whose names match the file pattern expected by the loader for this table. This is the data that will be loaded into the table using the starlake load task.
  • a _expected.csv file that contains the expected data in the table after the load task is run.

After loading using the starlake load task against the schema defined in the metadata/load directory, Starlake will compare the schema of the table with the schema of the expected data file and will raise an error if they do not match.

The test will pass if the load task succeeds and the data in the table matches the data in the _expected.csv file.

The reports of the test test-name are stored in the test-reports/load/test-name directory and contain the following files:

  • test-reports/load/test-name/testname.db: the actual database after the load task is run. This database contains the following tables:
    • sl_expected: The expected data.
    • starbake.product: The actual data.
    • sl_expectations: The results of the execution of any the expectation related to this table.
    • audit.audit: The audit log of the load task.
    • audit.rejected: The rejected data.
  • test-reports/load/test-name/not_expected.csv: the unexpected data in the actual table after the load task is run.
  • test-reports/load/test-name/missing.csv: the missing data in the actual table after the load task is run.

The test reports are generated using the starlake test task.

To run the load tests without running transform tests, use the following command:

starlake test --load

To run a specific test use the --name flag:

starlake test --name starbake.product.test-name

Running tests will also generate a complete website report in the test-reports directory.

Example: Load summary report

Example: Transform Test report

Example: Load & Transform report