Test Load Tasks
Test your load task locally before deploying to production.
Load tasks are the first step in the data pipeline. They are responsible for loading data from the source to the destination. In this tutorial, we will test the load tasks.
Load tests are located in the metadata/tests/load
directory.
Each test is a directory located in the domain/table
subdirectory and contains the following files:
- a CSV or JSONL file that will contain the initial data that will be loaded into the table before the unit test is run.
This file should be named after the domain and table names. For example, the file for the
starbake.product
table should be namedstarbake.product.json
orstarbake.product.csv
. - one or multiple data files whose names match the file pattern expected by the loader for this table. This is the data that will be loaded into the table using the
starlake load
task. - a
_expected.csv
file that contains the expected data in the table after the load task is run.
After loading using the starlake load
task against the schema defined in the metadata/load
directory,
Starlake will compare the schema of the table with the schema of the expected data file and will raise an error if they do not match.
The test will pass if the load task succeeds and the data in the table matches the data in the _expected.csv
file.
The reports of the test test-name
are stored in the test-reports/load/test-name
directory and contain the following files:
test-reports/load/test-name/testname.db
: the actual database after the load task is run. This database contains the following tables:- sl_expected: The expected data.
starbake.product
: The actual data.- sl_expectations: The results of the execution of any the expectation related to this table.
- audit.audit: The audit log of the load task.
- audit.rejected: The rejected data.
test-reports/load/test-name/not_expected.csv
: the unexpected data in the actual table after the load task is run.test-reports/load/test-name/missing.csv
: the missing data in the actual table after the load task is run.
The test reports are generated using the starlake test
task.
To run the load tests without running transform tests, use the following command:
starlake test --load
To run a specific test use the --name
flag:
starlake test --name starbake.product.test-name
Running tests will also generate a complete website report in the test-reports
directory.