Test Transform Tasks
Test your transform locally before deploying to production.
Transform tasks are the second step in the data pipeline. They are responsible for transforming the data in the database. In this tutorial, we will test the transform tasks.
Transform tests are located in the metadata/tests/transform directory.
Each test is a directory located in the domain/table subdirectory and contains the following files:
- One or more CSV or JSONL files containing the initial data that is loaded into the tables before the unit test is run. These files should be named after the domain and table names. For example, the file for the starbake.product table should be named starbake.product.json or starbake.product.csv.
- An _expected.csv or _expected.sql file that contains the expected data in the table after the transform task is run.
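The layout described above can be scaffolded with a few lines of Python. This is only a sketch under stated assumptions: the helper scaffold_transform_test, the target table name product_summary, and the column names and rows are hypothetical and chosen for illustration. Only the directory convention and the file names come from the description above.

```python
import csv
import tempfile
from pathlib import Path


def scaffold_transform_test(root: Path, domain: str, table: str, test_name: str) -> Path:
    """Create a skeleton for one transform test and return its directory."""
    test_dir = root / "metadata" / "tests" / "transform" / domain / table / test_name
    test_dir.mkdir(parents=True, exist_ok=True)

    # Initial data for a table the transform reads from, named <domain>.<table>.csv.
    # The columns and rows below are made up for illustration.
    with open(test_dir / "starbake.product.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["product_id", "name", "price"])
        writer.writerow([1, "croissant", "1.50"])

    # Expected content of the target table after the transform task has run.
    with open(test_dir / "_expected.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["product_id", "name", "price"])
        writer.writerow([1, "croissant", "1.50"])

    return test_dir


if __name__ == "__main__":
    # Work in a temporary directory so the sketch leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_transform_test(
            Path(tmp), "starbake", "product_summary", "test-name"
        )
        print(sorted(p.name for p in created.iterdir()))
```

In a real project, the root would be the project directory and the files would hold data that exercises the transform's SQL.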
Before running the transform, Starlake transpiles your SQL statements to the DuckDB dialect.
Starlake then runs the transpiled SQL statement with the starlake transform task against a local DuckDB database that it populates from the data files present in the test directory.
Finally, Starlake compares the schema of the table with the schema of the expected data file and raises an error if they do not match.
The test will pass if the transform task succeeds and the data in the table matches the data in the _expected.csv file.
The reports of the test test-name are stored in the test-reports/transform/test-name directory and contain the following files:
- test-reports/transform/test-name/testname.db: the actual database after the transform task is run. This database contains the following tables:
  - sl_expected: the expected data.
  - starbake.product: the actual data.
  - sl_expectations: the results of any expectations related to this table.
  - audit.audit: the audit log of the transform task.
- test-reports/transform/test-name/not_expected.csv: the unexpected data, that is, rows found in the actual table after the transform task is run that are not in the expected data.
- test-reports/transform/test-name/missing.csv: the missing data, that is, expected rows that are absent from the actual table after the transform task is run.
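The relationship between the expected data, the actual data, and the two CSV reports can be pictured as a pair of row differences. The sketch below is not Starlake's implementation. It is a minimal illustration that assumes rows are compared as whole tuples, and the helpers read_rows and diff_rows as well as the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text: str) -> Counter:
    """Parse CSV text (with a header line) into a multiset of row tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    return Counter(tuple(row) for row in reader)


def diff_rows(expected: Counter, actual: Counter):
    """Return (missing, not_expected) as sorted lists of row tuples."""
    # Rows that were expected but are absent from the actual table.
    missing = sorted((expected - actual).elements())
    # Rows that are in the actual table but were not expected.
    not_expected = sorted((actual - expected).elements())
    return missing, not_expected


# Made-up data for illustration.
expected = read_rows("product_id,name\n1,croissant\n2,baguette\n")
actual = read_rows("product_id,name\n1,croissant\n3,muffin\n")

missing, not_expected = diff_rows(expected, actual)
print(missing)       # [('2', 'baguette')]
print(not_expected)  # [('3', 'muffin')]
```

Under this reading, a passing test corresponds to both lists being empty. A multiset (Counter) is used rather than a plain set so that duplicated rows are also accounted for.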
The test reports are generated using the starlake test command.
To run the transform tests without running load tests, use the following command:
starlake test --transform
To run a specific test, use the --name flag:
starlake test --name starbake.product.test-name
Running tests will also generate a complete website report in the test-reports directory.