Skip to main content

Expectations

Expectations allow to test if the resulting table contains the expected data.

They are defined using the expectations attribute as follows:


table:
...
attributes:
- name: id
type: integer
...
expectations:
- "is_col_value_not_unique('id') => result(0) == 1"

The expectations attribute is a list of strings, each string representing an expectation. The expectation is defined as a string with the following format:

<query_name>(<param>*) => <condition>

<query_name> is the name of the expectation to be tested. The available expectations are defined in the expectations directory. An query is a SELECT statement run against the target table and the result is used to test the expectation.

<param> is the parameter to be passed to the expectation. It is a value that can be read as the expected type by the expectation. Parameters are separated by commas.

<condition> is the condition to be tested. It is a string that can be parsed to a boolean by the expectation and that will be used to test the expectation:

The following variables are available in the condition:

VariableTypeDescription
countLongThe number of rows in the result
resultSeq[Any]The first row in the result as a collection of values (one per column)
resultsSeq[Seq[Any]]The result as a collection of rows, each row as a collection of values (one per column)

Writing expectation queries

Expectation queries are mainly jinja macro templates that are used to generate SQL queries.

{% macro is_col_value_not_unique(col, table='SL_THIS') %}
SELECT max(cnt)
FROM (SELECT {{ col }}, count(*) as cnt FROM {{ table }}
GROUP BY {{ col }}
HAVING cnt > 1)
{% endmacro %}

{% macro is_row_count_to_be_between(min_value, max_value, table_name = 'SL_THIS') -%}
select
case
when count(*) between {{min_value}} and {{max_value}} then 1
else
0
end
from {{table_name}}
{%- endmacro %}

{% macro col_value_count_greater_than(col, table_name='SL_THIS') %}
SELECT {{ col }}, count(*) FROM {{ table_name }}
GROUP BY {{ col }}
HAVING count(*) > {{ count }}
{% endmacro %}


{% macro count_by_value(col, value, table='SL_THIS') %}
SELECT count(*)
FROM {{ table }}
WHERE {{ col }} LIKE '{{ value }}'
{% endmacro %}


{% macro column_occurs(col, times, table='SL_THIS') %}
SELECT max(cnt)
FROM (
SELECT {{ col }}, count(*) as cnt FROM {{ table }}
GROUP BY {{ col }}
HAVING cnt == {{ times }}
)
{% endmacro %}

Put the queries in the expectations directory and use the .j2 extension. You may organize your queries in subdirectories.