Skip to main content

Transform on load

Add new attributes

You may add extra attributes during load. This is useful to add extra columns that are not present in the source file such as the current timestamp or the file name

You may use any Spark SQL function to define the value of the new column and this function may reference any column of the same record in the file being loaded including the ignored ones.

The new column will be added to the table with the name you provide in the name property and the value will be the result of the function you provide in the script property.

In addition to the Spark SQL functions you can reference any of the properties available in the _metadata hidden column.

Available metadata properties are :

PropertyTypeDescription
file_namestringthe name of the file being loaded
file_pathstringthe path of the file being loaded
file_sizelongthe size of the file being loaded
file_modification_timetimestampthe last modified date of the file being loaded

Below is an example of how to add a new column file_name to the table that will contain the name of the file being loaded.


metadata/load/<domain>/<table>.sl.yml
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "total_price"
script: "quantity * sale_price"
- name: "quantity_in_hexadecimal"
script: "hex(quantity)"
- name: "file_name"
script: "_metadata.file_name"

Ignore attributes

You may ignore columns during load. This is useful to avoid loading columns that are not needed in the target table or have sensitive data (GDPR).

To ignore a column, set the ignore attribute to true in the column definition.

tip

You may ignore a column you use inside the script attribute of another column.

metadata/load/<domain>/<table>.sl.yml
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "product_id"
type: "int"
ignore: true

Foreign key attributes

You may define a foreign key on a column during load using the foreignKey property on an attribute. This will be set as a foreign key in the target table. Some databases may not support foreign keys, in this case, the foreign key will be ignored but will still appear in the autogenerated entity-relational diagram.


metadata/load/<domain>/<table>.sl.yml
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "product_id"
type: "int"
foreignKey: "product.id"


The foreignKey property is defined using one of the following syntax:

SyntaxDescription
foreign_key: "<table>"This will create a foreign key on the primary key of the specified table in the same domain.
foreign_key: "<domain>.<table>"This will create a foreign key on the column to primary key of the specified table of the specified domain.
foreign_key: "<domain>.<table>.<column>"This will create a foreign key on the column to the specified column in the specified table of the specified domain.
foreign_key: "<table>.<column>"This will create a foreign key on the column to the specified column in the specified table in the same domain