Transform on load
Add new attributes
You may add extra attributes during load. This is useful to add extra columns that are not present in the source file such as the current timestamp or the file name
You may use any Spark SQL function to define the value of the new column and this function may reference any column of the same record in the file being loaded including the ignored ones.
The new column will be added to the table with the name you provide in the name
property and the value will be the result of the function you provide in the script
property.
In addition to the Spark SQL functions you can reference any of the properties available in the _metadata
hidden column.
Available metadata properties are :
Property | Type | Description |
---|---|---|
file_name | string | the name of the file being loaded |
file_path | string | the path of the file being loaded |
file_size | long | the size of the file being loaded |
file_modification_time | timestamp | the last modified date of the file being loaded |
Below is an example of how to add a new column file_name
to the table that will contain the name of the file being loaded.
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "total_price"
script: "quantity * sale_price"
- name: "quantity_in_hexadecimal"
script: "hex(quantity)"
- name: "file_name"
script: "_metadata.file_name"
Ignore attributes
You may ignore columns during load. This is useful to avoid loading columns that are not needed in the target table or have sensitive data (GDPR).
To ignore a column, set the ignore
attribute to true
in the column definition.
You may ignore a column you use inside the script
attribute of another column.
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "product_id"
type: "int"
ignore: true
Foreign key attributes
You may define a foreign key on a column during load using the foreignKey
property on an attribute. This will be set as a foreign key in the target table.
Some databases may not support foreign keys, in this case, the foreign key will be ignored but will still appear in the autogenerated entity-relational diagram.
table:
pattern: "order_line.*.csv"
...
attributes:
...
- name: "product_id"
type: "int"
foreignKey: "product.id"
The foreignKey
property is defined using one of the following syntax:
Syntax | Description |
---|---|
foreign_key: "<table>" | This will create a foreign key on the primary key of the specified table in the same domain. |
foreign_key: "<domain>.<table>" | This will create a foreign key on the column to primary key of the specified table of the specified domain. |
foreign_key: "<domain>.<table>.<column>" | This will create a foreign key on the column to the specified column in the specified table of the specified domain. |
foreign_key: "<table>.<column>" | This will create a foreign key on the column to the specified column in the specified table in the same domain |