Export

You may export your data to files or to another database.

Export to files

You may export to CSV, JSON, Parquet, Avro, or any other file format supported by Apache Spark. To export the data to a CSV file, set the format to csv in the sink attribute of your transform.

metadata/transform/<domain>/<transform>.sl.yml
task:
  sink:
    format: csv
    extension: csv

The file is saved in the datasets folder (relative to the root path) of the project, under the <domain> folder. It is named after the <transform>, with the extension set in the YAML file.
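As a rough illustration of the default location described above, the following Python sketch builds the output path from its parts. The function name, its parameters, and the sample domain and transform names are hypothetical, and the exact layout (`<root>/datasets/<domain>/<transform>.<extension>`) is an assumption based on a literal reading of the description, not a statement of how the tool computes it internally.

```python
def default_export_path(root: str, domain: str, transform: str, extension: str) -> str:
    """Build the assumed default export location.

    Assumed layout: <root>/datasets/<domain>/<transform>.<extension>
    """
    # Drop a trailing slash on the root so the joined path has no double slash.
    base = root.rstrip("/")
    return f"{base}/datasets/{domain}/{transform}.{extension}"


# Hypothetical domain "sales" and transform "daily_orders".
print(default_export_path("gs://my-bucket/folder1/folder2", "sales", "daily_orders", "csv"))
```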

You may also request that the file be saved at a specific location by setting the path attribute of the sink to a file path relative to the root path.

metadata/transform/<domain>/<transform>.sl.yml
task:
  sink:
    format: csv
    path: mnt/data/output.csv

The file is saved at the specified path below the root path. On cloud storage, the bucket name is prepended to the path.

The root path is specified in the root key of the metadata/application.sl.yml file.

metadata/application.sl.yml
...
root: gs://my-bucket/folder1/folder2
...
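To make the path resolution concrete, here is a minimal Python sketch that joins a sink path to the root path. The helper name `resolve_sink_path` is hypothetical, and plain string concatenation is an assumption used only to illustrate that the sink path lands below the root; it is not the tool's actual implementation.

```python
def resolve_sink_path(root: str, sink_path: str) -> str:
    """Join a sink path (relative to the root) to the root path."""
    # Normalize the slashes at the join point so exactly one separator remains.
    return root.rstrip("/") + "/" + sink_path.lstrip("/")


# Values taken from the examples in this page.
print(resolve_sink_path("gs://my-bucket/folder1/folder2", "mnt/data/output.csv"))
```

Under this assumption, the sink path `mnt/data/output.csv` combined with the root `gs://my-bucket/folder1/folder2` gives `gs://my-bucket/folder1/folder2/mnt/data/output.csv`.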

Export to another database

You may also export the data to a database. To do so, set the sink.connectionRef attribute of your transform to the name of the database connection.

metadata/transform/<domain>/<transform>.sl.yml

task:
  ...
  sink:
    connectionRef: my_database
    ...
