Export
You may export your data to files or to another database.
Export to files
You may export to CSV, JSON, Parquet, Avro, or any other file format supported by Apache Spark. To export the data to a CSV file, specify the csv format in the sink attribute of your transform.
task:
  sink:
    format: csv
    extension: csv
The file will be saved in the datasets folder of the project (relative to the root path), under the <domain> folder, and named after the <transform> name with the extension set in the YAML file.
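As a rough illustration of the default location described above, the sketch below uses a hypothetical transform named daily_revenue in a hypothetical domain named sales. Both names are assumptions made for this example only, and the resulting location in the comments is inferred from the description rather than taken from an actual run.

```yaml
# Hypothetical transform "daily_revenue" in hypothetical domain "sales"
task:
  sink:
    format: csv
    extension: csv

# Expected default location, assuming the layout described above:
#   <root>/datasets/sales/daily_revenue.csv
```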
You may also save the file to a specific location by specifying the file path (relative to the root path) in the path attribute of the sink.
task:
  sink:
    format: csv
    path: mnt/data/output.csv
On cloud storage, the bucket name is prepended to the path. The file is saved at the specified path below the root path.
The root path is specified in the root key in the metadata.application.sl.yml file.
...
root: gs://my-bucket/folder1/folder2
...
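To make the path resolution concrete, the sketch below combines the root value and the sink path used in the examples on this page. The resulting location in the comments assumes the sink path is simply appended to the root path, as described above. It has not been verified against a running project.

```yaml
# Application settings (root key, as in the example above)
root: gs://my-bucket/folder1/folder2

# Transform sink (kept in a separate file; shown here for comparison)
task:
  sink:
    format: csv
    path: mnt/data/output.csv

# Resulting location, assuming the path is appended to the root:
#   gs://my-bucket/folder1/folder2/mnt/data/output.csv
```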
Export to another database
You may also export the data to a database by specifying the database connection name in the sink.connectionRef attribute of your transform.
task:
  ...
  sink:
    connectionRef: my_database
  ...