Azure Synapse Spark Pools

Running Locally

Starlake needs to access ADLS Gen2 storage (through the ABFS connector). You must provide the credentials in one of the three ways below:

  • Through a core-site.xml file present in the classpath (you'll probably use this method when running the ingestion process from your laptop):

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.azure.account.key.ebizcomet.dfs.core.windows.net</name>
        <value>*******==</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>abfs://cometfs@ebizcomet.dfs.core.windows.net/</value>
      </property>
    </configuration>
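
Before launching an ingestion, you can check that these credentials are picked up by listing the root of the filesystem with the plain Hadoop API. The snippet below is a minimal sketch, not part of Starlake itself; it assumes the core-site.xml above, the hadoop-azure ABFS driver, and its dependencies are all on the classpath:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object CheckAbfsAccess {
      def main(args: Array[String]): Unit = {
        // Configuration() loads core-site.xml from the classpath,
        // so fs.default.name resolves to abfs://cometfs@ebizcomet...
        val conf = new Configuration()
        val fs = FileSystem.get(conf)
        // A successful listing proves the account key is accepted
        fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
      }
    }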

Running on Azure

  • At cluster creation, as described in the Azure Databricks documentation (https://docs.microsoft.com/fr-fr/azure/databricks/data/data-sources/azure/azure-datalake-gen2#rdd-api). You'll probably use this method on a production cluster; see the first sketch after this list.

  • Through a specific application.conf file in the starlake-assembly.jar classpath. You must add the spark.hadoop. prefix to the corresponding Hadoop configuration keys so that they are propagated to the Hadoop configuration used in the Starlake Spark job; see the second sketch after this list.
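
For the cluster-creation method, the linked page amounts to setting the Hadoop credential key as a Spark option, with the spark.hadoop. prefix, in the cluster's Spark configuration. A minimal sketch of that configuration entry, reusing the account name and masked key from the local example above:

    spark.hadoop.fs.azure.account.key.ebizcomet.dfs.core.windows.net *******==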
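
For the application.conf method, a minimal sketch of what the resulting entries could look like, assuming standard HOCON syntax and the same values as the local core-site.xml example:

    # Hadoop keys from core-site.xml, prefixed with spark.hadoop.
    # so that they reach the Hadoop configuration of the Starlake Spark job
    spark.hadoop.fs.azure.account.key.ebizcomet.dfs.core.windows.net = "*******=="
    spark.hadoop.fs.default.name = "abfs://cometfs@ebizcomet.dfs.core.windows.net/"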