Inspired by an earlier blog post, ‘How Interchangeable Delta Tables Are Between Databricks and Synapse’, I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.
As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution, so to answer it directly:
Question: How Interchangeable Are Integration Pipelines Between Azure Data Factory and Azure Synapse Analytics?
Answer: Very interchangeable!
Or, to ask the question another way:
Question: Can we use the same integration components in Azure Data Factory and Azure Synapse Analytics at the same time?
The only caveat to both these answers is that, in the source control configuration for each resource, you must set the ‘root folder’ to the same location. In my case this was just the root of the repository itself, because I created the test case from scratch. Link below if you want to view the contents.
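To make the caveat concrete, here is a minimal sketch of the relevant Git configuration properties for each resource, as they would appear in ARM template definitions. The account, project, repository, and branch names are placeholders, and the wrapper object is purely illustrative (it is not a deployable template); the point is simply that both resources reference the same repository and the same ‘rootFolder’:

```json
{
  "dataFactory": {
    "repoConfiguration": {
      "type": "FactoryVSTSConfiguration",
      "accountName": "my-devops-org",
      "projectName": "DataPlatform",
      "repositoryName": "data-platform",
      "collaborationBranch": "main",
      "rootFolder": "/"
    }
  },
  "synapseWorkspace": {
    "workspaceRepositoryConfiguration": {
      "type": "WorkspaceVSTSConfiguration",
      "accountName": "my-devops-org",
      "projectName": "DataPlatform",
      "repositoryName": "data-platform",
      "collaborationBranch": "main",
      "rootFolder": "/"
    }
  }
}
```

With both ‘rootFolder’ values pointing at the same location, the pipeline, dataset, and linked service JSON artifacts saved by one resource are picked up by the other.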
Not convinced? Watch this…
With the above in mind, things now get very interesting for me, as an architect, when designing a data platform solution.
- Delta Tables are interchangeable, as an open source standard, when working with Apache Spark as the compute.
- Data Lake storage is interchangeable and accessible by lots of different resources, by the very nature of the underlying distributed file system.
- Orchestration components are interchangeable between integration resources when they access the same Git repository and use the same pipeline artifacts.
Therefore, take a given data platform architecture (from before Synapse arrived) built on the common set of core resources listed below. In most cases, there is now no reason why we can’t switch things over to Azure Synapse Analytics, if we want to.
Pre-Synapse Core Resources
- Data Lake
- Data Factory
Post-Synapse Core Resources
- Data Lake
- Synapse – Spark Pools
- Synapse – Integration Pipelines
The other great thing is that, as data engineers, we wouldn’t need to do much work for these resources to become almost plug and play in our solution. We could even run solutions in parallel with some creative code branching!
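As a rough sketch of what that “creative code branching” could look like (branch names are hypothetical, and this is one possible strategy, not the author’s prescribed one): a single repository holding the shared pipeline artifacts, with a separate collaboration branch per integration resource so both can run side by side.

```shell
# One repo, shared pipeline artifacts, one collaboration branch per resource.
git init -q demo-platform
cd demo-platform
git commit --allow-empty -m "init shared pipeline artifacts"

# Hypothetical collaboration branches:
git branch adf-collab       # Data Factory points its Git config here
git branch synapse-collab   # Synapse Workspace points its Git config here

git branch --list           # both branches now exist alongside the default
```

Each resource’s source control configuration would then name its own collaboration branch while still sharing the same ‘root folder’ of artifacts.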
Now, trolls, I fully appreciate that my initial test in the video was very, very simple, mainly due to a lack of time. So I will continue this work and test all the integration components, including debugging in both resources at the same time, to see whether this repo sharing has any side effects. Stay tuned.
For now, I wanted to plant the seed of architecture interchangeability so you could consider trying out the same and maybe unlock Synapse in a future data platform solution. It’s fairly easy to do, I think you’ll agree 🙂
Many thanks for reading.