How Interchangeable Are Delta Tables Between Azure Databricks and Azure Synapse Analytics?

Databricks vs Synapse Analytics

As an architect I often get challenged by customers on different approaches to data transformation solutions, mainly because they are concerned about locking themselves into a particular technology, resource or vendor. One example of this is using a Delta Lake to deliver an Azure-based warehousing/analytics platform.

Given this context, in this blog post I want to explore the Delta technology and how to overcome said concern.

As a side note, the critic in me might blame our experiences of using Microsoft’s Data Lake Analytics offering for this newfound anxiety regarding resource lock-in. But let’s not go there!

A Quick History Lesson

Databricks Delta was announced by Databricks at the Spark + AI Summit back in November 2017. Reminder of the announcement here. For data engineers this was certainly an exciting capability to have available in our toolkit for data transformation workloads. But the above concern about becoming locked into Databricks quickly surfaced. Sure, Databricks is available as a service on most cloud platforms, not just Azure, but still, it’s understandable that in 2017 we might not have wanted Delta tables for everything, because the technology appeared to be a proprietary Databricks capability.

Moving on. This concern was initially addressed when Databricks announced the release of Delta Lake v0.3.0 in August 2019. Reminder of the announcement here. This meant we now had an open source version of Delta that can be used with any Spark implementation, assuming you install it. Of course, this isn’t as good as the Databricks premium implementation over Spark, but it’s still a great technology to have available on any Spark cluster.

Then, the final hurdle in addressing this concern came in Azure when Microsoft announced Synapse Analytics in November 2019 and specifically the Apache Spark Compute pools available as part of the resource. These managed compute clusters come pre-installed with the Delta Lake libraries and allow easy interaction with Delta tables via the Synapse Notebooks and workspace.

History lesson over…

Now the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure-based Delta Lakehouse.

Question: How Interchangeable Are Delta Tables Between Azure Databricks and Azure Synapse Analytics?

Answer: Very interchangeable! 🙂

Or, to ask the question another way…

Question: Can we use (read/write) Delta tables created in Azure Databricks with Azure Synapse Analytics – Spark Compute Pools and vice versa?

Answer: Yes!

Let’s go a little deeper and explore some more explicit technical table scenarios.

Testing

To prove this understanding and explore a few real-world situations that might influence architecture decisions, a simple CSV file was read into a data frame using Python and written as a Delta table. The same Python code was copied and used in both the Databricks and Synapse Notebook development environments.
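The test described above can be sketched roughly as follows. This is not the exact notebook code used in the tests; the function names, the storage paths and the `partition_column` argument are illustrative assumptions. It relies only on the standard PySpark reader/writer calls and on a `spark` session being available, as it is by default in both Databricks and Synapse notebooks.

```python
def write_csv_as_delta(spark, csv_path, delta_path, partition_column=None):
    """Read a CSV file into a data frame and write it out as a Delta table.

    `spark` is the SparkSession that both notebook environments provide.
    All paths and the optional partition column are hypothetical examples.
    """
    df = (
        spark.read
        .option("header", "true")       # assume the first row holds column names
        .option("inferSchema", "true")  # let Spark guess the column types
        .csv(csv_path)
    )

    writer = df.write.format("delta").mode("overwrite")
    if partition_column is not None:
        # Covers the "partitioned DELTA" scenarios.
        writer = writer.partitionBy(partition_column)
    writer.save(delta_path)
    return df


def read_delta(spark, delta_path):
    """Load a Delta table by path, regardless of which service wrote it."""
    return spark.read.format("delta").load(delta_path)


# Example usage in a notebook (placeholder storage account and container):
# base = "abfss://<container>@<account>.dfs.core.windows.net"
# write_csv_as_delta(spark, base + "/raw/sample.csv", base + "/delta/sample")
# read_delta(spark, base + "/delta/sample").show()
```

Because both functions address the table by its storage path rather than by a metastore name, the same calls can be pointed at the same folder from either service.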

Full credit to William Wright for testing these scenarios. Also, look out for Will’s own blog that he’ll be starting very soon! 😉

Scenario	Result	Comments
Write DELTA in Databricks, read in Synapse
Write partitioned DELTA in Databricks, read in Synapse
Write DELTA in Synapse, read in Databricks
Write partitioned DELTA in Synapse, read in Databricks
In Synapse, merge into DELTA created by Databricks then read in Synapse		Merged using Python syntax; SQL MERGE is not yet supported.
In Databricks, merge into DELTA created by Synapse then read in Databricks
Optimize in Synapse		Offered by IntelliSense, but not supported when executed.
Vacuum in Synapse
Read data from both Databricks and Synapse at the same time
Write data from both Databricks and Synapse at the same time (Different Data)		Databricks completed. Synapse failed. Error details: https://docs.delta.io/0.4.0/delta-concurrency.html
Write data from both Databricks and Synapse at the same time (Same Data)		Both completed without error. However, executing both at exactly the same time wasn’t possible manually.
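For the merge scenarios, "Python syntax" refers to the `DeltaTable` API in the open source `delta.tables` module rather than a SQL `MERGE INTO` statement. A minimal upsert sketch along those lines is shown here; the key column `id`, the aliases and the helper name are assumptions, not the code used in the tests.

```python
def upsert_into_delta(delta_table, updates_df, key_column="id"):
    """Merge `updates_df` into an existing Delta table using the Python API.

    `delta_table` is expected to be a delta.tables.DeltaTable, for example
    one obtained with DeltaTable.forPath(spark, path). The key column name
    is a hypothetical example.
    """
    condition = "target.{0} = source.{0}".format(key_column)
    (
        delta_table.alias("target")
        .merge(updates_df.alias("source"), condition)
        .whenMatchedUpdateAll()      # overwrite rows whose key already exists
        .whenNotMatchedInsertAll()   # insert rows with a new key
        .execute()
    )
    return condition


# Example usage in a notebook (placeholder path):
# from delta.tables import DeltaTable
# target = DeltaTable.forPath(spark, "abfss://<container>@<account>.dfs.core.windows.net/delta/sample")
# upsert_into_delta(target, updates_df, key_column="id")
```

Taking the `DeltaTable` as an argument keeps the helper independent of where the table lives, so the same function can be run against a table created by either service.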

Test Environment

A few details about the versions of the services used:

Synapse

Apache Spark Version 2.4
Python Version 3.6.1
Delta Lake Version 0.6.1

Databricks

Runtime Version 7.4

Conclusion

In my opinion, Delta is, and should be, the standard for delivering data warehousing/analytics capabilities within an Azure Data Platform solution. The technology is, for the most part, open source and allows storage to be decoupled from the choice of compute.
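One way to see why the storage is decoupled from the compute: a Delta table is a folder of Parquet data files plus a `_delta_log` folder of JSON commit files, which any engine can read. The sketch below replays those commits with plain Python to list the data files currently in the table. It is a simplification for illustration only: it ignores checkpoint files and assumes the table folder is reachable as a local path, and the function name is made up.

```python
import json
import os


def active_data_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files.

    Simplified on purpose: Parquet checkpoint files are ignored, so this is
    only accurate for tables whose full JSON history is still present.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files are zero-padded version numbers, so sorting by name
    # replays them in commit order.
    commits = sorted(name for name in os.listdir(log_dir) if name.endswith(".json"))

    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return sorted(live)
```

Nothing in this sketch depends on Databricks or Synapse, which is the point: both services are reading and writing the same open log format.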

I hope this blog was helpful.

Many thanks for reading.

10 thoughts on “How Interchangeable Are Delta Tables Between Azure Databricks and Azure Synapse Analytics?”

About Me

mrpaulandrew

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms. Business value delivered through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric. Paul is also a Microsoft Data Platform MVP, director for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Ramstein playlist when given half a chance to do some coding for a customer project.

Martin Zurita (@Martinzurita) says:

January 21, 2021 at 8:18 pm

Hey Paul, great article!!

I’ve had many clients asking to have a delta lake built with synapse spark pools, but with the ability to read the tables from the on-demand sql pool. I’ve tested and tested but it seems that the sql part of synapse is only able to read parquet at the moment, and it is not easy to feed an analysis services model from spark. How do you address that issue?

1. mrpaulandrew says:
  
  January 26, 2021 at 6:30 pm
  
  Hey, yes correct. Delta tables aren’t yet supported via SQL On Demand. Not sure what Microsoft have planned here. I don’t have a good work around to this situation yet. Thanks
  
Pingback: Query options in Azure Synapse Analytics | James Serra's Blog
Pingback: Delta Table Compatibility between Azure Databricks and Azure Synapse Analytics – Curated SQL
Kandarp Jani says:

April 27, 2021 at 7:48 pm

Is there a separate post which explains how to explore delta tables created in databricks using Synapse?
also, interested in seeing testing methods or steps used by William Wright.
are they documented anywhere?

Pingback: How Interchangeable Are Integration pipelines Between Azure Data factory and Azure Synapse Analytics? – Welcome to the Community Blog of Paul Andrew
Pingback: How Interchangeable Are Integration pipelines Between Azure Data factory and Azure Synapse Analytics? - Tech Daily Chronicle
Dung Tran says:

February 4, 2022 at 10:12 pm

Hello, this was a great informative post. Also, what was the speed difference for similar size cluster?

J says:

April 14, 2022 at 5:41 am

In Databricks you have to run only once command for compaction and vacuum and rest is assured. how I can do the same in Synapse Spark Pool???

Pingback: Azure Synapse and Delta Lake | James Serra's Blog