Data Factory Activity Concurrency Limits – What Happens Next?

As a follow up to my blog about Data Factory resource limitations here. I decided to dig deeper and expose some of these limitations with a view to understanding what happens to your pipeline when/if you hit them.

In this post I’m focusing on the Activity Concurrency limits, as a reminder:

Resource	Default limit	Maximum limit
Concurrent pipeline runs per data factory that’s shared among all pipelines in the factory	10,000	Contact support.
Concurrent External activity runs per subscription per Azure Integration Runtime region External activities are managed on integration runtime but execute on linked services, including Databricks, stored procedure, HDInsights, Web, and others.	3000	Contact support.
Concurrent Pipeline activity runs per subscription per Azure Integration Runtime region Pipeline activities execute on integration runtime, including Lookup, GetMetadata, and Delete.	1000	Contact support.

Firstly, understanding how these limits apply to your Data Factory pipelines takes a little bit of thinking about considering you need to understand the difference between an internal and external activity. Then you need to think about this with the caveats of being per subscription and importantly per Azure Integration Runtime region.

Assuming you know that, and you’ve hit these limits! 🙂

… As the blog title suggestions, what happens next?

In short, nothing.

Pipelines fail.
Activities fail.
- They don’t queue.
- They don’t even start.
- They don’t even try.
- They don’t offer any error messages. See 2min video:

They just fail.

So, if you hit the activity concurrency limits, you’ll have a pile of Data Factory pipelines with fail status values all over the place with no obvious reason what has happened.

You might think, why don’t you have a failure path to another…. erm…. ???? What, another activity? When you’ve hit the limit? 🙂

What about using another Data Factory to bootstrap and monitor the first one to catch failures?… It’s not going to help; the limit is per subscription. Maybe if you have the bootstrap factory in an isolated region. But monitoring everything, every pipeline and every activity for a failure that you can’t easily identify isn’t realistic.

My current advise is:

Carefully understand these limits.
Design for using multiple Azure regions if possible.
Refactor your internal activities, especially things like setting variables. Do you really need them?
Ask Microsoft to increase your limit.

Obviously a fairly specific topic, but hopefully a helpful one to understand.

Many thanks for reading.

About Me

mrpaulandrew

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms. Business value delivered through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric. Paul is also a Microsoft Data Platform MVP, director for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Ramstein playlist when given half a chance to do some coding for a customer project.