Azure Data Factory Resource Limitations

Hello friends, I’m writing this post to raise awareness of the service limitations for Azure Data Factory. Like most resources in the Microsoft Cloud Platform, limitations are enforced by Microsoft at various levels (Resource, Resource Group, Subscription, Tenant), and most of the time we don’t hit them, especially when developing. That said, all too often I see these limitations bring down production processes because people aren’t aware of them, or aren’t calculating execution concurrency correctly. Sorry if that sounds dramatic, but this is born out of my own frustrations.
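
To show what I mean by calculating execution concurrency, here’s a quick back-of-the-envelope sketch in Python. The pipeline counts and batch sizes are completely made up for illustration; the 1,000 figure it compares against is the default limit for concurrent Pipeline activity runs per subscription per Azure Integration Runtime region, taken from the table further down.

```python
# Back-of-the-envelope concurrency estimate (illustrative numbers only).
triggered_pipelines = 60       # pipelines all started by the same schedule trigger
foreach_batch_count = 20       # parallel ForEach iterations inside each pipeline
activities_per_iteration = 1   # e.g. one Lookup or GetMetadata per iteration

# Worst case: every pipeline reaches its ForEach activity at the same time.
concurrent_activity_runs = (
    triggered_pipelines * foreach_batch_count * activities_per_iteration
)

default_limit = 1_000  # concurrent Pipeline activity runs per subscription per Azure IR region
print(concurrent_activity_runs, "estimated concurrent activity runs")
print("over the default limit by", max(0, concurrent_activity_runs - default_limit))
```

In this made-up example, 60 × 20 = 1,200 concurrent activity runs, so roughly 200 of them would end up queued or throttled.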

As far as I can tell, Microsoft do an excellent job of managing data centre capacity, so I completely understand the reason for having resource limitations in place. There is no such thing as a limitless cloud platform.

Note: in a lot of cases (as you’ll see in the table below for Data Factory) the maximum limits are only soft restrictions that can easily be lifted via a support ticket. Please check before raising alerts and project risks.

Data Factory Limitations

I copied this table exactly as it appears for Data Factory on 22nd Jan 2019. References at the bottom.

| Resource | Default limit | Maximum limit |
| --- | --- | --- |
| Data factories in an Azure subscription | 800 | 800 |
| Total number of entities, such as pipelines, data sets, triggers, linked services, and integration runtimes, within a data factory | 5,000 | Contact support |
| Total CPU cores for Azure-SSIS Integration Runtimes under one subscription | 256 | Contact support |
| Concurrent pipeline runs per data factory, shared among all pipelines in the factory | 10,000 | Contact support |
| Concurrent External activity runs per subscription per Azure Integration Runtime region (external activities are managed on the integration runtime but execute on linked services, including Databricks, stored procedure, HDInsight, Web, and others) | 3,000 | Contact support |
| Concurrent Pipeline activity runs per subscription per Azure Integration Runtime region (pipeline activities execute on the integration runtime, including Lookup, GetMetadata, and Delete) | 1,000 | Contact support |
| Concurrent authoring operations per subscription per Azure Integration Runtime region (including test connection, browse folder list and table list, preview data) | 200 | Contact support |
| Concurrent Data Integration Unit (DIU) consumption per subscription per Azure Integration Runtime region | Region group 1: 6,000; Region group 2: 3,000; Region group 3: 1,500 | Contact support |
| Maximum activities per pipeline, which includes inner activities for containers | 40 | 40 |
| Maximum number of linked integration runtimes that can be created against a single self-hosted integration runtime | 100 | Contact support |
| Maximum parameters per pipeline | 50 | 50 |
| ForEach items | 100,000 | 100,000 |
| ForEach parallelism | 20 | 50 |
| Maximum queued runs per pipeline | 100 | 100 |
| Characters per expression | 8,192 | 8,192 |
| Minimum tumbling window trigger interval | 15 min | 15 min |
| Maximum timeout for pipeline activity runs | 7 days | 7 days |
| Bytes per object for pipeline objects | 200 KB | 200 KB |
| Bytes per object for dataset and linked service objects | 100 KB | 2,000 KB |
| Data Integration Units (DIUs) per copy activity run | 256 | Contact support |
| Write API calls (this limit is imposed by Azure Resource Manager, not Azure Data Factory) | 1,200/h | Contact support |
| Read API calls (this limit is imposed by Azure Resource Manager, not Azure Data Factory) | 12,500/h | Contact support |
| Monitoring queries per minute | 1,000 | Contact support |
| Entity CRUD operations per minute | 50 | Contact support |
| Maximum time of data flow debug session | 8 hrs | 8 hrs |
| Concurrent number of data flows per factory | 50 | Contact support |
| Concurrent number of data flow debug sessions per user per factory | 3 | 3 |
| Data Flow Azure IR TTL limit | 4 hrs | Contact support |
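
If you want to keep an eye on the 5,000 entities per factory default limit before it bites, you can count the entities yourself. The snippet below is only a rough sketch, assuming the azure-mgmt-datafactory and azure-identity Python packages and an identity with read access on the factory; the subscription, resource group and factory names are placeholders.

```python
# Minimal sketch: count entities in a factory against the 5,000 default limit.
# Assumes azure-mgmt-datafactory and azure-identity are installed;
# "my-rg" and "my-adf" are placeholder names.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "my-rg"
factory_name = "my-adf"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Entity types called out in the limits table; other entity types also count.
entity_lists = {
    "pipelines": client.pipelines.list_by_factory(resource_group, factory_name),
    "datasets": client.datasets.list_by_factory(resource_group, factory_name),
    "linked services": client.linked_services.list_by_factory(resource_group, factory_name),
    "triggers": client.triggers.list_by_factory(resource_group, factory_name),
    "integration runtimes": client.integration_runtimes.list_by_factory(resource_group, factory_name),
}

total = 0
for name, items in entity_lists.items():
    count = sum(1 for _ in items)
    total += count
    print(f"{name}: {count}")

print(f"total entities: {total} (default limit is 5,000)")
```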

You can find this table on the following Microsoft docs page. The page is huge and covers all Azure services, which I suspect is why people never manage to find it.

https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits

Also, I believe the source for that page is the following file on GitHub.

https://github.com/MicrosoftDocs/azure-docs/blob/master/includes/azure-data-factory-limits.md

My blog is static so please refer to these links for the latest numbers.

Finally, it is not a competition to see who can hit all of these restrictions! Honest! 😉

Many thanks for reading.

16 thoughts on “Azure Data Factory Resource Limitations”

  1. Thank you so much Paul for sharing these limitations of ADF.

    Can you please share some thoughts on how to improve the performance of ADF?

    Thanks, hoping to see that blog post soon.

  2. Hi Paul, what are the limitations that you encounter “normally”? The list itself is interesting, but the real-life experience is even more interesting.

  3. Great article. Good to know these limitations in ADF. I agree with Johannes Vink’s question. It would be really good to know the practical limitations we encounter during our development in ADF.

  4. Hi Paul,
    Thanks for the excellent analysis of Azure Data Factory.
    I have sent a request on LinkedIn. I have a question: how do you see ADF (an orchestration tool) from a traditional ETL tool perspective (like Informatica, DataStage, ODI)? Is it right to compare any legacy ETL tool with an orchestration tool?
    Regards,
    Mangesh

  5. The maximum of 40 activities per pipeline is, to say the least, outrageous. What do we do if we need one trigger to execute 100 pipelines for dimensions, for instance?

    1. Yes, I agree. There are other patterns you can consider, like using a ForEach activity with nested calls to child pipelines; there’s a rough sketch of that pattern below the comments.

  6. That’s an interesting way of doing it. While using ForEach is practical, it might not scale well. You should also be careful with nested calls to child pipelines, as there is a high likelihood you’ll encounter delays: each pipeline has to wait for the processing ahead of it to finish before it takes its turn. You can reduce the delay by having each pipeline write to a service bus and having another pipeline read from the service bus. It will still have to wait its turn, but the delays will be shorter.

  7. Concurrent number of data flow debug sessions per user per factory: 3… Has anyone faced this limitation? How do you handle this limit when you have more than 3 ADF developers debugging at the same time?

  8. Is ADF capable of handling a single 200 GB CSV file? I have assumed the ADF tool is for integration with external Azure services and building ETL pipelines, but I am not sure about its capacity to handle the data. I am not clear on these statements below:
    Bytes per object for pipeline objects: 200 KB / 200 KB
    Bytes per object for dataset and linked service objects: 100 KB / 2,000 KB

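For anyone reading the 40-activities-per-pipeline thread above, here is a rough sketch of the parent/child pattern mentioned in the reply to comment 5. It’s the JSON you would see in the ADF Code view, written out as a Python dict; all the names (LoadAllDimensions, LoadDimension, dimensionTables, tableName) are made up for illustration, and the child pipeline LoadDimension is assumed to accept a tableName parameter.

```python
# Sketch of a parent pipeline that fans out to a generic child pipeline.
# Names are illustrative; the child pipeline "LoadDimension" is assumed
# to take a "tableName" parameter and do the actual load work.
parent_pipeline = {
    "name": "LoadAllDimensions",
    "properties": {
        "parameters": {
            "dimensionTables": {"type": "Array"}  # e.g. ["DimCustomer", "DimProduct", ...]
        },
        "activities": [
            {
                "name": "ForEachDimension",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.dimensionTables",
                        "type": "Expression",
                    },
                    "isSequential": False,
                    "batchCount": 20,  # default ForEach parallelism is 20, maximum 50
                    "activities": [
                        {
                            "name": "RunLoadDimension",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "LoadDimension",
                                    "type": "PipelineReference",
                                },
                                "parameters": {
                                    "tableName": {"value": "@item()", "type": "Expression"}
                                },
                                "waitOnCompletion": True,
                            },
                        }
                    ],
                },
            }
        ],
    },
}
```

The parent pipeline stays at a single ForEach activity no matter how many dimensions are in the list, which keeps it well under the 40-activity limit; the ForEach batch count is itself capped by the 20/50 parallelism limit from the table above.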