Hello friends, I’m writing this post to raise awareness among my followers of the service limitations for Azure Data Factory. Like most resources in the Microsoft cloud platform, Data Factory has limitations at various levels (resource/resource group/subscription/tenant). These are enforced by Microsoft, and most of the time we don’t hit them, especially when developing. That said, all too often I see these limitations bring down production processes because people aren’t aware of them, or aren’t calculating execution concurrency correctly. Sorry if that sounds fairly dramatic, but this post is born out of my own frustrations.
As far as I can tell, Microsoft does an excellent job of managing data centre capacity, so I completely understand the reason for having limitations on resources in place. There is no such thing as a limitless cloud platform.
Note: in a lot of cases (as you’ll see in the table below for Data Factory) the maximum limits are only soft restrictions that can easily be lifted via a support ticket. Please check before raising alerts and project risks.
Data Factory Limitations
I copied this table exactly as it appeared for Data Factory on 22nd January 2019. References are at the bottom.
| Resource | Default limit | Maximum limit |
| --- | --- | --- |
| Data factories in an Azure subscription | 800 (updated) | 800 (updated) |
| Total number of entities, such as pipelines, data sets, triggers, linked services, and integration runtimes, within a data factory | 5,000 | Contact support. |
| Total CPU cores for Azure-SSIS Integration Runtimes under one subscription | 256 | Contact support. |
| Concurrent pipeline runs per data factory, shared among all pipelines in the factory | 10,000 | Contact support. |
| Concurrent External activity runs per subscription per Azure Integration Runtime region. External activities are managed on the integration runtime but execute on linked services, including Databricks, stored procedure, HDInsight, Web, and others. | 3,000 | Contact support. |
| Concurrent Pipeline activity runs per subscription per Azure Integration Runtime region. Pipeline activities execute on the integration runtime, including Lookup, GetMetadata, and Delete. | 1,000 | Contact support. |
| Concurrent authoring operations per subscription per Azure Integration Runtime region, including test connection, browse folder list and table list, and preview data. | 200 | Contact support. |
| Concurrent Data Integration Units¹ consumption per subscription per Azure Integration Runtime region | Region group 1²: 6,000<br>Region group 2²: 3,000<br>Region group 3²: 1,500 | Contact support. |
| Maximum activities per pipeline, which includes inner activities for containers | 40 | 40 |
| Maximum number of linked integration runtimes that can be created against a single self-hosted integration runtime | 100 | Contact support. |
| Maximum parameters per pipeline | 50 | 50 |
| ForEach items | 100,000 | 100,000 |
| ForEach parallelism | 20 | 50 |
| Maximum queued runs per pipeline | 100 | 100 |
| Characters per expression | 8,192 | 8,192 |
| Minimum tumbling window trigger interval | 15 min | 15 min |
| Maximum timeout for pipeline activity runs | 7 days | 7 days |
| Bytes per object for pipeline objects³ | 200 KB | 200 KB |
| Bytes per object for dataset and linked service objects³ | 100 KB | 2,000 KB |
| Data Integration Units¹ per copy activity run | 256 | Contact support. |
| Write API calls | 1,200/h (imposed by Azure Resource Manager, not Azure Data Factory) | Contact support. |
| Read API calls | 12,500/h (imposed by Azure Resource Manager, not Azure Data Factory) | Contact support. |
| Monitoring queries per minute | 1,000 | Contact support. |
| Entity CRUD operations per minute | 50 | Contact support. |
| Maximum time of data flow debug session | 8 hrs | 8 hrs |
| Concurrent number of data flows per factory | 50 | Contact support. |
| Concurrent number of data flow debug sessions per user per factory | 3 | 3 |
| Data Flow Azure IR TTL limit | 4 hrs | Contact support. |
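To make the concurrency point from the introduction concrete, here is a back-of-envelope sketch in Python. The limit is the default of 3,000 concurrent External activity runs per subscription per Azure IR region from the table above; the workload numbers are made up purely for illustration:

```python
# Back-of-envelope check: could our triggered pipelines exceed the default
# limit of 3,000 concurrent External activity runs per subscription per
# Azure Integration Runtime region? Workload numbers are illustrative only.

EXTERNAL_ACTIVITY_LIMIT = 3_000  # default from the table above

# (workload, pipelines triggered together, ForEach parallelism,
#  external activities per ForEach iteration)
workloads = [
    ("nightly_dims", 40, 20, 2),  # 40 pipelines x 20 parallel x 2 activities
    ("hourly_facts", 30, 50, 2),  # 50 is the ForEach parallelism maximum
]

total = 0
for name, pipelines, parallelism, activities in workloads:
    concurrent = pipelines * parallelism * activities
    print(f"{name}: up to {concurrent} concurrent external activity runs")
    total += concurrent

print(f"worst case: {total} versus default limit {EXTERNAL_ACTIVITY_LIMIT}")
if total > EXTERNAL_ACTIVITY_LIMIT:
    print("risk: excess activity runs will queue until capacity frees up")
```

When the limit is hit, excess runs queue rather than fail outright, which is easy to mistake for a hung pipeline.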
You can find this table in the Microsoft docs page that covers Azure subscription and service limits. The page is huge and includes all Azure services, which is why I think people never manage to find it.
Also, I believe the source for that page is the following GitHub link.
https://github.com/MicrosoftDocs/azure-docs/blob/master/includes/azure-data-factory-limits.md
My blog is static, so please refer to these links for the latest numbers.
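On a related note, the ARM read/write API call limits in the table bite hardest in monitoring loops. Here is a minimal sketch, using the azure-mgmt-datafactory Python SDK, of polling pipeline runs with a simple backoff on HTTP 429 throttling. The subscription, resource group, and factory names are placeholders, and note the Azure SDK already applies its own retry policy; this just makes the idea explicit:

```python
# Minimal sketch: poll recent pipeline runs without hammering the ARM
# read API limit (12,500/h) or the monitoring query limit (1,000/min).
# Subscription, resource group, and factory names are placeholders.
import time
from datetime import datetime, timedelta, timezone

from azure.core.exceptions import HttpResponseError
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

def query_recent_runs(resource_group: str, factory: str, attempts: int = 5):
    """Query the last hour of pipeline runs, backing off on 429 throttling."""
    filters = RunFilterParameters(
        last_updated_after=datetime.now(timezone.utc) - timedelta(hours=1),
        last_updated_before=datetime.now(timezone.utc),
    )
    delay = 2.0
    for attempt in range(attempts):
        try:
            return client.pipeline_runs.query_by_factory(resource_group, factory, filters)
        except HttpResponseError as err:
            if err.status_code != 429 or attempt == attempts - 1:
                raise
            time.sleep(delay)  # back off before the next attempt
            delay *= 2         # exponential backoff

runs = query_recent_runs("<resource-group>", "<factory-name>")
for run in runs.value:
    print(run.pipeline_name, run.status)
```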
Finally, it is not a competition to see who can hit all of these restrictions! Honest! 😉
Many thanks for reading.
Thank you so much, Paul, for sharing these limitations of ADF.
Can you please share some thoughts on how to improve the performance of ADF?
Thanks, hoping to see that blog soon.
Hi Paul, what are the limitations that you encounter “normally”? The list itself is interesting, but the real-life experience is even more interesting.
Great article. Good to know these limitations in ADF. I agree with Johannes Vink’s question; it would be really good to know the practical limitations we encounter during development in ADF.
Hi Paul,
Thanks for the excellent analysis of Azure Data Factory.
I have sent a request on LinkedIn. I have a question: how do you see ADF (an orchestration tool) from a traditional ETL tool perspective (like Informatica, DataStage, ODI)? Is it right to compare a legacy ETL tool with an orchestration tool?
Regards,
Mangesh
The maximum of 40 activities per pipeline is, to say the least, outrageous. What do we do if we need one trigger to execute 100 pipelines for dimensions, for instance?
Yes, I agree. There are other patterns you can consider, like using a ForEach activity with nested calls to child pipelines.
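For illustration, here is a minimal sketch of that pattern using the azure-mgmt-datafactory Python SDK. All names are placeholder assumptions, including a generic child pipeline called Load_Dimension that takes a dimensionName parameter; the parent stays well under the 40-activity cap because its only activity is the ForEach:

```python
# Sketch: thin parent pipeline whose single ForEach activity fans out to a
# generic child pipeline, one call per item. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity,
    Expression,
    ForEachActivity,
    ParameterSpecification,
    PipelineReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

call_child = ExecutePipelineActivity(
    name="RunOneDimension",
    pipeline=PipelineReference(reference_name="Load_Dimension"),  # generic child
    # "@item()" is the current element of the ForEach collection
    parameters={"dimensionName": {"value": "@item()", "type": "Expression"}},
    wait_on_completion=True,
)

fan_out = ForEachActivity(
    name="ForEachDimension",
    items=Expression(value="@pipeline().parameters.dimensionList"),
    batch_count=20,  # ForEach parallelism: default 20, maximum 50
    activities=[call_child],
)

parent = PipelineResource(
    parameters={"dimensionList": ParameterSpecification(type="Array")},
    activities=[fan_out],
)

client.pipelines.create_or_update(
    "<resource-group>", "<factory-name>", "Load_All_Dimensions", parent
)
```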
The minimum tumbling window trigger interval is now 5 minutes.
That’s an interesting way of doing it. While using ForEach is practical, it might not scale well. You should also be careful with nested calls to child pipelines, as there is a high likelihood that you’ll encounter delays: each pipeline has to wait for the previous one to finish before its turn comes. You can reduce the delay by having each pipeline write to a service bus, then having another pipeline read from the service bus. It still has to wait for its turn, but the delays are shorter.
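A minimal sketch of that enqueue/dequeue idea with the azure-servicebus Python package, purely for illustration. The connection string and queue name are placeholders, and in practice the ADF side would talk to Service Bus via a Web activity or a queue-triggered Azure Function rather than a standalone script:

```python
# Sketch: decouple parent and child work by passing work items through a
# Service Bus queue. Connection string and queue name are placeholders.
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-connection-string>"
QUEUE = "adf-work-items"

# Producer side: enqueue one message per child workload to run.
with ServiceBusClient.from_connection_string(CONN_STR) as sb:
    with sb.get_queue_sender(QUEUE) as sender:
        for dim in ["DimCustomer", "DimProduct", "DimDate"]:
            sender.send_messages(ServiceBusMessage(dim))

# Consumer side: drain the queue and trigger work for each item in turn.
with ServiceBusClient.from_connection_string(CONN_STR) as sb:
    with sb.get_queue_receiver(QUEUE, max_wait_time=5) as receiver:
        for msg in receiver:
            print(f"would trigger the child pipeline for: {msg}")
            receiver.complete_message(msg)
```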
Concurrent number of data flow debug sessions per user per factory: 3…. Has anyone faced this limitation? How do you handle this limit when you have more than 3 ADF developers debugging at the same time?
Is ADF capable of handling a single 200 GB CSV file? I had assumed the ADF tool is for integration with external Azure services and building ETL pipelines, but I am not sure of its capacity to handle the data. I am not clear on the statements below:
Bytes per object for pipeline objects³ | 200 KB | 200 KB
Bytes per object for dataset and linked service objects³ | 100 KB | 2,000 KB