Execute All SSIS Packages in Parallel with ADFv2 Lookup and ForEach Activities

Just a quick post friends following a talk I did on Azure Data Factory (ADF) v2 at SQL Saturday Manchester. In the talk I demonstrated using ADF more dynamically now version 2 supports expressions and parameters… Amongst other cool stuff I specifically showed how you could run your SSIS packages, scaled out across all the compute nodes available in the SSIS Integration Runtime (IR). A member of the audience asked if I’d blogged about this scenario…. well I have now 🙂

For those that don’t know, the SSIS IR currently supports up to 10 compute nodes in the cluster and allows a single node to be scaled up significantly with the ability to execute multiple packages per node (SSIS package MAXDOP). This is great news if you have hundreds of SSIS packages that we need to run. However, we need to wrap the calling a package with an ADF activity, 1 package per activity. This potentially means adding as many activities to an ADF pipeline as we have packages if you truly want them all to execute in parallel. There is a better way!




The ADF ForEach activity, as I’m sure I’ve mentioned before, has the ability to execute 20 parallel threads at once. This is it’s default behaviour. You can reduce the batch size and even enforce sequential executions if required, but out of the box its a great way to scale out inner activity executions without doing any addition plumbing.

Ok, enough theory, let’s build it.

Step 1

We need to provide the ForEach with an array of things to iterate over. Nothing new here for a recursive operation. In this scenario it’s our SSIS packages that we want to know so let’s hit the SSISDB and get a list of all packages. No problem with a bit of T-SQL.

You may also want the query to order your packages given there previous execution times (using the [internal].[execution_info] table). Meaning the longest running packages are called first. Or, add a where clause for a particular folder in the SSIS catalogue, which could be driven by an ADF pipeline parameter. Just idea’s to extend this basic pattern.

Next, the query can go into an ADF Lookup activity returning all packages we currently have deployed, or whatever. Make sure you unselect the ‘FirstRow’ property otherwise you will only get 1 value returned. Sorry if that’s obvious.

 

Step 2

The Lookup feeds into our ForEach activity items as an array of values. The expression builder can help you with this, but will need ‘.value’ adding. Value is the level in the JSON output containing the array.

Step 3

Finally, within our ForEach activity we can have the SSIS activity. This then needs the package path to be dynamic, coming from our array of things.

That’s it. Done! SSIS package execution done in parallel with a MAXDOP of 20. Assuming you have 10 nodes in your SSIS IR each allowing 2 packages to run per node.

Please Sir I Want Some More

MAXDOP 20 not enough for you?

Ok!

Well, if you want to achieve more than 20 parallel threads in ADF you need to consider nesting pipelines. I’ll just bullet point this, rather than going through it, as its a little more advanced. Refactor your above control flow as follows:

  • Create a new ADF pipeline with  just an SSIS activity.
  • Add a pipeline parameter call package path, or something meaningful.
  • The package path in the activity needs to be driven from a pipeline parameter.
  • Save it.
  • Back in the ForEach pipeline we created above, replace the inner SSIS activity with an Execute pipeline activity.
  • In the execute pipeline activity choose your new child pipeline created in bullet point 1.
  • Import the pipeline parameters and this time give the it the @{item().PackagePath} iterator value.

Why is this different? Well, the Execute pipeline activity can run asynchronously if you unselect the ‘Wait on completion’ property.

This means all your SSIS packages will be triggered at the same time, well as quickly as ADF can call them, without any MAXDOP. There is an ADF service concurrency limitation on activity executions but hopefully this is going to be lifted soon.

Please be aware that this approach is really for special cases where you know how many child threads are going to be spawned. You lose a lot of control when you enter the world of async calls. But, the point is, it’s possible.

Hope this helped.

Many thanks for reading.

 

 

 

 

 

 

 

 

2 thoughts on “Execute All SSIS Packages in Parallel with ADFv2 Lookup and ForEach Activities

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.