Controlling U-SQL Job AUs with Azure Data Factory v2 Pipeline Parameters

This post came out of me playing around with Azure Data Factory Version 2 (ADFv2), trying to get a better appreciation of why and how you can now use parameters and expressions to control your pipelines. Below is the first thing I tested, which hopefully gives you a taster of what’s now possible with the new service.

Scenario

For certain pipelines containing many (20+) U-SQL activities, I’ve often wanted to either throw more compute at the jobs or dial down the AUs for weekend processing, when data delivery times are less of a factor, so let’s save some money. In ADFv1 this is possible, but it involves changing the “degreeOfParallelism” value on every activity. Now in ADFv2 we can do this centrally at the pipeline level.

Now, before we go on, I accept this is a fairly simple and specific set of scenarios and we don’t gain much by doing it, but the point really is understanding the use of parameters and expressions in ADFv2.

The Pipeline

For this post I have a very simple pipeline with two U-SQL activities, the second dependent on the first (which I can screenshot because I have private preview access to the ADFv2 UI 🙂).

Each activity contains the expected “degreeOfParallelism” attribute. But instead of setting this value to a static INT, I’ve used an expression.

As you can tell, this expression, added to each activity, calls a parameter defined at the pipeline level to get the value.

The JSON

I’m guessing that, given the current lack of ADFv2 templates and dev tools out there, you’d like to see the JSON for the above. Well my friends, here it is to download and take a look at. For anyone familiar with ADFv1 JSON, the important bits are, in the activity:

"typeProperties": {
    "scriptPath": "u-sql1/Test",
    "degreeOfParallelism": {
        "value": "@pipeline().parameters.USQLJobAUs",
        "type": "Expression"
    }
}

And in the pipeline:

"parameters": {
    "USQLJobAUs": {
        "type": "Int",
        "defaultValue": 5
    }
}

More Thoughts

As a more realistic use of this, I’m planning to create an expression for several downstream U-SQL activities that gets the current time. Then, if the first U-SQL job has taken longer to run than expected, let’s dynamically increase the AUs on the subsequent jobs to compensate.

E.g. if ‘Job1’ didn’t complete until after 8am, increase the AUs on ‘Job2’ to 50. Otherwise use 25 AUs.
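I haven’t built this yet, but as a rough sketch the 8am rule above could probably be written with the standard ADFv2 expression functions (if, greaterOrEquals, int, formatDateTime and utcnow) directly in the second activity. This assumes the expression is evaluated when ‘Job2’ starts, and that comparing the UTC hour is good enough (utcnow() returns UTC, so the threshold needs adjusting for your local time zone). The script path here is just a placeholder.

```json
{
    "typeProperties": {
        "scriptPath": "u-sql2/Test",
        "degreeOfParallelism": {
            "value": "@if(greaterOrEquals(int(formatDateTime(utcnow(), 'HH')), 8), 50, 25)",
            "type": "Expression"
        }
    }
}
```

The 50 and 25 could themselves be swapped for pipeline parameters, so the thresholds stay in one place.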

Lastly, what’s handy about doing this from a development perspective is that when you trigger the pipeline in the new UI you can change the value passed.
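The same override should also be possible from a trigger, which is one way to get the cheaper weekend runs from the scenario above. Treat the following as an untested sketch: the trigger name, pipeline name, start time and the value of 2 AUs are all made up for illustration, and only the USQLJobAUs parameter comes from the pipeline in this post.

```json
{
    "name": "WeekendLowAUs",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2018-01-06T00:00:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "weekDays": [ "Saturday", "Sunday" ],
                    "hours": [ 6 ],
                    "minutes": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "USQLPipeline"
                },
                "parameters": {
                    "USQLJobAUs": 2
                }
            }
        ]
    }
}
```

A second trigger for weekdays could pass a higher value, or pass nothing and fall back to the default of 5 defined on the pipeline.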

I hope you found this useful.

Many thanks for reading.
