As Azure Data Factory continues to evolve as a powerful cloud orchestration service, we need to keep updating our knowledge and understanding of everything the service has to offer, mainly so we can make the right design decisions when developing complex, dynamic solution pipelines. In this post I want to explore and share the reasons for choosing the new Web Hook Activity over the older, standard Web Activity.
In short, the question you need to ask when developing a pipeline is: do I want my web process call(s) to be synchronous or asynchronous?
Answers to that question:
- If asynchronous, or if it doesn’t matter, use a Web Activity. Fire and forget.
- If synchronous, use a Web Hook Activity. Fire and wait for completion.
Beyond that question, let’s go a little deeper and look at what’s involved in implementing each activity to achieve this blocking/non-blocking behaviour.
Just before we dive in, I would like to caveat this technical understanding with a previous blog where I used a Web Activity to stop/start the SSIS IR and made the operation synchronous by adding an Until Activity that checked and waited for the Web Activity condition to complete. The post referenced here. The point being that we can enforce synchronous behaviour from any activity if we want. The new Web Hook activity now just gives us a convenient way to do it without much extra effort and additional operational calls.
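To make the difference concrete, here is a minimal Python sketch of the two behaviours: fire and forget, versus fire and then poll until done (roughly what the Until Activity pattern does). The function names, status strings and polling interval are all made up for illustration; none of this is Data Factory code.

```python
import time
from typing import Callable, Iterable


def fire_and_forget(start: Callable[[], None]) -> None:
    """Non-blocking style: trigger the operation and carry on."""
    start()


def fire_and_wait(
    start: Callable[[], None],
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("Started", "Stopped"),  # hypothetical states
    poll_seconds: float = 30.0,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Blocking style: trigger the operation, then poll until it finishes."""
    start()
    finished = set(done_states)
    for _ in range(max_polls):
        status = get_status()
        if status in finished:
            return status
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"operation not finished after {max_polls} polls")
```

Every pass through that loop is an extra status call, which is exactly the overhead a call back removes.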
Firstly, let’s quickly cover off the Web Activity. As most will know this has been available in Data Factory since the release of version 2. It’s a great way to easily hit any API using PUT, POST, GET and DELETE methods.
The activity now also supports Managed Service Identity (MSI) authentication, which further undermines my above-mentioned blog post, because we can get the bearer token from the Azure Management API on the fly without needing to make an extra call first.
Just to reiterate, this activity makes an asynchronous call to a given API, in the sense that it doesn’t wait for whatever work the API kicks off to finish. It returns success or failure based on the response, and fails if no response is received within 1 minute. This timeout isn’t configurable.
Beyond the process flow argument, the other consideration you might have when choosing the web activity relates to your other Data Factory components. Unlike the web hook activity, the web activity offers the ability to pass in information for your Data Factory Linked Services and Datasets. This can be useful, for example, when uploading information to an endpoint from other parts of your pipeline. Datasets can be passed into the call as an array for the receiving service. Settings screenshot below.
That covers off the main points for the Web activity, which I’m assuming people are probably familiar with.
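For anyone who prefers to see this as pipeline JSON rather than settings in the UI, here is a rough Python sketch that builds a Web Activity definition with MSI authentication and a dataset reference. The property names follow the shape I understand from the Microsoft docs, so check them against the docs before relying on them. The activity name, URL, body and dataset name are placeholders I have made up.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_web_activity(
    name: str,
    url: str,
    method: str = "POST",
    body: Optional[Dict[str, Any]] = None,
    msi_resource: Optional[str] = None,
    datasets: Sequence[str] = (),
    linked_services: Sequence[str] = (),
) -> Dict[str, Any]:
    """Return a dict shaped like a Web Activity definition (assumed shape)."""
    props: Dict[str, Any] = {"url": url, "method": method}
    if body is not None:
        props["body"] = body
    if msi_resource:
        # The resource is the audience the MSI token is requested for.
        props["authentication"] = {"type": "MSI", "resource": msi_resource}
    if datasets:
        props["datasets"] = [
            {"referenceName": d, "type": "DatasetReference"} for d in datasets
        ]
    if linked_services:
        props["linkedServices"] = [
            {"referenceName": ls, "type": "LinkedServiceReference"}
            for ls in linked_services
        ]
    return {"name": name, "type": "WebActivity", "typeProperties": props}


activity = build_web_activity(
    name="UploadSummary",                      # hypothetical activity name
    url="https://example.invalid/api/upload",  # placeholder endpoint
    body={"message": "hello"},
    msi_resource="https://management.azure.com/",
    datasets=["OutputDataset"],                # hypothetical dataset name
)
print(json.dumps(activity, indent=2))
```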
Link to the Microsoft docs if you want to read more: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-web-activity
Web Hook Activity
- Other Data Factory Dataset and Linked Service resources can’t currently be passed in at runtime. However, I expect this will change in the near future.
- The activity offers the feature of a call back URL, meaning it sends the request and then waits for the API to call back before returning.
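As a rough comparison with the Web Activity, a Web Hook Activity definition might look like the Python sketch below. Again, the property names (including the `timeout` value and its format) reflect my understanding of the Microsoft docs and should be treated as assumptions. The activity name, URL and body values are invented for the example.

```python
import json
from typing import Any, Dict, Optional


def build_webhook_activity(
    name: str,
    url: str,
    body: Optional[Dict[str, Any]] = None,
    timeout: str = "00:10:00",  # how long to wait for the call back (assumed format)
) -> Dict[str, Any]:
    """Return a dict shaped like a Web Hook Activity definition (assumed shape)."""
    return {
        "name": name,
        "type": "WebHook",
        "typeProperties": {
            "url": url,
            "method": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": body or {},
            "timeout": timeout,
        },
    }


webhook = build_webhook_activity(
    name="ScaleUpDatabase",                     # hypothetical activity name
    url="https://example.invalid/webhooks/abc", # placeholder webhook URL
    body={"serviceTier": "S3"},                 # hypothetical parameter
)
print(json.dumps(webhook, indent=2))
```

Note there is no call back URL anywhere in the definition. Data Factory generates it at runtime and adds it to the request body itself.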
To show this important call back feature in action I have a very simple and hopefully common example to share.
Scenario: we have a pipeline doing some data transformation work, or whatever. The output dataset is going to be loaded into an Azure SQLDB table. However, before this happens, for Azure consumption cost efficiencies and loading times, we want to scale up the database tier at runtime. Then once data has been loaded we want to scale down the service. Maybe something like the below pipeline.
To adjust the service tier of the SQLDB we can use a PowerShell cmdlet, shown below, which we can wrap up in an Azure Automation Runbook. The Runbook can then have a Webhook added, allowing us to hit the PowerShell script from a URL. This will be the API we call with our Web Hook Activity.
I won’t go into the details of how to create the Runbook itself within Azure Automation and will assume most people are familiar with doing this. There are also plenty of tutorials out there. For this post the important thing to understand is that when the Data Factory pipeline runs the Web Hook activity (calling the Automation Webhook) it passes a supplementary set of values in the Body of the request. These values get appended onto any Body information you add via the activity settings. Also, unhelpfully, you can’t see this extra information if you Debug the pipeline and check the activity inputs and outputs!
Example JSON of the full Body request as received via the Automation service:
The additional Body information, as you can see, includes the call back URI created by Data Factory during execution along with a bearer token to authenticate against the Data Factory API.
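To illustrate the idea of picking the call back URI out of the Body, here is a small Python sketch. It only assumes that the Body arrives as a JSON string containing a `callBackUri` property alongside whatever you set in the activity; the other field names and values in the sample are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def split_request_body(request_body: str) -> Tuple[str, Dict[str, Any]]:
    """Separate the call back URI from the rest of the Body values."""
    payload = json.loads(request_body)
    if not isinstance(payload, dict) or "callBackUri" not in payload:
        raise ValueError("request body does not contain a callBackUri")
    callback_uri = payload.pop("callBackUri")
    return callback_uri, payload


# Hypothetical Body: one value set in the activity, plus the appended URI.
sample_body = json.dumps(
    {"serviceTier": "S3", "callBackUri": "https://example.invalid/callback"}
)
uri, settings = split_request_body(sample_body)
print(uri)       # https://example.invalid/callback
print(settings)  # {'serviceTier': 'S3'}
```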
All that is required within your PowerShell Runbook is to capture this URI from the passed in Body and then Invoke a Web Request POST against it once all your other cmdlets have completed.
Depending on what other parameters you want to pass in and what other exception handling you want to put into the PowerShell the entire Runbook could be as simple as the below script:
Ultimately this behaviour means Data Factory will hold the activity open until it receives the POST request to the call back URI, making the pipeline activity synchronous.
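The same "do the work, then call back" pattern applies to any API you put behind a Web Hook Activity, not just a PowerShell Runbook. As a hedged illustration, here is a Python sketch using only the standard library. The function names are my own, the empty JSON payload is an assumption, and the `opener` parameter exists purely so the POST can be swapped out when testing.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional


def post_callback(
    callback_uri: str,
    payload: Optional[Dict[str, Any]] = None,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 30.0,
) -> int:
    """POST a JSON payload to the call back URI and return the HTTP status."""
    data = json.dumps(payload or {}).encode("utf-8")
    request = urllib.request.Request(
        callback_uri,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as response:
        return response.status


def run_then_call_back(
    work: Callable[[], None],
    callback_uri: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> int:
    """Run the long-running work first, then tell Data Factory it is done."""
    work()  # e.g. scale the database; only call back once this has finished
    return post_callback(callback_uri, opener=opener)
```

If the work raises an exception here, no call back is sent and the activity will simply wait until its timeout, so real code would want some exception handling around it.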
As long as the API you hit can handle this behaviour and call back to Data Factory once complete the Web Hook activity does the ‘rest’ for you, pun intended 🙂
Link to the Microsoft docs if you want to read more: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-webhook-activity
I hope you found the above guide helpful for working with the Web Hook Activity.
Many thanks for reading.