How To Control the SSIS IR From Your ADFv2 Pipelines – Start/Stop

Hello friends, in this post I want to share with you a couple of methods I’ve developed for controlling the start up and stop of the Azure Data Factory (ADF) v2 SSIS Integration Runtime (IR) from your pipelines. This requirement came about mainly because of cost… The SSIS IR underneath is simply a cluster of virtual machines, which if left running with 10 nodes 24/7 will result in a large Azure bill. Also, unlike the Azure Batch Service compute pool its not currently possible to auto scale the nodes.

Below are two pipeline patterns and methods to handle this.

Basic Pattern

Below I have;

  • An ADFv2 pipeline running some SSIS workload using the SSIS activity. One package or many, this doesn’t matter.
  • The pipeline includes Web Activities to Start and Stop the SSIS IR at either end.
  • The Web Activities hit Web Hooks for a couple of Azure Automation Runbooks.
  • The Runbooks include PowerShell cmdlets to control the ADFv2 SSIS IR.
  • After the start call we have a static Wait activity fixed to 30 minutes… Which is roughly how long the SSIS IR takes to start up.

Does this approach work? Yes, but this isn’t very dynamic and relies on a time bound assumption.

We can do better! 🙂

Azure Automation

Before we look at the second pattern, let’s just think about the Azure Automation part and ADF Web Activity call to the Web Hook. The PowerShell for handling the SSIS IR, after authentication, is only a single cmdlet for starting and stopping the cluster. Lovely!

Start-AzureRmDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $SSISIR -ResourceGroupName $ResourceGroupName

Stop-AzureRmDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $SSISIR -ResourceGroupName $ResourceGroupName

However, the problem and the reason for the Wait activity in the basic approach is because the ADF Web activity calls are asynchronous. The Azure Automation runbooks are then called, compute allocated (queued) and then executed. We therefore have a disconnection between hitting the Web Hook and the PowerShell, this means the ADF pipeline could continue and try to run the package(s) before the SSIR IR cluster has actually come up. If that happens the packages will fail.

Thankfully the PowerShell cmdlet ‘Start-AzureRmDataFactoryV2IntegrationRuntime’ does actually wait until the cluster has been fully started before returning. So we just need to handle the Runbook job status.

Side note; we can’t use an Azure Function for this because of the 5 minute limitation on executions. Also, as a type, PowerShell for Azure Functions is still only in preview.

Back to our ADF pipeline. After the first Web activity POST request to the Azure Automation Runbook the job execution status isn’t going to be returned. It will only give us the Runbook Job ID, and as an array, annoyingly. Example below of the body returned via Postman (great tool in this new world of API everything).

Hopefully that’s enough context for how we can do better at handling this.

Advanced Pattern

What we need to manage this properly, building on the first pattern, is:

  • An additional downstream Web activity that hits the Azure Management API to check on the status of the Runbook job.
  • An ADFv2 Until activity that waits for the Runbook job is complete, checking its status with each iteration. Affectively polling for when the SSIS IR has started.
  • A third Web activity to allow our Azure Management API call to authenticate with a valid token.

Within the Until activity:

In this approach we are still waiting, but in a controlled way without any assumptions.

Firstly, lets think about the Management API token (warning: from a business intelligence persons perspective). For Azure Management REST calls an access token with a short expiry date is assigned to a service principal. Typically these tokens will expire after 24 hours so we need to get a new token at runtime if this pipeline is triggered daily, for example. For details on creating an access token in the MS docs here. Next, we can perform a POST request as below from ADF to the Microsoft OAuth2 service, but the body of the request needs to be application/x-www-form-urlencoded, a JSON body won’t work.

https://login.microsoftonline.com/YOUR_TENANT_ID/oauth2/token

Body content type defined in the header:

In the body itself we need to pass to following.

Note; for ADF a body that is URL Encoded may need values escaping. For example, from our secret if it ends with an equals sign, escape to %3D or if it contains a plus sign, escape to %2B .

The POST body will then appear like something like this:

grant_type=client_credentials&client_id=YOUR_SERVICE_PRINCIPAL_ID&client_secret=YOUR_SECRET&resource=https://management.azure.com

Sorry if that’s really dry to walk through!… For a BI person that just wants to control when SSIS packages are ran, without costing the earth, this was a little outside of my normal comfort zone. If only there was some sort of role out there where by administrators of databases and data services could help with this! 🙂

Moving on…

The Until inner Web activity then does a GET to the management API with the job ID output from the up stream ‘Start SSIS IR’ activity appended to the URL dynamically.

Carriage returns for ease of reading only:

https://management.azure.com/
subscriptions/YOUR_SUBSCRIPTION_ID/
resourceGroups/YOUR_RESOURCE_GROUP/
providers/Microsoft.Automation/
automationAccounts/YOUR_AUTOMATION_ACCOUNT/
jobs/@{activity('Start SSIS IR').output.JobIds[0]}?
api-version=2015-10-31

The header in the management call will include the authorisation token handle above as a bearer token, like this:

Bearer @{activity('Get Mgr API Token').output.access_token}

Header key, authorisation, spelt wrong 🙂

Result

When the ADF pipeline runs we now get the required manageable behaviour.

  • Call the runbook PowerShell to start the SSIS IR cluster.
  • Check/poll and wait for the PowerShell (runbook job) to complete.
  • Run the SSIS package(s).
  • Call the runbook PowerShell to stop the SSIS IR cluster.

 

I hope this approach helped and saved you some Azure pennies.

Comments on the management call handling welcome.

Many thanks for reading.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.