Execute Any Azure Data Factory Pipeline with an Azure Function

Following on from a previous blog post I wrote a few months ago, where I got an Azure Data Factory pipeline run status with an Azure Function (link below), I recently found the need to create something very similar: a function to execute any pipeline.

https://mrpaulandrew.com/2019/11/21/get-any-azure-data-factory-pipeline-run-status-with-azure-functions/

Happily, this pipeline execution is basically the example provided by Microsoft in the documentation for the Data Factory .NET SDK (link also below). Given that, I’m not taking any credit for the bulk of the function code. However, I did need to extend the body of the request for the function to accept any number of pipeline parameters.

https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-dot-net

The reason for needing such an Azure Function is that currently the Data Factory activity to execute another pipeline is not dynamic. The name of the downstream pipeline being called cannot be driven by metadata, which upsets me greatly; everything should be dynamic 🙂

Replacing this activity with an Azure Function activity is less than ideal as this then presents the following challenges:

  • Making the Azure Function block and wait until the pipeline completes potentially means a long-running Durable Function is required.
  • Calling an Azure Function means paying for additional compute to achieve the same behaviour we already pay for when Data Factory is used directly.
  • Authentication needs to be handled from Data Factory to the Azure Function App, and then from the Azure Function back to the same Data Factory. This should be done via application settings and handled in our release pipelines, rather than passed in the function body. No trolls please, I know.

With an understanding of these important caveats here’s an overview of the solution.

 

Note: I used .NET Core 3.0 for the function below.


Execute Pipeline

For the function itself, hopefully this is fairly intuitive once you’ve created your DataFactoryManagementClient and authenticated.

The only thing to be careful of is not using the CreateOrUpdateWithHttpMessagesAsync method by mistake. Make sure it’s CreateRunWithHttpMessagesAsync. Sounds really obvious, but when you get code drunk the names blur together and the very different method overloads will have you confused for hours… according to a friend 🙂
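To make that concrete, here is a minimal sketch of the create-and-run part of the function, closely following the Microsoft quickstart linked above. The helper name ExecutePipelineAsync and its parameter list are placeholders of mine; the SDK and ADAL calls are the quickstart’s.

using System.Threading.Tasks;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;

public static async Task<string> ExecutePipelineAsync(
    string tenantId, string applicationId, string authenticationKey,
    string subscriptionId, string resourceGroup, string factoryName,
    string pipelineName)
{
    // Authenticate as the service principal supplied in the request body.
    var context = new AuthenticationContext("https://login.microsoftonline.com/" + tenantId);
    var cred = new ClientCredential(applicationId, authenticationKey);
    AuthenticationResult token = await context.AcquireTokenAsync(
        "https://management.azure.com/", cred);

    // Create the management client scoped to the target subscription.
    var client = new DataFactoryManagementClient(new TokenCredentials(token.AccessToken))
    {
        SubscriptionId = subscriptionId
    };

    // CreateRun, not CreateOrUpdate! This starts an existing pipeline and
    // returns its run ID; it does not deploy or modify anything.
    CreateRunResponse runResponse = (await client.Pipelines
        .CreateRunWithHttpMessagesAsync(resourceGroup, factoryName, pipelineName)).Body;

    return runResponse.RunId;
}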

Body JSON without Pipeline Parameters


{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline"
}

Body JSON with Pipeline Parameters

The pipelineParameters attribute can contain as many parameters as you want; the function basically just ingests them and passes them to the overloaded method CreateRunWithHttpMessagesAsync as a Dictionary of string and object (see the sketch after the example below).

Data Factory doesn’t validate the parameter names, so you can send anything. It just assumes the names passed are identical to the names of the actual pipeline parameters. If so, the values are simply mapped across.

{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline",
  "pipelineParameters":
  {
    "TestParam1": "Frank",
    "TestParam2": "Harry"
  }
}
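As a hedged sketch of how that ingestion might look, assuming Newtonsoft.Json for parsing, and assuming requestBody (the raw request string) plus the client and name variables from the earlier sketch already exist:

using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactory.Models;
using Newtonsoft.Json.Linq;

// Parse the raw request body; property names match the JSON shown above.
JObject body = JObject.Parse(requestBody);

// Convert the optional pipelineParameters attribute into the
// IDictionary<string, object> the SDK method expects.
Dictionary<string, object> parameters = null;
if (body["pipelineParameters"] != null)
{
    parameters = body["pipelineParameters"].ToObject<Dictionary<string, object>>();
}

// Pass the dictionary via the overload's optional 'parameters' argument.
// Data Factory maps the values across purely by parameter name.
CreateRunResponse runResponse = client.Pipelines
    .CreateRunWithHttpMessagesAsync(
        resourceGroup, factoryName, pipelineName, parameters: parameters)
    .Result.Body;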

Output


{
  "PipelineName": "WaitingPipeline",
  "RunIdUsed": "0d069026-bcbc-4356-8fe8-316ce5e07134",
  "Status": "Succeeded"
}
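For completeness, here is a sketch of how a status like this might be produced: after CreateRun, the function polls PipelineRuns.Get until the run leaves the InProgress/Queued states, again following the quickstart pattern. Remember the blocking caveat from earlier; a long pipeline could exceed the function timeout.

using System.Threading;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;

// Poll the run status until the pipeline finishes. This is the blocking
// behaviour flagged in the caveats above.
PipelineRun pipelineRun;
while (true)
{
    pipelineRun = client.PipelineRuns.Get(resourceGroup, factoryName, runResponse.RunId);
    if (pipelineRun.Status == "InProgress" || pipelineRun.Status == "Queued")
        Thread.Sleep(15000); // check every 15 seconds
    else
        break; // Succeeded, Failed or Cancelled
}

// Shape the output JSON shown above.
var output = new
{
    PipelineName = pipelineName,
    RunIdUsed = runResponse.RunId,
    Status = pipelineRun.Status
};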


Just Give Me The Code!

Ok! Here you go… ExecutePipeline.cs

The full solution is in the same Blog Support Content GitHub repository if you’d like to use the Visual Studio solution.

I hope you found this post helpful.

Many thanks for reading.

 

Comments

  1. Hi Paul. We have a requirement for parallel execution of ADF pipelines. Our Azure Functions run in parallel independently (multiple instances), hence we are thinking of using the function instances to trigger the pipeline instances in parallel. The other option is to use the tumbling window trigger. Which option do you suggest is more cost efficient?


    1. Hi Manish, I think the best answer I can offer is for you to check out my open source code project (Creating a Simple Metadata Driven Framework for Executing Azure Data Factory Pipelines). GitHub link: https://github.com/mrpaulandrew/ADF.procfwk

      This post about Functions executing pipelines was a prerequisite to me creating the framework wrapper. Within the framework, pipelines executed within a processing stage will always run in parallel. Check it out and let me know if this solves your problem.
      Cheers
      Paul

