Execute Any Azure Data Factory Pipeline with an Azure Function

Following on from a previous blog post I wrote a few months ago, where I got an Azure Data Factory pipeline run status with an Azure Function (link below), I recently found the need to create something very similar: a way to execute any pipeline from an Azure Function.

https://mrpaulandrew.com/2019/11/21/get-any-azure-data-factory-pipeline-run-status-with-azure-functions/

Happily, this pipeline execution is basically the example provided by Microsoft in the documentation for the Data Factory .NET SDK (link also below). Given this, I’m not taking any credit for the bulk of the function code. However, I did need to extend the body of the request so the function can accept any number of pipeline parameters.

https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-dot-net

The reason for needing such an Azure Function is that currently the Data Factory activity to execute another pipeline is not dynamic. The name of the downstream pipeline being called cannot be driven by metadata, which upsets me greatly; everything should be dynamic 🙂

Replacing this activity with an Azure Function activity is less than ideal as this then presents the following challenges:

  • Making the Azure Function block and wait until the pipeline completes potentially requires a long-running Durable Function.
  • Calling an Azure Function means paying for additional compute to achieve the same behaviour we already pay for when Data Factory is used directly.
  • Authentication needs to be handled from Data Factory to the Azure Function App and then from the Azure Function back to the same Data Factory. This should be done via our application settings and handled in our release pipelines, rather than passed in the function body. No trolls please, I know.

With an understanding of these important caveats here’s an overview of the solution.

 

Note: I used .NET Core 3.0 for the function below.


Execute Pipeline

For the function itself, hopefully this is fairly intuitive once you’ve created your DataFactoryManagementClient and authenticated.

The only thing to be careful of is not using the CreateOrUpdateWithHttpMessagesAsync method by mistake. Make sure it’s CreateRunWithHttpMessagesAsync. Sounds really obvious, but when you get code drunk the names blur together and the very different method overloads will have you confused for hours!… According to a friend 🙂
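As a sketch of what this looks like, here is the authenticate-then-run pattern from the Microsoft quickstart linked above, wrapped in a hypothetical helper method (the parameter values would come from the deserialised request body; RunPipelineAsync is my illustrative name, not part of the SDK):

```csharp
// NuGet packages assumed: Microsoft.Azure.Management.DataFactory,
// Microsoft.IdentityModel.Clients.ActiveDirectory, Microsoft.Rest.ClientRuntime
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;

public static class PipelineRunner
{
    public static async Task<string> RunPipelineAsync(
        string tenantId, string applicationId, string authenticationKey,
        string subscriptionId, string resourceGroup, string factoryName,
        string pipelineName, Dictionary<string, object> pipelineParameters)
    {
        // Authenticate against Azure AD using the service principal details.
        var context = new AuthenticationContext(
            "https://login.microsoftonline.com/" + tenantId);
        var credential = new ClientCredential(applicationId, authenticationKey);
        var token = await context.AcquireTokenAsync(
            "https://management.azure.com/", credential);

        var client = new DataFactoryManagementClient(
            new TokenCredentials(token.AccessToken))
        {
            SubscriptionId = subscriptionId
        };

        // Note: CreateRunWithHttpMessagesAsync, NOT CreateOrUpdateWithHttpMessagesAsync.
        var runResponse = await client.Pipelines.CreateRunWithHttpMessagesAsync(
            resourceGroup, factoryName, pipelineName,
            parameters: pipelineParameters);

        return runResponse.Body.RunId;
    }
}
```

This only starts the run and returns the run ID; checking the run status afterwards is covered in the earlier blog post linked at the top.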

Body JSON without Pipeline Parameters


{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline"
}

Body JSON with Pipeline Parameters

The pipelineParameters attribute can contain as many parameters as you want; the function simply ingests them and passes them to the overloaded method CreateRunWithHttpMessagesAsync as a Dictionary of string and object.

Data Factory doesn’t validate the parameter names, so you can send anything. The function just assumes the names passed are identical to the names of the actual pipeline parameters. If so, the values are simply mapped across.

{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline",
  "pipelineParameters":
  {
    "TestParam1": "Frank",
    "TestParam2": "Harry"
  }
}
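As an illustrative sketch (assuming Newtonsoft.Json, which the Azure Functions templates use for request parsing; requestBody is a hypothetical variable holding the raw request string), the optional parameters object can be lifted from the body like this:

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// requestBody is the raw JSON string received by the function.
JObject body = JObject.Parse(requestBody);

// If the optional pipelineParameters attribute is present, convert it to the
// Dictionary<string, object> expected by CreateRunWithHttpMessagesAsync.
// Values are passed through as-is; Data Factory maps them by name.
Dictionary<string, object> pipelineParameters = null;
if (body["pipelineParameters"] != null)
{
    pipelineParameters = body["pipelineParameters"]
        .ToObject<Dictionary<string, object>>();
}
```

If the attribute is absent, passing null for the parameters argument starts the pipeline with its default parameter values.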

Output


{
  "PipelineName": "WaitingPipeline",
  "RunIdUsed": "0d069026-bcbc-4356-8fe8-316ce5e07134",
  "Status": "Succeeded"
}


Just Give Me The Code!

Ok! Here you go… ExecutePipeline.cs

The full solution is in the same Blob Support Content GitHub repository if you’d like to use the Visual Studio Solution.

I hope you found this post helpful.

Many thanks for reading.

 

26 thoughts on “Execute Any Azure Data Factory Pipeline with an Azure Function”

  1. Hi Paul… We have a requirement for parallel execution of ADF pipelines. Our Azure Functions run in parallel independently (multiple instances), hence we are thinking of using the function instances to trigger the pipeline instances in parallel. The other option is to use the tumbling window trigger. Which option do you suggest is more cost efficient?


    1. Hi Manish, I think the best answer I can offer is for you to check out my open source code project (Creating a Simple Metadata Driven Framework for Executing Azure Data Factory Pipelines). GitHub link: https://github.com/mrpaulandrew/ADF.procfwk

      This post about Functions executing pipelines was a prerequisite to me creating the framework wrapper. Within the framework pipelines executed within a processing stage will always run in parallel. Check it out and let me know if this solves your problem.
      Cheers
      Paul


  2. Hi Paul, would you know if it’s possible to execute a pipeline run in Azure Functions using managed identity? Instead of Service Principal? Many thanks.


      1. Thank you Paul! I wonder if using Azure Functions MI means:
        – We simply assign the MI “data factory contributor” role in ADF
        – No longer need to use SPN and keep SPN details in SQL DB
        – Changing the way ADF pipeline (child) works? no longer needed I suppose?
        – How do we create adf client and execute the pipeline run in Functions? i.e. authentication key?
        Much appreciated for the help!!!


      2. It’ll probably need the owner role.
        I don’t do this in my processing framework as this limits you to using a single Data Factory for Worker pipelines and it becomes an extra step to handle at deployment time.
        I haven’t looked into how. Let me know


  3. Hi Paul,
    I am getting the following error when trying to invoke an ADF pipeline from an Azure PowerShell function: ERROR: Invoke-AzDataFactoryV2Pipeline : Object reference not set to an instance of an object. It looks like the library is missing Az.DataFactory. Any ideas on how to resolve the issue?
    Best Regards, Andrew


  4. Hi,
    I’m using the namespace Microsoft.Azure.Management.DataFactory in an Azure Function. The function is failing here. How do I add a reference to this DLL in the Azure Function? I’m using the portal to create the function.


    1. I recommend using Visual Studio or VS Code to develop the function. It makes adding NuGet packages much easier.


      1. Hi Paul,
        I’m getting the error “An unhandled exception of type ‘System.AggregateException’ occurred in mscorlib.dll” at the AcquireTokenAsync call. The authentication ID and key are correct.
        Please suggest .


      2. Mmmm, not sure, don’t think I’ve had that one before. Is your function app configured for .NET Core rather than .NET Framework?

