Execute Any Azure Data Factory Pipeline with an Azure Function

Following on from a previous blog post I wrote a few months ago, where I got an Azure Data Factory pipeline run status with an Azure Function (link below), I recently found the need to create something very similar: a function to execute any pipeline.

https://mrpaulandrew.com/2019/11/21/get-any-azure-data-factory-pipeline-run-status-with-azure-functions/

Happily, this pipeline execution is basically the example provided by Microsoft in the documentation for the Data Factory .NET SDK (link also below). Given this, I’m not taking any credit for the bulk of the function code. However, I did need to extend the body of the request so the function could accept any number of pipeline parameters.

https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-dot-net

The reason for needing such an Azure Function is that, currently, the Data Factory activity to execute another pipeline is not dynamic. The name of the downstream pipeline being called cannot be driven by metadata, which upsets me greatly; everything should be dynamic 🙂

Replacing this activity with an Azure Function activity is less than ideal as this then presents the following challenges:

  • Making the Azure Function block and wait until the pipeline returns means a potentially long-running Durable Function may be required.
  • Calling an Azure Function means paying for additional compute to achieve the same behaviour we are already paying for in Data Factory when the Execute Pipeline activity is used directly.
  • Authentication needs to be handled from Data Factory to the Azure Function App, and then from the Azure Function back to the same Data Factory. This should be done via our application settings and handled in our release pipelines, rather than passed in the function body. No trolls please, I know.

With an understanding of these important caveats, here’s an overview of the solution.

 

Note: I used .NET Core 3.0 for the below function.


Execute Pipeline

For the function itself, hopefully this is fairly intuitive once you’ve created your DataFactoryManagementClient and authenticated.

The only thing to be careful of is not using the CreateOrUpdateWithHttpMessagesAsync method by mistake. Make sure it’s the CreateRunWithHttpMessagesAsync method. Sounds really obvious, but when you’re code drunk, names blur together and the very different method overloads will have you confused for hours!… According to a friend 🙂
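To make the distinction concrete, here’s a minimal sketch of the call you want, assuming an authenticated DataFactoryManagementClient and illustrative local variables already parsed from the request body:

```csharp
// Sketch only; variable names are illustrative, not the exact ones in the file.
var runResponse = await client.Pipelines.CreateRunWithHttpMessagesAsync(
    resourceGroup, factoryName, pipelineName, parameters: parameters);

// NOT CreateOrUpdateWithHttpMessagesAsync - that overload deploys a pipeline
// definition, it doesn't run anything.
string runId = runResponse.Body.RunId;
```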

Body JSON without Pipeline Parameters


{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline"
}

Body JSON with Pipeline Parameters

The pipelineParameters attribute can contain as many parameters as you want; the function basically just ingests them into the overloaded method CreateRunWithHttpMessagesAsync as a Dictionary of string and object.

Data Factory doesn’t validate the parameter names, so you can send anything. It just assumes the names passed are identical to the names of the actual pipeline parameters. If so, the values are simply mapped across.

{
  "tenantId": "1234-1234-1234-1234-1234",
  "applicationId": "1234-1234-1234-1234-1234",
  "authenticationKey": "Passw0rd123!",
  "subscriptionId": "1234-1234-1234-1234-1234",
  "resourceGroup": "CommunityDemos",
  "factoryName": "PaulsFunFactoryV2",
  "pipelineName": "WaitingPipeline",
  "pipelineParameters":
  {
    "TestParam1": "Frank",
    "TestParam2": "Harry"
  }
}
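Inside the function, mapping that pipelineParameters attribute into the SDK call might look something like this sketch, assuming Newtonsoft.Json and a hypothetical body variable holding the parsed request:

```csharp
// Sketch: 'body' is assumed to be the request body parsed into a JObject.
Dictionary<string, object> parameters = null;

if (body["pipelineParameters"] != null)
{
    // Every attribute under pipelineParameters becomes a dictionary entry,
    // passed straight through to CreateRunWithHttpMessagesAsync by name.
    parameters = body["pipelineParameters"].ToObject<Dictionary<string, object>>();
}
```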

Output


{
  "PipelineName": "WaitingPipeline",
  "RunIdUsed": "0d069026-bcbc-4356-8fe8-316ce5e07134",
  "Status": "Succeeded"
}


Just Give Me The Code!

Ok! Here you go… ExecutePipeline.cs
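As a condensed sketch of what ExecutePipeline.cs contains (based on the quickstart-style code linked above, not the exact file; the NuGet packages assumed are Microsoft.NET.Sdk.Functions, Microsoft.Azure.Management.DataFactory and Microsoft.IdentityModel.Clients.ActiveDirectory):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Newtonsoft.Json.Linq;

public static class ExecutePipeline
{
    [FunctionName("ExecutePipeline")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        ILogger log)
    {
        // Parse the request body described above.
        JObject body = JObject.Parse(await new StreamReader(req.Body).ReadToEndAsync());

        // Authenticate with the service principal details provided.
        var context = new AuthenticationContext(
            "https://login.microsoftonline.com/" + (string)body["tenantId"]);
        var token = await context.AcquireTokenAsync(
            "https://management.azure.com/",
            new ClientCredential((string)body["applicationId"],
                                 (string)body["authenticationKey"]));

        var client = new DataFactoryManagementClient(
            new TokenCredentials(token.AccessToken))
        {
            SubscriptionId = (string)body["subscriptionId"]
        };

        // Optional pipeline parameters, mapped across by name.
        var parameters = body["pipelineParameters"]?.ToObject<Dictionary<string, object>>();

        // Create the pipeline run.
        var runResponse = await client.Pipelines.CreateRunWithHttpMessagesAsync(
            (string)body["resourceGroup"],
            (string)body["factoryName"],
            (string)body["pipelineName"],
            parameters: parameters);

        // 'Succeeded' here means the create-run call succeeded; use the
        // companion 'get status' function from the earlier post to track the run.
        return new OkObjectResult(new
        {
            PipelineName = (string)body["pipelineName"],
            RunIdUsed = runResponse.Body.RunId,
            Status = "Succeeded"
        });
    }
}
```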

The full solution is in the same Blob Support Content GitHub repository if you’d like to use the Visual Studio Solution.

I hope you found this post helpful.

Many thanks for reading.

 

55 thoughts on “Execute Any Azure Data Factory Pipeline with an Azure Function”

  1. Hi Paul. We have a requirement for parallel execution of ADF pipelines. Our Azure Functions run in parallel independently (multiple instances), hence we are thinking of using the function instances to trigger the pipeline instances in parallel. The other option is to use the tumbling window trigger. Which option do you suggest is more cost efficient?


    1. Hi Manish, I think the best answer I can offer is for you to check out my open source code project (Creating a Simple Metadata Driven Framework for Executing Azure Data Factory Pipelines). GitHub link: https://github.com/mrpaulandrew/ADF.procfwk

      This post about Functions executing pipelines was a prerequisite to me creating the framework wrapper. Within the framework pipelines executed within a processing stage will always run in parallel. Check it out and let me know if this solves your problem.
      Cheers
      Paul


  2. Hi Paul, would you know if it’s possible to execute a pipeline run in Azure Functions using a managed identity instead of a service principal? Many thanks.


      1. Thank you Paul! I wonder if using the Azure Functions MI means:
        – We simply assign the MI the “data factory contributor” role in ADF
        – We no longer need to use an SPN and keep the SPN details in a SQL DB
        – Changing the way the ADF (child) pipeline works? No longer needed, I suppose?
        – How do we create the ADF client and execute the pipeline run in Functions, i.e. what about the authentication key?
        Much appreciated for the help!!!


      2. It’ll probably need the Owner role.
        I don’t do this in my processing framework as this limits you to using a single Data Factory for Worker pipelines, and it becomes an extra step to handle at deployment time.
        I haven’t looked into how. Let me know.


      3. Hi Paul, I know it’s been a while since you wrote this article but I stumbled on it while looking for how to trigger an ADF using a managed identity. Your code sample uses the id/secret of an SPN. How do I need to change the code to use the managed identity of the running function? That way I don’t have to store/transmit any secrets.


  3. Hi Paul,
    I am getting the following error when trying to invoke an ADF pipeline from an Azure PowerShell function: ERROR: Invoke-AzDataFactoryV2Pipeline : Object reference not set to an instance of an object. It looks like the Az.DataFactory library is missing. Any ideas on how to resolve the issue?
    Best Regards, Andrew


  4. Hi,
    I’m using the namespace Microsoft.Azure.Management.DataFactory in an Azure Function, and the function is failing here. How do I add a reference to this DLL in the Azure Function? I’m using the portal to create the function.


    1. I recommend using Visual Studio or VS Code to develop the function. It makes adding NuGet libraries much easier.


      1. Hi Paul,
        I’m getting the error ‘An unhandled exception of type System.AggregateException occurred in mscorlib.dll’
        at the AcquireTokenAsync function call. The authentication ID and key are correct.
        Please suggest.


      2. Mmmm, not sure, I don’t think I’ve had that one before. Is your function app configured for .NET Core rather than .NET Framework?


  5. I’m going to write an error handling module for ADF pipelines. We are planning to use the same module for all pipelines.


  6. Trying to execute the function to run an existing pipeline fails with error:
    System.Private.CoreLib: Exception while executing function: ADFAutomationExecutor. System.Private.CoreLib: One or more errors occurred. (Client IP not authorized to access the API.). Microsoft.Azure.Management.DataFactory: Client IP not authorized to access the API.

    Please help.


  7. “The reason for needing such an Azure Function is because currently the Data Factory activity to execute another pipeline is not dynamic. The name of the downstream pipeline called can not be driven by metadata which upsets me greatly, everything should be dynamic”

    Any reason we can’t use the Web Activity or Web Hook in ADF to run another pipeline in another ADF? https://docs.microsoft.com/en-us/rest/api/datafactory/pipelines/createrun

    I am trying it at the moment. I am creating the URL using dynamic content, using the subscription/resource group/data factory name/pipeline name, passing these in as they are stored in a table.


    1. No reason you can’t. Functions just gives you more control over pipeline parameters being passed and the ability to use Key Vault.
      Logic Apps is the third option.
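      For reference, the createRun endpoint from the REST docs linked above has this shape, so each segment can be built with dynamic content:

      ```
      POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01
      ```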


      1. Ahhh OK, cool thanks Paul. I guess you just have to set the Body up correctly to pass in parameters.

        I had a wee test and got an error that I was missing the Authentication header. I see we can add headers, so I will set one up tomorrow when back at work. I take it the best idea with authentication, though, is to set up a “service account” (service principal), then use that as the authentication mechanism? If I do this, I can’t seem to find how I would set up the authentication header correctly. Do you know, or have a sample you can give me? Information was a little sparse on this!

        If I can’t get that working, I will swap to a function app or logic app. Thanks for your help! Brent


  8. Hi Paul, thanks for the article.

    What is the AuthenticationKey? Is this the authentication key from an integration runtime, or is it something that’s created through AAD?


  9. Hi Paul, thanks. I am getting the below exception while running my code.
    Exception while executing function: ExecutePipeline. TriggerPipeline: Could not load file or assembly ‘Microsoft.IdentityModel.Clients.ActiveDirectory, Version=5.2.9.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35’. Could not find or load a specific file. (Exception from HRESULT: 0x80131621). System.Private.CoreLib: Could not load file or assembly ‘Microsoft.IdentityModel.Clients.ActiveDirectory, Version=5.2.9.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35’.


  10. Thanks for this article; Very useful.
    I was wondering if there is a way the Azure Function can be started just before being called from ADF and then stopped immediately after the call to the function is completed?


  11. Your article is so useful! Thanks a lot!
    It seems that my ADF pipelines take more time to execute and finish when launched by an Azure Function.
    For instance: one ADF pipeline containing only a Get Metadata activity takes 16 seconds to be launched and executed, instead of 3 seconds when launched directly in ADF.
    Have you already noticed this performance issue?


  12. Hi Paul thanks for this article;
    I’m connecting to the Power BI Service via a service principal in my Azure Function with PowerShell, and I’m pushing the logs out to blob storage.
    I want the processing to be dynamic, but one of the constraints I have is to make the call to these Azure Functions in Azure Data Factory.
    Is there a way to do that?


  13. Hi Paul

    I am getting the below error when I execute the ‘parent’ pipeline:

    Error code: ActionFailed
    Failure type: User configuration issue
    Details: Activity failed because an inner activity failed
    Source: Pipeline 02-Parent

    I am getting the inner activity failure on all Azure Function calls. This happened after I was getting an ‘Unauthorised’ error, which got fixed by giving a suitable permission. I have already deployed twice but am facing the same issue. Please help.


    1. Hi, it sounds like you’ve got a permissions issue somewhere: either between Data Factory and the Functions app, or between the Functions app and your Key Vault, depending on your setup. Also, try running the Functions app locally and triggering it with Postman using the same details; this will help with debugging.


  14. Hi Paul,
    Thanks for this article.
    I want to pass an output parameter to the Azure Data Factory pipeline. Can you please suggest a solution?
    e.g. I want to pass a parameter as shown below:
    pipelineParameters:
    {
    “emp_id”: 1,
    “emp_name”: “Havells”,
    “emp_dept”: “IT”
    }

    When I tried to read that parameter via ‘pipelineRun.Parameters’, it gives me an error like this: System.Collections.Generic.Dictionary`2[System.String,System.String]


  15. Hi, how do we change the name, or rather the description, of the Azure pipeline? I have one generic pipeline called Master and I would like this name to change when I execute it from the Function App. Is that possible by editing the header?


  16. Hi, I have one generic pipeline that I execute via this process and it works well. What I would like to do is change the name of that pipeline during execution (while calling it), in order to see the run in the GUI by name: say I execute the generic pipeline twice, one of the names must become 1 and the other 2, though the pipeline is the same one. Is this possible, maybe by passing headers?


  17. Hello Paul,

    Thank you for the detailed blog. However, if I am blocking all public endpoint access, so that I cannot reach management.azure.com where the ADF needs to be executed and its status retrieved, can we still use this?


  18. Hello Paul,
    Just want to check one thing: for calling a pipeline dynamically using an Azure Function, do the pipelines need to be published before running from the Azure Function, or will it work if we just save the pipelines and run them using the Azure Function? I am getting an ‘entity not found’ error in the function.


  19. When I try to trigger the ADF pipeline from a .NET 6 web app, I get the following error – CloudException: The client ” with object id ” does not have authorization to perform action ‘Microsoft.DataFactory/factories/pipelines/createRun/action’ over scope ‘/subscriptions/{subId}/resourceGroups/{resGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{PipelineName}’ or the scope is invalid.
    The client ID mentioned in the error is not the App ID.

