Using the ADFv2 Until Activity

Hello friends, with this post I wanted to give you a quick ‘how to’ guide when working with the Azure Data Factory (ADF) v2 Until activity… Having implemented the activity several times in production I’ve repeatedly found the current Microsoft documentation here falling a little short in its explanations and examples. So, I’ve created my own.

What is the Until activity?

Well, as the Microsoft people to tell us;

The Until activity provides the same functionality that a do-until looping structure provides in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.

This is fine and we understand that, but we aren’t using a programming language. We are implementing an orchestration service controlled using JSON. So, in the context of ADF I feel we need a little more information here about how we construct our pipelines via the developer UI and given that environment how do we create a conditional recursive set of activities.

create a conditional recursive set of activities.

create a conditional recursive set of activities.

create a conditional recursive set of activities.

Ok, bad joke! Break! 🙂

In ADF the Until activity acts as a parent component and wrapper for the iterations with inner child activities forming the block of things to iterate over. It is then the output(s) from those inner child activities that we need to use in the parent Until expression to establish if another iteration is required. Or, if the pipeline can continue.

For me this was initially a little confusing. ADF after all is an orchestrator that invokes other Azure things. But with an Until activity we now need to use the output of those things to set ADF’s own control flow behaviour. Comments welcome here. Is this a chicken/egg situation, or is it just me?

Anyway, let’s move on. The Until expression MUST evaluate to either true or false. How or what we evaluate here doesn’t matter. It can be any value we require to inform another iteration.

  • True = the Until activity can break from the looping, be given a ‘Succeeded’ state and the pipeline can proceed.
  • False = the Until activity will perform another iteration of its inner activities.

For completeness, there is also a third state. Failure. If, for whatever reason, the Until activity fails or times out. Then normal pipeline runs apply. Although a failure of an Until inner activity doesn’t have to fail the parent and can be used as a condition in the expression! Let’s not go there now.

Given the above understanding, let’s move on and apply it. Like me you may learn better by doing.

Scenario

You have a pipeline that copies data from an on premises SQL Server into Azure Data Lake Storage. The pipeline will be triggered on a schedule at midnight every day. However, third party upstream systems mean the SQL tables may not actually be ready at midnight and we need to wait “until” they are. The SQL database tables are loaded locally using SQL Agent jobs, therefore we want to check when a particular agent job has completed before the upload into Azure starts.

The scenario data flow in picture form:

Solution

Ok, let’s build this! We’ll go quickly with the first part, copying data into Azure. Hopefully that’s fairly familiar to most of you. If not, please check out this Microsoft tutorial for doing a similar copy of data using ADF from Azure SQLDB.

https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database


1. Firstly, we need a couple of ADF linked services with credentials allowing us to access our source and target systems. We’ll skip the Hosted IR setup.


2. Next, our datasets for the copy.

Source

A SQL Server table in my local instance.

Sink

A CSV file in an Azure Data Lake Storage account.

I reused an old dataset here for ADLs that had expressions for the directory and filename. Just assume these will be static values. Not important for now.


3. Create a simple ADF pipeline with a Copy activity using the above source and sink.

Run the copy on its own first as a test of the data movement.

That’s the setup very quickly done. To recap, we have an ADF pipeline including a Copy activity.

  • The source dataset is a table from my local SQL Server instance via a hosted IR and linked service.
  • The sink is a file in an Azure Data Lake Storage account with a linked service using a service principal to authenticate.

Now the fun part… Let’s add an Until activity so we can handle the remaining requirements in the scenario above, waiting for some local SQL Agent job to complete before loading the data into Azure.

We’ve said the pipeline is going to be triggered daily and that we can use the TSQL GETDATE() in a query that checks the state of the agent for today. However, we may want to consider driving this from a pipeline parameter. Just a thought for another time. I won’t digress.

4. The agent job and check query. For the purpose of this blog I have an agent job called ‘LocalLoader’ with a single step that is writing some dummy values into my source table.

To check the status of this job I’ve written a quick TSQL query using todays date to return the jobs status last time it ran. I’m using the job history table from MSDB so I’m cheating and not detecting if the job is running or not started, but instead I’ve added a union in the results to a static value. Mainly because I always want something returned to ADF. No tolls please, I accept this query isn’t perfect given the other MSDB tables available. There are plenty of better agent job check queries out there. The point of this post is about the ADF Until activity… I’ll make the query small so you can’t read it without enlarging 🙂 it’ll do for this example.


5. Data Factory. Finally let’s get to the point. The context for needing the Until activity has been firmly set.

Add the Until activity before your Copy activity. Nothing has been configured in it yet, apart from renaming.


6. Build the child components first. Within the Until activity add a Lookup activity and a Wait activity.

  • The lookup is going to run the SQL Agent check query above.
  • The wait will just form part of the iteration block. This totally optional and more of a cost consideration for your ADF activity executions. In my case I added a wait of 2 minutes between checks (executions of the lookup query). There is also some argument here if the wait should come before the lookup or after. Again, I won’t digress.


7. Give the Lookup activity a dataset to use and a TSQL query (the one above or something else). The dataset for the lookup can be the same as your source if you want as we are overriding it with a query. Or create a new one. Either way, the Lookup really just requires a connection to SQL Server via the linked service.

Note; the TSQL query returns the field ‘JobOutcome’

You’ll see in the screen shot, I’ve set the Lookup activity to return the ‘First Row Only’. This will affect the output JSON that we use later. My query has ‘TOP 1’ added so this is already implied. However, we want to be explicit and tell the Lookup activity that is doesn’t need to return an array of values.


8. Now, let’s go back up a level and configure the Until activity itself. This is where the secret sauce comes in. We need to capture the output of the inner Lookup activity and use it as a condition that can be evaluated to true or false by the Until expression. AKA the programmatic Do Until something is X.

The TSQL query will return ‘Succeeded’ or ‘Unknown’. We therefore need to compare the returned value using the ADF expression function ‘equals’ and to capture the Lookup output we need to use the undocumented ADF expression function ‘body’.

  • ‘Equals’ compares any two values, returning true if they are the same and false if they aren’t.
  • ‘Body’ (from what I can tell) allows us to query the output of another activity that isn’t a direct descendant of the current control flow. To try and clarify with a comparison, if we had a Lookup activity going into a ForEach activity we would do @activity('Query MSDB').output.firstRow.JobOutcome. This syntax doesn’t work in this context. We have to use ‘Body’.

The correct expression should therefore look like this, with the question mark after the Body parentheses:

@equals('Succeeded',body('Query MSDB')?.output.firstRow.JobOutcome)

In my equals expression the first value is a static string ‘Succeeded’ and the second value is the result from the TSQL query.

Apply this to your Until activity expression using the ‘Add Dynamic Content’ panel, as below.


9. Run the pipeline and check the output.

Use the pipeline actions to ensure your Lookup activity output is as expected and that your hitting the correct level of the JSON in the Until expression.


That’s it for now. I hope you found this post helpful.

In my next post I want to build on this knowledge of using ‘Until’ and passing around activity outputs to control the start up and stopping of an SSIS IR cluster. Watch this space!

Many thanks for reading.

2 thoughts on “Using the ADFv2 Until Activity

  1. Hey Paul,

    Thanks for the wonderful post. I could not stop laughing on your jokes. 😉 No I am kidding. 😉

    However I do have a question here.

    Now I am aware about how to use body() tag in the UNTIL condition. However, do you have any idea how to update the pipeline parameter inside of the pipeline. Basically the response which I am getting through a web activity has pagination i.e. it gives the link/endpoint of the 2nd page inside the response of the first page. Hence if its the last page then the link tag in the response would be empty.

    Now that I have put the required condition in the UNTIL expression using body(), I am not sure how should I update the url parameter after the web activity with the url of the next page. Any idea?

    Like

    1. Hi Gagandeep, thanks for the comment. Sadly, I’m not aware of any way to update a pipeline parameter with such values at runtime. We recently had a very similar scenario internally. We decided to first work out the maximum number of pages, then use a ForEach activity to loop over each page in turn. The URL to the API became dynamic with page number passed from a Lookup activity array that fed into the ForEach. The approach isn’t ideal but it worked. Alternatively, use an Azure Automation Runbook with some PowerShell to concatenate the page results into some else first and using a WebHook to hit the Runbook from ADF, but this would also require some management of the Runbook as the call from the Web activity would be asynchronous. In short, I wouldn’t use Until to do this. Hope this helps. Good luck. Paul

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.