Azure Data Factory v1 & v2 Service Principal Authentication for Azure Data Lake

Sadly this post has been born out of frustration, so please accept my apologies from the outset if that comes across in the tone. That said, I imagine anyone reading it will empathise and will have come across the post while searching for an understanding of some inconsistencies when using service principals to authenticate against Azure Data Lake (ADL) using Azure Data Factory (ADF) Linked Service connections.

Let me start by stating that service principal authentication is possible for both ADL storage and ADL analytics from ADF v1 and v2. However, (here come the inconsistencies):

  • In ADFv1 session/token authentication can be used as well as service principals. In ADFv2 we can only use service principals.
  • In ADFv1 service principals are fully supported for the storage. They are not fully supported for ADL analytics. I’ll explain why shortly.
  • In ADFv1, when performing cloud-to-cloud data movements, an execution location must be set explicitly on activities if the ADF and ADL services are in different Azure regions.
  • In ADFv2 I’m pleased to say service principals work as expected. In ADFv1 it was a feature introduced later and doesn’t appear to have been tested properly.
  • In ADFv2 the Linked Service attribute for Tenant expects the domain ID. In ADFv1 the attribute expects the domain name (example values after this list).
  • In ADFv2 service principal keys are visible as plain text in the JSON under the new secure string attribute (ironic). In ADFv1 they are converted to stars if viewed post deployment.
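To illustrate the tenant point, the same attribute takes a differently shaped value in each version. Both values below are placeholders, not real identifiers:

ADFv1 (domain name):

"tenant": "yourdomain.onmicrosoft.com"

ADFv2 (directory/tenant ID):

"tenant": "00000000-0000-0000-0000-000000000000"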

Storage

For the storage I’m happy and less frustrated to say that all is well. But be aware of point 3 above: in ADFv2 the integration runtime auto-resolves to the Azure region of the service being called, whereas in ADFv1 you may need to pin the execution location yourself, as sketched below.
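For ADFv1, a minimal sketch of pinning the execution location on a copy activity. Treat the exact property placement as an assumption to verify against the v1 schema; the activity name and region value are only examples:

{
  "name": "CopyFromLakeToBlob",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "AzureDataLakeStoreSource" },
    "sink": { "type": "BlobSink" },
    "executionLocation": "North Europe"
  }
}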

ADFv1 JSON


{
  "name": "DataLakeStore",
  "properties": {
    "type": "AzureDataLakeStore",
    "typeProperties": {
      "dataLakeStoreUri": "Your Value Here",
      "accountName": "Your Value Here",
      "servicePrincipalId": "Your Value Here",
      "servicePrincipalKey": "Your Value Here",
      "tenant": "Your Value Here",
      "resourceGroupName": "Your Value Here",
      "subscriptionId": "Your Value Here"
    }
  }
}

ADFv2 JSON


{
  "name": "DataLakeStore",
  "properties": {
    "type": "AzureDataLakeStore",
    "typeProperties": {
      "dataLakeStoreUri": "Your Value Here",
      "servicePrincipalId": "Your Value Here",
      "servicePrincipalKey": {
        "type": "SecureString",
        "value": "Your Value Here"
      },
      "tenant": "Your Value Here",
      "subscriptionId": "Your Value Here",
      "resourceGroupName": "Your Value Here"
    }
  }
}
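Incidentally, if the plain-text key bothers you as much as it does me, ADFv2 can also pull the key from Azure Key Vault instead of embedding it as a SecureString. A sketch, assuming a Key Vault linked service named MyKeyVault holding a secret named adls-sp-key (both hypothetical names):

"servicePrincipalKey": {
  "type": "AzureKeyVaultSecret",
  "store": {
    "referenceName": "MyKeyVault",
    "type": "LinkedServiceReference"
  },
  "secretName": "adls-sp-key"
}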

Analytics

This is where the fun starts and the main reason for the blog post. As I’ve already said, this is possible in both ADFv1 and ADFv2, but it is not supported by the ADFv1 developer tools.

ADFv1 JSON


{
  "name": "DataLakeCompute",
  "properties": {
    "type": "AzureDataLakeAnalytics",
    "typeProperties": {
      "accountName": "beeeyedatalake01a",
      "dataLakeAnalyticsUri": "azuredatalakeanalytics.net",
      "servicePrincipalId": "Your Value Here",
      "servicePrincipalKey": "Your Value Here",
      "tenant": "Your Value Here",
      "subscriptionId": "Your Value Here",
      "resourceGroupName": "Your Value Here"
    }
  }
}

If you use this JSON (which is valid) in a Visual Studio 2015 Azure Data Factory project, it cannot be deployed using the publishing wizard; the wizard errors. The project will, however, build successfully in Visual Studio, albeit with some warnings about schema values.

If you try this in the Azure portal Author and Deploy blades for ADFv1, you’ll be met with similar browser IntelliSense warnings.

Currently my advice here is simply to ignore the warnings and click deploy anyway. It will succeed eventually, though it may take a few attempts.

What I have also encountered is that if you change an existing ADL linked service from session/token authentication to a service principal, the change doesn’t take effect immediately. Some activities will throw errors similar to the one below. Sadly, under the hood there is something very inconsistent about updating linked services.

The slice (start: 01/08/2018 00:00:00, end: 01/09/2018 00:00:00) failed because these linked services are in failed state: DataLakeCompute. Please update the linked services to fix the provisioning issues and re-run the slice by setting its status to PendingExecution.

I’ve yet to understand why this happens and all I can say is: try, try again!

Furthermore, you of course can’t delete and recreate the linked service, because it’s used by existing activities in your ADF pipelines. Therefore, the best and cleanest workaround I can offer is to deploy a newly named linked service that uses the service principal, then redeploy all your activities to point to the new linked service, and finally delete the old version. A sketch of the repointing follows.
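For example, with a hypothetical replacement linked service named DataLakeCompute2, each affected ADFv1 activity just needs its linkedServiceName switched over before the old DataLakeCompute is removed (the activity name, script path, and store name here are placeholders):

{
  "name": "UsqlActivity",
  "type": "DataLakeAnalyticsU-SQL",
  "linkedServiceName": "DataLakeCompute2",
  "typeProperties": {
    "scriptPath": "scripts/myscript.usql",
    "scriptLinkedService": "DataLakeStore"
  }
}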

One thing I can say for sure: if you make the linked service change and you have U-SQL writing to tables in the ADL Analytics service, you’ll definitely need to redeploy activities with a new version, otherwise you may encounter this error:

Error: Data Lake Analytics service threw unexpected exception: Invalid URI: The hostname could not be parsed..

But it won’t occur straight away! I posted this one on Stack Overflow because the inconsistencies stumped me that much!

https://stackoverflow.com/questions/48230569/data-lake-analytics-invalid-uri-the-hostname-could-not-be-parsed

You can see why I said this post was born out of frustration.

ADFv2 JSON


{
  "name": "DataLakeCompute",
  "properties": {
    "type": "AzureDataLakeAnalytics",
    "typeProperties": {
      "accountName": "Your Value Here",
      "servicePrincipalId": "Your Value Here",
      "servicePrincipalKey": {
        "type": "SecureString",
        "value": "Your Value Here"
      },
      "tenant": "Your Value Here",
      "subscriptionId": "Your Value Here",
      "resourceGroupName": "Your Value Here"
    },
    "connectVia": {
      "referenceName": "Your Value Here",
      "type": "IntegrationRuntimeReference"
    }
  }
}
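To show the linked service in use, here’s a hedged sketch of an ADFv2 U-SQL pipeline activity referencing it (the activity name, script path, and storage linked service name are hypothetical):

{
  "name": "UsqlActivity",
  "type": "DataLakeAnalyticsU-SQL",
  "linkedServiceName": {
    "referenceName": "DataLakeCompute",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "scriptPath": "scripts/myscript.usql",
    "scriptLinkedService": {
      "referenceName": "DataLakeStore",
      "type": "LinkedServiceReference"
    }
  }
}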

Add User Wizard

As a side note, when adding any service principal to your Azure Data Lake service, be sure to use the new Add User wizard available in the Data Lake Analytics portal blades. It makes life much easier.

Rant over I think. I hope you found this useful and that the above eases your pain when living on the bleeding edge of development with services that aren’t yet perfected.

Many thanks for reading.

3 thoughts on “Azure Data Factory v1 & v2 Service Principal Authentication for Azure Data Lake”

  1. Working with Azure Data Factory has been an absolute mess so far. I started with v2 and just hit wall after wall. I didn’t realize there were GUI interfaces in v1, maybe I should have started there but geez this has been painful.

    Any mentions of a new release coming or some sort of help on the horizon?


    1. Hi Dan, I can confirm it is going to get better and easier to work with the service soon. I’m afraid I’m bound by an NDA from saying any more at this point. Be patient and keep overcoming those barriers for a little while longer. Cheers, Paul


  2. Thank you! Creating/cloning a new (identical) linked service to connect to my data lake store and referencing the new linked service in the impacted datasets worked! Thanks for posting this or I’d have been stuck.

