Data Mesh vs Azure –
Theory vs practice
Use the tag Data Mesh vs Azure to follow this blog series.
As a reminder, the four data mesh principles:
- domain-oriented decentralised data ownership and architecture.
- data as a product.
- self-serve data infrastructure as a platform.
- federated computational governance.
Source reference: https://martinfowler.com/articles/data-mesh-principles.html
A Quick Recap
In Part 1, Part 2 and Part 3 of this blog series we defined our Azure data mesh nodes and edges. We concluded that Azure Resource Groups can house our data product nodes within the mesh, and for our edges (interfaces) we established the following working definitions and potential Azure resources:
- Primary – data integration/exchange. Provided by a SQL Endpoint.
- Secondary – operational reporting and logging. Provided by Azure Log Analytics as a minimum.
- Tertiary – resource (network) connectivity. Delivered using Azure VNet Peering in a hub/spoke setup.
Moving on to part 4 of the series, I now want to focus on the third data mesh principle.
3. Self-Serve Data Infrastructure as a Platform
This principle is very broad, so I want to break down the theory vs practice as before. Self-service is always a goal in any data platform, and the normal thing for analytics is to focus on it within the context of data consumption, whereby a semantic layer technology can be used in a friendly, business-orientated, drag-and-drop environment to create dashboards and reports.
However, my interpretation of ‘self-serve’ for a data mesh architecture goes further than just the dashboard creation use case. It should apply not just at the data consumption layer, but at all layers within the solution and, for clarity, not just to the data itself. Hence the term in this principle: ‘data infrastructure as a platform’. This unlocks the deeper implication of the principle: for a data product, all aspects of the platform can be consumed in a self-service manner from a series of predefined assets. Think of this more like an internal marketplace or catalogue of assets, delivering everything the data product needs to enable a new node within the wider data mesh.
The other advantage of taking this stance, often missed, is that you reduce the human effort required to deliver a data product. Abstraction means a senior developer can create templates/assets that enable less technical business users and more generic engineers to deploy data products in a simplified, configuration-driven way. This ultimately unblocks some of the resourcing challenges faced within the industry regarding demand for skills.
Policies & Platform Wrappers
To facilitate this requirement, I want to focus on (and extend) what I labelled in my overall data mesh diagram as the ‘Policies & Platform Wrappers’, which sit under the ‘Core Services’.
Then, to further break down the practical implications of this I’m going to focus on the infrastructure. Deploying whatever is required by a data product within a respective node.
- What flavour of Git repository should we use?
- What flavour of resource template should we work with? ARM, Terraform or Bicep?
- What other assets should we include in a template marketplace?
- How should the deployment hang together as a pipeline?
- What gateways are required when approving a new data product?
- Does this depend on a green field project or some version of brown field?
In this area I’ve called upon my very excellent beardy friend Mr Rob Sewell to offer an opinion on how we deliver a ‘data platform as a service’ at an infrastructure level with templates/assets, building on the questions I’ve raised above.
Deploying a Data Platform as a Service
Thank you, Paul. Is deploying the correct word here? Or building? Or CI/CD? Or (whisper it) DevOps? Whichever terminology we use, I would like you to think about it as automation: moving away from next, next, next or click, click, click to defining infrastructure in code and deploying it in a consistent and reliable manner. There are many, MANY reasons for doing this that are outside the scope of this blog series, in addition to Paul’s requirement for abstraction in the context of a data mesh architecture.
Let’s take the best-case scenario and make some assumptions, as follows:
- We are only deploying to Azure.
- We have complete control over all infrastructure and approval gates.
- We are in a totally green field project.
With those restrictions in place, we can begin to define the requirements for the tooling best placed to do this. As I write, in February 2022, my choices would be the following:
- Azure DevOps – Source Control, Approval Gates, Artifact Creation, Release Pipeline. For controlling the state of our infrastructure in code and enabling consistent and reliable deployment of known infrastructure and changes to infrastructure.
- Enabling collaborative development. (Azure Repos).
- Peer reviewed changes and approval gates (Pull Requests and Pipelines).
- Automatic validation of changes (Continuous Integration pipeline integration in PRs).
- Automated controlled reportable deployment (Azure Pipelines).
- Wiki pages supporting documentation and knowledge sharing (Markdown).
- Azure Bicep – a domain-specific language for defining Azure infrastructure, which compares the existing infrastructure in Azure with the required infrastructure (code) and makes the necessary changes.
- Modules – template definition of base infrastructure, data product nodes (Resource Groups) and individual resource types.
- Version controlled infrastructure artifacts (Azure Bicep Repository).
- Naming convention enforcement.
- Data location restrictions and other organisational requirements enforcement.
- Visual Studio Code – Cross Platform IDE
- Linter and examples for Bicep and Azure DevOps.
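To make the module idea concrete, here is a minimal sketch of what a Bicep module for a data product node might look like. The module name, parameters, allowed regions, and naming convention are all illustrative assumptions, not a prescribed standard:

```bicep
// dataProductNode.bicep – illustrative module for a data product node.
// Deployed at subscription scope so it can create the Resource Group itself.
targetScope = 'subscription'

@description('Short name of the data product, used in the naming convention.')
@minLength(3)
@maxLength(12)
param productName string

@description('Deployment location, restricted to approved regions (governance).')
@allowed([
  'uksouth'
  'ukwest'
])
param location string = 'uksouth'

// Enforce the naming convention centrally rather than per team.
var resourceGroupName = 'rg-mesh-${productName}-${location}'

resource node 'Microsoft.Resources/resourceGroups@2021-04-01' = {
  name: resourceGroupName
  location: location
  tags: {
    mesh: 'data-product-node'
    product: productName
  }
}

output resourceGroupId string = node.id
```

Note how the `@allowed` decorator and the `resourceGroupName` variable bake the organisational requirements (data location restrictions, naming convention) into the template itself, so consumers cannot deviate from them.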
The responsible team uses this tooling to develop a marketplace that data teams/owners can use to create or alter the required Resource Groups (self-service of data product nodes).
The artifacts for the marketplace reside in a private Bicep module registry – https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/private-module-registry – and are retrieved using references in Bicep code that the data teams create, which is executed in Azure DevOps pipelines.
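From the data team’s side, consuming a versioned artifact from the private registry might then look like the following. The registry alias, module path and version number are assumptions for illustration:

```bicep
// main.bicep – a data team consuming a versioned module from the
// private Bicep module registry (alias configured in bicepconfig.json).
targetScope = 'subscription'

param productName string = 'sales'

// 'br/meshRegistry' is an illustrative alias for the organisation's
// Azure Container Registry that hosts the published modules.
module node 'br/meshRegistry:data-product-node:1.2.0' = {
  name: 'deploy-${productName}-node'
  params: {
    productName: productName
    location: 'uksouth'
  }
}
```

Pinning the module version (`1.2.0`) is what makes the documented version catalogue work: a data team can upgrade on their own schedule without additional communications.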
A key component of this machinery is the documentation of each artifact version, enabling the data teams to self-serve the required version of the module without additional communications. Example Azure DevOps pipelines should be provided, and even template pipeline jobs can be added to the repository. Template pipelines are especially important outside the ‘ideal world’ scenario, where additional configuration or interaction with other services (registration in DNS or an IP address range, for example) is required. The simpler this is the better; time spent here is time well spent.
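As a sketch of such an example pipeline, an Azure DevOps YAML definition that redeploys a data product node whenever its infrastructure code changes might look like this. The service connection name, paths and parameters are illustrative assumptions:

```yaml
# azure-pipelines.yml – illustrative template pipeline for deploying
# a data product node whenever its Bicep code changes.
trigger:
  branches:
    include: [main]
  paths:
    include: [infrastructure/*]

pool:
  vmImage: ubuntu-latest

steps:
  - task: AzureCLI@2
    displayName: Deploy data product node
    inputs:
      azureSubscription: mesh-service-connection # assumed service connection name
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az deployment sub create \
          --location uksouth \
          --template-file infrastructure/main.bicep \
          --parameters productName=sales
```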
In the green field world we arbitrarily granted ourselves, we will have three types of deployment:
- Base Infrastructure – Azure networks, DNS, Azure Active Directory resources such as groups and service principals, and Log Analytics (assuming the best practice of a centralised Log Analytics workspace per subscription).
- Resource Group – For creating a specified set of resources for a data product node.
- Resource – for adding additional capability to a Resource Group.
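As a sketch of the third type, extending an existing data product Resource Group with one additional capability might look like this. The choice of a storage account, and the names used, are illustrative assumptions:

```bicep
// addResource.bicep – illustrative 'Resource' deployment: extending an
// existing data product Resource Group with one additional capability.
targetScope = 'resourceGroup'

param productName string
param location string = resourceGroup().location

// A storage account as an example of an additional capability,
// following the same central naming convention.
resource storage 'Microsoft.Storage/storageAccounts@2021-09-01' = {
  name: 'st${productName}${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}
```

Because this runs at Resource Group scope, it slots into a node already created by the Resource Group deployment type without touching the base infrastructure.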
This means that:
- A services team creates and documents the artifacts to deploy the building blocks of the required infrastructure.
- The data teams consume these artifacts to create the required data mesh architecture using Azure Pipelines, which deploy the infrastructure automatically whenever code is changed.
The defined template for each type of node, interface, and edge is the building block for the data services. Deploying the artifact, or using the template pipeline with the correct parameters, ensures consistent and repeatable deployments of infrastructure, which can be integrated with the pipelines deploying the code and dashboards for the data product itself.
Of course, there needs to be additional validation, approval, and oversight in this workflow. For example, budget holders need to approve spend, security teams need to validate that the required policies are followed. These can all be built into the automation.
Of course, the real world is not like the rose-tinted view above. For tooling, use the source control and CI/CD pipelines already in use: if the organisation uses GitHub Enterprise, GitLab, Confluence and Jenkins, then building a silo that uses Azure DevOps has no benefit. Existing organisational knowledge and experience of Terraform and ARM templates should be considered before enforcing Bicep, although ARM templates and Bicep can co-exist much more easily than Bicep and Terraform.
The biggest real-world obstacle in the above assumptions is the control of the entire infrastructure. Undoubtedly, when creating the Data Mesh infrastructure, you will interact with teams, people, and processes responsible for portions of the infrastructure consumed by the mesh or that it interacts with. Collaboration is important to ensure that it is as easy as possible to enable self-service of Data Mesh infrastructure. Naming conventions of resources that are consumed by the pipelines, reduction of manual input and task queues from disparate remote teams are certainly factors to consider.
Lastly, I would be remiss if I didn’t mention PowerShell. In the (highly likely) real-world situation, PowerShell is always a go-to friend for adding the ‘glue’ to a set of deployment tasks. Ideally everything would be handled by our template configuration but, again likely, due to resource dependencies or similar, a final scripted stage of updates may be required for a given data product.
Thanks Rob for your insight and wisdom 🙂
Regardless of the tooling and technology used, the objective here remains the same: to build a template marketplace that enables data teams to self-serve the infrastructure required to create and alter a data mesh node (data product). We can therefore represent this in the following generic form as a workflow within the ‘Platform Wrappers’ portion of our overall data mesh architecture.
Otherwise, to draw our data mesh architecture in a real-world environment, we could represent it something like the below, depending on your preferred icon stack (tooling).
Generic or real, visually we achieve the practical result informed by the third principle of the data mesh architecture, where we started in this post.
Finally, to counter all this good stuff around our data product infrastructure deployment wrappers: is this abstraction a bad thing? Are we making the self-service element too easy and risking carelessness or a lack of accountability? Particularly considering Azure consumption costs if, for example, our template interface allows the easy provisioning of 50x Azure Analysis Services instances (costing circa £1,000 a month each, depending on your service tier and uptime). In this situation, governance and process must rule, with clear approval gateways built into each data product provisioned.
It depends? 🙂
Many thanks for reading, stay tuned for more in this series of blogs.