Building a Data Mesh Architecture in Azure – Part 5 – Welcome to the Blog of Paul Andrew

Data Mesh vs Azure – Theory vs practice

Use the tag Data Mesh vs Azure to follow this blog series.

As a reminder, the four data mesh principles:

domain-oriented decentralised data ownership and architecture.
data as a product.
self-serve data infrastructure as a platform.
federated computational governance.

Source reference: https://martinfowler.com/articles/data-mesh-principles.html

A Quick Recap

In Part 1, Part 2 and Part 3 of this blog series we defined our Azure data mesh nodes and edges. The current conclusion is that Azure Resource Groups can house our data product nodes within the mesh. For our edges (interfaces), we’ve established the following working definitions and potential Azure resources:

Primary – data integration/exchange. Provided by a SQL Endpoint.
Secondary – operational reporting and logging. Provided by Azure Log Analytics as a minimum.
Tertiary – resource (network) connectivity. Delivered using Azure VNet Peering in a hub/spoke setup.
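As a rough sketch only, the three edge definitions above could be captured as a small data structure. The class and function names here (EdgeKind, Edge, missing_edges) are my own illustrative assumptions, not part of any Azure SDK. The idea is simply that a node can be checked for the interfaces it is expected to expose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EdgeKind(Enum):
    """The three working edge (interface) definitions from this series."""
    PRIMARY = "data integration/exchange"
    SECONDARY = "operational reporting and logging"
    TERTIARY = "resource (network) connectivity"


@dataclass(frozen=True)
class Edge:
    """Pairs an edge kind with the Azure resource that could provide it."""
    kind: EdgeKind
    azure_resource: str


# The potential Azure resources named in the recap above.
NODE_EDGES = [
    Edge(EdgeKind.PRIMARY, "SQL Endpoint"),
    Edge(EdgeKind.SECONDARY, "Azure Log Analytics"),
    Edge(EdgeKind.TERTIARY, "Azure VNet Peering (hub/spoke)"),
]


def missing_edges(edges: List[Edge]) -> List[EdgeKind]:
    """Return the edge kinds that a node has not yet been given."""
    present = {edge.kind for edge in edges}
    return [kind for kind in EdgeKind if kind not in present]
```

A node with all three edges returns an empty list from missing_edges, while a node with only the primary edge would report the secondary and tertiary kinds as outstanding.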

Then in Part 4 of the series we explored how we could template and control the deployment of our nodes using Azure DevOps, Azure Bicep and VS Code.

In Part 5, I want to bring the first principle into focus and address a key point about structure in my practical version of a data mesh architecture.

1. Domain-oriented Decentralised Data Ownership and Architecture

In Part 1 of the series, I considered this principle in terms of our data products and their ownership. Now I want to re-introduce our data domains as a level above our data products. We can even consider this a hierarchy.

Data Domains
- Data Products

Why?

The argument for scalability within the data mesh is clear. However, I think it’s short-sighted to assume that we stop scaling at the level of data domains. We can go further, which has always been the basis for me defining data products (nodes) within the mesh as Azure Resource Groups.

Data domains on their own can be huge. In the majority of the examples I’ve seen of the data mesh architecture, data domains are established at a very coarse, high level, which often means the granularity/scalability is lost. For example, finance as a single data domain.

To finalise the point, would we really have a single Data Lake storage account for all finance data? I don’t think so. To summarise the problem: why stop provisioning scalable infrastructure at the domain level?

My solution when moving from theory to practice is to keep the data domain concept, but deploy each domain as an Azure Subscription. Then provision data products as Azure Resource Groups within the domain subscription.

Let’s draw these domains around our existing data product infrastructure.

To translate the hierarchy:

Data Domains
- Data Products

Azure Subscriptions
- Azure Resource Groups

Scale out with domains as Azure Subscriptions.

Scale out within the domains with data products as Azure Resource Groups.
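To make the translation a little more concrete, here is a minimal sketch of the hierarchy in Python. It is only a model of the idea, not a deployment script. The class names, the "sub-" and "rg-" naming convention, and the payroll and invoicing products are all hypothetical, chosen purely for illustration using the finance domain mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProduct:
    """A data product (node); in this series, an Azure Resource Group."""
    name: str


@dataclass
class DataDomain:
    """A data domain; in this series, an Azure Subscription."""
    name: str
    products: List[DataProduct] = field(default_factory=list)

    def subscription_name(self) -> str:
        # Hypothetical naming convention, not an Azure requirement.
        return f"sub-{self.name}"

    def resource_group_names(self) -> List[str]:
        # One resource group per data product, scoped to this domain.
        return [f"rg-{self.name}-{product.name}" for product in self.products]


# Hypothetical finance domain with two illustrative data products.
finance = DataDomain("finance", [DataProduct("payroll"), DataProduct("invoicing")])

print(finance.subscription_name())
for resource_group in finance.resource_group_names():
    print(" -", resource_group)
```

Adding another domain means another DataDomain (subscription), and adding another product within a domain means another DataProduct (resource group), which mirrors the two directions of scale described above.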

This ultimately means decentralised ownership can be established at two levels, if needed.
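One way to picture two-level ownership is a simple lookup that prefers a product-level owner and falls back to the domain-level owner. This is a sketch under my own assumptions: the team names and the resolve_owner function are hypothetical, and in practice ownership would more likely be expressed through Azure role assignments at the subscription and resource group scopes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical owners at the domain (subscription) level.
DOMAIN_OWNERS: Dict[str, str] = {"finance": "finance-data-team"}

# Hypothetical owners at the product (resource group) level.
PRODUCT_OWNERS: Dict[Tuple[str, str], str] = {
    ("finance", "payroll"): "payroll-squad",
}


def resolve_owner(domain: str, product: str) -> Optional[str]:
    """Prefer the product-level owner; fall back to the domain-level owner."""
    return PRODUCT_OWNERS.get((domain, product), DOMAIN_OWNERS.get(domain))
```

Here the payroll product has its own owner, while any other finance product without a dedicated owner resolves to the owner of the finance domain.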

That concludes this part of the blog series. Short and sweet. But required to make an important point about the theory of data domains and data products vs the practical way I see that they should be handled in Azure.

Many thanks for reading.


About Me

mrpaulandrew

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms. Business value delivered through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric. Paul is also a Microsoft Data Platform MVP, director for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Rammstein playlist when given half a chance to do some coding for a customer project.