Data Mesh vs Azure: Theory vs Practice
Use the tag Data Mesh vs Azure to follow this blog series.
As a reminder, the four data mesh principles are:
- domain-oriented decentralised data ownership and architecture.
- data as a product.
- self-serve data infrastructure as a platform.
- federated computational governance.
Source reference: https://martinfowler.com/articles/data-mesh-principles.html
A Quick Recap
In Part 1, Part 2 and Part 3 of this blog series we defined our Azure data mesh nodes and edges. The current conclusion is that Azure Resource Groups can house our data product nodes within the mesh. For our edges (interfaces), we’ve established the following working definitions and potential Azure resources:
- Primary – data integration/exchange. Provided by a SQL Endpoint.
- Secondary – operational reporting and logging. Provided by Azure Log Analytics as a minimum.
- Tertiary – resource (network) connectivity. Delivered using Azure VNet Peering in a hub/spoke setup.
Then, in Part 4 of the series, we explored how we could template and control the deployment of our nodes using Azure DevOps, Azure Bicep and VS Code.
In Part 5, I want to bring the first principle into focus and address a key point about structure in my practical version of a data mesh architecture.
1. Domain-oriented Decentralised Data Ownership and Architecture
In Part 1 of the series, I considered this principle with a focus on our data products and their ownership. Now I want to re-introduce our data domains as a level above our data products. We can even consider this a hierarchy.
- Data Domains
- Data Products
The argument for scalability within the data mesh is clear. However, I think it’s short-sighted to assume that we stop scaling at the level of data domains. We can go further, which has always been the basis for me defining data products (nodes) within the mesh as Azure Resource Groups.
Data domains on their own can be huge. In the majority of the examples I’ve seen of the data mesh architecture, data domains are established at a very coarse, high level, which often means the granularity/scalability is lost. Finance as a single data domain is a typical example.
To make the point concrete, would we really have a single Data Lake storage account for all finance data? I don’t think so. To summarise the problem: why stop provisioning scalable infrastructure at the domain level?
My solution when moving from theory to practice is to keep the data domain concept, but deploy each domain as an Azure Subscription. Data products are then provisioned as Azure Resource Groups within the domain subscription.
Let’s draw these domains around our existing data product infrastructure.
To translate the hierarchy:
- Data Domains: Azure Subscriptions
- Data Products: Azure Resource Groups
Scale out across the mesh with domains as Azure Subscriptions.
Scale out within each domain with products as Azure Resource Groups.
This ultimately means decentralised ownership can be established at two levels, if needed.
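To make the two-level hierarchy and ownership concrete, here is a minimal Python sketch of the mapping. It is only an illustrative model, not deployment code: the class names, the `sub-`/`rg-` naming prefixes, the team names and the rule that a product falls back to its domain owner are all my assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProduct:
    """A data product node, deployed as an Azure Resource Group."""
    name: str
    owner: Optional[str] = None  # optional product-level owner

    @property
    def resource_group(self) -> str:
        # Hypothetical naming convention, for illustration only.
        return f"rg-{self.name}"


@dataclass
class DataDomain:
    """A data domain, deployed as an Azure Subscription."""
    name: str
    owner: str  # domain-level owner
    products: List[DataProduct] = field(default_factory=list)

    @property
    def subscription(self) -> str:
        # Hypothetical naming convention, for illustration only.
        return f"sub-{self.name}"

    def add_product(self, name: str, owner: Optional[str] = None) -> DataProduct:
        product = DataProduct(name, owner)
        self.products.append(product)
        return product

    def owner_of(self, product_name: str) -> str:
        # Assumed rule: product-level ownership wins if set,
        # otherwise the domain owner applies.
        for product in self.products:
            if product.name == product_name:
                return product.owner or self.owner
        raise KeyError(product_name)


# Example: finance as a domain (subscription) with two products (resource groups).
finance = DataDomain("finance", owner="finance-data-team")
finance.add_product("ledger")
finance.add_product("payroll", owner="payroll-team")

for product in finance.products:
    print(finance.subscription, product.resource_group, finance.owner_of(product.name))
```

In this sketch the ledger product inherits ownership from the finance domain, while payroll has its own owner, which is the "two levels, if needed" idea in miniature.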
That concludes this part of the blog series. It is short and sweet, but required to make an important point about the theory of data domains and data products versus the practical way I think they should be handled in Azure.
Many thanks for reading.