Using Mermaid to Create a ProcFwk Pipeline Lineage Diagram

The other day Kamil (@NowinskiK) messaged me on Teams asking if I liked Mermaid’s ๐Ÿ™‚

I prepared myself for a conversation about which was better; the fish half being on top vs on the bottom. Ha! Of course not!!

Kamil was actually referring to the…

Javascript based diagramming and charting tool that renders Markdown-inspired text definitions to create and modify diagrams dynamically.

GitHub Pages: https://mermaid-js.github.io/mermaid/

GitHub Repo: https://github.com/mermaid-js

I confess until this point, I hadn’t encountered the tool.

The conversation that followed, with Kamil, was about creating a pipeline lineage diagram from implementations of the Azure pipeline orchestration framework (AKA: procfwk, AKA: my baby ๐Ÿ™‚ ).

Given the bespoke natural of the framework, a lot of traditional data cataloging tools can’t simply scrap lineage metadata from Data Factory or Synapse Integration Pipelines directly as the framework means worker pipeline triggering is abstracted way via the control pipelines. They would need to traverse the procfwk metadata model.

To cut a long story short, this traversal is what has now been done. Below you’ll find a T-SQL script for the procfwk database that outputs a Mermaid ‘flowchart’ diagram showing the execution stages for a given batch, with pipelines per stage and across stage pipeline dependencies connected via a dotted line.

To render the graphic we used a Wiki page within Azure DevOps.


I’ll apply a few more tweaks to this and then include it in the next release of the procfwk via it’s own repository, probably wrapped up as a Stored Procedure and better documented.


T-SQL

SET NOCOUNT ON;

--local variables
DECLARE @BatchName VARCHAR(255) = 'Daily'; --set as required

DECLARE @PageContent NVARCHAR(MAX) = '';
DECLARE @BaseData TABLE
	(
	[OrchestratorId] INT NOT NULL,
	[OrchestratorName] NVARCHAR(200) NOT NULL,
	[StageId] INT NOT NULL,
	[StageName] VARCHAR(225) NOT NULL,
	[PipelineId] INT NOT NULL,
	[PipelineName] NVARCHAR(200) NOT NULL
	)


--get reusable metadata
INSERT INTO @BaseData
SELECT
	o.[OrchestratorId],
	o.[OrchestratorName],
	s.[StageId],
	s.[StageName],
	p.[PipelineId],
	p.[PipelineName]
FROM
	[procfwk].[Pipelines] p
	INNER JOIN [procfwk].[Orchestrators] o
		ON p.[OrchestratorId] = o.[OrchestratorId]
	INNER JOIN [procfwk].[Stages] s
		ON p.[StageId] = s.[StageId]
	INNER JOIN [procfwk].[BatchStageLink] bs
		ON s.[StageId] = bs.[StageId]
	INNER JOIN [procfwk].[Batches] b
		ON bs.[BatchId] = b.[BatchId]
WHERE
	p.[Enabled] = 1
	AND b.[BatchName] = @BatchName;

--add orchestrator(s) sub graphs
;WITH orchestrators AS
	(
	SELECT DISTINCT
		[OrchestratorId],
		[OrchestratorName],
		'subgraph ' + [OrchestratorName] + CHAR(13) + 
		'style ' + [OrchestratorName] + ' fill:#E2F0D9,stroke:#E2F0D9' + CHAR(13) + 
		'##o' + CAST([OrchestratorId] * 10000 AS VARCHAR) + '##' + CHAR(13) + 'end' + CHAR(13)
		 AS OrchestratorSubGraphs
	FROM
		@BaseData
	)
SELECT
	@PageContent += OrchestratorSubGraphs
FROM
	orchestrators;

--add stage sub graphs
;WITH stages AS
	(
	SELECT DISTINCT
		[OrchestratorId],
		[StageName],
		[StageId]
	FROM
		@BaseData
	),
	stageSubs AS
	(
	SELECT
		[OrchestratorId],
		STRING_AGG('subgraph ' + [StageName] + CHAR(13) + 
			'style ' + [StageName] + ' fill:#FFF2CC,stroke:#FFF2CC' + CHAR(13) + 
			'##s' + CAST([StageId] AS VARCHAR) + '##' + CHAR(13) + 'end', CHAR(13)
			) AS 'StageSubGraphs'
	FROM
		stages
	GROUP BY
		[OrchestratorId]
	)
SELECT		
	@PageContent = REPLACE(@PageContent,'##o' + CAST([OrchestratorId] * 10000 AS VARCHAR) + '##',[StageSubGraphs])
FROM
	stageSubs;

--add pipelines within stage
;WITH pipelines AS
	(
	SELECT
		[StageId],
		STRING_AGG(
			CONCAT('p',CAST([PipelineId] * 10 AS VARCHAR),'(',[PipelineName],')',CHAR(13),
			'style ','p',CAST([PipelineId] * 10 AS VARCHAR),' fill:#F2F2F2,stroke:#F2F2F2'),CHAR(13)
			) AS 'PipelinesInStage'
	FROM
		@BaseData
	GROUP BY
		[StageId]
	)
SELECT 
	@PageContent = REPLACE(@PageContent,'##s' + CAST([StageId] AS VARCHAR) + '##',[PipelinesInStage])
FROM
	pipelines

--add stage nodes
;WITH stageNodes AS
	(
	SELECT DISTINCT
		[StageId],
		's' + CAST([StageId] * 100 AS VARCHAR) + '[' + [StageName] + ']' + CHAR(13) +
		'style s' + CAST([StageId] * 100 AS VARCHAR) + ' fill:#FFF2CC,stroke:#FFF2CC' + CHAR(13) AS StageNode
	FROM
		@BaseData
	)
SELECT
	@PageContent = @PageContent + [StageNode]
FROM
	stageNodes
ORDER BY
	[StageId];

--add stage to pipeline relationships
SELECT	
	@PageContent = @PageContent + 's' + CAST([StageId] * 100 AS VARCHAR) 
	+ ' --> ' + 'p' + CAST([PipelineId] * 10 AS VARCHAR) + CHAR(13)
FROM
	@BaseData;

--add stage to stage relationships
;WITH maxStage AS
	(
	SELECT 
		MAX([StageId]) -1 AS maxStageId
	FROM
		@BaseData
	),
	stageToStage AS
	(
	SELECT DISTINCT
		's' + CAST(b.[StageId] * 100 AS VARCHAR) 
		+ ' ==> ' + 's' + CAST((b.[StageId] + 1) * 100 AS VARCHAR) + CHAR(13) AS Content
	FROM
		@BaseData b
		CROSS JOIN maxStage
	WHERE
		b.[StageId] <= maxStage.[maxStageId]
	)
SELECT
	@PageContent = @PageContent + [Content]
FROM
	stageToStage;

--add pipeline to pipeline relationships
SELECT 
	@PageContent = @PageContent + 'p' + CAST(pd.[PipelineId] * 10 AS VARCHAR) 
	+ ' -.- ' + 'p' + CAST(pd.[DependantPipelineId] * 10 AS VARCHAR) + CHAR(13)
FROM 
	[procfwk].[PipelineDependencies] pd
	INNER JOIN @BaseData b1
		ON pd.[PipelineId] = b1.[PipelineId]
	INNER JOIN @BaseData b2
		ON pd.[DependantPipelineId] = b2.[PipelineId];

--add batch subgraph
SELECT
	@PageContent = 'subgraph ' + [BatchName] + CHAR(13) +
	'style ' + @BatchName + ' fill:#DEEBF7,stroke:#DEEBF7' + CHAR(13) + @PageContent
FROM
	[procfwk].[Batches]
WHERE
	[BatchName] = @BatchName;

SET @PageContent = @PageContent + 'end';

--add mermaid header
SELECT
	@PageContent = '::: mermaid' + CHAR(13) + 'graph LR' + CHAR(13) + @PageContent;

--return output
PRINT @PageContent

Mermaid (Markdown)

This is an example of the output from the procfwk default metadata.

::: mermaid
graph LR
subgraph Daily
style Daily fill:#DEEBF7,stroke:#DEEBF7
subgraph FrameworkFactory
style FrameworkFactory fill:#E2F0D9,stroke:#E2F0D9
subgraph Extract
style Extract fill:#FFF2CC,stroke:#FFF2CC
p10(Wait 1)
style p10 fill:#F2F2F2,stroke:#F2F2F2
p20(Wait 2)
style p20 fill:#F2F2F2,stroke:#F2F2F2
p30(Intentional Error)
style p30 fill:#F2F2F2,stroke:#F2F2F2
p40(Wait 3)
style p40 fill:#F2F2F2,stroke:#F2F2F2
end
subgraph Load
style Load fill:#FFF2CC,stroke:#FFF2CC
p90(Wait 8)
style p90 fill:#F2F2F2,stroke:#F2F2F2
p100(Wait 9)
style p100 fill:#F2F2F2,stroke:#F2F2F2
end
subgraph Transform
style Transform fill:#FFF2CC,stroke:#FFF2CC
p50(Wait 4)
style p50 fill:#F2F2F2,stroke:#F2F2F2
p60(Wait 5)
style p60 fill:#F2F2F2,stroke:#F2F2F2
p70(Wait 6)
style p70 fill:#F2F2F2,stroke:#F2F2F2
p80(Wait 7)
style p80 fill:#F2F2F2,stroke:#F2F2F2
end
end
subgraph WorkersFactory
style WorkersFactory fill:#E2F0D9,stroke:#E2F0D9
subgraph Serve
style Serve fill:#FFF2CC,stroke:#FFF2CC
p110(Wait 10)
style p110 fill:#F2F2F2,stroke:#F2F2F2
end
end
s100[Extract]
style s100 fill:#FFF2CC,stroke:#FFF2CC
s200[Transform]
style s200 fill:#FFF2CC,stroke:#FFF2CC
s300[Load]
style s300 fill:#FFF2CC,stroke:#FFF2CC
s400[Serve]
style s400 fill:#FFF2CC,stroke:#FFF2CC
s100 --> p10
s100 --> p20
s100 --> p30
s100 --> p40
s200 --> p50
s200 --> p60
s200 --> p70
s200 --> p80
s300 --> p90
s300 --> p100
s400 --> p110
s100 ==> s200
s200 ==> s300
s300 ==> s400
p30 -.- p60
p30 -.- p70
p70 -.- p100
p100 -.- p110
end

2 thoughts on “Using Mermaid to Create a ProcFwk Pipeline Lineage Diagram

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.