Structuring Your Databricks Notebooks with Markdown, Titles, Widgets and Comments

Just a short post following a recent question I got from my delivery team: are there any best practices for structuring our Databricks Notebooks in terms of code comments and markdown? Having done a little Googling, I decided to whip up a quick example that could be adopted as a technical standard for the team going forward.

For me, one of the hardest parts of developing anything is picking up and reworking code that was created by someone else. That said, my preferred Notebook structure shown below is not about technical performance or anything complicated. It is simply for ease of sharing and understanding, as well as some initial documentation of the work done.

In my example I created a Scala Notebook, but this could of course apply to any flavour.

The key things I would like to see in a Notebook are:

Markdown Headings – including the Notebook title, who created it, why, and input and output details. We might also have references to external resources and maybe a high-level version history. I created this in a table via the markdown and injected a bit of HTML too for the bullet points.
Common Code – where boilerplate code is used, I like to have this in a set of common Notebooks that are run to establish a framework for any subsequent content.
Widgets – if required, I expect all widgets to be created and referenced near the top of the Notebook, maybe with some defensive checks on the values passed.
Cell Titles – all cells within the Notebook should include a title that explains their purpose in the overall script.
Logging – in most cases we should have a framework for outputting log information to a central location, via Application Insights or even just a SQLDB table.
Comments – probably the most important thing to include in all code is the comments. These should not be text for the sake of it, or text that simply translates the code into English. They should be small amounts of narrative explaining why. What was the thinking behind a certain line or condition? If hard-coded values have to be used, what do they mean in the wider business logic? When writing comments in code, I ask myself: what would the next person who reads this want to know?
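
To make the "defensive checks" idea from the Widgets point a little more concrete, here is a minimal sketch. It assumes a hypothetical text widget named RunDate that should hold a yyyy-MM-dd date. The parseRunDate and requireRunDate helpers are illustrative names, not part of Databricks or of the example Notebook. The dbutils calls are left as comments so the check itself can be run as plain Scala.

```scala
// DBTITLE 1,Set & Get Widgets
import java.time.LocalDate
import scala.util.Try

// In a Databricks Notebook the raw value would come from the widget, e.g.:
//   dbutils.widgets.text("RunDate", "")
//   val rawRunDate = dbutils.widgets.get("RunDate")

// Hypothetical helper: returns the parsed date, or a message describing the problem.
def parseRunDate(raw: String): Either[String, LocalDate] = {
  // Treat null and whitespace-only values the same as "not supplied".
  val value = Option(raw).map(_.trim).getOrElse("")
  if (value.isEmpty) Left("RunDate was not supplied")
  else
    Try(LocalDate.parse(value)).toOption
      .toRight(s"RunDate '$value' is not a valid yyyy-MM-dd date")
}

// Fail fast so a bad parameter stops the run before any data is touched.
def requireRunDate(raw: String): LocalDate =
  parseRunDate(raw) match {
    case Right(date) => date
    case Left(msg)   => throw new IllegalArgumentException(msg)
  }
```

Inside a Notebook, something like requireRunDate(dbutils.widgets.get("RunDate")) near the top would then stop the run early with a readable message, rather than failing several cells later.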

Graphically these are shown in my simple example Notebook below. Feel free to also download this Scala file from my GitHub repository: Notebook Example.scala
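
In text form, the overall layout might look something like the sketch below, written in the source format that Databricks uses when a Scala Notebook is exported. This is not the downloadable example itself, only an illustration under some assumptions: the title, the table contents, the placeholder RunDate value and the createLogEntry stand-in are all hypothetical. The MAGIC, COMMAND and DBTITLE lines are ordinary Scala comments that Databricks should interpret as markdown cells, cell breaks and cell titles when the file is imported.

```scala
// Databricks notebook source
// MAGIC %md
// MAGIC # Example Notebook
// MAGIC
// MAGIC | Detail | Description |
// MAGIC | --- | --- |
// MAGIC | Created by | A. Developer (placeholder) |
// MAGIC | Purpose | Placeholder: why this Notebook exists |
// MAGIC | Inputs | <ul><li>RunDate widget</li></ul> |
// MAGIC | Outputs | <ul><li>Log entries</li></ul> |
// MAGIC | History | <ul><li>v1 initial version</li></ul> |

// COMMAND ----------

// DBTITLE 1,Common Code
// Shared helpers would normally come from a common Notebook via %run.
// A local stand-in keeps this sketch self-contained.
// Hypothetical logger: a real one might write to Application Insights or a SQL table.
def createLogEntry(message: String): String = {
  val entry = s"[Example Notebook] $message"
  println(entry)
  entry
}

// COMMAND ----------

// DBTITLE 1,Set & Get Widgets
// In Databricks these two lines would define and read the widget:
//   dbutils.widgets.text("RunDate", "")
//   val runDate = dbutils.widgets.get("RunDate")
// Placeholder value so the sketch also runs outside Databricks.
val runDate = "2021-06-17"

// COMMAND ----------

// DBTITLE 1,Main Logic
// Why, not what: assume downstream output is grouped by calendar year,
// so the year is derived once here instead of in every later cell.
val runYear = runDate.take(4).toInt
val logged = createLogEntry(s"Run started for year $runYear")
```

The HTML list items inside the table cells mirror the trick mentioned above for getting bullet points into a markdown table.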

If you think this was useful, or if you know of other best practices for structuring a Notebook, I’d be interested to know, so please leave a comment.

Many thanks for reading.

9 thoughts on “Structuring Your Databricks Notebooks with Markdown, Titles, Widgets and Comments”

About Me

mrpaulandrew

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms. Business value delivered through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric. Paul is also a Microsoft Data Platform MVP, director for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Rammstein playlist when given half a chance to do some coding for a customer project.

Keat says:

November 28, 2019 at 10:08 pm

This is awesome! Thanks for sharing.

Pingback: Structuring Databricks Notebooks – Curated SQL
Pingback: My Script for Peer Reviewing Code – Welcome to the Technical Community Blog of Paul Andrew
Alivia Banerjee says:

June 14, 2021 at 2:45 am

Hi Paul,
How are you adding the cell titles? This is not working for me.
//DBTITLE 1,Set & Get Widgets
dbutils.widgets.text("RunDate","")

1. mrpaulandrew says:
  
  June 17, 2021 at 6:48 pm
  
  The markdown for the Notebooks may have changed since I did this.
  I’ll check.
  
  1. raqsof says:
    
    June 24, 2021 at 9:52 am
    
    %md # 1.TITLE
    %md ## 1.1.SUBTITLE
    
Baatch says:

September 7, 2021 at 11:47 am

Thanks for sharing awesome content! Is it possible to share the common libraries notebook and also the CreateLogEntry function?

Rash says:

December 14, 2021 at 4:35 am

:)..this is great, thanks Paul!

Rashbel says:

December 14, 2021 at 4:38 am

:)..this is simple and helpful..thanks Paul!
