Just a short post following a recent question I got from my delivery team: are there any best practices for structuring our Databricks Notebooks in terms of code comments and markdown? Having done a little Googling, I decided to whip up a quick example that could be adopted as a technical standard for the team going forward.
For me, one of the hardest parts of developing anything is when you need to pick up and rework code that has been created by someone else. That said, my preferred Notebook structure shown below is not about technical performance or anything complicated. This is simply for ease of sharing and understanding, as well as some initial documentation for work done.
In my example I created a Scala Notebook, but this could of course apply to any flavour.
The key things I would like to see in a Notebook are:
- Markdown Headings – including the Notebook title, who created it, why, input and output details. We might also have references to external resources and maybe a high level version history. In my example I built this as a markdown table, with a little HTML for the bullet points.
- Common Code – where boilerplate code is used I like to have this in a set of common Notebooks that are run first to establish a framework for any subsequent content.
- Widgets – if required I expect all widgets to be created and referenced near the top of the Notebook, ideally with some defensive checks on the values passed in.
- Cell Titles – all cells within the Notebook should include a title that explains their purpose in the overall script.
- Logging – in most cases we should have a framework for outputting log information to a central location, via Application Insights or even just a SQLDB table.
- Comments – probably the most important thing to include in all code is the comments. This should not be text for the sake of it, or text that simply translates from code to English. This should be small amounts of narrative explaining why. What was the thinking behind a certain line or condition? If hard coded values have to be used, what do they mean in the wider business logic? When writing comments in code, I think to myself, what would the next person that reads this want to know?
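To make a few of these points concrete, here is a minimal sketch of how the header, a cell title, widget checks and "why" comments might look in Scala. It is not the downloadable example Notebook itself. The `WidgetChecks` object, its two helper functions, the `environment` widget and its allowed values are all hypothetical names made up for illustration, and the `// MAGIC` and `// DBTITLE` comment lines assume the layout Databricks uses when a Notebook is exported as a source file. The checks take the widget reader as a function, so the same code runs outside Databricks with a plain `Map` standing in for `dbutils.widgets.get`.

```scala
// Databricks notebook source
// MAGIC %md
// MAGIC # Example Notebook (placeholder title)
// MAGIC | Item | Detail |
// MAGIC |------|--------|
// MAGIC | Created by | (author) |
// MAGIC | Purpose | (why this Notebook exists) |
// MAGIC | Inputs / Outputs | (what is read, what is written) |

// COMMAND ----------

// DBTITLE 1,Validate Widget Values
object WidgetChecks {

  // The widget reader is passed in as a function so these checks can be
  // exercised outside Databricks; in a Notebook pass dbutils.widgets.get.
  def requireNonEmpty(name: String, read: String => String): String = {
    val value = Option(read(name)).map(_.trim).getOrElse("")
    // Fail early: an empty value would otherwise surface much later as a
    // confusing error in whichever cell first happens to use it.
    require(value.nonEmpty, s"Widget '$name' must not be empty")
    value
  }

  def requireOneOf(name: String, allowed: Set[String], read: String => String): String = {
    // Lower-cased because people type widget values by hand and the case
    // they use should not decide whether a run succeeds.
    val value = requireNonEmpty(name, read).toLowerCase
    require(
      allowed.contains(value),
      s"Widget '$name' must be one of ${allowed.mkString(", ")} but was '$value'"
    )
    value
  }
}

// In a Databricks Notebook the widget would be created and read like this:
//   dbutils.widgets.text("environment", "dev", "Environment")
//   val environment = WidgetChecks.requireOneOf(
//     "environment", Set("dev", "test", "prod"), dbutils.widgets.get)

// Stand-in for dbutils.widgets.get so the sketch runs anywhere.
val fakeWidgets = Map("environment" -> " Dev ")
val environment =
  WidgetChecks.requireOneOf("environment", Set("dev", "test", "prod"), fakeWidgets)
```

Note that the comments say why each check exists rather than restating what the line does, which is the kind of narrative the next reader is most likely to need.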
Graphically these are shown in my simple example Notebook below. Feel free to also download this Scala file from my GitHub repository. Notebook Example.scala
If you think this was useful, or if you know of other best practices for structuring a Notebook, I’d be interested to know, so please leave a comment.
Many thanks for reading.