Custom Markdown extensions in Python

October 17, 2014

Markdown, that unwieldy, undying, but still relevant writing format, has been through some controversy recently. There are tons of custom implementations and extensions, with GitHub Flavored Markdown (GFM) being one of the more popular dialects. GFM is a prime example of how Markdown can be enhanced with meaningful extensions, from URL auto-linking and tables to task lists.

One of the reasons for alternate implementations is that Markdown doesn’t have an official spec. Unfortunately, attempts to standardize the format have not been blessed by the original creator, John Gruber.

So why not have some fun and create our own Markdown extension? For this exercise, we will try to mimic the GFM task list syntax, which is also similar to the Org mode checkbox list. We have to transform the following syntax:

- [ ] Some important task
- [x] A task that is already done
- [ ] Another thing for me to do

into a list with checkboxes:

[Image: Markdown task list]

Let us begin! We start off by creating a custom markdown class that extends markdown2.Markdown. Fortunately for us, markdown2 offers preprocess and postprocess hooks, which we can use to plug in our formatting rules. Since our syntax is similar to Markdown’s list syntax, we format it during preprocessing, before the regular list handling sees it.

import markdown2

class CustomMarkdown(markdown2.Markdown):
  ''' Our custom markdown class '''

  def preprocess(self, text):
    # Placeholder for now; the hook must return the (possibly modified) text
    return text

Next up, we have to define the regex and corresponding HTML template for matching and replacing.

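import re

# Class attribute of CustomMarkdown (used below as CustomMarkdown.reTaskList).
# Raw string so \s and \[ reach the regex engine untouched; the character
# classes no longer contain a stray literal '|'.
reTaskList = re.compile(r'''
(?P<prefix>[\r\n]*)
-\s\[(?P<done>[x\s]?)\]\s*(?P<item>.*)
(?P<suffix>[\r\n]*)
''', re.IGNORECASE | re.MULTILINE | re.VERBOSE)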

# Class attribute of CustomMarkdown; slots are: CSS class, checked attribute, item text
LIST_ITEM_TEMPLATE = '''
<li class="markdown-task-item %s">
  <label class="markdown-task-label">
    <input type="checkbox" %s>
    %s
  </label>
</li>'''
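Before wiring the pattern into the class, it can help to see what it actually captures. The following is a standalone sketch, not part of the original code: TASK_RE is a hypothetical module-level copy of the pattern (with the character classes written as [\r\n] and [x\s]), and sample is made-up input.

```python
import re

# Hypothetical module-level copy of the task list pattern, for experimenting.
TASK_RE = re.compile(r'''
(?P<prefix>[\r\n]*)
-\s\[(?P<done>[x\s]?)\]\s*(?P<item>.*)
(?P<suffix>[\r\n]*)
''', re.IGNORECASE | re.MULTILINE | re.VERBOSE)

# Made-up input: a paragraph, a blank line, two tasks, a blank line, a paragraph.
sample = "Todo:\n\n- [ ] Some important task\n- [x] A task that is already done\n\nDone."

# Collect the named groups of every match so they are easy to inspect.
captured = [m.groupdict() for m in TASK_RE.finditer(sample)]
for groups in captured:
    print(groups)
```

The first item carries the two newlines that precede the list in its prefix group, and the last item carries the two trailing newlines in its suffix group. Those lengths are what the replacement logic relies on to decide where the list opens and closes.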

Our replacement logic is simple. For every match of the pattern, we substitute the li definition from the template. If a list item is checked with [x], we make sure that the checkbox is also checked. We also add a class denoting whether the item is pending or completed, which can come in handy for styling.

def preprocess(self, text):
  def replace(match):
    item = match.groups()
    html = ''
    # The start of the list is denoted by the first group having 2 or more newline chars
    if len(item[0]) >= 2:
      html += '\n<ul class="md-task-list">'

    # Now, toggle the checked status
    checked, klass = ('checked="checked"', 'completed') if item[1].lower() == 'x' else ('', 'pending')
    html += CustomMarkdown.LIST_ITEM_TEMPLATE % (klass, checked, item[2])

    # Similarly, check for ending
    if len(item[3]) >= 2:
      html += '</ul>\n'

    return html

  return CustomMarkdown.reTaskList.sub(replace, text)
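To try the replacement step on its own, without markdown2 installed, the same logic can be sketched as a plain function. Everything here is an illustration under assumptions: render_task_list, TASK_RE and ITEM_TEMPLATE are hypothetical names, the template is a compact one-line stand-in for LIST_ITEM_TEMPLATE, and the groups are read by name rather than by index.

```python
import re

# Hypothetical standalone copy of the pattern, so the sketch runs without markdown2.
TASK_RE = re.compile(r'''
(?P<prefix>[\r\n]*)
-\s\[(?P<done>[x\s]?)\]\s*(?P<item>.*)
(?P<suffix>[\r\n]*)
''', re.IGNORECASE | re.MULTILINE | re.VERBOSE)

# Compact stand-in for LIST_ITEM_TEMPLATE (an assumption, to keep the output short).
ITEM_TEMPLATE = '<li class="markdown-task-item %s"><input type="checkbox" %s> %s</li>\n'


def render_task_list(text):
    """Apply the same replacement logic as preprocess() to a plain string."""
    def replace(match):
        html = ''
        # Two or more leading newlines mark the start of a list.
        if len(match.group('prefix')) >= 2:
            html += '\n<ul class="md-task-list">\n'
        # Toggle the checked attribute and the CSS class.
        if match.group('done').lower() == 'x':
            checked, klass = 'checked="checked"', 'completed'
        else:
            checked, klass = '', 'pending'
        html += ITEM_TEMPLATE % (klass, checked, match.group('item'))
        # Two or more trailing newlines mark the end of the list.
        if len(match.group('suffix')) >= 2:
            html += '</ul>\n'
        return html

    return TASK_RE.sub(replace, text)


sample = "Todo:\n\n- [ ] Write\n- [x] Edit\n\nDone."
result = render_task_list(sample)
print(result)
```

Two things are worth noticing when playing with this. The opening ul is only emitted when the list is preceded by a blank line, so a list at the very start of the input gets a closing tag without an opening one. The item text is also inserted as-is, without HTML escaping.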

And as a final step, define the module-level function that will use our CustomMarkdown processor.

def markdown(text, html4tags=False, tab_width=markdown2.DEFAULT_TAB_WIDTH,
             safe_mode=None, extras=None, link_patterns=None, use_file_vars=False):
  processor = CustomMarkdown(html4tags=html4tags, tab_width=tab_width, safe_mode=safe_mode,
                             extras=extras, link_patterns=link_patterns, use_file_vars=use_file_vars)
  return processor.convert(text)

Find the entire code and a working example in this gist.
