Experience

Craft Your Content With Markdown and Matrix

Published on 8th January, 2014

We’re big fans of Markdown here at Experience, so much so that we’ve written MultiMarkdown plugins for both Craft and ExpressionEngine. Everything we write (including this blog) is formatted using MultiMarkdown, and we’ve even found it to be a good fit for a few clients over the years.

The problem

Markdown[1] does an excellent job of fulfilling its original brief as a convenient, minimal text formatting syntax, but is a poor fit for more complex elements, such as tables, images with captions, or quotes with citations.

The problem is two-fold:

  1. The syntax for inserting non-textual content is both ugly and forgettable.
  2. The HTML generated by Markdown is frequently less than ideal.

It’s possible to customise the HTML generated by MultiMarkdown using XSLT, but it’s not a simple process, and in practice we’ve found it to be more trouble than it’s worth.

Enter the Matrix

With the introduction of Matrix in Craft 1.3, we finally have a solution to this perennial problem. For those of you unfamiliar with Matrix in Craft, here’s the pitch:

“A single Matrix field can have as many types of blocks as needed, which the author can pick and choose from when adding new content. Each block type gets its own set of fields.”

In other words, instead of presenting an author with a single Markdown-formatted “article body” field, and some arcane syntax for inserting images and the like, we can instead let them construct their article from a series of content blocks.

The user interface for each such block is tailored to suit the type of content being created, making life much easier for content authors. For example, the “image” content block type for this site lets an author specify the image alignment and caption, without resorting to raw HTML.

[Image: Interface for adding an "image" block.]

From a development perspective, we now have complete control over the HTML that is generated for each Matrix block type, and can finally achieve nerd nirvana by completely separating content from structure.

The naïve approach

At this point, the steps required to implement this solution probably seem pretty obvious:

  1. Create a Matrix field, with a bunch of different content block types.
  2. Publish an article.
  3. Loop through the article’s content blocks, generating HTML and parsing Markdown as we go.

In reality, there is an unexpected (at least by us) problem with this approach: when parsing each content block separately, Markdown reference links and footnotes no longer work correctly.
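To make the failure concrete, here is a minimal sketch. The two strings stand in for the bodies of two consecutive text blocks; their content and the example.com URL are invented for illustration, and smartdown is the Markdown filter used in the templates later in this article.

```twig
{# Hypothetical bodies of two consecutive "text" blocks #}
{% set first  = "We build most of our sites with [Craft][craft]." %}
{% set second = "[craft]: http://example.com/" %}

{# Parsed separately: the parser sees the link and its definition
   as two unrelated documents, so it cannot pair them up #}
{{ first | smartdown }}
{{ second | smartdown }}

{# Parsed once, as a single document: the definition is in scope #}
{{ (first ~ "\n\n" ~ second) | smartdown }}
```

In the first case we would expect the link to be left as the literal text [Craft][craft], because its definition lives in a different parse. In the second case it should come out as a regular link. Footnotes fail for the same reason.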

Thankfully, there is a simple solution which doesn’t require any additional work on the part of the author.

Our example Matrix field

The solution described in the remainder of this article uses a single Matrix field named articleBody, containing four content block types: Markdown-formatted text; a code “block”; an image; a quote.

Here’s how that looks on the Matrix field settings screen:

[Image: The "article body" settings screen.]

And here’s how it looks to a content author creating a new article:

[Image: Publishing an article.]

An initial (partial) solution

The solution to the aforementioned problems with reference links and footnotes is simple: rather than parsing the Markdown as we loop through the content blocks, we instead construct a “raw” Markdown-formatted string, and then parse it once, at the moment of output.

A quick snippet of code may help to clarify that statement:

{# BAD: Reference links and footnotes won't work #}
{% for block in entry.articleBody %}
{% if block.type == 'text' %}
{{ block.body | smartdown }}
{% endif %}
{% endfor %}

{# GOOD: First we construct the string #}
{% set articleBody %}
{% for block in entry.articleBody %}
{% if block.type == 'text' %}
{{ block.body | raw }}
{% endif %}
{% endfor %}
{% endset %}

{#  Then we output it #}
{{ articleBody | smartdown }}

Gotchas and annoyances

There are a couple of peculiarities with the above (partial) solution worth noting:

  1. Indentation: Markdown treats any line indented by four or more spaces (or a tab) as a pre-formatted code block. As such, we need to ensure that we don’t inadvertently add any unwanted whitespace before our Markdown-formatted content, hence the lack of indentation.
  2. The “raw” filter: By default, Twig will auto-escape our Markdown-formatted string, making certain elements (such as links) unintelligible to the Markdown parser. The raw filter tells Twig to leave well alone.

The indentation issue becomes particularly annoying after a while. It greatly affects the readability of our code, and any stray tabs can completely destroy our Markdown formatting. Not a great solution, then.
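One possible workaround, offered as a sketch rather than the approach used on this site, is Twig's whitespace control modifier. A hyphen inside a tag delimiter (`{%-`, `-%}`, `{{-`, `-}}`) strips the template whitespace on that side of the tag, which lets us indent the template freely. Because this also removes the line breaks between blocks, the sketch adds a blank line back explicitly after each one.

```twig
{% set articleBody %}
    {%- for block in entry.articleBody -%}
        {%- if block.type == 'text' -%}
            {# The hyphens strip the indentation around each tag #}
            {{- block.body | raw -}}
            {# Re-insert a blank line so consecutive blocks stay separate #}
            {{- "\n\n" -}}
        {%- endif -%}
    {%- endfor -%}
{% endset %}

{{ articleBody | smartdown }}
```

The modifiers only trim whitespace written in the template itself; whitespace inside block.body is left alone. Whether the extra punctuation is any more readable than unindented code is a matter of taste.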

A better (complete) solution

A far better approach is to use sub-templates for each content block type. On this site, we have the following templates:

  • modules/_matrix.html contains the code which loops through the Matrix content blocks, and loads the appropriate “block type” template. It also handles any Markdown-formatted text fields directly.
  • modules/_blockquote.html renders our “quote” block type.
  • modules/_figure.html renders our “image” block type.
  • modules/_precode.html renders our “code” block type.

Any template which uses our Matrix articleBody field simply loads the modules/_matrix.html template, which does all the hard work. For the purposes of this example, we’ll concentrate on one such template, blog/_post.html.

Now that you know how everything is structured, let’s dive into the code.

blog/_post.html

The relevant section of our blog post template is nice and simple:

{# Step 1: Call the 'matrix' template to generate the raw MD string #}
{% set articleBody %}
    {% include 'modules/_matrix' with {
        'matrix': entry.articleBody
    } %}
{% endset %}

{# Step 2: Parse and output the article body #}
{{ articleBody | smartdown }}

modules/_matrix.html

The “matrix” template is where most of the action takes place. It loops through the supplied Matrix field, determines the content block type, and (where appropriate) loads the relevant “sub” template:

{% for block in matrix %}

{% if block.type == 'text' %}
{# This is the one place we still need to watch the indentation #}
{{ block.body | raw }}
{% endif %}

{% if block.type == 'blockquote' %}
  {% include 'modules/_blockquote' with {
    'quote': block.quote,
    'source': block.cite,
    'sourceUrl': block.sourceUrl
  } %}
{% endif %}

{% if block.type == 'code' %}
  {% include 'modules/_precode' with {
    'code': block.code,
    'language': block.language
  } %}
{% endif %}

{% if block.type == 'image' %}
  {% include 'modules/_figure' with {
    'alignment': block.alignment,
    'caption': block.caption,
    'image': block.image
  } %}
{% endif %}

{% endfor %}

modules/_blockquote.html

You’ll notice that we’re passing the quote text through the Markdown parser. This makes it easy for content authors to add formatting and links to quotes, but because the quote is parsed separately from the main article body, footnotes won’t work correctly inside it.

In practice, we’ve never found this to be an issue, but it’s something to bear in mind.

<blockquote>
    <div class="quote">
        {{ quote | smartdown }}
    </div>

    {% if source %}
    <footer>
        <cite>
            {# Emit the tags as template markup so Twig's auto-escaping only touches the values #}
            {% if sourceUrl %}<a href="{{ sourceUrl }}">{% endif %}
            {{ source }}
            {% if sourceUrl %}</a>{% endif %}
        </cite>
    </footer>
    {% endif %}
</blockquote>

modules/_figure.html

We do a little bit of extra work to determine the alignment class, and to retrieve the image. Aside from that, the template is once again simplicity itself.

Note that, as with blockquotes, the image caption may not contain footnotes.

{% set alignment = alignment is defined ? "figure--#{alignment}" : 'figure--full' %}
{% set image = image is defined ? image.first() : '' %}
{% set caption = caption is defined ? caption : '' %}

{% if image %}
    <figure class="{{ alignment }}">
        {{ image.getImg() }}
        {% if caption %}
            <figcaption>{{ caption | smartdown }}</figcaption>
        {% endif %}
    </figure>
{% endif %}
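As an aside, Twig's built-in default filter can express two of these fallbacks a little more compactly. This is a sketch of an alternative, not the code running on this site, and it behaves slightly differently: default also applies when a value is defined but is an empty string, so an empty alignment would give figure--full rather than figure--.

```twig
{# Fall back to "full" when alignment is undefined or an empty string #}
{% set alignment = 'figure--' ~ (alignment | default('full')) %}

{# Fall back to an empty caption when none is supplied #}
{% set caption = caption | default('') %}
```

The image lookup is left as it is in the original template, since it needs to call first() on the field value.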

modules/_precode.html

Finally, we have our “code” template. At present, we could easily make do without this, and instead rely on the HTML generated by the Markdown parser.

However, the plan is to implement syntax highlighting at some indeterminate point in the future, at which point we’ll be glad of the additional “language” information.

<pre
    {% if language %}class="language-{{ language }}"{% endif %}
><code
    {% if language %}class="language-{{ language }}"{% endif %}
>{{ code }}</code></pre>

Conclusion

As with most techniques, this one isn’t perfect.

On the plus side, we have:

  1. Freed our content authors from the syntax gymnastics required to add images, tables, and the like to their Markdown-formatted content.
  2. Provided our content authors with an intuitive interface for creating different types of content within a single article.
  3. Gained complete control over the HTML that is generated for each block type.
  4. Successfully separated content from markup, effectively “future proofing” our solution.

In exchange for all of the above, we have a more complex publishing screen. If, like me, you’re accustomed to writing a blog post in Byword (or similar), and then simply copy-pasting it into one big plain text field, this can feel like a step backwards.

On balance, though, it’s something I can personally live with, given the aforementioned advantages. I also suspect that this solution will be far less intimidating for some of our less technically-literate clients.


  1. For the purposes of convenience, all subsequent references to “Markdown” should be read as “Markdown and MultiMarkdown”.