md4c

C Markdown parser. Fast. SAX-like interface. Compliant with the CommonMark specification.
git clone https://noulin.net/git/md4c.git

commit 809e611b3c6647d6c7ec4d29434410fb49227d4f
parent d5a8c6995b605beb902388c1075edae1473e2464
Author: Martin Mitas <mity@morous.org>
Date:   Sun, 20 Nov 2016 00:57:32 +0100

Migrate to CommonMark specification 0.27.

Diffstat:
M README.md     |   2 +-
M md4c/md4c.c   |   3 +--
M test/spec.txt | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
3 files changed, 83 insertions(+), 37 deletions(-)

diff --git a/README.md b/README.md
@@ -68,7 +68,7 @@ directory which implements a conversion utility from Markdown to HTML.
 The goal is be compliant to the latest version of
 [CommonMark specification](http://spec.commonmark.org/).
 
-The list below corresponds to chapters of the specification version 0.26 and
+The list below corresponds to chapters of the specification version 0.27 and
 more or less forms our to do list.
 
 - **Preliminaries:**
diff --git a/md4c/md4c.c b/md4c/md4c.c
@@ -4030,8 +4030,7 @@ redo_indentation_after_blockquote_mark:
             goto done;
         }
 
-        /* Check whether we are ATX header.
-         * (We check the indentation to fix http://spec.commonmark.org/0.26/#example-40) */
+        /* Check whether we are ATX header. */
         if(line->indent < ctx->code_indent_offset && CH(off) == _T('#')) {
             unsigned level;
 
diff --git a/test/spec.txt b/test/spec.txt
@@ -1,8 +1,8 @@
 ---
 title: CommonMark Spec
 author: John MacFarlane
-version: 0.26
-date: '2016-07-15'
+version: 0.27
+date: '2016-11-18'
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...
 
@@ -1985,7 +1985,7 @@ by their start and end conditions. The block begins with a line that
 meets a [start condition](@) (after up to three spaces
 optional indentation). It ends with the first subsequent line that
 meets a matching [end condition](@), or the last line of
-the document or other [container block](@), if no line is encountered that meets the
+the document or other [container block]), if no line is encountered that meets the
 [end condition]. If the first line meets both the [start condition]
 and the [end condition], the block will contain just that line.
 
@@ -2015,7 +2015,8 @@ followed by one of the strings (case-insensitive) `address`,
 `article`, `aside`, `base`, `basefont`, `blockquote`, `body`,
 `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`,
 `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`,
-`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
+`footer`, `form`, `frame`, `frameset`,
+`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`,
 `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
 `meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
 `section`, `source`, `summary`, `table`, `tbody`, `td`,
@@ -3636,11 +3637,11 @@ The following rules define [list items]:
     If the list item is ordered, then it is also assigned a start
     number, based on the ordered list marker.
 
-    Exceptions: When the list item interrupts a paragraph---that
-    is, when it starts on a line that would otherwise count as
-    [paragraph continuation text]---then (a) the lines *Ls* must
-    not begin with a blank line, and (b) if the list item is
-    ordered, the start number must be 1.
+    Exceptions: When the first list item in a [list] interrupts
+    a paragraph---that is, when it starts on a line that would
+    otherwise count as [paragraph continuation text]---then (a)
+    the lines *Ls* must not begin with a blank line, and (b) if
+    the list item is ordered, the start number must be 1.
 
     For example, let *Ls* be the lines
 
@@ -4730,8 +4731,7 @@ takes four spaces (a common case), but diverge in other cases.
 
 A [list](@) is a sequence of one or more
 list items [of the same type]. The list items
-may be separated by single [blank lines], but two
-blank lines end all containing lists.
+may be separated by any number of blank lines.
 
 Two list items are [of the same type](@)
 if they begin with a [list marker] of the same type.
@@ -4809,10 +4809,11 @@ Foo
 `Markdown.pl` does not allow this, through fear of triggering a list
 via a numeral in a hard-wrapped line:
 
-```````````````````````````````` markdown
+``` markdown
 The number of windows in my house is
 14. The number of doors is 6.
-````````````````````````````````
+```
+
 Oddly, though, `Markdown.pl` *does* allow a blockquote to
 interrupt a paragraph, even though the same considerations might
 apply.
@@ -4821,10 +4822,12 @@ In CommonMark, we do allow lists to interrupt paragraphs, for
 two reasons. First, it is natural and not uncommon for people
 to start lists without blank lines:
 
-    I need to buy
-    - new shoes
-    - a coat
-    - a plane ticket
+``` markdown
+I need to buy
+- new shoes
+- a coat
+- a plane ticket
+```
 
 Second, we are attracted to a
 
@@ -4836,20 +4839,24 @@ Second, we are attracted to a
 (Indeed, the spec for [list items] and [block quotes] presupposes
 this principle.) This principle implies that if
 
-      * I need to buy
-        - new shoes
-        - a coat
-        - a plane ticket
+``` markdown
+  * I need to buy
+    - new shoes
+    - a coat
+    - a plane ticket
+```
 
 is a list item containing a paragraph followed by a nested sublist,
 as all Markdown implementations agree it is (though the paragraph
 may be rendered without `<p>` tags, since the list is "tight"),
 then
 
-    I need to buy
-    - new shoes
-    - a coat
-    - a plane ticket
+``` markdown
+I need to buy
+- new shoes
+- a coat
+- a plane ticket
+```
 
 by itself should be a paragraph followed by a nested sublist.
 
@@ -5671,6 +5678,16 @@ single spaces, just as they would be by a browser:
 ````````````````````````````````
 
 
+Not all [Unicode whitespace] (for instance, non-breaking space) is
+collapsed, however:
+
+```````````````````````````````` example
+`a  b`
+.
+<p><code>a  b</code></p>
+````````````````````````````````
+
+
 Q: Why not just leave the spaces, since browsers will collapse them
 anyway? A: Because we might be targeting a non-HTML format, and we
 shouldn't rely on HTML-specific rendering assumptions.
@@ -6558,7 +6575,7 @@ Note that in the preceding case, the interpretation
 
 
 is precluded by the condition that a delimiter that
-can both open and close (like the `*` after `foo`
+can both open and close (like the `*` after `foo`)
 cannot form emphasis if the sum of the lengths of
 the delimiter runs containing the opening and
 closing delimiters is a multiple of 3.
@@ -6590,12 +6607,6 @@ omitted:
 ````````````````````````````````
 
 
-```````````````````````````````` example
-*foo**bar***
-.
-<p><em>foo<strong>bar</strong></em></p>
-````````````````````````````````
-
 Indefinite levels of nesting are possible:
 
 ```````````````````````````````` example
@@ -7361,6 +7372,16 @@ may be used in titles:
 ````````````````````````````````
 
 
+Titles must be separated from the link using a [whitespace].
+Other [Unicode whitespace] like non-breaking space doesn't work.
+
+```````````````````````````````` example
+[link](/url "title")
+.
+<p><a href="/url%C2%A0%22title%22">link</a></p>
+````````````````````````````````
+
+
 Nested balanced quotes are not allowed without escaping:
 
 ```````````````````````````````` example
@@ -8025,7 +8046,8 @@ following closing bracket:
 ````````````````````````````````
 
 
-Full references take precedence over shortcut references:
+Full and compact references take precedence over shortcut
+references:
 
 ```````````````````````````````` example
 [foo][bar]
@@ -8036,6 +8058,31 @@ Full references take precedence over shortcut references:
 <p><a href="/url2">foo</a></p>
 ````````````````````````````````
 
+```````````````````````````````` example
+[foo][]
+
+[foo]: /url1
+.
+<p><a href="/url1">foo</a></p>
+````````````````````````````````
+
+Inline links also take precedence:
+
+```````````````````````````````` example
+[foo]()
+
+[foo]: /url1
+.
+<p><a href="">foo</a></p>
+````````````````````````````````
+
+```````````````````````````````` example
+[foo](not a link)
+
+[foo]: /url1
+.
+<p><a href="/url1">foo</a>(not a link)</p>
+````````````````````````````````
 
 In the following case `[bar][baz]` is parsed as a reference, `[foo]`
 as normal text:
@@ -9045,7 +9092,7 @@ blocks. But we cannot close unmatched blocks yet, because we may have a
 [lazy continuation line].
 
 2. Next, after consuming the continuation markers for existing
-blocks, we look for new block starts (e.g. `>` for a block quote.
+blocks, we look for new block starts (e.g. `>` for a block quote).
 If we encounter a new block start, we close any blocks unmatched
 in step 1 before creating the new block as a child of the last
 matched block.
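
Note on the emphasis hunk in test/spec.txt above: the rule quoted there says that a delimiter which can both open and close cannot form emphasis if the sum of the lengths of the opening and closing delimiter runs is a multiple of 3. A minimal sketch of that check in C follows. It is an illustration only and is not taken from md4c: the `delim_run` struct and the `may_pair()` function are hypothetical names, and the `can_open`/`can_close` flags are assumed to have been computed elsewhere from the spec's flanking rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one delimiter run (e.g. "*" or "**").
 * can_open/can_close are assumed to be precomputed from the flanking
 * rules; this sketch does not derive them. */
struct delim_run {
    char   ch;         /* '*' or '_' */
    size_t len;        /* number of delimiter characters in the run */
    bool   can_open;   /* run may open emphasis */
    bool   can_close;  /* run may close emphasis */
};

/* Returns true if `opener` and `closer` may be matched as emphasis
 * delimiters under the rule as worded in the 0.27 text quoted above. */
bool may_pair(const struct delim_run *opener, const struct delim_run *closer)
{
    if (opener->ch != closer->ch)
        return false;
    if (!opener->can_open || !closer->can_close)
        return false;

    /* If either run can both open and close, the combined length of
     * the two runs must not be a multiple of 3. */
    if (opener->can_close || closer->can_open) {
        if ((opener->len + closer->len) % 3 == 0)
            return false;
    }
    return true;
}
```

For the input `*foo**bar*`, the leading `*` (length 1, opener only) cannot pair with the middle `**` (length 2, both opener and closer) because 1 + 2 is a multiple of 3, but it can pair with the trailing `*` (closer only).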