md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
git clone https://noulin.net/git/md4c.git

README.md (6878B)


      1 [![Build status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?label=linux%20build)](https://travis-ci.org/mity/md4c)
      2 [![Build status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
      3 [![Coverity Scan Build Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
      4 [![Codecov](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?label=code%20coverage)](https://codecov.io/github/mity/md4c)
      5 
      6 # MD4C Readme
      7 
      8 * Home: http://github.com/mity/md4c
      9 * Wiki: http://github.com/mity/md4c/wiki
     10 
     11 MD4C stands for "Markdown for C" and, unsurprisingly, it is a C Markdown parser
     12 implementation.
     13 
     14 
     15 ## What is Markdown
     16 
     17 In short, Markdown is the markup language this `README.md` file is written in.
     18 
     19 The following resources can explain more if you are unfamiliar with it:
     20 * [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
     21 * [CommonMark site](http://commonmark.org)
     22 
     23 
     24 ## What is MD4C
     25 
     26 MD4C is a C Markdown parser with the following features:
     27 
     28 * **Compliance:** Generally MD4C aims to be compliant with the latest version of
     29   the [CommonMark specification](http://spec.commonmark.org/). Right now we are
     30   fully compliant with CommonMark 0.28.
     31 
     32 * **Extensions:** MD4C supports some commonly requested and accepted extensions.
     33   See below.
     34 
     35 * **Compactness:** MD4C is implemented in one source file and one header file.
     36 
     37 * **Embedding:** MD4C is easy to reuse in other projects. Its API is very
     38   straightforward: there is just one function, `md_parse()`.
     39 
     40 * **Push model:** MD4C parses the complete document and calls callback
     41   functions provided by the application for each start/end of block, start/end
     42   of a span, and with any textual contents.
     43 
     44 * **Portability:** MD4C builds and works on Windows and Linux, and it should
     45   be fairly simple to make it run also on most other systems.
     46 
     47 * **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
     48   UTF-8 and, on Windows, also UTF-16, i.e. what is on Windows commonly called
     49   just "Unicode". See more details below.
     50 
     51 * **Permissive license:** MD4C is available under the MIT license.
     52 
     53 * **Performance:** MD4C is very fast. Preliminary tests show it is quite a bit
     54   faster than [Hoedown](https://github.com/hoedown/hoedown) or
     55   [Cmark](https://github.com/jgm/cmark).
     56 
     57 
     58 ## Using MD4C
     59 
     60 The parser is implemented in a single C source file `md4c.c` and its
     61 accompanying header `md4c.h`.
     62 
     63 The main provided function is `md_parse()`. It takes text in Markdown syntax
     64 as input and a pointer to a renderer structure which holds pointers to a few
     65 callback functions.
     66 
     67 As `md_parse()` processes the input, it calls the appropriate callbacks,
     68 allowing the application to convert it into another format or render it onto
     69 the screen.
     70 
     71 A more comprehensive guide can be found in the header `md4c.h` and also
     72 on the [MD4C wiki](http://github.com/mity/md4c/wiki).
     73 
     74 An example implementation of a simple renderer is available in the `md2html`
     75 directory, which implements a conversion utility from Markdown to HTML.
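
To make the push model more concrete, here is a minimal, self-contained sketch of a callback-driven parser. It does not use MD4C at all: every `toy_` name below is hypothetical and exists only for illustration, and the "Markdown" it understands is limited to one block per line, with `# ` marking a heading. The real types and callback signatures are documented in `md4c.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block kinds; MD4C defines its own, richer set in md4c.h. */
typedef enum { TOY_BLOCK_HEADING, TOY_BLOCK_PARAGRAPH } toy_block_type;

/* Callback table filled in by the application (the "renderer"). */
typedef struct {
    int (*enter_block)(toy_block_type type, void *userdata);
    int (*leave_block)(toy_block_type type, void *userdata);
    int (*text)(const char *text, size_t size, void *userdata);
} toy_renderer;

/* Toy push parser: every non-blank line is one block.
 * Returns 0 on success or the first non-zero value returned by a callback. */
int toy_parse(const char *input, size_t size, const toy_renderer *r, void *userdata)
{
    size_t pos = 0;
    assert(input != NULL && r != NULL);
    while (pos < size) {
        size_t end = pos;
        while (end < size && input[end] != '\n')
            end++;
        if (end > pos) {                      /* blank lines produce no events */
            toy_block_type type = TOY_BLOCK_PARAGRAPH;
            size_t start = pos;
            int ret;
            if (end - pos >= 2 && input[pos] == '#' && input[pos + 1] == ' ') {
                type = TOY_BLOCK_HEADING;
                start = pos + 2;
            }
            if ((ret = r->enter_block(type, userdata)) != 0) return ret;
            if ((ret = r->text(input + start, end - start, userdata)) != 0) return ret;
            if ((ret = r->leave_block(type, userdata)) != 0) return ret;
        }
        pos = end + 1;
    }
    return 0;
}

/* A tiny renderer that writes HTML into a fixed buffer (no escaping is done). */
typedef struct { char buf[256]; size_t len; } toy_output;

static int toy_append(toy_output *o, const char *s, size_t n)
{
    if (o->len + n >= sizeof o->buf) return -1;   /* a non-zero value aborts parsing */
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
    return 0;
}

static int toy_html_enter(toy_block_type t, void *ud)
{
    const char *tag = (t == TOY_BLOCK_HEADING) ? "<h1>" : "<p>";
    return toy_append((toy_output *)ud, tag, strlen(tag));
}

static int toy_html_leave(toy_block_type t, void *ud)
{
    const char *tag = (t == TOY_BLOCK_HEADING) ? "</h1>\n" : "</p>\n";
    return toy_append((toy_output *)ud, tag, strlen(tag));
}

static int toy_html_text(const char *text, size_t size, void *ud)
{
    return toy_append((toy_output *)ud, text, size);
}

/* Convenience wrapper: wire the callbacks together and run the parser. */
int toy_to_html(const char *markdown, toy_output *out)
{
    toy_renderer r = { toy_html_enter, toy_html_leave, toy_html_text };
    out->len = 0;
    out->buf[0] = '\0';
    return toy_parse(markdown, strlen(markdown), &r, out);
}
```

The parser never builds a document tree; it only reports events in document order, and the application decides what to do with them. A different callback table could just as well draw to the screen or collect statistics.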
     76 
     77 
     78 ## Markdown Extensions
     79 
     80 The default behavior is to recognize only elements defined by the [CommonMark
     81 specification](http://spec.commonmark.org/).
     82 
     83 However, with appropriate renderer flags, the behavior can be tuned to enable
     84 some extensions or allow some deviations from the specification.
     85 
     86  * With the flag `MD_FLAG_COLLAPSEWHITESPACE`, non-trivial whitespace is
     87    collapsed into a single space.
     88 
     89  * With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.
     90 
     91  * With the flag `MD_FLAG_STRIKETHROUGH`, strikethrough spans are enabled
     92    (text enclosed in tilde marks, e.g. '~foo bar~').
     93 
     94  * With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, permissive URL autolinks
     95    (not enclosed in `<` and `>`) are supported.
     96 
     97  * With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, ditto for e-mail autolinks.
     98 
     99  * With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, permissive WWW autolinks
    100    (without any scheme specified; `http:` is assumed) are supported.
    101 
    102  * With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline HTML
    103    or raw HTML blocks respectively are disabled.
    104 
    105  * With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
    106    disabled.
    107 
    108 
    109 ## Input/Output Encoding
    110 
    111 The CommonMark specification generally assumes UTF-8 input, but on closer
    112 inspection, Unicode plays a role in only a few very specific situations when
    113 parsing Markdown documents:
    114 
    115   * For detection of word boundaries when processing emphasis and strong emphasis,
    116     some classification of Unicode characters (whitespace, punctuation) is used.
    117 
    118   * For (case-insensitive) matching of a link reference with corresponding link
    119     reference definition, Unicode case folding is used.
    120 
    121   * For translating HTML entities (e.g. `&amp;`) and numeric character
    122     references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
    123     However, MD4C leaves this translation to the renderer/application, as the
    124     renderer is supposed to know the output encoding and whether it actually
    125     needs to perform this kind of translation. (Consider that a renderer
    126     converting Markdown to HTML may leave the entities untranslated and defer
    127     the work to a web browser.)
    128 
    129 MD4C relies on this property of CommonMark and the implementation is, to
    130 a large degree, encoding-agnostic. Most of the MD4C code only assumes that the
    131 encoding of your choice is compatible with ASCII, i.e. that the codepoints
    132 below 128 have the same numeric values as ASCII.
    133 
    134 Any input MD4C does not understand is simply seen as part of the document text
    135 and sent to the renderer's callback functions unchanged.
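
Since translating numeric character references is left to the renderer, a renderer that wants UTF-8 output has to do the decoding itself. The following is a rough sketch of how that might look. The function names `parse_numeric_ref()` and `encode_utf8()` are made up for this example and are not part of MD4C, and the sketch simply rejects malformed or out-of-range references rather than applying the replacement rules of the CommonMark specification.

```c
#include <assert.h>
#include <stddef.h>

/* Parse a numeric character reference such as "&#35;" or "&#xcab;".
 * On success stores the code point in *cp and returns 0; returns -1 otherwise. */
int parse_numeric_ref(const char *s, size_t size, unsigned long *cp)
{
    unsigned long value = 0;
    unsigned base = 10;
    size_t i = 2;

    assert(s != NULL && cp != NULL);
    if (size < 4 || s[0] != '&' || s[1] != '#' || s[size - 1] != ';')
        return -1;
    if (s[2] == 'x' || s[2] == 'X') {
        base = 16;
        i = 3;
    }
    if (i >= size - 1)
        return -1;                            /* no digits at all, e.g. "&#x;" */
    for (; i < size - 1; i++) {
        unsigned digit;
        char ch = s[i];
        if (ch >= '0' && ch <= '9')
            digit = (unsigned)(ch - '0');
        else if (base == 16 && ch >= 'a' && ch <= 'f')
            digit = (unsigned)(ch - 'a') + 10;
        else if (base == 16 && ch >= 'A' && ch <= 'F')
            digit = (unsigned)(ch - 'A') + 10;
        else
            return -1;
        value = value * base + digit;
        if (value > 0x10FFFFul)
            return -1;                        /* beyond the Unicode range */
    }
    *cp = value;
    return 0;
}

/* Encode one code point as UTF-8 into out (room for 4 bytes is required).
 * Returns the number of bytes written, or 0 if cp is not encodable. */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* surrogates are not valid in UTF-8 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A renderer producing HTML could skip both steps and pass the reference through untouched, which is exactly why the decision is left to the application.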
    136 
    137 The two situations where MD4C has to understand Unicode are handled according
    138 to the following preprocessor macros:
    139 
    140  * If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8
    141    for word boundary detection and case-folding.
    142 
    143  * On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
    144    `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
    145    (UTF-16 is what Windows developers usually call just "Unicode" and what
    146    the Win32 API works with.)
    147 
    148  * By default (when none of the macros is defined), ASCII-only mode is used
    149    even in the specific situations. That effectively means that non-ASCII
    150    whitespace or punctuation characters won't be recognized as such and that
    151    case-folding is performed only on ASCII letters (i.e. `[a-zA-Z]`).
    152 
    153 (Adding support for other encodings should be relatively simple due to
    154 the isolation of the respective code.)
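
As an illustration of what ASCII-only mode means for link reference matching, the sketch below compares two labels while folding only `[a-zA-Z]`. It is not MD4C's actual code: `labels_equal_ascii()` is a hypothetical helper, and it ignores the other parts of label matching (such as whitespace handling) to focus on case folding alone.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only ASCII upper-case letters; every other byte is returned unchanged. */
static unsigned char ascii_fold(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (unsigned char)(ch + ('a' - 'A')) : ch;
}

/* Returns 1 if both labels are equal under ASCII-only case folding, else 0. */
int labels_equal_ascii(const char *a, size_t a_size, const char *b, size_t b_size)
{
    size_t i;
    assert(a != NULL && b != NULL);
    if (a_size != b_size)
        return 0;
    for (i = 0; i < a_size; i++) {
        if (ascii_fold((unsigned char)a[i]) != ascii_fold((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```

With this kind of folding, `Foo` matches `fOO`, but a label containing `Ä` does not match one containing `ä`, because their UTF-8 byte sequences differ and no non-ASCII byte is ever folded.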
    155 
    156 
    157 ## License
    158 
    159 MD4C is covered by the MIT license, see the file `LICENSE.md`.
    160 
    161 
    162 ## Reporting Bugs
    163 
    164 If you encounter any bug, please be so kind as to report it. Unheard bugs cannot
    165 get fixed. You can submit bug reports here:
    166 
    167 * http://github.com/mity/md4c/issues