md4c

C Markdown parser. Fast. SAX-like interface. Compliant with CommonMark specification.
git clone https://noulin.net/git/md4c.git

commit 8a5402740faf8c180605d1455d5dd873462477d8
parent a930e46fc6ffff3c7655d1ee200bb4777cb15007
Author: Martin Mitas <mity@morous.org>
Date:   Thu, 24 Nov 2016 15:40:01 +0100

README.md: Add section about encoding.

Diffstat:
M README.md | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
@@ -64,7 +64,6 @@ Example implementation of simple renderer is available in the
 `md2html` directory which implements a conversion utility from Markdown to HTML.
 
 
-
 ## Extensions
 
 By default, MD4C recognizes only elements defined by CommonMark specification.
@@ -83,6 +82,31 @@ Currently, these extensions are available:
 disabled.
 
 
+## Support Encodings
+
+The CommonMark specification generally assumes UTF-8 input, but under closer
+inspection Unicode is actually used on very few occasions.
+
+MD4C uses this property of the standard and its implementation is to a large
+degree encoding-agnostic, just with the assumption the encoding of your choice
+is compatible with ASCII.
+
+By default MD4C simply only understands the ASCII characters as those making
+the marks in the document, and all the other input (the text) is provided
+as it is on the input.
+
+That said, the Unicode is supported too:
+
+ * If you predefine macro `MD4C_USE_UNICODE`, MD4C performs parsing of UTF-8
+   locally where it does matter.
+
+ * On Windows, if you predefine macro `MD4C_USE_WIN_UNICODE`, MD4C shall use
+   `WCHAR` instead of `char` and will assume UTF16-LE encoding.
+
+It should be relatively easy to add support for any other encoding, as long as
+its codepoints below 128 are compatible with ASCII.
+
+
 ## License
 
 MD4C is covered with MIT license, see the file `LICENSE.md`.