commit 8a5402740faf8c180605d1455d5dd873462477d8
parent a930e46fc6ffff3c7655d1ee200bb4777cb15007
Author: Martin Mitas <mity@morous.org>
Date: Thu, 24 Nov 2016 15:40:01 +0100
README.md: Add section about encoding.
Diffstat:
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
@@ -64,7 +64,6 @@ Example implementation of simple renderer is available in the `md2html`
directory which implements a conversion utility from Markdown to HTML.
-
## Extensions
By default, MD4C recognizes only elements defined by CommonMark specification.
@@ -83,6 +82,31 @@ Currently, these extensions are available:
disabled.
+## Supported Encodings
+
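+The CommonMark specification generally assumes UTF-8 input, but on closer
+inspection, Unicode actually matters in only a few places (for example, when
+classifying the characters around `*` and `_` delimiter runs as Unicode
+whitespace or punctuation to decide whether they can open or close emphasis).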
+
+MD4C exploits this property: its implementation is to a large degree
+encoding-agnostic and only assumes that the encoding of your choice is
+compatible with ASCII.
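A short C sketch of why ASCII compatibility is the only requirement: in UTF-8, every byte of a multi-byte sequence is 0x80 or above, so a byte in the ASCII range can only ever be that ASCII character. `count_ascii_range_bytes` is a hypothetical helper written for this illustration, not part of MD4C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts bytes in buf[0..len) that fall into the
 * ASCII range (< 0x80). In an ASCII-compatible encoding such as UTF-8,
 * each of these bytes can only mean the corresponding ASCII character. */
static size_t count_ascii_range_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x80)
            count++;
    }
    return count;
}
```

The UTF-8 encoding of the euro sign (`E2 82 AC`) yields a count of 0, whereas its UTF-16LE encoding (`AC 20`) yields 1: the byte `0x20` would be misread as a space by a byte-oriented parser. This illustrates why UTF-16 input needs a wider character type such as `WCHAR` rather than plain bytes.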
+
+By default, MD4C recognizes only ASCII characters as the ones that form the
+marks in the document; all other input (the text) is handed over unchanged,
+exactly as it appears in the input.
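To make the idea concrete, here is a minimal C sketch of an ASCII-only mark scanner, assuming a hypothetical, deliberately small set of mark characters (MD4C's real set and scanning logic differ). Because it compares raw bytes against ASCII only, bytes of non-ASCII characters are skipped without ever being decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of ASCII characters that may start a Markdown mark. */
static const char MARK_CHARS[] = "*_`[]!<>\\";

/* Returns nonzero if byte c is one of the ASCII mark characters above.
 * Bytes >= 0x80 (every byte of a non-ASCII UTF-8 sequence) never match. */
static int is_mark_char(unsigned char c)
{
    return c != '\0' && c < 0x80 && strchr(MARK_CHARS, c) != NULL;
}

/* Returns the offset of the first mark character in text[off..size),
 * or size if there is none. All other bytes are skipped untouched, so
 * they can later be handed over as plain text, whatever their encoding. */
static size_t next_mark(const char *text, size_t size, size_t off)
{
    assert(text != NULL || size == 0);
    while (off < size && !is_mark_char((unsigned char)text[off]))
        off++;
    return off;
}
```

For the UTF-8 string `"café *crème*"`, `next_mark` stops only at the two asterisks (byte offsets 6 and 13); the accented letters are passed over byte by byte.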
+
+That said, Unicode is supported too:
+
+ * If you predefine the macro `MD4C_USE_UNICODE`, MD4C decodes UTF-8 locally,
+ only in the places where it matters.
+
+ * On Windows, if you predefine the macro `MD4C_USE_WIN_UNICODE`, MD4C uses
+ `WCHAR` instead of `char` and assumes UTF-16LE encoding.
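As a rough illustration of what decoding UTF-8 "locally" can mean (hypothetical code, not MD4C's implementation), a parser can keep the text as bytes and decode a single code point only when a rule needs it, for example when CommonMark asks whether the character next to a `*` delimiter run is Unicode whitespace. `decode_utf8` and `is_unicode_whitespace` are made-up names for this sketch, and the hard-coded Zs list is an assumption to verify against the Unicode version you target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence at s[off] (requires off < size) and stores its
 * length in *len. Malformed or truncated input yields U+FFFD with *len = 1.
 * Sketch only: overlong forms, surrogates and code points above U+10FFFF
 * are not rejected. */
static uint32_t decode_utf8(const unsigned char *s, size_t size, size_t off, size_t *len)
{
    unsigned char b0;
    uint32_t cp;
    size_t n;

    assert(s != NULL && off < size && len != NULL);
    b0 = s[off];
    if (b0 < 0x80) { *len = 1; return b0; }
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; }
    else { *len = 1; return 0xFFFD; }

    if (n > size - off) { *len = 1; return 0xFFFD; }   /* truncated */
    for (size_t i = 1; i < n; i++) {
        unsigned char b = s[off + i];
        if ((b & 0xC0) != 0x80) { *len = 1; return 0xFFFD; }  /* not a continuation byte */
        cp = (cp << 6) | (b & 0x3F);
    }
    *len = n;
    return cp;
}

/* CommonMark's "Unicode whitespace": tab, LF, FF, CR, plus the Zs category
 * (hard-coded here; an assumption to re-check for your Unicode version). */
static int is_unicode_whitespace(uint32_t cp)
{
    switch (cp) {
    case 0x0009: case 0x000A: case 0x000C: case 0x000D:
    case 0x0020: case 0x00A0: case 0x1680: case 0x202F:
    case 0x205F: case 0x3000:
        return 1;
    default:
        return cp >= 0x2000 && cp <= 0x200A;
    }
}
```

Only the bytes around a delimiter run need this treatment; the rest of the input can stay undecoded.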
+
+It should be relatively easy to add support for any other encoding, as long as
+its code points below 128 are compatible with ASCII.
+
## License
MD4C is covered with MIT license, see the file `LICENSE.md`.