commit a4d4f4638f2d3b9db1a0ee0a898ed0355777509a
parent 09ae86095f902c672e408ffba710138612ad4f1d
Author: Martin Mitas <mity@morous.org>
Date: Mon, 12 Dec 2016 18:04:14 +0100
README.md: Improve wording.
Diffstat:
| M | README.md | | | 49 | +++++++++++++++++++++++++++---------------------- |
1 file changed, 27 insertions(+), 22 deletions(-)
diff --git a/README.md b/README.md
@@ -103,39 +103,44 @@ some extensions or allowing some deviations from the specification.
## Input/Output Encoding
The CommonMark specification generally assumes UTF-8 input, but under closer
-inspection Unicode is actually used on very few occasions:
+inspection, Unicode actually plays a role only in a few very specific
+situations when parsing Markdown documents:
- * Classification of Unicode character as a Unicode whitespace or Unicode
- punctuation. This is used for detection of word boundary when processing
- emphasis and strong emphasis.
+ * For detecting word boundaries when processing emphasis and strong
+   emphasis, some classification of Unicode characters (whitespace,
+   punctuation) is needed.
- * Unicode case folding. This is used to perform case-independent matching
- of link labels when resolving reference links.
+ * For (case-insensitive) matching of a link reference with the
+   corresponding link reference definition, Unicode case folding is used.
- * Translating HTML entities and numeric character references (e.g. `&`,
- `#`). However MD4C leaves the translation on the renderer/application;
- as the renderer is supposed to really know output encoding.
+ * For translating HTML entities (e.g. `&amp;`) and numeric character
+   references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
+   However, MD4C leaves this translation to the renderer/application, as
+   the renderer is the one which really knows the output encoding and
+   whether it actually needs to perform this kind of translation.
+   (Consider that a renderer converting Markdown to HTML may leave the
+   entities untranslated and defer the work to a web browser.)
-MD4C uses this property of the standard and its implementation is, to a large
-degree, encoding-agnostic. Most of the code only assumes that the encoding of
-your choice is compatible with ASCII, i.e. that the codepoints below 128 have
-the same numeric values as ASCII.
+MD4C relies on this property of CommonMark, and its implementation is, to
+a large degree, encoding-agnostic. Most of MD4C's code only assumes that
+the encoding of your choice is compatible with ASCII, i.e. that the
+codepoints below 128 have the same numeric values as in ASCII.
-All input MD4C does not understand is seen as a text and sent to the callbacks
-unchanged.
+Any input MD4C does not understand is simply seen as part of the document text
+and sent to the renderer's callback functions unchanged.
-The behavior of MD4C in the isolated listed situations where the encoding
-really matters is determined by preprocessor macros:
+The two situations where MD4C has to understand Unicode are handled
+according to the following preprocessor macros:
* If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8
- in the specific situations.
+ for word boundary detection and case-folding.
- * On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C assumes
- UTF-16 and uses `WCHAR` instead of `char`. (UTF-16 is what Windows
- developers usually call just "Unicode" and what Win32API works with.)
+ * On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
+   `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
+   (UTF-16 is what Windows developers usually call just "Unicode" and what
+   the Win32 API works with.)
* By default (when none of the macros is defined), ASCII-only mode is used
- even in the situations listed above. This effectively means that non-ASCII
+   even in those specific situations. This effectively means that non-ASCII
whitespace or punctuation characters won't be recognized as such and that
case-folding is performed only on ASCII letters (i.e. `[a-zA-Z]`).