Authors: Jiewei Qian <qjw@google.com>, Matt Giuca <mgiuca@chromium.org>, Jon Napper <napper@google.com>
## Overview
Handwriting is a widely used input method; one key usage is recognizing text as users draw. This capability already exists on many operating systems (e.g. handwriting input methods), but the web platform today can't tap into it. Developers need to integrate third-party libraries (or cloud services), or develop native apps.

We want to add handwriting recognition capability to the web platform, so developers can use the handwriting recognition features readily available on the operating system.

This document describes our proposal for a Web Platform API for performing on-line handwriting recognition from recorded real-time user inputs (e.g. touch, stylus). The term “on-line” means the API recognizes text as users are drawing it.

## Problem Description
Conceptually, handwriting inputs are drawings. A drawing captures the information required to recreate a pen-tip movement for text recognition purposes. Take the handwritten “WEB” for example (a JavaScript sketch of this structure follows the list):

* A **drawing** consists of multiple ink strokes (e.g. the above letter E consists of three ink strokes).
* An **ink stroke** represents one continuous pen-tip movement that happens across some time period (e.g. from one `touchstart` to its corresponding `touchend` event). The movement trajectory is represented by a series of ink points.
* An **ink point** is an observation of the pen-tip in space and time. It records the timestamp and position of the pen-tip on the writing surface (e.g. a `touchmove` event).
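
As a minimal sketch, a drawing like the one above could be captured in JavaScript as follows. The attribute names (`x`, `y`, `t`) are illustrative assumptions, not part of this proposal's surface:

```js
// A drawing is a list of strokes; a stroke is a list of ink points
// (one continuous pen-tip movement, e.g. touchstart..touchend).
// Each ink point records a position and a timestamp in milliseconds.
const drawing = [
  [{ x: 10, y: 10, t: 0 }, { x: 10, y: 50, t: 120 }],   // stroke 1
  [{ x: 10, y: 10, t: 400 }, { x: 40, y: 10, t: 470 }], // stroke 2
];
```
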
We want the handwriting API to enable web developers to fully utilize the capabilities available in common handwriting recognition libraries. The recognizer will need to:
* Accept a vector representation of a drawing (described in the above picture).
* Recognize texts as users are writing, in real time (each recognition should take at most a few hundred milliseconds).
* Not rely on the Internet (a note-taking website should still work in flight mode), though the recognizer can make use of cloud services if available.
* Return the most likely written text as a string.
* Allow web developers to control or fine-tune the recognizer. For example, allow developers to specify the language (an English recognizer won’t recognize Chinese characters).
* Offer an extensible way to add support for new features, in order to utilize the latest features available on the underlying libraries.
* Provide a way for developers to query feature support, so developers can decide if the recognizer should be used in their app (a sketch follows this list).
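
To sketch how these requirements might look to a web developer. All names here, such as `createHandwritingRecognizer`, are hypothetical placeholders; the concrete API shape is defined later in this explainer:

```js
// Hypothetical sketch only: method names are placeholders, not the
// normative API surface proposed in this document.
async function recognize() {
  // Query feature support, so the app can fall back (e.g. to a bundled
  // JS/WASM recognizer) or hide its handwriting UI.
  if (!navigator.createHandwritingRecognizer) return null;

  // Fine-tune the recognizer: an English model won't recognize Chinese.
  const recognizer = await navigator.createHandwritingRecognizer({
    languages: ['en'],
  });

  // Real-time use: feed strokes as the user writes, then ask for text.
  const drawing = recognizer.startDrawing({ alternatives: 3 });
  drawing.addStroke([
    { x: 10, y: 10, t: 0 },   // ink points, as sketched above
    { x: 10, y: 50, t: 120 },
  ]);
  const result = await drawing.getPrediction(); // well under a second
  return result.text;                           // most likely text, as a string
}
```
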
To satisfy common use cases, the recognizer also needs to:
* Return a list of alternatives (candidates) for the text.
* Rank alternatives based on likelihood of being correct.
* Return segmentation results for each character (or word), so clients know which strokes (and points) make up a character. One use case: in note-taking apps, users can select recognized text and delete all of the corresponding strokes.

Non-goals:
* Design an API to recognize text in static images (off-line recognition). That is optical character recognition, which is better aligned with the Shape Detection API.
* Deliver consistent output across all platforms. This is very difficult unless we implement a common (publicly owned) algorithm. Therefore, we allow the same drawing to yield different outputs, but we aim for a common output structure (e.g. what a given attribute means, and which attributes must be present).
## Existing APIs
A **prediction result** is a JavaScript object. It _must_ contain the `text` attribute.

The prediction result _may_ contain (if the implementation chooses to support; an example follows the list):
* `alternatives`: A list of JavaScript objects, where each object has a `text` field. These are the next best predictions (alternatives), in decreasing confidence order, up to a maximum of `alternatives` (given in hints) strings. For example, the first string is the second best prediction (the best being the prediction result itself).
* `segmentationResult`: [TODO] Come up with a way to represent text segmentation.
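
For illustration, a prediction result for a drawing of the word “hello” might look like this (values are made up):

```js
// Illustrative shape of a prediction result, following the description above.
const predictionResult = {
  text: 'hello',        // required: the best prediction
  alternatives: [       // optional: next best predictions, best first
    { text: 'hallo' },  // the second best prediction
    { text: 'hells' },  // the third best prediction
  ],
  // segmentationResult: ...  (optional; representation is still TODO)
};
```
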
### Segmentation result
[TODO] Come up with a way to represent grapheme set segmentation.

[TODO] Per-character (grapheme set) segmentation vs. word (phrase) segmentation. Or make this configurable?
### Why not use Web Assembly?

Web Assembly would not allow the use of more advanced proprietary handwriting libraries (e.g. those available on the operating system). Web developers would also need to manage distribution (and updates) of such libraries, which might take several megabytes per update.

A Web API can do the same task more efficiently (better models, zero distribution cost, faster computation). This topic was previously discussed for the Shape Detection API and the Text-to-Speech API.
### Why not use Shape Detection?
Handwriting (in this proposal) includes temporal information (how the shape is drawn, pixel by pixel). We believe this additional temporal information distinguishes handwriting recognition from shape detection.

If we take out the temporal information, the task becomes optical character recognition (given a photo of written characters). This is a different task, and indeed fits within the scope of shape detection.
### Grapheme clusters vs. Unicode code points
A grapheme cluster is the minimal unit used in writing; it represents a visual shape. Unicode code points, on the other hand, are a computer's internal representation; they represent meaning. The two concepts aren't fully correlated.

A Unicode combining mark is represented by a single code point. Combining marks modify other characters, and aren't used by themselves. This creates a problem when we need to distinguish between shape and meaning. For example, the letter a (U+0061) and the grave accent combining mark (U+0300) combine to à. The letter न (U+0928) and the combining mark ि (U+093F) combine to नि.

Handwriting recognition concerns both shape (input) and meaning (output), and it's important to distinguish between the two. For example, when requesting that the recognizer only recognize certain characters, grapheme clusters should be used.
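
This distinction is observable in JavaScript today, which makes for a handy illustration:

```js
const s = 'a\u0300'; // letter a (U+0061) + combining grave accent (U+0300), rendered as à

console.log([...s].length); // 2 (two Unicode code points: the "meaning" view)

// Intl.Segmenter splits a string into grapheme clusters (the "shape" view).
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...segmenter.segment(s)].length); // 1 (a single grapheme cluster)
```
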
### Ranking vs Score
It's very common to use a score to assess alternative texts; most machine learning algorithms produce one internally. However, exposing such a score is not a good idea for the Web.

We expect different browser vendors to offer varying recognizer implementations, which will inevitably make their scores incomparable.

Because the score is an implementation detail of a machine learning model, its meaning changes whenever the model changes. Therefore, scores are not comparable unless everyone uses the same model.

Thus, we choose ranking instead of a score. Ranking still gives some indication of which alternative is better, while avoiding the scenario where web developers misunderstand the score's implications and try to compare scores across different libraries, or filter results based on them.
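
Under this design, clients consume alternatives by position rather than by numeric score. A small sketch, reusing the illustrative `predictionResult` from earlier:

```js
// Candidates are consumed by rank (array order), not by thresholding a
// score: index 0 of alternatives is the second-best prediction overall.
const candidates = [
  predictionResult.text,
  ...(predictionResult.alternatives ?? []).map((a) => a.text),
];
const topThree = candidates.slice(0, 3); // e.g. populate a suggestion picker
```
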