
Commit e78fdb4

sync doc 2020-10-09
1 parent 907b827 commit e78fdb4


3 files changed: +61 −18 lines changed

.gitignore

+2

@@ -0,0 +1,2 @@
+.DS_Store
+
explainer.md

+58 −18
@@ -5,37 +5,41 @@ Authors: Jiewei Qian <qjw@google.com>, Matt Giuca <mgiuca@chromium.org>, Jon Nap

## Overview

-Handwriting is a widely used input method, one key usage is to recognize the texts when users are drawing. This feature already exists on many operating systems (e.g. handwriting input methods). However, the web platform as of today can't tap into this capability. Developers need to integrate with third-party libraries (or cloud services), or to develop native apps.
+Handwriting is a widely used input method; one key use case is recognizing text as users draw. This feature already exists on many operating systems (e.g. handwriting input methods). However, the web platform today doesn’t have this capability: developers need to integrate with third-party libraries (or cloud services), or develop native apps.

-We want to add handwriting recognition capability to the web platform, so developers can use the it, which is readily available on the operating system.
+We want to add handwriting recognition capability to the web platform, so developers can use the existing handwriting recognition features available on the operating system.

-This document describes our proposal for a Web Platform API for performing on-line handwriting recognition from recorded real-time user inputs (e.g. touch, stylus). This API does not aim to support recognizing handwritings in images (off-line recognition).
-
-The term “on-line” means the API recognizes the text as the users are drawing them. Specifically, the handwriting contains temporal information (e.g. the pen-tip is at position A at time T).
-
-The recognizer will be able to function without the Internet, but might make use of cloud services (if available) to improve the result.
-
-This API allow Web applications to:
-
-1. Collect some handwriting inputs (ink strokes)
-2. Request the user agent to recognize the texts
-3. Retrieve the result
-    * Recognized texts as a JavaScript String
-    * Optional: Alternative results
-    * Optional: Extra information (e.g. character segmentation)
+This document describes our proposal for a Web Platform API for performing on-line handwriting recognition from recorded real-time user inputs (e.g. touch, stylus). The term “on-line” means the API recognizes the text as users are drawing it.

## Problem Description

Conceptually, handwriting inputs are drawings. A drawing captures the information required to recreate a pen-tip movement for text recognition purposes. Take the handwritten “WEB” for example:

-![Handwriting Concept](images/handwriting-concept.svg)
+<img src="images/handwriting-concept.svg" style="width: 95%; display: block; margin: 0 auto;" alt="Handwriting Concept">

* A **drawing** consists of multiple ink strokes (e.g. the above letter E consists of three ink strokes).
* An **ink stroke** represents one continuous pen-tip movement that happens across some time period (e.g. from one `touchstart` to its corresponding `touchend` event). The movement trajectory is represented by a series of ink points.
* An **ink point** is an observation of the pen-tip in space and time. It records the timestamp and position of the pen-tip on the writing surface (e.g. a `touchmove` event).

-The job of a handwriting recognizer is to determine the text written in a drawing.
+We want the handwriting API to enable web developers to fully utilize the capabilities available in common handwriting recognition libraries. The recognizer will need to:
+* Accept a vector representation of a drawing (described in the picture above).
+* Recognize text as users are writing, in real time (each recognition takes no more than a few hundred milliseconds).
+* Not rely on the Internet (a note-taking website should still work in flight mode), though the recognizer may make use of cloud services when available.
+* Return the text that’s most likely written, as a string.
+* Allow web developers to control or fine-tune the recognizer, for example by specifying the language (an English recognizer won’t recognize Chinese characters).
+* Offer an extensible way to add support for new features, in order to utilize the latest features available in the underlying libraries.
+* Provide a way for developers to query feature support, so they can decide whether the recognizer should be used in their app.
+
+To satisfy common use cases, the recognizer also needs to:
+* Return a list of alternatives (candidates) for the text.
+* Rank alternatives based on likelihood of being correct.
+* Return a segmentation result for each character (or word), so clients know which strokes (and points) make up a character. One use case: in note-taking apps, users select recognized text and delete all of its strokes.
+
+Non-goals:
+* Design an API to recognize text in static images. That is optical character recognition, and is better aligned with the Shape Detection API.
+* Deliver consistent output across all platforms. This is very difficult unless we implement a common (publicly owned) algorithm. Therefore, we allow the same drawing to yield different outputs, but we want to achieve a common output structure (e.g. what attribute A means, which attributes must be present).
+
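The query-then-recognize flow implied by these requirements could look like the sketch below. All names here (`platform`, `queryRecognizer`, `createRecognizer`, `getPrediction`) are hypothetical placeholders, not the proposed API surface:

```javascript
// Hypothetical flow: query feature support, fine-tune the recognizer,
// then recognize. `platform` stands in for whatever object the API
// would hang off (e.g. the navigator); every name is a placeholder.

async function recognizeText(platform, drawing) {
  // 1. Query feature support, so the app can decide whether to use it.
  const features = await platform.queryRecognizer({ languages: ['en'] });
  if (!features || !features.languages.includes('en')) {
    return null; // fall back, e.g. to a plain text input
  }

  // 2. Create a recognizer fine-tuned for the expected input language.
  const recognizer = await platform.createRecognizer({ languages: ['en'] });

  // 3. Recognize; the most likely text comes back as a string.
  const prediction = await recognizer.getPrediction(drawing);
  return prediction.text;
}

// Stub platform, standing in for a real implementation.
const stubPlatform = {
  async queryRecognizer() { return { languages: ['en'] }; },
  async createRecognizer() {
    return { async getPrediction() { return { text: 'WEB' }; } };
  },
};
```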
## Existing APIs
@@ -299,6 +303,42 @@ A **prediction result** is a JavaScript object. It _must_ contain the text attri
The prediction result _may_ contain (if the implementation chooses to support):

* `alternatives`: A list of JavaScript objects, where each object has a text field. These are the next best predictions (alternatives), in decreasing confidence order. Up to a maximum of `alternatives` (given in hints) strings. For example, the first string is the second best prediction (the best being the prediction result).
+* `segmentationResult`: [TODO] Come up with a way to represent text segmentation.
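Consuming a result of this shape might look as follows; the result object here is illustrative sample data, not output from a real recognizer:

```javascript
// A prediction result as described above: a `text` string plus an
// optional `alternatives` list in decreasing confidence order.
const predictionResult = {
  text: 'WEB',                     // best prediction
  alternatives: [                  // next best predictions, in order
    { text: 'WFB' },
    { text: 'W3B' },
  ],
};

// Present the best guess first, then the alternatives in rank order.
const candidates = [
  predictionResult.text,
  ...(predictionResult.alternatives ?? []).map(a => a.text),
];
```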
+
+### Segmentation result
+[TODO] Come up with a way to represent grapheme set segmentation.
+
+[TODO] per character (grapheme set) segmentation vs. word (phrase) segmentation. Or make this configurable?
+
+<img src="images/segmentation-concept.svg" style="max-width: 40%; min-width: 300px; display: block; margin: 0 auto;" alt="Segmentation Concept">
+
+
+## Design Questions
+### Why not use Web Assembly?
+Web Assembly would not allow the use of more advanced proprietary handwriting libraries (e.g. those available on the operating system). Web developers would also need to manage distribution (and updates) of such libraries (which might take several megabytes per update).
+
+A Web API can do the same task more efficiently (better models, zero distribution cost, faster computation). This topic was previously discussed for the Shape Detection API and the Text-to-Speech API.
+
+
+### Why not use Shape Detection?
+Handwriting (in this proposal) includes temporal information (how the shape is drawn, pixel by pixel). We believe this additional temporal information distinguishes handwriting recognition from shape detection.
+
+If we take out the temporal information, the task becomes optical character recognition (given a photo of written characters). This is a different task, and indeed fits within the scope of shape detection.
+
+### Grapheme clusters vs. Unicode code points
+A grapheme cluster is the minimal unit used in writing; it represents visual shape. A Unicode code point, on the other hand, is a computer's internal representation; it represents meaning. The two concepts aren’t fully correlated.
+
+A Unicode combining mark is represented as a single code point. Combining marks are used to modify other characters, not by themselves. This creates a problem when we need to distinguish between shape and meaning. For example, the letter a (U+0061) and the grave accent combining mark (U+0300) combine to à. The letter न (U+0928) and the combining mark ि (U+093F) combine to नि.
+
+Handwriting recognition concerns both shape (input) and meaning (output). It’s important to distinguish between the two. For example, when requesting to recognize only certain characters, grapheme clusters should be used.
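The à example can be demonstrated directly in JavaScript: the decomposed form is two code points but one grapheme cluster, and NFC normalization composes it into the single precomposed code point U+00E0.

```javascript
// LATIN SMALL LETTER A followed by COMBINING GRAVE ACCENT.
const decomposed = 'a\u0300';
const composed = decomposed.normalize('NFC');

console.log(decomposed.length);       // 2: two code points
console.log(composed.length);         // 1: precomposed à
console.log(composed === '\u00E0');   // true

// Intl.Segmenter counts grapheme clusters ("what the user sees"):
const seg = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...seg.segment(decomposed)].length;
console.log(graphemes);               // 1: one visual character à
```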
+
+### Ranking vs Score
+It’s very common to use a score to assess alternative texts; this is how many machine learning algorithms work internally. However, it is not a good idea for the Web.
+
+We expect different browser vendors to offer varying recognizer implementations, which will inevitably make the scores incomparable.
+
+Because the score is an implementation detail of machine learning models, the meaning of the score changes if the model changes. Therefore, scores are not comparable unless everyone uses the same model.
+
+Thus, we choose to use ranking instead of score. Ranking gives some indication of which alternative is better, and avoids the scenario where web developers misunderstand the score’s implication and try to compare scores across different libraries, or filter results based on them.
## Considerations

images/segmentation-concept.svg

+1