Commit

link to cqt.html from README
Signed-off-by: Daniel Hardman <daniel.hardman@gmail.com>
dhh1128 committed Sep 6, 2024
1 parent ac838d2 commit f356b63
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -1,5 +1,9 @@
# CQT
-This is a spec for a simple but powerful algorithm for canonicalizing chunks of text that flow not via files but via chat, copy/paste, or other non-file-oriented channels (social media, SMS, email, etc.). Note the reference implementation in python [cqt.py](cqt.py), and ports in javascript [cqt.js](cqt.js), java [Cqt.java](Cqt.java), go [cqt.go](cqt.go), and rust [cqt.rs](cqt.rs). Note also the unit tests in [test_cqt.py](test_cqt.py).
+This is a spec for a simple but powerful algorithm for canonicalizing chunks of text that flow not via files but via chat, copy/paste, or other non-file-oriented channels (social media, SMS, email, etc.).
+
+An interactive form that lets you run the algorithm on arbitrary text is located [here](cqt.html).
+
+Reference implementations in python [cqt.py](cqt.py), and ports in javascript [cqt.js](cqt.js), java [Cqt.java](Cqt.java), go [cqt.go](cqt.go), and rust [cqt.rs](cqt.rs). Note also the unit tests in [test_cqt.py](test_cqt.py).

### Purpose
Cryptographic hashes and signatures are usually applied to files or data structures. However, a very important category of communication is not file-oriented. In our modern world, lots of text moves across system boundaries using mechanisms that are prone to reformatting and error due to their inherent fuzziness. We see a post on social media on our phones, copy it, and paste it into a text to a friend. She emails it to a journalist acquaintance, who moves it into a word processor that is configured to use a different locale with different autocorrect and punctuation settings. Eventually, a student cites the journalist in a paper they're writing. Somewhere along the way, whitespace is deleted, capitalization or spelling is altered, the codepage changes, smart quotes turn into dumb quotes or two hyphens become an em dash.
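
To make the problem concrete, here is a minimal sketch (not part of the spec or of `cqt.py`) showing that a naive hash of the raw text does not survive the kind of reformatting described above. The helper `sha256_hex` and both sample strings are hypothetical, chosen only for illustration; the sketch shows the problem and does not implement the CQT algorithm.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes of the text, as one might before signing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample: the same sentence before and after a copy/paste trip.
# The original uses smart quotes and an em dash; the pasted copy uses dumb
# quotes and two hyphens.
original = "\u201cWait\u201d \u2014 she said."
pasted = '"Wait" -- she said.'

# A human reads these as the same text, but the digests do not match.
print(sha256_hex(original) == sha256_hex(pasted))
```

Because the two strings differ at the byte level, any digest or signature computed over the raw bytes will not verify after such a change, even though a reader would consider the text unchanged.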