Skip to content

mleku/realy-protocol

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

48 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

REALY Protocol

Documentation matrix chat zap mleku: ⚑️mleku@getalby.com

about

Inspired by the event bus architecture of nostr but redesigned to avoid the serious deficiencies of that protocol for both developers and users.

why REALY?

Since the introduction of the idea of a general "public square" style social network as seen with Facebook and Twitter, the whole world has been overcome by something of a plague of mind control brainwashing cults.

Worse than "Beatlemania" people are being lured into the control of various kinds of "influencers" and adopting in-group words and "challenges" that are more often harmful to the people than actually beneficial like an exercise challenge might be.

Nostr protocol is a super simple event bus architecture, blended with a post office protocol, and due to various reasons related to the recent buyout of Twitter by Elon Musk, who plainly wants to turn it into the Western version of Wechat, it has become plagued with bad subprotocol designs that negate the benefits of self sovereign identity (elliptic curve asymmetric cryptography) and a dominant form of client that is essentially a travesty of Twitter itself.

REALY is being designed with the lessons learned from Nostr and the last 30 years of experience of internet communications protocols to aim to resist this kind of Embrace/Extend/Extinguish protocol that has repeatedly been performed on everything from email, to RSS, to threaded forums and instant messaging, by starting with the distilled essence of how these protocols should work so as to not be so easily vulnerable to being coopted by what is essentially in all but name the same centralised event bus architecture of social networks like Facebook and Twitter.

Use Cases

The main purposes that REALY will target are:

  • synchronous instant messaging protocols with IRC style nickserv and chanserv permissions and persistence, built from the ground up to take advantage of the cryptographic identities, with an intuitive threaded structure that allows users to peruse a larger discussion without the problem of threads of discussion breaking the top level structure

  • structured document repositories primarily for text media, as a basis for collaborative documentation and literature collections, and software source code (breaking out of the filesystem tree structure to permit much more flexible ways of organising code)

  • persistent threaded discussion forums for longer form messages than the typical single sentence/paragraph of instant messaging

  • simple cross-relay data query protocol that enables minimising the data cost of traffic to clients

  • push style notification systems that can be programmed by the users' clients to respond to any kind of event broadcast to a relay

Architectural Philosophy

A key concept in the REALY architecture is that of relays being a heterogeneous group of data repositories and relaying systems that are built specific to purpose, such as

  • a chat relay, which does not store any messages but merely bounces messages around ot subscribers,

  • a document repository, which provides read access to data with full text search capability, that can ne specialised for a singular data format (eg markdown, eg mediawiki, eg code), a threaded, moderated forum, and others,

  • a directory relay which stores and distributes user metadata such as profiles, relay lists, follows, mutes, deletes and reports

  • an authentication relay, which can be sent messages to add or remove users from access whitelists and blacklists, that provides this state data to relays it is used by

A second key concept in REALY is the integration of Lightning Network payments - again mostly copying what is done with Nostr but enabling both pseudonymous micro-accounts and long term subscription styles of access payment, and the promotion of a notion of user-pays - where all data writing must be charged for, and most reading must be paid for.

Lightning is perfect for this because it can currently cope with enormous volumes of payments with mere seconds of delay for settlement and a granularity of denomination that lends itself to the very low cost of delivering a one-time service, or maintaining a micro-account.

Network Protocol

Following are several important specifications and rationales for the way the messages are encoded and handled.

Simple Plaintext Message Codec

Features are the equivalent of volume in construction and building architecture. They have an exponential time cost. Most API wire codecs make assumptions about data structures that do not hold for all applications, and it is simpler to make one that fits. Protobuf, for example, does not have any constraints for lengths of binary digits. This can be quite a problem for cryptographic data protocols, which then need to write extra validation code in addition to integrating the generated API message codec.

The existing nostr protocol uses JSON, which is awful, and space inefficient, and complex to parse due to its intolerance of terminal commas and annoying to work with because of its retarded, multi-standards of string escaping.

Thus instead of giving options for no reason, to developers, we are going to dictate that a plain text based protocol be used, in place of any other option. The performance difference is very minimal and a well designed plaintext message encoding is nearly as efficient as binary, and anyway, decent GZIP compression can also be applied to messages, flattening especially textual content.

Line structured documents are much more readily amenable to human reading and editing, and \n/;/: is more efficient than "," as an item separator. Data structures can be much more simply expressed in a similar way as how they are in programming languages.

It is one of the guiding principles of the Unix philosophy to keep data in plain text, human readable format wherever possible, forcing the interposition of a parser just for humans to read the data adds extra brittleness to a protocol.

REALY protocol format is extremely simple and should be trivial to parse in any programming language with basic string slicing operators.

Unpadded Base64 Encoding for Fixed Length Binary Fields

To save space and eliminate the need for ugly = padding characters, we invoke RFC 4648 section 3.2 for the case of using base64 URL encoding without padding because we know the data length. In this case, it is used for IDs and pubkeys (32 bytes payload each, 43 characters base64 raw URL encoded) and signatures (64 bytes payload, 86 characters base64 raw URL encoded) - the further benefit here is the exact same string can be used in HTTP GET parameters ?key=value&…​ context. The standard = padding would break this usage as well.

For those who can’t find a "raw" codec for base64, the 32 byte length has 1 = pad suffix and the 64 byte length has 2: == and this can be trimmed off and added back to conform to this requirement.

Due to the fact that potentially there can be hundreds if not thousands of these in event content and tag fields the benefit can be quite great, as well as the benefit of being able to use these codes also in URL parameter values - 43 bytes is not so much more than 32 binary bytes and because it is an even number base it is also cheaper to decode.

HTTP for Request/Response, Websockets for Push and Subscriptions

Only subscriptions require server push messaging pattern, thus all other queries in REALY can be done with simple HTTP POST requests.

A relay should respond to a subscribe request by upgrading from http to a websocket. The client should send this in the header also.

It is unnecessary messages and work to use websockets for queries that match the HTTP request/response pattern, and by only requiring sockets for APIs that actually need server initiated messaging, the complexity of the relay is greatly reduced.

There can be a separate subscription type also, where there is delivering the IDs only, or forwarding the whole event.

HTTP with upgrades to websockets, and in the future HTTP/3 (QUIC) will be possible, have a big advantage of being generic, having a built in protocol for metadata, and are universally supported.

Socket protocols have a higher overhead in processing, memory and bandwidth compared to simple request/response messages so it is more efficient to be able to support both models, as many times there is one or two subscriptions that might be opened, these can live on one socket per client, but the other requests are momentary so they have no state management cost. If the message type is this type, it makes no sense to do it over transports with a higher cost per byte and per user. A subscription is longer lasting, so it is ok that it takes a little longer to negotiate.

Relays

Modular Architecture

A key design principle employed in REALY is that of relay specialization.

Instead of making a relay a hybrid event store and router, in REALY a relay does only one thing. Thus there can be

  • a simple event repository that only understands queries to fetch a list of events by ID

  • a relay that only indexes and keeps a space/time limited cache of events to process filters

  • a relay that only keeps a full text search index and a query results cache

  • a relay that only accepts list change CRDT events such as follow, join/create/delete/leave group, block, delete, report and compiles these events into single lists that are accessible to another relay that can use these compiled lists to control access either via explicit lists or by matching filters

  • a relay that stores and fetches media, including being able to convert and cache such as image size and formats

  • …​and many others are possible

By constraining the protocol interoperability compliance down to small simple sub-protocols the ability for clients to maintain currency with other clients and with relays is greatly simplified, without gatekeepers.

The Continuum between Client and Server

It should be normalized that relays can include clients that query other specialist relays, especially for such things as caching results fetched from other relays.

Thus one relay can be queried for a filter index, and the list of Event Ids returned can then be fetched from another relay that specialises in storing events and returning them on request by lists of Event Ids, and still other relays could store media files and be able to convert them on demand.

Specifications

Replication Instead of Arbitration

Along with the use of human-readable type identifiers for documents and the almost completely human-composable event encoding, the specification of REALY is not dependent on any kind of authoritative gatekeeping organisation, but instead organisations can add these to their own specifications lists as they see fit, eliminating a key problem with the operation of the nostr protocol.

There need not be bureaucratic RFC style specifications, but instead use human-readable names and be less formally described, the formality improving as others adopt it and expand or refine it.

Keeping Specifications With Implementations

Thus also it is recommended that implementations of any or all REALY servers and clients should keep a copy of the specification documents found in other implementations and converge them to each other as required when their repositories update support to changes and new sub-protocols.

Client Message Authentication and Integrity

All queries and submissions must be authenticated in order to enable a REALY relay to allow access. The signing key does not have to be identifying, but it serves as a HMAC for the messages, as implementations can in fact expose parts of the path to plaintext and at least same-process possible interception.

Thus access control becomes simple, and privacy also equally simple if the relay is public access to read, the client should default to one-shot keys for each request.

Authentication Message Format

Authenticating messages, for simplicity, is a simple message suffix.

πŸ” 1. Authenticated Message Encoding
Message Description

<message payload>\n

all messages must be terminated with a newline

<request URL>\n

<unix timestamp in decimal ascii>\n

<public key of signer>\n

<signature>\n

For simplicity, the signature is on a separate line, just as it is in the event format, this avoids needing to have a separate codec, and for the same reason the timestamp and public key.

For reasons of security, a relay should not allow a time skew in the timestamp of more than 15 seconds.

The signature is upon the Blake 2b message hash of everything up to and including the newline preceding it, and only relates to the HTTP POST payload, not including the header.

Even subscription messages should be signed the same way, to avoid needing a secondary protocol. "open" relays that have no access control (which is retarded, but just to be complete) must still require this authentication message, but simply the client can use one-shot keys to sign with, as it also serves as a HMAC to validate the consistency of the request data, since it is based on the hash.

πŸ”₯
One shot keys for requests and publications is recommended especially for the case of users of Tor accessing relays, as this ensures traffic that emerges from the same exit or comes to the same hidden service looks the same. However, it should be also pointed out that a client is likely to query for one specific pubkey on a fairly regular basis which should be considered with respect to triggering the use of a new path in the tor connection (or other anonymizing protocol).

RESTful APIs

HTTP conveniently allows for the use of paths, and a list of key/values for parameters where necessary, to enable a query to stay entirely within the context of a HTTP GET request.

As such, most queries can be identified simply by the path they refer to, instead of the messaging needing to additionally conatin this context.

Capability Messages

Capabilities are an important concept for an open, extensible network protocol. It is also very important to narrow down the surface of each API in the protocol in order to make it more efficient to deploy.

One of the biggest mistakes in the design of nostr is precisely in the blurring of APIs and even message types together with ambiguous elements to their structure.

The COUNT and AUTH protocol method types have this property. Their structure is defined by an implicit data point - the sender of the message, which means parsing the message isn’t just identifying it but also reading context.

Capability must be provided at the /capability path of the relay’s web server path scheme.

Capability Response

The message that is sent back from a GET request at /capability should be as follows:

πŸ” 2. Capability Response

Message

Description

<protocol name>:<URL of protocol spec>;vX.X.X;

Protocol name and version, the protocol spec URL.

The protocol name must be identical to the message header used in the protocol.

The version number should be a tag on the commit at the URL that matches the version specified.

<flag>[=<value>];…​>

flag,…​ for relevant flags on the protocol, for example whitelisted, so for a filter this means "authenticate to read as whitelisted user". All messages must be authenticated, but without this flag any user can use this protocol on this relay. The last flag ends with a newline, not a semicolon.

By maintaining a very small, method-based definition of protocols, complex flags are not required, in many cases, unnecessary

\n

Each protocol spec ends with a newline.

πŸ—©
Because lists of event Ids are relatively small, there should be no need for a limit on a filter with at least one parameter, even if it may yield a > 500kb message this is trivial considering the client can keep this and use it for a long time without needing to do that query again. This is the reason for separating the filter and fulltext-search from the event retrieval syntax.

Protocol names should be defined in the same sense as a set of API calls - the details of how to write that exactly differs somewhat for different languages (and may involve checks not native to the language) but they should map to something along similar lines as a Go⌯ interface{}

The protocol name is a shortcut and convenience, but should make automatic decisions by clients regarding a capability set simple.

As per implementation, each capability should be part of a registered list of message types that will match the message sentinel that is also the protocol name, using a registry of available functions.

Events

Message Format

πŸ” 3. Event Encoding
Message Description

<type name>:

can be anything, hierarchic names like note/html note/md are possible, or type.subtype or whatever

<pubkey>;

encoded in URL-base64 with the padding single = elided

<unix second precision timestamp in decimal ascii>\n

this ends the first line of the event format

tags follow, they end with \ncontent:<length>; the end of tags and beginning of content

key:value;extra;…​\n

zero or more line separated, fields cannot contain a semicolon, end with newline instead of semicolon, key lowercase alphanumeric, first alpha, no whitespace or symbols, only key and following : are mandatory

content:<length>\n

literally this word on one line directly after the newline of the previous, the length value refers to after the newline and the end of it MUST be a newline and then the signature

<content>\n

any number of further line breaks, last line is signature, everything before signature line is part of the canonical hash

The canonical form is the above, creating the message hash that is generated with SHA256

sig:<BIP-340 secp256k1 schnorr signature encoded in unpadded URL-base64>\n

this field would have two padding chars ==, these should be elided before generating the encoding. The length is always 86 characters/bytes.

Example

note/adoc:6iiJMRHgRA4SZcc7Jg-k8kD81tJQYpM1saUykC5YCDs;1740226569
event:V6zWuopmz3D7pWZyqTZOZtIHlq8LrLAToWNZ9wBbnLo;root
event:4g6hb5mpNXupjigkdYU_vim9rnmUhR_mibfkpPs5d2A;root
event:jjBUzkXZkD9vwmHqwsCzQP07o-npo-4F-ciA0pWrJr8;root
pubkey:DLAJqN-E2n1OLP1gXDnMk2lgra6qYGTULuIJk4KriCk
pubkey:j5L8SIYV3yQPhHkp4vbFSTUh4kEbeL9SfZM8CGk5lMs
hashtag:Megan Boswell
hashtag:#AEWGrandSlam
hashtag:2024 BMW
hashtag:Censure
content:449
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
sig:OX5r0PdsC6p1dXf1Jr225O5bDupLGA-ZGKKxC59GOtqMPXfW9HZQPhURMe24WdrciWwJ0j7R7WWnuS32xFUyjA

Use in Data Storage

The encoding is already suitable for encoding to a database, it is optional to use a somewhat more compact binary encoding, especially if the database has good compression like ZST, which will flatten tables of these values quite effectively, with little overhead cost for lowered complexity.

Tags

Tags are simply a list of <key>:<field>[;<field>]…​\n and the terminator is the sentinel \ncontent:<length>\n

Common tags would include event and pubkey and hashtag - these are guidelines, the specifics of tag content and syntax is tied to the type, the first string at the top of the event, as described above.

Protocols

Every REALY protocol should be simple and precise, and use HTTP for request/response pattern and only use websocket upgrades for publish/subscribe pattern.

The list of protocols below can be expanded to add new categories. The design should be as general as possible for each to isolate the application features from the relay processing cleanly.

Publication Protocols

store, update and relay

store\n
<event>
update:<event id>\n
<event>
relay:\n
<event>

Submitting an event to be stored is the same as a result sent from an Event Id query except with the type of operation intended: store\n to store an event, replace:<Event Id>\n to replace an existing event and relay\n to not store but send to subscribers with open matching filters.

πŸ—©
Replace will not be accepted if the message type and pubkey are different to the original that is specified.

The use of specific different types of store requests eliminates the complexity of defining event types as replaceable, by making this intent explicit. A relay can also only allow one of these, such as a pure relay, which only accepts relay requests but neither store nor replace, or any combination of these. The available API calls should be listed in the capability response

An event is then acknowledged to be stored or rejected with a message ok:<true/false>;<Event Id>;<reason type>:human readable part where the reason type is one of a set of common types to indicate the reason for the false.

Events that are returned have the <subscription Id>:<Event Id>\n as the first line, and then the event in the format described above afterwards.

There is four basic types of queries in REALY, derived from the nostr design, but refined and separated into distinct, small API calls.

Query Protocols

events

A key concept in REALY protocol is minimising the footprint of each API call. Thus, a primary query type is the simple request for a list of events by their ID hash:

Request

πŸ” 4. events request
Message Description

events:\n

message header

<event ID one>\n

one or more event ID to be returned in the response

Unlike in event tags and content, the e: prefix is unnecessary. The previous two query types only have lists of events in return, and to fetch the event a client then must send an events request.

Normally clients will gather a potentially longer list of events and then send Event Id queries in segments according to the requirements of the user interface.

The results are returned as a series as follows, for each item returned:

Response

πŸ” 5. events response
Message Description

event:<Event Id>\n

each event is marked with his header, so \nevent: serves as a section marker

<event>\n

the full event text as described previously

filter

A filter has one or more of the fields listed below, and headed with filter:

Request

πŸ” 6. filter request
Message Description

filter:\n

message type header

types:<one>;<two>;…​\n

these should be the same as the ones that appear in events, and match on the prefix so subtypes, eg note/text and note/html will both match on note.

pubkeys:<one>;<two>;…​\n

list of pubkeys to only return results from

timestamp:<since>;<until\n

either can be empty but not both, omit line for this, both are inclusive

tags:\n

these end with a second newline

<key>:<value>[;…​]\n

only the value can be searched for, and must be semicolon separated for multiple

…​

several tags can be present, they will act as OR

\n

tags end with a second newline

The response message is simply a list of the matching events IDs, which are expected to be in reverse chronological order:

Response

πŸ” 7. filter response
Message Description

response:filter\n

message type header, all use response: for HTTP style request/response

<event id>\n

each event id is separated by a newline

…​

…​any number of events further.

subscribe

subscribe means to request to be sent events that match a filter, from the moment the request is received. Mixing queries and subscriptions is a bad idea because it makes it difficult to specify the expected behaviour from a relay, or client. Thus, a subset of the filter is used. The subscription ends when the client sends unsubscribe message.

πŸ” 8. subscribe request
Message Description

subscribe:<subscription id>\n

the ID is for the use of the client to distinguish between multiple subscriptions on one socket, there can be more than one.

types:<one>;<two>;…​\n

these should be the same as the ones that appear in events, and match on the prefix so subtypes, eg note/text and note/html will both match on note.

pubkeys:<one>;<two>;…​\n

list of pubkeys to only return results from

tags:\n

these end with a second newline

<key>:<value>[;…​]\n

only the value can be searched for, and must be semicolon separated for multiple matches

…​

several tags can be present, they will act as OR

\n

tags end with a second newline

πŸ—©
There is no timestamp field in a subscribe.

After a subscribe request the relay will send an acknowledgement:

πŸ” 9. subscribed response
Message Description

subscribed:<subscription id>\n

To close a subscription the client sends an unsubscribe:

πŸ” 10. unsubscribe request
Message Description

unsubscribe:<subscription id>\n

πŸ”₯
Direct messages, for example, are privileged and can only be sent in response to a query or subscription signed with one of the keys appearing in the message (author or recipient/s)

The subscribe query streams back results containing just the event ID hash, in the following message:

πŸ” 11. subscription response
Message Description

subscription:<subscription id>:<event id>\n

The client can then send an events query to actually fetch the data. This enables collecting a list and indicating the count without consuming the bandwidth for it until the view is opened.

fulltext

A fulltext query is just fulltext: followed by a series of space separated tokens if the event store has a full text index, terminated with a newline.

πŸ” 12. fulltext request
Message Description

fulltext:text to do full text search with\n

search terms are space separated, terminated by newline

The response message is like as the filter, the actual fetching of events is a separate operation.

πŸ” 13. fulltext response
Message Description

response:fulltext\n

each event is marked with his header, so \nevent: serves as a section marker

<event id>\n

event id that matches the search terms

…​

any number of events further, sorted by relevance.

About

relay events and like yeah protocol

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages