# Terminology for Layers
This Wiki page is for collecting text that follows on from the discussion on "Terminology for Layers" at the Workshop on IoT Semantic/Hypermedia Interoperability in Prague, 2017-07. It can draw from the notes collected at that workshop.
(Note that the term "layers" is itself not necessarily the right term for the modular structure that we are looking at.)
Names: Michael Koster, Carsten Bormann, Michael McCool, (Milan is watching)
In the IOTSI and WISHI workshops, it has turned out to be useful to think about data interoperability in at least three layers:
- Serialization (Syntactic Interoperability?)
- Structural Interoperability
- Semantic Interoperability
The latter two seem to correspond in some ways to RFC 3444's data modeling and information modeling layers (since serialization is handled automatically by today's data modeling languages, it is not addressed by RFC 3444). Earlier data interchange standards such as EDIFACT usually intertwined serialization with structural interoperability; the separation is realized more clearly in more recent standards. (Calling serialization-level interoperability "syntactic interoperability" is somewhat unfortunate, as syntax is about structure, and serialization is really more akin to the lexical level of language parsing.)
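To make the three layers concrete (the map keys, the "Cel" unit string, and the CDDL rule below are invented for this sketch), consider a single tiny temperature reading:

```cddl
; Structural layer: one possible data model, written in CDDL --
; a map with two named members.
reading = {
  "unit": text,    ; semantic layer: "Cel" only carries meaning if both
  "value": float,  ; sides share a vocabulary saying it is degrees Celsius
}

; Serialization layer: the same structure written out in two
; interchangeable serializations:
;   JSON text:                   {"unit":"Cel","value":21.5}
;   CBOR (one valid encoding,
;   in hex):                     a2 64 75 6e 69 74 63 43 65 6c
;                                65 76 61 6c 75 65 f9 4d 60
```

Only the serialization layer differs between the two encodings; the structure, and (given a shared vocabulary) the meaning, stay the same.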
HIMSS defines "three levels of health information technology interoperability: 1) Foundational; 2) Structural; and 3) Semantic". What is termed "Foundational Interoperability" here is about being able to exchange data at all and is usually taken for granted in the contexts discussed here. "Structural Interoperability" seems to include serialization and is vaguely defined: "it ensures that data exchanges between information technology systems can be interpreted at the data field level".
"Semantic interoperability" then "provides interoperability at the highest level, which is the ability of two or more systems or elements to exchange information and to use the information that has been exchanged". The definition adds "Semantic interoperability takes advantage of both the structuring of the data exchange" (apparently, the structural interoperability) "and the codification of the data including vocabulary", which seems to be the aspect that is specific to semantic interoperability.
A large number of environments (platforms, ecosystems?) have evolved that provide some generic infrastructure to support a variety of specific models and instances for interchange. Examples well-used in the IETF are:
| Common name | Data Modeling | Serialization |
|---|---|---|
| ASN.1 | ASN.1 | BER (DER), PER, OER, ... |
| XML | DTD, XSD, RELAX NG | (Character) XML, EXI |
| JSON | (various) | JSON, CBOR |
| CBOR | e.g., CDDL | CBOR |
These environments have often also evolved specifications for secure interchange (e.g., CMS, XMLDSig, JOSE, COSE, respectively, for the above); these are generally not well supported by the modeling mechanisms.
Note that many of these environments provide more than one serialization. ASN.1 is somewhat unusual in that the data modeling constructs it provides are specifically intended to support some aspects of its original serialization (encoding tags in BER); newer languages mostly no longer do this (but note, e.g., Google protobuf, which again exposes encoding tags in the modeling language).
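For illustration, here is a minimal protobuf sketch (the message and field names are invented); the field numbers after `=` are encoding tags that the modeling language exposes, much as ASN.1 exposes BER tags:

```protobuf
syntax = "proto3";

// Hypothetical message: the field numbers 1 and 2 are part of the
// wire format, so they appear directly in the modeling language.
message TemperatureReading {
  string unit  = 1;  // encoding tag 1
  double value = 2;  // encoding tag 2
}
```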
Some of the modeling languages have counterparts for modeling interactions as RPCs; e.g., XDR (RFC 4506, not listed above) is paired with ONC RPC (RFC 5531), and XML is often used with XML-RPC or SOAP. For JSON, these now take the form of "API languages" (e.g., RAML or OpenAPI). YANG integrates an RPC mechanism into its datastore modeling language.
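A minimal OpenAPI sketch (the path and schema are illustrative, not from any published API) shows how such an "API language" specifies the interaction and the structure of the JSON representation together:

```yaml
openapi: "3.0.3"
info: { title: Example Thermostat API, version: "0.1" }  # hypothetical API
paths:
  /temperature:            # an interaction: GET on a JSON resource
    get:
      responses:
        "200":
          description: current reading
          content:
            application/json:
              schema:      # the data model of the representation
                type: object
                properties:
                  unit:  { type: string }
                  value: { type: number }
```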
TBD: Insert discussion from https://cbor-wg.github.io/CBORbis/#rfc.section.2 and generalize.
In general, data modeling languages provide a limited universe of data models that can be specified, even if that universe cannot easily be described as a "generic data model". Hence the need for information modeling, which may be able to describe elements of the model in more abstract, application-specific terms, without needing to prematurely "represent" these terms in the limited set of mechanisms available at the data model level.
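To make the distinction concrete (this example is invented, not taken from RFC 3444): an information model might state, abstractly, that "a reading carries a temperature in degrees Celsius and the time it was taken"; a data model then commits to one realization within a given modeling universe, e.g., in CDDL:

```cddl
; One possible data-model realization of the abstract information model.
reading = {
  "temp": float,  ; temperature; "degrees Celsius" lives in the information
                  ; model / vocabulary, not in this structural definition
  "time": tdate,  ; RFC 3339 date/time (tdate is from the CDDL prelude)
}
```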
Data modeling in open communication systems is often implicitly about data to be transferred between systems. However, data modeling may also be about "data at rest", i.e., a data store offered by one system. Such a data store may be retrievable/updatable only as a whole, in which case the modeling of data transfers is sufficient. However, some applications provide a finer-grained access protocol (or "API") to these data; modeling is then not just (or not really at all) about the data itself but about the interactions that can be performed on the data (usually in a client-server relationship, sometimes augmented by notifications).
Modeling languages such as SMIv2 and YANG describe data at rest. The description of these data gives rise to a somewhat implicit interaction model; that interaction model may be further controlled by additional information in a data model (such as "config true" in YANG) that would not be meaningful to a data transfer modeling mechanism. "Remote procedure calls" can be used to provide additional interactions not already implied by the data modeling; recent work has also started to address additional notifications.
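A YANG sketch of this (the module and node names are invented): "config true" marks writable configuration, "config false" marks read-only state, and an rpc adds an interaction not implied by the data at rest at all:

```yang
module example-thermostat {
  namespace "urn:example:thermostat";
  prefix tst;

  leaf target-temp {
    type decimal64 { fraction-digits 1; }
    config true;   // writable configuration: implies edit interactions
  }
  leaf current-temp {
    type decimal64 { fraction-digits 1; }
    config false;  // read-only state: implies retrieval only
  }
  rpc calibrate {  // an additional interaction beyond the data at rest
    input {
      leaf offset { type decimal64 { fraction-digits 1; } }
    }
  }
}
```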
(Work has recently started on addressing data-transfer-oriented data modeling in YANG as well. TBD: write more about NMDA.)
Database modeling languages (data definition languages, "schema languages") also describe data at rest. These data can then be accessed or modified using a data manipulation language, which operates on the items (relations [tables], columns, their data types, and relationships between tables) established by the schema language. SQL continues to be the dominant standard for both data definition and data manipulation in this space, "NoSQL" database designs notwithstanding.
SQL also adds a "data control language" which, based on a predefined common data model, can be used to set up authorization; see "Authorization modeling" below.
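A minimal SQL sketch (table, column, and role names are invented) showing all three sublanguages against the same predefined model:

```sql
-- DDL: establishes the items the other sublanguages operate on
CREATE TABLE readings (
  id    INTEGER      PRIMARY KEY,
  unit  VARCHAR(8)   NOT NULL,
  value DECIMAL(5,1) NOT NULL
);

-- DML: manipulates data within that model
INSERT INTO readings (id, unit, value) VALUES (1, 'Cel', 21.5);

-- DCL: sets up authorization in terms of the same model
GRANT SELECT ON readings TO analyst;
```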
## Authorization modeling

In today's data modeling environments, the modeling of authorization is usually addressed by special models, syntax, and semantics.
TBD: Explain the SNMP models as well as NACM; contrast to SQL DCL.
REST views a server as a collection of resources, which are not modeled a priori; their structure is discovered at run time using hypermedia as the engine of application state. (Obviously, "API languages" such as OpenAPI do want to provide some structure here.)

The representations exchanged to retrieve and operate on resources are modeled using media types. REST environments often have a plethora of these, based on quite different forms of modeling. Again, "API languages" provide means to specify (or imply from the data modeling) some structure for these representations.
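As an illustrative sketch (the resource paths are invented; application/hal+json is one real hypermedia media type among many), a representation that carries links lets the client discover further resources at run time instead of relying on an a priori model:

```http
GET /thermostat HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/hal+json

{
  "unit": "Cel",
  "value": 21.5,
  "_links": {
    "self":   { "href": "/thermostat" },
    "target": { "href": "/thermostat/target" }
  }
}
```

Here the media type, not a prior schema, tells the client how to interpret the "_links" member and follow them.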