Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid: Add Universally Unique Identifier (UUID) utilities #77884

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sorru94
Copy link
Contributor

@sorru94 sorru94 commented Sep 2, 2024

Add a basic implementation of UUID generating and manipulating functions.

Fixes: #77882

@zephyrbot zephyrbot added the area: Base OS Base OS Library (lib/os) label Sep 2, 2024
@jukkar
Copy link
Member

jukkar commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

@jukkar jukkar requested a review from jhedberg September 2, 2024 13:36
@sorru94
Copy link
Contributor Author

sorru94 commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

I'm unfamiliar regarding the Bluetooth use of the UUID. I see it uses 16 and 32 bits partial UUIDs, which are not defined in RFC 9562.
We could add a couple of defines similar to the ones in the bluetooth driver (BT_UUID_INIT_128 for example) to ease the transition.

@carlescufi carlescufi requested a review from Thalley September 2, 2024 15:11
@jhedberg
Copy link
Member

jhedberg commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

I'm unfamiliar regarding the Bluetooth use of the UUID. I see it uses 16 and 32 bits partial UUIDs, which are not defined in RFC 9562. We could add a couple of defines similar to the ones in the bluetooth driver (BT_UUID_INIT_128 for example) to ease the transition.

AFAIK at least the 16- and 32-bit Bluetooth UUIDs don't follow any non-Bluetooth standard. The way the Bluetooth core specification deals with these, is that in practice everything is a 128-bit UUID, however UUIDs which are derived from the so-called Bluetooth Base UUID can be compressed into 16- and 32-bit formats (and these again expanded back into 128 bit format). I have a feeling we'll need to keep the Bluetooth UUID API separate, but some proper research into their compatibility with these RFC specs needs to be done. We might at most e.g. provide conversion functions to convert back and forth between bt_uuid_t and uuid_t.

@jukkar
Copy link
Member

jukkar commented Sep 3, 2024

@sorru94 could you check if there would be any non-BT code that could be changed to use this library?
At least in dhcpv6

struct dhcpv6_duid_uuid {
there is uuid handling.

Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's an endianness collision between the existing code and the new library? UUIDs basically made a huge mess of this with that text format, and almost nothing does it "correctly" (if there even is a "correct"). But 100% for sure we should be doing it consistently, and I don't think we are?

uuid_t third_uuid_v4;
uuid_t first_uuid_v5;

uint8_t expected_first_uuid_v4_byte_array[16u] = {0x44, 0xb3, 0x5f, 0x73, 0xcf, 0xbd,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More drive-by: this is clearly byte-packing a big endian UUID. The first element is a 32 bit dword and should be stored "73, 5f, b3, 44" on little endian systems.

@sorru94
Copy link
Contributor Author

sorru94 commented Sep 6, 2024

Seems like there's an endianness collision between the existing code and the new library? UUIDs basically made a huge mess of this with that text format, and almost nothing does it "correctly" (if there even is a "correct"). But 100% for sure we should be doing it consistently, and I don't think we are?

I'm unsure about the endianness for the UUID used/expected by the BT module (or other modules).
However, I tried to follow as much as possible RFC 9562 for this library where they state:

In the absence of explicit application or presentation protocol specification to the contrary, each field is encoded with the most significant byte first (known as "network byte order").

And:

Saving UUIDs to binary format is done by sequencing all fields in big-endian format. However, there is a known caveat that Microsoft's Component Object Model (COM) GUIDs leverage little-endian when saving GUIDs. The discussion of this (see [MS_COM_GUID]) is outside the scope of this specification.

I'm guessing the UUID in the Bluetooth module uses the COM GUIDs format where the first 4 bytes are represented using little endian while the last 6 are represented using big-endian.

I don't know if including a non-standard representation of UUIDs would be a good choice for this library. We could always introduce helper functions for conversion to the "COM GUIDs" format. Here or directly in the BT module.

What do you think? @andyross

@sorru94 sorru94 requested a review from andyross September 6, 2024 07:57
@sorru94 sorru94 requested a review from Thalley September 6, 2024 09:12
@andyross
Copy link
Collaborator

andyross commented Sep 6, 2024

However, I tried to follow as much as possible RFC 9562 for this library

Yeah... the problem is that "UUIDs" in practical code don't really and haven't ever followed that RFC, which is a post-hoc kind of cleanup thing. Windows GUIDs are the obvious example of something that is always converted LE, and the code @Thalley posted above certainly implies the BT inherited the convention.

I'm don't have enough of a stake here to demand one or the other, I'm just saying (1) whatever code we commit needs to be extremely clear about the endianness convention and (2) it should match the conventions used by any existing in-tree users (or else support both, I guess, but... yuck)

@sorru94 sorru94 requested a review from Thalley September 9, 2024 12:35
@sorru94 sorru94 force-pushed the add-uuid-utilities branch 3 times, most recently from 64065cc to 0ab6537 Compare September 9, 2024 12:54
Copy link
Collaborator

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks OK to me. Unsure how and if we can use this in BT due to endianess, so I suggest to make it experimental so that we can modify things until we figure that part out.

Comment on lines 93 to 96
* An UUID in the standard big-endian RFC9562 representation `00112233-4455-6677-8899-AABBCCDDEEFF`
* has the equivalent little-endian COM GUID representation `33221100-5544-7766-8899-AABBCCDDEEFF`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? The last 2 values, 8899 and AABBCCDDEEFF, don't need to be swapped?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8899 should be 9988 while the last one is correct. See the other related comment for more info.

What other related comment? Can you share the link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we follow the COM GUID representation the last two should not be swapped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I deleted the comment as It was my mistake. I replied on the other comment and will push the changes shortly.

lib/uuid/uuid.c Outdated
Comment on lines 177 to 171
memcpy_swap(&out[first_part_start], &data[first_part_start], first_part_size);
memcpy_swap(&out[second_part_start], &data[second_part_start], second_part_size);
memcpy_swap(&out[third_part_start], &data[third_part_start], third_part_size);
memcpy_swap(&out[fourth_part_start], &data[fourth_part_start], fourth_part_size);
memcpy(&out[fifth_part_start], &data[fifth_part_start], fifth_part_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation for this function, only the first 3 parts are swapped, but this swaps the first 4. Why?

And why isn't the 5th swapped? (not familiar with the UUID spec)

Copy link
Contributor Author

@sorru94 sorru94 Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with the endianness is that the UUID spec defines UUID only as big-endian.
The interpretation of little-endian UUID is not universal.
From my understanding here are our options for the UUID:

  • Standard big-endian representation: 00112233-4455-6677-8899-AABBCCDDEEFF
  • little-endian representation inverting each block: 33221100-5544-7766-9988-FFEEDDCCBBAA
  • little-endian representation inverting the full UUID: FFEEDDCC-BBAA-9988-7766-554433221100
  • little-endian representation for Microsoft COM GUIs representation: 33221100-5544-7766-8899-AABBCCDDEEFF

I'm not sure which of those little-endian notations is used in the Bluetooth module.
However, from my very superficial research, I think the last one is the more widely adopted. It's also the only one referenced in the standard.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to read up on the UUID spec. Endianess is always a bit tricky, but it usually comes down the the language and terminology used in the specs and whether the blocks are values or arrays of values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blocks change based on the UUID version. For example, v5 is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           sha1_high                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         sha1_high             |  ver  |      sha1_mid         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                       sha1_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           sha1_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where sha1_* are all components of the same SHA-1 hash.
While UUID v1 is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           time_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           time_mid            |  ver  |       time_high       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|         clock_seq         |             node              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              node                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

As a consequence, it might be difficult to find a generic conversion method.

@sorru94 sorru94 requested a review from Thalley September 11, 2024 10:46
@Thalley
Copy link
Collaborator

Thalley commented Sep 12, 2024

So after the RFC and looking more into this, I've noticed the following:

  1. 4122, which is obsoleted by 9562, does seemingly not define any specific endianess
  2. 9562 defines UUIDs to be stored as big-endian
  3. MS documentation seemingly only refers to 4122
  4. Haven't found any references to 4122 or 9562 from the BT specs
  5. If we want to be 9562 compliant and treat all UUIDs a big-endian, then it's pretty each to support
  6. If we want to support converting between little and big endian, then it's more tricky. Each version of UUID (version 1 to 8) have slightly different structures, and ideally each value in each version should be converted. As far as I can tell 9562 does not define how to represent those as strings, because it is big-endian.
  7. MS converts the first 3 values to little endian (https://devblogs.microsoft.com/oldnewthing/20220928-00/?p=107221) and then treats the 6-byte MAC address as a 6-octet array instead of a 6-octet value. MS seemingly also treats time_high and ver as a single value, instead of 12 and 4 bit values respectively. MS seemingly uses UUID version 1 (https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-1) for their GUIDs (UUIDs). The way that MS treats UUIDs is a complete mess 🙃

To continue with this PR, I think we need to ask what it is we want to achieve. Do we want to library that implements 9562? Do we want a library that is compatible with MS COM? Do we want a library that is compatible with Bluetooth?

The 9562 RFC has this nice little paragraph in the beginning that pretty much sums up the mess:

The use of UUIDs is extremely pervasive in computing. They comprise the core identifier infrastructure for many operating systems such as Microsoft Windows and applications such as the Mozilla Web browser; in many cases, they can become exposed in many non-standard ways.

The purpose of 9562 is to take RFC 4122 and properly standardize it.

@sorru94
Copy link
Contributor Author

sorru94 commented Oct 3, 2024

@Thalley and @jhedberg. As for your suggestion, I replaced the type definition with a struct. Let me know if it was what you intended.
@jukkar I tried to use this library in the ext2 library, just to have one that uses it.

@Thalley
Copy link
Collaborator

Thalley commented Oct 3, 2024

@sorru94 Looks like CI tests are failing, please have a look

@sorru94 sorru94 force-pushed the add-uuid-utilities branch 3 times, most recently from 6e5b0fe to 668ca04 Compare October 4, 2024 10:05
@sorru94 sorru94 requested a review from Thalley October 4, 2024 12:00
@@ -27,7 +29,7 @@ struct ext2_cfg {
uint32_t block_size;
uint32_t fs_size; /* Number of blocks that we want to take. */
uint32_t bytes_per_inode;
uint8_t uuid[16];
struct uuid uuid;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change, isn't it?

If it is a stable API, then it needs to follow https://docs.zephyrproject.org/latest/develop/api/api_lifecycle.html#introducing-breaking-api-changes

An alternative which is a bit dirty would be to do

Suggested change
struct uuid uuid;
union {
uint8_t uuid[16];
struct uuid uuid;
};

So that it's not breaking. I'll leave this up to the maintainer(s) of the fs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, I'm not sure how much the uuid field is used in the current version of the ext2 module. It would be nice if the maintainers of the fs could help.
Another option for this change to be non-breaking would be to keep the typedef uint8_t uuid_t[UUID_SIZE]; in uuid.h instead of the struct version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sorru94 I still think this needs to be considered an API change, which we probably should avoid or follow the procedure for breaking a stable API

* @retval 0 The UUID has been correctly copied in @p dst
* @retval -EINVAL @p dst is not acceptable
*/
int uuid_copy(struct uuid *dst, struct uuid src);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've noticed now is that you more often than not pass by value (e.g. struct uuid src) rather than by reference (struct uuid *dst).

Is that something you have considered? Since the struct is 16 octets, that's quite a lot of data being passed around compared to a pointer (typically 4 octets).

Should the API primarily use pass by reference instead of value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually pass by reference only when changes to the parameter are needed, i.e. when it's used as an output.
Do you know if copying 16 Bytes on each function call is an issue? Are there guidelines for this in the Zephyr project?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if copying 16 Bytes on each function call is an issue? Are there guidelines for this in the Zephyr project?

I'm not aware of any guidelines, but pass by value would increase the stack usage, right?

Personally I (almost) always pass by reference for structs (unless they are smaller than the size of a pointer).

Copy link
Member

@jhedberg jhedberg Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sorru94 to make the distinction of "input" vs "output" parameter clear, a good practice is to use const struct uuid *uuid for input, which makes it clear that the function will not modify it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I updated the affected functions.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Dec 14, 2024
@github-actions github-actions bot closed this Dec 29, 2024
@pdgendt
Copy link
Collaborator

pdgendt commented Feb 25, 2025

Sad to see this closed, @sorru94 do you think you can come back to this?

Add UUID generation and parsing utilities compliant
with RFC9562.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Add some tests for the UUID utilities.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Include the UUID utilities in the Miscellaneous
documentation page.
The UUID is being placed in a new Identifier
APIs section.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Use the sys/uuid.h library to generate and hold the
ext2 UUID.
This commit also includes some formatting for the
ext2 library.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
@sorru94
Copy link
Contributor Author

sorru94 commented Feb 27, 2025

Sad to see this closed, @sorru94 do you think you can come back to this?

Sure, if someone reopens this I will address the comments.

@pdgendt
Copy link
Collaborator

pdgendt commented Feb 27, 2025

Sure, if someone reopens this I will address the comments.

I'm not able to reopen it, maybe try to rebase and force push first? Other than that, I don't know.

@kartben
Copy link
Collaborator

kartben commented Feb 27, 2025

@sorru94 if you reset the sorru94:add-uuid-utilities branch to commit a93cdb1 (SHA this PR was at when closed) we shall be able to re-open the PR

@sorru94
Copy link
Contributor Author

sorru94 commented Feb 27, 2025

@sorru94 if you reset the sorru94:add-uuid-utilities branch to commit a93cdb1 (SHA this PR was at when closed) we shall be able to re-open the PR

I did reset to a93cdb1. However, are you sure you didn't mean 18400cd? I think that was the last commit in this PR.

@pdgendt
Copy link
Collaborator

pdgendt commented Feb 27, 2025

I did reset to a93cdb1. However, are you sure you didn't mean 18400cd? I think that was the last commit in this PR.

It can't be reopened now, so probably, yes.

@pdgendt pdgendt reopened this Feb 27, 2025
@pdgendt pdgendt requested a review from Thalley February 27, 2025 12:05
@pdgendt pdgendt added this to the v4.2.0 milestone Feb 27, 2025
@github-actions github-actions bot removed the Stale label Feb 28, 2025
@@ -27,7 +29,7 @@ struct ext2_cfg {
uint32_t block_size;
uint32_t fs_size; /* Number of blocks that we want to take. */
uint32_t bytes_per_inode;
uint8_t uuid[16];
struct uuid uuid;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sorru94 I still think this needs to be considered an API change, which we probably should avoid or follow the procedure for breaking a stable API

@sorru94
Copy link
Contributor Author

sorru94 commented Mar 10, 2025

@Thalley
Regarding this suggestion. I don't see how it would work. There is no apparent way to distinguish which of the two options is used. Accessing an array differs from accessing a struct.

It would have worked with my initial proposal of defining the uuid as a typedefinition of an array. But I have been pushed in the direction of using a struct. This makes it much harder to implement non-breaking changes to existing APIs.

I honestly do not know how to implement such a change in a non-breaking manner. Other suggestions are welcome.

@Thalley
Copy link
Collaborator

Thalley commented Mar 10, 2025

@Thalley Regarding this suggestion. I don't see how it would work. There is no apparent way to distinguish which of the two options is used. Accessing an array differs from accessing a struct.

It would have worked with my initial proposal of defining the uuid as a typedefinition of an array. But I have been pushed in the direction of using a struct. This makes it much harder to implement non-breaking changes to existing APIs.

I honestly do not know how to implement such a change in a non-breaking manner. Other suggestions are welcome.

IIRC the reason why you changed that, was to have a use of this new API, which makes sense. I wonder if there's a alternative thing we can modify to use the new API that does not require a change to a public API. Maybe a test or a sample? Alternatively, you could add a new sample that exercises some of the new functions you've added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Base OS Base OS Library (lib/os) area: File System
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Addition of Universally Unique Identifier (UUID) utilities
9 participants