
AOSD-Reporter 2.1, subcomponents in the output JSON are not as specified #10082

Open
MNesche opened this issue Mar 24, 2025 · 8 comments
Labels
reporter About the reporter tool

Comments

@MNesche

MNesche commented Mar 24, 2025

Describe the bug

The spec for the AOSD 2.1 format describes subcomponents as follows:

For every license identified within all files of the software component, a subcomponent shall be provided.
- The first subcomponent in a component block should contain the main license of the component and must be named main.
- All following subcomponents inside a component can be freely assigned.

Currently, the AOSD reporter puts all licenses into the main subcomponent, which is wrong.
Subcomponent called "main" should be the declared license(s) only. Any additional licenses should be in an additional subcomponent.
The license findings are as follows (taken from v42.0.1, because the WebApp output of version 55.0.0 lacks the "Detected Excluded" section):

Effective SPDX
Apache-2.0 AND EPL-2.0 AND EPL-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0

Declared
EPL-2.0, GPL-2.0-with-classpath-exception

Declared (SPDX)
EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0

Detected
Apache-2.0, EPL-2.0, EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0, GPL-2.0-only WITH Classpath-exception-2.0

Detected Excluded
LicenseRef-scancode-efsl-1.0, LicenseRef-scancode-unknown-license-reference

To Reproduce

Steps to reproduce the behavior:

  1. Use a package with multiple license findings
  2. Export the results with the AOSD2.1-Reporter
  3. Export the results as WebApp for comparison, if needed
  4. See results

Expected behavior

According to the spec description, the output should look like this (irrelevant fields have been removed):

{
    "componentName": "jakarta.ws.rs-api",
    "componentVersion": "3.0.0",
    "id": 22,
    "linking": "dynamic_linking",
    "modified": false,
    "scmUrl": "https://github.com/jakartaee/rest.git",
    "subcomponents": [
        {
            "licenseText": "Eclipse Public License - v 2.0\n--\nGNU GENERAL PUBLIC LICENSE",
            "licenseTextUrl": "",
            "selectedLicense": "EPL-2.0",
            "spdxId": "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0",
            "subcomponentName": "main"
        },
        {
            "licenseText": "Apache License",
            "licenseTextUrl": "",
            "selectedLicense": "",
            "spdxId": "Apache-2.0",
            "subcomponentName": "sub1"
        }
    ],
    "transitiveDependencies": [
    ]
}
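A minimal Python sketch of the grouping described above, under assumptions that are my own reading and not defined by the AOSD spec or ORT: "main" takes the declared SPDX expression verbatim; a detected expression counts as covered by "main" if it equals the declared expression or one of its top-level OR choices; every remaining distinct detected expression gets its own subcomponent, named sub1, sub2, and so on. The function names and the naming scheme are hypothetical, `licenseText`/`selectedLicense` are left out, and operators are only recognized in uppercase, with outer parentheses around a whole expression not stripped.

```python
import json


def normalize(expression: str) -> str:
    """Re-tokenize an SPDX expression so whitespace around parentheses does not matter."""
    return " ".join(expression.replace("(", " ( ").replace(")", " ) ").split())


def split_top_level(expression: str, operator: str) -> list[str]:
    """Split on an operator ("AND" or "OR") that is not nested inside parentheses."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for token in normalize(expression).split():
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        if depth == 0 and token == operator:
            parts.append(" ".join(current))
            current = []
        else:
            current.append(token)
    parts.append(" ".join(current))
    return parts


def build_subcomponents(declared: str, detected: list[str]) -> list[dict[str, str]]:
    """Hypothetical grouping: declared -> "main", each other distinct detected expression -> "subN"."""
    # Treat the declared expression and each of its top-level OR choices as covered by "main".
    seen = {normalize(declared)}
    seen.update(normalize(choice) for choice in split_top_level(declared, "OR"))

    subcomponents = [{"subcomponentName": "main", "spdxId": declared}]
    for expression in detected:
        key = normalize(expression)
        if key in seen:
            continue  # covered by "main" or a duplicate of an earlier subcomponent
        seen.add(key)
        subcomponents.append({"subcomponentName": f"sub{len(subcomponents)}", "spdxId": expression})
    return subcomponents


if __name__ == "__main__":
    declared = "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0"
    detected = [
        "Apache-2.0",
        "EPL-2.0",
        "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0",
        "GPL-2.0-only WITH Classpath-exception-2.0",
    ]
    print(json.dumps(build_subcomponents(declared, detected), indent=4))
```

With the declared and detected licenses listed in this issue, this yields the "main" subcomponent plus a single "sub1" for Apache-2.0, i.e. the same split as the expected output above.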

Console / log output

This is the actual output in the JSON file:

{
    "componentName": "jakarta.ws.rs-api",
    "componentVersion": "3.0.0",
    "id": 22,
    "linking": "dynamic_linking",
    "modified": false,
    "scmUrl": "https://github.com/jakartaee/rest.git",
    "subcomponents": [
        {
            "licenseText": "Apache License\n--\nEclipse Public License - v 2.0\n--\nGNU GENERAL PUBLIC LICENSE",
            "licenseTextUrl": "",
            "selectedLicense": "Apache-2.0 AND EPL-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0",
            "spdxId": "Apache-2.0 AND EPL-2.0 AND (EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0) AND GPL-2.0-only WITH Classpath-exception-2.0",
            "subcomponentName": "main"
        }
    ],
    "transitiveDependencies": [
    ]
}

Environment

  • ORT version: [e.g. 55.0.0]
  • Java version: [e.g. 21]
  • OS: [e.g. MS Windows 10]
@MNesche MNesche added the to triage Issues that need triaging label Mar 24, 2025
@sschuberth sschuberth added reporter About the reporter tool and removed to triage Issues that need triaging labels Mar 24, 2025
@sschuberth
Member

@MNesche even after reading the description again, I'm still unclear on what a "subcomponent" actually is and what defines it.

Is any arbitrary set of files that happen to have the same license a subcomponent? That would seem weird, especially since the spdxId of a subcomponent can be an SPDX expression, i.e. it may include not only the OR but also the AND operator, which basically allows grouping everything into one main subcomponent, as ORT currently does.

Subcomponent called "main" should be the declared license(s) only.

I'm also not sure about that. What about a license detected in a root LICENSE file of a software component? Shouldn't that also be a "main" license?

Note that ORT actually does not have the concept of a "main license" of a package yet, but I've already implemented something like that for the SPDX report.

@sschuberth sschuberth added the needs info An issue where further information is required label Apr 7, 2025
@MNesche
Author

MNesche commented Apr 7, 2025

Hi @sschuberth, thank you for your reply.
The description I posted is from the item field; here's the description of the "subcomponents" array itself:

  "Mandatory - Array with all subcomponents of the specific software component. A subcomponent is a finding in a software component with license and / or copyright information (sometimes also referred to as part). Usually there is a main license of the component and further subcomponent licenses in individual directories or files of the component. - Important hint: The first subcomponent in every component block must be named main!"

Indeed, the description is probably a bit inaccurate, but the declared license(s) should be the main license, and any other finding should be a subcomponent, no matter where it was found.
Unfortunately, the examples in the spec aren't very good either, otherwise I'd post them here.
I'm not a lawyer, but it would make sense, because what actually matters is the license under which the package is published (declared), and whether this license is compatible with any contained code under different licenses (e.g. code snippets, or the original license if the package is a modified "new" work).

Not sure if you're familiar with Black Duck, but it's handled similarly there.
The declared license is simply called "License", and "Deep Licenses" covers everything else that was found in addition, listed individually.
It could be that the spec was intended to follow that scheme, but I really don't know.

@sschuberth
Member

So, a license detected in a root LICENSE file is not a "main" license? Quite odd, IMO.

Also, this still leaves open the question of how many non-main subcomponents we should have. Just one, containing the conjunction of all detected licenses?

@MNesche
Author

MNesche commented Apr 7, 2025

Well, guess we'd have to discuss your question about the License-file in a root with a lawyer to get a bulletproof reply ;).

Regarding the additional subcomponents, I can only describe how we did it.
We put any license finding in an individual subcomponent. The reason is that if you put everything into one subcomponent, the lawyers who review the reports (and anybody else) have to "split up" the combined license texts again, because they review those as well.
So it's a lot more review-friendly if, for example, they get the GPL license text in subcomponent_1 and the MIT license text in subcomponent_2, instead of both texts together in one string.
However, license findings with a choice have to be in one subcomponent, because there needs to be a value in the "selectedLicense" field.
In that case, we used the term ___OR___ as a divider for the license texts.
It's probably not only the lawyers who review the file, but also the party that created it, e.g. when there are import problems. Try searching for an "or" in there ... and good luck finding the delimiter ;). With ___OR___ instead, you'll find it immediately.
We had a couple of import issues with the format and needed to review the JSON files manually to find the problems; that's why we came up with this kind of solution.
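A small Python sketch of the scheme just described, assuming license texts are looked up per license from a plain dict and that choice subcomponents are flat "A OR B" expressions without parentheses. The `___OR___` divider is this team's own convention, not something the AOSD spec defines; the function names are hypothetical.

```python
OR_DIVIDER = "\n___OR___\n"  # custom divider from this thread, not part of the AOSD spec


def split_choice(spdx_id: str) -> list[str]:
    """Split a flat choice like "A OR B" into its options (assumes no parentheses)."""
    return [option.strip() for option in spdx_id.split(" OR ")]


def build_license_text(spdx_id: str, texts: dict[str, str]) -> str:
    """Join the texts of all options with OR_DIVIDER; a single license yields its text unchanged.

    A missing text raises KeyError, so gaps surface before the import rather than after.
    """
    return OR_DIVIDER.join(texts[option] for option in split_choice(spdx_id))


def selection_is_valid(spdx_id: str, selected: str) -> bool:
    """A choice needs one of its options selected; a single license may leave the field empty."""
    options = split_choice(spdx_id)
    if len(options) == 1:
        return selected in ("", options[0])
    return selected in options


if __name__ == "__main__":
    # License texts shortened to their headings, as in the JSON excerpts of this issue.
    texts = {
        "EPL-2.0": "Eclipse Public License - v 2.0",
        "GPL-2.0-only WITH Classpath-exception-2.0": "GNU GENERAL PUBLIC LICENSE",
    }
    spdx_id = "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0"
    print(build_license_text(spdx_id, texts))
    print(selection_is_valid(spdx_id, "EPL-2.0"))
```

Keeping one license (or one choice) per subcomponent means a reviewer or an import-debugging script can split `licenseText` on a single, unambiguous marker.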

@sschuberth
Member

Well, guess we'd have to discuss your question about the License-file in a root with a lawyer to get a bulletproof reply ;).

Actually, IIRC it was @LeChasseur who proposed at one of the ORT Community Days to introduce the concept of a "main license for a package" in ORT, which (again IIRC) explicitly included the license detected in a LICENSE file in the root of a repository. Do I remember this correctly, @LeChasseur?

Also, I just filed this PR to make the idea of a "main license" more prominent in ORT.

We put any license finding in an individual subcomponent.

I guess that should say "any distinct license finding", right? Because in ORT, a "license finding" refers to an individual finding within a file, and a single file can have multiple findings for different licenses.

In that case, we used the term ___OR___ as a divider for the license texts.

That does not sound very... standard 😉

@MNesche
Author

MNesche commented Apr 8, 2025

Good morning & thank you for your reply,

I guess that should say "any distinct license finding", right? Because in ORT, a "license finding" refers to an individual finding within a file, and a single file can have multiple findings for different licenses.

Indeed, any distinct license finding, with each SPDX license mentioned only once as a subcomponent (no duplicates).
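A minimal Python sketch of the "distinct license finding" rule, assuming findings arrive as simple (path, license) pairs; that shape is hypothetical and differs from ORT's own data model. Each license expression ends up once, in first-seen order, together with the files it was found in.

```python
def distinct_licenses(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collapse per-file findings so each license expression appears only once.

    Dicts keep insertion order (Python 3.7+), so licenses stay in first-seen order.
    """
    by_license: dict[str, list[str]] = {}
    for path, license_id in findings:
        paths = by_license.setdefault(license_id, [])
        if path not in paths:  # a single file may report the same license several times
            paths.append(path)
    return by_license


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    findings = [
        ("LICENSE.md", "EPL-2.0"),
        ("src/Foo.java", "EPL-2.0"),
        ("src/Foo.java", "Apache-2.0"),
        ("src/Foo.java", "Apache-2.0"),
    ]
    for license_id, paths in distinct_licenses(findings).items():
        print(license_id, paths)
```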

On the topic of a main license, I don't really understand what would distinguish it from the "declared" license.
From my understanding, the declared license is the main license of a package, because the developer declared to make the package public under this License.

The declared license is already part of ORT, and from the perspective of a user working with the WebApp as a "viewer" for ORT results, the standard view already presents it well.
ORT is already quite complex on its own; not sure an additional license category would make it better :).


In that case, we used the term ___OR___ as a divider for the license texts.
That does not sound very... standard 😉

True, I'm always open for new ideas :D

@sschuberth
Member

From my understanding, the declared license is the main license of a package, because the developer declared to make the package public under this License.

ORT uses the term "declared" in a more specific way: For ORT, a declared license exclusively comes from package metadata. That is, if you as a human "declare" a license as part of a LICENSE file in a repository, that's not a "declared license" in the ORT sense, because it does not come from metadata, but from a file that needs to be scanned by a scanner; hence ORT calls this a "detected license".
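To make the distinction concrete, here is a hypothetical Python model of the declared vs. detected split, combined with the rule proposed earlier in this thread that only declared licenses go into "main". The names `Source`, `LicenseInfo`, and `main_candidates` are invented for illustration and are not ORT's (Kotlin) API; the file paths are hypothetical too.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    DECLARED = "declared"  # from package metadata, e.g. the <licenses> section of a Maven POM
    DETECTED = "detected"  # found by a scanner in a file, e.g. a root LICENSE file


@dataclass(frozen=True)
class LicenseInfo:
    spdx_id: str
    source: Source
    path: Optional[str] = None  # only set for detected licenses


def main_candidates(licenses: list[LicenseInfo]) -> list[str]:
    """Licenses eligible for the "main" subcomponent if only declared licenses count."""
    return [info.spdx_id for info in licenses if info.source is Source.DECLARED]


if __name__ == "__main__":
    licenses = [
        LicenseInfo("EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0", Source.DECLARED),
        LicenseInfo("EPL-2.0", Source.DETECTED, "LICENSE.md"),
        LicenseInfo("Apache-2.0", Source.DETECTED, "src/Foo.java"),
    ]
    print(main_candidates(licenses))
```

Under this rule, a license found in a root LICENSE file never qualifies for "main", because ORT classifies it as detected; that is exactly the case questioned earlier in the thread.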

@MNesche
Author

MNesche commented Apr 8, 2025

Alright, good to know 😄

@sschuberth sschuberth removed the needs info An issue where further information is required label Apr 8, 2025
sschuberth added a commit that referenced this issue Apr 8, 2025
By extracting the existing code to a function, this prepares for reuse in
order to address issue #10082.

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
sschuberth added a commit that referenced this issue Apr 8, 2025
By extracting the existing code to a function, this prepares for reuse in
order to address an issue with AOSD reports [1].

[1]: #10082

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>

2 participants