
Add new rule for mit #4121

Open: wants to merge 1 commit into base: develop


@alok1304 (Contributor) commented Jan 25, 2025

Now we detect the correct license expression, i.e. mit.

Fixes #3860
Fixes #3861

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>

@alok1304 (Contributor, Author) commented Jan 25, 2025

@pombredanne can you please review this PR? The bug is fixed now: we detect the correct mit license expression.

For example:

  "files": [
    {
      "path": "_context.py",
      "type": "file",
      "detected_license_expression": "mit",
      "detected_license_expression_spdx": "MIT",
      "license_detections": [
        {
          "license_expression": "mit",
          "license_expression_spdx": "MIT",
          "matches": [
            {
              "license_expression": "mit",
              "license_expression_spdx": "MIT",
              "from_file": "_context.py",
              "start_line": 2,
              "end_line": 2,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 10,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "mit_1351.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mit_1351.RULE",
              "matched_text": "# MIT License (see LICENSE or https://opensource.org/licenses/MIT)",
              "matched_text_diagnostics": "MIT License (see LICENSE or https://opensource.org/licenses/MIT)"
            }
          ],
          "identifier": "mit-cf1210d2-5820-2542-c850-cac164e24fed"
        }
      ],
      "license_clues": [],
      "percentage_of_license_text": 0.24,
      "scan_errors": []
    }
  ]
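The fields that confirm the fix are the file-level detected_license_expression and the rule_identifier of each match. A minimal Python sketch for pulling them out of a ScanCode JSON result, assuming the full output is a JSON object containing the top-level "files" list excerpted above; the helper names are hypothetical and the demo data is a trimmed copy of this excerpt:

```python
import json
from typing import Iterator, List, Optional, Tuple


def iter_detections(scan: dict) -> Iterator[Tuple[str, Optional[str], List[str]]]:
    """Yield (path, detected expression, matched rule identifiers) per entry.

    Entries without detections (for example directories) yield None and an
    empty list instead of raising.
    """
    for entry in scan.get("files", []):
        rules = [
            match["rule_identifier"]
            for detection in entry.get("license_detections") or []
            for match in detection.get("matches") or []
        ]
        yield entry["path"], entry.get("detected_license_expression"), rules


def load_scan(path: str) -> dict:
    """Read a ScanCode JSON results file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Trimmed copy of the excerpt above, wrapped in a top-level object.
    sample = {
        "files": [
            {
                "path": "_context.py",
                "detected_license_expression": "mit",
                "license_detections": [
                    {"matches": [{"rule_identifier": "mit_1351.RULE"}]}
                ],
            }
        ]
    }
    for path, expression, rules in iter_detections(sample):
        print(path, expression, rules)  # _context.py mit ['mit_1351.RULE']
```

On a real scan, pass the results file to load_scan and check that each path of interest reports mit rather than a combined expression.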

This new rule fixes the two bugs reported in #3861 and #3860.

Hope this helps.

@alok1304 changed the title from "New rule for mit" to "Add new rule for mit" on Jan 25, 2025
@alok1304 force-pushed the new_rule_for_mit branch 6 times, most recently from 900b882 to 505e7b4 on January 25, 2025 11:51
@alok1304 (Contributor, Author):

@pombredanne can you please review this PR? It fixes two bugs.

@AyanSinhaMahapatra (Member) left a comment


@alok1304 Thanks++
See comments for suggested improvements.
To fix false-positive license detections by updating rules, you need to:

  1. Add new rules as applicable
  2. Also add required phrases and/or minimum_coverage to the rules that produced the false detections, i.e. rules for licenses that are not actually present. For example, here you also need to modify the src/licensedcode/data/rules/cc-by-3.0_and_mit_3.RULE rule (see the scan result JSON you pasted for this) by adding required phrases to the license names and the URLs separately, so that similar issues do not happen in the future with this rule. Please also apply the same approach to other and future PRs. Eventually, we are doing this at scale in "Update rules with required phrases automatically" #3924 to improve the rules.
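Item 2 is about making a rule refuse matches it should not claim. A simplified Python model of the idea, assuming the {{...}} required-phrase markup used in the rule texts later in this thread: a candidate match is kept only if every required phrase of the rule occurs in the scanned text. This is not ScanCode's matching code, and the combined CC-BY-3.0/MIT rule text below is hypothetical, not the real content of cc-by-3.0_and_mit_3.RULE:

```python
import re
from typing import List

# In rule texts, required phrases are enclosed in double curly braces.
REQUIRED_PHRASE = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def required_phrases(rule_text: str) -> List[str]:
    """Return the normalized {{...}} phrases found in a rule text."""
    return [normalize(p) for p in REQUIRED_PHRASE.findall(rule_text)]


def accept_match(rule_text: str, scanned_text: str) -> bool:
    """Keep a candidate match only if every required phrase occurs in the scan.

    Simplified on purpose: substring checks on normalized text, not the
    token- and position-based logic a real license matcher uses.
    A rule with no required phrases passes this check trivially.
    """
    haystack = normalize(scanned_text)
    return all(phrase in haystack for phrase in required_phrases(rule_text))


if __name__ == "__main__":
    # Hypothetical combined rule with license names and URLs marked separately.
    combined_rule = (
        "Licensed under the {{Creative Commons Attribution 3.0}} license "
        "({{https://creativecommons.org/licenses/by/3.0/}}) and the "
        "{{MIT License}} ({{https://opensource.org/licenses/MIT}})."
    )
    notice = "# MIT License (see LICENSE or https://opensource.org/licenses/MIT)"
    print(accept_match(combined_rule, notice))  # False: no CC-BY-3.0 phrase
```

In this model, marking the license names and URLs as separate phrases is what lets the MIT-only notice fail the combined rule while still matching an MIT-only rule.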

@alok1304 force-pushed the new_rule_for_mit branch 3 times, most recently from ae27002 to af0d9a7 on April 5, 2025 16:42
ref: aboutcode-org#3860
ref: aboutcode-org#3861
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
@pombredanne (Member):

@alok1304 This looks good overall! 👍
You can improve it further as follows:

  1. Create tests by adding a test file and an expected results file in https://github.com/aboutcode-org/scancode-toolkit/tree/develop/tests/licensedcode/data/datadriven/lic4 (see the existing test file pairs there for examples).

The test for #3860 and #3861 can be the same, using this text (as was done for https://github.com/aboutcode-org/scancode-toolkit/blob/develop/tests/licensedcode/data/datadriven/lic4/2675-sqlite.cpp):

# Copyright: (c) 2020, Jordan Borean (@jborean93) <jborean93@gmail.com>
# MIT License (see LICENSE or https://opensource.org/licenses/MIT)

And an expected YAML file (as was done for https://github.com/aboutcode-org/scancode-toolkit/blob/develop/tests/licensedcode/data/datadriven/lic4/2675-sqlite.cpp.yml):

license_expressions:
  - mit
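The pair described in item 1 is a test file plus a file with the same name and a .yml suffix. A minimal Python sketch that writes such a pair; the file name 3860-mit-notice.py and the helper write_test_pair are hypothetical, and it assumes license_expressions is the only key the expected file needs, as in the YAML above:

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Test text suggested above for #3860 and #3861.
NOTICE = (
    "# Copyright: (c) 2020, Jordan Borean (@jborean93) <jborean93@gmail.com>\n"
    "# MIT License (see LICENSE or https://opensource.org/licenses/MIT)\n"
)


def write_test_pair(
    directory: Path, name: str, text: str, expressions: List[str]
) -> Tuple[Path, Path]:
    """Write <name> and <name>.yml into directory; return both paths."""
    directory.mkdir(parents=True, exist_ok=True)
    test_file = directory / name
    expected_file = directory / (name + ".yml")
    test_file.write_text(text, encoding="utf-8")
    # Same layout as the expected YAML above: one "  - expr" line per expression.
    lines = ["license_expressions:"] + [f"  - {expr}" for expr in expressions]
    expected_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return test_file, expected_file


if __name__ == "__main__":
    # Demo in a temporary directory; in a checkout, directory would be
    # Path("tests/licensedcode/data/datadriven/lic4").
    out_dir = Path(tempfile.mkdtemp())
    for path in write_test_pair(out_dir, "3860-mit-notice.py", NOTICE, ["mit"]):
        print(path.name)
```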
  2. Also add a few new rules with related content (this can be a separate PR, that's alright):
---
license_expression: mit
is_license_notice: yes
relevance: 100
referenced_filenames:
    - LICENSE
ignorable_urls:
    - https://opensource.org/licenses/MIT
---

{{MIT License (see LICENSE or https://opensource.org/licenses/MIT) }}

And another:

---
license_expression: mit
is_license_notice: yes
relevance: 100
referenced_filenames:
    - LICENSE
---

{{MIT License (see LICENSE) }}
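Both rule files above follow the same layout: YAML front matter between two --- lines, then the rule text with its {{...}} required phrase. A minimal Python sketch that splits such a file and runs a few sanity checks that seem reasonable for these notices (every referenced filename and ignorable URL appears in the text, and at least one required phrase is marked). The helpers are hypothetical, not ScanCode APIs, and the front matter reader deliberately handles only the flat keys and lists used here; real tooling should use a YAML library:

```python
from typing import Dict, List, Tuple, Union

FrontMatter = Dict[str, Union[str, List[str]]]

# The first rule suggested above, verbatim.
EXAMPLE_RULE = """---
license_expression: mit
is_license_notice: yes
relevance: 100
referenced_filenames:
    - LICENSE
ignorable_urls:
    - https://opensource.org/licenses/MIT
---

{{MIT License (see LICENSE or https://opensource.org/licenses/MIT) }}
"""


def split_rule(content: str) -> Tuple[FrontMatter, str]:
    """Split a rule file into (front matter, rule text).

    Only understands flat "key: value" lines and "- item" list entries,
    which is all the two example rules above use.
    """
    _, front, text = content.split("---\n", 2)
    data: FrontMatter = {}
    key = ""
    for line in front.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: append to the list opened by the most recent key.
            items = data.get(key)
            if not isinstance(items, list):
                raise ValueError(f"list item without a list key: {line!r}")
            items.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        data[key] = value if value else []
    return data, text


def check_rule(content: str) -> List[str]:
    """Return problems found by a few simple checks (empty list = none)."""
    data, text = split_rule(content)
    problems = []
    for key in ("referenced_filenames", "ignorable_urls"):
        items = data.get(key, [])
        for item in [items] if isinstance(items, str) else items:
            if item not in text:
                problems.append(f"{key}: {item!r} not found in rule text")
    if "{{" not in text:
        problems.append("rule text has no {{...}} required phrase")
    return problems


if __name__ == "__main__":
    front_matter, _ = split_rule(EXAMPLE_RULE)
    print(front_matter["referenced_filenames"])  # ['LICENSE']
    print(check_rule(EXAMPLE_RULE))  # []
```

Running check_rule over each new variant before opening the PR would catch, for example, a LICENSE.txt reference whose rule text still says LICENSE.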

And a few variations that can be seen in the wild, in case we do not detect these exactly:

And all the variations where there is a LICENSE.txt:

And a few rst variants:

And more variants:

Successfully merging this pull request may close these issues:

  • MIT reported as "CC-BY-3.0 AND MIT"
  • MIT reported as "GPL-3.0-or-later OR MIT"

3 participants