[SP-2991] feat: add recursive search support for folder hashing #141
base: main
Conversation
Walkthrough

Adds CLI flags for folder-scan/folder-hash (depth, min-cutoff-threshold) and defaults; updates FolderHasher and ScannerHFH to use them. Regenerates many protobuf/gRPC modules with protobuf runtime validation, OpenAPI v2 option migration, type stubs, and import-time gRPC version guards. Updates packaging and file-filter behavior; bumps the package version.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant CLI as scanoss CLI
    participant S as ScannerHFH
    participant FH as FolderHasher
    participant RPC as Scanning gRPC Client
    participant SV as Server
    U->>CLI: folder-scan --depth D --min-cutoff-threshold T
    CLI->>S: init(depth=D, min_cutoff_threshold=T)
    S->>FH: hash_directory(path, depth=D)
    FH-->>S: directory hashes
    S->>RPC: FolderHashScan(HFHRequest{root:..., min_cutoff_threshold:T})
    RPC->>SV: gRPC call
    SV-->>RPC: HFHResponse
    RPC-->>S: response
    S-->>CLI: results
    CLI-->>U: output
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)
1-194: Generated files should not be manually modified

This file is auto-generated by the gRPC Python protocol compiler (as indicated by the header comment "Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!"). All changes to this file should be made by updating the source .proto files and regenerating the code using the appropriate protoc compiler with the correct version. The changes appear to be from regenerating with a newer version of grpcio-tools that produces code requiring grpcio>=1.73.1.
Ensure that:
- These changes were made by regenerating from .proto files, not manual edits
- The gRPC tools version used for generation is documented in your build configuration
- The minimum grpcio version (1.73.1) is reflected in your project's dependencies
♻️ Duplicate comments (1)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (1)
1-238: Regenerated gRPC code maintains consistency

This file exhibits the same regeneration pattern as previously reviewed gRPC files in this PR.
🧹 Nitpick comments (2)
src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (1)
1-1: Track generator version in auto-generated files

Since this is auto-generated code that should not be manually edited, consider adding the grpc-tools version used for generation as a comment to help track which version produced these files.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)
4-4: Unused import detected

The `warnings` module is imported but not used anywhere in the file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- CHANGELOG.md (2 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
- src/scanoss/__init__.py (1 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2.py (2 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (9 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py (1 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (6 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (5 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (5 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (6 hunks)
- src/scanoss/cli.py (5 hunks)
- src/scanoss/constants.py (1 hunks)
- src/scanoss/scanners/folder_hasher.py (7 hunks)
- src/scanoss/scanners/scanner_hfh.py (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/protoc_gen_swagger/options/annotations_pb2.pyi (1)
src/protoc_gen_swagger/options/openapiv2_pb2.pyi (5)
- Swagger (25-212)
- Operation (217-378)
- Schema (748-803)
- Tag (974-999)
- JSONSchema (808-969)
🔇 Additional comments (62)
src/scanoss/constants.py (1)
16-18: LGTM! Well-defined constants for folder hashing functionality.

The new constants follow consistent naming conventions and have reasonable default values. The depth of 1 for recursive scanning and 25% threshold for minimum cutoff are appropriate defaults for folder operations.
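For orientation, a hedged sketch of what these defaults in src/scanoss/constants.py likely look like, based on the values described above; the constant names appear elsewhere in this review, but the exact representation of the 25% threshold is an assumption:

```python
# Hypothetical sketch of the folder-hashing defaults discussed in this review.
# DEFAULT_HFH_DEPTH / DEFAULT_HFH_MIN_CUTOFF_THRESHOLD are named in the PR;
# whether the threshold is stored as 25 or 0.25 is an assumption -- check the source.
DEFAULT_HFH_DEPTH = 1                   # recurse one level: hash the scanned root only
DEFAULT_HFH_MIN_CUTOFF_THRESHOLD = 25   # minimum cutoff threshold (per cent)
```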
src/scanoss/__init__.py (1)
25-25: LGTM! Version bump aligns with new feature additions.

The minor version increment from 1.30.0 to 1.31.0 is appropriate for the recursive search support features being added in this PR.

CHANGELOG.md (2)

12-16: LGTM! Proper changelog documentation for new features.

The changelog entry correctly documents the new CLI arguments and follows the established format. The features align with the PR objectives for adding recursive search support.

623-623: LGTM! Version comparison link properly added.

The version comparison link is correctly formatted and placed in the appropriate location.
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1)
4-24: LGTM! Robust gRPC version compatibility check.

The version validation logic is well-implemented with proper error handling:
- Clear minimum version requirement (1.73.1)
- Graceful handling of ImportError cases
- Informative error message with specific upgrade/downgrade instructions
- Prevents runtime incompatibility issues
This follows best practices for generated gRPC code compatibility enforcement.
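For readers unfamiliar with the pattern, here is a simplified sketch of the import-time guard these regenerated modules use; the structure and names match the snippets quoted later in this review, but the exact generated wording may differ:

```python
import grpc

GRPC_GENERATED_VERSION = '1.73.1'   # version the stubs were generated against
GRPC_VERSION = grpc.__version__     # version installed at runtime

try:
    from grpc._utilities import first_version_is_lower
    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
except ImportError:
    _version_not_supported = True

if _version_not_supported:
    raise RuntimeError(
        f'The grpc package installed is at version {GRPC_VERSION}, but the generated code '
        f'depends on grpcio>={GRPC_GENERATED_VERSION}. Please upgrade grpcio, or regenerate '
        f'the stubs with grpcio-tools<={GRPC_VERSION}.'
    )
```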
src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1)
4-24: LGTM! Consistent gRPC version compatibility enforcement.

The version validation logic matches the pattern used in other generated gRPC files, ensuring consistent compatibility checks across the codebase. The error message correctly references this specific file path.
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1)
4-24: LGTM! Proper gRPC version validation.

The runtime version compatibility check is well-implemented with appropriate error handling and informative error messages. This prevents runtime incompatibility issues between the generated code and the gRPC library.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (3)
9-19: LGTM! Proper protobuf runtime validation.

The explicit runtime version validation ensures compatibility with protobuf Python runtime version 6.31.0, preventing potential compatibility issues at module load time.
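As context for what "explicit runtime version validation" means here, a hedged sketch of the call emitted at the top of the regenerated pb2 modules; the argument values beyond 6.31.0 are illustrative, since the real call is produced by protoc:

```python
from google.protobuf import runtime_version as _runtime_version

# Generated modules validate the installed protobuf runtime at import time.
# The .proto path shown here is an assumption for illustration only.
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    6, 31, 0,   # major, minor, patch required by the generated code
    '',         # pre-release suffix
    'scanoss/api/scanning/v2/scanoss-scanning.proto',
)
```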
27-27: Good modernization of OpenAPI v2 imports.

The import has been updated from swagger options to the more current OpenAPI v2 annotations, reflecting the evolution of the API specification standards.

30-30: Schema evolution supports new scanning parameters.

The protobuf descriptor includes the new `min_cutoff_threshold` field (line 30 in the serialized descriptor), which aligns with the PR objectives to add configurable threshold parameters for folder scanning operations.

src/protoc_gen_swagger/options/openapiv2_pb2.py (3)
9-19: LGTM! Consistent protobuf runtime validation.

The runtime version validation follows the same pattern as other protobuf modules in this PR, ensuring compatibility with protobuf Python runtime version 6.31.0.

31-34: Good modernization of descriptor building.

The use of a local `_globals` variable and the updated conditional check (`if not _descriptor._USE_C_DESCRIPTORS:`) reflects modern protobuf code generation practices.

35-62: Proper update to descriptor option handling.

The migration from `_options` to `_loaded_options` aligns with updates in the protobuf internal API, ensuring compatibility with the newer protobuf runtime.

src/protoc_gen_swagger/options/annotations_pb2.py (2)
9-19: LGTM! Consistent protobuf modernization.

The runtime version validation and import order changes maintain consistency with the other protobuf modules updated in this PR.

31-36: Good adherence to modern protobuf patterns.

The use of the `_globals` variable and `_loaded_options` follows the same modernization pattern applied consistently across all protobuf-generated modules in this PR.

src/scanoss/scanners/scanner_hfh.py (5)
32-36: Good import organization for new constants.

The addition of the new default constants (`DEFAULT_HFH_DEPTH` and `DEFAULT_HFH_MIN_CUTOFF_THRESHOLD`) maintains clean separation of configuration values.

55-76: Well-designed parameter extension.

The constructor properly extends the interface with the new `depth` and `min_cutoff_threshold` parameters while maintaining backward compatibility through sensible defaults. The parameter documentation is clear and comprehensive.

94-99: Proper integration with FolderHasher.

The `depth` parameter is correctly passed to the `FolderHasher` instance, enabling the recursive search depth functionality as intended by the PR objectives.

114-118: Good integration of new scanning parameters.

The scan method properly incorporates both the `min_cutoff_threshold` parameter in the gRPC request and uses a more readable keyword argument for the `hash_directory` call.
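To make the wiring concrete, here is a self-contained, hypothetical sketch of how the new parameters flow from the scanner into the request. Only `depth`, `min_cutoff_threshold`, `hash_directory`, and the `HFHRequest`/`FolderHashScan` names come from this PR; the stub classes below are placeholders, not the real implementation:

```python
from dataclasses import dataclass

DEFAULT_HFH_DEPTH = 1
DEFAULT_HFH_MIN_CUTOFF_THRESHOLD = 25

@dataclass
class HFHRequestStub:                 # placeholder for the generated HFHRequest message
    root: dict
    min_cutoff_threshold: float

class FolderHasherStub:               # placeholder for the real FolderHasher
    def __init__(self, depth: int = DEFAULT_HFH_DEPTH):
        self.depth = depth

    def hash_directory(self, path: str, depth: int) -> dict:
        return {'path': path, 'depth': depth}   # stand-in for the real directory hash tree

class ScannerHFHSketch:
    def __init__(self, depth: int = DEFAULT_HFH_DEPTH,
                 min_cutoff_threshold: float = DEFAULT_HFH_MIN_CUTOFF_THRESHOLD):
        self.depth = depth
        self.min_cutoff_threshold = min_cutoff_threshold
        self.folder_hasher = FolderHasherStub(depth=depth)   # depth forwarded to the hasher

    def build_request(self, path: str) -> HFHRequestStub:
        root = self.folder_hasher.hash_directory(path, depth=self.depth)
        # The threshold rides along on the gRPC request (new field in HFHRequest).
        return HFHRequestStub(root=root, min_cutoff_threshold=self.min_cutoff_threshold)
```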
207-207: Minor formatting improvement.

The whitespace cleanup improves code readability without affecting functionality.
src/scanoss/api/common/v2/scanoss_common_pb2.py (2)
3-19: LGTM! Protobuf runtime validation and modernized imports.

The addition of runtime version validation for protobuf 6.31.0 and the reordered imports with explicit runtime version checking are appropriate updates for generated protobuf code.

29-48: LGTM! Modernized descriptor handling.

The changes to use a local `_globals` variable instead of `globals()` directly, and the updated conditional check and option handling (`_loaded_options` instead of `_options`), are consistent with protobuf toolchain improvements.

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (3)
3-19: LGTM! Protobuf runtime validation and modernized imports.

The addition of runtime version validation for protobuf 6.31.0 follows the same pattern as other protobuf modules in this update.

27-27: LGTM! Migration from Swagger to OpenAPI v2 annotations.

The update from `protoc_gen_swagger.options` to `protoc_gen_openapiv2.options` aligns with the broader ecosystem migration to OpenAPI v2 specifications.

32-51: LGTM! Modernized descriptor and global namespace handling.

The changes to use a local `_globals` variable and updated descriptor option handling are consistent with protobuf toolchain improvements across all modules.

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (3)
3-19: LGTM! Consistent protobuf runtime validation.

The runtime version validation and import reordering follow the same modernization pattern as other protobuf modules.

27-27: LGTM! OpenAPI v2 migration.

Consistent with other modules, the migration from Swagger to OpenAPI v2 annotations is appropriate.

32-65: LGTM! Comprehensive descriptor handling updates.

The extensive descriptor option updates reflect the Dependencies service having multiple methods. The pattern of using `_globals` and `_loaded_options` is consistent with other protobuf modules in this update.

src/scanoss/api/components/v2/scanoss_components_pb2.py (3)
3-19: LGTM! Standard protobuf modernization.

The runtime version validation and import updates are consistent with the protobuf toolchain upgrade across all modules.

27-27: LGTM! OpenAPI v2 adoption.

The migration to OpenAPI v2 annotations is consistent with other protobuf modules in this update.

32-71: LGTM! Extensive but consistent descriptor updates.

The Components service has multiple methods requiring extensive descriptor option updates. The pattern using `_globals` and `_loaded_options` is consistent with the modernization approach across all protobuf modules.

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (3)
3-19: LGTM! Completes the protobuf modernization pattern.

The runtime version validation and import updates are identical to other modules, completing the systematic protobuf toolchain upgrade.

27-27: LGTM! Consistent OpenAPI v2 migration.

The Swagger to OpenAPI v2 annotations migration is consistent across all protobuf modules.

32-75: LGTM! Final comprehensive descriptor modernization.

The Cryptography service descriptor updates complete the systematic modernization pattern across all protobuf modules, using the consistent `_globals` and `_loaded_options` approach.

src/scanoss/cli.py (5)
62-63: LGTM!

The new constants for depth and minimum cutoff threshold are correctly imported and will be used as defaults for the new CLI arguments.

679-690: LGTM!

The new CLI arguments are well-defined with appropriate types, default values, and descriptive help text. The integration with the imported constants is correct.

709-714: LGTM!

The --depth argument for folder-hash is consistent with the folder-scan implementation and properly configured.
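To visualise the arguments under review, a minimal argparse sketch: the flag names and their defaults come from this PR, while the program name, argument types, help strings, and positional argument are assumptions.

```python
# Hypothetical sketch of the new CLI flags -- not the actual src/scanoss/cli.py parser.
import argparse

DEFAULT_HFH_DEPTH = 1
DEFAULT_HFH_MIN_CUTOFF_THRESHOLD = 25

parser = argparse.ArgumentParser(prog='scanoss-py')
sub = parser.add_subparsers(dest='command')

folder_scan = sub.add_parser('folder-scan')
folder_scan.add_argument('scan_dir', help='Directory to scan (assumed positional)')
folder_scan.add_argument('--depth', type=int, default=DEFAULT_HFH_DEPTH,
                         help='Directory depth to recurse into when hashing')
folder_scan.add_argument('--min-cutoff-threshold', type=int,
                         default=DEFAULT_HFH_MIN_CUTOFF_THRESHOLD,
                         help='Minimum cutoff threshold passed to the folder-hash scan')

folder_hash = sub.add_parser('folder-hash')
folder_hash.add_argument('scan_dir', help='Directory to hash (assumed positional)')
folder_hash.add_argument('--depth', type=int, default=DEFAULT_HFH_DEPTH)

# Example invocation mirroring the sequence diagram near the top of this review:
args = parser.parse_args(['folder-scan', './my-project', '--depth', '2',
                          '--min-cutoff-threshold', '30'])
print(args.depth, args.min_cutoff_threshold)
```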
2059-2060: LGTM!

The new parameters are correctly passed from CLI arguments to the ScannerHFH constructor.

2094-2094: LGTM!

The depth parameter is correctly passed from CLI arguments to the FolderHasher constructor.
src/scanoss/scanners/folder_hasher.py (9)
9-9: LGTM!

The import of DEFAULT_HFH_DEPTH is correct and consistent with the CLI implementation.

76-82: LGTM!

The class docstring is properly updated to document the new depth parameter with a clear explanation of its purpose.

89-89: LGTM!

The depth parameter is correctly added to the constructor with an appropriate default value.

112-112: LGTM!

The depth parameter is properly stored as an instance variable for later use.

135-138: LGTM!

The method signature formatting improvement enhances readability.

195-195: LGTM!

The addition of the current_depth parameter with default value 1 is well-designed for the recursive depth tracking.

204-204: LGTM!

The parameter documentation clearly explains the current_depth parameter's purpose and 1-based indexing.

220-224: LGTM!

The depth limiting logic is correctly implemented. The condition `current_depth < self.depth` properly prevents recursion beyond the configured depth limit, and the recursive call correctly increments the depth counter.
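A self-contained sketch of the depth-limiting pattern being praised here; the `current_depth < self.depth` condition and the 1-based depth counting come from this review, while the class and field layout are illustrative rather than the actual FolderHasher code:

```python
# Illustrative sketch of depth-limited directory recursion -- not the real FolderHasher.
from pathlib import Path

class DepthLimitedWalker:
    def __init__(self, depth: int = 1):
        self.depth = depth  # 1 means: process the root only, no child directories

    def build_node(self, path: Path, current_depth: int = 1) -> dict:
        node = {'path': str(path), 'children': []}
        # Only recurse while we are above the configured depth limit (1-based).
        if current_depth < self.depth:
            for child in sorted(p for p in path.iterdir() if p.is_dir()):
                node['children'].append(self.build_node(child, current_depth + 1))
        return node

# Example: depth=2 covers the root and its immediate sub-directories only.
# tree = DepthLimitedWalker(depth=2).build_node(Path('.'))
```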
231-231: LGTM!

The children list assignment is correctly updated to use the depth-limited results.
src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py (3)
3-19: LGTM!

The protobuf runtime version validation is correctly implemented, ensuring compatibility with protobuf Python runtime version 6.31.0. The explicit validation helps prevent runtime errors due to version mismatches.

27-27: LGTM!

The migration from the deprecated `protoc_gen_swagger` annotations to `protoc_gen_openapiv2` is a positive modernization that aligns with current OpenAPI standards.

32-59: LGTM!

The refactored global symbol handling uses a more structured approach with explicit `_globals` dictionary management. This improves code clarity and provides better control over descriptor options and namespace handling in the generated protobuf code.

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1)

3-59: LGTM!

The protobuf-generated code updates in this file are consistent with the pattern established in the geoprovenance file:
- Proper runtime version validation for protobuf 6.31.0
- Migration from deprecated swagger to openapiv2 annotations
- Improved global symbol and descriptor handling
The consistent application of these changes across protobuf modules ensures maintainability and correctness.
src/protoc_gen_swagger/options/annotations_pb2.pyi (1)
1-49: Type stub file is auto-generated

This `.pyi` file is auto-generated by mypy-protobuf (as indicated by the header "@generated by mypy-protobuf. Do not edit manually!"). The file provides type hints for static type checking and IDE support. The addition of type stubs improves type safety and developer experience when working with these protobuf modules.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)
1-150: Consistent pattern in generated gRPC files

This file follows the same regeneration pattern as other gRPC files in this PR. While the changes are consistent, remember that these are auto-generated files that should not be manually edited.
The regenerated code is consistent with the gRPC version requirements across all service modules.
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)
1-194: Mismatch between PR objectives and reviewed files

The PR title "[SP-2991] feat: add recursive search support for folder hashing" suggests implementation of folder scanning features with depth and cutoff parameters. However, all the files provided for review are auto-generated gRPC/protobuf code that don't contain the actual implementation.
Please verify that the core implementation files for the folder hashing feature are included in this PR:

```bash
#!/bin/bash
# Search for files containing folder hashing implementation
echo "Searching for folder hashing related code..."
rg -l "folder.*hash|hash.*folder" --type py | grep -v pb2
echo -e "\nSearching for depth parameter usage..."
rg -l "depth|recursive" --type py | grep -v pb2
echo -e "\nSearching for cutoff threshold..."
rg -l "cutoff|threshold" --type py | grep -v pb2
echo -e "\nListing non-generated Python files changed in this PR..."
git diff --name-only HEAD~1 | grep "\.py$" | grep -v "pb2\.py$" | grep -v "pb2_grpc\.py$"
```

Likely an incorrect or invalid review comment.
src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (1)
9-26: LGTM! Version compatibility check ensures runtime safety.

The grpc version validation with clear error messaging is appropriate for generated code.
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (1)
31-32: Action Required: Verify and correct service name in source .proto

I wasn't able to locate any .proto files in the repo—please confirm where the Semgrep service is defined and ensure its docstring reads "Semgrep RPCs" instead of "SCANOSS Cryptography RPCs" before regenerating the Python stubs.
• Locate the .proto file for the Semgrep gRPC service
• Update the service documentation to reference "Semgrep RPCs"
• Regenerate scanoss_semgrep_pb2_grpc.py to pick up the change

src/protoc_gen_swagger/options/openapiv2_pb2.pyi (1)

1-5: Good addition of type stubs for OpenAPI v2

The addition of mypy-protobuf generated type stubs enhances IDE support and type safety for OpenAPI v2 protobuf messages.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (4)
9-26: LGTM! Robust version compatibility checking.

The runtime version validation logic is well-implemented with proper fallback handling and clear error messaging for version mismatches.

44-69: LGTM! Consistent RPC method registration.

All RPC methods in the stub correctly include the `_registered_method=True` parameter, which aligns with the newer gRPC requirements.

156-156: LGTM! Proper method handler registration.

The explicit registration of method handlers complements the existing generic handler registration and aligns with the `_registered_method=True` flags used in the client stub.

176-325: LGTM! Consistent experimental API updates.

The multi-line formatting and `_registered_method=True` additions are consistent across all experimental static methods, maintaining uniformity with the client stub updates.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary
Force-pushed from c0635a7 to ac8940f
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary
Actionable comments posted: 7
🔭 Outside diff range comments (2)
src/scanoss/file_filters.py (2)
573-577: Case-sensitivity bug when honoring custom skip_extensions

file_name_lower is lowercased, but endings from self.skip_extensions are not, so matching fails for uppercase or mixed-case endings supplied by users.
Apply this fix:

```diff
-        if self.skip_extensions:
-            for ending in self.skip_extensions:
-                if file_name_lower.endswith(ending):
+        if self.skip_extensions:
+            for ending in self.skip_extensions:
+                ending_l = ending.strip().lower()
+                if file_name_lower.endswith(ending_l):
                     self.print_debug(f'Skipping file: {file_rel_path} (matches skip extension)')
                     return True
```
527-536: HFH directory-skip sets not honored consistently in should_skip_dir

The membership check is always against DEFAULT_SKIPPED_DIRS and DEFAULT_SKIPPED_DIR_EXT, ignoring the HFH variants used elsewhere. Even if both sets are currently identical, this is brittle and can drift.
Use the HFH-aware sets here as well:

```diff
-        dir_name_lower = dir_name.lower()
-        if dir_name_lower in DEFAULT_SKIPPED_DIRS:
+        dir_name_lower = dir_name.lower()
+        DEFAULT_SKIPPED_DIR_SET = DEFAULT_SKIPPED_DIRS_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIRS
+        if dir_name_lower in DEFAULT_SKIPPED_DIR_SET:
             self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip directory)')
             return True
         if self.skip_folders and dir_name in self.skip_folders:
             self.print_debug(f'Skipping directory: {dir_rel_path} (matches skip folder)')
             return True
-        for ext in DEFAULT_SKIPPED_DIR_EXT:
+        DEFAULT_SKIPPED_DIR_EXT_SET = (
+            DEFAULT_SKIPPED_DIR_EXT_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIR_EXT
+        )
+        for ext in DEFAULT_SKIPPED_DIR_EXT_SET:
             if dir_name_lower.endswith(ext):
                 self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip extension: {ext})')
                 return True
```
♻️ Duplicate comments (6)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)
9-26: grpc runtime compatibility guard mirrors components module

Same approach and message; see earlier comment about ensuring repo pins match the required grpc version.

src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (1)

9-26: grpc runtime compatibility guard is consistent with other modules

Same gate as components/vulnerabilities; consistent and correct.

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (2)

12-19: Protobuf runtime version validation added: ensure dependency pins (protobuf ≥ 6.31.0)

Mirrors geoprovenance pb2; ensure your packaging pins protobuf to at least this version to avoid runtime import errors.

27-27: OpenAPI v2 annotations dependency imported at runtime

As with the geoprovenance module, confirm the protoc-gen-openapiv2 Python package is present in runtime requirements if these modules are imported by clients.

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)

9-26: Same grpc version guard as other pb2_grpc modules

See the comment in cryptography/v2 pb2_grpc regarding packaging pins for grpcio/grpcio-tools.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)

9-26: Same grpc version guard as other pb2_grpc modules

See prior comment about pinning grpcio/grpcio-tools to >=1.73.1 to avoid import-time failures.
🧹 Nitpick comments (7)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (2)
3-3: Clarify the "NO CHECKED-IN PROTOBUF GENCODE" policy vs. checked-in file

This header explicitly says not to check in generated code, but this file is checked in. Please align the policy and practice to avoid confusion for future contributors and tooling.
Would you like to keep Python pb2 code checked in (common for Python projects) and remove/adjust this header, or exclude these from VCS and generate at build time?

9-19: Import-time runtime guard will break older environments

ValidateProtobufRuntimeVersion enforces the generator/runtime match at import. If older protobuf runtimes are in use anywhere (CI, downstream consumers), imports will fail. Confirm that all runtime environments are already on protobuf >= 6.31.0, or bump pins accordingly.
If helpful, I can propose dependency updates (requirements/pyproject) to ensure consistent protobuf/grpc versions across the repo.
src/scanoss/file_filters.py (1)
579-581: Normalize path before calling pathspec to avoid type surprises

GitIgnoreSpec.match_file expects a string path. file_rel_path can be a Path in some flows. Normalize to str/os.fspath for safety across environments.

```diff
-        if self.file_folder_pat_spec and self.file_folder_pat_spec.match_file(file_rel_path):
+        if self.file_folder_pat_spec and self.file_folder_pat_spec.match_file(os.fspath(file_rel_path)):
             self.print_debug(f'Skipping file: {file_rel_path} (matches custom pattern)')
             return True
```

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1)

37-37: Typo in service metadata URL ("crpytography")

The service options embed a misspelled GitHub URL: https://github.com/scanoss/crpytography. This will surface in generated OpenAPI docs/metadata. Fix this in the source .proto and regenerate.
Suggested change in scanoss/api/cryptography/v2/scanoss-cryptography.proto (do not edit pb2 directly):

```diff
- option (grpc.gateway.protoc_gen_openapiv2.options.openapiv2_swagger).external_docs.url = "https://github.com/scanoss/crpytography";
+ option (grpc.gateway.protoc_gen_openapiv2.options.openapiv2_swagger).external_docs.url = "https://github.com/scanoss/cryptography";
```

src/protoc_gen_swagger/options/annotations_pb2_grpc.py (2)
11-16: Prefer robust version comparison without relying on private gRPC internals

Importing grpc._utilities.first_version_is_lower uses a private API that may change. Consider a fallback comparison using packaging.version when the private import fails. This avoids hard-failing on environments where the private API is absent but the installed grpc still satisfies the requirement.
Apply this diff to make the check more resilient without introducing a hard dependency:

```diff
-try:
-    from grpc._utilities import first_version_is_lower
-    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
-except ImportError:
-    _version_not_supported = True
+try:
+    from grpc._utilities import first_version_is_lower
+    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
+except Exception:
+    try:
+        from packaging.version import Version as _V  # optional fallback if available
+        _version_not_supported = _V(GRPC_VERSION) < _V(GRPC_GENERATED_VERSION)
+    except Exception:
+        # As a safe default, reject unknown or unparsable versions
+        _version_not_supported = True
```

Note: This does not add a hard dependency on packaging; it only uses it if present.

19-23: Avoid hard-coding file paths in error messages

Hard-coding protoc_gen_swagger/options/annotations_pb2_grpc.py can drift if the module is moved or vendored. Use the module name to build a stable reference in the error.
Apply this diff:

```diff
 if _version_not_supported:
     raise RuntimeError(
         f'The grpc package installed is at version {GRPC_VERSION},'
-        + f' but the generated code in protoc_gen_swagger/options/annotations_pb2_grpc.py depends on'
+        + f' but the generated code in ' + __name__.replace('.', '/') + '.py depends on'
         + f' grpcio>={GRPC_GENERATED_VERSION}.'
         + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
         + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
     )
```
108-150: Experimental wrappers updated — LGTM; consider documenting the breaking guard

The wrappers are correct. Since import-time version enforcement is a breaking behavior for older environments, ensure release notes/CHANGELOG call this out and bump the appropriate version.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- CHANGELOG.md (2 hunks)
- requirements.txt (1 hunks)
- scanoss.json (2 hunks)
- setup.cfg (1 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2.py (2 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (9 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py (1 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (6 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (5 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (5 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (6 hunks)
- src/scanoss/cli.py (5 hunks)
- src/scanoss/constants.py (1 hunks)
- src/scanoss/file_filters.py (1 hunks)
- src/scanoss/scanners/folder_hasher.py (8 hunks)
- src/scanoss/scanners/scanner_hfh.py (7 hunks)
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (15)
- requirements.txt
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
- src/scanoss/constants.py
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
- src/protoc_gen_swagger/options/openapiv2_pb2.py
- setup.cfg
- src/scanoss/scanners/folder_hasher.py
- src/protoc_gen_swagger/options/openapiv2_pb2.pyi
- src/scanoss/api/components/v2/scanoss_components_pb2.py
- src/protoc_gen_swagger/options/annotations_pb2.py
- src/scanoss/cli.py
- src/scanoss/api/common/v2/scanoss_common_pb2.py
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py
- src/scanoss/scanners/scanner_hfh.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/protoc_gen_swagger/options/annotations_pb2.pyi (1)
src/protoc_gen_swagger/options/openapiv2_pb2.pyi (5)
- Swagger (25-212)
- Operation (217-378)
- Schema (748-803)
- Tag (974-999)
- JSONSchema (808-969)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (39)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (5)
27-27
: OpenAPI v2 annotations import requires packaging the new moduleSwitching to protoc_gen_openapiv2.options.annotations_pb2 is correct, but make sure the generated Python module(s) for protoc_gen_openapiv2 are checked in and included in your wheel/sdist. Also verify there are no lingering protoc_gen_swagger imports elsewhere.
Run the script in my earlier comment to:
- Confirm protoc_gen_openapiv2/options/annotations_pb2.py exists in the repo.
- Ensure deprecated protoc_gen_swagger imports are gone.
Additionally, verify that your packaging configuration includes the protoc_gen_openapiv2 package in distributions (e.g., via find_packages or package_data).
32-35
: LGTM: Modern builder usage with globalsUsing _globals with BuildMessageAndEnumDescriptors and BuildTopDescriptorsAndMessages matches current generator patterns and improves initialization ordering.
35-41
: LGTM: Descriptor option wiring and C-descriptor guardUsing not _USE_C_DESCRIPTORS and wiring serialized options onto service/methods is consistent with modern protobuf Python generation.
42-51
: LGTM: Serialized offsets updatedThe updated _serialized_start/_serialized_end offsets are consistent with the regenerated descriptor. Nothing to change here.
5-5
: Pin Protobuf Runtime to ≥6.31.0
The generated code enforces Protobuf Python Version 6.31.0 at import time. Please verify and update your project’s dependency files to include:
- requirements.txt / Pipfile:
protobuf>=6.31.0- pyproject.toml / poetry.lock / setup.py / setup.cfg:
Under your dependencies or install_requires, pin protobuf to ≥6.31.0If you manage dependencies elsewhere (e.g. Docker images, CI configs), confirm that all environments install Protobuf >= 6.31.0.
src/scanoss/file_filters.py (1)
568-571
: Review HFH file-ending skip behavior
The_should_skip_file
method insrc/scanoss/file_filters.py
now unconditionally usesDEFAULT_SKIPPED_EXT
for both regular and folder-hashing scans, broadening the set of skipped files under HFH. This will affect folder-hash results and file counts.Please verify intent and take one of the following actions:
• If this wider skip list is intentional:
– Update or add tests covering HFH scans to assert the new skips.
– Revise documentation (e.g., README or API docs) to list all extensions now excluded in folder hashing.• If this change was unintentional:
– Reintroduce an HFH-specific extension set and select it whenis_folder_hashing_scan
is true.Key location:
src/scanoss/file_filters.py
, method_should_skip_file
(around lines 567–571)Suggested refactor if reverting to HFH-specific extensions:
- for ending in DEFAULT_SKIPPED_EXT: + skipped_exts = DEFAULT_SKIPPED_EXT_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_EXT + for ending in skipped_exts: if file_name_lower.endswith(ending): self.print_debug(f'Skipping file: {file_rel_path} (matches skip ending: {ending})') return Truesrc/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (3)
9-19
: Early runtime version validation is a solid additionValidating the protobuf runtime at import time will fail fast if environments are out of sync. Good defensive programming for generated code.
32-35
: Descriptor build/refactor LGTMSwitching to the _globals + _builder helpers and conditional C-descriptor handling mirrors modern generator output. Looks correct and keeps public API unchanged.
Also applies to: 35-49
3-3
: “NO CHECKED-IN PROTOBUF GENCODE” header conflicts with current repo practiceThis file is committed, yet the header states no checked-in gen code. If the project policy is to vendor generated pb2/grpc code (as this PR suggests), consider removing or suppressing this header at generation time to avoid confusion for future contributors and automation.
Would you like me to propose an update to your codegen task (Makefile/Buf/Prototool/etc.) to drop or replace this header?
scanoss.json (1)
20-20
: Confirm behavior change: bom.remove -> skip.patterns.scanningSwitching from BOM removal to pre-scan skipping changes semantics. The skipped paths will no longer be scanned at all (vs. being scanned and filtered post-fact). Please confirm this is intended and that BOM parity with prior releases is maintained.
Would you like me to generate a script to diff BOM outputs before/after to validate parity in CI?
src/protoc_gen_swagger/options/annotations_pb2.pyi (1)
1-49
: Type stubs look correct and usefulStubs correctly expose extension descriptors and field numbers, with accurate type references to OpenAPI v2 messages. Good addition for static typing.
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (6)
40-60
: Client stubs: _registered_method=True is correctly appliedPassing _registered_method=True on unary_unary calls aligns with server-side registration for better performance/validation.
96-123
: Server: add_registered_method_handlers added appropriatelyExplicit registration complements the generic handler and is guarded by the version gate above. No issues spotted.
142-156
: Experimental wrapper: Echo updated to registered method styleThe multi-line experimental wrapper with _registered_method=True is consistent with the new registration path.
169-184
: Experimental wrapper: SearchComponents updated to registered method styleConsistent with Echo; no concerns.
196-211
: Experimental wrapper: GetComponentVersions updated to registered method styleConsistent and correct.
223-238
: Experimental wrapper: GetComponentStatistics updated to registered method styleConsistent and correct.
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (5)
40-54
: Client stubs: _registered_method=True is correctly appliedMatches the server registration path; looks good.
102-106
: Server: add_registered_method_handlers added appropriatelyAligned with the client stub changes; no issues spotted.
125-139
: Experimental wrapper: Echo uses registered method pathConsistent and correct.
152-166
: Experimental wrapper: GetCpes uses registered method pathConsistent and correct.
179-193
: Experimental wrapper: GetVulnerabilities uses registered method pathConsistent and correct.
src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py (1)
27-27
: OpenAPI v2 annotations dependency imported at runtimeSince protoc_gen_openapiv2.options.annotations_pb2 is imported at import-time, ensure the corresponding Python package is included in install/runtime requirements.
src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (5)
40-55
: Client stubs: _registered_method=True is correctly appliedMatches server registration and experimental wrappers; no issues.
102-106
: Server: add_registered_method_handlers added appropriatelyAligned with client changes; version gate ensures availability.
125-139
: Experimental wrapper: Echo updated to registered method pathConsistent and correct.
152-166
: Experimental wrapper: GetComponentContributors updated to registered method pathConsistent and correct.
179-193
: Experimental wrapper: GetComponentOrigin updated to registered method pathConsistent and correct.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (3)
40-70
: Stub methods use _registered_method=True — LGTMExplicit registration on the client stub methods aligns with the server-side handler registration and the enforced grpc version.
153-157
: Server registers method handlers explicitly — LGTMadd_registered_method_handlers complements the generic handler and matches the client’s _registered_method usage.
176-190
: Experimental client wrappers updated with _registered_method=True — LGTMSame pattern applied across all static wrappers (Echo, GetAlgorithms*, GetHints*). Looks correct given the version guard.
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3)
40-55
: Client stub uses _registered_method=True — LGTMConsistent with server registration and enforced grpc version.
102-106
: Server registered method handlers — LGTMExplicit registration is correct and consistent with the rest of the PR.
125-140
: Experimental client wrapper arguments modernized — LGTMMultiline style with _registered_method=True matches the new grpc tooling.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (3)
30-31
: HFHRequest: new field min_cutoff_threshold (float) — LGTMField number 5 added; backward-compatible for clients not sending it (default 0). Ensure server-side validation and CLI/plumbing default semantics are consistent.
Would you like me to scan server/CLI code to confirm the field is plumbed and defaults handled as intended?
44-59
: Descriptor indices updated — LGTMOffsets reflect the new field and descriptor changes; nothing actionable.
27-28
: Vendored OpenAPI v2 annotations module foundThe file
protoc_gen_openapiv2/options/annotations_pb2.py
is present in the repository, so the import path is valid. No further changes are required.src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)
40-50
: Client stub uses _registered_method=True — LGTMEcho and FolderHashScan are correctly registered on the client.
85-89
: Server registers method handlers — LGTMExplicit registration aligns with client registration and version guard.
…r hashing commands (force-pushed from ac8940f to 64ca0d3)
View more details on SCANOSS Action Summary |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary
Actionable comments posted: 8
♻️ Duplicate comments (2)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)
9-26: gRPC runtime guard enforces grpcio>=1.73.1 (import-time failure on older runtimes)

This mirrors prior feedback on similar modules; ensure all constraints pin grpcio and grpcio-tools to >=1.73.1 across dev/CI/prod.
You can reuse the earlier repo-wide scan script to verify grpcio/grpcio-tools pins; I can also raise a separate PR to align constraints if desired.
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)
9-26: gRPC runtime guard: ensure grpcio/grpcio-tools>=1.73.1 everywhere

Same enforcement here; align pins to avoid import-time RuntimeError.
I can submit a follow-up PR to bump and unify these pins across all requirement files.
🧹 Nitpick comments (8)
src/scanoss/file_filters.py (4)
525-536: Align direct dir checks with HFH mode to avoid relying on patterns for parity

Direct checks in should_skip_dir use DEFAULT_SKIPPED_DIRS/DEFAULT_SKIPPED_DIR_EXT only, while _get_operation_patterns picks the HFH variants when is_folder_hashing_scan=True. Behavior is still correct due to the final pattern-spec check, but using the HFH sets here improves determinism and debug messages (it will say "matches default skip directory" instead of "matches custom pattern" for HFH-only defaults).
Apply:

```diff
         dir_name_lower = dir_name.lower()
-        if dir_name_lower in DEFAULT_SKIPPED_DIRS:
+        default_skipped_dirs = DEFAULT_SKIPPED_DIRS_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIRS
+        default_skipped_dir_ext = (
+            DEFAULT_SKIPPED_DIR_EXT_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIR_EXT
+        )
+        if dir_name_lower in default_skipped_dirs:
             self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip directory)')
             return True
         if self.skip_folders and dir_name in self.skip_folders:
             self.print_debug(f'Skipping directory: {dir_rel_path} (matches skip folder)')
             return True
-        for ext in DEFAULT_SKIPPED_DIR_EXT:
+        for ext in default_skipped_dir_ext:
             if dir_name_lower.endswith(ext):
                 self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip extension: {ext})')
                 return True
```
568-571: Prefer most-specific endings first for clearer logs in ambiguous cases

Minor: since DEFAULT_SKIPPED_EXT contains mixed endings (e.g., '.manifest' and 'manifest'), consider checking longer endings first to make the debug message reflect the most specific match.

```diff
-        for ending in DEFAULT_SKIPPED_EXT:
+        for ending in sorted(DEFAULT_SKIPPED_EXT, key=len, reverse=True):
             if file_name_lower.endswith(ending):
                 self.print_debug(f'Skipping file: {file_rel_path} (matches default skip ending: {ending})')
                 return True
```
573-577: Normalize custom skip_extensions to lowercase and include the matched ending in logs

User-provided endings may be mixed-case. Normalize during the check and surface the actual ending in the message for traceability.

```diff
-        if self.skip_extensions:
-            for ending in self.skip_extensions:
-                if file_name_lower.endswith(ending):
-                    self.print_debug(f'Skipping file: {file_rel_path} (matches skip extension)')
-                    return True
+        if self.skip_extensions:
+            for ending in self.skip_extensions:
+                e = ending.lower()
+                if file_name_lower.endswith(e):
+                    self.print_debug(f'Skipping file: {file_rel_path} (matches skip extension: {e})')
+                    return True
```

Additionally (outside this range), consider normalizing once in init to avoid repeated lower() calls:

```python
# In __init__ after reading kwargs
self.skip_extensions = [e.lower() for e in kwargs.get('skip_extensions', [])]
```
311-353: Optional: add a focused unit test around the new unified extension behavior

A small test ensures HFH vs non-HFH produce identical skip decisions for extension-based cases.
Happy to open a test PR; example outline (a hedged pytest sketch follows the list below):
- Create temp files: foo.md, bar.pdf, baz.min.js, README, LICENSE.
- Assert _should_skip_file returns True for both is_folder_hashing_scan=True/False.
- Include a custom skip_extensions=['.foo'] to verify normalization and logging path.
Let me know and I’ll draft pytest-compatible tests.
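Along the lines of that outline, a hedged pytest sketch: the FileFilters constructor keywords and the `_should_skip_file` name appear in this review, but the exact signature and the default skip lists are assumptions.

```python
# Hypothetical test sketch -- adjust to the real FileFilters API before use.
import pytest
from scanoss.file_filters import FileFilters

@pytest.mark.parametrize('is_hfh', [True, False])
@pytest.mark.parametrize('file_name, expected', [
    ('foo.md', True),      # assumed to match DEFAULT_SKIPPED_EXT
    ('bar.pdf', True),
    ('baz.min.js', True),
    ('main.c', False),     # source files should not be skipped
])
def test_extension_skips_match_across_modes(tmp_path, is_hfh, file_name, expected):
    path = tmp_path / file_name
    path.write_text('content')
    filters = FileFilters(is_folder_hashing_scan=is_hfh)
    assert filters._should_skip_file(str(path)) == expected

def test_custom_skip_extensions_are_normalised(tmp_path):
    path = tmp_path / 'thing.FOO'
    path.write_text('content')
    filters = FileFilters(skip_extensions=['.foo'])
    # Should be skipped once custom endings are matched case-insensitively.
    assert filters._should_skip_file(str(path)) is True
```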
src/scanoss/api/components/v2/scanoss_components_pb2.py (1)
3-5: Checked-in generated code vs header disclaimer

Header states "NO CHECKED-IN PROTOBUF GENCODE", yet this file is committed. If intentional, please confirm policy (why generated code is vendored, the regeneration process, and when to re-gen) and align CONTRIBUTING docs accordingly. Otherwise, consider excluding it from VCS.
Would you like me to draft a short "Generated code policy" section for the repo?
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1)
27-27: OpenAPI v2 annotations import: verify it's available in the package

Confirm protoc_gen_openapiv2/options/annotations_pb2 is present and included in wheels/sdists.
I can add a packaging check to CI that imports all pb2 modules to catch missing vendored deps. Interested?
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)
4-4: Unused import: warnings

warnings isn't used. If this file is hand-edited (not strictly generated), drop it. If purely generated, feel free to ignore.
Apply if applicable:

```diff
-import warnings
+# (removed unused import)
```

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)
4-4: Unused import: warnings

Not used. Remove if this file is not strictly generated; otherwise safe to ignore.

```diff
-import warnings
+# (removed unused import)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (31)
- CHANGELOG.md (2 hunks)
- requirements.txt (1 hunks)
- scanoss.json (2 hunks)
- setup.cfg (1 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/annotations_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.py (2 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2.pyi (1 hunks)
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2.py (2 hunks)
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (9 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py (1 hunks)
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py (6 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (5 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (5 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (6 hunks)
- src/scanoss/cli.py (5 hunks)
- src/scanoss/constants.py (1 hunks)
- src/scanoss/file_filters.py (1 hunks)
- src/scanoss/scanners/folder_hasher.py (8 hunks)
- src/scanoss/scanners/scanner_hfh.py (7 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/scanoss/scanners/scanner_hfh.py
🚧 Files skipped from review as they are similar to previous changes (19)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
- scanoss.json
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py
- src/scanoss/api/common/v2/scanoss_common_pb2.py
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py
- setup.cfg
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
- src/protoc_gen_swagger/options/openapiv2_pb2.pyi
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py
- src/protoc_gen_swagger/options/annotations_pb2.pyi
- src/scanoss/constants.py
- src/protoc_gen_swagger/options/annotations_pb2.py
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2_grpc.py
- src/scanoss/api/geoprovenance/v2/scanoss_geoprovenance_pb2.py
- src/scanoss/cli.py
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/scanners/folder_hasher.py (1)
- src/scanoss/scanoss_settings.py (1)
  - ScanossSettings (76-337)
🪛 LanguageTool
CHANGELOG.md
[grammar] ~12-~12: There might be a mistake here.
Context: ...ing changes... ## [1.32.0] - 2025-08-13 ### Added - Add --min-cutoff-threshold
arg...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ...s... ## [1.32.0] - 2025-08-13 ### Added - Add --min-cutoff-threshold
argument to...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (32)
src/scanoss/file_filters.py (2)
568-571
: Unified extension-skip across modes — LGTMSwitching to a single DEFAULT_SKIPPED_EXT for both HFH and non-HFH paths is clear and reduces config drift. No correctness concerns spotted in this loop.
568-576
: No lingering references to removed HFH extension set foundI searched the entire codebase for
DEFAULT_SKIPPED_EXT_HFH
andDEFAULT_SKIPPED_EXT_LIST
and confirmed there are no matches—everything now consistently usesDEFAULT_SKIPPED_EXT
.src/scanoss/api/components/v2/scanoss_components_pb2.py (2)
32-35
: Descriptor construction via _globals pattern looks correctSwitch to _builder.Build* with a local _globals mirror matches protobuf 6.x generated style. No issues spotted.
36-71
: Options wiring and serialized offsets are consistent with protoc outputAssignments to _loaded_options/_serialized_options and *_serialized_start/end align with modern generator output. Safe to keep as-is.
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (2)
27-27
: OpenAPI v2 options import migration: confirm availabilityEnsure protoc_gen_openapiv2/options/annotations_pb2.py is present (vendored) and packaged.
Use the same discovery script I shared on the Components file to validate presence and packaging manifest entries.
32-35
: _globals-based descriptor build is standard for protobuf 6.xLooks good and consistent with other regenerated modules.
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (2)
9-19
: Protobuf runtime guard requires protobuf>=6.31.0As above, this will fail fast in older environments. Align pins to >=6.31.0.
Run the protobuf pin scan script provided earlier to ensure consistency across files.
32-35
: Descriptor/global wiring matches generator outputNo functional concerns here.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (3)
40-70
: Client stub now passes _registered_method=True for all RPCsGood: aligns with grpcio>=1.73 method registration semantics.
156-156
: Server registers method handlers explicitlyserver.add_registered_method_handlers(...) is correct for newer grpc runtimes and is guarded by the version check above.
165-191
: Experimental static wrappers updated to pass _registered_method=TrueLooks consistent and correct across all RPCs.
Also applies to: 203-217, 230-245, 257-271, 284-299, 311-325
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3)
40-55
: Client stub: _registered_method=True on all RPCsGood alignment with grpc 1.73+.
105-105
: Explicit server method registrationserver.add_registered_method_handlers(...) is appropriate and matches the version guard.
125-139
: Experimental wrappers updated consistentlyAll static helpers now pass _registered_method=True. Looks good.
Also applies to: 152-167, 179-194
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (4)
32-35
: Descriptor build via _builder + _globals looks correct and matches modern generator output.This is the expected pattern for protobuf Python 6.x; no issues spotted.
36-43
: C-descriptor conditional options assignment is correct.Using
_descriptor._USE_C_DESCRIPTORS
guard and_loaded_options/_serialized_options
aligns with current generator behavior.
44-65
: Generated serialized ranges — no action needed.These offsets are generator artifacts; nothing to review manually here.
27-27
: No action needed—protoc-gen-openapiv2
is already declared as a dependency• setup.cfg includes
• protoc-gen-openapiv2 on line 34
• requirements.txt includes
• protoc-gen-openapiv2 on line 7Since the needed package providing
protoc_gen_openapiv2.options.annotations_pb2
is already listed, no further changes are required.requirements.txt (1)
7-7
: New OpenAPI v2 dependency is properly integrated.The addition of
protoc-gen-openapiv2
as a dependency is consistent with the OpenAPI v2 migration across the protobuf modules. This supports the transition from Swagger to OpenAPI v2 annotations.src/protoc_gen_swagger/options/openapiv2_pb2.py (4)
3-19
: Enhanced protobuf runtime validation is properly implemented.The addition of runtime version validation provides better protection against protobuf version mismatches. The validation call with specific version requirements (6.31.0) and the "NO CHECKED-IN PROTOBUF GENCODE" header indicate this is properly generated code with runtime checks.
31-33
: Modernized descriptor construction pattern.The transition to using
_globals = globals()
and passing it to the builder functions is a more modern approach for descriptor construction. This pattern allows for better control over the global namespace during module initialization.
34-62
: Improved descriptor usage guard condition.The change from
if _descriptor._USE_C_DESCRIPTORS == False:
toif not _descriptor._USE_C_DESCRIPTORS:
is more Pythonic and follows PEP 8 guidelines for boolean checks.
35-62
: Proper migration to globals-based descriptor options handling.The migration from direct
DESCRIPTOR._options
access to_globals['DESCRIPTOR']._loaded_options
provides better encapsulation and follows the modern protobuf code generation patterns. All descriptor entries are consistently updated with this approach.src/scanoss/scanners/folder_hasher.py (7)
9-9
: Proper constant import for depth control.The import of
DEFAULT_HFH_DEPTH
from constants is consistent with the PR's goal of adding configurable depth support for folder hashing.
75-80
: Enhanced constructor with depth parameter.The addition of the
depth
parameter with proper default value and documentation enhances the class functionality. The parameter is correctly stored as an instance variable for use throughout the class.Also applies to: 87-87, 110-110
133-136
: Improved method signature formatting. The multiline formatting of the _build_root_node method signature improves readability and follows Python formatting best practices.
193-202
: Well-implemented depth-controlled recursion. The addition of the current_depth parameter with proper documentation enables controlled recursion depth. The parameter naming and documentation clearly indicate it is 1-based, with the root at depth 1.
218-222
: Effective depth limit implementation. The depth limit check current_depth < self.depth properly controls recursion, and the children list is populated only when within the depth limit. This bounds the recursion and controls the hashing scope as intended.
Also applies to: 229-229
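For reference, a minimal sketch of the depth-limited traversal pattern described above (illustrative names only, not the actual FolderHasher implementation):
from pathlib import Path

# Sketch: stop descending once the configured depth is reached.
# 'max_depth' plays the role of self.depth; 'current_depth' is 1-based.
def build_node(path: Path, max_depth: int, current_depth: int = 1) -> dict:
    node = {'path': str(path), 'children': []}
    if current_depth < max_depth:  # only recurse while within the depth limit
        for child in sorted(p for p in path.iterdir() if p.is_dir()):
            node['children'].append(build_node(child, max_depth, current_depth + 1))
    return node

# Example: limit the tree to the root plus its immediate subfolders
# tree = build_node(Path('.'), max_depth=2)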
273-276
: Verification needed: duplicate key handling in folder_hasher._hash_calc. I didn't find any existing tests or documentation covering duplicate-key scenarios or the downstream content_hash behavior after the recent change. Please confirm that retaining duplicate file hashes is intentional and verify its impact on the overall hashing logic. You may also want to add explicit tests for these cases; a sketch follows below.
– File to review: src/scanoss/scanners/folder_hasher.py (around lines 273–276)
– Missing tests for:
• Duplicate-key handling in _hash_calc
• Correctness of the resulting content_hash when duplicates are present
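A hedged pytest-style sketch of such a test; the constructor and entry-point shapes below are assumptions to be adapted to the real FolderHasher API:
from pathlib import Path

from scanoss.scanners.folder_hasher import FolderHasher  # import path per the module layout

def test_hash_calc_with_duplicate_file_contents(tmp_path: Path):
    # Two files with identical contents exercise the duplicate-key path.
    (tmp_path / 'a.txt').write_text('same content')
    (tmp_path / 'b.txt').write_text('same content')

    hasher = FolderHasher(str(tmp_path))          # assumed constructor shape
    tree = hasher.hash_directory(str(tmp_path))   # assumed entry point

    # Replace with concrete assertions on the resulting content_hash once the
    # intended duplicate behaviour (keep vs. deduplicate) is confirmed.
    assert tree is not None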
153-154
: No filename-length filter present; ignore this concern. All references to MAXIMUM_FILE_NAME_LENGTH have been removed, and the FileFilters class does not enforce any filename-length limits. File filtering is governed by hidden-file rules, default skip lists, extensions, custom patterns, and file-size thresholds. No further action required.
Likely an incorrect or invalid review comment.
CHANGELOG.md (2)
12-15
: Well-documented feature additions. The changelog entry clearly documents the new CLI arguments for depth and min-cutoff-threshold, providing users with clear information about the new functionality.
656-656
: Correct version reference link added. The version reference link for 1.32.0 is properly formatted and points to the correct GitHub comparison URL.
@@ -2,8 +2,9 @@ requests
 crc32c>=2.2
 binaryornot
 progress
-grpcio>1.42.0
+grpcio>=1.70.0
💡 Verification agent
🧩 Analysis chain
Verify gRPC version compatibility across the codebase.
The update from grpcio>1.42.0 to grpcio>=1.70.0 is a significant version bump. This change aligns with the protobuf runtime validation added throughout the codebase, but requires verification that all gRPC-related code is compatible with the new minimum version.
🏁 Script executed:
#!/bin/bash
# Search for gRPC version checks and compatibility guards in the codebase
echo "Searching for gRPC version compatibility checks..."
rg -n "GRPC_GENERATED_VERSION|GRPC_VERSION|_version_not_supported" --type=py
echo -e "\nSearching for potential version-specific gRPC code..."
rg -nC3 "grpc\.|grpcio" --type=py -g '!requirements.txt' -g '!setup.cfg'
Length of output: 110731
Ensure grpcio requirement matches generated-code version checks
The generated gRPC stubs in *_pb2_grpc.py perform runtime guards against the embedded GRPC_GENERATED_VERSION (1.73.0 in the protoc_gen_swagger stubs and 1.73.1 in the Scanoss API stubs). With requirements.txt currently set to:
- grpcio>=1.70.0
+ grpcio>=1.73.1
installations of grpcio 1.70.x–1.72.x will trigger a RuntimeError on import, since first_version_is_lower('1.72.0', '1.73.1') is true. To resolve this critical mismatch:
• Update requirements.txt (line 5) to grpcio>=1.73.1 to satisfy the highest GRPC_GENERATED_VERSION.
• Optionally, if you prefer to lock to 1.73.0 instead, regenerate the Swagger stubs so their GRPC_GENERATED_VERSION matches (but note the Scanoss API stubs require 1.73.1).
Please confirm that moving to grpcio ≥1.73.1 is acceptable for all environments and downstream consumers.
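For a quick local check, a minimal sketch (assuming grpcio and the third-party packaging library are installed) that mirrors the generated guard without importing the stubs:
# Compare the installed grpcio version against the highest GRPC_GENERATED_VERSION
# embedded in the generated stubs (1.73.1 here).
import grpc
from packaging.version import Version

REQUIRED = "1.73.1"
installed = Version(grpc.__version__)
if installed < Version(REQUIRED):
    raise SystemExit(f"grpcio {installed} is too old; the generated code needs >= {REQUIRED}")
print(f"grpcio {installed} satisfies the generated-code requirement (>= {REQUIRED})")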
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-grpcio>=1.70.0
+grpcio>=1.73.1
🤖 Prompt for AI Agents
In requirements.txt around line 5 the grpcio pin is too low (grpcio>=1.70.0)
which will cause runtime import RuntimeError due to generated gRPC stubs
expecting GRPC_GENERATED_VERSION up to 1.73.1; update the requirement on line 5
to grpcio>=1.73.1 to satisfy the highest embedded generated version (or
alternatively regenerate Swagger/Scanoss stubs to match a lower grpcio if you
cannot bump), and verify this bump is acceptable for all environments and
downstream consumers.
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    6,
    31,
    0,
    '',
    'scanoss/api/components/v2/scanoss-components.proto'
)
💡 Verification agent
🧩 Analysis chain
Import-time Protobuf runtime guard will hard-fail on protobuf<6.31.0
The ValidateProtobufRuntimeVersion check enforces protobuf>=6.31.0 at import time. Ensure all environments (dev/CI/user installs) pin protobuf>=6.31.0 to avoid RuntimeError.
Run this to locate and review pins:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Scanning dependency specs for protobuf pins..."
fd -HI -t f 'pyproject.toml' 'Pipfile' 'poetry.lock' 'requirements*.txt' 'setup.cfg' 'constraints*.txt' || true
echo
rg -n -S '(^|\s)(protobuf|googleapis-common-protos)\b' \
-g 'pyproject.toml' -g 'Pipfile' -g 'poetry.lock' -g 'requirements*.txt' -g 'setup.cfg' -g 'constraints*.txt' || true
Length of output: 844
Require Protobuf ≥ 6.31.0 in all dependency specs
The import-time guard added by ValidateProtobufRuntimeVersion(..., 6, 31, 0, …) will immediately throw a RuntimeError if the installed protobuf is older than 6.31.0. Our current pins only demand protobuf>3.19.1, which still permits versions in the 4.x and 5.x series, and those will fail at import. To prevent CI and user installs from breaking, please update every spec that mentions protobuf to require at least 6.31.0 (and optionally cap at <7.0.0):
• requirements.txt (line 6)
• setup.cfg (line 33, under [options.install_requires])
• tests/data/requirements.txt (line 6)
Suggested diff for each occurrence:
- protobuf>3.19.1
+ protobuf>=6.31.0,<7.0.0
If you use other constraint files or Pipfile, apply the same change there as well.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/scanoss/api/components/v2/scanoss_components_pb2.py around lines 9 to 19,
the generated module enforces Protobuf >= 6.31.0 at import time, but our
dependency pins still allow older 4.x/5.x versions; update every place we
declare protobuf to require at least 6.31.0 (and optionally <7.0.0) so
installs/CI won't break: change requirements.txt (line 6), setup.cfg (line 33
under [options.install_requires]), and tests/data/requirements.txt (line 6) to
use protobuf>=6.31.0 (or protobuf>=6.31.0,<7.0.0); also search and update any
other constraint/Pipfile/lock files to the same version requirement.
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


from scanoss.api.common.v2 import scanoss_common_pb2 as scanoss_dot_api_dot_common_dot_v2_dot_scanoss__common__pb2
from google.api import annotations_pb2 as google_dot_api_dot_annotations__pb2
from protoc_gen_swagger.options import annotations_pb2 as protoc__gen__swagger_dot_options_dot_annotations__pb2

from protoc_gen_openapiv2.options import annotations_pb2 as protoc__gen__openapiv2_dot_options_dot_annotations__pb2
💡 Verification agent
🧩 Analysis chain
Swagger→OpenAPI v2 options import: verify module availability at runtime
The import now requires protoc_gen_openapiv2.options.annotations_pb2. Confirm this module is vendored in the repo and included in packaging, or otherwise available on PYTHONPATH at runtime.
Run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Looking for protoc_gen_openapiv2 annotations module..."
fd -HI 'protoc_gen_openapiv2' || true
fd -HI -t f 'annotations_pb2.py' | rg -n 'protoc_gen_openapiv2/options/annotations_pb2.py' || true
Length of output: 264
Add and Package the protoc_gen_openapiv2 Annotations Module
At present, protoc_gen_openapiv2/options/annotations_pb2.py is not vendored into our repo (no files under a protoc_gen_openapiv2 directory were found), so the import in src/scanoss/api/components/v2/scanoss_components_pb2.py will fail at runtime. You must:
- Verify that the third-party package providing protoc_gen_openapiv2.options.annotations_pb2 is declared as a dependency (e.g., in setup.py, pyproject.toml, or requirements.txt).
- If you intend to vendor the proto-generated code instead of relying on an external package, add the protoc_gen_openapiv2/options/annotations_pb2.py file (and its parent package) under version control and include it in your package manifest.
- Ensure that whichever approach you choose, the module is available on PYTHONPATH when consumers install or vendor this library.
Failure to address this will lead to ModuleNotFoundError at runtime when importing the Swagger→OpenAPI v2 annotations.
🤖 Prompt for AI Agents
In src/scanoss/api/components/v2/scanoss_components_pb2.py at line 27 the import
from protoc_gen_openapiv2.options.annotations_pb2 will fail because that package
is not vendored or declared as a dependency; fix by either (A) declaring the
provider package in your project packaging (add it to requirements.txt,
pyproject.toml[tool.poetry.dependencies] or setup.py install_requires so it is
installed into PYTHONPATH), or (B) vendor the generated module: add a
protoc_gen_openapiv2 package directory into the repo containing
options/annotations_pb2.py plus __init__.py files, update
MANIFEST.in/package_data or pyproject include settings so it is packaged, and
adjust any relative imports if necessary so import works at runtime.
GRPC_GENERATED_VERSION = '1.73.1'
GRPC_VERSION = grpc.__version__
_version_not_supported = False

try:
    from grpc._utilities import first_version_is_lower
    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
except ImportError:
    _version_not_supported = True

if _version_not_supported:
    raise RuntimeError(
        f'The grpc package installed is at version {GRPC_VERSION},'
        + f' but the generated code in scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py depends on'
        + f' grpcio>={GRPC_GENERATED_VERSION}.'
        + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
        + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
    )
💡 Verification agent
🧩 Analysis chain
Add an import smoke test to CI to catch version-mismatch early
Given multiple import-time guards, add a quick test that imports all *_pb2.py and *_pb2_grpc.py modules under src/ to fail fast when pins drift.
Example script:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Smoke-importing generated protobuf/grpc modules..."
fd -HI -t f '_pb2(_grpc)?\.py$' src | sed 's#^src/##; s#\.py$##; s#/#.#g' | while read -r mod; do
python - <<PY
import importlib, sys
mod = "${mod}"
try:
    importlib.import_module(mod)
    print(f"OK: {mod}")
except Exception as e:
    print(f"FAIL: {mod}: {e}", file=sys.stderr)
    raise
PY
done
Length of output: 1453
Ensure CI smoke-test imports all generated protobuf/grpc modules and installs required plugins
The import smoke test uncovered a missing dependency: protoc_gen_swagger.options.annotations_pb2 fails to import, indicating that the CI environment must install the Swagger plugin that your generated code depends on. To catch version mismatches and missing plugins early:
- Update your CI setup to install the protoc-gen-swagger plugin (so that import protoc_gen_swagger succeeds) before running the smoke-test.
- Add a dedicated CI job that recursively imports every *_pb2.py and *_pb2_grpc.py under src/ and fails if any import errors occur.
Example CI script (bash):
#!/bin/bash
set -euo pipefail
echo "Installing protobuf & gRPC codegen dependencies…"
# Ensure grpcio-tools for Python stubs and the Swagger plugin are available
pip install grpcio-tools protoc-gen-swagger
echo "Running smoke-import on generated protobuf/grpc modules..."
fd -HI -t f '_pb2(_grpc)?\.py$' src \
| sed 's#^src/##; s#\.py$##; s#/#.#g' \
| while read -r mod; do
  python - <<PY
import importlib, sys
mod = "${mod}"
try:
    importlib.import_module(mod)
    print(f"OK: {mod}")
except Exception as e:
    sys.stderr.write(f"FAIL: {mod}: {e}\n")
    sys.exit(1)
PY
done
• CI pipeline: install grpcio-tools and protoc-gen-swagger (or your project's codegen plugins)
• New CI job: smoke-import all *_pb2.py / *_pb2_grpc.py modules under src/ and fail on any import error
🤖 Prompt for AI Agents
In src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py around lines
9-26 the CI import smoke-test revealed a missing dependency
(protoc_gen_swagger.options.annotations_pb2) causing imports to fail; update CI
to install the required codegen plugins (at least grpcio-tools and
protoc-gen-swagger or your project’s equivalent) before running tests and add a
dedicated smoke-import job that recursively finds all _pb2.py and _pb2_grpc.py
under src/ and attempts to import each module, failing the job on any import
error so missing plugins or version mismatches are detected early.
GRPC_GENERATED_VERSION = '1.73.1'
GRPC_VERSION = grpc.__version__
_version_not_supported = False

try:
    from grpc._utilities import first_version_is_lower
    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
except ImportError:
    _version_not_supported = True

if _version_not_supported:
    raise RuntimeError(
        f'The grpc package installed is at version {GRPC_VERSION},'
        + f' but the generated code in scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py depends on'
        + f' grpcio>={GRPC_GENERATED_VERSION}.'
        + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
        + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
    )
💡 Verification agent
🧩 Analysis chain
Repo-wide consistency checks for generated grpc modules
Consider verifying that all *_pb2_grpc.py share the same GRPC_GENERATED_VERSION and that all imports succeed in CI.
Run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking GRPC_GENERATED_VERSION consistency..."
rg -n --glob 'src/**/*_pb2_grpc.py' 'GRPC_GENERATED_VERSION\s*=\s*' | sort -u
echo
echo "Enumerating generated grpc modules for import smoke tests (pair with earlier script)..."
fd -HI -t f '_pb2_grpc\.py$' src
Length of output: 350
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
echo "Locating all *_pb2_grpc.py files:"
find . -type f -name '*_pb2_grpc.py' || true
echo
printf "Checking GRPC_GENERATED_VERSION in each file:\n"
for f in $(find . -type f -name '*_pb2_grpc.py'); do
  version_line=$(grep -En "^GRPC_GENERATED_VERSION\s*=\s*'.+'" "$f" || true)
  if [ -n "$version_line" ]; then
    echo "$f: $(echo "$version_line" | sed -E "s/^[0-9]+:/Line &/")"
  else
    echo "$f: No GRPC_GENERATED_VERSION found"
  fi
done
Length of output: 7557
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
echo "GRPC_GENERATED_VERSION occurrences across *_pb2_grpc.py files:"
grep -Rnh "^GRPC_GENERATED_VERSION" src | sort
Length of output: 564
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
echo "Listing GRPC_GENERATED_VERSION occurrences with file names:"
grep -R -n "^GRPC_GENERATED_VERSION" src | sort
Length of output: 1161
Ensure consistent GRPC_GENERATED_VERSION across all generated gRPC modules
Our audit found two different GRPC_GENERATED_VERSION values in your *_pb2_grpc.py files:
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py:7 – currently 1.73.0
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py:7 – currently 1.73.0
- All other src/scanoss/api/**/v2/*_pb2_grpc.py files use 1.73.1
To prevent version mismatches at runtime, please align these values. For example, update line 7 of src/protoc_gen_swagger/options/annotations_pb2_grpc.py and src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py:
- GRPC_GENERATED_VERSION = '1.73.0'
+ GRPC_GENERATED_VERSION = '1.73.1'
Additionally, consider adding a CI smoke test that dynamically imports every *_pb2_grpc.py under src/ to catch import errors early.
🤖 Prompt for AI Agents
In src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py around lines
9-26 the GRPC_GENERATED_VERSION is '1.73.1' while other generated modules use
'1.73.0'; update the GRPC_GENERATED_VERSION in
src/protoc_gen_swagger/options/annotations_pb2_grpc.py (line ~7) and
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (line ~7) to '1.73.1' so
all *_pb2_grpc.py files match, then run a quick import of the changed modules to
verify no runtime import errors and add a CI smoke test that dynamically imports
every *_pb2_grpc.py under src/ to prevent future mismatches.
@@ -1,55 +1,66 @@
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# NO CHECKED-IN PROTOBUF GENCODE |
💡 Verification agent
🧩 Analysis chain
Clarify policy on checking in generated protobuf code.
This file is clearly generated, yet it contains the marker “NO CHECKED-IN PROTOBUF GENCODE”. If the project intentionally vendors pb2 code (common for CLI/libs), please confirm and document it to avoid confusion for future contributors and automated linters.
To keep the repo consistent when regenerating, you can verify all checked-in pb2 files were produced by the same generator version:
🏁 Script executed:
#!/bin/bash
# Verify all checked-in pb2 files were generated by Protobuf Python 6.31.0
rg -n --glob 'src/**/*.py' -P '^\s*# Protobuf Python Version:\s*6\.31\.0' -C0
echo
echo "Files without the expected generator version:"
fd --extension py src | rg -nP --invert-match 'scanoss_dependencies_pb2\.py$' | xargs -I{} sh -c "grep -q 'Protobuf Python Version: 6.31.0' '{}' || echo '{}'"
Length of output: 1281
Clarify and document the project's protobuf-vendor policy
All of the checked-in *_pb2.py files, including scanoss_dependencies_pb2.py, carry the version header # Protobuf Python Version: 6.31.0, so it's clear they're intentionally committed, not omitted. The existing "NO CHECKED-IN PROTOBUF GENCODE" comment is now misleading and should be replaced with a short note that these files are generated by Protobuf 6.31.0 and are tracked in source.
Please update accordingly:
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py, line 3: remove or replace
  - # NO CHECKED-IN PROTOBUF GENCODE
  + # Generated by Protobuf Python 6.31.0 – checked into source per project policy
- Apply the same header change to every *_pb2.py under src/.
- Add a "Protobuf code generation" section to README or CONTRIBUTING:
  - State that .proto → _pb2.py files are vendor-tracked
  - Specify the required generator version (6.31.0)
  - Provide the regeneration command, e.g.:
    protoc \
      --python_out=src/ \
      --proto_path=proto/ \
      proto/**/*.proto
This will prevent confusion for future contributors and linters.
🤖 Prompt for AI Agents
In src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py around line 3 the
comment "# NO CHECKED-IN PROTOBUF GENCODE" is misleading because these files are
vendor-tracked; replace that line with a short header stating "Generated by
Protobuf Python Version: 6.31.0 and committed to source control" (or similar)
and ensure the existing version header remains; then apply the same replacement
to every *_pb2.py under src/; finally add a "Protobuf code generation" section
to the project README or CONTRIBUTING that states .proto → _pb2.py files are
vendor-tracked, specifies the required generator version 6.31.0, and includes
the regeneration command using protoc with --python_out=src/ and
--proto_path=proto/ for proto/**/*.proto.
# Protobuf Python Version: 6.31.0
"""Generated protocol buffer code."""
from google.protobuf.internal import builder as _builder
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    6,
    31,
    0,
    '',
    'scanoss/api/dependencies/v2/scanoss-dependencies.proto'
)
💡 Verification agent
🧩 Analysis chain
Protobuf runtime guard added — make sure packaging pins a compatible version to prevent import-time crashes.
ValidateProtobufRuntimeVersion(PUBLIC, 6, 31, 0, ...) will raise at import time if protobuf < 6.31.0 is installed. Please ensure your packaging explicitly requires a compatible range (e.g., protobuf >= 6.31.0, < 7) to avoid breaking downstream users.
Run this to locate and review where protobuf is pinned:
Optionally add/adjust (outside this file) in your packaging config:
# pyproject.toml (example)
[project]
dependencies = [
    "protobuf>=6.31.0,<7",
    "googleapis-common-protos>=1.63.0",  # ensure annotations_pb2 is present
    # add the package providing protoc_gen_openapiv2.options (see next comment)
]
🏁 Script executed:
#!/bin/bash
# Locate dependency declarations for protobuf and related deps
fd -a 'pyproject.toml' 'setup.*' 'requirements*.txt' 'constraints*.txt' || true
echo
rg -n -C2 -iP '\bprotobuf\b|googleapis-common-protos|protoc[-_]gen[-_]openapiv2' || true
Length of output: 107622
Pin Protobuf Runtime to ≥6.31.0,<7 in Packaging
The new ValidateProtobufRuntimeVersion(PUBLIC, 6, 31, 0, …) guard will cause import-time failures if users install any protobuf version below 6.31.0. Our current packaging only specifies protobuf>3.19.1, which still allows 4.x, 5.x, and early 6.x releases. Please update your dependency declarations as follows:
• requirements.txt (line 6):
- protobuf>3.19.1
+ protobuf>=6.31.0,<7
• setup.cfg (line 33):
- protobuf>3.19.1
+ protobuf>=6.31.0,<7
Additionally, since the generated code imports google.api.annotations_pb2, ensure you have:
# requirements.txt or setup.cfg
googleapis-common-protos>=1.63.0
And verify that the package providing protoc_gen_openapiv2.options (e.g. protoc-gen-openapiv2) is correctly listed. These changes will prevent downstream import-time errors and ensure the runtime guard functions as intended.
🤖 Prompt for AI Agents
In src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py around lines 5 to
19, the generated runtime guard requires protobuf >=6.31.0 which our packaging
currently does not guarantee; update requirements.txt and setup.cfg to pin
protobuf to ">=6.31.0,<7" (replace the existing protobuf entry), add
"googleapis-common-protos>=1.63.0" to both to satisfy google.api.annotations_pb2
import, and ensure the package that provides protoc_gen_openapiv2.options (e.g.
protoc-gen-openapiv2) is declared in dependencies so downstream installs include
it.
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    6,
    31,
    0,
    '',
    'scanoss/api/vulnerabilities/v2/scanoss-vulnerabilities.proto'
)
💡 Verification agent
🧩 Analysis chain
Import-time Protobuf runtime guard requires protobuf>=6.31.0
Same import-time enforcement here. Please ensure pins across all envs to prevent RuntimeError on import.
Run:
I can open a follow-up PR to align dependency pins if you prefer.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -S '\bprotobuf\b' -g 'requirements*.txt' -g 'constraints*.txt' -g 'pyproject.toml' -g 'setup.cfg' -g 'Pipfile' -g 'poetry.lock' || true
Length of output: 261
Update Protobuf Minimum Version Constraint
The import-time guard in scanoss_vulnerabilities_pb2.py enforces protobuf ≥ 6.31.0, but our pins only require > 3.19.1. This mismatch will trigger a RuntimeError on import whenever protobuf is below 6.31.0. Please update all dependency files to pin protobuf to at least 6.31.0:
– In setup.cfg (around line 33), change protobuf>3.19.1 to protobuf>=6.31.0
– In requirements.txt (line 6), change protobuf>3.19.1 to protobuf>=6.31.0
– In tests/data/requirements.txt (line 6), change protobuf>3.19.1 to protobuf>=6.31.0
If you’d like, I can open a follow-up PR to align these pins across all environments.
🤖 Prompt for AI Agents
In src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py around
lines 9 to 19 the generated import-time guard requires protobuf >= 6.31.0 but
project dependency pins currently allow >3.19.1; update dependency pins to match
by editing setup.cfg (around line 33) change protobuf>3.19.1 to
protobuf>=6.31.0, update requirements.txt (line 6) change protobuf>3.19.1 to
protobuf>=6.31.0, and update tests/data/requirements.txt (line 6) change
protobuf>3.19.1 to protobuf>=6.31.0 so runtime guard and package pins are
consistent.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary
Actionable comments posted: 0
🧹 Nitpick comments (5)
src/scanoss/file_filters.py (5)
553-571
: Use set() instead of {} for an empty extension collection in HFH mode. {} creates an empty dict, not a set. Iterating an empty dict works but is misleading, risks future misuse, and may trigger static-analysis complaints. Prefer set() for clarity and type correctness.
Apply this diff:
- DEFAULT_SKIPPED_EXT_LIST = {} if self.is_folder_hashing_scan else DEFAULT_SKIPPED_EXT
+ DEFAULT_SKIPPED_EXT_LIST = set() if self.is_folder_hashing_scan else DEFAULT_SKIPPED_EXT
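A quick interpreter check of the standard Python behaviour behind this suggestion:
# Empty braces are dict-display syntax, not a set literal; only set() gives an empty set.
assert type({}) is dict
assert type(set()) is set
assert type({'a'}) is set  # non-empty braces with values do create a set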
525-536
: Align directory-skip logic with HFH mode (use HFH-specific dir sets consistently). should_skip_dir() always checks DEFAULT_SKIPPED_DIRS/DEFAULT_SKIPPED_DIR_EXT, ignoring HFH variants. While the current HFH/non-HFH dir sets appear identical here, this asymmetry is brittle and can cause subtle bugs if the sets diverge later. Mirror the pattern used in _get_operation_patterns().
Proposed refactor:
 if self.all_folders:
     return False
 dir_name_lower = dir_name.lower()
-if dir_name_lower in DEFAULT_SKIPPED_DIRS:
+DEFAULT_SKIPPED_DIR_LIST = DEFAULT_SKIPPED_DIRS_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIRS
+if dir_name_lower in DEFAULT_SKIPPED_DIR_LIST:
     self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip directory)')
     return True
 if self.skip_folders and dir_name in self.skip_folders:
     self.print_debug(f'Skipping directory: {dir_rel_path} (matches skip folder)')
     return True
-for ext in DEFAULT_SKIPPED_DIR_EXT:
+DEFAULT_SKIPPED_DIR_EXT_LIST = (
+    DEFAULT_SKIPPED_DIR_EXT_HFH if self.is_folder_hashing_scan else DEFAULT_SKIPPED_DIR_EXT
+)
+for ext in DEFAULT_SKIPPED_DIR_EXT_LIST:
     if dir_name_lower.endswith(ext):
         self.print_debug(f'Skipping directory: {dir_rel_path} (matches default skip extension: {ext})')
         return True
367-389
: Normalize types passed to _should_skip_file and the pathspec matcher. rel_path can be a Path (when scan_root is provided) or a str (otherwise). Pass a string consistently to avoid surprises with libraries that expect str, and keep behavior uniform across code paths.
Apply this diff:
- if self._should_skip_file(rel_path):
+ if self._should_skip_file(str(rel_path)):
      continue
And:
- if self.file_folder_pat_spec and self.file_folder_pat_spec.match_file(file_rel_path):
+ if self.file_folder_pat_spec and self.file_folder_pat_spec.match_file(str(file_rel_path)):
      self.print_debug(f'Skipping file: {file_rel_path} (matches custom pattern)')
      return True
Also applies to: 579-581
372-376
: Fix misleading debug message for relative_to() failure. ValueError from Path.relative_to(scan_root) doesn't imply a broken symlink; it typically means the file isn't under scan_root. Adjust the message to avoid confusion during troubleshooting.
-except ValueError:
-    self.print_debug(f'Ignoring file: {file_path} (broken symlink)')
+except ValueError:
+    self.print_debug(f'Ignoring file not under scan root: {file_path}')
     continue
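For context, the standard pathlib behaviour that motivates the message change:
# Path.relative_to raises ValueError whenever the target is not under the base,
# regardless of whether any symlink is involved.
from pathlib import Path

try:
    Path('/opt/project/file.py').relative_to('/home/user/scan-root')
except ValueError as err:
    print(f'Not under scan root: {err}')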
553-571
: HFH behavior change: no default extension-based skips; confirm intent and guard performance. By setting DEFAULT_SKIPPED_EXT_LIST to empty for HFH, binary and non-source files (e.g., .pdf, .png, .docx, .min.js) will be included. This is likely intentional for folder hashing uniformity, but it may materially impact performance on large repos.
Suggested follow-ups (pick any):
- If needed, cap very large binaries in HFH via skip_size or size rules for the 'fingerprinting' operation type, documented for users.
- Emit a one-time debug/trace message in HFH mode clarifying that extension-based skips are disabled to avoid confusion.
- Consider a CLI switch to re-enable legacy HFH extension skips for edge cases.
Would you like me to draft a small benchmark harness (pure Python, no repo exec) to estimate the delta on representative folder trees?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/scanoss/file_filters.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores