We regularly observe the effects of changes to red-knot through the diagnostics emitted when running our (performance) benchmark against tomllib, and we sometimes catch issues by noticing new false positives. But (a) tomllib is one small codebase and not necessarily representative, and (b) the ergonomics of catching these issues via the performance benchmark are poor.
We should select some larger real-world codebases and implement tooling to run red-knot on them and snapshot the diagnostics emitted, such that we can see (and easily update) this snapshot when we make changes to red-knot.
We should run this in CI to ensure our snapshot stays up to date.
(This is very similar to the ruff ecosystem check, and we can likely reuse some of that infrastructure?)
Bonus: if we implement #15696, we can also track changes to the prevalence of Todo types over time.
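To make the idea concrete, here is a minimal sketch of what such a snapshot runner could look like. The project list, the `red_knot` binary name and its CLI arguments, and the snapshot directory are all assumptions for illustration, not settled decisions:

```python
"""Sketch of a diagnostics-snapshot runner (assumptions, not settled tooling)."""
import subprocess
from pathlib import Path

# Hypothetical project pins; a real selection would likely come from
# mypy_primer's project list.
PROJECTS = {
    "packaging": "https://github.com/pypa/packaging",
    "rich": "https://github.com/Textualize/rich",
}

# Assumed location for checked-in snapshots.
SNAPSHOT_DIR = Path("ecosystem_snapshots")


def run_red_knot(checkout: Path) -> str:
    """Run red-knot on a checkout and return its diagnostics output.

    The command line here is a placeholder; the real invocation would use
    whatever concise output format we settle on.
    """
    result = subprocess.run(
        ["red_knot", "check", str(checkout)],
        capture_output=True,
        text=True,
    )
    return result.stdout


def snapshot(name: str, checkout: Path) -> None:
    """Write the diagnostics for one project to a per-project snapshot file."""
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    (SNAPSHOT_DIR / f"{name}.txt").write_text(run_red_knot(checkout))
```

CI could then simply re-run this script over the pinned checkouts and fail if the working tree is dirty, forcing the snapshot to be updated alongside any change that affects it.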
Just noting down some thoughts while working on this:
It would help to have a concise (or structured) diagnostics format. mypy_primer uses a simple line-based diff (without context), and it turns out that diffs between two rich red-knot diagnostic outputs are … not great.
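For comparison, something like one stable line per diagnostic (path, line, rule, message) diffs much better than the rich output. A rough sketch of that idea, assuming a hypothetical `Diagnostic` record rather than any existing red-knot output mode:

```python
"""Sketch of a concise, diff-friendly diagnostics format (assumed, not existing)."""
from dataclasses import dataclass
import difflib


@dataclass(frozen=True)
class Diagnostic:
    path: str
    line: int
    rule: str
    message: str

    def as_line(self) -> str:
        # One line per diagnostic, no context or source snippets, so that
        # textual diffs between two runs stay small and readable.
        return f"{self.path}:{self.line}: {self.rule}: {self.message}"


def diff_runs(old: list[Diagnostic], new: list[Diagnostic]) -> str:
    """Line-based diff between two runs, similar in spirit to mypy_primer."""
    old_lines = sorted(d.as_line() for d in old)
    new_lines = sorted(d.as_line() for d in new)
    return "\n".join(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
```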
While selecting some projects from mypy_primer, I noted down some issues that appear very frequently across multiple codebases. If we want to work on reducing false-positive diagnostics, these would likely be the candidates with the biggest impact:
- `lint:unresolved-import` and subsequent `lint:unresolved-attribute` due to `*`-imports from e.g. collections.abc or asyncio => [red-knot] support * imports #14169
- `lint:unused-ignore-comment` => Some of these could be a signal for false negatives? Others could just be an indication that we need a feature to silence unused generic `# type: ignore` comments.