zewenli98
Collaborator

Description

TensorRT 10.14 will add a flag, trt.SerializationFlag.INCLUDE_REFIT, that allows refitted engines to remain refittable. Building on this capability, this PR enhances the existing engine caching and refitting features as follows:

  1. To save disk space, engine caching will save only weight-stripped engines on disk, regardless of compilation_settings.strip_engine_weights. When a cached engine is loaded, it is automatically refitted and remains refittable.
  2. Compiled TRT modules can be refitted multiple times with refit_module_weights(), e.g.:
for _ in range(3):
    trt_gm = refit_module_weights(trt_gm, exp_program)
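
The two behaviors above can be illustrated with a toy model. This is only a sketch of the idea, not the Torch-TensorRT or TensorRT implementation: `ToyEngine`, `ToyEngineCache`, and `refit` are hypothetical stand-ins invented for illustration, and the "engine" is a plain Python object rather than a serialized TRT engine.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyEngine:
    """Hypothetical stand-in for a serialized engine (not a TensorRT type)."""

    structure: str                       # stands in for the compiled graph
    weights: Optional[Dict[str, float]]  # None means the engine is weight-stripped
    refittable: bool = True


def refit(engine: ToyEngine, weights: Dict[str, float]) -> ToyEngine:
    """Return a copy of `engine` carrying `weights`; the copy stays refittable."""
    if not engine.refittable:
        raise RuntimeError("engine is not refittable")
    return ToyEngine(engine.structure, dict(weights), refittable=True)


class ToyEngineCache:
    """Hypothetical cache that always stores weight-stripped engines."""

    def __init__(self) -> None:
        self._store: Dict[str, ToyEngine] = {}

    def save(self, key: str, engine: ToyEngine) -> None:
        # Drop the weights unconditionally, whatever the caller's settings were.
        self._store[key] = ToyEngine(engine.structure, None, refittable=True)

    def load(self, key: str, weights: Dict[str, float]) -> Optional[ToyEngine]:
        cached = self._store.get(key)
        if cached is None:
            return None  # cache miss
        # Refit on the way out so the caller gets a usable engine.
        return refit(cached, weights)


cache = ToyEngineCache()
cache.save("model-a", ToyEngine("conv-relu", {"w": 1.0}))

engine = cache.load("model-a", {"w": 1.0})
for new_w in (2.0, 3.0, 4.0):
    # Repeated refits keep working because each result is still refittable.
    engine = refit(engine, {"w": new_w})
```

In this sketch the stored entry never holds weights, and the loaded engine can be refitted any number of times, which mirrors points 1 and 2 at a conceptual level only.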

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@zewenli98 zewenli98 self-assigned this Aug 20, 2025
@meta-cla meta-cla bot added the cla signed label Aug 20, 2025
@zewenli98 zewenli98 marked this pull request as draft August 20, 2025 20:04
@github-actions github-actions bot added component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: torch_compile labels Aug 20, 2025
@github-actions github-actions bot requested a review from narendasan August 20, 2025 20:05
@zewenli98
Collaborator Author

TODO:

  1. Consider whether to turn on engine caching by default.
  2. Consider which arguments should be put into _SETTINGS_TO_BE_ENGINE_INVARIANT.
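
For context on the second item, one way an engine-invariant setting list could feed into a cache key is sketched below. This is an assumption about the general technique, not the project's code: `ENGINE_INVARIANT_SETTINGS`, `cache_key`, and the setting name `some_build_option` are hypothetical. `strip_engine_weights` is used as the example member only because the description states that cached engines are stored weight-stripped regardless of it.

```python
import hashlib
import json
from typing import Any, Dict, FrozenSet

# Hypothetical membership: settings assumed not to change the cached engine.
ENGINE_INVARIANT_SETTINGS: FrozenSet[str] = frozenset({"strip_engine_weights"})


def cache_key(graph_repr: str, settings: Dict[str, Any]) -> str:
    """Hash the graph plus only those settings that can affect the built engine."""
    relevant = {
        name: value
        for name, value in settings.items()
        if name not in ENGINE_INVARIANT_SETTINGS
    }
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps({"graph": graph_repr, "settings": relevant}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_stripped = cache_key("g", {"strip_engine_weights": True, "some_build_option": 3})
key_full = cache_key("g", {"strip_engine_weights": False, "some_build_option": 3})
key_other = cache_key("g", {"strip_engine_weights": True, "some_build_option": 5})
```

Under these assumptions, `key_stripped` and `key_full` are identical, so both configurations share one cache entry, while `key_other` differs because a non-invariant setting changed.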
