torchpipe

Accelerated PyTorch Serving through Multithreading

Torchpipe is a multi-instance pipeline parallel library that acts as a bridge between lower-level acceleration libraries (such as TensorRT, OpenCV, TorchScript) and RPC frameworks (like Thrift, gRPC), ensuring a strict decoupling from them. It offers a thread-safe function interface for the PyTorch frontend at a higher level, while empowering users with fine-grained backend extension capabilities at a lower level.

Notes

Use the latest tag and corresponding release.
The main branch is used for releasing version updates, while the develop branch is used for code submission and daily development.

Quick Start

1. Installation

See Installation.

2. Get an appropriate model file (currently supports ONNX, TensorRT engine, etc.).

import os
import tempfile

import torch
import torchvision.models as models

# Write the exported model to the system temp directory
model_path = os.path.join(tempfile.gettempdir(), "resnet18.onnx")
resnet18 = models.resnet18(pretrained=True).eval().cuda()
data_bchw = torch.rand((1, 3, 224, 224)).cuda()
print("export: ", model_path)
torch.onnx.export(resnet18, data_bchw, model_path,
                  opset_version=17,
                  do_constant_folding=True,
                  input_names=["in"], output_names=["out"],
                  dynamic_axes={"in": {0: "x"}, "out": {0: "x"}})  # dynamic batch dimension

# os.system(f"onnxsim {model_path} {model_path}")

3. Now you can perform concurrent calls to a single model.

import torch, torchpipe
model = torchpipe.pipe({'model': model_path,
                        'backend': "Sequential[cvtColorTensor,TensorrtTensor,SyncTensor]", # Backend engine; see the backend API reference documentation
                        'instance_num': 2, 'batching_timeout': '5', # Number of instances and batching timeout
                        'max': 4, # Maximum value of the model optimization range; can also be written as '4x3x224x224'
                        'mean': '123.675, 116.28, 103.53', # 255 * (0.485, 0.456, 0.406)
                        'std': '58.395, 57.120, 57.375', # Fused into the TensorRT network
                        'color': 'rgb'}) # Parameter for the cvtColorTensor backend: target color space order
data = torch.zeros((1, 3, 224, 224)) # or torch.from_numpy(...)
input = {"data": data, 'color': 'bgr'}
model(input)  # Can be called in parallel from multiple threads
# The output is stored under the "result" key; custom keys can also be written
print(input["result"].shape)  # If the call fails, the "result" key will not exist, even if it was present in the input
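The call convention above (a dict goes in, the output is written back under "result", and a failed call leaves no "result" key) can be exercised from several threads. The following is only a minimal sketch: `StandInModel` is a hypothetical pure-Python placeholder, not part of torchpipe, used so the pattern runs without a GPU. In real use the object returned by `torchpipe.pipe(...)` would take its place.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInModel:
    """Hypothetical stand-in mimicking the dict-in / dict-out convention."""

    def __call__(self, request):
        # Drop any stale "result" so a failed call leaves no such key behind.
        request.pop("result", None)
        data = request.get("data")
        if data is None:
            return  # simulated failure: no "result" key is written
        request["result"] = [2 * x for x in data]


def run_one(model, data):
    """Send one request and return its result, or None if the call failed."""
    request = {"data": data}
    model(request)
    # A missing "result" key signals failure.
    return request.get("result")


model = StandInModel()
payloads = [[1, 2], [3, 4], None, [5]]

# One shared model object is called concurrently from four worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda d: run_one(model, d), payloads))

print(results)
```

Checking for the "result" key with `dict.get` rather than indexing directly avoids a `KeyError` when a request fails.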

A C++ API is also available through [libtorch+cmake] or [pybind11].
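The 'mean' and 'std' strings in the step 3 configuration are the usual per-channel statistics scaled from the [0, 1] range to the [0, 255] range. As a small worked example of that arithmetic (the helper name `scale_to_255` is made up for illustration and is not a torchpipe function):

```python
def scale_to_255(values):
    """Scale per-channel statistics from [0, 1] to [0, 255] and join them as a string."""
    return ", ".join(f"{v * 255:.3f}" for v in values)


# Per-channel values quoted in the configuration comments (RGB order).
mean_01 = (0.485, 0.456, 0.406)
std_01 = (0.229, 0.224, 0.225)

mean_str = scale_to_255(mean_01)
std_str = scale_to_255(std_01)

print("mean:", mean_str)
print("std:", std_str)
```

The resulting strings can then be passed as the 'mean' and 'std' values of the configuration dict.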

4. Our core functionality is a series of pipeline facilities

For more information, please visit the Torchpipe documentation.

5. Roadmap

Torchpipe is currently in a rapid iteration phase, and we greatly appreciate your help. We prioritize content over contribution format. Feel free to provide feedback through issues or merge requests. Check out our Contribution Guidelines.

Our ultimate goal is to make high-throughput deployment on the server side as simple as possible. To achieve this, we actively iterate and are willing to collaborate with other projects with similar goals.

Roadmap for 2023 and 2024:

Technical reports
Examples of large models
Optimization of the compilation system, divided into modules such as core, pplcv, model/tensorrt, opencv, etc.
Optimization of the basic structure, including Python and C++ interaction, exception handling, logging system, compilation system, and cross-process backend optimization.

Potential research directions that have not been completed:

Single-node and multi-node scheduling backends, which are essentially no different from computing backends, need to be further decoupled and exposed to users. We want to optimize this as part of the user API.
Debugging tools for multi-node scheduling. Since a stack-simulation design is used in multi-node scheduling, it is relatively easy to design node-level debugging tools.
Load balancing.

6. Acknowledgements

Our codebase builds on multiple open-source contributions. Please see ACKNOWLEDGEMENTS for more details.
