Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][Frontend] Add a OneFlow Frontend #24

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
134 changes: 134 additions & 0 deletions rfcs/0024-add-oneflow-frontend.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
- Feature Name: (`add oneflow frontend`)
- Start Date: (2021-8-20)
- RFC PR: [apache/tvm-rfcs#0024](https://github.com/apache/tvm-rfcs/pull/24)
- GitHub Issue: [apache/tvm#8804](https://github.com/apache/tvm/issues/8804)

# Summary
[summary]: #summary

To enhance the compatibility of TVM with deep learning frameworks,
we have created a frontend for TVM that targets [oneflow](https://github.com/Oneflow-Inc/oneflow)

# Motivation
[motivation]: #motivation

OneFlow, an open source deep learning framework with whole new frame design and the world's leading technology for distributed system. Here are advantages of OneFlow:

- Perfectly support container platforms(k8s & docker)
- Handle large models easily
- Almost zero runtime overhead & linear speedup
- Support automatic mixed precision
- ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a placeholder? We should fill out the rest of the motivations or remove the ellipses.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,I will delete it。

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this isn't an issue that is covered in the scope of this RFC, but I'm wondering how independent the importers are from one another, and if it's worthwhile to look at a more general framework for third-party importers.

Are we aiming to support all operators? What does the mapping from OneFlow operators to TVM look like? Will there be gaps that a user will have to be concerned with? What is the plain for maintenance of the new format to import?

Yes, we will support all commonly used operators. OneFlow will release version 1.0 soon, and our operator interface is aligned with the Pytorch API.


We are proud that OneFlow can support basic CNNs as well as very large pre-trained models, such as [GPT3, BERT, etc](https://github.com/Oneflow-Inc/OneFlow-Benchmark/tree/master/LanguageModeling). They are built using an early version of OneFlow, based on lazy mode.

Currently, oneflow(nightly) supports the conversion of eager models to lazy graphs. We can quickly convert the eager model built by OneFlow to a lazy graph with the following code.

```python
import oneflow as flow


class Graph(flow.nn.Graph):
def __init__(self, module):
super().__init__()
self.m = module

def build(self, x):
out = self.m(x)
return out


# module: eager model built by OneFlow, align with Pytorch's python api.
graph = Graph(module)
```

Because of these features we wrote this `from_oneflow` which is based on lazy mode.

We also note that many developers have converted their models to ONNX format for compatibility with TVM, and this part of the work is ongoing, as you can see [here](https://github.com/Oneflow-Inc/oneflow_convert_tools).

Based on this background, we proposed this RFC to add a OneFlow frontend for TVM, improving usability for OneFlow users and enhancing the compatibility between OneFlow and TVM.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

We use a simple API to help users convert oneflow to tvm relay.

```python
relay.frontend.from_oneflow(graph, model_dir_path, freeze_params=True, user_input=None)
```

- graph: flow.nn.Graph, contains information about the nodes and edges of the model
- model_dir_path: str, path of parameters
- freeze_params: bool, if this parameter is False, then the user can specify the input of the graph
- user_input: dict, information about the specified input of the model

> NOTES:
> We prefer to let the user change the model node information by changing the model itself

The following codes will show how to convert a oneflow model to a tvm relay

```python
import tvm
import tvm.relay as relay


# load eager model(assuming that resnet50 has been built)
res50_module = resnet50()
pretrain_models = flow.load(model_path)
res50_module.load_state_dict(pretrain_models)
Comment on lines +76 to +78
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does OneFlow support any input model from a file? If so, I'd like also to suggest that you also cover tvmc on your plans/implementation at some point.

Example implementation - https://github.com/apache/tvm/blob/e691c7f83892d7242d0992c78cec1e2f8953a9e3/python/tvm/driver/tvmc/frontends.py#L174-L198

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions! We have added this task to Future possibilities.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions! We can't support this feature yet, but it is in our future development plan. After OneFlow supports this feature, we will add it to tvmc .

res50_module.eval().to("cuda")

# load test image
image = load_image(image_path)
image_flow = flow.Tensor(image, device=flow.device("cuda"))

# use Graph convert eager to lazy
res50_graph = Graph(res50_module)
_ = res50_graph._compile(image_flow)

# get tvm relay
mod, params = relay.frontend.from_oneflow(res50_graph, model_path)
```

If user writes a model that contains an op that is not yet supported, the program will report an error.

More demos could be seen at [this](https://github.com/Oneflow-Inc/oneflow_convert_tools/tree/tvm_oneflow/oneflow_tvm).

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Since the purpose of this RFC is only to add a new front-end for converting the OneFlow model to TVM Relay IR, other functions will not be affected.

In this proposed RFC, the whole process of OneFlow frontend conversion can be divided into 2 steps:

1. Parse Graph: Read the node information contained in the graph, extract the size, data type of each node and construct the computational graph by each computational node (in oneflow, these nodes are marked as user_conf). At the same time, we get the data information of the parameter nodes (in oneflow, these nodes are labeled as variable_conf) according to the paths where the parameters are provided by the user.
2. Convert operators: After the exported inference graph is loaded, we will extract its parameters and convert operators one by one. The order of all operator conversions will be based on the flow of the graph.

# Drawbacks
[drawbacks]: #drawbacks

Potential increase in time-cost of unit tests.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The frontend of OneFlow is similar to ONNX. For now, we consider only supporting conversion via `flow.nn.Graph`, as this API will be rolled out and recommended starting with OneFlow 0.5.0. In the meantime, we will develop the latest OneFlow eager model to ONNX format conversion script based on the existing OneFlow lazy model conversion to ONNX format to make it easy to work with TVM.

# Prior art
[prior-art]: #prior-art

It's the first time we have added a OneFlow frontend to an ML compiler.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

We will add new unit test cases that rely on OneFlow framework, and this may increase time-cost of unit tests. If there are any problems, please let me know.

# Future possibilities
[future-possibilities]: #future-possibilities

For this RFC, we have made an established plan,

- cover `tvmc`, like [this](https://github.com/apache/tvm/blob/e691c7f83892d7242d0992c78cec1e2f8953a9e3/python/tvm/driver/tvmc/frontends.py#L174-L198)
- Support OneFlow Eager to ONNX
- Some operators will be supported in this quarter