@jtoman jtoman commented Jul 22, 2025

What I did

This commit contains two main changes.

The first is a pair of new output modes: symbol_map and symbol_map_runtime. These contain the raw "symbol map" produced when assembling the deployment code and the runtime code, respectively, into bytecode. This information is crucial for determining the exact starting position of internal functions in the assembled bytecode.

The second component is a change to the existing metadata output. When the user selects the experimental codegen, the metadata for internal functions will contain a list of the names of the parameters that are passed via the stack, under the venom_via_stack key. Similarly, if the function returns its value via the stack, the venom_return_via_stack key records this as a boolean. These keys are omitted if experimental codegen is not selected. (Credit where credit is due: this feature was primarily developed by Charles Cooper.)

How I did it

The symbol_map outputs are generated by calling assembly_to_evm_with_symbol_map with the correct assembly and then throwing away all of the output except for the symbol map. I confess I cargo-culted the other arguments (pc_ofst and compiler_metadata) with their declared defaults; if these are indeed the correct values to use and they should always just be the defaults, we can stop passing them explicitly.

The "calling convention" change simply queries the same functions that determine the calling convention used during codegen and records the result during metadata generation.

How to verify it

The test_output_json test ensures that the output machinery works as intended.

To verify the metadata and symbol map generation works correctly, we can look at the following simple contract:

@external
@pure
def ext(a: uint256, b: uint256[2]) -> uint256:
    return self.int(a, b)

@internal
@pure
def int(a: uint256, b: uint256[2]) -> uint256:
    return a + b[0] + b[1]


@external
@pure
def ext2(a: uint256, b: uint256[2]) -> uint256:
    return self.int(a, b)

When using -f symbol_map_runtime,metadata,opcodes_runtime --experimental-codegen, we get the following output for the symbol_map:

{"_sym_selector_bucket_1": 25, "_sym_label_ret_0": 63, "_sym_selector_bucket_0": 72, "_sym_label_ret_1": 110, "_sym_internal 0 int(uint256,uint256[2])_runtime": 119, "_sym___revert": 147, "_sym_selector_buckets": 151, "_sym_code_end": 155, "_mem_deploy_start": null, "_mem_deploy_end": null}

This tells us that int starts at PC 119. Looking at the opcodes:

... PUSH2 0x006E PUSH2 0x0077 JUMP JUMPDEST PUSH1 0xA0 MSTORE PUSH1 0x20 PUSH1 0xA0 RETURN JUMPDEST SWAP1 DUP1 PUSH1 0x60 MLOAD ADD SWAP1 DUP2 LT PUSH2 0x0093 JUMPI DUP1 PUSH1 0x80 MLOAD ADD SWAP1 DUP2 LT PUSH2 0x0093 JUMPI SWAP1 JUMP ...

We can see that 0x0077 (aka 119) is the PC of the start of int.
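The same check can be done mechanically. The following is a minimal sketch that walks the opcode excerpt above and computes the PC of each instruction. The starting PC is an assumption: the excerpt is elided on both sides, so the sketch anchors it by assuming the first JUMPDEST in the excerpt is the return label pushed just before it (0x006E, i.e. `_sym_label_ret_1` = 110), which places the first PUSH2 seven bytes earlier. The helper name `walk` is made up for this example.

```python
import json

# Symbol map copied from the output above.
SYMBOL_MAP = json.loads(
    '{"_sym_selector_bucket_1": 25, "_sym_label_ret_0": 63, '
    '"_sym_selector_bucket_0": 72, "_sym_label_ret_1": 110, '
    '"_sym_internal 0 int(uint256,uint256[2])_runtime": 119, '
    '"_sym___revert": 147, "_sym_selector_buckets": 151, '
    '"_sym_code_end": 155, "_mem_deploy_start": null, "_mem_deploy_end": null}'
)

# Opcode excerpt copied from above, without the leading and trailing "...".
OPCODES = (
    "PUSH2 0x006E PUSH2 0x0077 JUMP JUMPDEST PUSH1 0xA0 MSTORE PUSH1 0x20 "
    "PUSH1 0xA0 RETURN JUMPDEST SWAP1 DUP1 PUSH1 0x60 MLOAD ADD SWAP1 DUP2 LT "
    "PUSH2 0x0093 JUMPI DUP1 PUSH1 0x80 MLOAD ADD SWAP1 DUP2 LT PUSH2 0x0093 "
    "JUMPI SWAP1 JUMP"
)


def walk(opcodes, start_pc):
    """Return ([(pc, opcode), ...], end_pc) for a whitespace-separated listing.

    A PUSHn instruction occupies 1 + n bytes and is followed in the listing
    by its immediate value, which is skipped here.
    """
    tokens = opcodes.split()
    listing = []
    pc = start_pc
    i = 0
    while i < len(tokens):
        op = tokens[i]
        listing.append((pc, op))
        if op.startswith("PUSH") and op != "PUSH0":
            pc += 1 + int(op[4:])  # opcode byte plus n immediate bytes
            i += 2                 # skip the immediate token
        else:
            pc += 1
            i += 1
    return listing, pc


# Assumption: PUSH2 + PUSH2 + JUMP (3 + 3 + 1 bytes) precede the return label.
start_pc = SYMBOL_MAP["_sym_label_ret_1"] - 7
listing, end_pc = walk(OPCODES, start_pc)

jumpdests = [pc for pc, op in listing if op == "JUMPDEST"]
int_pc = SYMBOL_MAP["_sym_internal 0 int(uint256,uint256[2])_runtime"]

print(jumpdests, hex(int_pc), end_pc)
```

Under that assumption, the second JUMPDEST lands on the PC recorded for int, and the excerpt ends exactly where `_sym___revert` (0x0093) begins, which matches the jump target used by the two overflow checks.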

Further, we see in the metadata output:

"int (0)": {"name": "int", "positional_args": {"a": "uint256", "b": "uint256[2]"}, "keyword_args": {}, "return_type": "uint256", "visibility": "internal", "mutability": "pure", "_ir_identifier": "internal 0 int(uint256,uint256[2])", "default_values": {}, "frame_info": {"frame_start": 64, "frame_size": 96}, "module_path": "test.vy", "source_id": 0, "function_id": 0, "venom_via_stack": ["a"], "venom_return_via_stack": true}

which reflects that a is passed via the stack and the return value is on the stack, as we expect.
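A consumer of this metadata could read the calling convention along these lines. This is only a sketch: the helper name `calling_convention` and the shape of its result are assumptions for illustration, and the metadata entry is copied from the output above. Parameters not listed under `venom_via_stack` are reported simply as "not via stack"; the sketch makes no claim about how they are passed.

```python
import json

# Metadata entry for "int (0)", copied from the output above.
ENTRY = json.loads("""
{"name": "int", "positional_args": {"a": "uint256", "b": "uint256[2]"},
 "keyword_args": {}, "return_type": "uint256", "visibility": "internal",
 "mutability": "pure", "_ir_identifier": "internal 0 int(uint256,uint256[2])",
 "default_values": {}, "frame_info": {"frame_start": 64, "frame_size": 96},
 "module_path": "test.vy", "source_id": 0, "function_id": 0,
 "venom_via_stack": ["a"], "venom_return_via_stack": true}
""")


def calling_convention(entry):
    """Summarize the stack calling convention recorded in a metadata entry.

    Returns None when the venom_* keys are absent, which per the PR
    description happens when experimental codegen is not selected.
    """
    via_stack = entry.get("venom_via_stack")
    if via_stack is None:
        return None
    args = list(entry["positional_args"])
    return {
        "stack_args": [a for a in args if a in via_stack],
        "not_via_stack": [a for a in args if a not in via_stack],
        # Assumption: a missing return key is treated as False here.
        "return_via_stack": bool(entry.get("venom_return_via_stack", False)),
    }


summary = calling_convention(ENTRY)
print(summary)
```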

Commit message

Commit message for the final, squashed PR. (Optional, but reviewers will appreciate it! Please see our commit message style guide for what we would ideally like to see in a commit message.)

Description for the changelog

Adds extra debugging output: calling-convention details in the metadata, and access to the internal symbol maps generated by the compiler.

This commit contains two main changes.

The first is a pair of new output modes: `symbol_map` and
`symbol_map_runtime`. These contain the raw "symbol map" produced when
assembling the deployment code and the runtime code, respectively,
into bytecode. This information is crucial for determining the exact
starting position of internal functions in the assembled bytecode.

The second component is a change to the existing metadata output. When
the user selects the experimental codegen, the metadata for internal
functions will contain a list of the names of the parameters that are
passed via the stack, under the `venom_via_stack` key. Similarly, if the
function returns its value via the stack, the `venom_return_via_stack`
key records this as a boolean. These keys are omitted if experimental
codegen is not selected. (Credit where credit is due: this feature was
primarily developed by Charles Cooper.)

Co-Authored-By: Charles Cooper <cooper.charles.m@gmail.com>
@charles-cooper
Member

i think this looks good. could we add a test for this to protect against regressions? the easiest would be to just copy/paste the symbol map output and check that the literal contents are produced in the test.

@jtoman
Author

jtoman commented Aug 14, 2025

> i think this looks good. could we add a test for this to protect against regressions? the easiest would be to just copy/paste the symbol map output and check that the literal contents are produced in the test.

Won't this be extremely sensitive w.r.t. codegen changes? If there is any perturbation in how something is compiled, it will throw off the symbol locations. Perhaps regenning the expected constants isn't so bad?

@jtoman
Author

jtoman commented Aug 14, 2025

@charles-cooper I added a regression test, but didn't have the courage to test for an exact constant. It's easy enough to add in to the sketched test here. Let me know if you want me to push further on testing different output scenarios.
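For illustration only, a constant-free check of the kind described might look like the sketch below. It is not the test added in this PR: the helper name `symbol_map_problems` and the invariants it checks are assumptions, and the sample symbol map is the one from the PR description. The idea is to assert on the shape of the map (an entry exists for each internal function, and every PC is a non-negative integer or null) rather than on exact offsets, so ordinary codegen changes do not break it.

```python
# Sample symbol map from the PR description (null rendered as None).
SAMPLE = {
    "_sym_selector_bucket_1": 25,
    "_sym_label_ret_0": 63,
    "_sym_selector_bucket_0": 72,
    "_sym_label_ret_1": 110,
    "_sym_internal 0 int(uint256,uint256[2])_runtime": 119,
    "_sym___revert": 147,
    "_sym_selector_buckets": 151,
    "_sym_code_end": 155,
    "_mem_deploy_start": None,
    "_mem_deploy_end": None,
}


def symbol_map_problems(symbol_map, internal_fn_names):
    """Hypothetical helper: return a list of problems found in a symbol map.

    Checks structure only, never exact PC values.
    """
    problems = []
    for name in internal_fn_names:
        # Internal function labels in the sample look like
        # "_sym_internal <id> <name>(<arg types>)_runtime".
        matches = [
            key for key in symbol_map
            if key.startswith("_sym_internal") and f" {name}(" in key
        ]
        if len(matches) != 1:
            problems.append(f"expected one symbol for {name!r}, found {len(matches)}")
    for key, pc in symbol_map.items():
        is_valid_pc = isinstance(pc, int) and not isinstance(pc, bool) and pc >= 0
        if pc is not None and not is_valid_pc:
            problems.append(f"{key!r} has a non-PC value: {pc!r}")
    return problems


print(symbol_map_problems(SAMPLE, ["int"]))
```

Pinning the exact constants, as suggested earlier in the thread, would catch more regressions at the cost of regenerating the expected values whenever codegen changes.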

Uses new way to compute symbol map, adjust to existence of Label type.
Member

@charles-cooper charles-cooper left a comment

lgtm. thanks! @cyberthirst to take a look

codecov bot commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.22%. Comparing base (2e57901) to head (3753797).

Files with missing lines Patch % Lines
vyper/cli/vyper_json.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4718   +/-   ##
=======================================
  Coverage   93.22%   93.22%           
=======================================
  Files         136      136           
  Lines       19190    19207   +17     
  Branches     3293     3296    +3     
=======================================
+ Hits        17889    17905   +16     
  Misses        884      884           
- Partials      417      418    +1     

@jtoman jtoman changed the title from "feat[tool]: Add extra debugging output" to "feat[tool]: add extra debugging output" Sep 10, 2025
return {k.label: v for (k, v) in sym.items()}


def buld_symbol_map_runtime(compiler_data: CompilerData) -> dict:
Collaborator

typo (`buld` -> `build`)

@cyberthirst
Collaborator

assume this PR broke after merging #4744

@jtoman
Author

jtoman commented Sep 11, 2025

> assume this PR broke after merging #4744

Yep, pointed resolve_symbols to its new home.

@charles-cooper
Member

just a lint remaining. @cyberthirst does it look good for you?
