
Conversation

gballet
Member

@gballet gballet commented Aug 14, 2025

πŸ—’οΈ Description

Add a test required as part of the BloatNet effort. This is the

🔗 Related Issues or PRs

Not an issue, but a test plan is described here.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
@gballet gballet changed the title BloatNet: add first few single-opcode test for state access. feat(tests): add first few single-opcode test for state access in BloatNet Aug 14, 2025
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

remove leftover single whitespace :|
@gballet gballet force-pushed the bloatnet-test-SSTORE branch from b6cd62a to 374e08a Compare August 14, 2025 19:16
@LouisTsai-Csie
Collaborator

LouisTsai-Csie commented Aug 15, 2025

Hello @gballet ! Thanks for adding this case.

This is the issue tracker for BloatNet test cases. Could you please (1) add this PR to the issue tracker's PR description (like this) and (2) link this PR to the issue? This would help us better track progress, thank you!

For benchmark tests, we now add new cases under tests/benchmark, and I think test_worst_stateful_opcodes.py is the best fit for your test.

I also added some review comments below; please feel free to let me know if you have any issues! If you want a reference for a benchmark test, take a look at this similar case and its structure.


REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md"
REFERENCE_SPEC_VERSION = "0.1"
GAS_LIMIT = 30_000_000 # Default gas limit seems to be >90M in this env
Collaborator

Is this for the transaction gas limit cap? If so, this value has been updated to 2**24 (reference), and we would like to use fork.transaction_gas_limit_cap() in the test; it will adjust to the right value based on the fork.

Member Author

The value I find is much higher than 2**24... but yeah, if that function gives me the info, I'll use it. Is it going to be valid in benchmarks though?

Collaborator

@LouisTsai-Csie LouisTsai-Csie Aug 21, 2025

I guess what you found is the following fixture:

BENCHMARKING_MAX_GAS = 1_000_000_000_000

@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment:  # noqa: D103
    """Return an Environment instance with appropriate gas limit based on test type."""
    if request.node.get_closest_marker("benchmark") is not None:
        return Environment(gas_limit=BENCHMARKING_MAX_GAS)
    return Environment()

BENCHMARKING_MAX_GAS is not the actual benchmarking value we are using. Currently we test under 1, 10, 30, 45, 60, 100, and 150M gas limits, and this is passed to the gas_benchmark_value parameter. In the past, we created a different genesis file for each of these configurations (so for each config, we needed a corresponding genesis file), but now we set the gas limit to an extremely high value and then restrict the block gas usage to each configuration, so that we only need one genesis file. This helps reduce the clients' overhead in running benchmarks.

Regarding fork.transaction_gas_limit_cap(): (1) if you want to test under Prague now, you can directly pass the gas_benchmark_value fixture as the gas limit for the transaction. (2) But if you want to fill the test for Osaka, you might need to use a blockchain test that fills the block with transactions, each consuming at most 2**24 gas as the tx gas limit.

But I think you can go with the first option for now; we are working on a general refactoring that automatically upgrades all the benchmark tests to be compatible with the Osaka fork.
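For illustration, here is a minimal sketch of option (1), passing gas_benchmark_value directly as the transaction gas limit. This is only a sketch under the assumptions discussed above: the imports and helpers mirror other tests in this repo, and the tiny sstore_code body is a stand-in for the real SSTORE loop, so double-check the details against the actual framework.

import pytest

from ethereum_test_tools import Alloc, Block, BlockchainTestFiller, Transaction
from ethereum_test_tools.vm.opcode import Opcodes as Op


@pytest.mark.valid_from("Prague")
def test_bloatnet_sstore_sketch(
    blockchain_test: BlockchainTestFiller,
    pre: Alloc,
    gas_benchmark_value: int,
):
    """Fill a single transaction with SSTOREs up to the benchmark gas value."""
    # Stand-in code body: store a non-zero value into a handful of fresh slots.
    sstore_code = sum(Op.SSTORE(i, 1) for i in range(100))
    contract = pre.deploy_contract(code=sstore_code)
    tx = Transaction(
        to=contract,
        gas_limit=gas_benchmark_value,  # option (1): tx gas limit == benchmark value
        sender=pre.fund_eoa(),
    )
    blockchain_test(pre=pre, post={}, blocks=[Block(txs=[tx])])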

Member Author

So what I would like is the max gas limit for the block, not the gas limit per transaction. Is there a helper for that?

Member Author

nvm, found it

Comment on lines 35 to 49
while totalgas + gas_increment < GAS_LIMIT:
totalgas += gas_increment
# print(f"increment={gas_increment} < totalgas={totalgas} i={i}")
sstore_code = sstore_code + Op.DUP1
if i < 256:
sstore_code = sstore_code + Op.PUSH1(i)
else:
sstore_code = sstore_code + Op.PUSH2(i)

sstore_code = sstore_code + Op.SSTORE(unchecked=True)

storage[storage_slot] = 0x02 << 248
storage_slot += 1
i += 1
sstore_code = sstore_code + Op.POP # Drop last value on the stack
Collaborator

My understanding is that here we want to update storage. Maybe we can try something like:

setup = Op.CALLDATALOAD(0)

available_gas = (
    fork.transaction_gas_limit_cap()
    # intrinsic gas cost
    - fork.transaction_intrinsic_cost_calculator()(
        calldata=code,
    )
    # gas cost for push0 and calldataload
    - fork.gas_costs().G_BASE * 2
)

# gas cost for each storage operation
iteration_count = available_gas // (
    fork.gas_costs().G_VERY_LOW * 2
    + fork.gas_costs().G_STORAGE_SET
    + fork.gas_costs().G_COLD_SLOAD
)

code = setup + sum(Op.SSTORE(i, Op.DUP1) for i in range(iteration_count))
assert len(code) <= fork.max_code_size()

For the SSTORE operation, Op.SSTORE will automatically pick the right PUSH variant based on your data size.

I am not sure if this applies to this case, but we use a hash function for the storage key to make the slots non-consecutive when benchmarking access-list transactions.
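A minimal sketch of the non-consecutive-slot idea, reusing setup and iteration_count from the snippet above (the hash here is plain hashlib, purely for illustration; the exact hash the benchmark suite uses may differ):

import hashlib

def hashed_slot(i: int) -> int:
    """Scatter slot i across the storage trie instead of using consecutive keys."""
    return int.from_bytes(hashlib.sha256(i.to_bytes(32, "big")).digest(), "big")

code = setup + sum(Op.SSTORE(hashed_slot(i), Op.DUP1) for i in range(iteration_count))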

Member Author

The sum() over range() won't work as-is, because we also collect the locations that are touched; although, since the slot index grows linearly, we could simply do that in a separate loop. It feels like it will be slower, but you might know better.

Question: is it really correct to write Op.SSTORE(i, Op.DUP1)? It sounds too good to be true, so I want to double-check.

Member Author

found other instances, ok, that's amazing ❤️

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
@gballet
Member Author

gballet commented Aug 21, 2025

Hey @LouisTsai-Csie thanks for the feedback.

This is the issue tracker for BloatNet test cases. Could you please (1) add this PR to the issue tracker's PR description (like this) and (2) link this PR to the issue? This would help us better track progress, thank you!

I tried to do this, but it looks like it's very involved. I made my best effort, but since I don't know exactly what you're expecting, and I don't have all the time in the world, I'll leave it in your court to comment on that. #2064

For benchmark tests, we now add new cases under tests/benchmark, and I think test_worst_stateful_opcodes.py is the best fit for your test.

If I do that, how do I run the test? The framework seems to ignore it after I moved it to that directory. I have pushed it to this PR for your consideration.

I also added some review comments below; please feel free to let me know if you have any issues! If you want a reference for a benchmark test, take a look at this similar case and its structure.

Thanks for the reference.

@LouisTsai-Csie
Collaborator

@gballet Apologies, I forgot to link the issue tracker for you. We've created an issue tracker based on your documentation.

I have linked this PR to "SSTORE - Fill block with SSTORE(0 → 1) to maximize new storage slot creation"; please let me know if this does not fit the category.

Also, it would be great if you could review whether anything is missing or wrong in our issue tracker!

@gballet
Member Author

gballet commented Aug 21, 2025

I'll need to have a closer look, but it seems fine as a first pass. Do you know what the problem is with moving my file to benchmarks?

@LouisTsai-Csie
Collaborator

LouisTsai-Csie commented Aug 21, 2025

Our documentation is incomplete (I will fix it ASAP). To run the test, you will need to add the flag -m benchmark to pick up the tests under the benchmark/ folder. By default, these tests are ignored to avoid overhead in the CI/release process.

This is the command in our documentation:

fill -v tests/benchmark/test_worst_blocks.py::test_block_full_of_ether_transfers --fork Osaka

But I would add a few flags to run it:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_of_ether_transfers --fork Osaka -m benchmark --clean
  • uv run: we use uv as the package manager
  • -m benchmark: we need this flag or benchmark tests will be ignored by default
  • --clean: you will need this if you have already filled tests before.

Please let me know if there is anything unclear to you!

)
from ethereum_test_tools.vm.opcode import Opcodes as Op

REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md"
Member Author

@fselmo @LouisTsai-Csie this is not related to an EIP, do I need to keep it?

Collaborator

You don't need this; see other benchmark tests under the benchmark/ dir for inspiration. Also, if there is no EIP, let's fix the docstring at the very top of the file to give a better description of what the test is, rather than using EIP-8047.

Member Author

@gballet gballet Aug 28, 2025

Suggested change
REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md"
REFERENCE_SPEC_GIT_PATH = "TODO"

@gballet gballet marked this pull request as ready for review August 27, 2025 08:53
Collaborator

@fselmo fselmo left a comment

Hey @gballet. I did a first pass at this, strictly just the setup. I didn't dig into the actual test case to make sure it's doing what we want it to do. I am going to take a deeper look at the logic.

@pytest.mark.valid_from("Prague")
@pytest.mark.parametrize("final_storage_value", [0x02 << 248, 0x02])
def test_bloatnet(
blockchain_test: BlockchainTestFiller, pre: Alloc, fork: Fork, final_storage_value: int
Collaborator

@fselmo fselmo Aug 27, 2025

As @LouisTsai-Csie mentioned, I think what we want for the benchmark gas limit is the gas_benchmark_value fixture.

@pytest.mark.valid_from("Prague")
@pytest.mark.parametrize("final_storage_value", [0x02 << 248, 0x02])
def test_bloatnet(
    blockchain_test: BlockchainTestFiller,
    pre: Alloc,
    fork: Fork,
    gas_benchmark_value: int,
    final_storage_value: int,
):
    ...

@LouisTsai-Csie would know better here what's more appropriate but I do see this in other benchmark tests
cc: @marioevz, does that seem right?

Member Author

The spec says "fill the block", so this is what I'm doing. It's not clear to me what that value is.

Collaborator

Reposted from Mattermost.

There are two variables/terms that can be confusing in the context of benchmarking:

  • Environment().gas_limit
  • gas_benchmark_value

In blockchain_test benchmarking, our original idea is to fill a block up to the block gas limit using a specific opcode/precompile/operation. But the question is: what should we use as the gas limit in our framework?

You might think it's Environment().gas_limit, but that's not correct. We use a small workaround here: instead of consuming all the gas in the block, we define a value called gas_benchmark_value and only consume gas up to this amount for benchmarking purposes.

When running tests with -m benchmark, you’ll see two fixtures:

  • genesis_environment
  • env

In both, you can see the gas_limit is set to 1_000_000_000_000. This is a fixed value and not our benchmark target (e.g., 45M, 60M, 90M). The actual benchmark value is gas_benchmark_value, which we pass via the flag --gas-benchmark-values 10M,45M,60M. In this case, it will create three different tests, each with its respective gas_benchmark_value, and that value should then be used as the effective gas limit in your test.

Again, we’re not literally consuming all the gas in the block. This is a workaround for how the genesis file is generated:

  • The genesis file always sets gasLimit to 1_000_000_000_000 during benchmarking.
  • The gasUsed field is set to the chosen gas_benchmark_value.

This approach lets us use a single genesis file for the Nethermind team. To change the benchmark gas value, we only need to update gasUsed rather than regenerate a new genesis file for every configuration.

gas_increment = gas_costs.G_VERY_LOW * 2 + gas_costs.G_STORAGE_SET + gas_costs.G_COLD_SLOAD
sstore_code = Op.PUSH0 + Op.CALLDATALOAD
storage_slot: int = 0
while totalgas + gas_increment < Environment().gas_limit:
Collaborator

Suggested change
while totalgas + gas_increment < Environment().gas_limit:
while totalgas + gas_increment < gas_benchmark_value:

sstore_code = sstore_code + Op.POP # Drop last value on the stack

sender = pre.fund_eoa()
print(sender)
Collaborator

remove print

Suggested change
print(sender)

@fselmo
Collaborator

fselmo commented Aug 28, 2025

I just wanted to add a bit more context on the gas_benchmark_value. This allows us to run something like:

uv run fill --fork=Prague -m benchmark --gas-benchmark-values 1,10,30,45,100,150 --clean -k bloatnet

This allows us to test against the different gas limit values specified for the block, not the transaction gas cap (1 = 1 Mgas). I am going to look a bit deeper into the PR next, but wanted to provide some better context.

"""
A test that calls a contract with many SSTOREs.
The first block will have many SSTORES that go from 0 -> 1
Collaborator

Come to think of it, I think we want two tests, based on the doc here?

If so:

  1. A test that maximizes cold SSTORE (0 -> 1) up to the gas_benchmark_value for a single block
  2. A test that maximizes SSTORE (1 -> 2) up to the gas_benchmark_value for a single block

I think it may also be useful to mark these valid from Osaka and take fork.transaction_gas_limit_cap() into account. We can fill in as many SSTOREs as possible, splitting into multiple transactions as we hit the tx gas limit cap on each, until we blow past the block gas_benchmark_value, then subtract an SSTORE operation, calculate our gas used, and set it as the expected_benchmark_gas_used (as is done here); a rough sketch of this splitting is included below.

Let me know if this makes sense.
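For illustration, a rough sketch of that splitting, assuming the same fixtures and imports discussed above (helper names like sstores_per_tx are illustrative, not from the PR, and code-size limits are ignored here):

gas_costs = fork.gas_costs()
intrinsic_cost = fork.transaction_intrinsic_cost_calculator()()
tx_gas_cap = fork.transaction_gas_limit_cap()

# Gas burned by one cold SSTORE(0 -> 1): two pushes plus the store itself.
gas_per_sstore = gas_costs.G_VERY_LOW * 2 + gas_costs.G_STORAGE_SET + gas_costs.G_COLD_SLOAD

# How many SSTOREs fit in one transaction under the tx gas cap,
# and how many such transactions fit under the block benchmark budget.
sstores_per_tx = (tx_gas_cap - intrinsic_cost) // gas_per_sstore
tx_count = max(gas_benchmark_value // tx_gas_cap, 1)

txs = []
for tx_index in range(tx_count):
    start_slot = tx_index * sstores_per_tx
    code = sum(Op.SSTORE(start_slot + i, 1) for i in range(sstores_per_tx))
    txs.append(
        Transaction(
            to=pre.deploy_contract(code=code),
            gas_limit=tx_gas_cap,
            sender=pre.fund_eoa(),
        )
    )

expected_benchmark_gas_used = tx_count * (intrinsic_cost + sstores_per_tx * gas_per_sstore)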

Collaborator

@gballet I iterated on this quite a bit; I wanted to get familiar with the benchmark tests. If this isn't the route, no worries, but I put up a patch for what I was thinking against your branch here. Let me know; I only touched the SSTORE tests, and I also realized this wasn't very straightforward to set up with the tx gas cap.

We can run those tests with uv run fill --fork=Osaka -m benchmark --gas-benchmark-values 1,10,30,45,60,100,150 --clean -k bloatnet_sstore, for example, which I think is the preferred way to run benchmark tests, but @LouisTsai-Csie would know best and I believe he will be taking a look here soon.

Hope I helped in some way 😅.

Collaborator

Oh, also... If you do think that patch makes sense and you merge it, you can ignore quite a few of the comments I made here as I implemented those and got the CI passing as well.

# with them. Only fill the block by a factor of SPEEDUP.
SPEEDUP: int = 100


Collaborator

@jsign jsign Aug 31, 2025

We have already covered cold and warm SSTORE/SLOAD with same and different values.

Are these tests different in some way?

Collaborator

@fselmo fselmo Aug 31, 2025

Hmm yeah, good observation. The biggest question, I suppose, is whether those worst-case scenarios are the same as the ones in the bloatnet doc here. If they are, maybe creating a marker on the existing relevant tests, so we don't have to redefine them, would be a good approach. Then, instead of trying to run only the tests in this file, we could use a marker like -m bloatnet; if we mark those existing, relevant, parametrized cases with it, they get included in the set of tests you want to run. We could then mark this whole file (test_bloatnet.py) with the appropriate marker so that you get all the test cases relevant to you in one command, whether they are defined here or elsewhere.

Just a thought.

Member Author

Let's not conflate two different things; the objectives are clearly different:

  • BloatNet looks for the performance of regular execution
  • the zk benchmarks are benchmarking zkevms, a widely different environment

Adding extra coupling here would cause more work for no benefit, since these tests are maintained by different sets of people.

Regarding the worst case, the tests are doing different things, since the code itself is different. The goal of the bloatnet test is to measure the sole performance of SSTORE in the client, whereas when I read the zkVM-specific code, it is doing extra stuff like jumps. That's normal: you wouldn't be able to load as much code as in our test inside a zkVM. But we can.

Collaborator

@jsign jsign Sep 1, 2025

@gballet, the tests in benchmark are not specific to zkVMs (they are even used for PerfNet).

The tests I linked execute blocks where the full gas limit is used to do cold or warm reads, or to write to existing or non-existing storage slots. There's nothing specific to zkVMs there, which is why I ask how these tests are different -- mainly to avoid duplication, or to better explain which variant is being benchmarked.
