-
Notifications
You must be signed in to change notification settings - Fork 168
feat(tests): add first few single-opcode test for state access in BloatNet #2040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com> remove leftover single whitespace :|
b6cd62a
to
374e08a
Compare
Hello @gballet ! Thanks for adding this case. This is the issue tracker for bloatnet test cases, could you please help me (1) add the PR to the issue tracker PR description (like this) (2) link this PR to the issue, this would help us better track the progress, thank you! For benchmark test, we now add new cases under I also add some review below, please feel free to let me know if you have any issue! If you want some reference for benchmark test, maybe you can take a look at this This is a similar case for this benchmark! You can take a look at this structure! |
|
||
REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md" | ||
REFERENCE_SPEC_VERSION = "0.1" | ||
GAS_LIMIT = 30_000_000 # Default gas limit seems to be >90M in this env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for the transaction gas limit cap? If so, this value is updated to 2**24 (reference), and we would like to use this in the test: fork.transaction_gas_limit_cap()
, it will adjust to different value based on the fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value I find is much higher than 2**24... but yeah, if that function gives me the info, I'll use it. Is it going to be valid in benchmarks though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what you found is the following fixture:
BENCHMARKING_MAX_GAS = 1_000_000_000_000
@pytest.fixture
def env(request: pytest.FixtureRequest) -> Environment: # noqa: D103
"""Return an Environment instance with appropriate gas limit based on test type."""
if request.node.get_closest_marker("benchmark") is not None:
return Environment(gas_limit=BENCHMARKING_MAX_GAS)
return Environment()
The BENCHMARKING_MAX_GAS
is not the actual benchmarking value we are using, currently we are testing under 1,10,30,45,60,100,150M
gas limit, and this is passed to gas_benchmark_value
parameter. In the past, we create different genesis file with these configuration (so for each config, we need a corresponding genesis file.) but now we set the gas limit to a extremely high value and then restrict the block gas limit usage to these configuration, so that we only need one genesis file. This would help simplify the client's overhead in running benchmarking.
For the fork.transaction_gas_limit_cap()
, (1) if you want to test under Prague now, you can directly pass the gas_benchmark_value
fixture as the gas limit for the transaction. (2) But if you want to fill the test for Osaka, you might need to use blockchain test that fill the block with tx that each of them consume at most 2**24 as tx gas limit.
But I think you can do the first option now, we are working on a general refactoring that automatically upgrade all the benchmark test to be compatible to Osaka fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what I would like is the max gas limit for the block, not the gas limit per transaction. Is there a helper for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, found it
while totalgas + gas_increment < GAS_LIMIT: | ||
totalgas += gas_increment | ||
# print(f"increment={gas_increment} < totalgas={totalgas} i={i}") | ||
sstore_code = sstore_code + Op.DUP1 | ||
if i < 256: | ||
sstore_code = sstore_code + Op.PUSH1(i) | ||
else: | ||
sstore_code = sstore_code + Op.PUSH2(i) | ||
|
||
sstore_code = sstore_code + Op.SSTORE(unchecked=True) | ||
|
||
storage[storage_slot] = 0x02 << 248 | ||
storage_slot += 1 | ||
i += 1 | ||
sstore_code = sstore_code + Op.POP # Drop last value on the stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that here we want to update storage. Maybe we can try something like:
setup = Op.CALLDATALOAD(0)
available_gas = (
fork.transaction_gas_limit_cap()
# intrinsic gas cost
- fork.transaction_intrinsic_cost_calculator()(
calldata=code,
)
# gas cost for push0 and calldataload
- fork.gas_costs().G_BASE * 2
)
# gas cost for each storage operation
iteration_count = available_gas // (
fork.gas_costs().G_VERY_LOW * 2
+ fork.gas_costs().G_STORAGE_SET
+ fork.gas_costs().G_COLD_SLOAD
)
code = setup + sum(Op.SSTORE(i, Op.DUP1) for i in range(iteration_count))
assert(len(code) - fork.max_code_size)
For SSTORE
operation, it will automatically adjust to different push variants based on your data size
I am not sure if this applies to this case, but we use hash function for storage key to make it not consecutive in benchmarking access list tx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sum() for i in range()
won't work, because we also collect the locations that are touched, although since it grows linearly, we could simply do that in a separate loop. It feels like it'll be slower, but you might know better.
Question: is it really correct to write Op.SSTORE(i, Op.DUP1)
? sounds too good to be true, so I want to double-check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found other instances, ok, that's amazing β€οΈ
Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Hey @LouisTsai-Csie thanks for the feedback.
I tried to do this, but this looks like it's very involved. I did my best effort but since I don't know what you're expecting, and also that I don't have all the time in the world, I'll leave it in your court to comment on that. #2064
if I do that, how do I run the test? it seems to ignore them after I moved it to the directory. I have pushed it to this PR for your consideration.
Thanks for the reference. |
@gballet Appologies. I forgot to link the issue tracker for you. We've created an issue tracker based on your documentation. I help you link this PR to the Also, it would be great if you can help me review if there is anything missing / wrong in our issue tracker! |
I'll need to have a closer look, but it seems fine as a first pass. Do you know what the problem is with moving my file to benchmarks? |
Our documentation is incomplete (I will fix them ASAP), for running the test, you will need to add a flag This is the command on our documentation:
But I would add some flag to run it:
Please let me know if there is anything unclear to you! |
tests/benchmark/test_bloatnet.py
Outdated
) | ||
from ethereum_test_tools.vm.opcode import Opcodes as Op | ||
|
||
REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fselmo @LouisTsai-Csie this is not related to an EIP, do I need to keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this. See other benchmark tests under the benchmark/
dir for inspiration. Also if there is no EIP, let's fix the docstring at the very top of the file to have a better description of what the test is, rather than using EIP-8047.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REFERENCE_SPEC_GIT_PATH = "DUMMY/eip-DUMMY.md" | |
REFERENCE_SPEC_GIT_PATH = "TODO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @gballet. I did a first pass at this strictly just from setup. I didn't dig into the actual test case to make sure it's doing what we want it to do. I am going to take a deeper look at the logic.
@pytest.mark.valid_from("Prague") | ||
@pytest.mark.parametrize("final_storage_value", [0x02 << 248, 0x02]) | ||
def test_bloatnet( | ||
blockchain_test: BlockchainTestFiller, pre: Alloc, fork: Fork, final_storage_value: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @LouisTsai-Csie mentioned, I think what we want for benchmark gas limit is the gas_benchmark_value
fixture.
@pytest.mark.valid_from("Prague")
@pytest.mark.parametrize("final_storage_value", [0x02 << 248, 0x02])
def test_bloatnet(
blockchain_test: BlockchainTestFiller,
pre: Alloc,
fork: Fork,
gas_benchmark_value: int,
final_storage_value: int,
):
...
@LouisTsai-Csie would know better here what's more appropriate but I do see this in other benchmark tests
cc: @marioevz, does that seem right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec says "fill the block", so this is what I'm doing. It's not clear to me what that value is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reposted from mattermost.
There are two variables/terms that can be confusing in the context of benchmarking:
Environment().gas_limit
gas_benchmark_value
In blockchain_test
benchmarking, our original idea is to fill a block up to the block gas limit using a specific opcode/precompile/operation. But the question is: what should we use as the gas limit in our framework?
You might think itβs Environment().gas_limit
, but thatβs not correct. We use a small workaround here instead of consuming all the gas in the block, we define a value called gas_benchmark_value
and only consume gas up to this amount for benchmarking purposes.
When running tests with -m benchmark
, youβll see two fixtures:
- genesis_environment
- env
In both, you can see the gas_limit is set to 1_000_000_000_000. This is a fixed value and not our benchmark target (e.g., 45M, 60M, 90M). The actual benchmark value isgas_benchmark_value
, which we pass via the flag:--gas-benchmark-values 10M,45M,60M
. In this case, it will create 3 different tests with respectivegas_benchmark_value
. And this value should then be used as the effective gas limit in your test.
Again, weβre not literally consuming all the gas in the block. This is a workaround for how the genesis file is generated:
- The genesis file always sets
gasLimit
to1_000_000_000_000
during benchmarking. - The
gasUsed
field is set to the chosengas_benchmark_value
.
This approach lets us use a single genesis file for the Nethermind team. To change the benchmark gas value, we only need to update gasUsed rather than regenerate a new genesis file for every configuration.
gas_increment = gas_costs.G_VERY_LOW * 2 + gas_costs.G_STORAGE_SET + gas_costs.G_COLD_SLOAD | ||
sstore_code = Op.PUSH0 + Op.CALLDATALOAD | ||
storage_slot: int = 0 | ||
while totalgas + gas_increment < Environment().gas_limit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while totalgas + gas_increment < Environment().gas_limit: | |
while totalgas + gas_increment < gas_benchmark_value: |
sstore_code = sstore_code + Op.POP # Drop last value on the stack | ||
|
||
sender = pre.fund_eoa() | ||
print(sender) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove print
print(sender) |
I just wanted to add a bit more context on the
This allows us to test against the different gas limit values specified for the block, not transaction gas cap (1 = 1 Mgas). I am looking a bit deeper into the PR next but wanted to provide some better context. |
""" | ||
A test that calls a contract with many SSTOREs. | ||
The first block will have many SSTORES that go from 0 -> 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, I think we want two tests, based on the doc here?
If so:
- A test that maximizes cold
SSTORE
(0 -> 1) up to thegas_benchmark_value
for a single block - A test that maximizes
SSTORE
(1 -> 2) up to thegas_benchmark_value
for a single block
I think it may also be useful to mark these valid from Osaka
and take the fork.transaction_gas_limit_cap()
into account. We can fill up as many SSTOREs up, splitting into multiple transactions as we hit the tx gas limit cap on each, until we blow past the block gas_benchmark_value
, then subtract an SSTORE operation, calculate our gas used and set it as the expected_benchmark_gas_used
(as is done here).
Let me know if this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gballet I iterated on this quite a bit. I wanted to get familiar with the benchmark tests. If this isn't the route, no worries but I but up a patch for what I was thinking against your branch here. Let me know, I only touched the sstore
tests but I also realized this wasn't very straightforward to set up with tx gas cap.
We can run those tests with uv run fill --fork=Osaka -m benchmark --gas-benchmark-values 1,10,30,45,60,100,150 --clean -k bloatnet_sstore
for example, though. Which I think is the preferred way to run benchmark tests but @LouisTsai-Csie would know best and I believe he will be taking a look here soon.
Hope I helped in some way π .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also... If you do think that patch makes sense and you merge it, you can ignore quite a few of the comments I made here as I implemented those and got the CI passing as well.
# with them. Only fill the block by a factor of SPEEDUP. | ||
SPEEDUP: int = 100 | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah, good observation. The biggest question I suppose is whether those worst case scenarios are the same as the ones in the bloatnet doc here? If they are, maybe creating a marker on existing relevant tests, so we don't have to redefine them, would be a good approach. Then, instead of trying to run only the tests in this file, we could use a marker like -m bloatnet
and if we mark those existing, relevant, param
cases with this marker, they get included in all the tests you want to run. We could then mark this whole file (test_bloatnet.py
) with the appropriate marker for your use so that you get all the test cases relevant to you in one command, whether they are defined here or elsewhere.
Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not conflate two different things together, the objectives are clearly different:
- BloatNet looks for the performance of regular execution
- the zk benchmarks are benchmarking zkevms, a widely different environment
Adding an extra coupling here is causing more work for no benefit, since these tests are maintained by different sets of people.
Regarding the worst case, the tests are doing different things since the code itself is different. The goal of the bloatnet test it to measure the sole performance of SSTORE
in client, whereas when I read the zkvm-specific code, it is doing extra stuff like jumps. It's normal, you wouldn't be able to load so much code as in our test inside a zkvm. But we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gballet, the tests in benchmark
are not specific to zkvms. (They are even used for PerfNet).
The test I linked are executing blocks where the full gas limit is used to do cold or warm reads, or to write slots to existent or non-existent storage slots. There's nothing specific to zkvms there, thus why I ask how these tests are different -- mainly to avoid duplication or explain better what different variant is trying to be benchmarked.
ποΈ Description
Add a test required as part of the BloatNet effort. This is the
π Related Issues or PRs
Not an issue, but a test plan is described here.
β Checklist
tox
checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
type(scope):
.mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned@ported_from
marker.