Commit 7ba7e0f

adds design docs for bench
Signed-off-by: Andrey Kononykhin <andkononykhin@gmail.com>
1 parent a46d6d2

File tree: 4 files changed (+152 −13 lines)

* bench/README.md
* bench/test.spec.yaml
* docs/design/benchmarking.md
* locust.conf

bench/README.md (+35 −12)

````diff
@@ -1,5 +1,7 @@
 # DCLedger Load Testing
 
+DCLedger testing is implemented in python3 and is based on the [Locust](https://locust.io/) framework.
+
 ## Requirements
 
 * python >= 3.7
@@ -22,44 +24,65 @@ Each write transaction is signed and thus requires:
 For that reason the load test uses prepared load data which can be generated as follows:
 
 ```bash
-$ sudo make localnet_clean
-$ make localnet_init
-$ ./gentestaccounts.sh <NUM-USERS>
-$ make localnet_start
+sudo make localnet_clean
+make localnet_init
+
+# ./gentestaccounts.sh [<NUM-USERS>]
+./gentestaccounts.sh
+
+make localnet_start
 # Note: once started, the ledger may require some time to complete the initialization.
-$ DCLBENCH_WRITE_USERS_COUNT=<NUM-USERS> DCLBENCH_WRITE_USERS_Q_COUNT=<NUM-REQ-PER-USER> python bench/generate.py bench/test.spec.yaml ./txns
+
+# DCLBENCH_WRITE_USERS_COUNT=<NUM-USERS> DCLBENCH_WRITE_USERS_Q_COUNT=<NUM-REQ-PER-USER> python bench/generate.py bench/test.spec.yaml bench/txns
+python bench/generate.py bench/test.spec.yaml bench/txns
 ```
 
-Here the following inputs are considered:
+Here the following (**optional**) inputs are considered:
 
-* `NUM-USERS`: number of client accounts with write access (created as Vendors)
-* `NUM-REQ-PER-USER`: number of write txns to perform per a user
+* `NUM-USERS`: number of client accounts with write access (created as Vendors). Default: 10
+* `NUM-REQ-PER-USER`: number of write txns to perform per user. Default: 1000
 
 ## Run
 
 ### Headless
 
 ```bash
-locust -f bench/locustfile.py --headless --dcl-users <NUM-USERS> -s 10
+locust --headless
 ```
 
 ### Web UI
 
 ```bash
-locust -f bench/locustfile.py --dcl-users <NUM-USERS> -s 10
+locust
 ```
 
 Then you can open <http://localhost:8089/> and launch the tests from the browser.
 
 ### Configuration
 
+Run options (DCLedger custom ones):
+
 * `--dcl-users`: number of users
 * `--dcl-spawn-rate`: rate to spawn users at (users per second)
 * `--dcl-hosts <comma-separated-list>`: list of DCL nodes to target; each user randomly picks one.
   E.g. for a local ledger `http://localhost:26657,http://localhost:26659,http://localhost:26661,http://localhost:26663` will specify all the nodes.
 * `--dcl-txn-file`: path to a file with generated txns
 
-Please check `locust -f bench/locustfile.py --help` for the more details.
+Statistics options:
+
+[Locust](https://locust.io/) provides the following options to present the results:
+
+* `--csv <prefix>`: generates a set of stat files (summary, failures, exceptions and stats history) with the provided `<prefix>`
+* `--csv-full-history`: populates the stats history with more entries (including each specific request type)
+* `--html <path>`: generates an HTML report
+* Web UI also includes a `Download Data` tab where the reports can be found
+
+More details can be found in:
+
+* [locust.conf](../locust.conf): default values
+* `locust --help` (run from the project root)
+* [locust configuration](https://docs.locust.io/en/stable/configuration.html)
+* [locust stats](https://docs.locust.io/en/stable/retrieving-stats.html)
 
 ### Re-run
 
@@ -69,7 +92,7 @@ will complain about already written data or wrong sequence numbers.
 For that case you may consider resetting the ledger as follows:
 
 ```bash
-$ make localnet_reset localnet_start
+make localnet_reset localnet_start
 ```
 
 ## FAQ
````
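
The custom `--dcl-*` options above are defined in `bench/locustfile.py`. As a rough illustration only (a hedged sketch, not the repo's actual locustfile), Locust's standard `init_command_line_parser` event can register such options, and a user class can then read them to pick a node and replay pre-generated txns:

```python
import random

from locust import HttpUser, between, events, task


@events.init_command_line_parser.add_listener
def _(parser):
    # Custom options become available via environment.parsed_options.
    parser.add_argument("--dcl-hosts", default="http://localhost:26657",
                        help="comma-separated list of DCL nodes to target")
    parser.add_argument("--dcl-txn-file", default="bench/txns",
                        help="path to a file with generated txns")


class DclWriteUser(HttpUser):
    """Hypothetical user that replays one pre-generated txn per iteration."""

    host = "http://localhost:26657"  # placeholder; the real target is picked below
    wait_time = between(0.1, 1.0)

    def on_start(self):
        opts = self.environment.parsed_options
        # Each user randomly picks one node from --dcl-hosts.
        self.node = random.choice(opts.dcl_hosts.split(","))
        with open(opts.dcl_txn_file) as f:
            self.txns = f.read().splitlines()

    @task
    def write_txn(self):
        if not self.txns:
            return
        # Assumes one RPC-ready encoded txn per line (Tendermint broadcast_tx_sync).
        self.client.get(f"{self.node}/broadcast_tx_sync",
                        params={"tx": self.txns.pop(0)},
                        name="/broadcast_tx_sync")
```
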

bench/test.spec.yaml (+1 −1)

```diff
@@ -22,7 +22,7 @@ defaults:
 {%- set ctx = {} %}
 {%- endif %}
 
-{%- set write_users_c = ctx.get("write_users_count", 100) | int + 1 %}
+{%- set write_users_c = ctx.get("write_users_count", 10) | int + 1 %}
 {%- set write_users_q_c = ctx.get("write_users_q_count", 1000) | int + 1 %}
 
 users:
```
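
For reference, the `write_users_count` default above is what `DCLBENCH_WRITE_USERS_COUNT` from the README overrides at generation time. A minimal sketch of the idea (assuming plain Jinja2 rendering; this is not the actual `bench/generate.py` logic):

```python
import os

from jinja2 import Template

# Render the spec, overriding the defaults shown above from the environment.
with open("bench/test.spec.yaml") as f:
    spec = Template(f.read())

rendered = spec.render(ctx={
    "write_users_count": os.environ.get("DCLBENCH_WRITE_USERS_COUNT", 10),
    "write_users_q_count": os.environ.get("DCLBENCH_WRITE_USERS_Q_COUNT", 1000),
})
print(rendered)
```
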

docs/design/benchmarking.md (new file, +108)

# Benchmarking

## Client Side Metrics

* `response time` (percentiles): the time between a client's initial request and the last byte of a validator's response
* `requests per second (RPS)`: number of requests per second
* `transactions per second (TPS)`: number of write requests (txns) per second
* `number of clients`: number of concurrent clients that the ledger serves
* (optional) `throughput` (in/out): number of KB per second. Marked as optional since we don't expect much in/out data due to the relatively small txn payloads.

## Server Side Metrics

Starting from `v0.40.0`, Cosmos SDK provides the [telemetry](https://docs.cosmos.network/master/core/telemetry.html) package as server-side support for exploring application performance and behavior.

The following [metrics](https://docs.cosmos.network/master/core/telemetry.html#supported-metrics) make sense to consider:

* `tx_count`: Total number of txs processed via DeliverTx (tx)
* `tx_successful`: Total number of successful txs processed via DeliverTx (tx)
* `tx_failed`: Total number of failed txs processed via DeliverTx
* `abci_deliver_tx`: Duration of ABCI DeliverTx (ms)
* `abci_commit`: Duration of ABCI Commit (ms)
* `abci_query`: Duration of ABCI Query (ms)
* `abci_begin_block`: Duration of ABCI BeginBlock (ms)
* `abci_end_block`: Duration of ABCI EndBlock (ms)
* `begin_blocker`: Duration of BeginBlock for a given module (ms)
* `end_blocker`: Duration of EndBlock for a given module (ms)
* `store_iavl_get`: Duration of an IAVL Store#Get call (ms)
* `store_iavl_set`: Duration of an IAVL Store#Set call (ms)
* `store_iavl_has`: Duration of an IAVL Store#Has call (ms)
* `store_iavl_delete`: Duration of an IAVL Store#Delete call (ms)
* `store_iavl_commit`: Duration of an IAVL Store#Commit call (ms)
* `store_iavl_query`: Duration of an IAVL Store#Query call (ms)
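
These counters can be snapshotted periodically during a load run. A hedged sketch (it assumes telemetry and the API server are enabled in the node's `app.toml` and that the metrics endpoint is exposed on the default local API port, as described in the linked telemetry docs; the script itself is hypothetical):

```python
import time

import requests

# Cosmos SDK API server address: an assumption for a local node.
METRICS_URL = "http://localhost:1317/metrics"


def poll_metrics(interval_s: float = 10.0, samples: int = 6) -> None:
    """Snapshot Prometheus-formatted telemetry several times during a run."""
    for i in range(samples):
        resp = requests.get(METRICS_URL, params={"format": "prometheus"}, timeout=5)
        resp.raise_for_status()
        with open(f"metrics-{i}.prom", "w") as f:
            f.write(resp.text)
        time.sleep(interval_s)


if __name__ == "__main__":
    poll_metrics()
```
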
## Environment

**Note**: for the moment it is not clear what the production setup will look like, in particular:

* number of vendor companies (number of validators)
* type of external endpoints; the options are the [Cosmos SDK / Tendermint endpoints](https://docs.cosmos.network/master/core/grpc_rest.html)
* type and number of proxies for validator-validator and client-validator connections

Current assumptions:

* multiple companies (vendors) will each manage one or multiple validators
* while some common requirements and recommendations will be provided, each vendor will deploy the infrastructure independently, with some freedom regarding internal architecture
* there will be a set of external (for clients) and internal (for validators, to support txn flows) endpoints
* most likely observer nodes, along with REST HTTP servers providing client authentication, will sit in front of the client endpoints
## Workloads

### Transaction Types

* write txns:
  * `tx/modelinfo/add-model`
    * with `vid` constant for a particular client
    * variable (incremented) `pid` (see the sketch after this list)
  * **ToDo** consider other request types (e.g. `update-model`)
* read txns:
  * `query/modelinfo/model`
  * **ToDo** consider other request types (e.g. `all-models`)
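
A purely illustrative sketch (not project code) of that parameter scheme: each client holds a constant `vid` and increments `pid` for every `add-model` txn:

```python
from itertools import count


def add_model_params(vid: int):
    """Yield per-txn parameters: constant vid, incremented pid."""
    for pid in count(start=1):
        yield {"vid": vid, "pid": pid}


params = add_model_params(vid=4701)  # the vid value is an arbitrary example
print(next(params))  # {'vid': 4701, 'pid': 1}
print(next(params))  # {'vid': 4701, 'pid': 2}
```
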
### Clients

**ToDo** define which client endpoints are considered in production

Since Cosmos SDK (Tendermint) provides multiple client [endpoints](https://docs.cosmos.network/master/core/grpc_rest.html), it makes sense to benchmark all of them (separately and in combination), in particular:

* HTTP RPC
* WebSocket RPC
* HTTP REST

### Load Types

* per txn type:
  * only write txns: to measure server-side (consensus-related) bottlenecks and limitations
  * only read txns: to measure client-side (setup-related) limitations
  * combined loads, with the read/write ratio as a parameter
    * **ToDo** define anticipated real loads
* per scenario:
  * stepping load: to identify the point where performance degrades significantly (see the sketch after this list)
  * waves: to emulate peaks and troughs in client behavior
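
Both scenarios map naturally onto Locust's custom load shapes. A minimal sketch of the stepping load (step size, step duration and the stop level are illustrative assumptions):

```python
from locust import LoadTestShape


class SteppingLoad(LoadTestShape):
    """Add step_users every step_duration seconds until max_users is reached."""

    step_users = 10     # users added per step (assumption)
    step_duration = 60  # seconds per step (assumption)
    max_users = 200     # stop the test at this level (assumption)

    def tick(self):
        run_time = self.get_run_time()
        users = (int(run_time // self.step_duration) + 1) * self.step_users
        if users > self.max_users:
            return None  # returning None stops the test
        return (users, self.step_users)  # (target user count, spawn rate)
```

A "waves" shape can be expressed the same way by computing the target user count from a periodic function of `run_time`.
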
## Load Generation Framework

Since DCLedger is based on Cosmos SDK and Tendermint, which provide standard HTTP/WebSocket RPC and REST [endpoints](https://docs.cosmos.network/master/core/grpc_rest.html) for both read and write txns, generic production-ready tools like [jMeter](https://jmeter.apache.org/), [Locust](https://locust.io/) and [K6](https://k6.io/) may be considered.

[Locust](https://locust.io/) looks like the easiest option to adopt:

* tests are configured using simple python scripts (version control, CI/CD friendly); in comparison:
  * the JS-based configuration of [K6](https://k6.io/) will likely require more effort
  * [jMeter](https://jmeter.apache.org/) configuration is mostly UI-driven rather than code-based
* [distributed testing](https://docs.locust.io/en/stable/running-locust-distributed.html) with results aggregation is supported (if we decide to use it)
* there are some [concerns](https://k6.io/blog/comparing-best-open-source-load-testing-tools/) regarding its performance and accuracy, but the current vision is that it should be acceptable for our case

## Testing Environment Provisioning Automation

General considerations:

* since the target production deployment architecture is not yet defined, automated provisioning of testing environments would simplify comparison of the options
* single cloud provider as a starting point
* multiple cloud providers as a very possible production case
* tools: [terraform](https://www.terraform.io/) and [pulumi](https://www.pulumi.com/) as the preferred options (see the sketch after this list)
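
For flavor, a hypothetical Pulumi (Python) sketch of the single-provider starting point; the AWS provider, instance type, AMI filter and the 4-node count are placeholders, not project decisions:

```python
import pulumi
import pulumi_aws as aws

# Latest Ubuntu 20.04 AMI from Canonical (placeholder choice).
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["099720109477"],
    filters=[{"name": "name",
              "values": ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]}],
)

# Four validator nodes, mirroring the 4-node localnet (an assumption).
validators = [
    aws.ec2.Instance(f"dcl-validator-{i}",
                     instance_type="t3.medium",  # placeholder size
                     ami=ami.id)
    for i in range(4)
]

pulumi.export("validator_ips", [v.public_ip for v in validators])
```
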
## ToDo

* define acceptance criteria (target metric values)
* define target environment

locust.conf (new file, +8)

```ini
locustfile = bench/locustfile.py
spawn-rate = 1
dcl-users = 10
dcl-hosts = http://localhost:26657,http://localhost:26659,http://localhost:26661,http://localhost:26663
dcl-txn-file = bench/txns
csv = dclbench
# csv-full-history = true
html = dclbench.stats.html
```
