
DCL MainNet Deployment

Alexander Shcherbakov edited this page Feb 4, 2022 · 38 revisions

Summary

This page is a work in progress.

DCL is in a better position regarding DDoS protection than public Cosmos networks.

DCL is a permissioned network consisting of a limited set of trusted or semi-trusted nodes (validators and observers), and we are not required to make all nodes publicly accessible (a company may make its nodes accessible only to that company's applications). Public Cosmos networks, by contrast, are permissionless: they may have any number of nodes, and those nodes need to be reachable by anyone in the world.

Moreover, DCL nodes do not compete to propose blocks; they don't play the "game of stake". There is no tokenomics in the permissioned DCL network (at least so far). So if a node dies or is unavailable for some time, this is not a catastrophe: a Node Admin can repair it. In a staking-based network (Cosmos), however, a crashed or unavailable node can be a problem, as the node cannot propose new blocks and may lose its "clients" (delegators) and their tokens.

In other words, DCL is more collaborative, while staking-based networks (like Cosmos) usually consist of competitive entities.

Options for Validators

All options assume that the validator node is not public and accepts incoming connections from trusted validators and observers only (see Options for network protection)

  • Option 1: Cloud, no HSM
    • Option 1A: no Sentry, private keys and secrets on the Validator machine
    • Option 1B: with Sentry, private keys and secrets on the Validator machine
    • Option 1C: no Sentry, private keys and secrets not on the Validator machine (tmkms, HashiCorp Vault)
    • Option 1D: with Sentry, private keys and secrets not on the Validator machine (tmkms, HashiCorp Vault)
  • Option 2: Physical machine, HSM, with Sentries

Recommended Option for DCL 1.0

Option 1B: with Sentry, private keys and secrets on the Validator machine

Why use Sentries:

  • Harder to DDoS the real Validator node (in case malicious Validators are present)
  • Hides the real Validator node's IP, making the validator harder to attack
  • Can support HSMs and Validators on physical machines without Internet access (HSM support can be added later if it is not there from the beginning)
  • Public Sentries are essentially Observers, so no need for more Observers
  • Can potentially auto-scale Sentry nodes (create new Sentries when an attack is detected)
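
As a rough illustration of the Sentry layout described above, the sketch below derives the P2P settings for a Validator and for one of its Sentries. The keys (`pex`, `persistent_peers`, `private_peer_ids`, `addr_book_strict`) are options from the `[p2p]` section of Tendermint's `config.toml`; the node IDs, addresses and helper names are hypothetical, and the chosen values follow the usual sentry-node guidance rather than any configuration confirmed for DCL.

```python
# Hypothetical peers in "<node-id>@<host>:<port>" form; 26656 is Tendermint's
# default P2P port. Real node IDs come from each node's own identity.
VALIDATOR = "validator-id@10.0.0.10:26656"
SENTRIES = ["sentry-1-id@10.0.0.21:26656", "sentry-2-id@10.0.0.22:26656"]


def node_id(peer):
    """Return the node ID part of a '<node-id>@<host>:<port>' peer string."""
    return peer.split("@", 1)[0]


def validator_p2p(sentries):
    """[p2p] settings for a Validator that talks only to its own Sentries."""
    return {
        "pex": False,  # no peer exchange: the Validator is never gossiped
        "persistent_peers": ",".join(sentries),
        "addr_book_strict": False,  # allow private (non-routable) addresses
    }


def sentry_p2p(validator, other_sentries):
    """[p2p] settings for a Sentry that shields the given Validator."""
    return {
        "pex": True,  # the Sentry discovers and serves other peers
        "persistent_peers": ",".join([validator, *other_sentries]),
        "private_peer_ids": node_id(validator),  # never advertise the Validator
        "addr_book_strict": False,
    }
```

For example, `sentry_p2p(VALIDATOR, SENTRIES[1:])` gives the settings for the first Sentry, which keeps a persistent connection to the Validator and to the second Sentry while keeping the Validator's ID out of peer exchange.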

Why use a separate KMS for Validator Keys:

  • Security best practice: do not keep secrets on the Validator machine, so that the secrets are not exposed if the Validator is compromised
  • In particular, helps to prevent double-signing by Validators (see https://kb.certus.one/hsm.html#double-signing)
  • Please note, though, that double-signing is not as critical for DCL as it is for permissionless proof-of-stake networks (Cosmos). In DCL, nodes don't hold any tokens and don't manage a public reputation or clients. So if a node tries to double-sign, it will simply be slashed (removed from the network). Node Admins and Trustees can investigate the reason later.

Why use an HSM for Validator Keys:

  • The most secure key management
  • Not as critical for DCL as for permissionless proof-of-stake networks (Cosmos); see the previous item.

Options for network protection

https://kb.certus.one/peers.html#private-nodes

  • Option 1: no IPSec/VPN, just whitelist/blacklist via firewall rules
    • Seems enough and quite easy to do
    • We can expect/assume that all IPs are static
    • We don't need encryption at the IP level, as authenticated encryption is done at the Tendermint P2P level in any case
      • Done in for example link, Sections 6.6 and 6.7
  • Option 2: IPSec/VPN
    • Mentioned as an option in https://docs.tendermint.com/master/spec/p2p/node.html#validator-node for validators that trust each other (which is actually our DCL case)
    • May be trickier to configure, especially in a heterogeneous environment (different cloud providers, etc.)
    • May handle IP changes better
    • An additional layer of encryption can be beneficial if there are concerns about Tendermint P2P authenticated encryption
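
The allowlist logic of Option 1 can be sketched as follows. This is only an illustration of the decision a firewall rule would make, assuming static peer addresses as noted above; the addresses come from documentation-only ranges and the names `TRUSTED_PEERS` and `is_allowed` are hypothetical. In a real deployment the check would live in the firewall or cloud security group, not in application code.

```python
import ipaddress

# Hypothetical allowlist of trusted validators and observers. These are
# documentation-only ranges (RFC 5737), not real DCL node addresses.
TRUSTED_PEERS = [
    ipaddress.ip_network("203.0.113.10/32"),   # a single trusted validator
    ipaddress.ip_network("198.51.100.0/28"),   # a partner's sentry subnet
]


def is_allowed(source_ip, allowlist=TRUSTED_PEERS):
    """Return True if an incoming connection's source IP is on the allowlist."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in allowlist)
```

Anything not explicitly listed is rejected, which matches the assumption that a validator accepts incoming connections from trusted validators and observers only.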

Options for Node Discovery

  • Persistent peers between all Validators (or private Sentries if the Validator is behind a Sentry Node)
    • This is how our current TestNet is deployed
    • May need to maintain and update the list of peers
  • One or more Seed nodes that all nodes use for discovery. Such a node can be managed by CSA, for example.
    • All nodes have to trust and rely on that seed node
  • Every Validator starts up its own Seed Node

Options for Account Keys

See https://docs.cosmos.network/master/run-node/keyring.html#available-backends-for-the-keyring

Details

Goals

  1. DDoS Protection
  2. Private Key and secrets security
  3. Trusted relationship (can trust query results, no MITM)
  4. Health and monitoring
  5. Stability and performance
  6. High Availability and scalability

1. DDoS Protection

2. Private Keys and secrets security

3. Trusted relationship

  • [MUST] gRPC/REST over HTTPS (not HTTP)
  • [MUST] Tendermint RPC over HTTPS (not HTTP)
  • [MUST] Clients connect to trusted Observer nodes only. If there is no trusted Observer to connect to, clients should use Tendermint RPC queries and verify proofs via light client
    • A Light Client Proxy Node is supported: clients can run a Proxy node and send all RPC queries to it, and the Proxy verifies the proofs automatically.
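
A client-side guard for the first three requirements might look like the sketch below: it refuses any endpoint that is not HTTPS or whose host is not a known trusted Observer. The host name, the `TRUSTED_OBSERVERS` set and `check_endpoint` are hypothetical; a client without a trusted Observer would instead go through a light client, which this sketch does not cover.

```python
from urllib.parse import urlsplit

# Hypothetical set of Observer hosts this client has decided to trust.
TRUSTED_OBSERVERS = {"observer.example.org"}


def check_endpoint(url, trusted_hosts=TRUSTED_OBSERVERS):
    """Return the URL unchanged if it is HTTPS and points at a trusted Observer.

    Raises ValueError otherwise, so callers fail closed instead of silently
    querying an untrusted or unencrypted endpoint.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"endpoint must use HTTPS: {url}")
    if parts.hostname not in trusted_hosts:
        raise ValueError(f"endpoint is not a trusted Observer: {url}")
    return url
```

The same check applies to gRPC/REST and Tendermint RPC endpoints, since both are required to be served over HTTPS.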

4. Health and monitoring

  • [SHOULD] Monitor logs: ELK stack
  • [SHOULD] Monitor performance: Prometheus, Kibana

5. Stability and performance

  • [MUST] Recommended config
    • disable PEX for private nodes
    • adjust timeouts
  • [SHOULD] State-Sync for new Nodes
  • [SHOULD] Seed Nodes for peer discovery??

6. High Availability and scalability

  • [SHOULD] Multiple Observers (Sentries)
  • [SHOULD] Load Balancers for Observers (Public Sentries)

References
