DCL MainNet Deployment
This is WIP
DCL is in a better position regarding DDoS protection than public Cosmos networks.
DCL is a permissioned network consisting of a limited set of trusted or semi-trusted nodes (validators and observers), and there is no requirement to make all nodes public to anyone in the world (a company may make its nodes accessible to that company's applications only). Public Cosmos networks, by contrast, are permissionless: they may have any number of nodes, and those nodes need to be public to anyone in the world.
Moreover, DCL nodes do not compete to propose blocks; they don't play the "game of stake". There is no tokenomics in the permissioned DCL network (at least so far). So, if one node dies or is unavailable for some time, it is not a catastrophe: a Node Admin can fix and repair it. In a staking-based network (Cosmos), however, a crashed or unavailable node can be a problem, as the node cannot propose new blocks and may lose its "clients" (delegators) and their tokens.
In other words, DCL is more collaborative, while staking-based networks (like Cosmos) usually consist of competing entities.
All options assume that the validator node is not public and accepts incoming connections from trusted validators and observers only (see Options for network protection)
- Option 1: Cloud, no HSM
- Option 1A: no Sentry, private keys and secrets at the Validator machine
- Option 1B: with Sentry, private keys and secrets at the Validator machine
- Option 1C: no Sentry, private keys and secrets are not at Validator machine (tmkms, HashiCorp Vault)
- Option 1D: with Sentry, private keys and secrets are not at Validator machine (tmkms, HashiCorp Vault)
- Option 2: Physical machine, HSM, with Sentries
Option 2 (Physical machine, HSM, with Sentries) or Option 1B (Cloud, no HSM, with Sentry, private keys and secrets at the Validator machine).
Why use Sentries:
- Harder to DDoS the real Validator node (in case malicious Validators are present)
- Hides the real Validator node's IP, so the real validator is harder to attack
- Can support HSM and Validators on physical machines without Internet access (if not from the beginning, HSM support can be added in the future)
- Public Sentries are essentially Observers, so no need for more Observers
- Can potentially auto-scale Sentry nodes (create new Sentries when attack is detected)
- though it's not that simple, see https://kb.certus.one/peers.html#sentry-auto-scaling
Why use separate KMS for Validator Keys:
- Security best practice: do not keep secrets on the Validator machine, so that if the Validator is compromised, the secrets are not exposed
- In particular, helps to prevent double-signing by Validators (see https://kb.certus.one/hsm.html#double-signing)
- Please note, though, that double signing is less critical for DCL than for permissionless proof-of-stake networks (Cosmos). In DCL, nodes don't hold any tokens and don't manage public reputation or clients. So, if a node tries to double sign, it will simply be slashed (removed from the network). Later on, Node Admins and Trustees can investigate the reason.
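The double-signing protection mentioned above can be illustrated with a minimal sketch. It assumes a signer that remembers the last (height, round, step) it signed and refuses anything that regresses or conflicts with it. The class and method names (`SignGuard`, `may_sign`) are hypothetical and are not the tmkms API; a real signer would also persist this state to disk before releasing a signature.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SignGuard:
    """Hypothetical in-memory guard; a real KMS must persist this state."""

    last_hrs: Optional[Tuple[int, int, int]] = None  # last (height, round, step) signed
    last_payload: Optional[bytes] = None             # bytes signed at last_hrs

    def may_sign(self, height: int, round_: int, step: int, payload: bytes) -> bool:
        """Return True and record the request if signing it cannot be a double sign."""
        hrs = (height, round_, step)
        if self.last_hrs is not None:
            # Tuples compare lexicographically: height first, then round, then step.
            if hrs < self.last_hrs:
                return False  # going backwards in consensus time
            if hrs == self.last_hrs and payload != self.last_payload:
                return False  # same step, different data: a double sign
        self.last_hrs = hrs
        self.last_payload = payload
        return True


guard = SignGuard()
print(guard.may_sign(10, 0, 1, b"block-A"))  # first request at this step
print(guard.may_sign(10, 0, 1, b"block-B"))  # conflicting request at the same step
```

Keeping this state next to the key (in the KMS rather than on the Validator machine) means that a second, accidentally started validator instance cannot obtain a conflicting signature.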
Why use HSM for Validator Keys:
- The most secure key management
- Less critical for DCL than for permissionless proof-of-stake networks (Cosmos); see the previous item.
https://kb.certus.one/peers.html#private-nodes
- Option 1: no IPSec/VPN, just whitelist/blacklist via firewall rules
- Seems enough and quite easy to do
- We can expect/assume that all IPs are static
- We don't need encryption at the IP level, as authenticated encryption is done at the Tendermint P2P level in any case
- Done in for example link, Sections 6.6 and 6.7
- Option 2: IPSec/VPN
- Mentioned as an option in https://docs.tendermint.com/master/spec/p2p/node.html#validator-node for validators that trust each other (actually our DCL case)
- May be trickier to configure, especially in a heterogeneous environment (different cloud providers etc.)
- May handle IP changes better
- An additional layer of encryption can be beneficial if there are concerns about Tendermint P2P authenticated encryption
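As an illustration of Option 1 (firewall whitelisting), the sketch below generates `iptables` commands that accept P2P connections only from a list of trusted peer IPs and drop everything else on that port. It assumes static IPv4 addresses and Tendermint's default P2P port 26656. The function name and the sample addresses are hypothetical, and the script only prints the commands rather than applying them.

```python
import ipaddress
from typing import Iterable, List

P2P_PORT = 26656  # Tendermint's default P2P port; adjust if your nodes use another


def whitelist_rules(peer_ips: Iterable[str], port: int = P2P_PORT) -> List[str]:
    """Build iptables commands: ACCEPT each trusted peer, then DROP the rest."""
    rules = []
    for ip in peer_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
        if addr.version != 4:
            raise ValueError(f"iptables handles IPv4 only, got {ip}")
        rules.append(f"iptables -A INPUT -p tcp -s {addr} --dport {port} -j ACCEPT")
    # Rules are evaluated in order, so the final DROP applies to everyone else.
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    # Sample addresses are placeholders, not real DCL nodes.
    for rule in whitelist_rules(["10.0.0.11", "10.0.0.12"]):
        print(rule)
```

Cloud deployments would more likely express the same allow-list through the provider's security groups; the ordering principle (explicit accepts, then a default deny) stays the same.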
- Persistent peers between all Validators (or private Sentries if Validator is behind a Sentry Node)
- This is how our current TestNet is deployed
- May need to maintain and update the list of peers
- One or multiple Seed nodes that all nodes use for discovery. The node can be managed by CSA for example.
- All nodes have to trust and rely on that seed node
- Every Validator starts up its own Seed Node
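For the persistent-peers approach, the peer list that has to be maintained is a comma-separated string of `node_id@host:port` entries in Tendermint's `config.toml`. The sketch below builds that string from a small inventory. The helper name and the sample node IDs and hosts are hypothetical; the check assumes node IDs are 40 lowercase hexadecimal characters.

```python
from typing import Iterable, Tuple

HEX_DIGITS = set("0123456789abcdef")


def persistent_peers(peers: Iterable[Tuple[str, str, int]]) -> str:
    """Join (node_id, host, port) tuples into a persistent_peers value."""
    entries = []
    for node_id, host, port in peers:
        # Assumed format: 40 lowercase hex characters.
        if len(node_id) != 40 or not set(node_id) <= HEX_DIGITS:
            raise ValueError(f"unexpected node id: {node_id!r}")
        entries.append(f"{node_id}@{host}:{port}")
    return ",".join(entries)


if __name__ == "__main__":
    # Placeholder inventory; real IDs come from each node's own key.
    inventory = [
        ("a" * 40, "10.0.0.11", 26656),
        ("b" * 40, "validator2.example.com", 26656),
    ]
    print(f'persistent_peers = "{persistent_peers(inventory)}"')
```

Generating the value from a shared inventory keeps every node's list consistent when a peer is added or its address changes.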
- See https://docs.cosmos.network/master/run-node/keyring.html#available-backends-for-the-keyring
- Ledger Nano is supported (though not tested): see https://hub.cosmos.network/main/resources/ledger.html#gaia-cli-ledger-nano and replace `gaia` by `dcl` there.
- DDoS Protection
- Private Key and secrets security
- Trusted relationship (can trust query results, no MITM)
- Health and monitoring
- Stability and performance
- High Availability and scalability
- [MUST] No Public Validator nodes (Validator nodes allow incoming connections from other Validator nodes only)
- https://kb.certus.one/peers.html#private-nodes
- VPN/IPSec for all validators and observers. See https://docs.tendermint.com/master/spec/p2p/node.html#sentry-node
- Firewall rules to blacklist/whitelist validators and observers
- [SHOULD] Sentry Nodes
- Optional if Validators are deployed in the Cloud
- Must-have for Validators deployed in a Data Center
- Not a must-have for a permissioned network (such as DCL), unlike permissionless networks (such as Cosmos)
- Can give additional protection for Validator nodes.
- Can hide Validator node's IP address.
- Two types of Sentries
- Private: not publicly available (like Validators without Sentries), connected to other Sentries/Validators only
- Public - Observers
- [MUST] Cloud-specific DDoS protection for Sentry and Validator nodes
- [SHOULD] Stateful firewall and Network (TCP) Load Balancers: move DDoS handling closer to the provider's edge
- Cosmos/Tendermint architecture helps to prevent some kinds of DDoS:
- Only valid txns are broadcast to other nodes
- Read requests are not broadcast to other nodes
- Tendermint/Cosmos TPS is quite high
- An attacker needs to attack a lot of Observer Nodes (ONs)
- It is possible to prevent random ONs from connecting to your ON
- [MUST] Proper keyring backend for user account keys
- [SHOULD] Do not hold Validator private keys at the Validator Machine
- [SHOULD] HSM for Validator Keys
- YubiHSM2 for example
- It's possible to use a software one, but this is not recommended for production
- AWS CloudHSM is not an option as it doesn't support ed25519 (which is the default for Cosmos apps)
- [SHOULD] HashiCorp Vault for secrets
- [MUST] gRPC/REST over HTTPS (not HTTP)
- [MUST] Tendermint RPC over HTTPS (not HTTP)
- [MUST] Clients connect to trusted Observer nodes only. If there is no trusted Observer to connect to, clients should use Tendermint RPC queries and verify proofs via light client
- There is support for Light Client Proxy Node, so that clients can run a Proxy node, send all RPC queries to that Proxy, and the Proxy will verify the proofs automatically.
- [SHOULD] Monitor logs: ELK stack
- [SHOULD] Monitor performance: Prometheus, Kibana
- [MUST] Recommended config
- disable PEX for private nodes
- adjust timeouts
- [SHOULD] State-Sync for new Nodes
- [SHOULD] Seed Nodes for peer discovery??
- [SHOULD] Multiple Observers (Sentries)
- [SHOULD] Load Balancers for Observers (Public Sentries)
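The "disable PEX for private nodes" recommendation above could be sketched as a small generator for the `[p2p]` section of Tendermint's `config.toml`. The keys `pex`, `persistent_peers` and `private_peer_ids` are existing Tendermint options. The role names, function names and sample values are assumptions made for illustration, and the exact set of overrides should be checked against the Tendermint version in use.

```python
from typing import Dict, Union

Value = Union[bool, str]


def p2p_overrides(role: str, validator_id: str = "", peers: str = "") -> Dict[str, Value]:
    """Return [p2p] overrides for a private validator or a sentry in front of it."""
    if role == "validator":
        # Private node: no peer exchange, talk only to the listed peers.
        return {"pex": False, "persistent_peers": peers}
    if role == "sentry":
        # Sentry: peer exchange stays on, but the validator's ID is never gossiped.
        return {"pex": True, "persistent_peers": peers, "private_peer_ids": validator_id}
    raise ValueError(f"unknown role: {role!r}")


def render_toml(settings: Dict[str, Value]) -> str:
    """Render the overrides as a [p2p] TOML fragment."""
    lines = ["[p2p]"]
    for key, value in settings.items():
        if isinstance(value, bool):
            lines.append(f"{key} = {'true' if value else 'false'}")
        else:
            lines.append(f'{key} = "{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder peer entry, not a real node.
    sentry_peer = "a" * 40 + "@10.0.0.21:26656"
    print(render_toml(p2p_overrides("validator", peers=sentry_peer)))
```

Timeouts are left out on purpose: suitable values depend on the network and are not specified here.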
- https://docs.tendermint.com/master/nodes/
- https://docs.google.com/document/d/e/2PACX-1vQXb1kd0zqYT8K4B4XYb-lrlfRIuPDXsgiTjj94gDOjw3ezEUAtjvxR8yfbKJypmioKeGRrhkLCtZog/pub
- https://kb.certus.one/
- https://medium.com/@kidinamoto/tech-choices-for-cosmos-validators-27c7242061ea
- https://medium.com/@kidinamoto/key-management-choices-for-cosmos-validators-29b910af23c0
- https://medium.com/@kidinamoto/setup-cosmos-validator-relay-network-6b6e63661100