- Many copies of data to scale horizontally and serve more reads
- Ways of replication
- Leader-follower: writes happen first on leader, then propagated to followers.
- Multi-leader
- Leaderless
- Types of replication:
- Synchronous: leader sends write and blocks writes till follower ack
- Asynchronous: leader sends write and continues with further writes
- Semi-synchronous: one follower is sync, rest async. If leader fails, the sync follower becomes leader.
- Setting up new followers: snapshot leader, launch follower, request changes since snapshot
- Node outage: catch up using on disk log
- Leader failover:
- Failover detection: polling. Poll frequency needs to account for load spike.
- New leader selection: election, controller node. Best candidate is with most writes.
- Reconfigure system to use new leader: old leader coming back causes split brain.
- Use manual failover to circumvent above subtleties.
- Replication log implementations: sequence of append-only bytes with all writes to db
- Statement-based: CRUD statements
- Write-Ahead Log (WAL) based: SStable/LSM Trees track sequence of bytes being edited in db. Tightly coupled with underlying storage engine.
- Logical (row) based: rows and columns affected.
- Trigger-based : execute custom application code (triggers & stored procedures) when changes to db occur.
- Eventual consistency : after a period of time, all data replicas will converge to the same state if no updates are made.
- Read-your-writes/Read-after-write consistency : users reads should be immediately visible.
- Monotonic reads: reader shouldn't read older reads after reading newer reads.
- Consistent prefix reads: reads must appear in the same order as writes.
- Multi-leader replication: write conflicts
- Conflict avoidance: write application code such that write conflicts do not occur. E.g. all writes of a user go through the same node.
- Converge writes:
- Last write wins (LWW)
- Prioritize nodes, higher priority node's write wins
- Merge conflicts
- Record conflict, fix on later read/write (possibly prompting user)
- Multi-leader replication topologies: how should writes propagate in nodes.
- All-to-all
- Circular
- Star/tree
- Leaderless replication: reads and writes are sent in parallel to many nodes, quorum response is taken.
- Read repair: if different read values are returned by different nodes, correct the nodes with wrong read values.
- Anti-entropy process: have background process checking differences in data across nodes, copy missing data. No guarantees on data propagation delays.
- Common issue:
- Sloppy-quorum: writes and reads ending up on different, non-overlapping nodes
- Write conflicts
- Concurrent reads and writes
- Partially succeeded writes