Lecture Given by Lindsey Kuper on April 10th, 2020 via YouTube
Previous | Next |
---|---|
Lecture 5 | Lecture 7 |
- Sending a message is active: you choose when and if to send a message
- Receiving a message is passive: you cannot choose either when or if a message will arrive. All you can do is react by capturing it
- Delivering a message is active: it is the conscious choice to act upon the contents of a received message.
But why would you want to queue a message before processing it? Typically, because messages need to be processed in the correct order, which could well be different from the order in which they were received.
If a process sends message M2
after M1
, then any process delivering both of these messages must deliver M1
first then M2
.
Failure to do this constitutes a protocol violation as described in the previous lecture as a "FIFO anomaly"
In this case, irrespective of the order in which process Bob
received messages m1
and m2
, it should always deliver message m1
first, followed by message m2
.
In the case that Bob
does not receive one (or either) of these messages, then no FIFO violation could have occurred, because here we are concerned with message delivery, not message receipt.
In real-life, it is unusual to have to implement FIFO delivery yourself because most distributed systems communicate using TCP which already implements FIFO packet delivery.
There are different ways of phrasing this, but one way is to say:
If
m1
's send happens beforem2
's send, thenm1
's delivery must happen beforem2
's delivery
Here's an example of such a violation
In process P1
, event A
happens in the causal history of event B
; therefore, any messages sent from P1
to P2
should be processed in the same causal order as the events that generated them.
But now, let's go back to the "Bob smells" example used in lecture 3
Q: Is this a FIFO violation?
A: No (but only in a vacuous sense...)
The reason is that a FIFO violation only occurs when two messages from the same originating process are delivered out of order by the receiving process.
In the above diagram, there is no single process that sends two distinct messages to the same receiving process. Here:
Alice
sends a single message toBob
Alice
sends a single message toCarol
Bob
sends a single message toAlice
Bob
sends a single message toCarol
Carol
is confused...
However, this scenario is still a causal anomaly. In general, at least three communicating processes are required to create a causal anomaly.
Yes. Messages can be sent either to a group of participants in a network (a multicast message), or to all participants in a network (a broadcast message). The idea of broadcast messages is something that will be dealt with later.
This is another correctness property.
If a process delivers message
M1
followed byM2
, then all processes delivering bothM1
andM2
must deliverM1
first followed byM2
.
Let's say we have two client processes C1
and C2
that each broadcast a message to two processes R1
and R2
.
In this scenario, processes R1
and R2
each maintain their own replica of some key/value store.
If processes R1
and R2
do not deliver the messages in the correct order, then we will encounter a violation that results in the replicas disagreeing with each other as to what the value of x
should be.
In other words, this violation creates an inconsistency between data replicas.
This is known as a Total-Order Anomaly and is created when process R1
delivers message m1
followed by m2
, but process R2
delivers message m2
followed by m1
.
Since we know that causal delivery also ensures FIFO delivery, we can start to arrange these delivery strategies in a hierarchy, with the weakest at the bottom.
Here, we will use the term YOLO
to indicate the delivery guarantee that makes no guarantees!
Where would Totally Ordered Delivery fit in to this scheme?
In fact, it would get its own branch because a FIFO anomaly is not necessarily an anomaly as far as Totally-Ordered Delivery is concerned.
We recall that a FIFO anomaly is the following:
But since the definition of Totally-Ordered Delivery says that all processes delivering both m1
and m2
must do so in a consistent order, the above FIFO anomaly is not an anomaly for Totally-Ordered Delivery because there is only one receiving process.
So the order in which that process delivers the messages is immaterial.
Thus, this scenario only vacuously conforms to a Totally-Ordered Delivery.
Conversely, Totally-Ordered Delivery violations are not necessarily FIFO violations.
This hierarchy helps us understand what we can and cannot expect out of a particular delivery guarantee. For instance, if we implement a system guaranteeing causal delivery, then in doing so, we would also be guaranteeing FIFO delivery, because FIFO delivery sits directly below Causal delivery in the hierarchy. However, if we implemented a FIFO delivery system, we could make no guarantees about causal delivery.
Similarly, if the system implements Totally-Ordered Delivery, then this guarantee, in and of itself, cannot ensure either FIFO or Causal delivery.
Turning this argument around, we can also gain an understanding of what type of anomalies can occur. For instance, if we have a FIFO anomaly, then this is also going to be a causal anomaly, but not necessarily a Totally-Ordered anomaly.
The rule here is that any process P2
delivering messages from some other process P1
, must do so in the order that P1
sent those messages; which, due to variations in network latency, might well be different from the order in which those messages arrive at P2
.
How then would we go about eliminating FIFO delivery anomalies?
One possibility is to use sequence numbers. This is where all messages from a given sender are tagged with a sequence number and a sender id. Each time a message is sent, the sender increments its sequence number. On the receiver's side, all the messages from a given sender are added to a queue ordered by the sequence number. When all the messages have arrived, the receiver can then deliver them in the correct order.
In this case, sequence numbers do not need to be unique across all the processes, because each message is also qualified with a sender id; therefore, it is the combination of the sender id and the sequence number that allows the receiver to discriminate who sent which message and in what order.
Problems with Sequence Numbers
What happens if a message is lost? Consider the following sequence of events:
Alice
sends three messages to Bob:m1
,m2
andm3
- Bob receives and correctly delivers messages
m1
andm2
- For some reason, Bob never received message
m3
- Unaware the message
m3
never arrived,Alice
sends messagesm4
andm5
whichBob
receives - However, because
Bob
is still waiting for the message with sequence number3
to arrive, he will delay the delivery of any subsequent messages (by adding them to a queue) - In this situation,
Bob
maye well end up waiting forever for the lost message to be delivered
Consequently, in a network where message delivery is unreliable, a naïve sequence number strategy like this will break as soon as message delivery fails for some reason.
Strategies to mitigate these problems could include:
- Buffering out of sequence messages for a pre-determined period of time, hoping that the late message arrives either before the message buffer fills or the pre-determined timeout expires
- Processing out of sequence messages on the assumption that the intervening message is lost. If this assumption turns out to be false and the message delivery was simply delayed, then the late message would have to be dropped
Neither of the above strategies are very good in that they tend to create more problems than they solve...
Now consider this situation. Are the conditions of FIFO delivery satisfied?
Yes, but only in a vacuous sense.
Due to the fact that Bob
drops all the messages he receives, zero messages are delivered; therefore, the conditions of FIFO delivery are vacuously satisfied.
In this approach, upon receipt of a message, every receiver must send a "message received" acknowledgment (such as ack
) back to the sender.
So, when Alice
sends a message to Bob
, neither Alice
nor Bob
need concern themselves with sequence numbers.
However, this approach has several distinct drawbacks:
- Communication now becomes sequential.
Alice
cannot sendm2
toBob
until she has received anack
fromBob
that he has receivedm1
- Increases the volume of network traffic
- We are still dependent upon a network that can guarantee reliable message delivery
One way of making this approach to communication more efficient is to gather messages into batches, thus decreasing the granularity of communication.
Let’s look again at the Causal Anomaly situation:
The problem here is that Carol
delivers the message she receives from Bob
out of causal order, thus resulting in confusion...
Here, we can use vector clocks to solve causal anomalies.
In the previous lecture, we looked at using vector clocks to count both message-send and -receive events; but in order to ensure Causal Delivery, it turns out that we only need to count message send events.
Before sending the message, Alice
updates her vector clock to [1,0,0]
Bob
receives the message from Alice
.
Q: Should he deliver it?
A: Yes, he has no reason not to.
Bob
delivers the message and discovers that it is not to his liking.
But, since Bob
has the emotional maturity of an eight-year-old, he fails to realise that soap and water will work far better than trading insults; so, he resorts to telling the world what he thinks of Alice
.
In delivering this message, Bob
examines the vector clock of the incoming message and discovers that its less than his, but only by the counter in Alice
's position.
This is to be expected, since the message came from Alice
.
He therefore uses the received vector clock to update his own vector clock, and then increments his position in the vector clock.
Since a broadcast message is treated as a single send event to multiple recipients, the same vector clock value of [1,1,0]
is sent as part of the messages to both Alice
and Carol
.
The message now arrives at Alice
.
Q: Should she deliver it?
A: Yes, she has no reason not to.
Alice
's vector clock is [1,0,0]
and the incoming vector clock on the message differs only by 1
in Bob
's position.
So, we can conclude that only one event has taken place since our last message send event, and that event happened in process Bob
from whom we received this message.
Bob
's message also arrives at Carol with vector clock [1,1,0]
, but earlier than Alice
's original message.
Q: Should she deliver it?
A: No — look at the vector clock values!
The reason is that compared to Carol
's vector clock (which is still set to [0,0,0]
), the vector clock on the incoming message is too big.
It’s fine for Bob
's position to be set to 1
because this is one bigger than Carol
's vector clock position for Bob
and the message came from Bob
.
But there's a 1
in Alice
's vector clock position.
Q: Hmmmm, that's odd. Where did that come from?
A: The value comes from the fact that this message is the response to some event that has taken place in Alice
, but that Carol doesn't yet know about.
In other words, as far as Carol
is concerned, this is a message from the future that has arrived too early and must therefore be buffered.
Finally, Alice
's original "Bob smells"
message arrives at Carol
.
Carol
now examines this message's vector clock and discovers that it has the expected value of [1,0,0]
; therefore, it is fine to deliver this message first.
Once this out-of-sequence message has been delivered, the message waiting in the buffer can be delivered because Carol
has now caught up with the event that took place in Alice
.
Carol
is no longer confused...
Previous | Next |
---|---|
Lecture 5 | Lecture 7 |