Skip to content

Commit 3e93ba6

Browse files
Improve our various round-trip timeout computations. (#33587)
We had a few issues: 1) Our "round-trip timeout" only accounted for one side of the round-trip needing to do MRP retransmits. So if the sender retried a few times, the last retry finally got through, then the response had to be retried a well, the sender would time out the exchange before the response was received. The fix here is to add the MRP backoff times for both the initial message and the response. 2) ReadClient could end up timing out a subscription before the server had actually given up on receiving a StatusReport in response to its ReportData, in situations where the server ended up doing MRP retries and the last MRP retry took a few seconds to get through the network. The fix here is to just estimate how long the server will be waiting for the StatusReport and not time out the subscription until then; at that point even if the server did in fact send its report on time, it will have dropped the subscription on its end.
1 parent b81370b commit 3e93ba6

10 files changed

+128
-25
lines changed

src/app/OperationalSessionSetup.cpp

+5
Original file line numberDiff line numberDiff line change
@@ -792,6 +792,11 @@ CHIP_ERROR OperationalSessionSetup::ScheduleSessionSetupReattempt(System::Clock:
792792
// but in practice for old devices BUSY often sends some hardcoded value
793793
// that tells us nothing about when the other side will decide it has
794794
// timed out.
795+
//
796+
// Unfortunately, we do not have the MRP config for the other side here,
797+
// but in practice if the other side is using its local config to
798+
// compute Sigma2 response timeouts, then it's also returning useful
799+
// values with BUSY, so we will wait long enough.
795800
auto additionalTimeout = CASESession::ComputeSigma2ResponseTimeout(GetLocalMRPConfig().ValueOr(GetDefaultMRPConfig()));
796801
actualTimerDelay += additionalTimeout;
797802
}

src/app/ReadClient.cpp

+17-15
Original file line numberDiff line numberDiff line change
@@ -938,24 +938,26 @@ CHIP_ERROR ReadClient::ComputeLivenessCheckTimerTimeout(System::Clock::Timeout *
938938

939939
//
940940
// To calculate the duration we're willing to wait for a report to come to us, we take into account the maximum interval of
941-
// the subscription AND the time it takes for the report to make it to us in the worst case. This latter bit involves
942-
// computing the Ack timeout from the publisher for the ReportData message being sent to us using our IDLE interval as the
943-
// basis for that computation.
941+
// the subscription AND the time it takes for the report to make it to us in the worst case.
944942
//
945-
// Make sure to use the retransmission computation that includes backoff. For purposes of that computation, treat us as
946-
// active now (since we are right now sending/receiving messages), and use the default "how long are we guaranteed to stay
947-
// active" threshold for now.
943+
// We have no way to estimate what the network latency will be, but we do know the other side will time out its ReportData
944+
// after its computed round-trip timeout plus the processing time it gives us (app::kExpectedIMProcessingTime). Once it
945+
// times out, assuming it sent the report at all, there's no point in us thinking we still have a subscription.
948946
//
949-
// TODO: We need to find a good home for this logic that will correctly compute this based on transport. For now, this will
950-
// suffice since we don't use TCP as a transport currently and subscriptions over BLE aren't really a thing.
947+
// We can't use ComputeRoundTripTimeout() on the session for two reasons: we want the roundtrip timeout from the point of
948+
// view of the peer, not us, and we want to start off with the assumption the peer will likely have, which is that we are
949+
// idle, whereas ComputeRoundTripTimeout() uses the current activity state of the peer.
951950
//
952-
const auto & localMRPConfig = GetLocalMRPConfig();
953-
const auto & defaultMRPConfig = GetDefaultMRPConfig();
954-
const auto & ourMrpConfig = localMRPConfig.ValueOr(defaultMRPConfig);
955-
auto publisherTransmissionTimeout =
956-
GetRetransmissionTimeout(ourMrpConfig.mActiveRetransTimeout, ourMrpConfig.mIdleRetransTimeout,
957-
System::SystemClock().GetMonotonicTimestamp(), ourMrpConfig.mActiveThresholdTime);
958-
*aTimeout = System::Clock::Seconds16(mMaxInterval) + publisherTransmissionTimeout;
951+
// So recompute the round-trip timeout directly. Assume MRP, since in practice that is likely what is happening.
952+
auto & peerMRPConfig = mReadPrepareParams.mSessionHolder->GetRemoteMRPConfig();
953+
// Peer will assume we are idle (hence we pass kZero to GetMessageReceiptTimeout()), but will assume we treat it as active
954+
// for the response, so to match the retransmission timeout computation for the message back to the peeer, we should treat
955+
// it as active.
956+
auto roundTripTimeout = mReadPrepareParams.mSessionHolder->GetMessageReceiptTimeout(System::Clock::kZero) +
957+
kExpectedIMProcessingTime +
958+
GetRetransmissionTimeout(peerMRPConfig.mActiveRetransTimeout, peerMRPConfig.mIdleRetransTimeout,
959+
System::SystemClock().GetMonotonicTimestamp(), peerMRPConfig.mActiveThresholdTime);
960+
*aTimeout = System::Clock::Seconds16(mMaxInterval) + roundTripTimeout;
959961
return CHIP_NO_ERROR;
960962
}
961963

src/darwin/Framework/CHIPTests/MTRControllerTests.m

+4
Original file line numberDiff line numberDiff line change
@@ -1557,6 +1557,10 @@ - (void)testSetMRPParameters
15571557
// Can be called before starting the factory
15581558
XCTAssertFalse(MTRDeviceControllerFactory.sharedInstance.running);
15591559
MTRSetMessageReliabilityParameters(@2000, @2000, @2000, @2000);
1560+
1561+
// Now reset back to the default state, so timings in other tests are not
1562+
// affected.
1563+
MTRSetMessageReliabilityParameters(nil, nil, nil, nil);
15601564
}
15611565

15621566
@end

src/darwin/Framework/CHIPTests/MTRPerControllerStorageTests.m

+4
Original file line numberDiff line numberDiff line change
@@ -2140,6 +2140,10 @@ - (void)testSetMRPParametersWithRunningController
21402140
XCTAssertTrue(controller.running);
21412141
MTRSetMessageReliabilityParameters(@2000, @2000, @2000, @2000);
21422142
[controller shutdown];
2143+
2144+
// Now reset back to the default state, so timings in other tests are not
2145+
// affected.
2146+
MTRSetMessageReliabilityParameters(nil, nil, nil, nil);
21432147
}
21442148

21452149
// TODO: This might also want to go in a separate test file, with some shared setup for commissioning devices per test

src/protocols/secure_channel/CASESession.cpp

+23-8
Original file line numberDiff line numberDiff line change
@@ -2362,21 +2362,36 @@ CHIP_ERROR CASESession::OnMessageReceived(ExchangeContext * ec, const PayloadHea
23622362
return err;
23632363
}
23642364

2365-
System::Clock::Timeout CASESession::ComputeSigma1ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig)
2365+
namespace {
2366+
System::Clock::Timeout ComputeRoundTripTimeout(ExchangeContext::Timeout serverProcessingTime,
2367+
const ReliableMessageProtocolConfig & remoteMrpConfig)
23662368
{
2369+
// TODO: This is duplicating logic from Session::ComputeRoundTripTimeout. Unfortunately, it's called by
2370+
// consumers who do not have a session.
2371+
const auto & maybeLocalMRPConfig = GetLocalMRPConfig();
2372+
const auto & defaultMRRPConfig = GetDefaultMRPConfig();
2373+
const auto & localMRPConfig = maybeLocalMRPConfig.ValueOr(defaultMRRPConfig);
23672374
return GetRetransmissionTimeout(remoteMrpConfig.mActiveRetransTimeout, remoteMrpConfig.mIdleRetransTimeout,
2368-
// Assume peer is idle, since that's what we
2369-
// will assume for our initial message.
2375+
// Assume peer is idle, as a worst-case assumption (probably true for
2376+
// Sigma1, since that will be our initial message on the session, but less
2377+
// so for Sigma2).
23702378
System::Clock::kZero, remoteMrpConfig.mActiveThresholdTime) +
2371-
kExpectedSigma1ProcessingTime;
2379+
serverProcessingTime +
2380+
GetRetransmissionTimeout(localMRPConfig.mActiveRetransTimeout, localMRPConfig.mIdleRetransTimeout,
2381+
// Peer will assume we are active, since it's
2382+
// responding to our message.
2383+
System::SystemClock().GetMonotonicTimestamp(), localMRPConfig.mActiveThresholdTime);
2384+
}
2385+
} // anonymous namespace
2386+
2387+
System::Clock::Timeout CASESession::ComputeSigma1ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig)
2388+
{
2389+
return ComputeRoundTripTimeout(kExpectedSigma1ProcessingTime, remoteMrpConfig);
23722390
}
23732391

23742392
System::Clock::Timeout CASESession::ComputeSigma2ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig)
23752393
{
2376-
return GetRetransmissionTimeout(remoteMrpConfig.mActiveRetransTimeout, remoteMrpConfig.mIdleRetransTimeout,
2377-
// Assume peer is idle, as a worst-case assumption.
2378-
System::Clock::kZero, remoteMrpConfig.mActiveThresholdTime) +
2379-
kExpectedHighProcessingTime;
2394+
return ComputeRoundTripTimeout(kExpectedHighProcessingTime, remoteMrpConfig);
23802395
}
23812396

23822397
bool CASESession::InvokeBackgroundWorkWatchdog()

src/transport/GroupSession.h

+14
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,13 @@ class IncomingGroupSession : public Session, public ReferenceCounted<IncomingGro
7575
return System::Clock::Timeout();
7676
}
7777

78+
System::Clock::Milliseconds32 GetMessageReceiptTimeout(System::Clock::Timestamp ourLastActivity) const override
79+
{
80+
// There are no timeouts for group sessions.
81+
VerifyOrDie(false);
82+
return System::Clock::Timeout();
83+
}
84+
7885
GroupId GetGroupId() const { return mGroupId; }
7986

8087
private:
@@ -127,6 +134,13 @@ class OutgoingGroupSession : public Session, public ReferenceCounted<OutgoingGro
127134
return System::Clock::Timeout();
128135
}
129136

137+
System::Clock::Milliseconds32 GetMessageReceiptTimeout(System::Clock::Timestamp ourLastActivity) const override
138+
{
139+
// There are no timeouts for group sessions.
140+
VerifyOrDie(false);
141+
return System::Clock::Timeout();
142+
}
143+
130144
GroupId GetGroupId() const { return mGroupId; }
131145

132146
private:

src/transport/SecureSession.h

+21
Original file line numberDiff line numberDiff line change
@@ -179,6 +179,27 @@ class SecureSession : public Session, public ReferenceCounted<SecureSession, Sec
179179
return System::Clock::Timeout();
180180
}
181181

182+
System::Clock::Milliseconds32 GetMessageReceiptTimeout(System::Clock::Timestamp ourLastActivity) const override
183+
{
184+
switch (mPeerAddress.GetTransportType())
185+
{
186+
case Transport::Type::kUdp: {
187+
const auto & maybeLocalMRPConfig = GetLocalMRPConfig();
188+
const auto & defaultMRRPConfig = GetDefaultMRPConfig();
189+
const auto & localMRPConfig = maybeLocalMRPConfig.ValueOr(defaultMRRPConfig);
190+
return GetRetransmissionTimeout(localMRPConfig.mActiveRetransTimeout, localMRPConfig.mIdleRetransTimeout,
191+
ourLastActivity, localMRPConfig.mActiveThresholdTime);
192+
}
193+
case Transport::Type::kTcp:
194+
return System::Clock::Seconds16(30);
195+
case Transport::Type::kBle:
196+
return System::Clock::Milliseconds32(BTP_ACK_TIMEOUT_MS);
197+
default:
198+
break;
199+
}
200+
return System::Clock::Timeout();
201+
}
202+
182203
const PeerAddress & GetPeerAddress() const { return mPeerAddress; }
183204
void SetPeerAddress(const PeerAddress & address) { mPeerAddress = address; }
184205

src/transport/Session.cpp

+4-1
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,10 @@ System::Clock::Timeout Session::ComputeRoundTripTimeout(System::Clock::Timeout u
7070
{
7171
return System::Clock::kZero;
7272
}
73-
return GetAckTimeout() + upperlayerProcessingTimeout;
73+
74+
// Treat us as active for purposes of GetMessageReceiptTimeout(), since the
75+
// other side would be responding to our message.
76+
return GetAckTimeout() + upperlayerProcessingTimeout + GetMessageReceiptTimeout(System::SystemClock().GetMonotonicTimestamp());
7477
}
7578

7679
uint16_t Session::SessionIdForLogging() const

src/transport/Session.h

+15-1
Original file line numberDiff line numberDiff line change
@@ -244,7 +244,21 @@ class Session
244244
virtual bool AllowsLargePayload() const = 0;
245245
virtual const SessionParameters & GetRemoteSessionParameters() const = 0;
246246
virtual System::Clock::Timestamp GetMRPBaseTimeout() const = 0;
247-
virtual System::Clock::Milliseconds32 GetAckTimeout() const = 0;
247+
// GetAckTimeout is the estimate for how long it could take for the other
248+
// side to receive our message (accounting for our MRP retransmits if it
249+
// gets lost) and send a response.
250+
virtual System::Clock::Milliseconds32 GetAckTimeout() const = 0;
251+
252+
// GetReceiptTimeout is the estimate for how long it could take for us to
253+
// receive a message after the other side sends it, accounting for the MRP
254+
// retransmits the other side might do if the message gets lost.
255+
//
256+
// The caller is expected to provide an estimate for when the peer would
257+
// last have heard from us. The most likely values to pass are
258+
// System::SystemClock().GetMonotonicTimestamp() (to indicate "peer is
259+
// responding to a message it just received") and System::Clock::kZero (to
260+
// indicate "peer is reaching out to us, not in response to anything").
261+
virtual System::Clock::Milliseconds32 GetMessageReceiptTimeout(System::Clock::Timestamp ourLastActivity) const = 0;
248262

249263
const ReliableMessageProtocolConfig & GetRemoteMRPConfig() const { return GetRemoteSessionParameters().GetMRPConfig(); }
250264

src/transport/UnauthenticatedSessionTable.h

+21
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,27 @@ class UnauthenticatedSession : public Session, public ReferenceCounted<Unauthent
107107
return System::Clock::Timeout();
108108
}
109109

110+
System::Clock::Milliseconds32 GetMessageReceiptTimeout(System::Clock::Timestamp ourLastActivity) const override
111+
{
112+
switch (mPeerAddress.GetTransportType())
113+
{
114+
case Transport::Type::kUdp: {
115+
const auto & maybeLocalMRPConfig = GetLocalMRPConfig();
116+
const auto & defaultMRRPConfig = GetDefaultMRPConfig();
117+
const auto & localMRPConfig = maybeLocalMRPConfig.ValueOr(defaultMRRPConfig);
118+
return GetRetransmissionTimeout(localMRPConfig.mActiveRetransTimeout, localMRPConfig.mIdleRetransTimeout,
119+
ourLastActivity, localMRPConfig.mActiveThresholdTime);
120+
}
121+
case Transport::Type::kTcp:
122+
return System::Clock::Seconds16(30);
123+
case Transport::Type::kBle:
124+
return System::Clock::Milliseconds32(BTP_ACK_TIMEOUT_MS);
125+
default:
126+
break;
127+
}
128+
return System::Clock::Timeout();
129+
}
130+
110131
NodeId GetPeerNodeId() const
111132
{
112133
if (mSessionRole == SessionRole::kInitiator)

0 commit comments

Comments
 (0)