Tolerate packet reordering in the early init process
Fixes a bug where packet reordering made the server give the client two peer ids instead of one. This in turn confused reliable packet sending and made connecting to the server fail. The client usually sends three packets at init: one "dummy" packet consisting of two 0 bytes, and the init packet as well as its legacy counterpart. The last one can be turned off since commit af30183124d40a969040d7de4b3a487feec466e4, but this is of lower relevance for the bug. The relevant part here is that network packet reorder (which is a normal occurence) can make the packets reach the server in different order. If reorder puts the dummy packet further behind, the following would happen before the patch: 1. The server will get one of the init packets on channel 1 and assign the client a peer id, as the packet will have zero as peer id. 2. The server sends a CONTROLTYPE_SET_PEER_ID packet to inform the client of the peer id. 3. The next packet from the client will contain the peer id set by the server. 4. The server sets the m_has_sent_with_id member for the client's peer structure to true. 5. Now the dummy packet arrives. It has a peer id of zero, therefore the server searches whether it already has a peer id for the address the packet was sent from. The search fails because m_has_sent_with_id was set to true and the server only searched for peers with m_has_sent_with_id set to false. 6. In a working setup, the server would assign the dummy packet to the correct peer id. However the server instead now assigns a second peer id and peer structure to the peer, and assign the packet to that new peer. 7. In order to inform the peer of its peer id, the server sends a CONTROLTYPE_SET_PEER_ID command packet, reliably, to the peer. This packet uses the new peer id. 8. The client sends an ack to that packet, not with the new peer id but with the peer id sent in 2. 9. This packet reaches the server, but it drops the ACK as the peer id does not map to any un-ACK-ed packets with that seqnum. The same time, the server still waits for an ACK with the new peer id, which of course won't come. This causes the server to periodically re-try sending that packet, and the client ACKing it each time. Steps 7-9 cause annoyances and erroneous output, but don't cause the connection failure itself. The actual mistake that causes the connection failure happens in 6: The server does not assign the dummy packet to the correct peer, but to a newly created one. Therefore, all further packets sent by the client on channel 0 are now buffered by the server as it waits for the dummy packet to reach the peer, which of course doesn't happen as the server assigned that packet to the second peer it created for the client. This makes the connection code indefinitely buffer the TOSERVER_CLIENT_READY packet, not passing it to higher level code, which stalls the continuation of the further init process indefinitely and causes the actual bug. Maybe this can be caused by reordered init packets as well, the only studied case was where network has reliably reordered the dummy packet to get sent after the init packets. The patch fixes the bug by not ignoring peers where m_has_sent_with_id has been set anymore. The other changes of the patch are just cleanups of unused methods and fields and additional explanatory comments. One could think of alternate ways to fix the bug: * The client could simply take the new peer id and continue communicating with that. This is however worse than the fix as it requires the peer id set command to be sent reliably (which currently happens, but it cant be changed anymore). Also, such a change would require both server and client to be patched in order for the bug to be fixed, as right now the client ignores peer id set commands after the peer id is different from PEER_ID_INEXISTENT and the server requires modification too to change the peer id internally. And, most importantly, right now we guarantee higher level server code that the peer id for a certain peer does not change. This guarantee would have to be broken, and it would require much larger changes to the server than this patch means. * One could stop sending the dummy packet. One may be unsure whether this is a good idea, as the meaning of the dummy packet is not known (it might be there for something important), and as it is possible that the init packets may cause this problem as well (although it may be possible too that they can't cause this). Thanks to @auouymous who had originally reported this bug and who has helped patiently in finding its cause.
This commit is contained in:
parent
f64a6259b2
commit
423d8c1b0d
@ -2167,12 +2167,12 @@ void ConnectionReceiveThread::receive()
|
||||
throw InvalidIncomingDataException("Channel doesn't exist");
|
||||
}
|
||||
|
||||
/* preserve original peer_id for later usage */
|
||||
u16 packet_peer_id = peer_id;
|
||||
|
||||
/* Try to identify peer by sender address (may happen on join) */
|
||||
if (peer_id == PEER_ID_INEXISTENT) {
|
||||
peer_id = m_connection->lookupPeer(sender);
|
||||
// We do not have to remind the peer of its
|
||||
// peer id as the CONTROLTYPE_SET_PEER_ID
|
||||
// command was sent reliably.
|
||||
}
|
||||
|
||||
/* The peer was not found in our lists. Add it. */
|
||||
@ -2214,11 +2214,6 @@ void ConnectionReceiveThread::receive()
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/* mark peer as seen with id */
|
||||
if (!(packet_peer_id == PEER_ID_INEXISTENT))
|
||||
peer->setSentWithID();
|
||||
|
||||
peer->ResetTimeout();
|
||||
|
||||
Channel *channel = 0;
|
||||
@ -2756,7 +2751,7 @@ u16 Connection::lookupPeer(Address& sender)
|
||||
for(; j != m_peers.end(); ++j)
|
||||
{
|
||||
Peer *peer = j->second;
|
||||
if (peer->isActive())
|
||||
if (peer->isPendingDeletion())
|
||||
continue;
|
||||
|
||||
Address tocheck;
|
||||
|
@ -663,8 +663,7 @@ class Peer {
|
||||
m_last_rtt(-1.0),
|
||||
m_usage(0),
|
||||
m_timeout_counter(0.0),
|
||||
m_last_timeout_check(porting::getTimeMs()),
|
||||
m_has_sent_with_id(false)
|
||||
m_last_timeout_check(porting::getTimeMs())
|
||||
{
|
||||
m_rtt.avg_rtt = -1.0;
|
||||
m_rtt.jitter_avg = -1.0;
|
||||
@ -687,21 +686,16 @@ class Peer {
|
||||
virtual void PutReliableSendCommand(ConnectionCommand &c,
|
||||
unsigned int max_packet_size) {};
|
||||
|
||||
virtual bool isActive() { return false; };
|
||||
|
||||
virtual bool getAddress(MTProtocols type, Address& toset) = 0;
|
||||
|
||||
bool isPendingDeletion()
|
||||
{ MutexAutoLock lock(m_exclusive_access_mutex); return m_pending_deletion; };
|
||||
|
||||
void ResetTimeout()
|
||||
{MutexAutoLock lock(m_exclusive_access_mutex); m_timeout_counter=0.0; };
|
||||
|
||||
bool isTimedOut(float timeout);
|
||||
|
||||
void setSentWithID()
|
||||
{ MutexAutoLock lock(m_exclusive_access_mutex); m_has_sent_with_id = true; };
|
||||
|
||||
bool hasSentWithID()
|
||||
{ MutexAutoLock lock(m_exclusive_access_mutex); return m_has_sent_with_id; };
|
||||
|
||||
unsigned int m_increment_packets_remaining;
|
||||
unsigned int m_increment_bytes_remaining;
|
||||
|
||||
@ -776,8 +770,6 @@ class Peer {
|
||||
float m_timeout_counter;
|
||||
|
||||
u32 m_last_timeout_check;
|
||||
|
||||
bool m_has_sent_with_id;
|
||||
};
|
||||
|
||||
class UDPPeer : public Peer
|
||||
@ -795,9 +787,6 @@ public:
|
||||
void PutReliableSendCommand(ConnectionCommand &c,
|
||||
unsigned int max_packet_size);
|
||||
|
||||
bool isActive()
|
||||
{ return ((hasSentWithID()) && (!m_pending_deletion)); };
|
||||
|
||||
bool getAddress(MTProtocols type, Address& toset);
|
||||
|
||||
void setNonLegacyPeer();
|
||||
|
Loading…
x
Reference in New Issue
Block a user