A robust, graceful disconnection mechanism for RDMA connections inspired by TCP's proven reliability model
Visual representation of the three-way handshake disconnection sequence
CLIENT                                     SERVER
|                                               |
|  1. DISCONNECT_REQ                            |
|---------------------------------------------->|
|                                               |
|  2. DISCONNECT_ACK                            |
|<----------------------------------------------|
|                                               |
|  3. DISCONNECT_FIN                            |
|---------------------------------------------->|
|                                               |
[CLOSED]                                 [CLOSED]
Both client and server confirm disconnection intent, ensuring no data loss and preventing abrupt connection termination.
A built-in timeout (5 seconds on the client, 10 seconds on the server) with up to 3 retry attempts lets disconnection complete even when messages are lost or delayed.
8-state finite state machine tracks disconnection progress, preventing race conditions and ensuring orderly cleanup.
Verified cleanup of Queue Pairs (QPs), Completion Queues (CQs), and memory regions prevents resource leaks.
Real-time progress indicators show each step of the disconnection process for debugging and monitoring.
Comprehensive test suite with 8/8 protocol checks passing, ensuring reliability in production environments.
The protocol uses a finite state machine to track disconnection progress
Initial state
Client sent request
Server received request
Server sent ACK
Client received ACK
Client sent FIN
Server received FIN
Disconnection complete
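The eight states listed above could be written as a C enum with a small transition check. This is a minimal sketch, not the project's actual definition: only `DISC_STATE_REQ_SENT`, `DISC_STATE_REQ_RECEIVED`, and `DISC_STATE_ACK_SENT` appear in the project's code shown on this page; the other state names and the `disc_transition_allowed` helper are assumptions made for illustration.

```c
#include <stdbool.h>

/* Only REQ_SENT, REQ_RECEIVED and ACK_SENT are names taken from the source;
 * every other identifier here is assumed for illustration. */
enum disc_state {
    DISC_STATE_NONE = 0,      /* initial state (assumed name) */
    DISC_STATE_REQ_SENT,      /* client sent request */
    DISC_STATE_REQ_RECEIVED,  /* server received request */
    DISC_STATE_ACK_SENT,      /* server sent ACK */
    DISC_STATE_ACK_RECEIVED,  /* client received ACK (assumed name) */
    DISC_STATE_FIN_SENT,      /* client sent FIN (assumed name) */
    DISC_STATE_FIN_RECEIVED,  /* server received FIN (assumed name) */
    DISC_STATE_COMPLETE       /* disconnection complete (assumed name) */
};

/* Hypothetical helper: returns true only for the forward steps of the
 * three-message sequence, on either the client or the server side. */
static bool disc_transition_allowed(enum disc_state from, enum disc_state to)
{
    switch (from) {
    case DISC_STATE_NONE:
        /* The client starts by sending, the server by receiving. */
        return to == DISC_STATE_REQ_SENT || to == DISC_STATE_REQ_RECEIVED;
    case DISC_STATE_REQ_SENT:     return to == DISC_STATE_ACK_RECEIVED;
    case DISC_STATE_REQ_RECEIVED: return to == DISC_STATE_ACK_SENT;
    case DISC_STATE_ACK_SENT:     return to == DISC_STATE_FIN_RECEIVED;
    case DISC_STATE_ACK_RECEIVED: return to == DISC_STATE_FIN_SENT;
    case DISC_STATE_FIN_SENT:
    case DISC_STATE_FIN_RECEIVED: return to == DISC_STATE_COMPLETE;
    case DISC_STATE_COMPLETE:     return false; /* terminal state */
    }
    return false;
}
```

Rejecting any transition that is not the next step in the sequence is one way to catch out-of-order or duplicate protocol messages early.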
How we built a robust disconnection protocol for RDMA
// Special protocol messages with unique markers
#define DISCONNECT_REQ "$$DISCONNECT_REQ$$"   // Client initiates
#define DISCONNECT_ACK "$$DISCONNECT_ACK$$"   // Server acknowledges
#define DISCONNECT_FIN "$$DISCONNECT_FIN$$"   // Client confirms

// Timeout configuration
#define DISCONNECT_TIMEOUT_CLIENT 5    // 5 seconds for client
#define DISCONNECT_TIMEOUT_SERVER 10   // 10 seconds for server
#define DISCONNECT_RETRY_COUNT    3    // Maximum retry attempts
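Because the protocol messages are marker strings, a receiver has to tell them apart from ordinary payloads. The following is one possible sketch of that check; the `disc_msg_type` enum and `classify_message` function are hypothetical names, and it assumes `len` is the payload length without a trailing NUL byte.

```c
#include <string.h>

#define DISCONNECT_REQ "$$DISCONNECT_REQ$$"
#define DISCONNECT_ACK "$$DISCONNECT_ACK$$"
#define DISCONNECT_FIN "$$DISCONNECT_FIN$$"

/* Hypothetical classification of an incoming buffer. */
enum disc_msg_type { DISC_MSG_NONE, DISC_MSG_REQ, DISC_MSG_ACK, DISC_MSG_FIN };

/* True when buf[0..len) is exactly the string literal lit. */
#define IS_MARKER(buf, len, lit) \
    ((len) == sizeof(lit) - 1 && memcmp((buf), (lit), sizeof(lit) - 1) == 0)

/* Exact match only: a normal message that merely contains a marker
 * is still treated as data. */
static enum disc_msg_type classify_message(const char *buf, size_t len)
{
    if (IS_MARKER(buf, len, DISCONNECT_REQ)) return DISC_MSG_REQ;
    if (IS_MARKER(buf, len, DISCONNECT_ACK)) return DISC_MSG_ACK;
    if (IS_MARKER(buf, len, DISCONNECT_FIN)) return DISC_MSG_FIN;
    return DISC_MSG_NONE;
}
```

Matching on the full length rather than a prefix keeps user data that happens to begin with a marker from triggering a disconnect.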
static int initiate_graceful_disconnect(struct client_context *client) {
    printf("╔═════════════════════════════════════════╗\n");
    printf("║    INITIATING GRACEFUL DISCONNECTION    ║\n");
    printf("╚═════════════════════════════════════════╝\n");

    // Step 1: Send DISCONNECT_REQ
    client->disconnect_ctx.state = DISC_STATE_REQ_SENT;
    start_disconnect_timer(&client->disconnect_ctx);
    send_message(client, DISCONNECT_REQ);
    printf("║ [1/3] ✓ Sent DISCONNECT_REQ to server\n");

    // Step 2: Wait for DISCONNECT_ACK (with timeout)
    // Step 3: Send DISCONNECT_FIN
    // Complete disconnection
    return 0;
}
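The client function leaves the wait for DISCONNECT_ACK as a comment. A rough sketch of how a send-and-wait step with retries might look is given here; it is not the project's code. The callback types, `send_with_retry`, and the `fake_peer` test double are all hypothetical, and it assumes `DISCONNECT_RETRY_COUNT` means three attempts in total.

```c
#include <stdbool.h>

#define DISCONNECT_TIMEOUT_CLIENT 5  /* seconds, as in the configuration */
#define DISCONNECT_RETRY_COUNT    3  /* assumed: total attempts */

/* Hypothetical callbacks standing in for the real send and wait logic. */
typedef int  (*send_fn)(void *ctx);                   /* 0 on success */
typedef bool (*wait_fn)(void *ctx, int timeout_sec);  /* true if reply arrived */

/* Sends a request and waits for the reply, retrying on timeout.
 * Returns 0 once a reply arrives, or -1 when every attempt failed,
 * at which point the caller would fall back to a forced disconnect. */
static int send_with_retry(void *ctx, send_fn send_req, wait_fn wait_reply,
                           int timeout_sec)
{
    for (int attempt = 1; attempt <= DISCONNECT_RETRY_COUNT; attempt++) {
        if (send_req(ctx) != 0)
            continue;               /* a failed send uses up one attempt */
        if (wait_reply(ctx, timeout_sec))
            return 0;
    }
    return -1;
}

/* Test double: a peer that only answers from the reply_on-th send onward. */
struct fake_peer { int sends; int reply_on; };

static int fake_send(void *ctx)
{
    ((struct fake_peer *)ctx)->sends++;
    return 0;
}

static bool fake_wait(void *ctx, int timeout_sec)
{
    struct fake_peer *p = ctx;
    (void)timeout_sec;              /* the fake never actually blocks */
    return p->sends >= p->reply_on;
}
```

With a peer that answers on the second send, `send_with_retry(&peer, fake_send, fake_wait, DISCONNECT_TIMEOUT_CLIENT)` returns 0 after two sends; with a peer that never answers, it returns -1 after three.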
static int handle_disconnect_request(struct client_connection *client) {
    printf("║ Client %d: GRACEFUL DISCONNECTION INITIATED ║\n", client->client_id);
    printf("║ [1/3] ✓ Received DISCONNECT_REQ from client\n");

    // Update state and send acknowledgment
    client->disconnect_ctx.state = DISC_STATE_REQ_RECEIVED;
    send_message(client, DISCONNECT_ACK);
    printf("║ [2/3] ✓ Sent DISCONNECT_ACK to client\n");

    // Wait for DISCONNECT_FIN
    client->disconnect_ctx.state = DISC_STATE_ACK_SENT;
    return 0;
}
Comprehensive testing ensures protocol reliability
See the protocol in action with actual terminal output
Understanding the design rationale
The original implementation used a simple "quit" message, which had several issues:
Inspired by TCP's connection termination, our protocol ensures:
Minimal overhead for maximum reliability
Average disconnection time
Protocol messages exchanged
Resource cleanup success rate
Resource leaks detected
Comprehensive test suite ensures protocol correctness
Single client disconnection with all protocol steps verified
Multiple clients disconnecting simultaneously
Ensures normal messages complete before disconnection
Validates all RDMA resources are properly released
Tests automatic retry and forced disconnection
Handles network failures and unexpected states
Try the disconnection protocol in your environment
# Clone the repository
git clone https://github.com/linjiw/rdma-multi-client.git
cd rdma-multi-client

# Build the project
make all

# Run the validation demo
./scripts/disconnection/demo_final_validation.sh

# Run comprehensive tests
./scripts/disconnection/demo_disconnect_validation.sh