Member

@ndyakov ndyakov commented Jul 25, 2025

Hitless Upgrades

Seamless Redis connection handoffs during cluster changes without dropping connections.

Quick Start

client := redis.NewClient(&redis.Options{
    Addr:     "localhost:6379",
    Protocol: 3, // RESP3 required
    HitlessUpgrades: &hitless.Config{
        Mode: hitless.MaintNotificationsEnabled,
    },
})

Modes

  • MaintNotificationsDisabled - Hitless upgrades disabled
  • MaintNotificationsEnabled - Always enabled (fails if the server doesn't support it)
  • MaintNotificationsAuto - Auto-detect server support (default)

Configuration

&hitless.Config{
    Mode:                       hitless.MaintNotificationsAuto,
    EndpointType:               hitless.EndpointTypeAuto,
    RelaxedTimeout:             10 * time.Second,
    HandoffTimeout:             15 * time.Second,
    MaxHandoffRetries:          3,
    MaxWorkers:                 0,    // 0 = auto-calculated
    HandoffQueueSize:           0,    // 0 = auto-calculated
    PostHandoffRelaxedDuration: 0,    // 0 = 2 * RelaxedTimeout
    LogLevel:                   logging.LogLevelError,
}

Endpoint Types

  • EndpointTypeAuto - Auto-detect based on connection (default)
  • EndpointTypeInternalIP - Internal IP address
  • EndpointTypeInternalFQDN - Internal FQDN
  • EndpointTypeExternalIP - External IP address
  • EndpointTypeExternalFQDN - External FQDN
  • EndpointTypeNone - No endpoint (reconnect with current config)

Auto-Scaling

Workers: min(PoolSize/2, max(10, PoolSize/3)) when auto-calculated
Queue: max(20×Workers, PoolSize) capped by MaxActiveConns+1 or 5×PoolSize

Examples:

  • Pool 100: 33 workers, queue of 500 (660 before the 5×PoolSize cap)
  • Pool 100 + MaxActiveConns 150: 33 workers, queue of 151 (MaxActiveConns+1)
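
The sizing rules above can be sketched as a small standalone Go program. This is one reading of the formulas in this description, not the code from the PR: the names `autoWorkers` and `autoQueueSize` are hypothetical, and the sketch assumes the cap is `MaxActiveConns+1` when `MaxActiveConns` is set and `5×PoolSize` otherwise.

```go
package main

import "fmt"

// minInt and maxInt avoid relying on the min/max builtins added in Go 1.21.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// autoWorkers (hypothetical name) applies min(PoolSize/2, max(10, PoolSize/3)).
func autoWorkers(poolSize int) int {
	return minInt(poolSize/2, maxInt(10, poolSize/3))
}

// autoQueueSize (hypothetical name) applies max(20*Workers, PoolSize) and then
// caps the result. Assumption: the cap is MaxActiveConns+1 when MaxActiveConns
// is set (> 0), and 5*PoolSize otherwise.
func autoQueueSize(poolSize, maxActiveConns int) int {
	size := maxInt(20*autoWorkers(poolSize), poolSize)
	limit := 5 * poolSize
	if maxActiveConns > 0 {
		limit = maxActiveConns + 1
	}
	return minInt(size, limit)
}

func main() {
	fmt.Println(autoWorkers(100), autoQueueSize(100, 0))   // 33 500
	fmt.Println(autoWorkers(100), autoQueueSize(100, 150)) // 33 151
}
```

Both printed pairs match the two examples listed above. Note that for small pools the `PoolSize/2` term dominates: a pool of 10 yields 5 workers under this reading.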

How It Works

  1. Redis sends push notifications about cluster changes
  2. Client creates new connections to updated endpoints
  3. Active operations transfer to new connections
  4. Old connections close gracefully

Supported Notifications

  • MOVING - Slot moving to new node
  • MIGRATING - Slot in migration state
  • MIGRATED - Migration completed
  • FAILING_OVER - Node failing over
  • FAILED_OVER - Failover completed

Hooks (Optional)

Monitor and customize hitless operations:

type NotificationHook interface {
    PreHook(ctx, notificationCtx, notificationType, notification) ([]interface{}, bool)
    PostHook(ctx, notificationCtx, notificationType, notification, result)
}

// Add custom hook
manager.AddNotificationHook(&MyHook{})

Metrics Hook Example

// Create metrics hook
metricsHook := hitless.NewMetricsHook()
manager.AddNotificationHook(metricsHook)

// Access collected metrics
metrics := metricsHook.GetMetrics()
fmt.Printf("Notification counts: %v\n", metrics["notification_counts"])
fmt.Printf("Processing times: %v\n", metrics["processing_times"])
fmt.Printf("Error counts: %v\n", metrics["error_counts"])

@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch 2 times, most recently from 49e5814 to 43aef14 on July 25, 2025 12:51
@ndyakov ndyakov changed the base branch from ndyakov/CAE-1088-resp3-notification-handlers to master on July 25, 2025 12:52
@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch 21 times, most recently from 8608f93 to a39a23a on July 30, 2025 07:05
@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch 2 times, most recently from e88e673 to 4542e8f on August 4, 2025 13:01
@ndyakov ndyakov changed the base branch from master to ndyakov/CAE-1088-resp3-notification-handlers on August 4, 2025 13:02
@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch 3 times, most recently from 9590c26 to 100c3d2 on August 4, 2025 13:57
@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch from c908056 to bfca15a on August 22, 2025 16:48
Contributor

@htemelski-redis htemelski-redis left a comment

Couldn't see any glaring issues

Comment on lines 208 to 212
if c.Mode == "" {
    result.Mode = defaults.Mode
} else {
    result.Mode = c.Mode
}
Contributor

Nitpick: Can we simplify these if-else blocks? For example:

result.Mode = defaults.Mode
if c.Mode != "" {
    result.Mode = c.Mode
}

func (c *Config) applyWorkerDefaults(poolSize int) {
    // Calculate defaults based on pool size
    if poolSize <= 0 {
        poolSize = 10 * runtime.GOMAXPROCS(0)
Contributor

As noted above, GOMAXPROCS can lead to throttling

Member Author

This is currently the pool's default, so I'm keeping it here as well. I agree that in containerized environments this is not correct with older Go versions.

@@ -73,10 +76,18 @@ func (c *PubSub) conn(ctx context.Context, newChannels []string) (*pool.Conn, er
        return c.cn, nil
    }

    if c.opt.Addr == "" {
        // TODO(hitless):
Contributor

This should remain as a TODO, right?

Member Author

yes

@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch from d03b7fa to 22e1d74 on August 29, 2025 17:07
@ndyakov ndyakov force-pushed the ndyakov/CAE-1072-hitless-upgrades-2 branch from d7e7d44 to e6d4b46 on August 29, 2025 20:52
3 participants