Blue-Green Deployment for Zero-Downtime Migration

Author: Venkata Sudhakar

Zero-downtime migration is the practice of moving data, databases, or applications from one system to another without any period in which the service is unavailable to users. Traditional "big bang" migrations require a maintenance window: the application is taken offline, the data is migrated, and then the new system is brought up. For high-traffic production systems, any downtime translates directly into lost revenue and lost customer trust. Zero-downtime patterns are designed to remove the need for that maintenance window.

Blue-Green deployment is one of the most reliable patterns for zero-downtime migration. You maintain two identical production environments called Blue (the current live system) and Green (the new system being migrated to). Traffic continues flowing to Blue while you set up and validate Green in parallel. Once Green is fully ready and validated, you switch the load balancer or DNS to point to Green. If anything goes wrong, a single switch rolls traffic back to Blue, within seconds at the load balancer (a DNS-based switch also depends on record TTLs). The key enabler is keeping both environments in sync during the transition, which is where change data capture (CDC) comes in.

The Blue-Green migration flow uses CDC to keep both databases in sync, with an AWS Application Load Balancer weighted routing rule to gradually shift traffic from Blue to Green.

Checking the status of the sync connector gives the following output,

{
  "name": "blue-to-green-sync",
  "connector": {"state": "RUNNING"},
  "tasks": [{"id": 0, "state": "RUNNING"}]
}
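
A status document like this can be checked automatically before moving on. The following is a minimal Python sketch, assuming the status has already been fetched and has the shape shown above; the function name `connector_is_healthy` is hypothetical and not part of any library.

```python
import json


def connector_is_healthy(status: dict) -> bool:
    """Return True only if the connector and every one of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    # A connector with no tasks is not replicating anything, so treat it as unhealthy.
    tasks_ok = bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)
    return connector_ok and tasks_ok


# The status document from the output above.
status = json.loads("""
{
  "name": "blue-to-green-sync",
  "connector": {"state": "RUNNING"},
  "tasks": [{"id": 0, "state": "RUNNING"}]
}
""")

print(connector_is_healthy(status))
```

Checking the tasks as well as the connector matters because a connector can report RUNNING while one of its tasks has failed and replication has stopped.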

# Check Kafka consumer group lag (replication lag in messages)
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group green-sync-consumer

GROUP                TOPIC                   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
green-sync-consumer  migration.appdb.orders  0          45231           45232           1
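
To turn this table into a go/no-go signal, the lag column can be summed across partitions. The sketch below is in Python and assumes the captured text starts with the header row shown above; `total_lag` and the `MAX_LAG` threshold are hypothetical names and values chosen for illustration.

```python
def total_lag(describe_output: str) -> int:
    """Sum the LAG column of `kafka-consumer-groups.sh --describe` style output."""
    rows = [ln.split() for ln in describe_output.strip().splitlines() if ln.strip()]
    header, data = rows[0], rows[1:]
    lag_col = header.index("LAG")  # locate the column by name, not by position
    total = 0
    for row in data:
        value = row[lag_col] if lag_col < len(row) else "-"
        # Rows without a committed offset show "-" instead of a number; skip them.
        if value.isdigit():
            total += int(value)
    return total


SAMPLE = """
GROUP                TOPIC                   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
green-sync-consumer  migration.appdb.orders  0          45231           45232           1
"""

MAX_LAG = 10  # assumed threshold in messages; tune it to your write volume

lag = total_lag(SAMPLE)
print(lag, lag <= MAX_LAG)
```

A small absolute threshold is only one possible readiness rule. On a busy topic, checking that the lag stays low across several consecutive samples is usually a safer signal than a single reading.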

Once CDC lag is confirmed to be near zero, traffic is shifted gradually using AWS ALB weighted target groups, moving from 100% Blue to 100% Green over time while monitoring error rates.

Monitoring during the shift gives the following output,

# After 10% canary (Week 1):
Blue error rate:  0.01%
Green error rate: 0.02%  (acceptable)
Blue p99 latency:  45ms
Green p99 latency: 38ms  (improved!)

# After full cutover (Week 2):
Migration complete. 100% traffic on Green. Blue remains on standby for rollback.

# Verify no errors after cutover:
HTTPCode_Target_5XX_Count (Green): 0
Healthy host count: 4/4
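
The weighted shift between the two target groups can be sketched in Python as a function that builds the listener's forward action for a given Green percentage. This is an illustration under stated assumptions: the target group ARNs are placeholders, `weighted_forward_action` is a hypothetical helper, and the returned structure is intended to match the `DefaultActions` shape that the ELBv2 `ModifyListener` API accepts (for example through boto3's `elbv2` client); verify it against the AWS documentation before use.

```python
def weighted_forward_action(blue_arn: str, green_arn: str, green_percent: int) -> list:
    """Build a forward action that splits traffic between Blue and Green by weight."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_arn, "Weight": 100 - green_percent},
                {"TargetGroupArn": green_arn, "Weight": green_percent},
            ]
        },
    }]


# Placeholder ARNs for illustration only.
BLUE_TG = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/blue/PLACEHOLDER"
GREEN_TG = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/green/PLACEHOLDER"

# The stages mentioned above: a 10% canary, then full cutover.
for pct in (10, 100):
    action = weighted_forward_action(BLUE_TG, GREEN_TG, pct)
    weights = [tg["Weight"] for tg in action[0]["ForwardConfig"]["TargetGroups"]]
    print(pct, weights)
```

Rolling back is the same call with `green_percent=0`, which is why keeping the Blue target group registered on the listener makes the switch a single rule change.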

Rollback Plan:

The critical safety net in Blue-Green is that rollback is fast. If the Green environment shows elevated error rates or latency after the cutover, a single ALB rule change redirects 100% of traffic back to Blue. For the rollback to be lossless, writes that land on Green after the cutover must be replicated back to Blue, so a CDC stream has to run in the Green-to-Blue direction during this period; the blue-to-green-sync connector on its own only carries changes from Blue to Green. With replication covering both directions, no data is lost whichever way traffic moves. Stop the CDC sync and decommission Blue only after Green has been proven stable in production for at least 24 to 48 hours.
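
The rollback trigger can be made explicit rather than left to judgment during an incident. The Python sketch below compares Green against Blue using the two metrics reported earlier; the `Metrics` class, the `should_roll_back` function, and both thresholds are assumptions for illustration, not values taken from the original example.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate_pct: float   # percentage of failed requests
    p99_latency_ms: float   # 99th percentile latency in milliseconds


def should_roll_back(blue: Metrics, green: Metrics,
                     max_error_delta_pct: float = 0.05,
                     max_latency_ratio: float = 1.5) -> bool:
    """Return True if Green is meaningfully worse than Blue on errors or latency."""
    error_regressed = green.error_rate_pct - blue.error_rate_pct > max_error_delta_pct
    latency_regressed = green.p99_latency_ms > blue.p99_latency_ms * max_latency_ratio
    return error_regressed or latency_regressed


# Canary figures from the monitoring output above.
blue = Metrics(error_rate_pct=0.01, p99_latency_ms=45)
green = Metrics(error_rate_pct=0.02, p99_latency_ms=38)

print(should_roll_back(blue, green))
```

Comparing Green against Blue, rather than against fixed limits, keeps the check meaningful when overall traffic conditions change during the migration window.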