Database Migration Strategies - Big Bang vs Phased

Author: Venkata Sudhakar

A database migration strategy defines how you move data, schema, and application logic from a source system to a target system. Choosing the wrong strategy is one of the most common reasons enterprise migrations fail, go over budget, or cause production outages. The two fundamental approaches are Big Bang migration (move everything at once in a single cutover event) and Phased migration (move data and applications incrementally over weeks or months). Each has a very different risk profile, cost structure, and suitability depending on the system size, business requirements, and tolerance for downtime.

Big Bang migration is straightforward: freeze writes on the source system, extract and load all data into the target, validate, and switch traffic. It is simpler to plan and execute for small systems but becomes extremely risky for large databases. A 10 TB database may take 8-12 hours to migrate, depending on network and disk throughput, and the maintenance window must cover that entire duration plus validation. Any validation failure during the window forces a choice between accepting a corrupt target and rolling back the entire migration, which often discards hours of work. Big Bang is appropriate only when the business accepts the maintenance window and the data volume is small enough to migrate and validate within it.
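To make the window arithmetic concrete, here is a minimal sketch that estimates how long a single-pass copy would take and whether it fits a given maintenance window. The function names, the 25% validation overhead, and the 300 MB/s throughput are assumptions chosen for illustration, not measured figures; substitute numbers from a rehearsal run against your own systems.

```python
def estimate_window_hours(size_gb: float, throughput_mb_s: float,
                          validation_factor: float = 0.25) -> float:
    """Estimate hours needed to copy size_gb at throughput_mb_s, plus validation.

    validation_factor is an assumed fraction of the copy time spent validating.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    copy_seconds = (size_gb * 1000) / throughput_mb_s  # decimal units: 1 GB = 1000 MB
    return copy_seconds * (1 + validation_factor) / 3600


def fits_window(size_gb: float, throughput_mb_s: float, window_hours: float,
                validation_factor: float = 0.25) -> bool:
    """Return True when the estimated copy plus validation fits the window."""
    return estimate_window_hours(size_gb, throughput_mb_s, validation_factor) <= window_hours


if __name__ == "__main__":
    # 300 MB/s is an assumed sustained end-to-end rate, not a benchmark result
    for size_gb in (50, 1000, 10000):
        hours = estimate_window_hours(size_gb, throughput_mb_s=300)
        print(f"{size_gb:>6} GB at 300 MB/s: {hours:.1f} h")
```

Read the other way round, migrating 10 TB in 8-12 hours implies a sustained rate of roughly 230-350 MB/s across extract, transfer, and load, which is worth confirming with a test run before committing to a window.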

The example below shows a Big Bang migration script for a MySQL to PostgreSQL migration using pgloader, followed by a row-count check that flags missing or duplicated rows after the move.

# Big Bang Migration: MySQL to PostgreSQL using pgloader
# Suitable for databases under 50 GB with an acceptable maintenance window

# Step 1: Create a pgloader configuration file
cat > migrate.load << 'EOF'
LOAD DATABASE
    FROM mysql://appuser:password@mysql-host:3306/appdb
    INTO postgresql://appuser:password@pg-host:5432/appdb

WITH include drop,
     create tables,
     create indexes,
     foreign keys,
     reset sequences

SET work_mem to '256MB',
    maintenance_work_mem to '512MB'

CAST type tinyint when (= precision 1) to boolean drop typemod using tinyint-to-boolean,
     type datetime to timestamptz;
EOF

# Step 2: Run the migration (requires maintenance window)
echo "Starting migration at $(date)"
pgloader migrate.load
echo "Migration completed at $(date)"

# Step 3: Validate row counts match
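python3 << 'PYEOF'
import sys

import mysql.connector
import psycopg2

mysql_conn = mysql.connector.connect(host="mysql-host", user="appuser", password="password", database="appdb")
pg_conn = psycopg2.connect(host="pg-host", user="appuser", password="password", dbname="appdb")

# Reuse one cursor per connection: a fresh cursor has no result set to fetch from
mysql_cur = mysql_conn.cursor()
pg_cur = pg_conn.cursor()

# Table names come from this fixed list, so interpolating them into SQL is safe here
tables = ["customers", "orders", "products", "order_items"]
mismatches = 0
print("Table               MySQL       PostgreSQL  Match?")
print("-" * 55)
for table in tables:
    mysql_cur.execute(f"SELECT COUNT(*) FROM {table}")
    mysql_count = mysql_cur.fetchone()[0]
    pg_cur.execute(f"SELECT COUNT(*) FROM {table}")
    pg_count = pg_cur.fetchone()[0]
    match = "OK" if mysql_count == pg_count else "MISMATCH!"
    mismatches += mysql_count != pg_count
    print(f"{table:<20} {mysql_count:<12} {pg_count:<12} {match}")

mysql_conn.close()
pg_conn.close()

# Non-zero exit lets a wrapper script abort the cutover on any mismatch
sys.exit(1 if mismatches else 0)
PYEOF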

It produces output similar to the following:

Starting migration at Mon Jan 15 02:00:00 UTC 2024

postgresql://appuser@pg-host/appdb

  table name        errors    rows      bytes      total time
  public.customers       0   125000    18.2 MB       00:00:04
  public.orders          0   890000   245.1 MB       00:00:31
  public.products        0    12000     2.1 MB       00:00:01
  public.order_items     0  3200000   890.3 MB       00:02:15

Migration completed at Mon Jan 15 02:03:12 UTC 2024

Table               MySQL       PostgreSQL  Match?
-------------------------------------------------------
customers           125000      125000      OK
orders              890000      890000      OK
products            12000       12000       OK
order_items         3200000     3200000     OK
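Matching row counts show that no rows were lost or duplicated, but they cannot detect a value that was truncated or cast incorrectly. One way to go further is to compare a content fingerprint per table. The sketch below is a hedged illustration, not part of pgloader: `normalize` and `table_fingerprint` are hypothetical helpers, and the normalization rules (booleans as 0/1, trailing zeros stripped from decimals) are assumptions that would need extending for timestamps and time zones in a real migration.

```python
import hashlib
from decimal import Decimal
from typing import Iterable, Sequence

MODULUS = 2 ** 256


def normalize(value) -> str:
    """Render a column value identically regardless of which driver returned it."""
    if value is None:
        return "\\N"
    if isinstance(value, bool):
        # PostgreSQL boolean vs MySQL tinyint(1): both become "0" or "1"
        return "1" if value else "0"
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).hex()
    if isinstance(value, (float, Decimal)):
        # Strip trailing zeros so 10.50 and 10.5 compare equal
        return format(Decimal(str(value)).normalize(), "f")
    return str(value)


def table_fingerprint(rows: Iterable[Sequence]) -> str:
    """Order-independent fingerprint: row count plus the sum of row hashes."""
    total = 0
    count = 0
    for row in rows:
        encoded = "\x1f".join(normalize(v) for v in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        total = (total + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return f"{count}:{total:064x}"


if __name__ == "__main__":
    # Stand-in rows; in practice these would be streamed from each cursor
    mysql_rows = [(1, "Ada", 1, Decimal("10.50")), (2, "Grace", 0, None)]
    pg_rows = [(2, "Grace", False, None), (1, "Ada", True, Decimal("10.5"))]
    same = table_fingerprint(mysql_rows) == table_fingerprint(pg_rows)
    print("fingerprints match" if same else "fingerprints differ")
```

Because the per-row hashes are summed, the two sides do not need to return rows in the same order, which avoids an `ORDER BY` on large tables. For tables too large to scan in full inside the window, the same function can be applied to a sampled primary-key range on both sides.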

The example below shows a Phased migration using a parallel run: both systems operate simultaneously while change data capture (CDC) keeps them in sync, which allows gradual validation and traffic shifting before the final cutover.

# Phased Migration Plan: 4-phase approach over 8 weeks

# PHASE 1 (Week 1-2): Schema migration and initial data load
#   - Create target schema (PostgreSQL)
#   - Migrate historical/reference data (low risk tables first)
#   - Set up CDC with Debezium to sync ongoing changes from MySQL

# PHASE 2 (Week 3-4): CDC sync and validation
#   - CDC running, both databases in sync
#   - Application still writing to MySQL only
#   - Run validation reports to confirm data quality

# Validation script (run daily during phase 2)
python3 validate_sync.py --source mysql://mysql-host/appdb \
                         --target postgresql://pg-host/appdb \
                         --tables customers,orders,products \
                         --sample-size 10000

# PHASE 3 (Week 5-6): Dual-write and read shifting
#   - Application writes to BOTH MySQL and PostgreSQL
#   - Gradually shift read traffic to PostgreSQL (10% -> 50% -> 100%)
#   - Monitor error rates and latency on both targets

# AWS ALB weighted routing - shift 20% of GET traffic to the PostgreSQL-backed target group
aws elbv2 modify-rule \
  --rule-arn arn:aws:elasticloadbalancing:...:listener-rule/xxx \
  --conditions '[{"Field": "http-request-method", "HttpRequestMethodConfig": {"Values": ["GET"]}}]' \
  --actions '[{
    "Type": "forward",
    "ForwardConfig": {
      "TargetGroups": [
        {"TargetGroupArn": "...mysql-backend", "Weight": 80},
        {"TargetGroupArn": "...pg-backend",    "Weight": 20}
      ]
    }
  }]'

# PHASE 4 (Week 7-8): Cutover and decommission
#   - 100% traffic to PostgreSQL
#   - Keep MySQL running for 2 weeks as fallback (it must keep receiving writes to stay usable)
#   - Decommission MySQL after validation period

echo "Phased migration reduces risk by validating in production gradually."
echo "Rollback before decommission: redirect traffic back to MySQL, provided MySQL is still receiving writes."

The daily validation report looks similar to the following:

# Validation script output (Phase 2 daily report):
Sync Validation Report - 2024-01-18 09:00:00

Table       Source Rows  Target Rows  Lag (rows)  Lag (seconds)  Status
customers   125,847      125,847      0           0.2            IN SYNC
orders      892,341      892,339      2           0.8            IN SYNC
products    12,003       12,003       0           0.1            IN SYNC

Sample check (10000 random rows): 100% match
CDC consumer lag: 3 messages (0.4 seconds behind)

RECOMMENDATION: Sync quality is acceptable. Ready to proceed to Phase 3.
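Phase 3 of the plan relies on two application-level mechanisms: writing every change to both databases and shifting a growing percentage of reads to the new one. The sketch below shows one possible shape for each. `DualWriter` and `route_read` are hypothetical names, and two in-memory SQLite databases stand in for MySQL and PostgreSQL so the example runs anywhere; real dual-write code would also have to handle SQL dialect differences and reconcile failed target writes.

```python
import hashlib
import logging
import sqlite3

log = logging.getLogger("dual_write")


def route_read(key: str, pct_to_target: int) -> str:
    """Deterministically send pct_to_target percent of keys to the new database."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "target" if bucket < pct_to_target else "source"


class DualWriter:
    """Writes to the source of record first, then best-effort to the target."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.target_failures = 0

    def execute(self, sql: str, params: tuple = ()) -> None:
        # The source write must succeed; an exception here propagates to the caller
        with self.source:
            self.source.execute(sql, params)
        try:
            with self.target:
                self.target.execute(sql, params)
        except Exception:
            # Count and log target failures so a reconciliation job can repair them
            self.target_failures += 1
            log.exception("target write failed: %s", sql)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")  # stand-in for MySQL
    target = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
    for db in (source, target):
        db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

    writer = DualWriter(source, target)
    writer.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))

    for pct in (10, 50, 100):
        on_target = sum(route_read(f"customer-{i}", pct) == "target" for i in range(1000))
        print(f"{pct}% setting: {on_target} of 1000 keys read from the target")
```

Hashing the key rather than picking at random means a given customer consistently reads from the same database at a given percentage, and any key already on the target at 10% stays there at 50% and 100%, which keeps behavior stable as the percentage is raised.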

Strategy selection decision framework:

Choose Big Bang when: the database is under 50 GB and the business can accept 2-6 hours of downtime. It fits best when the migration must be completed before a hard deadline or the system is low-criticality with few concurrent users.

Choose Phased migration when: the database is over 50 GB, zero downtime is required, the system is business-critical with 24/7 uptime requirements, the schema or data model is changing significantly (requiring dual-write logic), or you need the ability to validate gradually in production before committing to the cutover.
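The criteria above can also be written down as a small checklist function, which is handy when the same assessment has to be repeated across many databases. This is only a sketch of the framework in this article: `MigrationProfile` and `recommend_strategy` are hypothetical names, the 50 GB threshold comes from the text, and treating a downtime budget under 2 hours as a reason to go phased is an assumption derived from the 2-6 hour range.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MigrationProfile:
    size_gb: float
    max_downtime_hours: float
    requires_24x7: bool = False
    schema_changes_significantly: bool = False
    needs_gradual_validation: bool = False


def recommend_strategy(profile: MigrationProfile) -> Tuple[str, List[str]]:
    """Return ("phased" or "big bang", reasons) following the criteria above."""
    reasons = []
    if profile.size_gb > 50:
        reasons.append("database is over 50 GB")
    if profile.max_downtime_hours < 2:
        reasons.append("downtime budget is below the 2-6 hour Big Bang range")
    if profile.requires_24x7:
        reasons.append("system is business-critical with 24/7 uptime requirements")
    if profile.schema_changes_significantly:
        reasons.append("schema or data model is changing significantly")
    if profile.needs_gradual_validation:
        reasons.append("gradual validation in production is needed")
    if reasons:
        return "phased", reasons
    return "big bang", ["database is small and the downtime window is acceptable"]


if __name__ == "__main__":
    strategy, why = recommend_strategy(MigrationProfile(size_gb=20, max_downtime_hours=4))
    print(strategy, why)
    strategy, why = recommend_strategy(
        MigrationProfile(size_gb=800, max_downtime_hours=0, requires_24x7=True)
    )
    print(strategy, why)
```

Any single Phased criterion is enough to tip the recommendation, mirroring the "or" in the list above; the returned reasons make the outcome easy to review with stakeholders rather than treating it as a verdict.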
