Spring Boot Circuit Breaker with Resilience4j
Author: Venkata Sudhakar
In distributed microservices architectures, a downstream service can become slow or unavailable at any time. Without protection, a slow dependency causes threads in the calling service to pile up waiting for responses, eventually exhausting the thread pool and taking the calling service down too - a cascading failure that can bring down the entire system. The Circuit Breaker pattern prevents this by wrapping calls to external services in a state machine that monitors failure rates and automatically stops calling a failing service, giving it time to recover.

Resilience4j is the standard circuit breaker library for Spring Boot, replacing the now-deprecated Netflix Hystrix. It provides five core fault tolerance patterns: CircuitBreaker (the main pattern), Retry (automatic retries with backoff), RateLimiter (limits calls per second), Bulkhead (limits concurrent calls), and TimeLimiter (applies a timeout to calls). These can be applied individually or combined as annotations on Spring service methods.

The CircuitBreaker has three states: CLOSED (normal operation, calls pass through), OPEN (the failure rate exceeded the threshold, so calls fail fast without even attempting the downstream call), and HALF_OPEN (testing whether the downstream service has recovered).

The example below shows a Spring Boot service calling an external migration status API with a circuit breaker, retry, and fallback method using Resilience4j annotations.
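A minimal sketch of such a service, assuming a RestTemplate bean and an illustrative downstream URL (the class, endpoint, and job-ID names are placeholders; the "migrationApi" name must match the instance configured in application.yml):

```java
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class MigrationStatusService {

    private final RestTemplate restTemplate;

    public MigrationStatusService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // By default the Retry aspect wraps the CircuitBreaker aspect,
    // so every retry attempt is recorded in the breaker's sliding window.
    @CircuitBreaker(name = "migrationApi", fallbackMethod = "statusFallback")
    @Retry(name = "migrationApi")
    public String getMigrationStatus(String jobId) {
        return restTemplate.getForObject(
                "http://migration-service/api/migrations/{jobId}/status",
                String.class, jobId);
    }

    // Fallback must match the original signature plus a trailing Throwable.
    // Return a safe default or cached value - never null.
    private String statusFallback(String jobId, Throwable t) {
        System.out.println("Circuit OPEN for job " + jobId
                + " - returning cached/default status.");
        return "UNKNOWN";
    }
}
```

The fallback here returns a sentinel status rather than propagating the failure; in a real system it might read the last known status from a local cache.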
The application.yml configuration for the circuit breaker and retry instances:
resilience4j:
  circuitbreaker:
    instances:
      migrationApi:
        sliding-window-size: 10
        failure-rate-threshold: 50
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3
        slow-call-rate-threshold: 50
        slow-call-duration-threshold: 3s
  retry:
    instances:
      migrationApi:
        max-attempts: 3
        wait-duration: 500ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - java.net.ConnectException
          - java.util.concurrent.TimeoutException
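The sample output below queries the breaker through custom admin endpoints. A sketch of such a controller, built on the auto-configured CircuitBreakerRegistry (the /admin/circuit-breakers path is taken from the sample output; everything else is illustrative):

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/admin/circuit-breakers")
public class CircuitBreakerAdminController {

    private final CircuitBreakerRegistry registry;

    public CircuitBreakerAdminController(CircuitBreakerRegistry registry) {
        this.registry = registry;
    }

    // Current state: CLOSED, OPEN, or HALF_OPEN.
    @GetMapping("/{name}/state")
    public String state(@PathVariable String name) {
        return registry.circuitBreaker(name).getState().name();
    }

    // Snapshot of the sliding-window metrics.
    @GetMapping("/{name}/metrics")
    public String metrics(@PathVariable String name) {
        CircuitBreaker cb = registry.circuitBreaker(name);
        CircuitBreaker.Metrics m = cb.getMetrics();
        return String.format(
            "State: %s | FailureRate: %.1f%% | SlowCallRate: %.1f%%"
                + " | Buffered: %d | Failed: %d | Successful: %d",
            cb.getState(), m.getFailureRate(), m.getSlowCallRate(),
            m.getNumberOfBufferedCalls(), m.getNumberOfFailedCalls(),
            m.getNumberOfSuccessfulCalls());
    }
}
```

Note that getFailureRate() returns -1 until the sliding window has enough recorded calls to compute a rate. Spring Boot Actuator also exposes similar data at /actuator/circuitbreakers when enabled.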
Querying the breaker's state as it trips gives the following output:
# First 5 calls succeed, next 5 fail (50% failure rate threshold exceeded):
GET /admin/circuit-breakers/migrationApi/state
-> CLOSED
# After 5 failures in the 10-call sliding window:
GET /admin/circuit-breakers/migrationApi/state
-> OPEN
# Calls now fail fast with fallback (no HTTP attempt made):
Circuit OPEN for job MIG-1042 - returning cached/default status.
GET /admin/circuit-breakers/migrationApi/metrics
-> State: OPEN | FailureRate: 50.0% | SlowCallRate: 0.0%
| Buffered: 10 | Failed: 5 | Successful: 5
# After 30s wait-duration-in-open-state:
GET /admin/circuit-breakers/migrationApi/state
-> HALF_OPEN
# 3 test calls allowed through. If they succeed, circuit closes:
-> CLOSED
Circuit breaker state transitions:
- CLOSED -> OPEN: the failure rate exceeds failure-rate-threshold (50%) over the last sliding-window-size (10) calls.
- OPEN -> HALF_OPEN: after wait-duration-in-open-state (30s), the circuit allows permitted-number-of-calls-in-half-open-state (3) test calls through.
- HALF_OPEN -> CLOSED: the test calls succeed - the circuit resets to normal operation.
- HALF_OPEN -> OPEN: the test calls fail - back to OPEN for another wait period.

Always pair circuit breakers with a meaningful fallback that returns a safe default or a cached response - never return null from a fallback method.
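These transitions can be observed at runtime through the breaker's event publisher. A minimal sketch that logs every transition for the migrationApi instance (the component name and log format are illustrative):

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import jakarta.annotation.PostConstruct;
import org.springframework.stereotype.Component;

@Component
public class CircuitBreakerTransitionLogger {

    private final CircuitBreakerRegistry registry;

    public CircuitBreakerTransitionLogger(CircuitBreakerRegistry registry) {
        this.registry = registry;
    }

    @PostConstruct
    public void register() {
        CircuitBreaker cb = registry.circuitBreaker("migrationApi");
        // Logs transitions such as CLOSED_TO_OPEN and OPEN_TO_HALF_OPEN.
        cb.getEventPublisher().onStateTransition(event ->
            System.out.println("migrationApi transition: "
                + event.getStateTransition()));
    }
}
```

Hooking the same publisher's onError and onSlowCallRateExceeded events is a useful way to alert operators before the circuit actually opens.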