Testing ADK Agents with Feature Flags and Progressive Rollout

Author: Venkata Sudhakar

Feature flag testing for ADK agents verifies that an agent activates new behavior only when the corresponding flag is enabled and falls back to the previous behavior when it is disabled. ShopMax India uses feature flags to roll out updated recommendation logic progressively: first to 10% of Mumbai customers, then to Delhi and Bangalore once quality metrics have been validated. Agents must therefore be tested in both flag states before any rollout begins.

A FeatureFlagService resolves flag state (enabled/disabled) for a given user or session. The agent tool checks the flag and branches accordingly. Tests cover four scenarios: flag enabled returns new behavior, flag disabled returns old behavior, flag rollout percentage routes correctly (10% enabled, 90% disabled), and flag override for specific users works as expected. Using a fake flag service in tests avoids any dependency on a real flag platform like LaunchDarkly or GrowthBook.

The example below defines a FakeFeatureFlagService, tests the agent tool under enabled and disabled flag states, and verifies that a 10% rollout routes approximately the right fraction of users to the new behavior.

import hashlib
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FakeFeatureFlagService:
    # Explicit on/off values; these take precedence over percentage rollout.
    flags: Dict[str, bool] = field(default_factory=dict)
    # Fraction of users (0.0 to 1.0) who get the flag when it is not set explicitly.
    rollout_pct: Dict[str, float] = field(default_factory=dict)

    def is_enabled(self, flag: str, user_id: str = "") -> bool:
        if flag in self.flags:
            return self.flags[flag]
        pct = self.rollout_pct.get(flag, 0.0)
        # The built-in hash() of a str is salted per process, so use a stable
        # digest to place each user in a fixed bucket from 0 to 99.
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return (int(digest, 16) % 100) < (pct * 100)

def get_recommendations(user_id: str, flags: FakeFeatureFlagService) -> dict:
    # The agent tool: branch on the flag and return the matching algorithm's result.
    if flags.is_enabled("new_recommendation_algo", user_id):
        return {"algo": "v2", "items": ["Samsung QLED", "LG OLED"], "city": "Mumbai"}
    return {"algo": "v1", "items": ["Samsung 4K TV"], "city": "Mumbai"}

def test_flag_enabled_uses_new_algo():
    flags = FakeFeatureFlagService(flags={"new_recommendation_algo": True})
    result = get_recommendations("user_001", flags)
    assert result["algo"] == "v2"
    assert len(result["items"]) == 2
    print(f"Flag ON: algo={result['algo']}, items={result['items']}")

def test_flag_disabled_uses_old_algo():
    flags = FakeFeatureFlagService(flags={"new_recommendation_algo": False})
    result = get_recommendations("user_001", flags)
    assert result["algo"] == "v1"
    print(f"Flag OFF: algo={result['algo']}")

def test_10pct_rollout_routes_correctly():
    flags = FakeFeatureFlagService(rollout_pct={"new_recommendation_algo": 0.10})
    users = [f"user_{i:04d}" for i in range(200)]
    enabled_count = sum(
        1 for u in users if flags.is_enabled("new_recommendation_algo", u)
    )
    enabled_pct = enabled_count / len(users)
    print(f"Rollout: {enabled_count}/{len(users)} users enabled ({enabled_pct:.1%})")
    # A wide band, because 200 users is a small sample for a 10% target.
    assert 0.05 <= enabled_pct <= 0.20, f"Rollout pct {enabled_pct:.1%} outside expected 5-20% band"

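Run with pytest -s (so that print output is not captured), the tests produce output along these lines; the exact rollout count depends on how the user IDs hash: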

Flag ON: algo=v2, items=['Samsung QLED', 'LG OLED']
Flag OFF: algo=v1
Rollout: 22/200 users enabled (11.0%)
3 passed in 0.05s
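
Of the four scenarios listed earlier, the per-user override is the one the tests above do not exercise. A minimal sketch of how it might be tested follows. It assumes the override is stored as a per-flag mapping of user ID to forced state and that it takes precedence over both the global flag and the rollout percentage. The class name OverridableFlagService, the overrides field, and the user IDs are hypothetical, not part of any real flag platform.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

FLAG = "new_recommendation_algo"


@dataclass
class OverridableFlagService:
    """Hypothetical fake: per-user overrides win over global flags and rollout."""
    flags: Dict[str, bool] = field(default_factory=dict)
    rollout_pct: Dict[str, float] = field(default_factory=dict)
    # Assumed shape: flag name -> {user_id: forced state}.
    overrides: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def is_enabled(self, flag: str, user_id: str = "") -> bool:
        user_overrides = self.overrides.get(flag, {})
        if user_id in user_overrides:
            return user_overrides[user_id]
        if flag in self.flags:
            return self.flags[flag]
        pct = self.rollout_pct.get(flag, 0.0)
        # Stable digest so the same user always lands in the same bucket (0 to 99).
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return (int(digest, 16) % 100) < (pct * 100)


def get_recommendations(user_id: str, flags: OverridableFlagService) -> dict:
    # Same branching as the article's tool, reduced to the algorithm name.
    if flags.is_enabled(FLAG, user_id):
        return {"algo": "v2"}
    return {"algo": "v1"}


def test_override_enables_flag_for_one_user():
    flags = OverridableFlagService(
        flags={FLAG: False},
        overrides={FLAG: {"qa_user": True}},
    )
    # The overridden user sees v2 even though the flag is globally off.
    assert get_recommendations("qa_user", flags)["algo"] == "v2"
    assert get_recommendations("user_001", flags)["algo"] == "v1"


def test_override_disables_flag_for_one_user():
    flags = OverridableFlagService(
        flags={FLAG: True},
        overrides={FLAG: {"blocked_user": False}},
    )
    # The overridden user stays on v1 even though the flag is globally on.
    assert get_recommendations("blocked_user", flags)["algo"] == "v1"
    assert get_recommendations("user_001", flags)["algo"] == "v2"
```

Both directions are tested because a real flag platform may order its rules differently; the precedence used here is an assumption of the sketch and should be matched to the platform actually in use.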

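Test both the enabled and disabled states in the same test run, giving each test its own flag service instance, so that any flag state leaking from one test into another shows up as a failure. Use deterministic hash-based rollout rather than random sampling so that the same user always gets the same experience and tests are reproducible; in Python this means a stable digest (for example from hashlib), because the built-in hash() of a string is salted per process unless PYTHONHASHSEED is set. If tests share a flag service, add a cleanup fixture in conftest.py that resets all flags to disabled after each test module so that flag state does not bleed across test files in the CI run.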
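
The cleanup fixture described above might look like the following conftest.py sketch. It assumes the tests share one module-level fake service. The class here is a trimmed-down stand-in for the article's FakeFeatureFlagService, and the names shared_flags, reset_flags, and flag_service are hypothetical.

```python
# conftest.py (sketch)
from dataclasses import dataclass, field
from typing import Dict, Iterator

import pytest


@dataclass
class FakeFeatureFlagService:
    """Trimmed-down stand-in: explicit flags only, unknown flags are disabled."""
    flags: Dict[str, bool] = field(default_factory=dict)
    rollout_pct: Dict[str, float] = field(default_factory=dict)

    def is_enabled(self, flag: str, user_id: str = "") -> bool:
        return self.flags.get(flag, False)


# Assumption: one shared instance that test modules mutate.
shared_flags = FakeFeatureFlagService()


def reset_flags(service: FakeFeatureFlagService) -> None:
    # With no explicit value and no rollout, every flag resolves to disabled.
    service.flags.clear()
    service.rollout_pct.clear()


@pytest.fixture(scope="module", autouse=True)
def flag_service() -> Iterator[FakeFeatureFlagService]:
    # Code after the yield runs as teardown once the module's tests finish.
    yield shared_flags
    reset_flags(shared_flags)
```

With autouse=True the teardown runs for every test module without each test having to request the fixture. Tests that build a fresh service per test, as in the example above, do not need it.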
