Mocking Gemini LLM Responses in ADK Tests
Author: Venkata Sudhakar
ShopMax India's ADK agents call the Gemini API on every conversation turn, making live tests slow, non-deterministic, and expensive during CI runs. Mocking the Gemini response lets you assert on agent behavior using controlled, repeatable outputs without consuming API quota. Tests run in under a second and produce the same result on every machine.
The ADK routes LLM calls through google.adk.models.google_llm.Gemini internally. Patching its _generate_content_async method with AsyncMock intercepts every Gemini call and returns a pre-built response object. The agent processes this mock response exactly as it would a real one, so all downstream logic - tool selection, reply formatting - executes normally under test without any live API traffic.
The example below mocks Gemini responses for a ShopMax India support agent, verifying that the reply contains the expected order ID and that exactly one LLM call fires per conversation turn.
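A stdlib-only sketch of that test suite follows. In a real project the tests would live in tests/test_agent.py and patch google.adk.models.google_llm.Gemini directly; here a stand-in Gemini class and a hypothetical SupportAgent model the same call path so the sketch runs without the ADK installed. Under pytest (with an async plugin such as pytest-asyncio) the two test functions correspond to the output shown below.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class Gemini:
    """Stand-in for google.adk.models.google_llm.Gemini (assumed API)."""

    async def _generate_content_async(self, llm_request):
        raise RuntimeError("live Gemini call - must never run under test")


class SupportAgent:
    """Stand-in for the ShopMax India ADK support agent."""

    def __init__(self, model: Gemini):
        self.model = model

    async def handle_turn(self, user_message: str) -> str:
        # One LLM call per turn; downstream reply formatting would go here.
        return await self.model._generate_content_async(user_message)


def mocked_gemini(reply: str):
    """Patch the internal generation method so every call returns `reply`."""
    mock = AsyncMock(return_value=reply)
    return patch.object(Gemini, "_generate_content_async", mock), mock


async def test_order_reply_contains_order_id():
    patcher, _ = mocked_gemini("Your order ORD-12345 ships tomorrow.")
    with patcher:
        reply = await SupportAgent(Gemini()).handle_turn("Where is ORD-12345?")
    assert "ORD-12345" in reply


async def test_llm_called_exactly_once_per_turn():
    patcher, mock = mocked_gemini("Your order ORD-12345 ships tomorrow.")
    with patcher:
        await SupportAgent(Gemini()).handle_turn("Where is ORD-12345?")
    assert mock.await_count == 1


if __name__ == "__main__":
    asyncio.run(test_order_reply_contains_order_id())
    asyncio.run(test_llm_called_exactly_once_per_turn())
```

Because the mock is installed on the class, every Gemini instance the agent constructs is intercepted, and the AsyncMock's await_count gives an exact tally of LLM calls per turn.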
Running the suite gives the following output:
tests/test_agent.py::test_order_reply_contains_order_id PASSED
tests/test_agent.py::test_llm_called_exactly_once_per_turn PASSED
2 passed in 0.31s
In production, use side_effect instead of return_value when a test turn triggers multiple LLM calls - for example when a tool-use agent calls Gemini once to select the tool and again to format the final reply. Store mock response fixtures in JSON files under tests/fixtures/ so test data stays separate from test logic and is easy to update when the agent instruction changes.
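The side_effect pattern can be sketched as follows. The fixture content is shown inline here so the example is self-contained; in a real suite it would be read from a file under tests/fixtures/, and the two fixture strings below are hypothetical stand-ins for the tool-selection and reply-formatting responses.

```python
import asyncio
import json
from unittest.mock import AsyncMock

# Hypothetical fixture content; normally loaded from tests/fixtures/*.json
# so test data stays separate from test logic.
FIXTURE_JSON = '["call_tool: lookup_order", "Your order ORD-12345 ships tomorrow."]'


async def demo():
    responses = json.loads(FIXTURE_JSON)
    # side_effect hands out one canned response per awaited call,
    # in order, instead of repeating a single return_value.
    mock_llm = AsyncMock(side_effect=responses)

    first = await mock_llm("select a tool")      # tool-selection call
    second = await mock_llm("format the reply")  # reply-formatting call
    assert mock_llm.await_count == 2
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the agent makes more LLM calls than the fixture provides, the mock raises StopAsyncIteration, which surfaces unexpected extra calls as a test failure rather than silently reusing a stale response.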