Contrastive Chain-of-Thought - Using Good and Bad Examples

Author: Venkata Sudhakar

Contrastive chain-of-thought prompting provides both a correct worked example and an incorrect worked example, asking the LLM to reason like the correct one and avoid the mistakes in the incorrect one. At ShopMax India, when prompting an LLM to assess warranty claim validity, showing both a good assessment (claim approved with clear, policy-based reasoning) and a bad assessment (claim approved out of sympathy without checking policy) gives the model a concrete target to imitate and a concrete failure to avoid in subsequent judgments.

The contrastive approach works because LLMs pick up on contrast within the prompt: seeing what NOT to do is often as instructive as seeing what to do. The bad example highlights failure patterns - circular reasoning, missed facts, policy violations - that the model might otherwise reproduce. The good example shows the target reasoning style. Together they bracket the expected behavior more precisely than a single positive example.
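The good/bad pairing described above can be assembled from structured records rather than written by hand each time. The following is a minimal sketch, not part of the original example: `ContrastiveExample`, `render_example`, and `build_system_prompt` are hypothetical names introduced here for illustration, and the labels simply mirror the wording used in the prompt in this article.

```python
from dataclasses import dataclass


@dataclass
class ContrastiveExample:
    """One worked example (hypothetical structure, for illustration only)."""
    claim: str
    reasoning: str
    decision: str
    error_note: str = ""  # set only on bad examples: names the mistake


def render_example(ex: ContrastiveExample) -> str:
    """Format one example as Claim / Reasoning / Decision lines."""
    lines = [
        f"Claim: {ex.claim}",
        f"Reasoning: {ex.reasoning}",
        f"Decision: {ex.decision}",
    ]
    if ex.error_note:
        # Spell out why the bad example is bad so the model does not copy it.
        lines.append(f"[ERROR: {ex.error_note}]")
    return "\n".join(lines)


def build_system_prompt(task, good, bad, checklist=""):
    """Join the task, good examples, bad examples, and checklist into one prompt."""
    parts = [task]
    for ex in good:
        parts.append("GOOD EXAMPLE (follow this reasoning style):\n" + render_example(ex))
    for ex in bad:
        parts.append("BAD EXAMPLE (avoid these mistakes):\n" + render_example(ex))
    if checklist:
        parts.append(checklist)
    return "\n\n".join(parts)


good = ContrastiveExample(
    claim="Samsung TV, purchased 8 months ago, screen flickering. Warranty: 12 months.",
    reasoning="Product is within the 12-month warranty period. Screen flickering is a "
              "covered defect under display faults. No physical damage mentioned.",
    decision="APPROVED - covered display defect within warranty period.",
)
bad = ContrastiveExample(
    claim="LG AC, purchased 14 months ago, not cooling. Warranty: 12 months.",
    reasoning="Customer is unhappy and we should help them.",
    decision="APPROVED - customer deserves service.",
    error_note="This ignores the expired warranty. "
               "Approval must be based on policy, not sympathy.",
)

prompt = build_system_prompt(
    "You are a ShopMax India warranty assessor. "
    "Assess each claim as APPROVED or DENIED with a brief reason.",
    good=[good],
    bad=[bad],
    checklist="Always check: (1) Is product within warranty period? "
              "(2) Is the defect type covered? (3) Is there evidence of misuse?",
)
print(prompt)
```

Keeping the examples as data makes it easier to swap in a different good/bad pair without rewriting the surrounding instructions.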

The example below shows ShopMax India using contrastive CoT for warranty claim assessment. One good and one bad example are embedded in the prompt, followed by three new claims for the model to assess.

import anthropic

client = anthropic.Anthropic()

SYSTEM = """
You are a ShopMax India warranty assessor. Assess each claim as APPROVED or DENIED with a brief reason.

GOOD EXAMPLE (follow this reasoning style):
Claim: Samsung TV, purchased 8 months ago, screen flickering. Warranty: 12 months.
Reasoning: Product is within the 12-month warranty period. Screen flickering is a
covered defect under display faults. No physical damage mentioned.
Decision: APPROVED - covered display defect within warranty period.

BAD EXAMPLE (avoid these mistakes):
Claim: LG AC, purchased 14 months ago, not cooling. Warranty: 12 months.
Reasoning: Customer is unhappy and we should help them.
Decision: APPROVED - customer deserves service.
[ERROR: This ignores the expired warranty. Approval must be based on policy, not sympathy.]

Always check: (1) Is product within warranty period? (2) Is the defect type covered? (3) Is there evidence of misuse?
"""

claims = [
    "Whirlpool washing machine, purchased 5 months ago, drum not spinning. Warranty: 24 months.",
    "Sony TV, purchased 18 months ago, remote control broken. Warranty: 12 months.",
    "Daikin AC, purchased 3 months ago, compressor noise. Warranty: 60 months. Customer dropped it while installing."
]

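# Assess each claim in its own request so the assessments stay independent.
for claim in claims:
    r = client.messages.create(
        model="claude-haiku-4-5", max_tokens=128,
        system=SYSTEM,  # the contrastive examples and checklist accompany every call
        messages=[{"role": "user", "content": "Assess: " + claim}]
    )
    print("Claim:", claim[:60])  # show only the first 60 characters of the claim
    print("Assessment:", r.content[0].text.strip())
    print()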

A representative run gives output like the following (the exact wording varies between runs):

Claim: Whirlpool washing machine, purchased 5 months ago, drum not 
Assessment: Reasoning: 5 months is within the 24-month warranty. Drum failure
is a covered mechanical defect. No misuse mentioned.
Decision: APPROVED - mechanical defect within warranty period.

Claim: Sony TV, purchased 18 months ago, remote control broken. War
Assessment: Reasoning: 18 months exceeds the 12-month warranty period.
Decision: DENIED - warranty expired 6 months ago. Offer out-of-warranty repair.

Claim: Daikin AC, purchased 3 months ago, compressor noise. Warrant
Assessment: Reasoning: Within 60-month warranty. However, customer dropped unit
during installation - this constitutes physical damage from misuse, not a defect.
Decision: DENIED - physical damage from customer mishandling voids warranty.
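Because the prompt asks for a fixed APPROVED or DENIED label, responses in the format shown above can be checked mechanically before anyone acts on them. The sketch below is an illustration rather than part of the original example: `parse_assessment` is a hypothetical helper, and it assumes the model keeps the plain `Decision: LABEL - reason` line. A response that drifts from that format, or is cut off by `max_tokens`, returns `None` so it can be routed to a human.

```python
import re

# Matches a line such as "Decision: DENIED - warranty expired 6 months ago."
DECISION_RE = re.compile(
    r"^Decision:\s*(APPROVED|DENIED)\s*-\s*(.+)$",
    re.MULTILINE,
)


def parse_assessment(text):
    """Return (label, reason) from an assessment, or None if no decision line is found."""
    match = DECISION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


sample = (
    "Reasoning: 18 months exceeds the 12-month warranty period.\n"
    "Decision: DENIED - warranty expired 6 months ago. Offer out-of-warranty repair."
)
result = parse_assessment(sample)
print(result)
```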

In this run, the contrastive examples steer the model toward decisions grounded in the policy checks (warranty period, defect coverage, evidence of misuse) rather than sympathy. At ShopMax India, curate a contrastive example library from historical warranty decisions - include real approved and denied cases with anonymized data. Update the bad examples when new failure modes appear in assessor outputs. For high-stakes decisions like large-value warranty claims, chain contrastive CoT with a second review call that checks the first decision against the same examples to catch inconsistencies.
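The second review call suggested above could look something like the sketch below. It is one possible arrangement, not the article's own implementation: `REVIEW_INSTRUCTIONS`, `build_review_request`, `review_assessment`, and `ask_claude` are hypothetical names, and the CONSISTENT/INCONSISTENT reply format is an assumption made here so the verdict is easy to read programmatically. The model call is passed in as a function so the review logic can be exercised without network access.

```python
REVIEW_INSTRUCTIONS = (
    "You are now reviewing an assessment made by another assessor. "
    "Using the same examples and checklist, reply with CONSISTENT or "
    "INCONSISTENT on the first line, followed by a one-sentence reason."
)


def build_review_request(system_prompt, claim, first_assessment):
    """Reuse the contrastive system prompt and append the review instructions."""
    review_system = system_prompt.strip() + "\n\n" + REVIEW_INSTRUCTIONS
    user = (
        f"Claim: {claim}\n\n"
        f"First assessment:\n{first_assessment}\n\n"
        "Is this assessment consistent with the policy checks?"
    )
    return review_system, user


def review_assessment(system_prompt, claim, first_assessment, ask):
    """Return True if the reviewer judges the first assessment consistent.

    `ask` is any callable taking (system, user) and returning the reply text.
    """
    review_system, user = build_review_request(system_prompt, claim, first_assessment)
    verdict = ask(review_system, user).strip().upper()
    # "INCONSISTENT" does not start with "CONSISTENT", so this check is unambiguous.
    return verdict.startswith("CONSISTENT")


def ask_claude(system, user):
    """Send one review request using the same SDK call as the main example."""
    import anthropic  # imported here so the helpers above work without the SDK

    client = anthropic.Anthropic()
    r = client.messages.create(
        model="claude-haiku-4-5", max_tokens=128,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return r.content[0].text
```

With the `SYSTEM` prompt from the example, a call such as `review_assessment(SYSTEM, claim, first_text, ask_claude)` returns `False` when the reviewer flags the first decision, which can then be escalated to a human assessor.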
