Human Feedback Collection and Preference Tracking for LLM Apps
Author: Venkata Sudhakar
ShopMax India's AI shopping assistant gives product recommendations, but without feedback signals it is impossible to know which responses customers found helpful versus frustrating. Collecting structured human feedback - thumbs up, thumbs down, or explicit ratings - creates a data flywheel that reveals which prompt versions work best, which topics need improvement, and which agent behaviours to reinforce. This tutorial shows how to build a lightweight feedback API that captures preference signals and stores them for analysis.
The feedback system exposes two endpoints: one to log a response along with its context (question, answer, model, timestamp), and one to record a user preference (liked or disliked). Each feedback event is stored in PostgreSQL with a session_id so you can join responses with preferences. A summary endpoint aggregates like rates by topic, giving the product team a clear signal of where to focus prompt engineering effort next.
The example below builds the feedback API using FastAPI and SQLAlchemy for ShopMax India. Customers submit thumbs signals after each AI response, and the system tracks like rates per topic category.
Running the API produces output like the following:
POST /log -> {"status": "logged"}
POST /feedback -> {"status": "recorded"}
GET /summary ->
{
"returns": {"like_rate": "78.3%"},
"order_tracking": {"like_rate": "65.1%"},
"product_reco": {"like_rate": "82.7%"}
}
Add a prompt_version column to ResponseLog to track which prompt generated each response, so A/B test results map cleanly to feedback scores. Use the like_rate per topic as an automatic trigger: if returns drops below 60%, send a Slack alert to the prompt engineering team. For ShopMax India, weight feedback from high-value customers (order value above Rs 10,000) more heavily in your dashboard - their experience has greater business impact than one-off buyers.
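The threshold trigger described above can be sketched as a small check run on a schedule against the /summary payload. The function name, the thresholds dictionary, and the notify callback are hypothetical; in production the callback would post to a Slack incoming webhook.

```python
# Hypothetical scheduled check: scan /summary output and fire an alert
# for any topic whose like rate falls below its threshold.
from typing import Callable

def check_like_rates(summary: dict, thresholds: dict, notify: Callable[[str], None]) -> list:
    """Return the topics that breached their threshold, calling notify() once per breach."""
    breached = []
    for topic, threshold in thresholds.items():
        # Parse a like_rate string such as "58.2%" back into a fraction.
        rate_str = summary.get(topic, {}).get("like_rate", "100%")
        rate = float(rate_str.rstrip("%")) / 100
        if rate < threshold:
            notify(f"Like rate for '{topic}' fell to {rate:.1%} (threshold {threshold:.0%})")
            breached.append(topic)
    return breached

# Example: returns at 58.2% breaches the 60% threshold; order_tracking does not.
alerts = []
breached = check_like_rates(
    {"returns": {"like_rate": "58.2%"}, "order_tracking": {"like_rate": "65.1%"}},
    {"returns": 0.60, "order_tracking": 0.60},
    alerts.append,  # swap in a Slack webhook poster in production
)
```

Passing the notifier as a callback keeps the threshold logic testable without a live Slack workspace.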