
Data Validation with Pydantic v2

Author: Venkata Sudhakar

Pydantic is the most widely used data validation library in the Python ecosystem. It is the validation engine behind FastAPI, LangChain, and hundreds of other major Python projects. Pydantic uses Python type hints to define schemas and validates incoming data against them at runtime, coercing types where possible and raising detailed errors when data does not match. Pydantic v2 (released in 2023) had its core validation logic rewritten in Rust (the pydantic-core package), making it 5-50x faster than v1 while adding a richer validation API.

The core concept is the BaseModel: a class that inherits from pydantic.BaseModel and uses type-annotated class attributes to define the schema. When you instantiate a model with data (from a dict, JSON string, or keyword arguments), Pydantic validates every field, coerces compatible types (e.g. "42" to int), applies validators, and raises a ValidationError with detailed field-by-field error messages if anything is wrong. This makes Pydantic ideal for validating API request bodies, LLM-generated structured outputs, ETL pipeline records, and configuration files.
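A minimal sketch of that behaviour (the User model and field names here are hypothetical, chosen just to show coercion and error reporting):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int            # numeric strings like "42" are coerced to int
    active: bool = True

user = User(name="Ada", age="42")     # keyword arguments, with coercion
print(user.age, type(user.age))       # 42 <class 'int'>

try:
    User(name="Ada", age="not a number")
except ValidationError as exc:
    print(exc.errors()[0]["msg"])     # detailed, field-level error message
```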

The example below demonstrates comprehensive Pydantic v2 usage for a data migration domain, including field validators, model validators, computed fields, and JSON serialisation.
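A sketch of such a model, reconstructed to be consistent with the output below (the names DatabaseConfig, MigrationJob, and source_url are assumptions, and the source_url driver prefix simply mirrors the printed output):

```python
from datetime import datetime
from typing import Literal

from pydantic import (BaseModel, Field, computed_field, field_validator,
                      model_validator)

class DatabaseConfig(BaseModel):
    host: str
    port: int = Field(ge=1, le=65535)
    database: str
    username: str
    password: str = Field(min_length=8, exclude=True)  # never serialised
    ssl_enabled: bool = True

    @field_validator("host")
    @classmethod
    def strip_host(cls, v: str) -> str:
        return v.strip()

class MigrationJob(BaseModel):
    job_id: str = Field(pattern=r"^MIG-\d+$")
    environment: Literal["dev", "staging", "prod"]
    source: DatabaseConfig
    target: DatabaseConfig
    batch_size: int = Field(default=10_000, ge=1)
    max_parallel_tables: int = 4
    dry_run: bool = False
    created_at: datetime

    @model_validator(mode="after")
    def source_differs_from_target(self) -> "MigrationJob":
        if (self.source.host, self.source.database) == (
                self.target.host, self.target.database):
            raise ValueError("Source and target cannot be the same database")
        return self

    @computed_field
    @property
    def source_url(self) -> str:
        # Password masked; driver prefix mirrors the article's printed output
        s = self.source
        return f"postgresql+psycopg2://{s.username}:***@{s.host}:{s.port}/{s.database}"

job = MigrationJob(
    job_id="MIG-1042",
    environment="prod",
    source={"host": "mysql-prod", "port": 3306, "database": "appdb",
            "username": "etl_user", "password": "s3cr3t-pass"},
    target={"host": "pg-prod", "port": 5432, "database": "appdb",
            "username": "etl_user", "password": "s3cr3t-pass"},
    batch_size=50_000,
    max_parallel_tables=4,
    created_at="2024-01-15T09:00:00",   # str is coerced to datetime
)
print(job.model_dump_json(indent=2, exclude={"source_url"}))
print("Source URL:", job.source_url)
```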


It gives the following output:

{
  "job_id": "MIG-1042",
  "environment": "prod",
  "source": {"host": "mysql-prod", "port": 3306, "database": "appdb",
             "username": "etl_user", "ssl_enabled": true},
  "target": {"host": "pg-prod", "port": 5432, "database": "appdb",
             "username": "etl_user", "ssl_enabled": true},
  "batch_size": 50000,
  "max_parallel_tables": 4,
  "dry_run": false,
  "created_at": "2024-01-15T09:00:00"
}
Source URL: postgresql+psycopg2://etl_user:***@mysql-prod:3306/appdb
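The second run feeds deliberately invalid data into the same kind of schema. A self-contained sketch of that error-handling pattern (model and field names are assumptions; note that in Pydantic v2 an after-mode model validator only runs once all field validation succeeds, so the model-level duplicate-database error shown below would come from a separate run whose field values are otherwise valid):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError, model_validator

class DatabaseConfig(BaseModel):
    host: str
    port: int = Field(ge=1, le=65535)
    database: str
    password: str = Field(min_length=8)

class MigrationJob(BaseModel):
    job_id: str = Field(pattern=r"^MIG-\d+$")
    environment: Literal["dev", "staging", "prod"]
    source: DatabaseConfig
    target: DatabaseConfig

    @model_validator(mode="after")
    def source_differs_from_target(self) -> "MigrationJob":
        if (self.source.host, self.source.database) == (
                self.target.host, self.target.database):
            raise ValueError("Source and target cannot be the same database")
        return self

bad_payload = {
    "job_id": "JOB-7",            # wrong prefix for the pattern
    "environment": "qa",          # not one of the allowed literals
    "source": {"host": "db1", "port": 99999, "database": "appdb",
               "password": "short"},
    "target": {"host": "db2", "port": 5432, "database": "appdb",
               "password": "long-enough-pw"},
}

try:
    MigrationJob.model_validate(bad_payload)
except ValidationError as exc:
    print("Validation errors:")
    for err in exc.errors():
        print(f'  Field: {err["loc"]} | Error: {err["msg"]}')
```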

With invalid input, validation fails with the following output:

Validation errors:
  Field: ("job_id",) | Error: String should match pattern "^MIG-\d+$"
  Field: ("environment",) | Error: Input should be "dev", "staging" or "prod"
  Field: ("source", "port") | Error: Input should be less than or equal to 65535
  Field: ("source", "password") | Error: String should have at least 8 characters
  Field: () | Error: Source and target cannot be the same database

A related pattern is a Settings class that reads configuration from environment variables with an APP_ prefix, using the pydantic-settings package.

Pydantic in AI and data pipelines:

Pydantic is used throughout the modern Python AI stack. LangChain uses Pydantic BaseModel for all tool input schemas - every @tool function's arguments are validated by Pydantic. FastAPI validates request and response bodies with Pydantic automatically. When using the OpenAI API in JSON mode (response_format={"type": "json_object"}), you can parse and validate the JSON response directly into a Pydantic model using model_validate_json(). In ETL pipelines, Pydantic models are excellent for validating each record as it flows through the pipeline, catching data quality issues early rather than letting bad data reach the target database.
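Validating an LLM's JSON reply with model_validate_json() could look like this sketch (the WeatherQuery model and the raw string are stand-ins for a real model and a real API response):

```python
from pydantic import BaseModel, ValidationError

class WeatherQuery(BaseModel):
    city: str
    days: int

# Stand-in for the message content returned by a JSON-mode chat completion
raw = '{"city": "Lisbon", "days": 3}'

query = WeatherQuery.model_validate_json(raw)
print(query.days)   # 3

try:
    WeatherQuery.model_validate_json('{"city": "Lisbon"}')  # "days" missing
except ValidationError as exc:
    print(exc.error_count())    # 1: days is a required field
```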
