Async Python with asyncio and aiohttp
Author: Venkata Sudhakar
Asynchronous programming in Python allows a single thread to handle many I/O-bound tasks concurrently without blocking. In a synchronous program, the thread sits idle whenever the code waits for a network call, database query, or file read. asyncio uses an event loop that switches to another task whenever the current one is waiting for I/O, so the thread spends its time on useful work instead of idling, which can greatly improve throughput for I/O-bound workloads. It does not speed up CPU-bound work. This matters for AI applications (calling multiple LLM APIs concurrently), data migration pipelines (fetching from multiple sources concurrently), and web services.
The async/await syntax makes asynchronous code read almost like synchronous code. A function defined with async def is a coroutine function: calling it returns a coroutine object rather than executing the body immediately. The await keyword suspends the current coroutine until the awaited operation completes, giving the event loop a chance to run other coroutines. asyncio.gather() runs multiple awaitables concurrently, waits for all of them, and returns their results in the order they were passed in, which makes it the usual tool for concurrent I/O. aiohttp is a widely used third-party library (not part of the standard library) for async HTTP requests; it takes the place of the synchronous requests library in async code, where a blocking requests call would stall the event loop.
The examples below show the core async patterns: basic coroutines, concurrent execution with gather(), and making concurrent HTTP requests with aiohttp.
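The following is a minimal sketch of the sequential-versus-concurrent comparison, not the author's original listing. The function names, the TABLES dictionary, and the simulated delays (0.9s, 1.2s, 0.6s) are assumptions, chosen only so that the totals are consistent with the sample timings in this article; asyncio.sleep() stands in for a real database query.

```python
import asyncio
import time

# Assumed row counts and simulated query latencies in seconds.
# The delays are illustrative values, not measurements.
TABLES = {
    "customers": (125_000, 0.9),
    "orders": (892_000, 1.2),
    "products": (12_000, 0.6),
}


async def fetch_count(table: str) -> int:
    """Pretend to run a COUNT query; asyncio.sleep() simulates the I/O wait."""
    rows, delay = TABLES[table]
    print(f"Fetching {table} count...")
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    print(f"Done: {table} = {rows:,} rows")
    return rows


async def run_sequential() -> float:
    """Await each coroutine in turn: total time is the sum of the delays."""
    start = time.perf_counter()
    for table in TABLES:
        await fetch_count(table)
    return time.perf_counter() - start


async def run_concurrent() -> tuple[float, int]:
    """Run all coroutines with gather(): total time is roughly the longest delay."""
    start = time.perf_counter()
    counts = await asyncio.gather(*(fetch_count(t) for t in TABLES))
    return time.perf_counter() - start, sum(counts)


async def main() -> None:
    print("--- Sequential execution ---")
    sequential = await run_sequential()
    print(f"Sequential total: {sequential:.1f}s")
    print("--- Concurrent execution ---")
    concurrent, total = await run_concurrent()
    print(f"Concurrent total: {concurrent:.1f}s | Total rows: {total:,}")


if __name__ == "__main__":
    asyncio.run(main())
```

In the concurrent run, all three "Fetching" lines print first because gather() starts every coroutine before any of them finishes waiting; the "Done" lines then appear in order of delay, shortest first.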
It gives the following output,
--- Sequential execution ---
Fetching customers count...
Done: customers = 125,000 rows
Fetching orders count...
Done: orders = 892,000 rows
Fetching products count...
Done: products = 12,000 rows
Sequential total: 2.7s
--- Concurrent execution ---
Fetching customers count...
Fetching orders count...
Fetching products count...
Done: products = 12,000 rows
Done: customers = 125,000 rows
Done: orders = 892,000 rows
Concurrent total: 1.2s | Total rows: 1,029,000
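The next sketch illustrates concurrent HTTP requests with aiohttp and gather(). It makes several assumptions: to stay runnable without credentials, it starts a small in-process stub server (a hypothetical /ask endpoint) in place of a real LLM API, the canned replies are copied from the sample output in this article, and the one-second latency is a simulated value. With a real provider, the URL, headers, and JSON payload would differ.

```python
import asyncio
import time

import aiohttp
from aiohttp import web

# Canned replies (copied from the article's sample output) stand in for a real LLM.
CANNED_ANSWERS = {
    "What is CDC in one sentence?": (
        "CDC (Change Data Capture) is a technique that tracks and captures changes "
        "made to a database so downstream systems can react to them in real time."
    ),
    "What is ETL in one sentence?": (
        "ETL (Extract, Transform, Load) is a data integration process that pulls "
        "data from sources, transforms it, and loads it into a target system."
    ),
    "What is Kafka in one sentence?": (
        "Apache Kafka is a distributed event streaming platform for high-throughput, "
        "fault-tolerant, real-time data pipelines."
    ),
    "What is Debezium in one sentence?": (
        "Debezium is an open-source CDC tool that reads database transaction logs "
        "and publishes change events to Apache Kafka topics."
    ),
}
SIMULATED_LATENCY = 1.0  # seconds per request; an assumed value


async def handle_ask(request: web.Request) -> web.Response:
    """Hypothetical /ask endpoint: wait, then return the canned answer."""
    payload = await request.json()
    await asyncio.sleep(SIMULATED_LATENCY)
    answer = CANNED_ANSWERS.get(payload["question"], "No canned answer.")
    return web.json_response({"answer": answer})


async def start_stub_server() -> tuple[web.AppRunner, str]:
    """Start the stub server on a free local port and return its URL."""
    app = web.Application()
    app.router.add_post("/ask", handle_ask)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 0)  # port 0 = let the OS pick
    await site.start()
    port = runner.addresses[0][1]
    return runner, f"http://127.0.0.1:{port}/ask"


async def ask(session: aiohttp.ClientSession, url: str, question: str) -> str:
    """POST one question and return the answer text."""
    async with session.post(url, json={"question": question}) as resp:
        resp.raise_for_status()
        data = await resp.json()
        return data["answer"]


async def main() -> tuple[list[str], float]:
    runner, url = await start_stub_server()
    try:
        start = time.perf_counter()
        # One shared session; all four requests are in flight at the same time.
        async with aiohttp.ClientSession() as session:
            answers = await asyncio.gather(
                *(ask(session, url, q) for q in CANNED_ANSWERS)
            )
        elapsed = time.perf_counter() - start
    finally:
        await runner.cleanup()
    for question, answer in zip(CANNED_ANSWERS, answers):
        print(f"Q: {question}...")
        print(f"A: {answer}")
    return list(answers), elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

Sharing one ClientSession across the requests lets aiohttp reuse its connection pool, and the async with blocks make sure the session and each response are released even if a request fails.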
It gives the following output (all four API calls complete in about 1s, versus about 4s when run sequentially),
Q: What is CDC in one sentence?...
A: CDC (Change Data Capture) is a technique that tracks and captures changes
made to a database so downstream systems can react to them in real time.
Q: What is ETL in one sentence?...
A: ETL (Extract, Transform, Load) is a data integration process that pulls
data from sources, transforms it, and loads it into a target system.
Q: What is Kafka in one sentence?...
A: Apache Kafka is a distributed event streaming platform for high-throughput,
fault-tolerant, real-time data pipelines.
Q: What is Debezium in one sentence?...
A: Debezium is an open-source CDC tool that reads database transaction logs
and publishes change events to Apache Kafka topics.
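Capping concurrency with a semaphore can be sketched as follows. The batch count and the limit of five match the sample output in this article; process_batch, the 0.1s simulated work, and the stats dictionary (added here only to make the cap observable) are assumptions for illustration.

```python
import asyncio

MAX_CONCURRENT = 5   # at most five batches in flight at once
TOTAL_BATCHES = 20


async def process_batch(batch_id: int, semaphore: asyncio.Semaphore, stats: dict) -> None:
    """Process one batch; the semaphore blocks entry once the cap is reached."""
    async with semaphore:  # acquired on entry, released on exit (even on error)
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        print(f"Processing batch {batch_id}...")
        await asyncio.sleep(0.1)  # simulated I/O, e.g. an API call or DB write
        stats["active"] -= 1


async def main() -> int:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    # All 20 coroutines are scheduled at once, but only 5 run past the semaphore.
    await asyncio.gather(
        *(process_batch(i, semaphore, stats) for i in range(1, TOTAL_BATCHES + 1))
    )
    print(f"All {TOTAL_BATCHES} batches complete.")
    return stats["peak"]


if __name__ == "__main__":
    asyncio.run(main())
```

gather() still launches every coroutine immediately; the semaphore is what holds the extra ones at the async with line until a slot frees up.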
It gives the following output,
Processing batch 1...
Processing batch 2...
Processing batch 3...
Processing batch 4...
Processing batch 5...
Processing batch 6...
... (5 batches at a time)
Processing batch 20...
All 20 batches complete.
asyncio best practices:
Never use time.sleep() in async code. It blocks the entire event loop, so no other coroutine can run until it returns. Use await asyncio.sleep() instead.
Use asyncio.gather() for concurrent I/O. It is the simplest way to run several coroutines concurrently and collect their results in order.
Use a Semaphore to limit concurrency. When calling external APIs or databases, cap the number of in-flight requests with asyncio.Semaphore to avoid hitting rate limits or overloading the target.
Use async context managers. Use async with for resources that need cleanup, such as aiohttp sessions and async database connections.
Use asyncio.create_task() to start a coroutine running in the background without awaiting it immediately. Keep a reference to the returned task and await it (or cancel it) later, so that it is not garbage-collected mid-run and its exceptions are not silently lost.
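The create_task() practice above can be illustrated with a short sketch. The coroutine name write_audit_log, the events list, and the sleep durations are hypothetical and serve only to make the ordering visible.

```python
import asyncio


async def write_audit_log(events: list[str]) -> None:
    """Hypothetical background job; asyncio.sleep() simulates slow I/O."""
    await asyncio.sleep(0.2)
    events.append("audit log written")


async def main() -> list[str]:
    events: list[str] = []
    # create_task() schedules the coroutine; it starts running the next time
    # main() yields to the event loop. Keep the reference so we can await it.
    task = asyncio.create_task(write_audit_log(events))

    events.append("main work started")
    await asyncio.sleep(0.1)  # main work proceeds while the task runs
    events.append("main work finished")

    await task  # wait for the background job and surface any exception
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the background job takes longer than the main work in this sketch, "main work finished" is recorded before "audit log written"; the final await task ensures the job completes before the event loop shuts down.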