tl  tr
  Home | Tutorials | Articles | Videos | Products | Tools | Search
Interviews | Open Source | Tag Cloud | Follow Us | Bookmark | Contact   
 Generative AI > Prompt Engineering > Multi-Modal Prompting - Combining Text and Image Instructions

Multi-Modal Prompting - Combining Text and Image Instructions

Author: Venkata Sudhakar

Multi-modal prompting combines text instructions with images in the same API call. At ShopMax India, this enables product quality inspection (is this returned item damaged?), packaging verification (does the box match the product label?), and receipt validation (confirm the purchase amount from a photo of the bill). Vision-capable models like Claude receive both the text instruction and the image simultaneously and reason across both inputs.

The Anthropic SDK sends images as base64-encoded content blocks alongside text in the messages array. Each image block specifies the media type (image/jpeg, image/png) and the base64 data. You can combine multiple images with text in a single message. The key prompt engineering principle: be explicit about what you want the model to look at in the image and what specific information to extract - vague image prompts produce vague answers.

The example below shows ShopMax India using multi-modal prompting to analyze a product damage report image. The image is encoded as base64 and sent alongside a structured inspection prompt that asks Claude to assess damage severity and recommend the appropriate action.


It gives the following output,

Inspection report:
1. Is there visible damage? Yes
2. Damage severity: Major damage
3. Recommended action: Escalate for review
4. The TV screen has a large crack in the lower-right quadrant, likely caused
   by impact - requires senior assessor review to determine if this is
   transit damage (covered) or customer mishandling (not covered).

At ShopMax India, use multi-modal prompting for batch processing return photos overnight - load all return images from the day, run them through the inspection pipeline, and generate a triage report for the returns team each morning. For consistent results, standardize the photo-taking protocol: customers must photograph the product from a fixed distance with good lighting. Poor image quality is the top cause of inspection errors. Add an image quality check as the first prompt step - if the image is too dark or blurry, ask the customer to retake before running the full inspection.


 
  


  
bl  br