Image Classification with Hugging Face Vision Transformers

Author: Venkata Sudhakar

Image classification assigns a label to an image from a fixed set of categories. ShopMax India uses image classification to automatically tag uploaded product photos by category (TV, laptop, headphone, refrigerator), so sellers can list products faster and customers can search by visual similarity without the operations team labelling each photo by hand.

Hugging Face supports Vision Transformer (ViT) models through the image-classification pipeline. ViT splits an image into fixed-size patches, treats them as a sequence, and applies transformer self-attention across them; given enough pre-training data, it reaches accuracy comparable to CNNs on ImageNet. The pipeline accepts local file paths, URLs, or PIL Image objects and returns the top-k predicted labels with confidence scores.
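To make the "sequence of patches" idea concrete, here is a small sketch of the arithmetic involved. The 224x224 input size and 16x16 patch size are assumptions taken from the commonly used ViT-Base/16 configuration, not from this article, and the helper names are hypothetical.

```python
def vit_sequence_shape(height: int, width: int, patch: int, channels: int = 3):
    """Return (num_patches, values_per_patch) for a grid of non-overlapping patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    num_patches = (height // patch) * (width // patch)
    values_per_patch = patch * patch * channels
    return num_patches, values_per_patch


def patch_boxes(height: int, width: int, patch: int):
    """Yield (left, top, right, bottom) pixel boxes, one per patch, in row-major order."""
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            yield (left, top, left + patch, top + patch)


# Assumed ViT-Base/16 settings: 224x224 RGB input, 16x16 patches.
num_patches, values_per_patch = vit_sequence_shape(224, 224, 16)
print(num_patches, values_per_patch)  # 196 768
```

Each of the 196 patches is flattened to 768 values and projected to an embedding, so the transformer sees the image as a sequence of 196 tokens (plus a classification token in the standard ViT design).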

The example below loads a ViT model and classifies product images for ShopMax India's electronics catalogue, returning the top 3 predicted categories for each image.
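The original listing is not reproduced here, so what follows is a minimal sketch of what such an example might look like, not the author's code. The checkpoint name `google/vit-base-patch16-224`, the image paths, and the helper names are all assumptions; output from this sketch may differ from the output reported in this article.

```python
def build_classifier(model_name: str = "google/vit-base-patch16-224"):
    """Create an image-classification pipeline (checkpoint name is an assumption)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline
    return pipeline("image-classification", model=model_name)


def classify_products(classifier, products, top_k=3):
    """Map each SKU to its top-k (label, score) pairs.

    `products` maps SKU -> image path, URL, or PIL image.
    """
    results = {}
    for sku, image in products.items():
        predictions = classifier(image, top_k=top_k)
        results[sku] = [(p["label"], p["score"]) for p in predictions]
    return results


def format_report(results):
    """Render results as 'SKU: ...' blocks with percentage scores."""
    lines = []
    for sku, predictions in results.items():
        lines.append(f"SKU: {sku}")
        for label, score in predictions:
            lines.append(f"  {label}: {score:.1%}")
        lines.append("")
    return "\n".join(lines)


def main():
    # Hypothetical image paths; replace with real product photos.
    products = {
        "SKU-TV-001": "images/sku-tv-001.jpg",
        "SKU-LAPTOP-042": "images/sku-laptop-042.jpg",
    }
    classifier = build_classifier()
    print(format_report(classify_products(classifier, products)))
```

Calling `main()` downloads the checkpoint on first use and prints one block per SKU.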

The example gives the following output:

SKU: SKU-TV-001
  lemon: 21.3%
  orange: 18.7%
  banana: 9.2%

SKU: SKU-LAPTOP-042
  ant: 98.6%
  black widow: 0.4%
  bee: 0.2%

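The labels in the output above come from the pretrained model's generic label set, not from ShopMax India's catalogue. To get categories like TV, Laptop, Headphone, and Refrigerator, fine-tune the ViT model on your own labelled product images using the Hugging Face Trainer API. Use AutoImageProcessor (which replaces the deprecated AutoFeatureExtractor for image models in recent transformers releases) so that preprocessing is identical during training and inference. Cache the model locally after the first download to avoid repeated network calls in production.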
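As a rough sketch of the setup step for that fine-tuning, the code below builds the label mappings and loads a checkpoint with a new classification head. The checkpoint name, the cache directory, and the helper names are assumptions. It uses `AutoImageProcessor`, which recent transformers releases recommend in place of `AutoFeatureExtractor` for image models.

```python
def build_label_maps(categories):
    """Return (id2label, label2id) dictionaries for the given category names."""
    id2label = {i: name for i, name in enumerate(categories)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(categories,
                        checkpoint="google/vit-base-patch16-224",  # assumed checkpoint
                        cache_dir="./hf_cache",                    # assumed local cache path
                        offline=False):
    """Load the image processor and a ViT model with a head sized for `categories`."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    id2label, label2id = build_label_maps(categories)
    processor = AutoImageProcessor.from_pretrained(
        checkpoint, cache_dir=cache_dir, local_files_only=offline
    )
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(categories),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,  # replace the pretrained head with a new one
        cache_dir=cache_dir,
        local_files_only=offline,      # True: never hit the network, use the cache only
    )
    return processor, model


id2label, label2id = build_label_maps(["TV", "Laptop", "Headphone", "Refrigerator"])
print(id2label)  # {0: 'TV', 1: 'Laptop', 2: 'Headphone', 3: 'Refrigerator'}
```

The returned model can then be passed to `Trainer` together with a labelled dataset preprocessed by the same processor. Setting `offline=True` after the first download keeps production code from making network calls.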
