
Visual QA at 1000 images/minute

Generic vision APIs are slow and expensive. We built a custom CV pipeline that processes 1M images/day on a single GPU instance.
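The headline rate and the daily volume describe the same throughput. Running at 1000 images/minute around the clock comes to 1.44M images/day, so the quoted 1M images/day works out to an average of roughly 694 images/minute:

```latex
\begin{aligned}
1000\ \tfrac{\text{images}}{\text{min}} \times 60\ \tfrac{\text{min}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} &= 1{,}440{,}000\ \tfrac{\text{images}}{\text{day}} \\
\frac{1{,}000{,}000\ \text{images/day}}{1440\ \text{min/day}} &\approx 694\ \tfrac{\text{images}}{\text{min}}
\end{aligned}
```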

Architecture breakdown

The key insight: most visual QA checks don’t need a heavy model. Rule-based pre-filtering catches 60% of violations; images that pass those rules go to a fine-tuned model, 8x smaller than GPT-4V, which catches the remaining 40%.
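A minimal sketch of that two-stage cascade, assuming a simple in-memory image record. The post does not say which rules or model the pipeline uses, so everything named here is hypothetical: the `Image` and `Verdict` records, the two example rules (a 200x200 minimum resolution and a blank-frame check), the 0.5 score threshold, and `stub_model`, which stands in for the fine-tuned model. What the sketch shows is the shape: cheap deterministic checks run first, and only images that pass every one of them are batched into the more expensive model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Image:
    """Hypothetical image record; a real pipeline would also carry pixel data."""
    id: str
    width: int
    height: int
    mean_brightness: float  # 0.0 = all black, 1.0 = all white


@dataclass
class Verdict:
    image_id: str
    passed: bool
    stage: str                   # "rules" or "model"
    reason: Optional[str] = None


# A rule returns a violation reason, or None if the image looks fine.
Rule = Callable[[Image], Optional[str]]
# The model scores a batch of images, one violation probability per image.
Model = Callable[[Sequence[Image]], List[float]]


def too_small(img: Image) -> Optional[str]:
    # Hypothetical rule: reject anything below 200x200 pixels.
    if img.width < 200 or img.height < 200:
        return "resolution below 200x200"
    return None


def blank_frame(img: Image) -> Optional[str]:
    # Hypothetical rule: reject frames that are almost entirely black or white.
    if not 0.02 <= img.mean_brightness <= 0.98:
        return "blank or near-blank frame"
    return None


DEFAULT_RULES: List[Rule] = [too_small, blank_frame]


def first_violation(img: Image, rules: Sequence[Rule]) -> Optional[str]:
    # Cheap checks run in order; stop at the first one that fires.
    for rule in rules:
        reason = rule(img)
        if reason is not None:
            return reason
    return None


def run_cascade(images: Sequence[Image], model: Model,
                rules: Sequence[Rule] = DEFAULT_RULES,
                threshold: float = 0.5) -> List[Verdict]:
    verdicts: List[Verdict] = []
    survivors: List[Image] = []
    # Stage 1: rule-based pre-filter.
    for img in images:
        reason = first_violation(img, rules)
        if reason is not None:
            verdicts.append(Verdict(img.id, False, "rules", reason))
        else:
            survivors.append(img)
    # Stage 2: only images that passed every rule reach the model,
    # scored together in a single batch call.
    if survivors:
        scores = model(survivors)
        for img, score in zip(survivors, scores):
            flagged = score >= threshold
            reason = f"model score {score:.2f}" if flagged else None
            verdicts.append(Verdict(img.id, not flagged, "model", reason))
    return verdicts


def stub_model(batch: Sequence[Image]) -> List[float]:
    # Stand-in for the fine-tuned model: flags any image whose id contains "bad".
    return [0.9 if "bad" in img.id else 0.1 for img in batch]


imgs = [
    Image("tiny", 64, 64, 0.5),
    Image("black", 800, 600, 0.0),
    Image("bad-logo", 800, 600, 0.4),
    Image("good", 800, 600, 0.5),
]
for v in run_cascade(imgs, stub_model):
    print(v.image_id, v.passed, v.stage, v.reason)
```

Here `tiny` and `black` are rejected by the rules and never reach the model; only `bad-logo` and `good` are scored. Verdicts come back grouped by stage rather than in input order, which is fine for a report but worth re-sorting if a caller needs the original order.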

Cost comparison

At 50k images/day, generic API costs run $1,500-3,000/month. Our pipeline: $200/month on a single T4 instance.
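The comparison is easier to read per image. A small sketch of the arithmetic, assuming a 30-day month, treating the T4 as a flat monthly rental, and (for the extrapolation only) assuming the API is priced purely per image. At 50k images/day the API works out to $0.001 to $0.002 per image against roughly $0.00013 for the T4, a 7.5x to 15x gap. Because the T4 cost is fixed while API cost scales with volume, the gap widens as volume grows toward the 1M images/day a single instance handles.

```python
IMAGES_PER_DAY = 50_000            # volume used in the cost comparison
DAYS_PER_MONTH = 30                # assumption: 30-day month
API_MONTHLY_USD = (1_500, 3_000)   # generic vision API range quoted above
T4_MONTHLY_USD = 200               # single T4 instance, quoted above


def per_image_usd(monthly_usd: float, images_per_day: int = IMAGES_PER_DAY) -> float:
    """Spread a monthly bill evenly over the images processed that month."""
    return monthly_usd / (images_per_day * DAYS_PER_MONTH)


def api_monthly_usd_at(images_per_day: int, monthly_usd_at_base: float) -> float:
    """Scale an API bill to a new daily volume, assuming pure per-image pricing."""
    return monthly_usd_at_base * images_per_day / IMAGES_PER_DAY


low, high = API_MONTHLY_USD
print(f"API: ${per_image_usd(low):.4f}-${per_image_usd(high):.4f} per image")
print(f"T4:  ${per_image_usd(T4_MONTHLY_USD):.5f} per image")
print(f"Savings: {low / T4_MONTHLY_USD:.1f}x-{high / T4_MONTHLY_USD:.1f}x")
# At 1M images/day the T4 bill stays flat, while a per-image API bill
# grows 20x (under the linear-pricing assumption).
print(f"API at 1M/day: ${api_monthly_usd_at(1_000_000, low):,.0f}-"
      f"${api_monthly_usd_at(1_000_000, high):,.0f} per month")
```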