GradientLens

Gallery images: Architecture Diagram; Finding an object; 48GB GPU utilization; vLLM with Qwen3-VL-8B-Instruct; Kokoro-82M for TTS

Inspiration

For people with low vision, everyday visual tasks, from reading medication labels to identifying groceries, are a constant challenge. I wanted to build an assistive tool that goes beyond a standard chatbot: a "proactive" AI that sees what you see, remembers your goals, and speaks to you in real time.

To make this work, I needed infrastructure that could handle high-throughput video streams, run multimodal inference at low latency, and stay highly available. I chose the DigitalOcean Cloud because it combines raw GPU compute, serverless inference, and developer-friendly managed services, which together covered what this project required.

What it does

GradientLens is a real-time, multimodal assistive web application designed to bridge the gap between complex visual environments and users with low vision.

Live Camera Scene Understanding: It continuously analyzes the user's environment in real time, offering specialized modes for reading documents, identifying groceries, and exploring surroundings.
Medication Safety: It uses a vision model to read medication labels and extract critical information such as the drug name and dosage.
Proactive Assist: It doesn't just wait for commands. By matching detected objects with the user's stated goals, it automatically provides relevant suggestions and safety warnings.
Conversational Voice Interface: Users interact through a full-duplex voice session with ultra-fast, natural-sounding Text-to-Speech (TTS), making the experience feel like a conversation with a human assistant.
Persistent Context: It remembers past interactions and user goals across sessions, ensuring continuity and context-aware assistance.
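
The Proactive Assist idea, matching detected objects against the user's stated goals, can be sketched roughly as follows. This is only an illustration and not the app's actual logic: the `Goal` and `Suggestion` shapes, the `matchGoals` function, and the simple keyword rule are all assumptions made for the example.

```typescript
// Hypothetical types: the real data model is not described in this writeup.
interface Goal {
  id: string;
  description: string; // e.g. "buy oat milk"
  keywords: string[];  // terms that indicate the goal is relevant
}

interface Suggestion {
  goalId: string;
  message: string;
}

// Normalize a label so that "Oat Milk " and "oat milk" compare equal.
function normalize(label: string): string {
  return label.trim().toLowerCase();
}

// Return one suggestion for each goal whose keywords appear in a detected label.
function matchGoals(detected: string[], goals: Goal[]): Suggestion[] {
  const labels = detected.map(normalize);
  const suggestions: Suggestion[] = [];
  for (const goal of goals) {
    const hit = labels.find((label) =>
      goal.keywords.some((kw) => label.includes(normalize(kw)))
    );
    if (hit !== undefined) {
      suggestions.push({
        goalId: goal.id,
        message: `I can see ${hit}, which may help with your goal: ${goal.description}.`,
      });
    }
  }
  return suggestions;
}

// Example: one of the two goals matches what the camera currently sees.
const exampleGoals: Goal[] = [
  { id: "g1", description: "buy oat milk", keywords: ["oat milk"] },
  { id: "g2", description: "find my keys", keywords: ["keys"] },
];
const exampleSuggestions = matchGoals(["Oat Milk carton", "bananas"], exampleGoals);
console.log(exampleSuggestions);
```

Plain substring matching is used here only to keep the sketch short; a production version would more likely let the language model judge relevance.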

How I built it

I built GradientLens with Next.js as the full-stack framework and designed a hybrid AI architecture that spans several DigitalOcean products:

Frontend & API (DO App Platform): The Next.js client and API routes are deployed on the DigitalOcean App Platform with zero configuration.
Primary Custom Inference (DO GPU Droplets): To get the best performance, I provisioned raw DO GPU Droplets (H100, L40S, etc.). I wrote custom automation scripts (gpu-setup.sh, tts-setup.sh) that turn these droplets into inference servers hosting Qwen3-VL-8B-Instruct (via vLLM) for fast visual scene understanding, and Kokoro TTS (via FastAPI) for real-time speech synthesis.
Bulletproof Resilience (DO Gradient AI): Accessibility tools cannot afford downtime. I implemented code-level routing that uses DigitalOcean Gradient AI Serverless Inference (Llama 3.3 and GPT-4o-mini) as the fallback layer. If the custom GPU Droplets experience latency spikes, the app falls back to Gradient AI so that users keep access to critical features.
Sub-millisecond State (DO Managed Databases): I use a highly available DigitalOcean Managed Redis (Valkey) cluster for fast, persistent session memory and user context.
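
The "Primary + Fallback" routing described above can be sketched as a small helper that gives the primary backend a deadline and switches to the fallback on a timeout or error. This is a minimal sketch under stated assumptions, not the project's actual code: `Backend`, `withFallback`, and the simulated backends are hypothetical names, and the real backends would call the GPU Droplet and the Gradient AI endpoint over HTTP.

```typescript
// A backend turns a prompt into a reply. The signal lets it cancel in-flight work.
type Backend = (prompt: string, signal?: AbortSignal) => Promise<string>;

interface Routed {
  text: string;
  servedBy: "primary" | "fallback";
}

// Run the primary backend under a deadline; on timeout or error, use the fallback.
async function withFallback(
  primary: Backend,
  fallback: Backend,
  prompt: string,
  timeoutMs: number
): Promise<Routed> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // tell the primary to stop working on this request
      reject(new Error(`primary exceeded ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    const text = await Promise.race([primary(prompt, controller.signal), deadline]);
    return { text, servedBy: "primary" };
  } catch {
    // Timeout or primary error: continue to the fallback below.
  } finally {
    clearTimeout(timer);
  }
  const text = await fallback(prompt);
  return { text, servedBy: "fallback" };
}

// Simulated backends for demonstration only.
const slowGpu: Backend = () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("gpu reply"), 200));
const serverless: Backend = async () => "serverless reply";

withFallback(slowGpu, serverless, "What is in front of me?", 50).then((r) =>
  console.log(r.servedBy, r.text)
);
```

In this demo the simulated primary takes 200 ms against a 50 ms deadline, so the request is served by the fallback. The deadline value is illustrative; the right threshold depends on how much latency a voice conversation can tolerate.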

Challenges I ran into

The Latency vs. Accuracy Trade-off: Processing live camera feeds with large multimodal models (like Qwen3-VL) while maintaining conversational response times was incredibly challenging. Optimizing vLLM on the GPU Droplets was critical to solving this.
Architectural Orchestration: Designing a fallback mechanism that switches between my dedicated GPU Droplets and the serverless DO Gradient AI endpoint without the user noticing required careful error handling and timeout management.
Real-world Visuals: Accurately extracting high-stakes information, such as medication dosages, from a live, potentially blurry camera feed required careful tuning of my prompting and sampling strategies.
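
As an illustration of the prompting and sampling side, the sketch below builds a request for an OpenAI-compatible chat completions endpoint (the API style vLLM serves) and validates the reply before trusting it. The prompt wording, the `temperature: 0` setting, the `UNREADABLE` sentinel, and the `MedicationLabel` shape are assumptions for the example, not the settings actually used in GradientLens.

```typescript
// Hypothetical result shape; the writeup only says name and dosage are extracted.
interface MedicationLabel {
  name: string;
  dosage: string;
}

const LABEL_PROMPT =
  'Read the medication label. Reply with JSON only: {"name": string, "dosage": string}. ' +
  'If a field cannot be read with confidence, use "UNREADABLE".';

// Build a chat completions request body. The camera frame is sent as a base64 data URL.
function buildLabelRequest(model: string, jpegBase64: string) {
  return {
    model,
    temperature: 0, // assumption: deterministic decoding for high-stakes text
    max_tokens: 128,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: LABEL_PROMPT },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${jpegBase64}` } },
        ],
      },
    ],
  };
}

// Parse the model's reply; return null instead of guessing when anything is off.
function parseLabelReply(reply: string): MedicationLabel | null {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { name, dosage } = data as Record<string, unknown>;
  if (typeof name !== "string" || typeof dosage !== "string") return null;
  if (name === "UNREADABLE" || dosage === "UNREADABLE") return null;
  return { name, dosage };
}

console.log(parseLabelReply('{"name": "Ibuprofen", "dosage": "200 mg"}'));
console.log(parseLabelReply("The label is too blurry to read."));
```

Returning `null` on any doubt lets the app ask the user to hold the label steadier rather than reading out a dosage it is not sure about.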

Accomplishments that I'm proud of

Seamless DO Integration: I successfully unified four distinct DigitalOcean products (App Platform, GPU Droplets, Gradient AI, and Managed Databases) into a single, cohesive architecture.
Custom Infrastructure Automation: I'm proud of my one-click bash scripts, which fully automate the setup of complex AI models on bare DO GPU Droplets.
High-Availability Architecture: The "Primary + Fallback" routing keeps the app available when the primary inference servers struggle, which is a critical requirement for an accessibility tool.
Proactive AI: I moved beyond standard prompt-and-response interactions to create an agent that actively grounds the visual scene against user intent and offers unsolicited but helpful cues.

What I learned

I gained a deep appreciation for the developer experience and flexibility of the DigitalOcean ecosystem. The ability to mix managed, serverless AI (Gradient) with raw, unmanaged compute (GPU Droplets) gave me exactly the control I needed.
I learned advanced techniques for deploying and optimizing large open-source models with vLLM and FastAPI for real-time production use cases.
I discovered the importance of context and memory in making AI feel truly assistive and natural, which highlights the value of my fast Managed Redis implementation.
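
To make the context-and-memory point concrete, here is a minimal sketch of per-user session memory with a capped history. Everything in it is hypothetical: the `SessionState` shape, the `session:` key prefix, the 20-turn cap, and the `SessionStore` interface are assumptions, and the in-memory store stands in for the Managed Redis (Valkey) client so the example runs without a database.

```typescript
// Hypothetical shape of what is remembered per user.
interface SessionState {
  goals: string[];
  recentTurns: string[]; // oldest first, most recent last
}

const MAX_TURNS = 20; // assumption: cap history so prompts stay small

// Key naming is an assumption; any stable per-user key works.
function sessionKey(userId: string): string {
  return `session:${userId}`;
}

// Append a turn and drop the oldest ones beyond the cap (returns a new object).
function appendTurn(state: SessionState, turn: string): SessionState {
  const recentTurns = [...state.recentTurns, turn].slice(-MAX_TURNS);
  return { ...state, recentTurns };
}

// Minimal store interface; a Redis/Valkey client would sit behind it in production.
interface SessionStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a database.
class MemoryStore implements SessionStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Load the session, add the new turn, and write it back as JSON.
async function rememberTurn(
  store: SessionStore,
  userId: string,
  turn: string
): Promise<SessionState> {
  const raw = await store.get(sessionKey(userId));
  const current: SessionState = raw ? JSON.parse(raw) : { goals: [], recentTurns: [] };
  const next = appendTurn(current, turn);
  await store.set(sessionKey(userId), JSON.stringify(next));
  return next;
}

const demoStore = new MemoryStore();
rememberTurn(demoStore, "user-1", "User asked where the oat milk is.").then((s) =>
  console.log(s.recentTurns)
);
```

Keeping the store behind a small interface also makes the memory logic easy to test without a live database.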

What's next for GradientLens

Expanded Proactive Capabilities: I plan to extend the environmental understanding to recognize complex navigation hazards (e.g., street crossings and obstacles).
Multilingual Support: Utilizing Kokoro TTS to support a wider array of languages and regional accents to make the tool globally accessible.
Wearable Integration: Exploring ways to integrate the web app with smart glasses hardware for a truly hands-free experience.
Fine-tuned Vision Models: Training custom, smaller vision models on datasets captured from the perspective of visually impaired users to improve accuracy while further driving down inference costs on my DO Droplets.

Built With

digitalocean
digitalocean-ai
digitalocean-app-platform
digitalocean-gpu-droplet
digitalocean-gradient
digitalocean-gradient-ai
next
next.js
nextjs
redis
valkey

Updates

Lasse Stilvang started this project — Mar 18, 2026 04:51 PM EDT
