Nova — AI Live Tutor

[Figure: Nova architecture]

Inspiration

Every student has been stuck on homework at 11pm with no one to help. Google gives answers. ChatGPT gives answers. But real learning happens when someone guides you to find the answer yourself. We wanted to build a tutor that acts like a brilliant friend: one who sees your homework, hears your struggle, and never just hands you the answer.

What it does

Nova is a real-time AI tutor that sees and hears you through your browser. Show her your homework through your camera, ask her a question, and she guides you Socratically — asking the right questions until you figure it out yourself. She adapts to your level automatically, whether you're in high school or university.

How we built it

Frontend: Vanilla HTML/JS with the Web Audio API for real-time PCM audio playback and a WebSocket client for streaming
Backend: Python FastAPI with asyncio for concurrent audio/video/response streams
AI: Google Gemini Live API (gemini-2.5-flash-native-audio-preview-12-2025) for real-time multimodal understanding
Deployment: Docker container on Railway, with Google Cloud Run deployment script included
Architecture: Browser captures mic (PCM 16kHz) and camera (JPEG 1fps) → WebSocket → FastAPI → Gemini Live API → audio response (PCM 24kHz) → browser speaker
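To give a feel for the data rates in this pipeline, here is a rough sizing sketch. It assumes 16-bit mono PCM and a 100 ms chunk size; neither is stated above, and the function names are purely illustrative.

```python
# Sizing sketch for the audio legs of the pipeline.
# Assumptions (not from the project): 16-bit samples, mono, 100 ms chunks.

MIC_RATE_HZ = 16_000      # browser -> server, as described above
SPEAKER_RATE_HZ = 24_000  # server -> browser, as described above


def pcm_bytes_per_second(sample_rate_hz: int, bytes_per_sample: int = 2,
                         channels: int = 1) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size in bytes of one PCM chunk lasting `chunk_ms` milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


print(pcm_bytes_per_second(MIC_RATE_HZ))      # 32000
print(pcm_bytes_per_second(SPEAKER_RATE_HZ))  # 48000
print(pcm_chunk_bytes(MIC_RATE_HZ, 100))      # 3200
```

Under these assumptions the microphone stream is about 32 KB/s and the response stream about 48 KB/s, so at 1 fps the JPEG frames are likely to be a comparable or larger share of the upload, depending on their resolution.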

Challenges we ran into

WebSocket + HTTPS requires WSS: browsers block insecure (ws://) WebSocket connections from pages served over HTTPS
Gemini Live API requires continuous audio input to trigger proactive vision responses
PCM audio scheduling — multiple chunks arriving rapidly must be chained seamlessly or they overlap
AudioWorklet vs ScriptProcessorNode: moved to AudioWorklet, which runs on a dedicated audio thread, to avoid blocking on the main thread
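The PCM scheduling problem comes down to keeping a playback cursor: each chunk starts where the previous one ends, unless playback has already run dry. The sketch below shows that logic in Python for illustration only. In the browser the same arithmetic would feed the Web Audio clock, and `ChunkScheduler` and its fields are hypothetical names, not the project's code. It assumes 16-bit mono PCM at 24 kHz.

```python
from dataclasses import dataclass


@dataclass
class ChunkScheduler:
    """Chains audio chunks back to back so they neither overlap nor leave gaps."""
    sample_rate_hz: int = 24_000
    bytes_per_sample: int = 2   # assumption: 16-bit mono PCM
    next_start: float = 0.0     # time at which the last scheduled chunk ends

    def schedule(self, chunk: bytes, now: float) -> float:
        """Return the start time for `chunk` and advance the cursor."""
        duration = len(chunk) / (self.sample_rate_hz * self.bytes_per_sample)
        # If earlier chunks are still playing, queue behind them;
        # otherwise (first chunk, or after an underrun) start immediately.
        start = max(now, self.next_start)
        self.next_start = start + duration
        return start


scheduler = ChunkScheduler()
chunk = bytes(4_800)  # 100 ms of silence at 24 kHz, 16-bit mono
for arrival in (1.00, 1.01, 1.02):  # three chunks arriving in a burst
    print(round(scheduler.schedule(chunk, now=arrival), 3))
```

Chunks that arrive in a burst are queued one after another, while a chunk that arrives after a pause starts immediately rather than at a stale time in the past.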

Accomplishments that we're proud of

Built a fully working real-time multimodal tutor in under 24 hours
Nova genuinely teaches — she never gives direct answers, always guides with questions
Clean architecture with auto-reconnect — session survives Gemini API drops transparently
Single HTML file frontend with no framework, no build step
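One common way to get the auto-reconnect behavior described above is a retry loop with exponential backoff around the upstream session. The following is a minimal sketch under stated assumptions: `run_session` is a hypothetical coroutine that raises `ConnectionError` when the upstream connection drops, and the delay values are illustrative. It is not necessarily how Nova implements it.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_reconnect(
    run_session: Callable[[], Awaitable[None]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> int:
    """Run `run_session`, restarting it after each ConnectionError.

    Returns the number of attempts used. Re-raises the last error once
    `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            await run_session()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: 0.5 s, 1 s, 2 s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

Because the browser's WebSocket stays open while the loop retries, the student would only notice a short pause rather than a dropped session.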

What we learned

Gemini Live API is remarkably capable at real-time multimodal understanding
Real-time audio streaming in browsers requires careful PCM scheduling
The hardest part of building an AI tutor isn't the AI — it's the persona and prompt engineering
asyncio queues are a clean pattern for bridging WebSocket streams with external AI APIs
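The asyncio queue pattern mentioned above can be sketched with the standard library alone. Here `bridge`, `fake_model`, and the `STOP` sentinel are illustrative stand-ins: in a real server the reader and writer would wrap the browser WebSocket and `model` would wrap the live AI session.

```python
import asyncio

STOP = None  # sentinel marking the end of a stream


async def bridge(client_chunks, model):
    """Pump chunks client -> model -> client through two bounded queues."""
    to_model: asyncio.Queue = asyncio.Queue(maxsize=32)
    to_client: asyncio.Queue = asyncio.Queue(maxsize=32)
    sent = []  # stands in for "what was written back to the browser"

    async def read_client():
        for chunk in client_chunks:
            await to_model.put(chunk)  # blocks when full (backpressure)
        await to_model.put(STOP)

    async def call_model():
        while (chunk := await to_model.get()) is not STOP:
            await to_client.put(await model(chunk))
        await to_client.put(STOP)

    async def write_client():
        while (reply := await to_client.get()) is not STOP:
            sent.append(reply)

    # The three loops run concurrently and only meet at the queues.
    await asyncio.gather(read_client(), call_model(), write_client())
    return sent


async def fake_model(chunk: bytes) -> bytes:
    """Hypothetical stand-in for the upstream AI call."""
    await asyncio.sleep(0)
    return chunk.upper()


if __name__ == "__main__":
    print(asyncio.run(bridge([b"hello", b"nova"], fake_model)))
    # [b'HELLO', b'NOVA']
```

The bounded queues give natural backpressure: if the upstream side stalls, the reader eventually blocks instead of buffering audio without limit.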

What's next for Nova

Google Cloud Run deployment when regional billing supports it
Session memory — Nova remembers what was covered and gives end-of-session summaries
Multiple subject modes with specialized tutoring approaches
Mobile app for easier homework camera setup
Parent dashboard showing learning progress

Built With

api
asyncio
audio
cloud
docker
fastapi
gemini
google
html5
javascript
live
python
railway
web
websocket

Updates

Bora Neak started this project — Mar 16, 2026 07:42 AM EDT
