Inspiration

1 in 36 children has autism, and professional speech therapy costs $100–200 per session — over $10,000 a year. Most families simply can't afford consistent access. We built CosmoCompanion so every child has a patient, always-available AI companion that turns therapy into play.

What it does

CosmoCompanion is a multimodal AI companion for autistic children. It uses the Gemini Live API for real-time two-way voice and vision sessions: Cosmo sees what the child shows on camera, hears what they say, and responds naturally with audio. The AI Storybook feature generates personalized illustrated stories in one interleaved flow. Gemini writes the story text, generates a matching illustration, and narrates it aloud, all woven together as the child co-authors the story by answering questions. Other features:

  • Show & Tell: object identification via camera
  • Emotion Mirror: facial expression practice
  • Speech Practice: with adaptive difficulty
  • Social Story Generator
  • Feelings Match game
  • AAC communication board
  • Role-Play scenarios
  • Parent Dashboard: progress tracking

The app is fully bilingual in English and Arabic.
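The interleaved Storybook flow can be sketched as a small orchestrator. This is a hedged sketch, not CosmoCompanion's actual code: the names `nextStoryPage`, `writePage`, `illustrate`, `narrate`, and `StoryPage` are ours, and the three model calls are injected as plain functions so only the sequencing is shown.

```typescript
interface StoryPage {
  text: string;
  imageUrl: string;
  audioUrl: string;
}

// Produce the next page of the co-authored story from the child's answer.
// The generator functions stand in for calls to the story-text, image, and
// TTS models described above.
async function nextStoryPage(
  childAnswer: string,
  writePage: (answer: string) => Promise<string>,
  illustrate: (pageText: string) => Promise<string>,
  narrate: (pageText: string) => Promise<string>,
): Promise<StoryPage> {
  // The page text must exist first; in this sketch the illustration and
  // narration then run concurrently so the child isn't waiting on a fully
  // serial chain (a design assumption on our part).
  const text = await writePage(childAnswer);
  const [imageUrl, audioUrl] = await Promise.all([illustrate(text), narrate(text)]);
  return { text, imageUrl, audioUrl };
}
```

Injecting the model calls this way also makes the loop trivial to unit-test with stubs.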

How we built it

  • Frontend: React 19 + TypeScript + Tailwind CSS, deployed on Firebase Hosting.
  • Backend: Node.js/Express on Google Cloud Run, handling Gemini API calls for story generation, image generation, and TTS narration.
  • Real-time sessions: the Gemini Live API, called directly from the browser over WebSocket with 16 kHz PCM audio streaming and JPEG video frames.
  • Models (Google GenAI SDK throughout): gemini-2.5-flash-native-audio-preview for live sessions, gemini-2.5-flash for story text, gemini-2.5-flash-image for illustrations, and gemini-2.5-flash-preview-tts for narration.
  • Data: Firebase handles authentication (Google + email), and Firestore stores user profiles and activity logs.
  • Deployment: automated via cloudbuild.yaml (Cloud Build).
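The browser-side audio path hinges on converting the microphone's Float32 samples into the 16-bit little-endian PCM at 16 kHz that the Live session consumes. A minimal sketch of that conversion, with our own naming (`floatTo16BitPcm` is not from the project) and naive sample-skipping in place of a proper low-pass resampler:

```typescript
// Convert a Float32 microphone buffer to 16-bit little-endian PCM at 16 kHz.
// Downsampling here just skips samples; a production app would low-pass
// filter first to avoid aliasing.
function floatTo16BitPcm(
  samples: Float32Array,
  inputRate: number,
  targetRate = 16000,
): Uint8Array {
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Uint8Array(outLength * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1], then scale into the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[Math.floor(i * ratio)]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
  }
  return out;
}
```

Each resulting buffer would then be base64-encoded and streamed to the Live session as a realtime audio chunk.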

Challenges we ran into

  • Synchronizing interleaved multimodal output: generating story text, triggering image generation, and queuing audio narration in a way that feels seamless rather than sequential.
  • Handling Gemini Live API interruptions gracefully, so children can speak at any time without breaking the audio pipeline.
  • Making the experience genuinely calming and accessible for children with sensory sensitivities: getting the pacing, voice tone, and visual design right took significant iteration.

Accomplishments that we're proud of

The AI Storybook genuinely feels like magic: a child types one sentence and Cosmo writes the next page, paints an illustration, and reads it aloud, all in one flow. The Live session's overstimulation detection, which switches to a calm-down mode when a child shows distress, is something we're especially proud of. We also shipped full Arabic and English bilingual support, including an RTL layout.

What we learned

Multimodal AI UX is fundamentally different from chat UX — latency, interruption handling, and sensory design matter enormously. Gemini's interleaved output capabilities are genuinely powerful for creative applications. Building for neurodivergent children forces you to think deeply about accessibility in ways that improve the experience for everyone.

What's next for CosmoCompanion

Deploy the Cloud Run backend at scale, add Gemini ADK for more structured agent workflows, integrate with speech-language pathologist review tools so therapists can monitor AI sessions, expand to more languages, and build an offline mode for families with limited connectivity.

Built With

  • audio
  • cloud-firestore
  • express.js
  • firebase-authentication
  • firebase-hosting
  • gemini-2.5-flash
  • gemini-image-generation
  • gemini-live-api
  • gemini-tts
  • google-cloud-build
  • google-cloud-run
  • google-genai-sdk
  • node.js
  • react
  • tailwind-css
  • typescript
  • vite
  • web
  • webrtc