# TaleSpark

Focus: Multimodal Storytelling with Interleaved Output

An AI agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream. It leverages Gemini's native interleaved output to generate rich, mixed-media responses that combine narration with visuals, explanations with generated imagery, or storyboards with voiceover, all in one cohesive flow. Examples include:

- Interactive storybooks (text + generated illustrations inline)
- Marketing asset generator (copy + visuals + video in one go)
- Educational explainers (narration woven with diagrams)
- Social content creator (caption + image + hashtags together)
Mandatory Tech: Must use Gemini's interleaved/mixed output capabilities. The agent is hosted on Google Cloud.
## Project Structure

```
TaleSpark/
├── app.py                  # FastAPI backend
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── frontend/               # Vue 3 + TypeScript + Vite
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts
│   ├── index.html
│   ├── public/
│   │   └── favicon.svg
│   └── src/
│       ├── main.ts         # Entry point
│       ├── App.vue         # Root component
│       ├── types.ts        # TypeScript definitions
│       ├── styles/
│       │   └── main.css    # Global styles + CSS variables
│       ├── composables/
│       │   ├── useAppState.ts    # Global state management
│       │   ├── useSSE.ts         # Server-Sent Events streaming
│       │   └── useThreeScene.ts  # Three.js particle system
│       └── components/
│           ├── WelcomeScreen.vue # Hero with animated logo + particles
│           ├── StorySetup.vue    # Genre selection + prompt input
│           ├── StoryViewer.vue   # Streaming story display
│           ├── StoryComplete.vue # Celebration + stats
│           ├── LoadingScreen.vue # Animated quill writing
│           ├── GenreCard.vue     # 3D tilt genre cards
│           ├── SceneCard.vue     # Image + text + typewriter
│           └── AudioPlayer.vue   # Custom audio player
├── dist/                   # Built frontend (production)
├── static/                 # Generated images/audio
└── plans/
    └── frontend-architecture.md
```

## Architecture

```mermaid
flowchart LR
    %% Styles
    classDef frontend fill:#d4edda,stroke:#28a745,stroke-width:2px;
    classDef backend fill:#cce5ff,stroke:#007bff,stroke-width:2px;
    classDef ai fill:#f8d7da,stroke:#dc3545,stroke-width:2px;
    classDef cloud fill:#fff3cd,stroke:#ffc107,stroke-width:2px;

    %% Nodes
    subgraph Frontend
        UI[Web Browser]:::frontend
    end
    subgraph Backend
        API[FastAPI Endpoint]:::backend
        EQ[(Event Queue)]:::backend
        TQ[(TTS Text Queue)]:::backend
        LLM_W[Task 1: LLM Producer]:::backend
        TTS_W[Task 2: TTS Worker]:::backend
        FS[(Local Static Files)]:::backend
    end
    subgraph The_Brain
        LLM[Gemini 2.5 Pro]:::ai
        IMG[Imagen 3]:::ai
        TTS[GCP TTS API]:::cloud
    end

    %% Flow 1: Initialization
    UI -->|1. POST Prompt| API
    API -->|Starts| LLM_W
    API -->|Starts| TTS_W

    %% Flow 2: Task 1 (Text & Image Interleaved)
    LLM_W -->|2. Stream Chat| LLM
    LLM -.->|Text Chunks| LLM_W
    LLM_W -->|3. Tool Pause| IMG
    IMG -.->|Image Data| LLM_W

    %% Flow 3: Queue Routing
    LLM_W -->|Push Text/Img Event| EQ
    LLM_W -->|Push Sentences| TQ

    %% Flow 4: Task 2 (Parallel Audio)
    TQ -->|Pop Sentences| TTS_W
    TTS_W -->|4. Synthesize| TTS
    TTS -.->|MP3 Data| TTS_W
    TTS_W -->|Save File| FS
    TTS_W -->|Push Audio Event| EQ

    %% Flow 5: Output to Frontend
    EQ -->|5. SSE Stream| UI
    UI -.->|6. Fetch MP3/JPG| FS
```

## Prerequisites

- Python 3.10+
- Node.js 18+
- Google Cloud project with Gemini API enabled
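The two background tasks in the architecture diagram above can be sketched with `asyncio` queues. This is a minimal, self-contained illustration with stubbed LLM and TTS calls; the sentences, file names, and helper names are placeholders, not the actual `app.py` code:

```python
import asyncio

async def llm_producer(events: asyncio.Queue, sentences: asyncio.Queue) -> None:
    """Task 1: stream story chunks (stand-in for the Gemini call) to both queues."""
    for chunk in ["Once upon a time.", "A dragon spoke."]:
        await events.put({"type": "text", "chunk": chunk})  # straight to the SSE stream
        await sentences.put(chunk)                          # and to the TTS worker
    await sentences.put(None)  # sentinel: no more text to synthesize
    await events.put(None)     # sentinel: producer finished

async def tts_worker(sentences: asyncio.Queue, events: asyncio.Queue) -> None:
    """Task 2: turn sentences into audio events, running in parallel with Task 1."""
    while (sentence := await sentences.get()) is not None:
        # the real worker calls the Cloud TTS API and saves an MP3 under static/
        await events.put({"type": "audio", "text": sentence, "src": "/static/demo.mp3"})
    await events.put(None)     # sentinel: worker finished

async def run_pipeline() -> list[dict]:
    events: asyncio.Queue = asyncio.Queue()
    sentences: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        llm_producer(events, sentences),
        tts_worker(sentences, events),
    )
    out, finished = [], 0
    while finished < 2:        # drain until both sentinels arrive
        event = await events.get()
        if event is None:
            finished += 1
        else:
            out.append(event)
    return out

if __name__ == "__main__":
    for event in asyncio.run(run_pipeline()):
        print(event)
```

Because both tasks share the event queue, audio events interleave with text events in whatever order they are produced, which is what lets narration trail only slightly behind the text.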
## Setup

```powershell
# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Install frontend dependencies (pnpm, via its PowerShell install script)
Invoke-WebRequest https://get.pnpm.io/install.ps1 -UseBasicParsing | Invoke-Expression
cd frontend
pnpm install

# 3. Configure Google Cloud (set PROJECT_ID in app.py)
# Required: Google Cloud project with Vertex AI enabled
```

## Running in Development

```bash
# Terminal 1: Start the FastAPI backend
python app.py
# Backend runs at http://localhost:8000

# Terminal 2: Start the Vue dev server (hot reload)
cd frontend
pnpm run dev
# Frontend runs at http://localhost:5173
```

The frontend proxies API requests to the backend:
- `/api/*` → `http://localhost:8000/api/*`
- `/static/*` → `http://localhost:8000/static/*`
## Production Build

```bash
# Build frontend
cd frontend
pnpm run build
# This creates the dist/ folder with static files

# Run production server
python app.py
# Serves the built frontend from dist/
```

## Frontend Features

- Three.js Particle Background — Ambient golden particles float upward, react to mouse movement, and change color per genre
- Genre Theming — 5 distinct themes (Fantasy, Sci-Fi, Mystery, Fairy Tale, Adventure) via CSS custom properties
- GSAP Animations — Smooth page transitions, logo entrance, button glows, card 3D tilts
- Real-time Streaming — Server-Sent Events deliver story content as it's generated
- Typewriter Effect — Text streams in character-by-character with cursor
- Custom Audio Player — Styled player with progress bar and auto-play
- Responsive Design — Works on desktop, tablet, and mobile
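The Real-time Streaming and Typewriter features above consume the backend's SSE frames. As an illustration of the wire format only (the real client uses the browser's `EventSource` in `useSSE.ts`; `parse_sse` and the sample payloads here are hypothetical):

```python
import json

def parse_sse(stream: str) -> list[dict]:
    """Split a raw SSE stream into the event dicts the UI renders."""
    events = []
    for frame in stream.split("\n\n"):       # a blank line terminates each frame
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events

raw = (
    'data: {"type": "text", "chunk": "Once upon a "}\n\n'
    'data: {"type": "audio", "src": "/static/aud_def456.mp3"}\n\n'
)
print(parse_sse(raw))
```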
## Backend & AI

- Gemini 2.5 Pro — Generates story text with interleaved tool calls
- Imagen 3.0 — Generates scene images
- Google Cloud Text-to-Speech API — Converts story text into spoken narration
- Server-Sent Events — Streams content in real-time
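Events reach the browser in the Server-Sent Events wire format: one JSON payload per `data:` line, terminated by a blank line. A minimal sketch of that serialization (the helper name `sse_format` is illustrative, not taken from `app.py`):

```python
import json

def sse_format(event: dict) -> str:
    """Serialize one event dict into an SSE 'data:' frame (blank line terminates it)."""
    return f"data: {json.dumps(event)}\n\n"

# In FastAPI, an async generator yielding such frames is typically wrapped in a
# StreamingResponse(..., media_type="text/event-stream").
print(sse_format({"type": "text", "chunk": "Once upon a "}), end="")
```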
## Google Cloud Setup

- Create a Google Cloud project
- Enable the Vertex AI API
- Set `PROJECT_ID` in `app.py`:

```python
PROJECT_ID = "your-project-id"
```

For production, you might want to use environment variables:
```bash
export PROJECT_ID="your-project-id"
export LOCATION="us-central1"
```

## Tech Stack

| Layer | Technology |
|---|---|
| Frontend Framework | Vue 3 + TypeScript |
| Build Tool | Vite |
| Animations | GSAP |
| 3D Effects | Three.js |
| Styling | CSS Custom Properties |
| Backend | FastAPI (Python) |
| AI | Google Gemini + Imagen |
| Audio | Google Cloud Text-to-Speech |
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serve frontend |
| POST | /api/generate | Generate story (SSE stream) |
Request:
```json
{ "prompt": "A young dragon discovers it can speak human languages..." }
```

Response: Server-Sent Events stream
```json
{"type": "image", "src": "/static/img_abc123.jpg"}
{"type": "text", "chunk": "Once upon a "}
{"type": "text", "chunk": "time, in a land..."}
{"type": "audio", "src": "/static/aud_def456.mp3"}
```

## License

MIT
Built for the Gemini Live Agent Challenge.