
astrowq/TaleSpark


TaleSpark

AI agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream.

Gemini Live Agent Challenge - Creative Storyteller

Focus: Multimodal Storytelling with Interleaved Output

Build an agent that thinks and creates like a creative director, seamlessly weaving together text, images, audio, and video in a single, fluid output stream. Leverage Gemini's native interleaved output to generate rich, mixed-media responses that combine narration with visuals, explanations with generated imagery, or storyboards with voiceover, all in one cohesive flow. Examples include interactive storybooks (text + generated illustrations inline), marketing asset generators (copy + visuals + video in one go), educational explainers (narration woven with diagrams), and social content creators (caption + image + hashtags together).

Mandatory Tech: Must use Gemini's interleaved/mixed output capabilities. The agents are hosted on Google Cloud.

Project Structure

```
TaleSpark/
├── app.py                  # FastAPI backend
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── frontend/               # Vue 3 + TypeScript + Vite
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts
│   ├── index.html
│   ├── public/
│   │   └── favicon.svg
│   └── src/
│       ├── main.ts                 # Entry point
│       ├── App.vue                 # Root component
│       ├── types.ts                # TypeScript definitions
│       ├── styles/
│       │   └── main.css            # Global styles + CSS variables
│       ├── composables/
│       │   ├── useAppState.ts      # Global state management
│       │   ├── useSSE.ts           # Server-Sent Events streaming
│       │   └── useThreeScene.ts    # Three.js particle system
│       └── components/
│           ├── WelcomeScreen.vue   # Hero with animated logo + particles
│           ├── StorySetup.vue      # Genre selection + prompt input
│           ├── StoryViewer.vue     # Streaming story display
│           ├── StoryComplete.vue   # Celebration + stats
│           ├── LoadingScreen.vue   # Animated quill writing
│           ├── GenreCard.vue       # 3D tilt genre cards
│           ├── SceneCard.vue       # Image + text + typewriter
│           └── AudioPlayer.vue     # Custom audio player
├── dist/                   # Built frontend (production)
├── static/                 # Generated images/audio
└── plans/
    └── frontend-architecture.md
```

Architecture

```mermaid
flowchart LR
    %% Styles
    classDef frontend fill:#d4edda,stroke:#28a745,stroke-width:2px;
    classDef backend fill:#cce5ff,stroke:#007bff,stroke-width:2px;
    classDef ai fill:#f8d7da,stroke:#dc3545,stroke-width:2px;
    classDef cloud fill:#fff3cd,stroke:#ffc107,stroke-width:2px;

    %% Nodes
    subgraph Frontend
        UI[Web Browser]:::frontend
    end

    subgraph Backend
        API[FastAPI Endpoint]:::backend
        EQ[(Event Queue)]:::backend
        TQ[(TTS Text Queue)]:::backend
        LLM_W[Task 1: LLM Producer]:::backend
        TTS_W[Task 2: TTS Worker]:::backend
        FS[(Local Static Files)]:::backend
    end

    subgraph The_Brain
        LLM[Gemini 2.5 Pro]:::ai
        IMG[Imagen 3]:::ai
        TTS[GCP TTS API]:::cloud
    end

    %% Flow 1: Initialization
    UI -->|1. POST Prompt| API
    API -->|Starts| LLM_W
    API -->|Starts| TTS_W

    %% Flow 2: Task 1 (Text & Image Interleaved)
    LLM_W -->|2. Stream Chat| LLM
    LLM -.->|Text Chunks| LLM_W
    LLM_W -->|3. Tool Pause| IMG
    IMG -.->|Image Data| LLM_W

    %% Flow 3: Queues Routing
    LLM_W -->|Push Text/Img Event| EQ
    LLM_W -->|Push Sentences| TQ

    %% Flow 4: Task 2 (Parallel Audio)
    TQ -->|Pop Sentences| TTS_W
    TTS_W -->|4. Synthesize| TTS
    TTS -.->|MP3 Data| TTS_W
    TTS_W -->|Save File| FS
    TTS_W -->|Push Audio Event| EQ

    %% Flow 5: Output to Frontend
    EQ -->|5. SSE Stream| UI
    UI -.->|6. Fetch MP3/JPG| FS
```
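The two-task design in the diagram (an LLM producer and a TTS worker feeding one event queue) can be sketched with plain `asyncio`. This is a minimal sketch, not the code in `app.py`: `fake_llm_stream` and `fake_tts` are hypothetical stubs standing in for the Gemini and Text-to-Speech calls, the regex-based sentence splitting is an assumption about how "Push Sentences" works, and `None` is used as an end-of-stream sentinel on both queues.

```python
import asyncio
import re
import uuid

# Hypothetical stubs: the real backend streams from Gemini and calls Cloud TTS.
async def fake_llm_stream(prompt):
    # The prompt is ignored by this stub; it always emits the same chunks.
    for chunk in ["Once upon a ", "time, a dragon woke. ", "It spoke!"]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield chunk

async def fake_tts(sentence):
    await asyncio.sleep(0)  # pretend to call the TTS API and save an MP3
    return f"/static/aud_{uuid.uuid4().hex[:6]}.mp3"

# Assumed sentence rule: split after ., ! or ? when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

async def llm_producer(prompt, events, tts_queue):
    """Task 1: send text chunks to the UI and complete sentences to TTS."""
    buffer = ""
    async for chunk in fake_llm_stream(prompt):
        await events.put({"type": "text", "chunk": chunk})
        buffer += chunk
        *sentences, buffer = SENTENCE_END.split(buffer)
        for sentence in sentences:
            await tts_queue.put(sentence)
    if buffer.strip():
        await tts_queue.put(buffer.strip())  # flush the trailing sentence
    await tts_queue.put(None)  # sentinel: no more sentences

async def tts_worker(events, tts_queue):
    """Task 2: synthesize sentences while the LLM keeps streaming."""
    while (sentence := await tts_queue.get()) is not None:
        await events.put({"type": "audio", "src": await fake_tts(sentence)})
    await events.put(None)  # sentinel: every text and audio event is queued

async def generate(prompt):
    """Start both tasks and yield events in arrival order (the SSE body)."""
    events, tts_queue = asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(llm_producer(prompt, events, tts_queue)),
        asyncio.create_task(tts_worker(events, tts_queue)),
    ]
    while (event := await events.get()) is not None:
        yield event
    await asyncio.gather(*tasks)

async def collect(prompt):
    return [event async for event in generate(prompt)]

if __name__ == "__main__":
    for event in asyncio.run(collect("A young dragon discovers it can speak")):
        print(event)
```

Because the producer puts its TTS sentinel only after all of its text events, and the worker forwards its own sentinel only after that, the final `None` on the event queue always arrives after every text and audio event, so the stream can close safely.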

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Google Cloud project with the Vertex AI API (Gemini, Imagen) and the Cloud Text-to-Speech API enabled

Installation

```powershell
# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Install pnpm, then the frontend dependencies
#    (this installer is for Windows PowerShell; on macOS/Linux use:
#     curl -fsSL https://get.pnpm.io/install.sh | sh -)
Invoke-WebRequest https://get.pnpm.io/install.ps1 -UseBasicParsing | Invoke-Expression
cd frontend
pnpm install
cd ..

# 3. Configure Google Cloud (set PROJECT_ID in app.py)
# Required: Google Cloud project with Vertex AI enabled
```

Development

```bash
# Terminal 1: start the FastAPI backend (runs at http://localhost:8000)
python app.py

# Terminal 2: start the Vue dev server with hot reload (runs at http://localhost:5173)
cd frontend
pnpm run dev
```

In development, the Vite dev server proxies API and asset requests to the backend:

  • /api/* → http://localhost:8000/api/*
  • /static/* → http://localhost:8000/static/*

Production Build

```bash
# Build the frontend; this creates the dist/ folder with static files
cd frontend
pnpm run build

# Run the production server from the project root; it serves the built frontend from dist/
cd ..
python app.py
```

Features

Frontend (Awwwards-quality interactive storybook)

  • Three.js Particle Background — Ambient golden particles that float upward, react to mouse movement, and change color per genre
  • Genre Theming — 5 distinct themes (Fantasy, Sci-Fi, Mystery, Fairy Tale, Adventure) via CSS custom properties
  • GSAP Animations — Smooth page transitions, logo entrance, button glows, and 3D card tilts
  • Real-time Streaming — Server-Sent Events deliver story content as it's generated
  • Typewriter Effect — Text appears character by character behind a cursor
  • Custom Audio Player — Styled player with progress bar and auto-play
  • Responsive Design — Works on desktop, tablet, and mobile

Backend (Gemini AI Integration)

  • Gemini 2.5 Pro — Generates story text with interleaved tool calls
  • Imagen 3.0 — Generates scene images
  • Google Cloud Text-to-Speech API — Converts story text into spoken narration
  • Server-Sent Events — Streams content in real-time
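The "Tool Pause" step, where text streaming stops while an image is generated, can be illustrated with a small dispatch loop over the parts of one model turn. This sketch assumes image generation is exposed to the model as a tool named `generate_image` (a hypothetical name), and it uses made-up `TextPart` and `FunctionCall` classes instead of the real Gemini SDK types, with `fake_imagen` in place of an Imagen call.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical part types standing in for the model's streamed output;
# the real Gemini SDK provides its own response and part classes.
@dataclass
class TextPart:
    text: str

@dataclass
class FunctionCall:
    name: str
    args: dict = field(default_factory=dict)

def fake_imagen(prompt: str) -> str:
    """Stand-in for an Imagen call; returns the URL of a 'saved' image."""
    return f"/static/img_{uuid.uuid4().hex[:6]}.jpg"

# Registry of tools the model may call (tool name is an assumption).
TOOLS = {"generate_image": lambda args: fake_imagen(args["prompt"])}

def run_turn(parts):
    """Turn one model turn into UI events, pausing to execute tool calls."""
    events = []
    for part in parts:
        if isinstance(part, TextPart):
            events.append({"type": "text", "chunk": part.text})
        elif isinstance(part, FunctionCall):
            tool = TOOLS.get(part.name)
            if tool is None:
                raise ValueError(f"unknown tool: {part.name}")
            # Tool pause: no further text is emitted until the image exists.
            events.append({"type": "image", "src": tool(part.args)})
    return events

if __name__ == "__main__":
    turn = [TextPart("A dragon "), FunctionCall("generate_image", {"prompt": "a dragon"}), TextPart("woke.")]
    print(run_turn(turn))
```

A real agent loop may also send the tool result back to the model as a function response so it can continue the story; that round trip is omitted here.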

Configuration

Google Cloud Setup

  1. Create a Google Cloud project
  2. Enable Vertex AI API
  3. Set PROJECT_ID in app.py:
```python
PROJECT_ID = "your-project-id"
```

Environment Variables (Optional)

For production, you might want to use environment variables:

```bash
export PROJECT_ID="your-project-id"
export LOCATION="us-central1"
```
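One way to read these variables in `app.py` instead of hard-coding `PROJECT_ID` is sketched below. The `Settings` class and `load_settings` helper are hypothetical; defaulting `LOCATION` to `us-central1` simply mirrors the example above, and failing fast on a missing `PROJECT_ID` is a design choice, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class Settings:
    project_id: str
    location: str = "us-central1"

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read PROJECT_ID (required) and LOCATION (optional) from the environment."""
    project_id = env.get("PROJECT_ID", "").strip()
    if not project_id:
        # Fail at startup rather than on the first Vertex AI call.
        raise RuntimeError("PROJECT_ID is not set; export it before starting app.py")
    return Settings(project_id=project_id, location=env.get("LOCATION", "us-central1"))

if __name__ == "__main__":
    print(load_settings({"PROJECT_ID": "your-project-id"}))
```

Passing a plain dict as `env` keeps the helper testable without touching the real process environment.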

Tech Stack

| Layer | Technology |
|---|---|
| Frontend Framework | Vue 3 + TypeScript |
| Build Tool | Vite |
| Animations | GSAP |
| 3D Effects | Three.js |
| Styling | CSS Custom Properties |
| Backend | FastAPI (Python) |
| AI | Google Gemini + Imagen |
| Audio | Google Cloud Text-to-Speech |

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serve frontend |
| POST | /api/generate | Generate story (SSE stream) |

Generate Story

Request:

```json
{ "prompt": "A young dragon discovers it can speak human languages..." }
```

Response: a Server-Sent Events stream of JSON events, for example:

```json
{"type": "image", "src": "/static/img_abc123.jpg"}
{"type": "text", "chunk": "Once upon a "}
{"type": "text", "chunk": "time, in a land..."}
{"type": "audio", "src": "/static/aud_def456.mp3"}
```
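Both ends of this stream can be sketched with the standard library, assuming each event is sent as a single SSE `data:` line containing one JSON object (the exact framing used by `app.py` is not documented here). `encode_sse`, `parse_sse`, and `build_request` are hypothetical helpers; the parser ignores SSE fields other than `data:` and does not handle a payload split across several `data:` lines.

```python
import json
import urllib.request

def encode_sse(event: dict) -> str:
    """Server side: frame one event as an SSE message (data line + blank line)."""
    return f"data: {json.dumps(event)}\n\n"

def parse_sse(lines):
    """Client side: yield the JSON payload of each `data:` line in the stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

def build_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build (but do not send) the POST /api/generate request."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
```

To consume a running backend, pass the request to `urllib.request.urlopen` and feed the decoded response lines to the parser, for example `parse_sse(line.decode("utf-8") for line in response)`.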

License

MIT


Credits

Built for the Gemini Live Agent Challenge.
