Add OpenAI-Compatible Vision Model OCR Service with Streaming Support by TioSisai · Pull Request #1146 · pot-app/pot-desktop

TioSisai · 2025-07-14T14:12:02Z

Add OpenAI-Compatible Vision Model OCR Service with Streaming Support

Summary

This PR introduces a new OCR service that supports any OpenAI-compatible vision model API, along with comprehensive streaming capabilities and internationalization support.

Features Added

🔍 OpenAI-Compatible OCR Service

Universal API Support: Works with OpenAI GPT-4V, local deployments, and third-party compatible APIs
Flexible Configuration: Customizable base URL, API key, and model selection
Dual Response Modes: Support for both streaming and non-streaming responses
Advanced OCR Prompts: Comprehensive system prompt with LaTeX formula support and layout preservation
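
As a rough illustration of what "OpenAI-compatible" means here, the sketch below assembles a chat completions request that carries the screenshot as a base64 data URL. The message layout follows the publicly documented OpenAI chat completions format. The helper name `buildOcrRequest`, the config field names, and the placeholder prompts are assumptions made for this example and are not taken from the PR. It also assumes the configured base URL already includes the version prefix (for example `/v1`).

```javascript
// Hypothetical helper (not from the PR): builds the URL and fetch-style options
// for an OCR request against an OpenAI-compatible chat completions endpoint.
function buildOcrRequest(config, base64Image, mimeType = 'image/png') {
    const { baseUrl, apiKey, model, stream = false, systemPrompt } = config;
    // Drop trailing slashes so "https://host/v1/" and "https://host/v1" behave the same.
    const url = `${baseUrl.replace(/\/+$/, '')}/chat/completions`;
    const body = {
        model,
        stream,
        messages: [
            // Placeholder prompt; the PR ships its own, more detailed system prompt.
            { role: 'system', content: systemPrompt ?? 'Transcribe all text in the image.' },
            {
                role: 'user',
                content: [
                    { type: 'text', text: 'Recognize the text in this image.' },
                    { type: 'image_url', image_url: { url: `data:${mimeType};base64,${base64Image}` } },
                ],
            },
        ],
    };
    return {
        url,
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${apiKey}`,
            },
            body: JSON.stringify(body),
        },
    };
}
```

Because only the base URL, key, and model vary, the same request shape can target a hosted API or a local deployment.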

⚡ Streaming OCR Functionality

Real-time Text Display: Live OCR results with progressive text updates
Visual Feedback: Cursor indicator (_) shows active streaming state
Enhanced UX: Immediate feedback during OCR processing
Backward Compatibility: Seamless integration with existing non-streaming services
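
The streaming display described above can be sketched as a small state holder: chunks are appended while a stream is active, and the `_` cursor is shown only until the stream finishes. This is a minimal sketch, not the PR's `TextArea` code; `createStreamingText` and its method names are hypothetical.

```javascript
// Cursor character shown while a stream is still producing text.
const CURSOR = '_';

// Hypothetical state holder for the recognized text (illustration only).
function createStreamingText() {
    let text = '';
    let streaming = false;
    return {
        // Begin a new streaming run and clear any previous result.
        start() {
            text = '';
            streaming = true;
        },
        // Append one chunk of text; ignored when no stream is active.
        append(chunk) {
            if (streaming) text += chunk;
        },
        // Mark the stream as complete so the cursor disappears.
        finish() {
            streaming = false;
        },
        // Non-streaming services deliver the whole result in a single call.
        setFinal(result) {
            text = result;
            streaming = false;
        },
        // Text to render: the cursor is appended only while streaming.
        display() {
            return streaming ? text + CURSOR : text;
        },
    };
}
```

Keeping a separate `setFinal` path is one way to let existing non-streaming services keep working unchanged.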

🌍 Comprehensive Internationalization

Multi-language Support: Service names and UI elements translated across all supported languages
Global Coverage: Includes Chinese (Simplified/Traditional), English, Japanese, Korean, French, Spanish, Russian, German, Arabic, Hindi, Portuguese, Italian, and Dutch
Consistent Naming: Unified service identification across all locales

🎨 UI/UX Improvements

Modern Switch Component: Replaced dropdown with intuitive toggle for streaming control
Boolean Configuration: Improved type safety with proper boolean handling
Responsive Design: Clean and modern configuration interface
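
One reason a real boolean matters: a dropdown commonly stores its value as a string, and in JavaScript `Boolean("false")` evaluates to `true`. The sketch below shows one way a persisted setting could be normalized before being bound to a switch. It assumes older configurations may hold the strings `"true"` or `"false"`; the PR does not describe such a migration, and `toBoolean` is a hypothetical helper.

```javascript
// Hypothetical normalizer for a persisted "stream" setting (illustration only).
function toBoolean(value, fallback = false) {
    if (typeof value === 'boolean') return value;
    if (typeof value === 'string') {
        const normalized = value.trim().toLowerCase();
        if (normalized === 'true') return true;
        if (normalized === 'false') return false;
    }
    // Anything else (undefined, null, numbers, other strings) uses the fallback.
    return fallback;
}
```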

Technical Implementation

Core Components

src/services/recognize/openai_compatible/index.jsx: Main OCR service implementation
src/services/recognize/openai_compatible/Config.jsx: Configuration UI with Switch component
src/services/recognize/openai_compatible/info.ts: Service metadata and language definitions
src/window/Recognize/TextArea/index.jsx: Enhanced with streaming support

Key Features

SSE Streaming: Proper Server-Sent Events handling for real-time responses
Error Handling: Robust error management for network and parsing issues
Function Compatibility: Fixed parameter detection for service registration
Type Safety: Improved configuration with proper boolean types
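
The SSE handling mentioned above can be sketched as an incremental parser. OpenAI-compatible streaming endpoints send `data: {json}` lines and finish with `data: [DONE]`; a network chunk may end in the middle of a line, so the unfinished tail is buffered until the next chunk arrives. This is a minimal sketch under those assumptions rather than the PR's implementation, and `createSseParser` is a hypothetical name.

```javascript
// Hypothetical incremental parser for OpenAI-style SSE chunks (illustration only).
// onText is called with each piece of text found in choices[0].delta.content.
function createSseParser(onText) {
    let buffer = '';
    let done = false;
    return function feed(chunk) {
        if (done) return;
        buffer += chunk;
        const lines = buffer.split('\n');
        // The last element may be an incomplete line; keep it for the next chunk.
        buffer = lines.pop();
        for (const raw of lines) {
            const line = raw.trim();
            // Skip blank separators, comments, and fields other than "data:".
            if (!line.startsWith('data:')) continue;
            const payload = line.slice(5).trim();
            if (payload === '[DONE]') {
                done = true;
                return;
            }
            let parsed;
            try {
                parsed = JSON.parse(payload);
            } catch {
                // A malformed payload is skipped instead of aborting the stream.
                continue;
            }
            const delta = parsed?.choices?.[0]?.delta?.content;
            if (typeof delta === 'string' && delta.length > 0) onText(delta);
        }
    };
}
```

Each decoded network chunk is passed to the returned `feed` function, and the callback receives only the text deltas, which can then be appended to the display.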

Testing

✅ Streaming OCR functionality verified
✅ Non-streaming mode compatibility confirmed
✅ Multi-language UI tested
✅ Configuration persistence validated
✅ Error handling scenarios covered

Breaking Changes

None. This is a purely additive feature that maintains full backward compatibility.

Dependencies

No new dependencies added. Uses existing Tauri HTTP client and React ecosystem.

Documentation

The service appears automatically in the OCR service selection list, with localized names and configuration options.

This enhancement significantly expands POT's OCR capabilities by enabling integration with the growing ecosystem of OpenAI-compatible vision models, while providing a modern streaming experience for users.

- Implement real-time text streaming display in OCR recognition window
- Add streaming state management with cursor indicator (_)
- Support progressive text updates during OCR processing
- Enhance user experience with live OCR results feedback
- Maintain backward compatibility with non-streaming OCR services

This enables users to see OCR results as they are being processed, providing immediate feedback and improved responsiveness.

- Implement new OCR service supporting any OpenAI-compatible vision API
- Add streaming and non-streaming response modes
- Support custom base URL, API key, and model configuration
- Include comprehensive system prompt for precise OCR with LaTeX support
- Add Switch component for modern streaming toggle UI
- Fix function parameter detection for proper service compatibility

This allows integration with various vision models (OpenAI GPT-4V, local deployments, third-party compatible APIs) for OCR functionality.

- Add service name translations across all supported languages
- Include localized strings for Chinese (Simplified/Traditional)
- Add translations for English, Japanese, Korean, French, Spanish, Russian
- Support German, Arabic, Hindi, Portuguese, Italian, and Dutch languages
- Ensure consistent service naming across all locale files

This provides native language support for the OpenAI-compatible OCR service in all application-supported languages, improving accessibility for international users.

TioSisai · 2025-07-14T20:13:40Z

resolves #977

TioSisai added 5 commits July 14, 2025 17:09

fix(config): modify the mismatched ocr pack name

ef7e0b0

refactor(TextArea): Optimize the process of the stream flow

5a3fe82

TioSisai wants to merge 5 commits into pot-app:master from TioSisai:master
