Add OpenAI-Compatible Vision Model OCR Service with Streaming Support #1146
Open
TioSisai wants to merge 5 commits into pot-app:master from
- Implement real-time text streaming display in OCR recognition window
- Add streaming state management with cursor indicator (_)
- Support progressive text updates during OCR processing
- Enhance user experience with live OCR results feedback
- Maintain backward compatibility with non-streaming OCR services

This enables users to see OCR results as they are being processed, providing immediate feedback and improved responsiveness.
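The progressive update described in this commit can be sketched as below. This is a minimal illustration and not the PR's actual code: `createStreamAccumulator`, `onText`, and `CURSOR` are hypothetical names, and the sketch assumes the server sends OpenAI-style server-sent events (`data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`).

```javascript
// Hypothetical helper: names are illustrative, not taken from the PR.
const CURSOR = '_'; // cursor indicator shown while text is still arriving

// Accepts raw network chunks of an OpenAI-style SSE stream and calls
// onText with the accumulated text after every content delta.
function createStreamAccumulator(onText) {
    let buffer = ''; // holds a partial line until its newline arrives
    let text = '';
    let done = false;

    function handleLine(line) {
        const trimmed = line.trim();
        if (!trimmed.startsWith('data:')) return; // skip blank and comment lines
        const payload = trimmed.slice(5).trim();
        if (payload === '[DONE]') {
            done = true;
            onText(text); // final update, cursor removed
            return;
        }
        try {
            const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
            if (delta) {
                text += delta;
                onText(text + CURSOR);
            }
        } catch {
            // Ignore malformed payloads; a real client might log them.
        }
    }

    return {
        push(chunk) {
            if (done) return;
            buffer += chunk;
            const lines = buffer.split('\n');
            buffer = lines.pop(); // last element is an incomplete line (or '')
            for (const line of lines) {
                if (done) break;
                handleLine(line);
            }
        },
        get text() {
            return text;
        },
        get done() {
            return done;
        },
    };
}
```

In a React component, `onText` would typically be a state setter, so each delta re-renders the text area. A non-streaming service can keep calling the same setter once with the complete result, which is one way to keep older services working unchanged.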
- Implement new OCR service supporting any OpenAI-compatible vision API
- Add streaming and non-streaming response modes
- Support custom base URL, API key, and model configuration
- Include comprehensive system prompt for precise OCR with LaTeX support
- Add Switch component for modern streaming toggle UI
- Fix function parameter detection for proper service compatibility

This allows integration with various vision models (OpenAI GPT-4V, local deployments, third-party compatible APIs) for OCR functionality.
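For orientation, a request to an OpenAI-compatible chat completions endpoint with an image attached generally looks like the sketch below. The function name `buildOcrRequest` and the config field names (`baseUrl`, `apiKey`, `model`, `stream`) are assumptions made for illustration and may not match the PR's code. The sketch also assumes the base URL already includes the version prefix (for example `/v1`) and that the screenshot is PNG data encoded as base64.

```javascript
// Hypothetical helper: builds the URL, headers, and JSON body for an
// OpenAI-compatible vision request. Field names are illustrative.
function buildOcrRequest(config, base64Image, systemPrompt) {
    const baseUrl = config.baseUrl.replace(/\/+$/, ''); // drop trailing slashes
    return {
        url: `${baseUrl}/chat/completions`,
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${config.apiKey}`,
        },
        body: {
            model: config.model,
            stream: Boolean(config.stream), // true selects SSE streaming mode
            messages: [
                { role: 'system', content: systemPrompt },
                {
                    role: 'user',
                    content: [
                        {
                            type: 'image_url',
                            // The image travels inline as a data URL.
                            image_url: { url: `data:image/png;base64,${base64Image}` },
                        },
                    ],
                },
            ],
        },
    };
}
```

The returned object can be handed to whatever HTTP client the application uses. Keeping the request construction separate from the transport makes it straightforward to switch between streaming and non-streaming modes with a single flag.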
- Add service name translations across all supported languages
- Include localized strings for Chinese (Simplified/Traditional)
- Add translations for English, Japanese, Korean, French, Spanish, Russian
- Support German, Arabic, Hindi, Portuguese, Italian, and Dutch languages
- Ensure consistent service naming across all locale files

This provides native language support for OpenAI-compatible OCR service in all application-supported languages, improving accessibility for international users.
Resolves #977
Add OpenAI-Compatible Vision Model OCR Service with Streaming Support
Summary
This PR introduces a new OCR service that supports any OpenAI-compatible vision model API, along with comprehensive streaming capabilities and internationalization support.
Features Added
🔍 OpenAI-Compatible OCR Service
⚡ Streaming OCR Functionality
🌍 Comprehensive Internationalization
🎨 UI/UX Improvements
Technical Implementation
Core Components
- src/services/recognize/openai_compatible/index.jsx: Main OCR service implementation
- src/services/recognize/openai_compatible/Config.jsx: Configuration UI with Switch component
- src/services/recognize/openai_compatible/info.ts: Service metadata and language definitions
- src/window/Recognize/TextArea/index.jsx: Enhanced with streaming support
Key Features
Testing
Breaking Changes
None. This is a purely additive feature that maintains full backward compatibility.
Dependencies
No new dependencies added. Uses existing Tauri HTTP client and React ecosystem.
Documentation
The service automatically appears in the OCR service selection with properly localized names and configuration options.
This enhancement significantly expands POT's OCR capabilities by enabling integration with the growing ecosystem of OpenAI-compatible vision models, while providing a modern streaming experience for users.