Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A Unified Toolkit for Deep Learning Based Document Image Analysis
YomiToku is a Python package that provides an AI-powered document image analysis engine designed specifically for the Japanese language.
OCR engine for all the languages
A toolbox of OCR models and algorithms based on MindSpore
Layout analysis for Chinese and English documents
📝 Extracts content from document images and reproduces each image one-to-one in a Word or TXT file for further use or processing. Planned support: PDF/image input, with output in JSON, TXT, Word, and Markdown formats.
YOLO models trained on DocLayNet: power your document intelligence with layout analysis
An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
A Unified Toolkit for Deep Learning-Based Table Extraction
Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset
[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil
pdfDet aims to simplify PDF layout detection tasks for users.
A more complete example of programming with PDFMiner, which continues where the default documentation stops
Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.
A powerful CLI tool for visualization and encoding of PAGE-XML files