PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 88+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
Free open-source web software for signing PDFs (alone or with others), organizing pages, editing metadata, and compressing PDFs.
Use TradeRepublic in the terminal and mass-download all documents.
JavaScript bindings for MuPDF
Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Java PDF table extraction & OCR library. Extract structured tables from text-based and scanned PDFs using stream, lattice (OpenCV-style grid detection), and hybrid parsing.
Translate many large PDF reports for free using Python.
A professional CLI workflow for PhD students to extract, analyze, and visualize academic papers as structured Markdown and Obsidian Canvas.
Web content extraction engine backed by Qt WebEngine.
Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.
🚀 Simplify your research workflow with Claude Scholar, the complete configuration for Claude Code in data science, AI, and academic writing.
Extract presentation slides from videos with accurate timestamps
This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you will easily be able to integrate the PDF Extract API in your own server-side code.
AI research assistant that extracts structured patterns from papers using RAG, LangGraph, and Claude. Query across your research library with natural language.
Open WebUI tool for extracting text from PDFs and images using Tesseract OCR. Supports text-based and scanned PDFs, multi-language OCR (English + Swedish), fully offline.
Automated document extraction pipeline using AI vision models for invoice and form data capture
Turn a scientific paper PDF into a presentation slide deck. An Antigravity / Claude Code agent skill.
Cross-platform desktop app for extracting text, tables, and structured data from PDFs using IBM Docling AI. Export to JSON, Markdown, CSV, Excel, HTML.