Command line tool for layout analysis of historical prints using Kraken.
The current version is available as a prebuilt image for CUDA 12.3 and cuDNN 9.
docker pull ghcr.io/jahtz/octopy:latest
docker run --rm -it --gpus all -v $(pwd):/data ghcr.io/jahtz/octopy:latest [OPTIONS] COMMAND [ARGS]...
-
Clone repository
git clone https://github.com/jahtz/octopy && cd octopy
-
Build image
docker build -t octopy --build-arg CUDA_TAG=12.2.2-cudnn8-devel-ubuntu22.04 .
-
Run
docker run --rm -it --gpus all -v $(pwd):/data octopy [OPTIONS] COMMAND [ARGS]...
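Typing the full `docker run` line for every call gets tedious, so a small shell function can wrap it. The following is a minimal sketch, not part of octopy: the function name `octopy_docker` and the variable `OCTOPY_IMAGE` are hypothetical, and the image is assumed to start `octopy` as its entrypoint, as the command above suggests.

```shell
# Hypothetical wrapper: forwards all arguments to octopy inside the container.
# OCTOPY_IMAGE selects the image; it defaults to the locally built "octopy".
# Set it to ghcr.io/jahtz/octopy:latest to use the prebuilt image instead.
octopy_docker() {
    docker run --rm -it --gpus all \
        -v "$(pwd)":/data \
        "${OCTOPY_IMAGE:-octopy}" "$@"
}

# Example call (hypothetical file names). Paths must refer to the mounted
# /data directory, because the container cannot see host paths:
#   octopy_docker segment /data/page.png -o /data/out
```

Because only the current directory is mounted, input images, models, and output directories all need to live below it and be addressed as `/data/...`.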
Note
Python: >=3.11
CUDA: 12.x
Tip
Use a virtual environment, e.g. with pyenv.
-
Clone and install custom Kraken fork:
git clone --single-branch --branch octopy https://github.com/jahtz/kraken
pip install kraken/.
-
Clone and install octopy
git clone https://github.com/jahtz/octopy
pip install octopy/.
-
Set LD_LIBRARY_PATH (for GPU acceleration, optional)
export LD_LIBRARY_PATH="/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH"
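The export above assumes that CUDA lives under `/usr/local/cuda-12.3` and that `LD_LIBRARY_PATH` is already set. A slightly more defensive variant, sketched here with a hypothetical helper name `add_cuda_libs`, checks that the directory exists and avoids a trailing colon when the variable was empty.

```shell
# Hypothetical helper: prepend a CUDA library directory to LD_LIBRARY_PATH.
# $1: directory to add (defaults to the path used in the step above).
add_cuda_libs() {
    cuda_dir="${1:-/usr/local/cuda-12.3/lib64}"
    if [ ! -d "$cuda_dir" ]; then
        echo "CUDA library directory not found: $cuda_dir" >&2
        return 1
    fi
    # ${VAR:+...} expands only if VAR is set and non-empty, so no trailing
    # colon is produced. An empty entry would make the dynamic loader search
    # the current working directory.
    export LD_LIBRARY_PATH="$cuda_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

After calling `add_cuda_libs`, running `python -c "import torch; print(torch.cuda.is_available())"` in the same shell is a quick way to see whether PyTorch, which Kraken depends on, can reach the GPU.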
$ octopy --help
Usage: octopy [OPTIONS] COMMAND [ARGS]...

  Command line tool layout analysis and OCR of historical prints using Kraken.

Options:
  --help     Show this message and exit.
  --version  Show the version and exit.

Commands:
  segment   Segment images using Kraken.
  segtrain  Train a custom segmentation model using Kraken.

$ octopy segtrain --help
Usage: octopy segtrain [OPTIONS]

  Train a custom segmentation model using Kraken.

Options:
  -g, --gt FILE
      One or more ground truth PageXML files. Use quotes to enclose glob
      patterns (e.g., "*.xml"). [required]
  -e, --eval FILE
      One or more optional evaluation PageXML files. Use quotes to enclose
      glob patterns (e.g., "*.xml").
  -p, --partition FLOAT
      Split ground truth files into training and evaluation sets if no
      evaluation files are provided. Default partition is 90% training,
      10% evaluation. [default: 0.9]
  -i, --imagesuffix TEXT
      Full suffix of the image files to be used. If not set, the suffix is
      derived from the XML files.
  -o, --output DIRECTORY
      Output directory for saving the model and checkpoints. [required]
  -m, --model FILE
      Path to a pre-trained model to fine-tune. If not set, training starts
      from scratch.
  -n, --name TEXT
      Name of the output model. Used for saving results and checkpoints.
  -d, --device TEXT
      Specify the device for processing (e.g. cpu, cuda:0, ...). Refer to
      PyTorch documentation for supported devices. [default: cpu]
  -w, --workers INTEGER RANGE
      Number of worker processes for CPU-based training. [default: 1; x>=1]
  -t, --threads INTEGER RANGE
      Number of threads for CPU-based training. [default: 1; x>=1]
  -r, --resize [union|new|fail]
      Controls how the model's output layer is resized if the training data
      contains different classes. `union` adds new classes (former `add`),
      `new` resizes to match the training data (former `both`), and `fail`
      aborts training if there is a mismatch. [default: new]
  --line-width INTEGER
      Height of baselines in the target image after scaling. [default: 8]
  --padding <INTEGER INTEGER>...
      Padding (left/right, top/bottom) around the page image.
      [default: 0, 0]
  --freq FLOAT
      Model saving and report generation frequency in epochs during
      training. If frequency is >1 it must be an integer, i.e. running
      validation every n-th epoch. [default: 1.0]
  --quit [early|fixed]
      Stop condition for training. Choose `early` for early stopping or
      `fixed` for a fixed number of epochs. [default: fixed]
  --epochs INTEGER
      Number of epochs to train for when using fixed stopping. [default: 50]
  --min-epochs INTEGER
      Minimum number of epochs to train for before early stopping is
      allowed. [default: 0]
  --lag INTEGER RANGE
      Early stopping patience (number of validation steps without
      improvement). Measured by val_mean_iu. [default: 10; x>=1]
  --optimizer [Adam|SGD|RMSprop|Lamb]
      Optimizer to use during training. [default: Adam]
  --lrate FLOAT
      Learning rate for the optimizer. [default: 0.0002]
  --momentum FLOAT
      Momentum parameter for applicable optimizers. [default: 0.9]
  --weight-decay FLOAT
      Weight decay parameter for the optimizer. [default: 1e-05]
  --schedule [constant|1cycle|exponential|cosine|step|reduceonplateau]
      Set learning rate scheduler. For 1cycle, cycle length is determined
      by the `--step-size` option. [default: constant]
  --completed-epochs INTEGER
      Number of epochs already completed. Used for resuming training.
      [default: 0]
  --augment
      Use data augmentation during training.
  --step-size INTEGER
      Step size for learning rate scheduler. [default: 10]
  --gamma FLOAT
      Gamma for learning rate scheduler. [default: 0.1]
  --rop-factor FLOAT
      Factor for reducing learning rate on plateau. [default: 0.1]
  --rop-patience INTEGER
      Patience for reducing learning rate on plateau. [default: 5]
  --cos-t-max INTEGER
      Maximum number of epochs for cosine annealing. [default: 50]
  --cos-min-lr FLOAT
      Minimum learning rate for cosine annealing. [default: 2e-05]
  --warmup INTEGER
      Number of warmup epochs for cosine annealing. [default: 0]
  --precision [64|32|bf16|16]
      Numerical precision to use for training. Default is 32-bit
      single-point precision. [default: 32]
  --suppress-regions
      Disable region segmentation training.
  --suppress-baselines
      Disable baseline segmentation training.
  -vr, --valid-regions TEXT
      Comma-separated list of valid regions to include in the training.
      Applied before region merging.
  -vb, --valid-baselines TEXT
      Comma-separated list of valid baselines to include in the training.
      Applied before baseline merging.
  -mr, --merge-regions TEXT...
      Region merge mapping. One or more mappings of the form
      '-mr SOURCE TARGET', where 'SOURCE' is merged into 'TARGET'.
  -mb, --merge-baselines TEXT...
      Baseline merge mapping. One or more mappings of the form
      '-mb SOURCE TARGET', where 'SOURCE' is merged into 'TARGET'.
  --logging [ERROR|WARNING|INFO|DEBUG]
      Set logging level. [default: ERROR]

$ octopy segment --help
Usage: octopy segment [OPTIONS] IMAGES...

  Segment images using Kraken.

  IMAGES: Specify one or more image files to segment. Supports multiple file
  paths, wildcards, or directories (with the -g option).

Options:
  -m, --model FILE
      Path to custom segmentation model(s). If not provided, the default
      Kraken model is used.
  -o, --output DIRECTORY
      Output directory for processed files. Defaults to the parent directory
      of each input file.
  -s, --suffix TEXT
      Suffix for output PageXML files. Should end with '.xml'.
      [default: .xml]
  -d, --device TEXT
      Specify the processing device (e.g. 'cpu', 'cuda:0',...). See PyTorch
      documentation. [default: cpu]
  --creator TEXT
      Metadata: Creator of the PageXML files. [default: octopy]
  --direction [hlr|hrl|vlr|vrl]
      Text direction of input images. [default: hlr]
  --suppress-lines
      Suppress lines in the output PageXML.
  --suppress-regions
      Suppress regions in the output PageXML. Creates a single dummy region
      for the whole image.
  --fallback INTEGER
      Use a default bounding box when the polygonizer fails to create a
      polygon around a baseline (in pixels).
  --heatmap TEXT
      Generate a heatmap image alongside the PageXML output. Specify the
      file extension for the heatmap (e.g., `.hm.png`).
  --logging [ERROR|WARNING|INFO|DEBUG]
      Set logging level. [default: ERROR]

Developed at Centre for Philology and Digitality (ZPD), University of Würzburg.
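The options listed in the help output above can be combined into a simple train-then-segment workflow. The sketch below uses only flags shown in that output. The directory names, the model name, and the function names `train_model` and `segment_pages` are hypothetical, and the output directories are assumed to exist already. The file name of the trained model is not documented here, so it is passed in explicitly.

```shell
# Hypothetical locations; adjust to your own data layout.
GT_DIR="${GT_DIR:-gt}"              # PageXML ground truth with matching images
MODEL_DIR="${MODEL_DIR:-models}"    # where checkpoints and the model are saved
OUT_DIR="${OUT_DIR:-out}"           # where segmentation results are written

# Train on the GPU with early stopping and data augmentation. No --eval files
# are given, so the ground truth is split 90/10 by the default --partition.
# The glob is quoted so that octopy, not the shell, expands it.
train_model() {
    octopy segtrain \
        --gt "$GT_DIR/*.xml" \
        --output "$MODEL_DIR" \
        --name prints \
        --device cuda:0 \
        --quit early --lag 10 \
        --augment
}

# Segment images with a trained model and write PageXML to OUT_DIR.
# $1: path to the model file; remaining arguments: image files.
segment_pages() {
    model="$1"
    shift
    octopy segment --model "$model" --output "$OUT_DIR" --device cuda:0 "$@"
}
```

A session would call `train_model` once, then `segment_pages` with the path of the resulting model file followed by the images to process. Replace `cuda:0` with `cpu` on machines without a GPU.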