royopa/python-cobol


Python COBOL Copybook Parser

PyPI package | Python 3.8+ | License: GPL-3.0 | Code style: black | Imports: isort

A modern, well-structured Python library for parsing and processing COBOL Copybook files. This library provides comprehensive support for COBOL data structures including REDEFINES, INDEXED BY, and OCCURS clauses, with robust error handling and extensive test coverage.

✨ Features

  • Complete COBOL Support: Parse REDEFINES, INDEXED BY, and OCCURS statements
  • Modern Python: Type hints, dataclasses, and modern Python patterns
  • Comprehensive Testing: Extensive test suite with high coverage
  • CLI Interface: Command-line tool for processing COBOL files
  • Library API: Easy-to-use Python API for integration
  • Database Ready: Generate database-safe field names
  • Logging: Built-in logging for debugging and monitoring
  • Error Handling: Robust error handling with informative messages

🚀 Quick Start

Installation

    # Install from PyPI
    pip install python-cobol

    # Install for development
    git clone https://github.com/rodrigo/python-cobol.git
    cd python-cobol
    pip install -e ".[dev]"

Basic Usage

Command Line Interface

    # Process a COBOL file with all features enabled
    python-cobol example.cbl

    # Skip denormalization
    python-cobol example.cbl --skip-denormalize

    # Enable verbose logging
    python-cobol example.cbl --verbose

    # See all options
    python-cobol --help

Python API

    from python_cobol import process_cobol

    # Read and process a COBOL file
    with open("example.cbl", "r") as f:
        fields = process_cobol(f.readlines())

    # Access field information
    for field in fields:
        print(f"Field: {field['name']}, Level: {field['level']}")
        if field['pic']:
            print(f"  PIC: {field['pic']}")
            print(f"  Type: {field['pic_info']['type']}")
            print(f"  Length: {field['pic_info']['length']}")

📖 Documentation

Supported COBOL Features

PIC Clauses

  • Character fields: PIC X(10)
  • Numeric fields: PIC 9(5)
  • Signed fields: PIC S9(5)
  • Decimal fields: PIC 9(5)V99
  • Signed decimal: PIC S9(5)V99
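
To make the five PIC forms above concrete, here is a minimal, standalone sketch of how such a string might be classified. It is not the library's implementation: `PicSummary`, `expand_pic`, and `summarize_pic` are hypothetical names used only for illustration, and the sketch assumes that the sign (`S`) and the implied decimal point (`V`) do not count toward the length.

```python
import re
from typing import NamedTuple


class PicSummary(NamedTuple):
    """Hypothetical result type for this sketch (not the library's PicInfo)."""
    type: str
    length: int
    precision: int


def expand_pic(pic: str) -> str:
    """Expand repeat counts, e.g. '9(3)' becomes '999'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)


def summarize_pic(pic: str) -> PicSummary:
    """Classify a PIC string as character or (signed) integer/float."""
    expanded = expand_pic(pic.upper())
    signed = expanded.startswith("S")
    body = expanded.lstrip("S")

    # Alphanumeric (X) or alphabetic (A) pictures are treated as character data.
    if "X" in body or "A" in body:
        return PicSummary("Char", len(body), 0)

    # 'V' marks the implied decimal point; digits after it give the precision.
    integer_part, _, fraction_part = body.partition("V")
    precision = len(fraction_part)
    kind = "Float" if precision else "Integer"
    if signed:
        kind = "Signed " + kind
    return PicSummary(kind, len(integer_part) + precision, precision)


print(summarize_pic("X(10)"))
print(summarize_pic("S9(5)V99"))
```

Edited pictures (for example `ZZ9.99`) and usage clauses such as `COMP-3` are outside the scope of this sketch.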

OCCURS Clauses

    05 FIELD-1 OCCURS 3 TIMES PIC X(10).
    05 GROUP-1 OCCURS 2 TIMES.
       10 SUB-FIELD-1 PIC X(5).
       10 SUB-FIELD-2 PIC 9(3).

REDEFINES Clauses

    05 FIELD-1 PIC X(10).
    05 FIELD-2 REDEFINES FIELD-1 PIC 9(10).

INDEXED BY Clauses

05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10).

API Reference

Core Functions

process_cobol(lines: List[str]) -> List[Dict]

Complete processing pipeline that:

  • Cleans COBOL lines
  • Parses field definitions
  • Handles REDEFINES
  • Denormalizes OCCURS
  • Cleans field names
  • Makes names database-safe

parse_pic_string(pic_str: str) -> PicInfo

Parse a PIC clause and return structured information:

    pic_info = parse_pic_string('S9(5)V99')
    # Returns: PicInfo(type='Signed Float', length=7, precision=2)

clean_cobol(lines: List[str]) -> List[str]

Convert multi-line COBOL statements to single lines.
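
As an illustration of the idea (not the library's code), the sketch below joins physical lines until a statement-terminating period is reached. `join_statements` is a hypothetical helper; it assumes the input has already had sequence-number columns removed and that lines starting with `*` are comments.

```python
from typing import List


def join_statements(lines: List[str]) -> List[str]:
    """Merge continuation lines so each statement sits on one line."""
    statements: List[str] = []
    pending: List[str] = []
    for raw in lines:
        text = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '*').
        if not text or text.startswith("*"):
            continue
        pending.append(text)
        # A trailing period ends the current statement.
        if text.endswith("."):
            statements.append(" ".join(pending))
            pending = []
    # Keep any unterminated trailing fragment rather than dropping it silently.
    if pending:
        statements.append(" ".join(pending))
    return statements


copybook = [
    "05 CUSTOMER-NAME",
    "       PIC X(50).",
    "* a comment line",
    "05 CUSTOMER-ID PIC 9(10).",
]
print(join_statements(copybook))
```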

parse_cobol(lines: List[str]) -> List[Dict]

Parse COBOL lines into structured dictionaries.
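
For a rough picture of what parsing one cleaned line into a dictionary can look like, here is a standalone regular-expression sketch. The pattern and the `parse_field_line` helper are assumptions made for illustration, not the contents of the library's `patterns.py`. The pattern only accepts clauses in the order used by the examples in this README (REDEFINES, OCCURS, INDEXED BY, PIC), and the dictionary keys mirror the `CobolField` attributes.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: level, name, then optional clauses in a fixed order.
FIELD_RE = re.compile(
    r"^(?P<level>\d{2})\s+(?P<name>[A-Za-z0-9-]+)"
    r"(?:\s+REDEFINES\s+(?P<redefines>[A-Za-z0-9-]+))?"
    r"(?:\s+OCCURS\s+(?P<occurs>\d+)(?:\s+TIMES)?)?"
    r"(?:\s+INDEXED\s+BY\s+(?P<indexed_by>[A-Za-z0-9-]+))?"
    r"(?:\s+PIC(?:TURE)?\s+(?P<pic>\S+?))?"
    r"\s*\.\s*$",
    re.IGNORECASE,
)


def parse_field_line(line: str) -> Optional[Dict[str, object]]:
    """Return a field dictionary, or None if the line does not match."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "level": int(parts["level"]),
        "name": parts["name"],
        "pic": parts["pic"],
        "occurs": int(parts["occurs"]) if parts["occurs"] else None,
        "indexed_by": parts["indexed_by"],
        "redefines": parts["redefines"],
    }


print(parse_field_line("05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10)."))
print(parse_field_line("05 FIELD-2 REDEFINES FIELD-1 PIC 9(10)."))
```

A single fixed-order pattern keeps the sketch short; a real parser would need to tolerate clauses in any order and additional clauses such as USAGE or VALUE.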

denormalize_cobol(lines: List[Dict]) -> List[Dict]

Expand OCCURS clauses into individual fields.
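
The expansion can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's algorithm: `expand_occurs` is a hypothetical helper, fields are plain dictionaries with `level`, `name`, and an optional `occurs` key, a repeated group is replaced by numbered copies of its children (as in the Example 2 output in this README), and copies are suffixed with `-1`, `-2`, and so on.

```python
from typing import Dict, List


def expand_occurs(fields: List[Dict]) -> List[Dict]:
    """Replace each OCCURS item with numbered copies of itself or its children."""
    result: List[Dict] = []
    i = 0
    while i < len(fields):
        field = fields[i]
        # Children are the entries that follow with a deeper level number.
        j = i + 1
        while j < len(fields) and fields[j]["level"] > field["level"]:
            j += 1

        times = field.get("occurs")
        if not times:
            # No OCCURS: keep the field; its children are visited next.
            result.append(dict(field))
            i += 1
            continue

        children = fields[i + 1:j]
        if children:
            # Repeated group: expand nested OCCURS first, then repeat children.
            expanded = expand_occurs(children)
            for n in range(1, times + 1):
                for child in expanded:
                    result.append({**child, "name": f"{child['name']}-{n}"})
        else:
            # Repeated elementary field: repeat the field itself.
            for n in range(1, times + 1):
                result.append({**field, "name": f"{field['name']}-{n}", "occurs": None})
        i = j  # Skip past the children that were just consumed.
    return result


record = [
    {"level": 1, "name": "ORDER-RECORD"},
    {"level": 5, "name": "ORDER-ITEMS", "occurs": 2},
    {"level": 10, "name": "ITEM-CODE", "pic": "X(10)"},
    {"level": 10, "name": "ITEM-QUANTITY", "pic": "9(3)"},
]
print([f["name"] for f in expand_occurs(record)])
# ['ORDER-RECORD', 'ITEM-CODE-1', 'ITEM-QUANTITY-1', 'ITEM-CODE-2', 'ITEM-QUANTITY-2']
```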

clean_names(lines: List[Dict], **options) -> List[Dict]

Clean field names with options:

  • ensure_unique_names: Add suffixes for uniqueness
  • strip_prefix: Remove prefixes before first dash
  • make_database_safe: Replace dashes with underscores
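
The effect of the three options can be illustrated with a small standalone sketch. `tidy_names` is a hypothetical stand-in written for this example, not the library's `clean_names`; in particular, the `-2`, `-3` suffix scheme for duplicates is an assumption, and the sketch does not guard against a suffixed name colliding with an existing one.

```python
from typing import Dict, List


def tidy_names(
    fields: List[Dict],
    ensure_unique_names: bool = False,
    strip_prefix: bool = False,
    make_database_safe: bool = False,
) -> List[Dict]:
    """Return copies of the fields with cleaned-up names."""
    seen: Dict[str, int] = {}
    cleaned: List[Dict] = []
    for field in fields:
        name = field["name"]
        if strip_prefix and "-" in name:
            # Drop everything up to and including the first dash.
            name = name.split("-", 1)[1]
        if ensure_unique_names:
            # Second and later occurrences get a numeric suffix.
            count = seen.get(name, 0) + 1
            seen[name] = count
            if count > 1:
                name = f"{name}-{count}"
        if make_database_safe:
            name = name.replace("-", "_")
        cleaned.append({**field, "name": name})
    return cleaned


fields = [{"name": "CUST-ID"}, {"name": "ORD-ID"}, {"name": "ORD-TOTAL-AMT"}]
result = tidy_names(fields, ensure_unique_names=True, strip_prefix=True, make_database_safe=True)
print([f["name"] for f in result])
# ['ID', 'ID_2', 'TOTAL_AMT']
```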

Data Models

PicInfo

    @dataclass
    class PicInfo:
        type: str       # 'Char', 'Integer', 'Float', 'Signed Integer', etc.
        length: int     # Total field length
        precision: int  # Decimal places (for numeric fields)

CobolField

    @dataclass
    class CobolField:
        level: int
        name: str
        pic: Optional[str] = None
        pic_info: Optional[PicInfo] = None
        occurs: Optional[int] = None
        indexed_by: Optional[str] = None
        redefines: Optional[str] = None

🔧 Development

Setup Development Environment

    # Clone the repository
    git clone https://github.com/rodrigo/python-cobol.git
    cd python-cobol

    # Install in development mode
    pip install -e ".[dev]"

    # Install pre-commit hooks
    pre-commit install

Running Tests

    # Run all tests
    make test

    # Run tests with coverage
    make test-cov

    # Run specific test file
    python -m pytest tests/test_core.py -v

Code Quality

    # Format code
    make format

    # Run linting
    make lint

    # Run all checks (format, lint, test)
    make check

Project Structure

    python-cobol/
    ├── python_cobol/            # Main package
    │   ├── __init__.py          # Package initialization
    │   ├── core.py              # Core parsing functions
    │   ├── models.py            # Data models
    │   ├── patterns.py          # Regular expression patterns
    │   └── cli.py               # Command-line interface
    ├── tests/                   # Test suite
    │   ├── test_core.py         # Core functionality tests
    │   ├── test_example.py      # Integration tests
    │   └── example.cbl          # Test COBOL file
    ├── pyproject.toml           # Project configuration
    ├── requirements.txt         # Runtime dependencies
    ├── requirements-dev.txt     # Development dependencies
    ├── Makefile                 # Development tasks
    ├── .pre-commit-config.yaml  # Code quality hooks
    └── README.md                # This file

📋 Examples

Example 1: Simple Field Processing

Input COBOL:

    01 CUSTOMER-RECORD.
       05 CUSTOMER-ID PIC 9(10).
       05 CUSTOMER-NAME PIC X(50).
       05 CUSTOMER-BALANCE PIC S9(10)V99.

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 CUSTOMER-RECORD.",
        "    05 CUSTOMER-ID PIC 9(10).",
        "    05 CUSTOMER-NAME PIC X(50).",
        "    05 CUSTOMER-BALANCE PIC S9(10)V99."
    ]

    fields = process_cobol(cobol_lines)

    for field in fields:
        print(f"{field['name']}: {field['pic_info']['type']}")

Output:

    CUSTOMER_RECORD: Group
    CUSTOMER_ID: Integer
    CUSTOMER_NAME: Char
    CUSTOMER_BALANCE: Signed Float

Example 2: OCCURS Processing

Input COBOL:

    01 ORDER-RECORD.
       05 ORDER-ITEMS OCCURS 5 TIMES.
          10 ITEM-CODE PIC X(10).
          10 ITEM-QUANTITY PIC 9(3).
          10 ITEM-PRICE PIC S9(7)V99.

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 ORDER-RECORD.",
        "    05 ORDER-ITEMS OCCURS 5 TIMES.",
        "        10 ITEM-CODE PIC X(10).",
        "        10 ITEM-QUANTITY PIC 9(3).",
        "        10 ITEM-PRICE PIC S9(7)V99."
    ]

    fields = process_cobol(cobol_lines)

    # Print all denormalized fields
    for field in fields:
        print(f"{field['name']}: {field['pic']}")

Output:

    ORDER_RECORD: None
    ITEM_CODE_1: X(10)
    ITEM_QUANTITY_1: 9(3)
    ITEM_PRICE_1: S9(7)V99
    ITEM_CODE_2: X(10)
    ITEM_QUANTITY_2: 9(3)
    ITEM_PRICE_2: S9(7)V99
    ...

Example 3: REDEFINES Processing

Input COBOL:

    01 DATA-RECORD.
       05 TEXT-FIELD PIC X(20).
       05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20).

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 DATA-RECORD.",
        "    05 TEXT-FIELD PIC X(20).",
        "    05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20)."
    ]

    fields = process_cobol(cobol_lines)

    # Only NUMERIC-FIELD remains after REDEFINES processing
    for field in fields:
        print(f"{field['name']}: {field['pic']}")

Output:

    DATA_RECORD: None
    NUMERIC_FIELD: 9(20)
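
The behaviour in Example 3, where the redefined field is dropped and the redefining field is kept, can be sketched in a few lines. This is a standalone illustration rather than the library's code: `drop_redefined` is a hypothetical helper operating on plain dictionaries with `level`, `name`, and an optional `redefines` key, and it assumes that the subordinate fields of a redefined group are dropped with it.

```python
from typing import Dict, List, Optional


def drop_redefined(fields: List[Dict]) -> List[Dict]:
    """Remove every field that another field REDEFINES, plus its children."""
    targets = {f["redefines"] for f in fields if f.get("redefines")}
    result: List[Dict] = []
    skip_level: Optional[int] = None
    for field in fields:
        if skip_level is not None:
            if field["level"] > skip_level:
                continue  # Subordinate of a dropped field.
            skip_level = None  # Back at the same or a shallower level.
        if field["name"] in targets:
            skip_level = field["level"]
            continue
        result.append(field)
    return result


record = [
    {"level": 1, "name": "DATA-RECORD"},
    {"level": 5, "name": "TEXT-FIELD", "pic": "X(20)"},
    {"level": 5, "name": "NUMERIC-FIELD", "pic": "9(20)", "redefines": "TEXT-FIELD"},
]
print([f["name"] for f in drop_redefined(record)])
# ['DATA-RECORD', 'NUMERIC-FIELD']
```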

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Run tests: make test
  5. Run linting: make lint
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Code Style

This project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting
  • mypy for type checking

All code should pass these tools before submission.

📄 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Original code by Paulus Schoutsen
  • PIC parsing logic inspired by pyCOBOL
  • Community contributors and maintainers

📞 Support

🔄 Changelog

Version 1.0.0

  • Complete refactoring with modern Python practices
  • Added type hints throughout
  • Improved error handling and logging
  • Enhanced CLI interface
  • Comprehensive test suite
  • Modern project structure with pyproject.toml
  • Pre-commit hooks for code quality
  • Detailed documentation and examples

Version 0.1.4 (Original)

  • Basic COBOL parsing functionality
  • Support for REDEFINES, OCCURS, INDEXED BY
  • Simple CLI interface
