python-cobol is a Python library for parsing and processing COBOL copybook files. It supports the REDEFINES, INDEXED BY, and OCCURS clauses, reports parsing errors with informative messages, and ships with a test suite.
- Complete COBOL Support: Parse REDEFINES, INDEXED BY, and OCCURS statements
- Modern Python: Type hints, dataclasses, and modern Python patterns
- Comprehensive Testing: Extensive test suite with high coverage
- CLI Interface: Command-line tool for processing COBOL files
- Library API: Easy-to-use Python API for integration
- Database Ready: Generate database-safe field names
- Logging: Built-in logging for debugging and monitoring
- Error Handling: Robust error handling with informative messages

Installation:

    # Install from PyPI
    pip install python-cobol

    # Install for development
    git clone https://github.com/rodrigo/python-cobol.git
    cd python-cobol
    pip install -e ".[dev]"

Command-line usage:

    # Process a COBOL file with all features enabled
    python-cobol example.cbl

    # Skip denormalization
    python-cobol example.cbl --skip-denormalize

    # Enable verbose logging
    python-cobol example.cbl --verbose

    # See all options
    python-cobol --help

Library usage:

    from python_cobol import process_cobol

    # Read and process a COBOL file
    with open("example.cbl", "r") as f:
        fields = process_cobol(f.readlines())

    # Access field information
    for field in fields:
        print(f"Field: {field['name']}, Level: {field['level']}")
        if field['pic']:
            print(f"  PIC: {field['pic']}")
            print(f"  Type: {field['pic_info']['type']}")
            print(f"  Length: {field['pic_info']['length']}")

Supported PIC formats:

- Character fields: PIC X(10)
- Numeric fields: PIC 9(5)
- Signed fields: PIC S9(5)
- Decimal fields: PIC 9(5)V99
- Signed decimal: PIC S9(5)V99
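
To make the PIC formats above concrete, here is a minimal sketch of how such a string could be classified into a type, total length, and precision. It is not the library's implementation: `PicSummary`, `expand_repeats`, and `summarize_pic` are hypothetical names, and the sketch only understands the symbols X, 9, S, and V.

```python
import re
from typing import NamedTuple


class PicSummary(NamedTuple):
    """Hypothetical result type for this sketch (not the library's PicInfo)."""

    type: str
    length: int
    precision: int


def expand_repeats(pic: str) -> str:
    """Expand repeat counts, e.g. 'S9(5)V99' becomes 'S99999V99'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)


def summarize_pic(pic: str) -> PicSummary:
    """Classify a simple PIC string built only from X, 9, S and V."""
    expanded = expand_repeats(pic.upper())
    signed = expanded.startswith("S")
    body = expanded.lstrip("S")
    if "X" in body:
        return PicSummary("Char", len(body), 0)
    # V marks the implied decimal point and occupies no storage itself.
    integer_part, _, fraction_part = body.partition("V")
    base = "Float" if fraction_part else "Integer"
    kind = f"Signed {base}" if signed else base
    return PicSummary(kind, len(integer_part) + len(fraction_part), len(fraction_part))
```

Under these assumptions, `summarize_pic("S9(5)V99")` yields a signed float of length 7 with precision 2, which agrees with the `parse_pic_string` result quoted in this README.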

OCCURS:

    05 FIELD-1 OCCURS 3 TIMES PIC X(10).
    05 GROUP-1 OCCURS 2 TIMES.
        10 SUB-FIELD-1 PIC X(5).
        10 SUB-FIELD-2 PIC 9(3).

REDEFINES:

    05 FIELD-1 PIC X(10).
    05 FIELD-2 REDEFINES FIELD-1 PIC 9(10).

INDEXED BY:

    05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10).

process_cobol runs the complete processing pipeline, which:
- Cleans COBOL lines
- Parses field definitions
- Handles REDEFINES
- Denormalizes OCCURS
- Cleans field names
- Makes names database-safe
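
The REDEFINES step in the list above can be pictured with a small sketch. It assumes fields are plain dicts with `name`, `level`, and `redefines` keys, and that a redefined field (together with any fields nested under it) is dropped in favor of the field that redefines it, which is the behavior the REDEFINES example in this README shows. `drop_redefined` is a hypothetical helper, not part of the library's API.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]


def drop_redefined(fields: List[Field]) -> List[Field]:
    """Keep the redefining field and drop the field it redefines."""
    redefined = {f["redefines"] for f in fields if f.get("redefines")}
    kept: List[Field] = []
    skip_below = None  # level of the dropped field whose children are being skipped
    for field in fields:
        if skip_below is not None and field["level"] > skip_below:
            continue  # child of a dropped group
        skip_below = None
        if field["name"] in redefined:
            skip_below = field["level"]
            continue
        kept.append(field)
    return kept


fields = [
    {"level": 1, "name": "DATA-RECORD", "pic": None, "redefines": None},
    {"level": 5, "name": "TEXT-FIELD", "pic": "X(20)", "redefines": None},
    {"level": 5, "name": "NUMERIC-FIELD", "pic": "9(20)", "redefines": "TEXT-FIELD"},
]
remaining = [f["name"] for f in drop_redefined(fields)]
```

With this input, `remaining` holds `DATA-RECORD` and `NUMERIC-FIELD` only.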
Parse a PIC clause and return structured information:

    pic_info = parse_pic_string('S9(5)V99')
    # Returns: PicInfo(type='Signed Float', length=7, precision=2)

Convert multi-line COBOL statements to single lines.
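
As an illustration of joining multi-line statements, the sketch below accumulates lines until one ends with a period, the COBOL statement terminator. `join_statements` is a hypothetical name, and the sketch assumes free-form input: it does not strip sequence-number columns, indicator columns, or comment lines.

```python
from typing import Iterable, List


def join_statements(lines: Iterable[str]) -> List[str]:
    """Merge physical lines into one string per period-terminated statement."""
    statements: List[str] = []
    pending: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        pending.append(text)
        if text.endswith("."):
            statements.append(" ".join(pending))
            pending = []
    if pending:
        # Keep an unterminated trailing statement rather than silently dropping it.
        statements.append(" ".join(pending))
    return statements


joined = join_statements([
    "05 CUSTOMER-NAME",
    "      PIC X(50).",
    "",
    "05 CUSTOMER-ID PIC 9(10).",
])
```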
Parse COBOL lines into structured dictionaries.
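
A field definition can be turned into a dictionary with a single regular expression. The pattern below is an assumption made for illustration, not the contents of the library's patterns module: it expects clauses in the order used by this README's examples (REDEFINES, OCCURS, INDEXED BY, PIC) and one complete statement per line. `FIELD_RE` and `parse_field` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Assumed clause order: level, name, REDEFINES, OCCURS, INDEXED BY, PIC.
FIELD_RE = re.compile(
    r"^(?P<level>\d{2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+REDEFINES\s+(?P<redefines>[A-Z0-9-]+))?"
    r"(?:\s+OCCURS\s+(?P<occurs>\d+)(?:\s+TIMES)?)?"
    r"(?:\s+INDEXED\s+BY\s+(?P<indexed_by>[A-Z0-9-]+))?"
    r"(?:\s+PIC(?:TURE)?\s+(?P<pic>\S+?))?"
    r"\s*\.$"
)


def parse_field(line: str) -> Optional[Dict[str, object]]:
    """Return the parsed clauses of one statement, or None if it does not match."""
    match = FIELD_RE.match(line.strip().upper())
    if match is None:
        return None
    parsed: Dict[str, object] = dict(match.groupdict())
    parsed["level"] = int(match.group("level"))
    if match.group("occurs") is not None:
        parsed["occurs"] = int(match.group("occurs"))
    return parsed


parsed = parse_field("05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10).")
```

Clauses that are absent come back as `None`, so a group item such as `01 CUSTOMER-RECORD.` parses to a dictionary whose `pic` is `None`.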
Expand OCCURS clauses into individual fields.
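
The expansion can be sketched as follows. The sketch assumes a flat list of dicts with `level`, `name`, and `occurs` keys, repeats the children of an OCCURS group once per occurrence with a numeric suffix, and drops the group item itself, mirroring the OCCURS example output in this README. The `-N` suffix format and the name `expand_occurs` are assumptions, and nested OCCURS clauses are not handled.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]


def expand_occurs(fields: List[Field]) -> List[Field]:
    """Repeat OCCURS items (or the children of OCCURS groups) with -N suffixes."""
    result: List[Field] = []
    i = 0
    while i < len(fields):
        field = fields[i]
        count = field.get("occurs")
        if not count:
            result.append(dict(field))
            i += 1
            continue
        # Children are the following fields with a deeper level number.
        j = i + 1
        while j < len(fields) and fields[j]["level"] > field["level"]:
            j += 1
        children = fields[i + 1:j]
        for n in range(1, count + 1):
            if children:
                for child in children:
                    result.append({**child, "name": f"{child['name']}-{n}"})
            else:
                result.append({**field, "name": f"{field['name']}-{n}", "occurs": None})
        i = j
    return result


record = [
    {"level": 1, "name": "ORDER-RECORD", "pic": None, "occurs": None},
    {"level": 5, "name": "ORDER-ITEMS", "pic": None, "occurs": 2},
    {"level": 10, "name": "ITEM-CODE", "pic": "X(10)", "occurs": None},
    {"level": 10, "name": "ITEM-QUANTITY", "pic": "9(3)", "occurs": None},
]
expanded_names = [f["name"] for f in expand_occurs(record)]
```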
Clean field names with options:
- ensure_unique_names: Add suffixes for uniqueness
- strip_prefix: Remove prefixes before first dash
- make_database_safe: Replace dashes with underscores
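
The three options can be illustrated with a short sketch operating on a list of names. The option names come from the list above. Everything else is an assumption: the function name `clean_name_list`, the order in which the options are applied, and the numbering scheme used for duplicate names.

```python
from collections import Counter
from typing import List


def clean_name_list(
    names: List[str],
    ensure_unique_names: bool = False,
    strip_prefix: bool = False,
    make_database_safe: bool = False,
) -> List[str]:
    """Apply the three cleaning options to a list of COBOL field names."""
    seen: Counter = Counter()
    cleaned: List[str] = []
    for name in names:
        if strip_prefix and "-" in name:
            # Drop everything up to and including the first dash.
            name = name.split("-", 1)[1]
        if ensure_unique_names:
            seen[name] += 1
            if seen[name] > 1:
                # Assumed scheme: the second occurrence gets -1, the third -2, ...
                name = f"{name}-{seen[name] - 1}"
        if make_database_safe:
            name = name.replace("-", "_")
        cleaned.append(name)
    return cleaned


cleaned = clean_name_list(
    ["WS-NAME", "WS-NAME", "CUSTOMER-ID"],
    ensure_unique_names=True,
    strip_prefix=True,
    make_database_safe=True,
)
```

Here `cleaned` is `["NAME", "NAME_1", "ID"]`. The sketch does not guard against a generated suffix colliding with an existing name, which a real implementation would need to consider.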

Data models:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PicInfo:
        type: str       # 'Char', 'Integer', 'Float', 'Signed Integer', etc.
        length: int     # Total field length
        precision: int  # Decimal places (for numeric fields)

    @dataclass
    class CobolField:
        level: int
        name: str
        pic: Optional[str] = None
        pic_info: Optional[PicInfo] = None
        occurs: Optional[int] = None
        indexed_by: Optional[str] = None
        redefines: Optional[str] = None

Development setup:

    # Clone the repository
    git clone https://github.com/rodrigo/python-cobol.git
    cd python-cobol

    # Install in development mode
    pip install -e ".[dev]"

    # Install pre-commit hooks
    pre-commit install

Running tests:

    # Run all tests
    make test

    # Run tests with coverage
    make test-cov

    # Run specific test file
    python -m pytest tests/test_core.py -v

Code quality:

    # Format code
    make format

    # Run linting
    make lint

    # Run all checks (format, lint, test)
    make check

Project structure:

    python-cobol/
    ├── python_cobol/              # Main package
    │   ├── __init__.py            # Package initialization
    │   ├── core.py                # Core parsing functions
    │   ├── models.py              # Data models
    │   ├── patterns.py            # Regular expression patterns
    │   └── cli.py                 # Command-line interface
    ├── tests/                     # Test suite
    │   ├── test_core.py           # Core functionality tests
    │   ├── test_example.py        # Integration tests
    │   └── example.cbl            # Test COBOL file
    ├── pyproject.toml             # Project configuration
    ├── requirements.txt           # Runtime dependencies
    ├── requirements-dev.txt       # Development dependencies
    ├── Makefile                   # Development tasks
    ├── .pre-commit-config.yaml    # Code quality hooks
    └── README.md                  # This file

Input COBOL:

    01 CUSTOMER-RECORD.
        05 CUSTOMER-ID PIC 9(10).
        05 CUSTOMER-NAME PIC X(50).
        05 CUSTOMER-BALANCE PIC S9(10)V99.

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 CUSTOMER-RECORD.",
        "    05 CUSTOMER-ID PIC 9(10).",
        "    05 CUSTOMER-NAME PIC X(50).",
        "    05 CUSTOMER-BALANCE PIC S9(10)V99."
    ]

    fields = process_cobol(cobol_lines)

    for field in fields:
        print(f"{field['name']}: {field['pic_info']['type']}")

Output:

    CUSTOMER_RECORD: Group
    CUSTOMER_ID: Integer
    CUSTOMER_NAME: Char
    CUSTOMER_BALANCE: Signed Float

Input COBOL:

    01 ORDER-RECORD.
        05 ORDER-ITEMS OCCURS 5 TIMES.
            10 ITEM-CODE PIC X(10).
            10 ITEM-QUANTITY PIC 9(3).
            10 ITEM-PRICE PIC S9(7)V99.

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 ORDER-RECORD.",
        "    05 ORDER-ITEMS OCCURS 5 TIMES.",
        "        10 ITEM-CODE PIC X(10).",
        "        10 ITEM-QUANTITY PIC 9(3).",
        "        10 ITEM-PRICE PIC S9(7)V99."
    ]

    fields = process_cobol(cobol_lines)

    # Print all denormalized fields
    for field in fields:
        print(f"{field['name']}: {field['pic']}")

Output:

    ORDER_RECORD: None
    ITEM_CODE_1: X(10)
    ITEM_QUANTITY_1: 9(3)
    ITEM_PRICE_1: S9(7)V99
    ITEM_CODE_2: X(10)
    ITEM_QUANTITY_2: 9(3)
    ITEM_PRICE_2: S9(7)V99
    ...

Input COBOL:

    01 DATA-RECORD.
        05 TEXT-FIELD PIC X(20).
        05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20).

Python Code:

    from python_cobol import process_cobol

    cobol_lines = [
        "01 DATA-RECORD.",
        "    05 TEXT-FIELD PIC X(20).",
        "    05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20)."
    ]

    fields = process_cobol(cobol_lines)

    # Only NUMERIC-FIELD remains after REDEFINES processing
    for field in fields:
        print(f"{field['name']}: {field['pic']}")

Output:

    DATA_RECORD: None
    NUMERIC_FIELD: 9(20)

We welcome contributions! Please see our Contributing Guide for details.

- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Run tests: make test
- Run linting: make lint
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request

This project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
All code should pass these tools before submission.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments:

- Original code by Paulus Schoutsen
- PIC parsing logic inspired by pyCOBOL
- Community contributors and maintainers

Support:

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: GitHub Wiki

Changelog:

- Complete refactoring with modern Python practices
- Added type hints throughout
- Improved error handling and logging
- Enhanced CLI interface
- Comprehensive test suite
- Modern project structure with pyproject.toml
- Pre-commit hooks for code quality
- Detailed documentation and examples

Earlier release:

- Basic COBOL parsing functionality
- Support for REDEFINES, OCCURS, INDEXED BY
- Simple CLI interface