Glazing

Unified data models and interfaces for syntactic and semantic frame ontologies.

Features

  • 🚀 One-command setup: glazing init downloads and prepares all datasets
  • 📦 Type-safe models: Pydantic v2 validation for all data structures
  • 🔍 Unified search: Query across all datasets with a consistent API
  • 🔗 Cross-references: Automatic mapping between resources with confidence scores
  • 🎯 Fuzzy search: Find data despite typos, spelling variants, and inconsistencies
  • 🐳 Docker support: Use via Docker without local installation
  • 💾 Efficient storage: JSON Lines format with streaming support
  • 🐍 Modern Python: Full type hints, Python 3.13+ support
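
To illustrate what streaming over JSON Lines storage means in practice, here is a minimal sketch using only the standard library. It is not Glazing's own loader: the helper name iter_jsonl, the file name, and the record fields are assumptions made up for this example, and the real schemas are richer.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed record per non-empty line without loading the whole file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical records for illustration only; not the real Glazing schema.
sample_records = [
    {"id": "give-13.1", "members": ["give", "lend"]},
    {"id": "run-51.3.2", "members": ["run", "jog"]},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    # Write one JSON document per line, which is all the format requires.
    sample_path.write_text(
        "\n".join(json.dumps(record) for record in sample_records) + "\n",
        encoding="utf-8",
    )
    # Records are parsed lazily, one line at a time.
    loaded_ids = [record["id"] for record in iter_jsonl(sample_path)]

print(loaded_ids)
```

Because each line is an independent JSON document, a reader can stop early or filter records without ever holding the full dataset in memory.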

Installation

Via pip

pip install glazing

Via Docker

Build and run Glazing in a containerized environment:

    # Build the image
    git clone https://github.com/aaronstevenwhite/glazing.git
    cd glazing
    docker build -t glazing:latest .

    # Initialize datasets (persisted in volume)
    docker run --rm -v glazing-data:/data glazing:latest init

    # Use the CLI
    docker run --rm -v glazing-data:/data glazing:latest search query "give"
    docker run --rm -v glazing-data:/data glazing:latest search query "transfer" --fuzzy

    # Interactive Python session
    docker run --rm -it -v glazing-data:/data --entrypoint python glazing:latest

See the installation docs for more Docker usage examples.

Quick Start

Initialize all datasets (one-time setup, ~54MB download):

glazing init

Then start using the data:

    from glazing.search import UnifiedSearch

    # Automatically uses default data directory after 'glazing init'
    search = UnifiedSearch()
    results = search.search("give")

    for result in results[:5]:
        print(f"{result.dataset}: {result.name} - {result.description}")
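
Since unified search returns hits from several datasets at once, a common next step is to bucket them by source. The sketch below shows one way to do that. It runs without Glazing installed by using a stand-in StubResult class that carries only the attributes used in the snippet above (dataset, name, description); the stub, the helper group_by_dataset, and the sample values are assumptions for illustration, not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubResult:
    """Stand-in for a search result; only mirrors the attributes used above."""

    dataset: str
    name: str
    description: str


def group_by_dataset(results: list[StubResult]) -> dict[str, list[str]]:
    """Collect result names under the dataset each one came from."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for result in results:
        grouped[result.dataset].append(result.name)
    return dict(grouped)


# Hypothetical results; real searches return whatever the datasets contain.
stub_results = [
    StubResult("verbnet", "give-13.1", "placeholder description"),
    StubResult("propbank", "give.01", "placeholder description"),
    StubResult("verbnet", "run-51.3.2", "placeholder description"),
]

grouped_names = group_by_dataset(stub_results)
for dataset, names in grouped_names.items():
    print(f"{dataset}: {', '.join(names)}")
```

The same function should work on real results as long as they expose dataset and name attributes, which is what the Quick Start snippet suggests.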

CLI Usage

Search across datasets:

    # Search all datasets
    glazing search query "abandon"

    # Search specific dataset
    glazing search query "run" --dataset verbnet

    # Find data with typos or spelling variants
    glazing search query "realize" --fuzzy
    glazing search query "organize" --fuzzy --threshold 0.8

Resolve cross-references:

    # Extract cross-reference index (one-time setup)
    glazing xref extract

    # Find cross-references
    glazing xref resolve "give.01" --source propbank
    glazing xref resolve "give-13.1" --source verbnet

    # Find data with variations or inconsistencies
    glazing xref resolve "realize.01" --source propbank --fuzzy

Python API

Load and work with individual datasets:

    from glazing.framenet.loader import FrameNetLoader
    from glazing.verbnet.loader import VerbNetLoader

    # Loaders automatically use default paths and load data after 'glazing init'
    fn_loader = FrameNetLoader()  # Data is already loaded
    frames = fn_loader.frames

    vn_loader = VerbNetLoader()  # Data is already loaded
    verb_classes = list(vn_loader.classes.values())

Cross-reference resolution:

    from glazing.references.index import CrossReferenceIndex

    # Automatic extraction on first use (cached for future runs)
    xref = CrossReferenceIndex()

    # Resolve references for a PropBank roleset
    refs = xref.resolve("give.01", source="propbank")
    print(f"VerbNet classes: {refs['verbnet_classes']}")
    print(f"Confidence scores: {refs['confidence_scores']}")

    # Find data with variations or inconsistencies
    refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
    print(f"Found match with fuzzy search: {refs['verbnet_classes']}")
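
To make the idea of cross-references with confidence scores concrete, here is a toy sketch of such an index. It is not how CrossReferenceIndex is implemented: the Mapping class, the TOY_INDEX table, the resolve_toy helper, and in particular the confidence values are all placeholders invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One link from a source entry to an entry in another dataset."""

    target_dataset: str
    target_id: str
    confidence: float


# Hypothetical index keyed by (source dataset, entry id).
# The confidence values are placeholders, not scores produced by Glazing.
TOY_INDEX: dict[tuple[str, str], list[Mapping]] = {
    ("propbank", "give.01"): [
        Mapping("verbnet", "give-13.1", 0.95),
        Mapping("framenet", "Giving", 0.90),
    ],
}


def resolve_toy(entry_id: str, source: str, min_confidence: float = 0.0) -> list[Mapping]:
    """Return mappings at or above min_confidence, most confident first."""
    mappings = TOY_INDEX.get((source, entry_id), [])
    kept = [m for m in mappings if m.confidence >= min_confidence]
    return sorted(kept, key=lambda m: m.confidence, reverse=True)


confident = resolve_toy("give.01", source="propbank", min_confidence=0.92)
for mapping in confident:
    print(f"{mapping.target_dataset}: {mapping.target_id} ({mapping.confidence:.2f})")
```

Keeping a score on every link lets callers choose their own trade-off between coverage and precision instead of baking one cutoff into the index.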

Fuzzy search in Python:

    from glazing.search import UnifiedSearch

    # Find data with typos or spelling variants
    search = UnifiedSearch()
    results = search.search_with_fuzzy("organize", fuzzy_threshold=0.8)

    for result in results[:5]:
        print(f"{result.dataset}: {result.name} (score: {result.score:.2f})")
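
For intuition about what a similarity threshold such as 0.8 does, the sketch below filters candidates with the standard library's difflib.SequenceMatcher. This is an assumption made for illustration: Glazing may compute its fuzzy scores with a different metric, and the helper names and candidate list here are made up.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_filter(
    query: str, candidates: list[str], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Keep candidates scoring at or above threshold, best match first."""
    scored = [(candidate, similarity(query, candidate)) for candidate in candidates]
    kept = [(candidate, score) for candidate, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate lemmas, chosen to show a spelling variant and near misses.
candidates = ["organize", "organise", "organ", "recognize"]
matches = fuzzy_filter("organize", candidates, threshold=0.8)

for name, score in matches:
    print(f"{name}: {score:.2f}")
```

With this metric the spelling variant "organise" clears the 0.8 bar while "organ" and "recognize" do not. Raising the threshold trades recall for precision.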

Supported Datasets

  • FrameNet 1.7: Semantic frames and frame elements
  • PropBank 3.4: Predicate-argument structures
  • VerbNet 3.4: Verb classes with thematic roles
  • WordNet 3.1: Synsets and lexical relations

Documentation

Full documentation is available at https://glazing.readthedocs.io.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

    # Development setup
    git clone https://github.com/aaronstevenwhite/glazing
    cd glazing
    pip install -e ".[dev]"

Citation

If you use Glazing in your research, please cite:

    @software{glazing2025,
      author = {White, Aaron Steven},
      title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
      year = {2025},
      url = {https://github.com/aaronstevenwhite/glazing},
      doi = {10.5281/zenodo.17467082}
    }

License

This package is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This project was funded by a National Science Foundation grant (BCS-2040831) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.