A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.
Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:
- Rich, standardized enumerations: pre-defined value sets across multiple domains
- Semantic meaning: every value is linked to ontology terms (when possible)
- Python-first convenience: work with simple enums, get semantics for free
- Multi-language support: generate JSON Schema, TypeScript, and more
- Interoperability: built on LinkML standards for maximum compatibility
Different datasets often represent the same concept in incompatible ways:
- `M` / `F`
- `male` / `female`
- `1` / `2`

They all mean the same thing, but they don't interoperate.
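Without a shared vocabulary, every integration ends up maintaining a hand-written lookup table. The minimal sketch below shows what that looks like for the three encodings above. It uses a local stand-in `SexEnum` built with Python's standard `enum` module (not the generated `valuesets` class), and the `normalize_sex` helper is hypothetical. It also assumes the `1`/`2` encoding follows ISO/IEC 5218 (1 = male, 2 = female), which each dataset's codebook should confirm.

```python
from enum import Enum


class SexEnum(Enum):
    """Local stand-in for a shared value set (not the generated valuesets class)."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical lookup table: every source encoding maps onto one canonical member.
# The "1"/"2" rows assume ISO/IEC 5218 coding (1 = male, 2 = female).
_SYNONYMS = {
    "m": SexEnum.MALE,
    "male": SexEnum.MALE,
    "1": SexEnum.MALE,
    "f": SexEnum.FEMALE,
    "female": SexEnum.FEMALE,
    "2": SexEnum.FEMALE,
}


def normalize_sex(raw: object) -> SexEnum:
    """Map a raw value from any of the encodings above to SexEnum."""
    key = str(raw).strip().lower()
    try:
        return _SYNONYMS[key]
    except KeyError:
        raise ValueError(f"unrecognized sex code: {raw!r}") from None


# Three datasets, three encodings, one canonical result.
print([normalize_sex(v) for v in ["M", "male", 1]])
```

Every new data source adds more rows to a table like `_SYNONYMS`, and each project keeps its own copy.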
With Common Value Sets, you can instead use a shared enum:
```python
from valuesets.enums.core import SexEnum

s = SexEnum.MALE
print(s.value)              # "MALE"
print(s.get_meaning())      # "NCIT:C20197"
print(s.get_description())  # "Male sex"
```

Domain-specific enums carry descriptions, annotations, and ontology mappings:

```python
from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide

# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value)              # "CRYO_EM"
print(technique.get_description())  # "Cryo-electron microscopy"
print(technique.get_meaning())      # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations())  # {'resolution_range': '2-30 Å typical', ...}

# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning())  # "BSPO:0000000" (Biological Spatial Ontology)

# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000")  # Returns LEFT
```

Statistics and data science:

```python
from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType

# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning())      # "STATO:0000176"
print(test.get_description())  # "Student's t-test for comparing means"

# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST

# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations())  # {'value': 0.05, 'symbol': '*'}
```

Taxonomy and cell biology:

```python
from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType

# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning())      # "NCBITaxon:9606"
print(human.get_description())  # "Homo sapiens (human)"

# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning())   # "GO:0000084"
neuron = CellType.NEURON
print(neuron.get_meaning())  # "CL:0000540"

# Get all organisms at a specific taxonomic level
mammals = [org for org in CommonOrganismTaxaEnum if 'MAMMALIA' in str(org)]
```

Available domains include:

- Biology:
  - Structural Biology: Cryo-EM techniques, crystallization methods, detectors
  - Cell Biology: Cell types, cell cycle phases, organelles
  - Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
- Spatial: Anatomical directions, planes, relationships (BSPO mapped)
- Statistics: Statistical tests (STATO mapped), p-value thresholds
- Data Science: ML model types, dataset splits, metrics
- Materials Science: Crystal structures, characterization methods
- Clinical/Medical: Blood types (SNOMED), vital status
- Environmental: Exposure routes, pollutants
- Energy: Sources, storage methods, efficiency ratings
- Geography: Country codes (ISO), time zones, coordinate systems
- Time: Temporal relationships, periods, frequencies
- Academic: Publication types, research roles, funding sources
- Industrial: Manufacturing processes, quality standards
Use the raw LinkML schemas for data modeling, validation, and documentation:
```yaml
# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN
```

Get Python enums with full IDE support, type checking, and semantic metadata:
```python
# Type-safe enums with ontology mappings
status = VitalStatusEnum.ALIVE
print(status.meaning)  # "NCIT:C37987"
```

Write simple code, get semantic meaning automatically:
```python
# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # Hypothetical third-party enum

# Even though the enum members are named differently
# (BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS),
# they map to the same SNOMED code: SNOMED:278149003
blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS

if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability: works across different naming conventions
    process_compatible_blood_type()  # Placeholder for your own logic

# Or use the same_meaning_as utility function (import not shown here)
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
```

Generate schemas and types for other languages:
```bash
# Generate JSON Schema for web apps
gen-jsonschema schema.yaml

# Generate TypeScript definitions
gen-typescript schema.yaml -t typescript

# Generate JSON-LD
gen-jsonld schema.yaml
```

Typical places to use the generated value sets:

- Excel/Google Sheets: Generate dropdown validation lists
- Web forms: Auto-generate select options with descriptions
- APIs: Standardized response codes and classifications
- Databases: Consistent foreign key constraints
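As a sketch of the web-forms case, the snippet below renders HTML `<option>` elements from an enum whose members carry a description. `GridType` and the `to_select_options` helper are hypothetical stand-ins modeled on the `CryoEMGridType` example in this README, built with the standard `enum` module; with the real package you would read descriptions from the generated enum instead. Codes and labels are escaped with `html.escape` before being placed in markup.

```python
from enum import Enum
from html import escape


class GridType(Enum):
    """Hypothetical stand-in: each member's value is a (code, description) pair."""
    C_FLAT = ("C_FLAT", "C-flat holey carbon grid")
    QUANTIFOIL = ("QUANTIFOIL", "Quantifoil holey carbon grid")

    @property
    def code(self) -> str:
        return self.value[0]

    @property
    def description(self) -> str:
        return self.value[1]


def to_select_options(enum_cls: type[Enum]) -> str:
    """Render one <option> per member: the code is submitted, the description is shown."""
    lines = [
        f'<option value="{escape(m.code)}">{escape(m.description)}</option>'
        for m in enum_cls
    ]
    return "\n".join(lines)


print(to_select_options(GridType))
```

Submitting the stable code while displaying the description keeps stored data consistent even if the human-readable labels are later reworded.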
```python
# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum

# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE  # Group IV
# inherits from SSRNA (single-stranded RNA)
```

Each value also exposes its full metadata:

```python
from valuesets.enums.bio.structural_biology import CryoEMGridType

grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
#     'name': 'QUANTIFOIL',
#     'value': 'QUANTIFOIL',
#     'description': 'Quantifoil holey carbon grid',
#     'annotations': {
#         'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
#         'manufacturer': 'Quantifoil'
#     }
# }

# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}
```

Whole enums can be inspected in bulk:

```python
from valuesets.enums.spatial import AnatomicalPlane

# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)  # {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}

# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")

# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417")  # Returns SAGITTAL
```

Some enums in this collection are dynamic enums: instead of listing every value by hand, they define a query against an ontology that can be expanded at runtime. This uses LinkML's Dynamic Enum feature.
```yaml
# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf
```

Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:
- Static values with ontology mappings
- Metadata and descriptions
- Runtime expansion from ontologies: not yet available (coming in the next release)
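Until then, the idea behind `reachable_from` can be illustrated offline. The minimal sketch below walks `rdfs:subClassOf` edges downward from the source nodes of a small, hand-written toy hierarchy and honors `include_self`. The term labels and edges are illustrative only, not records pulled from the Cell Ontology, and the `reachable_from` function here is a hypothetical helper, not the LinkML implementation, which resolves terms against the real ontology.

```python
from collections import deque

# Toy hierarchy: child -> list of parents via rdfs:subClassOf.
# Labels are illustrative only; they are not real Cell Ontology records.
SUBCLASS_OF = {
    "motor neuron": ["neuron"],
    "sensory neuron": ["neuron"],
    "interneuron": ["neuron"],
    "alpha motor neuron": ["motor neuron"],
    "astrocyte": ["glial cell"],
}


def reachable_from(source_nodes, include_self=False):
    """Return all descendants of the source nodes (plus the sources if include_self)."""
    # Invert the edges so we can walk from parents down to their children.
    children = {}
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = set(source_nodes) if include_self else set()
    queue = deque(source_nodes)
    while queue:  # breadth-first traversal over the transitive closure
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return sorted(found)


print(reachable_from(["neuron"]))
```

Because the traversal is transitive, `alpha motor neuron` is included even though it is only an indirect subclass of `neuron`.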
When runtime expansion is available, you'll be able to:
```python
# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
```

Full Documentation Website
The value sets are also available as an OWL ontology for semantic web applications and ontology browsers:
- Direct Download: https://w3id.org/valuesets/valuesets.owl.ttl
- BioPortal: Available at BioPortal
- Ontology Lookup Service (OLS): Submission planned for OLS
The OWL representation allows you to:
- Browse value sets in ontology browsers
- Perform SPARQL queries
- Integrate with semantic web applications
- Link to other biomedical ontologies
We plan to add maturity level metadata to each enum to help users understand their readiness:
- Stable: Production-ready, well-tested, unlikely to change
- Beta: Usable but may have minor changes
- Draft: Under development, expect changes
```python
# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()
```

We also plan to split the package into domain-specific modules for lighter installs:
```bash
# Future: Install only what you need
pip install valuesets-core       # Core functionality
pip install valuesets-bio        # Biological domains
pip install valuesets-materials  # Materials science
pip install valuesets-clinical   # Clinical/medical
```

Planned community extensions include:

- Domain Packages: Community-maintained domain-specific value sets
- Organization Standards: Company/institution-specific enums that extend base sets
- Mapping Tables: Cross-ontology and cross-standard mappings
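Cross-standard mapping tables follow from the same idea as comparing `get_meaning()` results: two enums that annotate their members with ontology CURIEs can be joined on those CURIEs. In the minimal sketch below, both enums and the `mapping_table` helper are hypothetical, each member's value is assumed to hold its CURIE, and the `EX:` identifiers are placeholders rather than real SNOMED codes.

```python
from enum import Enum


class LocalBloodType(Enum):
    """Hypothetical enum; each value is the member's ontology CURIE (placeholders)."""
    A_POSITIVE = "EX:0001"
    O_NEGATIVE = "EX:0002"
    AB_POSITIVE = "EX:0003"


class PartnerBloodType(Enum):
    """Hypothetical third-party enum with different member names."""
    A_POS = "EX:0001"
    O_NEG = "EX:0002"
    B_POS = "EX:0004"


def mapping_table(left: type[Enum], right: type[Enum]) -> dict[str, str]:
    """Pair member names from two enums whose ontology CURIEs match."""
    by_meaning = {member.value: member.name for member in right}
    return {
        member.name: by_meaning[member.value]
        for member in left
        if member.value in by_meaning  # unmatched members are left out
    }


print(mapping_table(LocalBloodType, PartnerBloodType))
```

Members without a shared CURIE (here `AB_POSITIVE` and `B_POS`) drop out of the table, which makes coverage gaps between two standards easy to spot.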
Other planned features:

- AI/LLM Integration: Semantic annotations optimized for language models
- Usage Analytics: Track which enums are most used, identify gaps
- Version Management: Handle enum evolution with deprecation warnings
- Multi-ontology Support: Map single values to multiple ontologies
- Fuzzy Matching: Find enums by approximate string matching
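Fuzzy matching is a planned feature, so there is no package API to show yet. As a rough illustration of the technique, the sketch below uses the standard library's `difflib.get_close_matches` to suggest members for a misspelled or differently formatted string. `AnatomicalPlane` here is a local stand-in loosely modeled on the earlier example (`TRANSVERSE` is added for illustration), `fuzzy_lookup` is a hypothetical helper, and the `cutoff` of 0.6 is difflib's default rather than a tuned value.

```python
import difflib
from enum import Enum


class AnatomicalPlane(Enum):
    """Local stand-in; not the generated valuesets enum."""
    SAGITTAL = "SAGITTAL"
    CORONAL = "CORONAL"
    TRANSVERSE = "TRANSVERSE"


def fuzzy_lookup(enum_cls, text, n=3, cutoff=0.6):
    """Return members whose names are close to `text`, best match first."""
    # Normalize the query to the UPPER_SNAKE_CASE style of the member names.
    query = text.strip().upper().replace(" ", "_").replace("-", "_")
    names = [member.name for member in enum_cls]
    matches = difflib.get_close_matches(query, names, n=n, cutoff=cutoff)
    return [enum_cls[name] for name in matches]


print(fuzzy_lookup(AnatomicalPlane, "sagital"))  # misspelled input
```

Returning a ranked list rather than a single member lets a caller show suggestions instead of silently picking a wrong value.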
To set up a development environment:

```bash
git clone https://github.com/linkml/valuesets
cd valuesets
uv sync
```

Common development tasks are available as `just` recipes:

```bash
just --list   # Show all available commands
just test     # Run tests
just doctest  # Run doctests
just lint     # Run linting
just site     # Build documentation site
```

We welcome contributions, whether you're adding new domains, improving existing enums, or fixing bugs:
- Domain Experts: Contribute standardized value sets for your field
- Developers: Add utility functions, improve tooling, fix issues
- Users: Report missing enums, suggest improvements, share use cases
Project structure:

```
├── src/valuesets/
│   ├── schema/              # LinkML YAML schemas (source of truth)
│   │   ├── bio/             # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/         # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/               # Generated Python enums
│   │   └── <auto-generated from schemas>
│   ├── generators/          # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/          # Ontology validation
│       └── enum_evaluator.py
├── docs/                    # Documentation
└── tests/                   # Test cases
    ├── test_rich_enums.py   # Rich enum functionality
    └── validators/          # Ontology validation tests
```

Built with LinkML and the linkml-project-copier template.
Making data standardization simple, semantic, and scalable