Configuration & Validation
Pairing OpenMedConfig with the validation helpers lets you reproduce experiments, keep cache paths predictable, and guard APIs against malformed inputs.
OpenMedConfig sources
OpenMedConfig resolves each setting from the following sources, highest precedence first:
- Explicit keyword arguments when you instantiate it.
- Environment variables prefixed with OPENMED_.
- YAML file passed via OPENMED_CONFIG_FILE (or the openmed_config= argument).
- Sensible defaults (CPU device, ~/.cache/openmed, unauthenticated Hugging Face access).
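The lookup order can be sketched in plain Python. This is a minimal illustration of layered precedence, not OpenMed's actual implementation: the helper resolve_setting, the DEFAULTS table, and the variable name OPENMED_DEVICE are hypothetical and exist only for this example.

```python
from typing import Any, Mapping, Optional

# Assumed defaults, mirroring the ones listed in this section.
DEFAULTS = {"device": "cpu", "cache_dir": "~/.cache/openmed"}


def resolve_setting(
    name: str,
    explicit: Mapping[str, Any],
    env: Mapping[str, str],
    file_values: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> Optional[Any]:
    """Return the first value found, checking the highest-priority source first."""
    if name in explicit:                  # 1. keyword arguments
        return explicit[name]
    env_key = "OPENMED_" + name.upper()   # 2. environment (key naming is an assumption)
    if env_key in env:
        return env[env_key]
    if name in file_values:               # 3. values loaded from the YAML file
        return file_values[name]
    return defaults.get(name)             # 4. built-in defaults


yaml_values = {"device": "cuda"}
env = {"OPENMED_DEVICE": "cpu"}

print(resolve_setting("device", {}, env, yaml_values))         # cpu (environment beats YAML)
print(resolve_setting("device", {"device": "cuda"}, env, {}))  # cuda (keyword beats environment)
print(resolve_setting("cache_dir", {}, {}, {}))                # ~/.cache/openmed
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the lookup easy to unit test.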
from pathlib import Path
from openmed.core import ModelLoader, OpenMedConfig
config = OpenMedConfig.from_file(Path.home() / ".config/openmed/config.yaml")
loader = ModelLoader(config=config)
ner = loader.create_pipeline("disease_detection_superclinical", aggregation_strategy="simple")
entities = ner("Dapagliflozin added for HFpEF symptom relief.")
Minimal YAML file
~/.config/openmed/config.yaml
default_org: OpenMed
device: cuda
cache_dir: ~/.cache/openmed
hf_token: ${HF_TOKEN} # optional
pipeline:
  aggregation_strategy: simple
  return_all_scores: false
Environment variables override YAML values, making it easy to swap devices or cache directories in CI/CD without editing the file.
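For CI jobs launched from Python, one way to apply such overrides is to pass a modified environment to the child process. The sketch below assumes the variable names OPENMED_DEVICE and OPENMED_CACHE_DIR, which follow the OPENMED_ prefix rule but are not confirmed by this page, so check the names your OpenMed version actually reads. The helper ci_environment and the cache path are likewise hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping


def ci_environment(base: Mapping[str, str], device: str, cache_dir: str) -> Dict[str, str]:
    """Copy an environment and add per-job overrides (variable names are assumptions)."""
    env = dict(base)
    env["OPENMED_DEVICE"] = device        # assumed name
    env["OPENMED_CACHE_DIR"] = cache_dir  # assumed name
    return env


job_env = ci_environment(os.environ, device="cpu", cache_dir="/tmp/openmed-job-cache")

# The child process sees the overrides; os.environ in this process is not modified.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OPENMED_DEVICE'])"],
    env=job_env,
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # cpu
```

Building the environment as a copy keeps overrides scoped to a single job, which matters on runners that execute several builds in one long-lived process.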
Validation helpers
from openmed.utils.validation import (
validate_input,
validate_model_name,
)
text = validate_input(
user_supplied_text,
max_length=2000,
allow_empty=False,
strip=True,
)
model_id = validate_model_name("disease_detection_superclinical")
- validate_input trims whitespace, enforces max lengths, and raises informative errors for API clients.
- validate_model_name normalizes registry aliases and protects service endpoints from arbitrary HF IDs.
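To make that behavior concrete, here is a standalone sketch of what validators of this kind typically do. It is not the code in openmed.utils.validation: the function names check_text and check_model_name, the use of ValueError, and the ALLOWED_MODELS alias table are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical alias table; a real registry would be far larger.
ALLOWED_MODELS: Dict[str, str] = {
    "disease_detection_superclinical": "disease_detection_superclinical",
    "disease-detection-superclinical": "disease_detection_superclinical",
}


def check_text(text: object, max_length: int = 2000,
               allow_empty: bool = False, strip: bool = True) -> str:
    """Reject non-strings, optionally trim, then enforce emptiness and length rules."""
    if not isinstance(text, str):
        raise ValueError(f"expected str, got {type(text).__name__}")
    if strip:
        text = text.strip()
    if not text and not allow_empty:
        raise ValueError("input text is empty")
    if len(text) > max_length:
        raise ValueError(f"input has {len(text)} characters; limit is {max_length}")
    return text


def check_model_name(name: str) -> str:
    """Map a known alias to its canonical key; refuse anything not registered."""
    key = name.strip().lower()
    if key not in ALLOWED_MODELS:
        raise ValueError(f"unknown model alias: {name!r}")
    return ALLOWED_MODELS[key]


print(check_text("  Dapagliflozin added.  "))               # Dapagliflozin added.
print(check_model_name("Disease-Detection-Superclinical"))  # disease_detection_superclinical
```

Resolving names against a fixed table, instead of passing user input straight to a model hub, is what stops a service endpoint from loading arbitrary repositories.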
Logging and tracing
from openmed.utils import setup_logging
from openmed.core import ModelLoader
setup_logging(level="INFO", json=True)
loader = ModelLoader()
- Use JSON output when feeding a log shipper, or disable it in notebooks.
- Combine with OPENMED_DISABLE_WARNINGS=1 when you want the quietest possible inference loop.
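If you want to see what JSON-formatted log lines look like without OpenMed installed, the standard library is enough for a rough approximation. The formatter below is a generic sketch: the class JsonFormatter, the logger name, and the field names are assumptions and may not match what setup_logging(json=True) emits.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (field names are assumptions)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


stream = io.StringIO()                      # stand-in for stderr or a log shipper's input
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("openmed.demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                    # keep demo records away from the root logger

logger.info("pipeline ready")
logger.debug("suppressed at INFO level")    # below the threshold, so never formatted

print(stream.getvalue(), end="")
```

One object per line is the shape most log shippers expect, which is why JSON output suits production better than interactive notebooks.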
Cache & device tips
- CPU-only teams: keep device="cpu" and rely on HF caching. PyTorch installs stay optional unless you add the gliner extra.
- GPU nodes: set device="cuda" and optionally torch_dtype=float16 inside OpenMedConfig.pipeline.
- Shared runners: point cache_dir at an ephemeral volume per job to avoid artifacts leaking between builds.
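These tips can be combined in a small setup helper. This is a sketch under assumptions: pick_device and make_job_cache are hypothetical names, and falling back to CPU when PyTorch is missing is one reasonable policy, not documented OpenMed behavior. How the resulting values reach OpenMedConfig is left out, because its constructor signature is not shown on this page.

```python
import shutil
import tempfile
from pathlib import Path


def pick_device() -> str:
    """Return "cuda" only when PyTorch is importable and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # CPU-only install: PyTorch is optional
    return "cuda" if torch.cuda.is_available() else "cpu"


def make_job_cache(prefix: str = "openmed-cache-") -> Path:
    """Create a fresh, uniquely named cache directory for a single job."""
    return Path(tempfile.mkdtemp(prefix=prefix))


device = pick_device()
cache_dir = make_job_cache()
try:
    print(device, cache_dir.is_dir())
    # ... build the config with these values and run inference here ...
finally:
    # Remove the directory so no artifacts leak into the next build.
    shutil.rmtree(cache_dir, ignore_errors=True)
```

The try/finally cleanup means the cache is removed even when inference raises, which is the failure case where leftover files on a shared runner are most likely.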