ModelLoader & Pipelines¶
`openmed.core.models.ModelLoader` is the backbone for all runtime integration. It centralizes Hugging Face discovery, credential management, caching, and pipeline instantiation so you can move between quick experiments and production runners without rewriting glue code.
When to use it¶
- You want to reuse a single tokenizer/pipeline across many documents.
- You need to load multiple models (e.g., disease + pharma) side by side; see the sketch after this list.
- You are deploying a service/CLI and prefer to hydrate everything at startup.
- You require maximum control over device placement, dtype, batch size, or tokenizer configuration.
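A minimal sketch of the side-by-side case, using the `create_pipeline` API covered under Essentials below. `pharma_detection_superclinical` is an illustrative registry name, not confirmed by this page; substitute a real entry from `list_available_models()`.

```python
from openmed.core import ModelLoader

loader = ModelLoader()

# One loader, two pipelines; the loader caches the underlying HF objects.
disease = loader.create_pipeline("disease_detection_superclinical", task="token-classification")
pharma = loader.create_pipeline("pharma_detection_superclinical", task="token-classification")  # illustrative name

text = "Administered paclitaxel alongside trastuzumab."
results = {"disease": disease(text), "pharma": pharma(text)}
```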
Essentials¶
```python
from openmed.core import ModelLoader, OpenMedConfig

config = OpenMedConfig(
    device="cuda",
    cache_dir="~/.cache/openmed",
    hf_token="hf_api_token_if_needed",
)
loader = ModelLoader(config=config)

pipeline = loader.create_pipeline(
    "disease_detection_superclinical",
    task="token-classification",
    aggregation_strategy="simple",
    use_fast_tokenizer=True,
)
raw = pipeline("Administered paclitaxel alongside trastuzumab.")
```
- `create_pipeline` accepts any kwargs supported by `transformers.pipeline`.
- Tokenizers and models are cached per model/config combination; repeated calls reuse the same HF objects.
- `ModelLoader.get_max_sequence_length(model_name)` infers tokenizer limits when you need manual truncation logic; a sketch follows this list.
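A hedged sketch of that manual truncation, reusing the `loader` and `pipeline` objects from the Essentials snippet. It assumes `load_model` returns a dict with a `"tokenizer"` key (as shown under Token helpers below); `encode`/`decode` are standard Hugging Face tokenizer methods.

```python
max_len = loader.get_max_sequence_length("disease_detection_superclinical")
tokenizer = loader.load_model("disease_detection_superclinical")["tokenizer"]

def truncate(text: str) -> str:
    # Clip to the model's window before handing text to the pipeline.
    ids = tokenizer.encode(text, truncation=True, max_length=max_len)
    return tokenizer.decode(ids, skip_special_tokens=True)

long_note = " ".join(["Patient stable, continue current regimen."] * 500)
raw = pipeline(truncate(long_note))
```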
Discovery helpers¶
```python
from openmed.core import ModelLoader

loader = ModelLoader()

# Registry-only listing: fast and offline.
print(loader.list_available_models(include_registry=True, include_remote=False))

# Add remote Hugging Face discovery on top of the registry.
print(loader.list_available_models(include_registry=True, include_remote=True)[:5])
```
These helpers power `openmed.list_models()` and the CLI equivalents. Use them to populate dropdowns in UIs or to pre-flight deployments before running inference.
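A hedged pre-flight sketch along those lines, assuming `list_available_models` returns an iterable of model-name strings (the slicing above suggests a list):

```python
# Fail fast at deploy time if a required model is missing from the registry.
required = {"disease_detection_superclinical", "anatomy_detection_electramed"}
available = set(loader.list_available_models(include_registry=True, include_remote=False))

missing = required - available
if missing:
    raise RuntimeError(f"Models missing from the registry: {sorted(missing)}")
```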
Device & caching strategy¶
- CPU-only deployments: leave `device="cpu"` and skip installing GPU runtimes; Transformers will use `torch` CPU wheels.
- GPU deployments: set `device="cuda"` (or a specific index such as `"cuda:1"`), and optionally configure `torch_dtype="auto"` via `OpenMedConfig.pipeline`; see the sketch after this list.
- Air-gapped or repeated builds: point `cache_dir` at a persistent volume; ModelLoader instructs HF to reuse it.
- Provide `hf_token` when you consume gated models, or rely on `HfFolder` credentials.
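Putting these together, a hedged sketch of a GPU deployment config. The exact shape of `OpenMedConfig.pipeline` is an assumption here (a dict of kwargs forwarded to `transformers.pipeline`); verify it against your installed version.

```python
import os

from openmed.core import ModelLoader, OpenMedConfig

config = OpenMedConfig(
    device="cuda:1",
    cache_dir="/mnt/models/openmed-cache",  # persistent volume for repeated/air-gapped builds
    hf_token=os.environ.get("HF_TOKEN"),    # only needed for gated models
    pipeline={"torch_dtype": "auto"},       # assumed shape: kwargs forwarded to transformers.pipeline
)
loader = ModelLoader(config=config)
```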
Token helpers¶
If you need raw token alignment, use the tokenization utilities that ship alongside the loader:
```python
from openmed.core import ModelLoader
from openmed.processing import TokenizationHelper

loader = ModelLoader()
model_data = loader.load_model("anatomy_detection_electramed")

token_helper = TokenizationHelper(model_data["tokenizer"])
encoding = token_helper.tokenize_with_alignment("BP 120/80. Start metformin 500mg bid.")
print(encoding["tokens"][:10])
```
`load_model` lets you access the underlying HF `AutoModel` + `AutoTokenizer` for workflows that outgrow pipelines.
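For instance, a hedged sketch of a raw forward pass with the objects `load_model` exposes. The `"tokenizer"` key appears above; the `"model"` key is an assumption about the return dict.

```python
import torch

model = model_data["model"]          # assumed key; the "tokenizer" key is shown above
tokenizer = model_data["tokenizer"]

inputs = tokenizer("BP 120/80. Start metformin 500mg bid.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# .logits assumes a token-classification head; a bare AutoModel returns hidden states instead.
print(outputs.logits.shape)  # (batch, sequence_length, num_labels)
```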
Sentence detection reuse¶
ModelLoader does not run pySBD itself, but it exposes hooks so `analyze_text` can pass in the tokenizer/pipeline it creates. If you batch many texts with custom segmentation, build the segmenter once and reuse it:
```python
from openmed.processing import sentences

# Build the segmenter once, reuse it across the batch.
segmenter = sentences.create_segmenter(language="en", clean=True)

docs = ["BP 120/80. Start metformin 500mg bid.", "Administered paclitaxel alongside trastuzumab."]
for doc in docs:
    segments = sentences.segment_text(doc, segmenter=segmenter)
```
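To combine the two, a hedged sketch that feeds the reused segmenter's output through a single pipeline, assuming `segment_text` yields plain strings:

```python
entities = []
for doc in docs:
    for sentence in sentences.segment_text(doc, segmenter=segmenter):
        entities.extend(pipeline(sentence))
```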
Troubleshooting checklist¶
- “Tokenizer length mismatch”: call `loader.get_max_sequence_length(model_name)` and set `max_length` explicitly.
- “Model not found”: confirm the name exists in `openmed.core.model_registry.OPENMED_MODELS`, or pass a full HF path and set `include_remote=True` if you rely on discovery.
- Slow cold-starts: prefetch pipelines at startup (see the sketch below) and mount the cache dir on SSD/NVMe storage.
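A hedged sketch of that last point, hydrating every required pipeline once at process startup so no request pays the cold-start cost (model names illustrative):

```python
from openmed.core import ModelLoader

loader = ModelLoader()

# Build all pipelines before serving traffic; later lookups are dict reads.
PIPELINES = {
    name: loader.create_pipeline(name, task="token-classification")
    for name in ("disease_detection_superclinical", "anatomy_detection_electramed")
}
```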