Clinical AI that never leaves the device.
1,000+ state-of-the-art healthcare NLP models, 30+ new Portuguese PII models, and 25+ curated datasets — one local-first runtime story across Python MLX on Apple Silicon and Swift-native OpenMedKit, Apache-2.0 end to end.
One MLX story across Python and Swift.
Prototype with Python MLX on Apple Silicon, then ship the same local-first PII and clinical NLP experience inside native Swift apps with OpenMedKit — same artifacts, same behavior.
Accelerated local workflows on Apple Silicon
Install openmed[mlx] to run local inference, PII extraction, and benchmark workflows with Apple-native MLX acceleration on Mac.
Read the MLX backend guide →
Swift-native PII and clinical NLP for Apple apps
Bring detection, smart entity merging, and local model execution into macOS, iOS, and iPadOS apps without sending PHI off device.
Explore OpenMedKit docs →
Move from notebooks to native apps with less friction
The same MLX artifacts power the Python and Swift paths — prototype in a notebook, then ship the identical model into a native app without a second packaging track.
See the shared artifact story →
Clinical text de-identification, built for HIPAA & GDPR.
Language-aware PII models across nine languages. Process data locally — your PHI never leaves your environment.
Context-aware PHI detection
Presidio-inspired scoring boosts confidence when keywords like SSN:, DOB:, or NPI: appear near detected entities.
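The keyword-proximity boost described above can be sketched as follows. This is an illustrative assumption about how such scoring works, not the toolkit's actual implementation; the function name, window size, boost value, and keyword table are all hypothetical:

```python
# Hypothetical context keywords per entity type (illustrative only)
CONTEXT_KEYWORDS = {
    "SSN": ["ssn", "social security"],
    "DATE_OF_BIRTH": ["dob", "born"],
    "NPI": ["npi"],
}

def boost_score(text: str, start: int, base_score: float,
                entity_type: str, window: int = 30, boost: float = 0.2) -> float:
    """Raise confidence when a context keyword appears shortly before the entity."""
    context = text[max(0, start - window):start].lower()
    for kw in CONTEXT_KEYWORDS.get(entity_type, []):
        if kw in context:
            return min(1.0, base_score + boost)
    return base_score
```

The same pattern generalizes to keywords after the entity by also inspecting a trailing window.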
Checksum & format validation
Built-in validators reduce false positives: French NIR, German Steuer-ID, Italian Codice Fiscale, Spanish DNI/NIE, Dutch BSN, Portuguese CPF/CNPJ, Luhn.
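Of the validators listed, the Luhn checksum is a fully public algorithm and easy to sketch. This standalone version shows the idea; it is illustrative, not the toolkit's shipped implementation:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9
    from any double above 9, and require the total to be divisible by 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:   # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A checksum like this turns "16 digits that look like a card number" into "16 digits that actually pass the issuing scheme's check", which is what cuts the false-positive rate.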
Smart entity merging
Subword tokenizers split 123-45-6789 into fragments — semantic patterns reassemble complete SSNs, phones, and multi-word entities.
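The reassembly idea can be sketched as merging adjacent same-type spans whose gap contains only expected delimiter characters. The function name and delimiter set below are illustrative assumptions, not the toolkit's semantic patterns:

```python
def merge_adjacent(entities, text, joiners="-. "):
    """Merge same-type spans whose gap contains only joiner characters.
    Each entity is a (start, end, label) tuple of offsets into `text`."""
    merged = []
    for ent in sorted(entities):
        if merged:
            start, end, label = merged[-1]
            gap = text[end:ent[0]]
            if label == ent[2] and all(c in joiners for c in gap):
                merged[-1] = (start, ent[1], label)  # extend previous span
                continue
        merged.append(ent)
    return [(s, e, lbl, text[s:e]) for s, e, lbl in merged]
```

Given the three fragments a subword tokenizer produces for 123-45-6789, this yields one complete SSN span instead of three partial ones.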
Zero data movement
Process PHI entirely on your infrastructure. No API calls to external services. Your clinical data never leaves your secure environment.
Flexible redaction methods
Mask with entity-type labels [NAME], redact completely, or replace with synthetic data. Configurable thresholds for precision control.
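The three redaction modes can be sketched as one pass over detected spans, applied right to left so earlier offsets stay valid. This helper and its synthetic-value table are illustrative, not the shipped API:

```python
def redact(text, spans, mode="mask"):
    """Apply a redaction mode to (start, end, label) spans.
    Modes: 'mask' -> [LABEL], 'redact' -> remove, 'synthetic' -> fake value."""
    fake = {"NAME": "Jane Doe", "SSN": "000-00-0000"}  # illustrative synthetic values
    for start, end, label in sorted(spans, reverse=True):
        if mode == "mask":
            repl = f"[{label}]"
        elif mode == "redact":
            repl = ""
        else:  # synthetic replacement
            repl = fake.get(label, f"[{label}]")
        text = text[:start] + repl + text[end:]
    return text
```

Threshold control then becomes a filter step before this pass: drop any span whose confidence falls below the configured cutoff.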
HIPAA Safe Harbor ready
Detects all 18 HIPAA Safe Harbor identifiers — part of a wider 55+ PII entity catalog — across English, Spanish, French, German, Italian, Dutch, Hindi, Telugu, and Portuguese clinical text.
State-of-the-art, by entity type.
Production-ready models for healthcare and clinical AI across 13 biomedical domains, including chemicals, diseases, genes, proteins, species, anatomy, and oncology.
Four lines to production.
Composable Python APIs for notebooks and services. Same call shape across local MLX, CPU, and cloud.
- analyze_text(...): One-call inference with structured outputs
- extract_pii / deidentify: Language-aware across 9 languages
- BatchProcessor(...): Progress callbacks, per-item results
- OpenMedConfig.from_profile(...): dev / prod / test profiles
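The call shape above can be sketched with a self-contained mock. Everything below is a stand-in built only to show structured outputs, per-item results, and progress callbacks; the real openmed signatures may differ:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Entity:
    """Structured output record (illustrative shape)."""
    text: str
    label: str
    score: float

def analyze_text(text: str) -> List[Entity]:
    """Mock one-call inference: flags title-cased tokens as NAME entities."""
    return [Entity(w, "NAME", 0.9) for w in text.split() if w.istitle()]

class BatchProcessor:
    """Mock batch runner with progress callbacks and per-item results."""
    def __init__(self, fn: Callable[[str], List[Entity]]):
        self.fn = fn

    def run(self, items: List[str], on_progress: Callable[[int, int], None]):
        results = []
        for i, item in enumerate(items, 1):
            results.append(self.fn(item))   # per-item result
            on_progress(i, len(items))      # progress callback: (done, total)
        return results
```

Keeping the same call shape across local MLX, CPU, and cloud means a notebook prototype and a service handler share the same four lines; only the backend configuration changes.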
Started as a SOTA NER paper. Grew into the catalog.
The original work — domain-adaptive pre-training with parameter-efficient LoRA on 350k biomedical passages — set state-of-the-art on 10 of 12 NER benchmarks. Since then, the catalog has expanded into multilingual PII, Apple-native MLX variants, and curated datasets across the broader OpenMed collection.
When the cloud is the fastest path to production.
Marketplace-managed model packages with HIPAA-compliant infrastructure, sub-100ms endpoints, and end-to-end encryption.
OpenMed NER Genome Detection
Deploy secure inference endpoints in under five minutes.
OpenMed NER Species Detection
Accelerate biosurveillance and antimicrobial stewardship pipelines.
SageMaker Starter Notebooks
Batch transform, real-time endpoints, and cost governance examples.
Questions from the clinical, ML, and compliance teams.
If we haven't answered yours, reach out; replies come from the same people who train the models.
An open-source medical NLP toolkit. Specialized transformer models fine-tuned for biomedical named-entity recognition — diseases, drugs, genes, anatomy, chemicals, oncology — plus PII extraction and de-identification across nine languages. Ships as a Python package, a Dockerized FastAPI service, and a Swift package (OpenMedKit) for macOS and iOS apps. Apache-2.0, no vendor lock-in, runs on your infrastructure.
Encoder transformers (BERT, ELECTRA, DeBERTa families), not generative chat models. They do extraction and classification — pulling structured entities out of unstructured clinical text — and stay small enough to run on a laptop or a phone. The paper (arXiv:2508.01630) reports new state-of-the-art on 10 of 12 biomedical NER benchmarks. Think of them as complementary to the larger generative "medical LLM" category, not a replacement.
Nowhere you don't send it. You download the models once (Hugging Face or a private mirror) and inference runs wherever you run the Python process, the Docker container, or the Swift app — your laptop, your VPC, an on-prem server, or air-gapped hardware. No telemetry, no license check-in, no outbound calls at runtime. PHI stays on your side of the fence by default.
The PII catalog covers all 18 HIPAA Safe Harbor identifiers across English, Spanish, French, German, Italian, Dutch, Hindi, Telugu, and Portuguese — with locale-aware validators for SSN, NIR, Steuer-ID, Codice Fiscale, DNI, BSN, CPF/CNPJ, and Luhn checks. Models are trained on de-identified, ethically sourced corpora. OpenMed provides the technical controls (on-device processing, configurable thresholds, multiple redaction methods); the legal compliance boundary lives in your deployment.
Yes. Models are published on Hugging Face under permissive licensing with full training recipes. The reference approach combines lightweight domain-adaptive pre-training on a 350k-passage biomedical corpus with parameter-efficient LoRA fine-tuning — updating less than 1.5% of model parameters and completing in under 12 hours on a single GPU (<1.2 kg CO₂e). Tokenizer assets and starter notebooks are in the openmed-starter repo.
No. The Python toolkit, OpenMedKit (Swift), and MLX backend are fully self-contained Apache-2.0 — you can build and ship without ever touching the Agent. OpenMed Agent is a separate medical agent currently in preview — a terminal-native runner for clinical workflows on top of the same stack.