OpenMed 1.1.0 · Portuguese PII · Python MLX · OpenMedKit

Clinical AI that never leaves the device.

1,000+ state-of-the-art healthcare NLP models, 30+ new Portuguese PII models, and 25+ curated datasets — one local-first runtime story across Python MLX on Apple Silicon and Swift-native OpenMedKit, Apache-2.0 end to end.

Apple MLX · 1,000+ models · On-device · Apache-2.0
1,000+
HF Models
25+
Curated Datasets
55+
PII Entities
9
Languages
<1.2 kg
CO₂e Training
The runtime story

One MLX story across Python and Swift.

Prototype with Python MLX on Apple Silicon, then ship the same local-first PII and clinical NLP experience inside native Swift apps with OpenMedKit — same artifacts, same behavior.

Python MLX

Accelerated local workflows on Apple Silicon

Install openmed[mlx] to run local inference, PII extraction, and benchmark workflows with Apple-native MLX acceleration on Mac.

Read the MLX backend guide →
OpenMedKit

Swift-native PII and clinical NLP for Apple apps

Bring detection, smart entity merging, and local model execution into macOS, iOS, and iPadOS apps without sending PHI off device.

Explore OpenMedKit docs →
Shared MLX artifacts

Move from notebooks to native apps with less friction

The same MLX artifacts power the Python and Swift paths — prototype in a notebook, then ship the identical model into a native app without a second packaging track.

See the shared artifact story →
Healthcare data privacy

Clinical text de-identification, built for HIPAA & GDPR.

Language-aware PII models across nine languages. Process data locally — your PHI never leaves your environment.

Context-aware PHI detection

Presidio-inspired scoring boosts confidence when keywords like SSN:, DOB:, or NPI: appear near detected entities.

Keyword boosting 100-char window
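A minimal sketch of how this kind of context boosting can work. The keyword table, weights, and `boost_score` helper below are illustrative assumptions, not the OpenMed implementation: a trigger keyword found inside the 100-character window before a detected span raises that span's confidence.

```python
import re

# Hypothetical sketch of Presidio-style context boosting. Keyword map,
# window size, and boost weight are illustrative, not OpenMed's values.
CONTEXT_KEYWORDS = {"ssn": "US_SSN", "dob": "DATE_OF_BIRTH", "npi": "NPI"}
WINDOW = 100   # characters of left context to inspect
BOOST = 0.35   # added to the model's confidence on a keyword hit

def boost_score(text: str, start: int, base_score: float, entity_type: str) -> float:
    """Return base_score, boosted if a matching keyword precedes the span."""
    window = text[max(0, start - WINDOW):start].lower()
    for keyword, etype in CONTEXT_KEYWORDS.items():
        if etype == entity_type and re.search(rf"\b{keyword}\b", window):
            return min(1.0, base_score + BOOST)
    return base_score

text = "SSN: 123-45-6789"
print(round(boost_score(text, text.index("123"), 0.55, "US_SSN"), 2))  # 0.9
```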

Checksum & format validation

Built-in validators reduce false positives: French NIR, German Steuer-ID, Italian Codice Fiscale, Spanish DNI/NIE, Dutch BSN, Brazilian CPF/CNPJ, Luhn.

NIR Steuer-ID Codice Fiscale CPF / CNPJ
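To make the idea concrete, here is the Luhn check in full — a stand-in for the whole validator family, since the locale validators (NIR, Steuer-ID, and the rest) follow the same pattern of cheap arithmetic that vetoes a regex match before it becomes a false positive:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from any result above 9, and require sum % 10 == 0."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

print(luhn_valid("4539 1488 0343 6467"))  # well-known test number -> True
print(luhn_valid("4539 1488 0343 6468"))  # last digit changed -> False
```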

Smart entity merging

Subword tokenizers split 123-45-6789 into fragments — semantic patterns reassemble complete SSNs, phones, and multi-word entities.

BIO tags Regex patterns
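The reassembly step can be sketched like this — an illustrative merger, not the OpenMed one: contiguous B-/I- tagged fragments are joined into one span, then a format pattern confirms the merged value is a complete SSN.

```python
import re

# Illustrative sketch: merge contiguous B-/I- tagged subword fragments,
# then validate the reassembled span against a format regex.
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def merge_bio(tokens, tags):
    """Join B-/I- tagged token runs into whole entity strings."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append("".join(current))
            current = [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans

tokens = ["123", "-", "45", "-", "6789"]
tags = ["B-SSN", "I-SSN", "I-SSN", "I-SSN", "I-SSN"]
merged = merge_bio(tokens, tags)
print([s for s in merged if SSN_RE.match(s)])  # ['123-45-6789']
```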

Zero data movement

Process PHI entirely on your infrastructure. No API calls to external services. Your clinical data never leaves your secure environment.

Air-gapped On-premises

Flexible redaction methods

Mask with entity-type labels [NAME], redact completely, or replace with synthetic data. Configurable thresholds for precision control.

Mask Redact Replace
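A minimal sketch of the three modes, assuming a toy entity dict — the shipped OpenMed API and entity objects will differ:

```python
# Hypothetical redaction sketch: mask with a type label, delete outright,
# or substitute synthetic data. Entity shape is illustrative.
def apply_redaction(text, entities, mode="mask", synthetic=None):
    out = text
    # Walk right-to-left so earlier offsets stay valid after each edit.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if mode == "mask":
            repl = f"[{ent['type']}]"
        elif mode == "redact":
            repl = ""
        else:  # "replace" with caller-supplied synthetic values
            repl = synthetic[ent["type"]]
        out = out[:ent["start"]] + repl + out[ent["end"]:]
    return out

text = "Patient is Jane Doe."
entities = [{"start": 11, "end": 19, "type": "NAME"}]
print(apply_redaction(text, entities, "mask"))  # Patient is [NAME].
print(apply_redaction(text, entities, "replace", {"NAME": "Alex Smith"}))
```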

HIPAA Safe Harbor ready

Detects all 18 HIPAA Safe Harbor identifiers — part of a wider 55+ PII entity catalog — across English, Spanish, French, German, Italian, Dutch, Hindi, Telugu, and Portuguese clinical text.

18 Safe Harbor 55+ entity types 9 languages
Quickstart

Four lines to production.

Composable Python APIs for notebooks and services. Same call shape across local MLX, CPU, and cloud.

  • analyze_text(...)
    One-call inference with structured outputs
  • extract_pii / deidentify
    Language-aware across 9 languages
  • BatchProcessor(...)
    Progress callbacks, per-item results
  • OpenMedConfig.from_profile(...)
    dev / prod / test profiles
Read full API guide
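The calls above, sketched as a script. The entry-point names come from this page, but treat the exact signatures as assumptions — check the API guide against the shipped openmed 1.1.0 package before copying.

```python
# Hypothetical quickstart; names are from the docs above, signatures are
# assumptions, not the verified openmed 1.1.0 API.
from openmed import OpenMedConfig, analyze_text, extract_pii

config = OpenMedConfig.from_profile("dev")             # dev / prod / test
result = analyze_text("Metformin 500 mg for T2DM.")    # structured entities
pii = extract_pii("Patient Jane Doe, SSN: 123-45-6789", language="en")
```

The same shape is meant to hold whether the backend is local MLX, CPU, or cloud; `BatchProcessor` wraps the same calls with progress callbacks and per-item results.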
openmed 1.1.0 · quickstart
Paper · arXiv:2508.01630 · since 2025

Started as a SOTA NER paper. Grew into the catalog.

The original work — domain-adaptive pre-training with parameter-efficient LoRA on 350k biomedical passages — set state-of-the-art on 10 of 12 NER benchmarks. Since then, the catalog has expanded into multilingual PII, Apple-native MLX variants, and curated datasets across the broader OpenMed collection.

25+
Curated Datasets
MIMIC-III, PubMed, BC5CDR, BC2GM, JNLPBA…
210
PII Models
English, Spanish, French, German, Italian, Dutch, Hindi, Telugu, Portuguese
55+
PII Entity Types
Locale-aware: SSN, NIR, Steuer-ID, CF, DNI, BSN, CPF/CNPJ…
200+
MLX Variants
Apple Silicon ready, BERT-family supported
Healthcare AI FAQ

Questions from the clinical, ML, and compliance teams.

If we haven't answered yours, reach out — replies come from the people who train the models.

What is OpenMed, exactly?

An open-source medical NLP toolkit. Specialized transformer models fine-tuned for biomedical named-entity recognition — diseases, drugs, genes, anatomy, chemicals, oncology — plus PII extraction and de-identification across nine languages. Ships as a Python package, a Dockerized FastAPI service, and a Swift package (OpenMedKit) for macOS and iOS apps. Apache-2.0, no vendor lock-in, runs on your infrastructure.

Are these generative LLMs or something else?

Encoder transformers (BERT, ELECTRA, DeBERTa families), not generative chat models. They do extraction and classification — pulling structured entities out of unstructured clinical text — and stay small enough to run on a laptop or a phone. The paper (arXiv:2508.01630) reports new state-of-the-art on 10 of 12 biomedical NER benchmarks. Think of them as complementary to the larger generative "medical LLM" category, not a replacement.

Where does my data go when I use OpenMed?

Nowhere you don't send it. You download the models once (Hugging Face or a private mirror) and inference runs wherever you run the Python process, the Docker container, or the Swift app — your laptop, your VPC, an on-prem server, or air-gapped hardware. No telemetry, no license check-in, no outbound calls at runtime. PHI stays on your side of the fence by default.

Does OpenMed support HIPAA-aligned workflows?

The PII catalog covers all 18 HIPAA Safe Harbor identifiers across English, Spanish, French, German, Italian, Dutch, Hindi, Telugu, and Portuguese — with locale-aware validators for SSN, NIR, Steuer-ID, Codice Fiscale, DNI, BSN, CPF/CNPJ, and Luhn checks. Models are trained on de-identified, ethically sourced corpora. OpenMed provides the technical controls (on-device processing, configurable thresholds, multiple redaction methods); the legal compliance boundary lives in your deployment.

Can I fine-tune OpenMed models for my own vocabulary?

Yes. Models are published on Hugging Face under permissive licensing with full training recipes. The reference approach combines lightweight domain-adaptive pre-training on a 350k-passage biomedical corpus with parameter-efficient LoRA fine-tuning — updating less than 1.5% of model parameters and completing in under 12 hours on a single GPU (<1.2 kg CO₂e). Tokenizer assets and starter notebooks are in the openmed-starter repo.
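The "less than 1.5%" figure is easy to sanity-check from LoRA's low-rank update (W + BA). A back-of-envelope count for a BERT-base-like model, with illustrative numbers that are not OpenMed's exact recipe:

```python
# Back-of-envelope LoRA parameter count, assuming a BERT-base-like model
# (~110M params) with rank-16 adapters on four attention projections per
# layer. Numbers are illustrative, not OpenMed's published configuration.
d_model, rank, layers = 768, 16, 12
adapted_mats_per_layer = 4                 # q, k, v, output projections
lora_params_per_mat = 2 * d_model * rank   # A (r x d) plus B (d x r)
lora_total = layers * adapted_mats_per_layer * lora_params_per_mat
base_total = 110_000_000

print(lora_total)                          # 1179648 trainable parameters
print(f"{lora_total / base_total:.2%}")    # 1.07%, under the 1.5% ceiling
```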

Do I need OpenMed Agent to use OpenMed?

No. The Python toolkit, OpenMedKit (Swift), and MLX backend are fully self-contained and Apache-2.0 — you can build and ship without ever touching the Agent. OpenMed Agent, currently in preview, is a separate terminal-native runner for clinical workflows built on top of the same stack.