OpenMed v1.7.0

OpenMed 1.7.0 is the multimodal de-identification and evaluation-depth release.

This release turns the v1.6 privacy assurance foundation into a broader local-first clinical data platform: multimodal document intake, OCR adapters, structured clinical extraction, FHIR/HL7/CDA/CSV de-identification, streaming and batch service controls, richer policy profiles, and typed clients for Python, Swift, and TypeScript.

It also deepens the evidence layer around releases: benchmark scorecards, threshold sweeps, leakage heatmaps, membership-inference probes, k-anonymity metrics, utility-loss reports, release-gate previews, evidence bundles, SBOMs, reproducible locks, and stronger supply-chain checks.

Release date: 2026-07-01.

Highlights

  • Added multimodal document primitives, source spans, OCR adapters, image/PDF redaction, Markdown/AsciiDoc offset extraction, metadata scrubbing, JSONL chat-log de-identification, and language-aware OCR configuration.
  • Added de-identification adapters for CDA/C-CDA XML, HL7 v2, CSV/TSV, FHIR $de-identify, FHIR Bulk NDJSON, deterministic FHIR bundles, OperationOutcome, Provenance, AuditEvent, and CodeableConcept validation/provenance helpers.
  • Added clinical extraction and normalization for lab values, vital signs, medication sigs, problem lists, summary cards, microbiology concepts, dermatology and ophthalmology domains, clinical concept labels, assertion context, status vocabulary, and clinical-term protection.
  • Expanded language and locale coverage with Indonesian, Thai, Hebrew RTL, Polish PESEL, Korean RRN, Unicode script segmentation, locale national-ID providers, and deterministic locale PHI generation.
  • Added runtime de-identification features including typed AnalyzeResult, DeidentificationResult.to_dataframe(), redaction preview diffs, cross-document surrogate vaults, patient-keyed date shifting, format-preserving redaction, minimum-necessary strength selection, custom recognizers, streaming incremental de-identification, explain traces, section stamping, and per-document risk budgets.
  • Added new CLI and SDK surfaces: openmed deid, openmed fhir bundle, openmed models recommend, openmed models diff, openmed policy diff, openmed doctor, openmed gates preview, openmed gates bundle, openmed audit, openmed risk, model-card previews, a typed Python REST client, a TypeScript REST client, and Swift policy-profile support.
  • Added REST service operations for model warm pools, dynamic batching, request coalescing, rate and concurrency limits, readiness/liveness split, graceful shutdown, trusted hosts, CORS configuration, Prometheus-style metrics, Docker Compose, and Kafka streaming de-identification.
  • Added evaluation and release evidence tooling for DrugProt, public biomedical NER, i2b2, clinical PHI manifests, dataset cards, fixture coverage, per-section recall, result caching, fairness, robustness, error analysis, leakage heatmaps, membership inference, linkage attacks, k-anonymity/l-diversity/t-closeness, audit diffs, model scorecards, threshold sweeps, flaky-run detection, paired significance, calibration reliability, utility loss, policy compliance, benchmark history diffs, nano-tier certification, and risk dashboards.
  • Added model and backend work for Laneformer MLX-LM, MLX INT4 certification, Core ML INT8 export, AWQ/GPTQ recipes, bitsandbytes 4-bit loading, FlashAttention/SDPA/eager selection, PyTorch MPS tuning, ONNX/WebGPU and Transformers.js exports, tokenizer caching, Mode-A distillation, DAPT corpus assembly, DirectID, hard-negative sampling, and span-relation graph decoding.
  • Added security and release engineering coverage: root responsible disclosure policy, breach runbook and report template, CycloneDX SBOM generation, reproducible-lock workflow, lockfile drift gate, GitHub Actions ref validation, manifest schema validation, dependency updates, doctest-backed examples, and PHI-safe diagnostics.

The Big Picture

OpenMed v1.7.0 broadens what can be safely de-identified and adds evidence for more of the surrounding workflow.

The v1.6 release established policy-aware spans, audit reports, risk scoring, and release gates. v1.7 applies that foundation to more input types and operational paths: scanned documents, PDFs, Markdown and AsciiDoc, structured tables, HL7 v2, CDA/C-CDA XML, FHIR operations, bulk NDJSON, chat logs, Kafka streams, and REST clients.

The evaluation story also moves from single aggregate scores toward release-grade evidence: per-language leakage heatmaps, per-section recall, dataset cards, fixture coverage, model scorecards, threshold sweeps, fairness and robustness reports, utility-loss metrics, and evidence bundles that make gate decisions easier to inspect.

Multimodal And Document Redaction

The new multimodal layer gives OpenMed a shared contract for document ingestion and source-preserving redaction.

This release adds:

  • openmed.multimodal primitives for document content, source spans, handlers, and lazy registration
  • image redaction and OCR result contracts
  • Tesseract, PaddleOCR, EasyOCR, and docTR engine adapters
  • OCR language configuration for non-English scans
  • PDF coordinate projection for detected spans
  • Markdown and AsciiDoc text extraction with character-offset maps
  • metadata scrubbing and verification for images, PDFs, and DOCX files
  • JSONL chat-log de-identification with role and turn structure preservation

The result is a cleaner path from scanned or mixed-format clinical material to audited text, coordinates, redactions, and PHI-safe summaries.
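
To illustrate what a character-offset map buys during redaction, here is a minimal sketch. It is not OpenMed's extractor: `extract_with_offsets`, `project_span`, and the toy markup set are hypothetical names chosen for this example, and real Markdown or AsciiDoc parsing is considerably more involved.

```python
from typing import List, Tuple

# Toy assumption: these characters are always markup. A real extractor would
# parse the document instead of stripping characters blindly.
MARKUP_CHARS = set("*`#")


def extract_with_offsets(source: str) -> Tuple[str, List[int]]:
    """Return plain text plus, for each plain-text character, its source index."""
    chars: List[str] = []
    offsets: List[int] = []
    for index, char in enumerate(source):
        if char in MARKUP_CHARS:
            continue
        chars.append(char)
        offsets.append(index)
    return "".join(chars), offsets


def project_span(offsets: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a half-open [start, end) span in plain text back to the source."""
    return offsets[start], offsets[end - 1] + 1


source = "Patient **John Doe** seen today."
text, offsets = extract_with_offsets(source)

# Pretend a detector found this span in the extracted plain text.
start = text.index("John Doe")
src_start, src_end = project_span(offsets, start, start + len("John Doe"))

# Redact in the original document, leaving the surrounding markup intact.
redacted = source[:src_start] + "[NAME]" + source[src_end:]
```

Detection runs on the plain text, while the redaction is applied to the original source through the map, so formatting around the span survives.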

Interoperability And Structured Health Data

v1.7.0 adds a large set of healthcare interchange helpers.

New or expanded interop paths include:

  • CDA/C-CDA XML de-identification
  • HL7 v2 message parsing and field-level redaction
  • CSV/TSV PHI column classification and redaction manifests
  • FHIR $de-identify operation wrappers
  • FHIR Bulk NDJSON de-identification
  • deterministic FHIR Bundle assembly and stable urn:uuid references
  • FHIR OperationOutcome generation
  • FHIR Provenance and AuditEvent emission from signed audit reports
  • CodeableConcept builders, text fallback checks, and code-system version pinning
  • PHILTER, pyDeid, GLiNER-BioMed, LangChain, and spaCy adapter surfaces

These additions make OpenMed more useful in pipelines that already speak health-data formats instead of plain text alone.
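
One common way to get stable `urn:uuid` references is to derive them from resource identity with name-based UUIDs. The sketch below assumes that approach; the release notes do not say how OpenMed does it, and `stable_urn`, `build_bundle`, and the namespace URL are hypothetical.

```python
import uuid
from typing import Any, Dict, List

# Hypothetical namespace for this sketch; any fixed UUID works.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/bundle-sketch")


def stable_urn(resource_type: str, resource_id: str) -> str:
    """Derive the same urn:uuid for the same (type, id) pair on every run."""
    name = f"{resource_type}/{resource_id}"
    return f"urn:uuid:{uuid.uuid5(SKETCH_NAMESPACE, name)}"


def build_bundle(resources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a collection Bundle with sorted entries and stable fullUrls."""
    entries = []
    seen = set()
    # Sorting makes the entry order independent of the input order.
    for resource in sorted(resources, key=lambda r: (r["resourceType"], r["id"])):
        full_url = stable_urn(resource["resourceType"], resource["id"])
        if full_url in seen:
            raise ValueError(f"duplicate resource: {resource['resourceType']}/{resource['id']}")
        seen.add(full_url)
        entries.append({"fullUrl": full_url, "resource": resource})
    return {"resourceType": "Bundle", "type": "collection", "entry": entries}


patient = {"resourceType": "Patient", "id": "p1"}
observation = {"resourceType": "Observation", "id": "o1"}
bundle_a = build_bundle([patient, observation])
bundle_b = build_bundle([observation, patient])
```

Because both the ordering and the identifiers depend only on the input resources, two runs over the same data produce byte-comparable bundles.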

Clinical Extraction And Context

Clinical extraction is now broader and more context-aware.

This release adds or expands:

  • dermatology and ophthalmology zero-shot domains
  • microbiology labels and routing metadata
  • lab reference range parsing and flag handling
  • vital signs extraction
  • medication sig normalization
  • problem-list deduplication and status reconciliation
  • clinical summary cards without PHI
  • clinical concept canonical labels
  • clinical assertion context records
  • negation, section-aware ConText assertions, and sentence-bounded cue scope
  • clinical term protection to reduce false PII hits on protected clinical vocabulary
  • status vocabulary normalization for substance, employment, and living-status language

Together these changes reduce over-redaction risk and improve downstream structured clinical outputs.
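
As a rough illustration of sentence-bounded cue scope, the sketch below lets a negation cue affect only concepts that follow it in the same sentence. The cue list, the sentence splitter, and `is_negated` are simplifying assumptions for this example, not the ConText implementation shipped in the release.

```python
import re

# Tiny illustrative cue list; real ConText rule sets are much larger.
NEGATION_CUES = {"no", "denies", "without"}


def is_negated(text: str, concept: str) -> bool:
    """Return True if a cue precedes the concept within the same sentence."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        position = lowered.find(concept.lower())
        if position == -1:
            continue
        # Only tokens before the concept, inside this sentence, can negate it.
        preceding_tokens = re.findall(r"[a-z]+", lowered[:position])
        if any(token in NEGATION_CUES for token in preceding_tokens):
            return True
    return False


note = "Patient denies chest pain. Reports fever since Monday."
chest_pain_negated = is_negated(note, "chest pain")
fever_negated = is_negated(note, "fever")
```

Bounding the scope at the sentence keeps "denies" in the first sentence from leaking into the second.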

Language, Locale, And Identifier Coverage

v1.7.0 expands multilingual PII coverage and locale correctness.

New coverage includes:

  • Indonesian PII with NIK validation
  • Thai PII patterns and fixtures
  • Hebrew RTL PII handling
  • Polish PESEL validation
  • South Korean RRN validation
  • Unicode script detection and mixed-script segmentation
  • locale national-ID provider registry
  • per-language surrogate coherence regression coverage
  • locale-aware date and number normalization
  • deterministic locale PHI generation for training and evaluation
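
For a sense of what identifier validation involves, here is a minimal sketch of the publicly documented PESEL check-digit rule. It covers the checksum only; `pesel_checksum_ok` is a hypothetical helper, and OpenMed's validator may also check the encoded birth date or other fields.

```python
# Weights applied to the first ten digits of an 11-digit PESEL.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(candidate: str) -> bool:
    """Return True if the string is 11 digits with a valid check digit."""
    if len(candidate) != 11 or not candidate.isdigit():
        return False
    digits = [int(char) for char in candidate]
    weighted_sum = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits[:10]))
    expected = (10 - weighted_sum % 10) % 10
    return digits[10] == expected


valid = pesel_checksum_ok("44051401359")    # checksum-valid test value
invalid = pesel_checksum_ok("44051401358")  # last digit altered
```

Checksum validation like this is one way a recognizer can avoid flagging arbitrary 11-digit numbers as national IDs.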

Runtime, Service, And Clients

The runtime and service layers are more production-oriented.

Notable additions:

  • model warm pools and resident model limits
  • dynamic REST batching
  • request coalescing for identical in-flight work
  • rate-limit and concurrency-limit middleware
  • split /livez and /readyz probes with graceful shutdown behavior
  • trusted-host and CORS configuration
  • optional Prometheus-style metrics without PHI-derived labels
  • Docker Compose with HF cache volume
  • Kafka streaming de-identification connector
  • typed Python REST client
  • TypeScript REST client
  • Swift OpenMedKit de-identification JSON export and bundled policy profiles
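
Request coalescing can be sketched with `asyncio` as follows. This is an illustration of the general technique under assumed names (`Coalescer`, `request_key`), not the service's actual middleware. The key is a hash so that raw text never becomes a dictionary key or log label.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict


def request_key(text: str, policy: str) -> str:
    """Hash the request so identical work shares a key without storing text."""
    return hashlib.sha256(f"{policy}\x00{text}".encode("utf-8")).hexdigest()


class Coalescer:
    """Share one in-flight task among callers that submit identical work."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[str]"] = {}

    async def run(self, key: str, work: Callable[[], Awaitable[str]]) -> str:
        existing = self._inflight.get(key)
        if existing is not None:
            return await existing  # join the work that is already running
        task = asyncio.ensure_future(work())
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)  # later requests start fresh


async def demo() -> int:
    calls = 0

    async def slow_deid() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for model inference
        return "[NAME] seen today."

    coalescer = Coalescer()
    key = request_key("John Doe seen today.", "hipaa")
    results = await asyncio.gather(*(coalescer.run(key, slow_deid) for _ in range(5)))
    assert len(set(results)) == 1
    return calls


inference_calls = asyncio.run(demo())
```

Five concurrent identical requests trigger a single inference call here; once it completes, the entry is dropped so nothing is cached beyond the in-flight window.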

Evaluation, Risk, And Release Evidence

The evaluation system now produces more granular evidence for privacy and model quality.

New capabilities include:

  • DrugProt public relation-extraction evaluation
  • biomedical NER benchmark suite
  • i2b2 eval-only loader
  • clinical PHI dataset manifest
  • dataset cards without row text
  • fixture coverage reports
  • per-section recall reports
  • eval result caching
  • model fleet freshness metrics
  • fairness, robustness, and error-analysis reports
  • leakage heatmaps by label and language
  • membership-inference and linkage attack modes
  • k-anonymity, l-diversity, and t-closeness reports
  • audit-report diffs
  • model scorecards
  • threshold sweeps and paired significance testing
  • flaky-run variance detection
  • calibration reliability data
  • over-redaction and utility-loss reports
  • policy-profile compliance evaluation
  • cross-release benchmark history diffs
  • risk dashboard rendering
  • release-gate dry-run previews and evidence bundles
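
The k-anonymity and l-diversity metrics in this list follow their textbook definitions, which can be sketched in a few lines. The function names and the toy rows below are assumptions for illustration; OpenMed's report format is not shown here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def k_anonymity(rows: List[Dict[str, str]], quasi_ids: Sequence[str]) -> int:
    """Smallest group size when rows are grouped by quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values()) if groups else 0


def l_diversity(rows: List[Dict[str, str]], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values within any group."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return min(len(v) for v in values.values()) if values else 0


rows = [
    {"age_band": "30-39", "zip3": "021", "dx": "flu"},
    {"age_band": "30-39", "zip3": "021", "dx": "asthma"},
    {"age_band": "40-49", "zip3": "021", "dx": "flu"},
]
k = k_anonymity(rows, ["age_band", "zip3"])
l = l_diversity(rows, ["age_band", "zip3"], "dx")
```

In this toy table the single 40-49 row forms a group of one, so both k and l are 1, which is the kind of outlier a release gate would want surfaced.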

Models, Backends, And Training

v1.7.0 adds model export and training infrastructure across Apple, browser, PyTorch, and quantized paths.

New work includes:

  • Laneformer MLX language model runtime
  • MLX INT4 recall certification
  • Core ML INT8 palettized export
  • AWQ and GPTQ 4-bit recipes
  • bitsandbytes 4-bit load-time quantization
  • FlashAttention, SDPA, and eager attention backend selection
  • PyTorch Metal/MPS device selection and tuning
  • ONNX/WebGPU export artifacts
  • Transformers.js browser export target
  • tokenizer caching
  • Mode-A knowledge distillation
  • DAPT corpus assembler
  • DirectID tiny-head contract
  • hard-negative training sampler
  • span-relation graph decoder
  • ONNX and quantized artifact publishing metadata

Security, Supply Chain, And Docs

The release also hardens project operations.

Additions include:

  • root SECURITY.md and private vulnerability reporting guidance
  • contributor onboarding and community health files
  • breach notification runbook and breach report template
  • CycloneDX SBOM generation and release attachment
  • reproducible-lock workflow
  • lockfile drift gate
  • GitHub Actions ref validation
  • manifest schema validation command and CI gate
  • release docs for trust status, clinical context, SBOM, reproducible dependencies, OpenAPI, REST clients, quantization, and policy workflows
  • doctest-backed examples for public APIs
  • explicit UTF-8 file/subprocess handling, subprocess timeouts, and lazy logging formatting fixes

Fixes

  • Audit span records now sort deterministically across runs.
  • Date shifting avoids zero-day offsets, preserves shifted intervals by default, and handles slash/dash two-digit-year formats more consistently.
  • Unicode offset redaction in the pipeline was fixed.
  • Duplicate FHIR resource IDs are rejected.
  • JSON loading paths in core, eval, NER, and risk modules raise clearer errors or fail closed on corrupt JSON.
  • Optional-extra failures now produce clearer skipped-capability metadata.
  • Safety sweeps reduce over-redaction from bare numeric identifier patterns.
  • GitHub Actions ref validation and setup-uv pinning were tightened.
  • Manifest validation now runs through uv.
  • The CLI fallback message for the basic TUI entry point was corrected.
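
The date-shifting fix above concerns two properties: the offset is never zero, and intervals between a patient's dates survive the shift. A minimal sketch of one way to get both, assuming an HMAC-derived per-patient offset (the function names, secret handling, and 365-day bound are hypothetical, not OpenMed's implementation):

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset_days(patient_key: str, secret: bytes, max_days: int = 365) -> int:
    """Derive a deterministic, non-zero day offset for one patient."""
    digest = hmac.new(secret, patient_key.encode("utf-8"), hashlib.sha256).digest()
    magnitude = 1 + int.from_bytes(digest[:8], "big") % max_days  # 1..max_days
    sign = -1 if digest[8] & 1 else 1
    return sign * magnitude


def shift_date(value: date, patient_key: str, secret: bytes) -> date:
    """Shift a date by the patient's offset; every date moves by the same amount."""
    return value + timedelta(days=patient_offset_days(patient_key, secret))


secret = b"example-secret-for-sketch-only"
admitted = date(2024, 3, 1)
discharged = date(2024, 3, 8)

shifted_admitted = shift_date(admitted, "patient-123", secret)
shifted_discharged = shift_date(discharged, "patient-123", secret)
length_of_stay = shifted_discharged - shifted_admitted
```

Because the offset depends only on the patient key and the secret, the same patient's dates move together across documents, and the length of stay is unchanged.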

Migration Notes

  • analyze_text(..., output_format="dict") now returns a frozen AnalyzeResult; to_dict() and mapping access preserve the legacy payload shape.
  • Clinical term protection is enabled by default in PII extraction and the staged pipeline, which can suppress ambiguous PERSON, LOCATION, or ORG matches that exactly match protected clinical vocabulary.
  • FHIR OperationOutcome output emits R4 issue.expression; legacy issue.location is accepted on input but is not emitted.
  • ServiceRuntime.get_loader() now returns the warm-pool proxy. Use get_model_loader() when raw loader access is required.
  • Custom OCR engines should accept the keyword-only languages parameter.
  • OCR auto-selection can choose optional EasyOCR or docTR engines when they are installed.
  • REST deployments using custom Host headers must configure OPENMED_SERVICE_TRUSTED_HOSTS; wildcard CORS and trusted-host settings are rejected.
  • Unsupported Core ML architectures now fail before model loading/tracing, and --quantized-output requires --quantize int8.
  • The canonical label set expanded with clinical concept labels.
  • format_preserve expands the de-identification action enum and schema surface.
  • The legacy shift_dates boolean remains accepted, but new code should prefer method="shift_dates" with explicit date-shift options.
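
The first migration note describes a frozen result object that still behaves like the legacy dictionary. The general pattern can be sketched with a frozen dataclass that implements the `Mapping` protocol. `ResultSketch` and its fields are hypothetical and are not the real `AnalyzeResult` definition.

```python
from collections.abc import Mapping
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Iterator, Tuple


@dataclass(frozen=True)
class ResultSketch(Mapping):
    """Immutable result that also supports read-only dict-style access."""

    text: str
    entities: Tuple[Dict[str, Any], ...] = ()

    def to_dict(self) -> Dict[str, Any]:
        # Rebuild the legacy payload shape on demand.
        return {"text": self.text, "entities": [dict(e) for e in self.entities]}

    def __getitem__(self, key: str) -> Any:
        return self.to_dict()[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.to_dict())

    def __len__(self) -> int:
        return len(self.to_dict())


result = ResultSketch(text="John Doe", entities=({"label": "NAME", "start": 0, "end": 8},))

legacy_style = result["entities"][0]["label"]  # mapping access still works
as_dict = dict(result)                         # so does conversion to a dict

try:
    result.text = "changed"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Code that only reads keys keeps working under this pattern, while code that assigned into the returned dictionary would need to call `to_dict()` first and mutate the copy.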

PR Review Summary

Reviewed range: v1.6.0..origin/release/openmed-170

  • Last release: v1.6.0 at 1863350b
  • Reviewed head: origin/release/openmed-170 at 36ae8666
  • First-parent commits reviewed: 189
  • Merged PR commits reviewed: 148
  • Additional squash/direct PRs covered by these notes: 36
  • Total PRs covered by these notes: 184
  • Direct maintenance commits without a new PR number reviewed: e7f080b4 (revert grouped Actions update), 38e86172 (pin setup-uv action), cd3f2b9e (run manifest validation through uv), d1061d7b (README update), and 79d56791 (avoid MkDocs autorefs in doctest outputs)
  • Aggregate diff: 483 files changed, 87,085 insertions, 1,222 deletions

Major PR groups:

  • Multimodal, OCR, and document redaction: #555, #567, #717, #749, #558, #726, #745, #755, #758
  • Interop and structured health data: #566, #642, #631, #629, #626, #625, #553, #557, #737, #777, #784, #689, #690, #704, #372
  • Clinical extraction and context: #552, #410, #560, #568, #565, #718, #683, #773, #684, #691, #738, #739, #782, #785, #774, #698
  • Language, locale, and identifiers: #562, #747, #746, #748, #709, #609, #610, #614, #766
  • Evaluation, risk, and release gates: #617, #615, #743, #701, #688, #703, #702, #725, #680, #724, #723, #740, #735, #681, #682, #753, #754, #762, #765, #734, #764, #744, #786, #337, #351, #633, #634, #635, #636, #637, #638, #732, #379
  • Service, CLI, and SDKs: #632, #630, #750, #742, #722, #788, #789, #756, #741, #721, #780, #771, #772, #730, #775, #787, #696, #728, #776, #564
  • Models, backends, and training: #644, #620, #619, #627, #759, #760, #761, #719, #736, #790, #383, #751, #622, #612, #349, #603, #1020
  • Security, docs, CI, and release engineering: #648, #647, #716, #720, #1083, #693, #700, #607, #710, #711, #712, #713, #714, #715, #1021, #409, #697, #763, #185, #640, #646, #604, #605, #407

Open Items Before Tagging

  • Run the full project lint, test, build, and release-gate suite if you want coverage beyond the focused release-prep gates listed below.
  • Confirm the v1.7.0 release-gate candidate bundle has required metadata, calibration evidence, span fixtures, quantization evidence, SBOM, and reproducible-lock evidence.
  • If the GitHub release body is generated automatically, confirm it is based on v1.6.0...v1.7.0 and keep the Fixes #399 issue reference out of the PR count.
  • Decide whether CHANGELOG.md should be expanded to include the 36 squash/direct PRs as explicit PR-number references, or whether the detailed release-note inventory is the source of truth for that level of traceability.

Verification For This Draft

  • The v1.6.0 GitHub release body and local RELEASE_NOTES_v1.6.0.md were used as the style reference.
  • The reviewed range is v1.6.0..origin/release/openmed-170; current reviewed head is 36ae8666.
  • The first-parent range contains 189 commits: 148 merge commits plus 41 direct first-parent commits.
  • GitHub-generated release notes for v1.7.0 resolve to 184 PR links. The extra #399 in a commit title is an issue reference and is not counted as a PR.
  • Remaining 1.6 search hits after the version bump are historical changelog entries, compare ranges, CycloneDX/spec/style/dependency values, or changelog-generator examples.
  • python3 scripts/release/check_release_version.py --version 1.7.0 passed.
  • python3 scripts/release/check_repo_policy.py passed.
  • Focused tests passed (70 passed, 2 warnings), covering the release changelog, FHIR provenance, audit/risk CLI, OpenAPI spec, and service API tests.
  • make docs-build passed under MkDocs strict mode.
  • git diff --check passed.

Full Changelog: https://github.com/maziyarpanahi/openmed/compare/v1.6.0...v1.7.0