Skip to content

Async De-identification Jobs

POST /jobs accepts a multi-document de-identification batch and returns a job id immediately. The service processes jobs on a bounded local worker pool and persists only job metadata: status, progress, document hashes, label counts, offsets, span hashes, and webhook delivery status.

Raw submitted text is not written to the local job store. It remains in memory only until a worker finishes the job.

Submit a Job

curl -s http://127.0.0.1:8080/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "documents": [
      {"id": "note-1", "text": "Patient Maria Garcia arrived at 09:00."},
      {"id": "note-2", "text": "Call Maria Garcia at 555-1212."}
    ],
    "method": "mask",
    "webhook": {
      "url": "https://pipeline.example.com/openmed/jobs",
      "secret": "replace-with-shared-secret",
      "max_attempts": 3,
      "backoff_seconds": 0.5
    }
  }'

The response uses HTTP 202 and includes id, status, progress_percent, document_count, and status_url.

Poll Status

curl -s http://127.0.0.1:8080/jobs/<job-id>

Status moves from queued to running and then to done or failed. Progress is updated after each document. Terminal records include:

  • processed_count and failed_count
  • label_histogram
  • spans with document_id, offsets, labels, confidence, and text hashes
  • documents with ids, lengths, and document hashes
  • webhook_delivery after callback attempts complete

The job API does not return raw redacted text. Use synchronous POST /pii/deidentify when a caller needs the redacted payload in the response.

Webhook Payloads

Terminal jobs trigger a signed webhook when webhook is provided. The payload contains no raw PHI:

{
  "event": "job.done",
  "job_id": "<job-id>",
  "status": "done",
  "progress_percent": 100.0,
  "document_count": 2,
  "processed_count": 2,
  "failed_count": 0,
  "label_histogram": {"NAME": 2, "PHONE": 1},
  "spans": [
    {
      "document_id": "note-1",
      "start": 8,
      "end": 20,
      "label": "NAME",
      "text_hash": "sha256:...",
      "confidence": 0.99
    }
  ],
  "documents": [
    {"id": "note-1", "length": 42, "text_hash": "sha256:..."}
  ],
  "error": null,
  "completed_at": "2026-07-01T10:00:00Z"
}

Webhook requests include:

  • X-OpenMed-Event: job.done or job.failed
  • X-OpenMed-Timestamp: Unix timestamp
  • X-OpenMed-Signature: sha256=<hmac>

The signature is computed over <timestamp>.<canonical-json-body> using the configured shared secret and HMAC-SHA256. Non-2xx responses and transport errors are retried with exponential backoff.

Local Store Configuration

The default metadata store is ~/.cache/openmed/service-jobs.json. Override it for containers or tests:

OPENMED_SERVICE_JOBS_STORE_PATH=/var/lib/openmed/jobs.json \
OPENMED_SERVICE_JOBS_TTL_SECONDS=86400 \
OPENMED_SERVICE_JOBS_WORKERS=2 \
uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080

OPENMED_SERVICE_JOBS_TTL_SECONDS controls terminal metadata cleanup. OPENMED_SERVICE_JOBS_WORKERS bounds concurrent job execution. The local backend is intended for one service process and does not provide multi-node distribution.