Device Tiers and SLOs

The device tier is the contract a checkpoint promises a device. Fitting the declared tier is a hard gate for release checks, while model quality gates are measured separately.

Parameter ranges map existing OpenMed size labels onto tiers and are illustrative pending manifest generation. The gate reads the manifest's real param_count, not the tier word or the exact M-counts below.

| Tier | Param range (illustrative) | Existing sizes | Reference device | Target RAM (resident) | Target latency (p50 / p95, ~1 page) | Default format |
|---|---|---|---|---|---|---|
| Tiny (default mobile; incl. distilled Nano floor) | ~10–135M (Nano 10–30M distilled; Small-44M, LiteClinical-66M) | Small/Lite | Phone / tablet (Apple Silicon), embedded | ≤ 350 MB (Nano ≤ 150 MB) | ≤ 60 ms / ≤ 150 ms (Nano ≤ 25 / ≤ 60 ms) | INT8 (MLX-8bit / CoreML) |
| Base (laptop default) | ~135–280M (Base-125/135/184M, TinyMed-135M) | Base | Laptop CPU, modest GPU | ≤ 900 MB | ≤ 150 ms / ≤ 400 ms | INT8 (FP fallback) |
| Large (workstation) | ~430–570M (SuperClinical-434M, ~568M) | Large | Workstation GPU | ≤ 4 GB | ≤ 250 ms / ≤ 800 ms | FP16 (INT8 if recall holds) |
| Accurate / XLarge (server) | ~600M+ / MoE (incl. ~1.4B privacy-filter) | XLarge / MoE | Server GPU | ≤ 8 GB (MoE) | ≤ 400 ms / ≤ 1200 ms | FP16 (INT8 if recall holds) |
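
The latency targets are percentiles over per-page timings, so a harness needs some way to turn raw samples into p50 and p95 before comparing them to a tier. The following is a minimal sketch, not the project's harness: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since this page does not say which method the gates use.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    Assumption: nearest-rank is used here only for illustration.
    """
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Smallest rank such that at least pct% of samples are <= the value.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def within_latency_budget(samples_ms, p50_ms_max, p95_ms_max):
    """True when both measured percentiles fit the tier's latency targets."""
    p50 = percentile_nearest_rank(samples_ms, 50)
    p95 = percentile_nearest_rank(samples_ms, 95)
    return p50 <= p50_ms_max and p95 <= p95_ms_max


# Hypothetical per-page latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))

# Tiny tier targets from the table: p50 <= 60 ms, p95 <= 150 ms.
fits_tiny = within_latency_budget(latencies_ms, 60, 150)
```

A budget is treated as inclusive here (a measurement exactly at the target passes), matching the "≤" in the table.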

Machine-Readable Budgets

openmed.eval.tiers.TIERS is the machine-readable SLO source for gate harnesses. It exposes four named rows: Tiny, Base, Large, and Accurate-XLarge. Each row carries:

| Field | Meaning |
|---|---|
| ram_mb_max | Maximum resident RAM budget for the tier. |
| p50_ms_max | Maximum p50 latency budget for approximately one page. |
| p95_ms_max | Maximum p95 latency budget for approximately one page. |
| default_format | Default release artifact format for the tier. |

The Large and Accurate-XLarge RAM values are represented as 4096 MB and 8192 MB in code.
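
To make the shape of these rows concrete, here is a hedged sketch of how a gate harness might consume them. It does not import openmed.eval.tiers; instead it defines a local stand-in whose field names come from the list above and whose values are copied from the tier table. The TierBudget class, the dict layout, the short format strings, and the budget_violations helper are all assumptions for illustration, not the library's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierBudget:
    """Hypothetical row type; the real TIERS rows may be shaped differently."""
    ram_mb_max: int
    p50_ms_max: int
    p95_ms_max: int
    default_format: str


# Local stand-in for openmed.eval.tiers.TIERS, values taken from the tier table.
TIERS = {
    "Tiny": TierBudget(350, 60, 150, "INT8"),
    "Base": TierBudget(900, 150, 400, "INT8"),
    "Large": TierBudget(4096, 250, 800, "FP16"),
    "Accurate-XLarge": TierBudget(8192, 400, 1200, "FP16"),
}


def budget_violations(tier, ram_mb, p50_ms, p95_ms):
    """Return a list of human-readable budget violations (empty means pass)."""
    budget = TIERS[tier]
    measured = {"ram_mb": ram_mb, "p50_ms": p50_ms, "p95_ms": p95_ms}
    violations = []
    for name, value in measured.items():
        limit = getattr(budget, f"{name}_max")
        if value > limit:  # budgets are inclusive upper bounds
            violations.append(f"{name}={value} exceeds {name}_max={limit}")
    return violations


# A hypothetical Base checkpoint that is over on RAM and p95 but fine on p50.
problems = budget_violations("Base", ram_mb=1000, p50_ms=100, p95_ms=450)
```

Returning the list of violations rather than a bare boolean lets a release check report every budget that was exceeded in one pass.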

Notes

  • A tier is not the same thing as the size word on a model card, because the same word covers different sizes across families ("Large" = 434M for token-class NER but ~459–568M elsewhere; "Base" = 184M or 220M depending on family). The manifest carries the real param_count; tier words on cards are descriptive only.
  • Nano is a distillation target folded into the Tiny tier, not a separate scheme.
  • Default selection follows the manifest's tier: Tiny on mobile, Base on laptop, and Large/Accurate only when the caller opts in or recall demands it.
  • The latency/RAM budgets above are the single per-tier SLO source; the §3 engine and §7 gates reference this table, not separate numbers.
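
The default-selection note can be sketched as a small function. This is only an illustration under stated assumptions: select_tier, the device-class strings, and the opt_in parameter are hypothetical names, and "recall demands it" is modeled as the caller passing an explicit opt-in rather than as an automatic check. Raising on an unknown device class is also an assumption; the notes do not say what happens in that case.

```python
# Defaults named in the notes: Tiny on mobile, Base on laptop.
DEFAULT_TIER_BY_DEVICE = {"mobile": "Tiny", "laptop": "Base"}

# Tiers that are never chosen implicitly.
OPT_IN_TIERS = {"Large", "Accurate-XLarge"}


def select_tier(device_class, opt_in=None):
    """Pick a tier: device default unless the caller explicitly opts in."""
    if opt_in is not None:
        if opt_in not in OPT_IN_TIERS:
            raise ValueError(f"{opt_in!r} is not an opt-in tier")
        return opt_in
    if device_class not in DEFAULT_TIER_BY_DEVICE:
        # Assumption: no silent fallback for device classes without a default.
        raise ValueError(f"no default tier for device class {device_class!r}")
    return DEFAULT_TIER_BY_DEVICE[device_class]


mobile_default = select_tier("mobile")
laptop_opted_in = select_tier("laptop", opt_in="Large")
```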