Device Tiers and SLOs¶
The device tier is the contract a checkpoint promises a device. Fitting the declared tier is a hard gate for release checks, while model quality gates are measured separately.
Param ranges map existing OpenMed size labels and are illustrative pending manifest generation. The gate reads the manifest's real param_count, not the tier word or the exact M-counts below.
| Tier | Param range (illustrative) | Existing sizes | Reference device | Target RAM (resident) | Target latency (p50 / p95, ~1 page) | Default format |
|---|---|---|---|---|---|---|
| Tiny (default mobile; incl. distilled Nano floor) | ~10–135M (Nano 10–30M distilled; Small-44M, LiteClinical-66M) | Small/Lite | Phone / tablet (Apple Silicon), embedded | ≤ 350 MB (Nano ≤ 150 MB) | ≤ 60 ms / ≤ 150 ms (Nano ≤ 25 / ≤ 60 ms) | INT8 (MLX-8bit / CoreML) |
| Base (laptop default) | ~135–280M (Base-125/135/184M, TinyMed-135M) | Base | Laptop CPU, modest GPU | ≤ 900 MB | ≤ 150 ms / ≤ 400 ms | INT8 (FP fallback) |
| Large (workstation) | ~430–570M (SuperClinical-434M, ~568M) | Large | Workstation GPU | ≤ 4 GB | ≤ 250 ms / ≤ 800 ms | FP16 (INT8 if recall holds) |
| Accurate / XLarge (server) | ~600M+ / MoE (incl. ~1.4B privacy-filter) | XLarge / MoE | Server GPU | ≤ 8 GB (MoE) | ≤ 400 ms / ≤ 1200 ms | FP16 (INT8 if recall holds) |
Machine-Readable Budgets¶
openmed.eval.tiers.TIERS is the machine-readable SLO source for gate harnesses. It exposes four named rows: Tiny, Base, Large, and Accurate-XLarge. Each row carries:
| Field | Meaning |
|---|---|
ram_mb_max | Maximum resident RAM budget for the tier. |
p50_ms_max | Maximum p50 latency budget for approximately one page. |
p95_ms_max | Maximum p95 latency budget for approximately one page. |
default_format | Default release artifact format for the tier. |
The Large and Accurate-XLarge RAM values are represented as 4096 MB and 8192 MB in code.
Notes¶
- Tier ≠ tier word across families ("Large" = 434M for token-class NER but ~459–568M elsewhere; "Base" = 184M or 220M depending on family). The manifest carries the real
param_count; tier words on cards are descriptive only. - Nano is a distillation target folded into the Tiny tier, not a separate scheme.
- Default selection off the manifest's
tier: Tiny on mobile, Base on laptop, Large/Accurate only when the caller opts in or recall demands it. - The latency/RAM budgets above are the single per-tier SLO source; the §3 engine and §7 gates reference this table, not separate numbers.