Helm Deployment
The deploy/helm/openmed-service chart deploys the OpenMed de-identification REST service on Kubernetes with separate liveness and readiness probes, a persistent model cache volume, and configurable service runtime settings.
Install
Build and publish an OpenMed service image to a registry your cluster can pull, then install the chart with the matching image repository and tag:
    helm upgrade --install openmed-service deploy/helm/openmed-service \
      --namespace openmed \
      --create-namespace \
      --set image.repository=ghcr.io/maziyarpanahi/openmed \
      --set image.tag=1.7.0
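The same overrides can live in a values file instead of repeated --set flags, which is easier to review and to reuse across upgrades. A minimal sketch, assuming a hypothetical file name values-openmed.yaml; the keys are the ones listed in the Values Reference:

```yaml
# values-openmed.yaml (hypothetical file name)
image:
  repository: ghcr.io/maziyarpanahi/openmed
  tag: "1.7.0"              # quoted so YAML keeps the version a string
  pullPolicy: IfNotPresent  # chart default, shown for completeness
```

Pass the file with -f values-openmed.yaml in place of the two --set image.* flags.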
The default chart creates:
- a Deployment running the REST service on port 8080
- a Service exposing the HTTP port inside the cluster
- a ConfigMap for non-secret OpenMed service settings
- a PersistentVolumeClaim mounted at /root/.cache/huggingface
The service stores OpenMed model cache files under /root/.cache/huggingface/openmed, so downloads survive pod restarts when the PVC is retained.
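To reuse a volume that already holds downloaded models, the persistence values can point the chart at an existing claim instead of creating a new one. A sketch, assuming a pre-created PVC with the hypothetical name openmed-model-cache in the release namespace:

```yaml
persistence:
  enabled: true
  existingClaim: openmed-model-cache    # hypothetical PVC name; must already exist
  mountPath: /root/.cache/huggingface   # chart default
config:
  cacheDir: /root/.cache/huggingface/openmed  # chart default; sits under mountPath
```

If mountPath is changed, cacheDir presumably needs to move with it so that the cache directory stays on the mounted volume.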
Upgrade
To upgrade, change the values and run helm upgrade against the same release name:
    helm upgrade openmed-service deploy/helm/openmed-service \
      --namespace openmed \
      --set image.repository=ghcr.io/maziyarpanahi/openmed \
      --set image.tag=1.7.1
The chart does not create an Ingress or autoscaling object. Add those in environment-specific overlays so cluster ingress classes, certificates, and HPA policy stay outside the reusable chart.
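As an illustration of such an overlay, the manifest below sketches an Ingress kept outside the chart. The host name, ingress class, and TLS Secret name are placeholders, and the backend Service name assumes the chart names its Service after the release (openmed-service); check the rendered name with kubectl get svc before applying.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openmed-service
  namespace: openmed
spec:
  ingressClassName: nginx            # placeholder; use your cluster's ingress class
  tls:
    - hosts:
        - openmed.example.com        # placeholder host
      secretName: openmed-tls        # placeholder TLS Secret
  rules:
    - host: openmed.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openmed-service  # assumed Service name, see note above
                port:
                  number: 8080         # service.port default
```

Because the service enforces a host allowlist, an external host name such as this one would probably also need to be listed in config.trustedHosts.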
Probes
The chart wires Kubernetes probes to the service endpoints added for orchestrated deployments:
- liveness probe: GET /livez
- readiness probe: GET /readyz
/livez reports that the process is running. /readyz returns success only after the configured model preload completes, and flips back to not ready during graceful shutdown. Probe requests set Host: localhost so the service's trusted-host middleware accepts kubelet checks without weakening the application host allowlist.
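For orientation, the probe behavior described above corresponds roughly to the following container spec fragment. This is a sketch assembled from the defaults in the Values Reference, not a copy of the chart template; the rendered output may reference the port by name or order the fields differently.

```yaml
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
    httpHeaders:
      - name: Host          # satisfies the trusted-host check
        value: localhost
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
    httpHeaders:
      - name: Host
        value: localhost
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```

A failing readiness probe only removes the pod from the Service endpoints; it does not restart the container. A slow model preload therefore delays traffic rather than triggering a restart, as long as /livez keeps succeeding.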
Secrets
Do not put tokens or other secrets in values.yaml. Instead, reference an existing Kubernetes Secret through extraEnv.
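A minimal sketch of such a reference, assuming extraEnv entries are passed through as standard Kubernetes container env items. The Secret name openmed-secrets and the key hf-token are hypothetical, and HF_TOKEN is used only as an example of a token that a model download might need:

```yaml
extraEnv:
  - name: HF_TOKEN                # example variable name, not required by the chart
    valueFrom:
      secretKeyRef:
        name: openmed-secrets     # hypothetical Secret in the release namespace
        key: hf-token             # hypothetical key inside that Secret
```

The Secret must exist in the release namespace before the pod starts; otherwise the container cannot be created until the reference resolves.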
Values Reference
| Value | Default | Description |
|---|---|---|
| replicaCount | 1 | Number of service pods. |
| image.repository | openmed | Container image repository. |
| image.tag | 1.7.0 | Container image tag. Empty uses Chart.appVersion. |
| image.pullPolicy | IfNotPresent | Kubernetes image pull policy. |
| imagePullSecrets | [] | Pull secrets for private image registries. |
| nameOverride | "" | Short name override. |
| fullnameOverride | "" | Full resource name override. |
| podLabels | {} | Additional pod labels. |
| podAnnotations | {} | Additional pod annotations. |
| podSecurityContext | {} | Pod security context. |
| securityContext | {} | Container security context. |
| terminationGracePeriodSeconds | 45 | Pod shutdown grace period. |
| config.profile | prod | OPENMED_PROFILE. |
| config.cacheDir | /root/.cache/huggingface/openmed | OPENMED_CACHE_DIR. |
| config.preloadModels | [] | Models joined into OPENMED_SERVICE_PRELOAD_MODELS. |
| config.maxResidentModels | "" | OPENMED_SERVICE_MAX_RESIDENT_MODELS; empty means unbounded. |
| config.keepAlive | 10m | OPENMED_SERVICE_KEEP_ALIVE. |
| config.maxTextLength | 1000000 | OPENMED_SERVICE_MAX_TEXT_LENGTH. |
| config.corsOrigins | [] | Exact origins joined into OPENMED_SERVICE_CORS_ORIGINS. |
| config.trustedHosts | [] | Extra hosts appended to generated loopback and service DNS names. |
| config.batching.enabled | false | OPENMED_SERVICE_BATCHING_ENABLED. |
| config.batching.maxSize | 8 | OPENMED_SERVICE_BATCH_MAX_SIZE. |
| config.batching.maxWaitMs | 5 | OPENMED_SERVICE_BATCH_MAX_WAIT_MS. |
| config.coalescing.enabled | false | OPENMED_SERVICE_COALESCING_ENABLED. |
| config.shutdownDrainSeconds | 30 | OPENMED_SERVICE_SHUTDOWN_DRAIN_SECONDS. |
| config.metrics.enabled | false | OPENMED_SERVICE_METRICS_ENABLED. |
| config.throttle.rateLimitRps | 0 | OPENMED_SERVICE_RATE_LIMIT_RPS; 0 disables rate limiting. |
| config.throttle.rateLimitBurst | 0 | OPENMED_SERVICE_RATE_LIMIT_BURST. |
| config.throttle.maxConcurrency | 0 | OPENMED_SERVICE_RATE_LIMIT_MAX_CONCURRENCY; 0 disables concurrency limiting. |
| config.throttle.concurrencyWaitSeconds | 0.05 | OPENMED_SERVICE_CONCURRENCY_WAIT_SECONDS. |
| config.throttle.keyBy | global | OPENMED_SERVICE_THROTTLE_KEY. |
| extraEnv | [] | Extra container env entries, typically secret references. |
| service.type | ClusterIP | Kubernetes Service type. |
| service.port | 8080 | Service port. |
| service.targetPort | 8080 | Container port. |
| service.annotations | {} | Service annotations. |
| probes.liveness.path | /livez | Liveness probe path. |
| probes.liveness.initialDelaySeconds | 10 | Liveness initial delay. |
| probes.liveness.periodSeconds | 10 | Liveness period. |
| probes.liveness.timeoutSeconds | 3 | Liveness timeout. |
| probes.liveness.failureThreshold | 3 | Liveness failure threshold. |
| probes.readiness.path | /readyz | Readiness probe path. |
| probes.readiness.initialDelaySeconds | 5 | Readiness initial delay. |
| probes.readiness.periodSeconds | 5 | Readiness period. |
| probes.readiness.timeoutSeconds | 3 | Readiness timeout. |
| probes.readiness.failureThreshold | 3 | Readiness failure threshold. |
| resources.requests.cpu | 500m | Requested CPU. |
| resources.requests.memory | 1Gi | Requested memory. |
| resources.limits.cpu | 2 | CPU limit. |
| resources.limits.memory | 4Gi | Memory limit. |
| persistence.enabled | true | Mount a model-cache PVC. |
| persistence.existingClaim | "" | Existing PVC to mount instead of creating one. |
| persistence.storageClassName | "" | StorageClass for generated PVC. Empty uses the cluster default. |
| persistence.accessModes | [ReadWriteOnce] | PVC access modes. |
| persistence.size | 20Gi | PVC size. |
| persistence.mountPath | /root/.cache/huggingface | Model cache mount path. |
| persistence.subPath | "" | Optional PVC subPath. |
| persistence.annotations | {} | PVC annotations. |
| nodeSelector | {} | Node selector. |
| tolerations | [] | Pod tolerations. |
| affinity | {} | Pod affinity. |
| extraVolumes | [] | Extra pod volumes. |
| extraVolumeMounts | [] | Extra container volume mounts. |
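
To show how several of these values combine, here is a hedged example values file that preloads a model and turns on batching, metrics, and throttling. The model identifier is a placeholder, and the specific numbers are illustrative rather than recommended settings:

```yaml
config:
  profile: prod
  preloadModels:
    - REPLACE_WITH_MODEL_ID   # placeholder; substitute a real model identifier
  maxResidentModels: 2        # illustrative; empty string means unbounded
  batching:
    enabled: true
    maxSize: 8                # chart default
    maxWaitMs: 5              # chart default
  metrics:
    enabled: true
  throttle:
    rateLimitRps: 20          # illustrative; 0 disables rate limiting
    rateLimitBurst: 40        # illustrative
    maxConcurrency: 8         # illustrative; 0 disables concurrency limiting
    keyBy: global             # chart default
  shutdownDrainSeconds: 30    # chart default
terminationGracePeriodSeconds: 45  # chart default; longer than the drain window
```

The defaults keep terminationGracePeriodSeconds (45) above config.shutdownDrainSeconds (30). If you lengthen the drain window, it seems sensible to raise the grace period as well so that Kubernetes does not kill the pod before draining finishes.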