Kubernetes
For production at scale. The pattern: one Deployment per Titan app, one StatefulSet for the daemon (with its persistent state), and managed Postgres + Redis as separate clusters.
Topology
In Kubernetes you have two valid patterns:
| Pattern | When |
|---|---|
| Daemon-per-pod (Omnitron supervises the apps in the pod) | Small clusters; one or two services per pod |
| K8s-supervises-apps (Omnitron is the control plane only) | Large clusters; one Deployment per Titan app; Omnitron daemon as a separate StatefulSet for control/inspection |
This page covers pattern 2 (K8s-supervises-apps), the production default.
Per-app Deployment
# k8s/api/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/platform-api:v1.4.2
          ports:
            - { name: http, containerPort: 3001 }
          env:
            - { name: NODE_ENV, value: production }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: api-secrets, key: database-url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: api-secrets, key: redis-url } }
            - name: JWT_SECRET
              valueFrom: { secretKeyRef: { name: api-secrets, key: jwt-secret } }
          resources:
            requests: { cpu: '500m', memory: '512Mi' }
            limits: { cpu: '2000m', memory: '2Gi' }
          livenessProbe:
            httpGet: { path: /healthz, port: http }
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet: { path: /readyz, port: http }
            initialDelaySeconds: 5
            periodSeconds: 3
            timeoutSeconds: 3
            failureThreshold: 2
          startupProbe:
            httpGet: { path: /healthz, port: http }
            initialDelaySeconds: 0
            periodSeconds: 2
            timeoutSeconds: 2
            failureThreshold: 60 # 2 min grace for cold start
          lifecycle:
            preStop:
              exec:
                command: ['sh', '-c', 'sleep 10'] # let LB drain
      # Pod-level field (sibling of containers), not a container field
      terminationGracePeriodSeconds: 45
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: platform
spec:
  selector: { app: api }
  ports:
    - { name: http, port: 80, targetPort: http }
Key points:
- Three probes: liveness, readiness, startup.
  - Liveness → /healthz from titan-health. Kill if it fails persistently.
  - Readiness → /readyz. Stop sending traffic when failing; resume when passing.
  - Startup → /healthz with a longer failureThreshold: masks liveness during cold start (especially when eager-loading heavy services).
- preStop sleep 10 s: lets the load balancer's endpoint list update before SIGTERM. Avoids 502s during pod rotation.
- terminationGracePeriodSeconds: 45 must exceed the preStop sleep plus the app's shutdown.timeout, because the grace period starts counting before the preStop hook runs. Otherwise k8s SIGKILLs mid-drain.
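The timing these settings imply can be worked out from the probe fields themselves. The fragment below repeats the values from the manifest with the arithmetic in comments. The 30 s shutdown.timeout is an assumed figure for illustration only, not a documented default.

```yaml
# Timing budget for the api container (values copied from the Deployment above).
startupProbe:
  periodSeconds: 2
  failureThreshold: 60   # 60 x 2 s = up to 120 s to start before the container is restarted
livenessProbe:
  periodSeconds: 10
  failureThreshold: 3    # 3 x 10 s = about 30 s of consecutive failures before a restart
readinessProbe:
  periodSeconds: 3
  failureThreshold: 2    # 2 x 3 s = about 6 s before the pod leaves the Service endpoints
# Shutdown: the grace period covers both the preStop hook and the app's own drain.
# 10 s (preStop sleep) + 30 s (ASSUMED shutdown.timeout) = 40 s, which fits within 45 s.
terminationGracePeriodSeconds: 45
```

If the app's real shutdown.timeout is longer than about 35 s, the grace period would need to grow with it.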
Omnitron daemon as StatefulSet
The daemon needs persistent state (~/.omnitron/) and a stable
identity:
# k8s/daemon/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: omnitron
  namespace: platform
spec:
  serviceName: omnitron
  replicas: 1
  selector:
    matchLabels: { app: omnitron }
  template:
    metadata:
      labels: { app: omnitron }
    spec:
      containers:
        - name: omnitron
          image: registry.example.com/platform-omnitron:v1.4.2
          ports:
            - { name: tcp, containerPort: 9700 }
            - { name: http, containerPort: 9800 }
          env:
            - { name: OMNITRON_HOME, value: /var/lib/omnitron }
            - { name: NODE_ENV, value: production }
          volumeMounts:
            - { name: state, mountPath: /var/lib/omnitron }
          resources:
            requests: { cpu: '200m', memory: '256Mi' }
            limits: { cpu: '1000m', memory: '1Gi' }
  volumeClaimTemplates:
    - metadata: { name: state }
      spec:
        accessModes: ['ReadWriteOnce']
        resources: { requests: { storage: 5Gi } }
        storageClassName: 'fast-ssd'
---
apiVersion: v1
kind: Service
metadata:
  name: omnitron
  namespace: platform
spec:
  selector: { app: omnitron }
  ports:
    - { name: tcp, port: 9700, targetPort: tcp }
    - { name: http, port: 9800, targetPort: http }
Operators can run kubectl port-forward -n platform svc/omnitron 9800 and open
the webapp at http://localhost:9800 on their machine.
Ingress
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: platform
  namespace: platform
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    nginx.ingress.kubernetes.io/proxy-body-size: '50m'
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com, app.example.com]
      secretName: platform-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - { path: /, pathType: Prefix, backend: { service: { name: api, port: { number: 80 } } } }
    - host: app.example.com
      http:
        paths:
          - { path: /, pathType: Prefix, backend: { service: { name: omnitron, port: { number: 9800 } } } }
Two hosts — one for the API (Titan apps), one for the webapp.
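The cert-manager.io/cluster-issuer annotation expects a ClusterIssuer named letsencrypt to exist in the cluster. A minimal sketch of one, assuming cert-manager is installed and HTTP-01 challenges are solved through the nginx ingress class; the email address and the account-key Secret name are placeholders:

```yaml
# Sketch of the ClusterIssuer referenced by the Ingress annotation.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt                   # must match cert-manager.io/cluster-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com            # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key   # placeholder; cert-manager stores the ACME account key here
    solvers:
      - http01:
          ingress:
            class: nginx              # serve challenges through the nginx ingress class
```

With this in place, cert-manager should issue a certificate covering both hosts into the platform-tls Secret named in the Ingress.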
Secrets
# k8s/secrets.yaml: do NOT commit. Use sealed-secrets or external-secrets in real life.
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
  namespace: platform
type: Opaque
stringData:
  # In-cluster Service DNS names take the form <service>.<namespace>.svc.cluster.local
  database-url: postgres://platform@platform-pg.platform.svc.cluster.local:5432/platform
  redis-url: redis://platform-redis.platform.svc.cluster.local:6379
  jwt-secret: '<set via secret manager>'
Production options:
- Sealed Secrets (Bitnami): commit encrypted secrets to git.
- External Secrets Operator: sync from AWS Secrets Manager / GCP Secret Manager / Vault.
- Pod-level secret manager mount (e.g., the Secrets Store CSI Driver with the AWS provider).
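As an illustration of the External Secrets Operator option, the sketch below produces the same api-secrets Secret from an external store. It assumes the operator is installed and that a ClusterSecretStore already exists; the store name and the remote key path are hypothetical, and the apiVersion depends on the operator release in use.

```yaml
apiVersion: external-secrets.io/v1beta1   # newer operator releases also serve v1
kind: ExternalSecret
metadata:
  name: api-secrets
  namespace: platform
spec:
  refreshInterval: 1h                     # re-sync from the external store hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager             # HYPOTHETICAL store, configured separately
  target:
    name: api-secrets                     # the Secret the api Deployment reads
  data:
    - secretKey: database-url
      remoteRef: { key: platform/api, property: database-url }   # HYPOTHETICAL remote path
    - secretKey: redis-url
      remoteRef: { key: platform/api, property: redis-url }
    - secretKey: jwt-secret
      remoteRef: { key: platform/api, property: jwt-secret }
```

Only this manifest goes into git; the secret values themselves stay in the external manager.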
HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
titan-pm's own autoscaler operates at the worker-pool level
within a process; HPA operates at the pod level. Use both —
HPA for cross-pod scale, titan-pm for cross-worker.
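When both autoscalers are active, worker-level changes inside a pod can move the pod-level metrics the HPA watches. One possible way to keep the HPA from reacting too eagerly is the behavior field of autoscaling/v2. The values below are illustrative, not tuned recommendations.

```yaml
# Fragment to merge under spec: of the HorizontalPodAutoscaler above.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 min
    policies:
      - { type: Pods, value: 1, periodSeconds: 60 }   # remove at most one pod per minute
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
```

Scale-up stays immediate while scale-down is damped, so a short dip in load does not shed pods that are needed again a minute later.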
PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: platform
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: api }
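Keeps at least 2 pods up during voluntary disruptions (node drains, cluster upgrades). Because the HPA's minReplicas is also 2, the Deployment can sit at exactly 2 pods, and at that point this budget permits no evictions and node drains stall. Raise minReplicas to 3 if drains must always be able to proceed.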
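A budget written as maxUnavailable does not depend on the current replica count, which can be easier to live with when an HPA changes that count. A minimal sketch of this alternative to minAvailable: 2:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: platform
spec:
  maxUnavailable: 1          # at most one api pod evicted at a time, at any replica count
  selector:
    matchLabels: { app: api }
```

The trade-off is that at 2 replicas a drain can briefly leave a single pod serving traffic.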
NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-ingress
  namespace: platform
spec:
  podSelector:
    matchLabels: { app: api }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        # kubernetes.io/metadata.name is set automatically on every namespace
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: ingress-nginx } }
      ports:
        - { protocol: TCP, port: 3001 }
  egress:
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: platform } }
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } }
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 } # DNS falls back to TCP for large responses
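Allows ingress only from the nginx ingress controller and restricts egress to in-namespace pods plus DNS. Managed Postgres or Redis endpoints that live outside the cluster are not covered by the in-namespace rule and need their own egress rule.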
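For database endpoints outside the cluster, an ipBlock rule is one option. The sketch below is an extra entry for the policy's egress list; the CIDR is a hypothetical placeholder for the network that hosts the managed services.

```yaml
# Extra rule to append to the egress: list of the NetworkPolicy above.
- to:
    - ipBlock: { cidr: 10.20.0.0/16 }   # HYPOTHETICAL range of the managed database network
  ports:
    - { protocol: TCP, port: 5432 }     # Postgres
    - { protocol: TCP, port: 6379 }     # Redis
```

Keeping the range as narrow as the provider allows preserves most of the value of the egress restriction.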
Rolling updates
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
maxUnavailable: 0 keeps capacity at 100% throughout the
rollout. maxSurge: 1 adds one new pod, waits for it to become
ready, then terminates an old one.
For high-throughput apps, set maxSurge: 25% for faster
rollouts.
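A percentage maxSurge is resolved against the desired replica count and rounded up. The fragment below works through one case; the replica count of 8 is an illustrative assumption.

```yaml
spec:
  replicas: 8               # illustrative replica count
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%         # ceil(8 x 0.25) = 2 extra pods at a time
      maxUnavailable: 0     # still never drops below 8 ready pods
```

The rollout then proceeds two pods at a time instead of one, at the cost of temporarily scheduling up to 10 pods.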
Database migrations
Run as an initContainer or pre-deploy Job:
# Pre-deploy Job, runs to completion before pods rotate
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-{{ .Values.deployVersion }}
  namespace: platform
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/platform-api:v1.4.2
          command: ['pnpm', 'omnitron', 'infra', 'migrate']
          env:
            - { name: DATABASE_URL, valueFrom: { secretKeyRef: { name: api-secrets, key: database-url } } }
Wire as a Helm pre-install / pre-upgrade hook.
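Helm recognizes hooks through annotations on the resource. A sketch of the Job's metadata with the standard hook annotations added; the weight and delete policy shown are one reasonable choice, not a requirement:

```yaml
# metadata: block of the migration Job, with Helm hook annotations.
metadata:
  name: migrate-{{ .Values.deployVersion }}
  namespace: platform
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade   # run before resources are installed or upgraded
    "helm.sh/hook-weight": "0"                # ordering relative to other hooks
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
```

Helm waits for a hook Job to complete before continuing, so a failed migration stops the release before any app pods rotate.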
Webapp deployment
Static assets behind an nginx container:
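apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: platform
spec:
  replicas: 2
  selector:
    matchLabels: { app: webapp } # apps/v1 requires a selector matching the pod labels
  template:
    metadata:
      labels: { app: webapp }
    spec:
      containers:
        - name: webapp
          image: registry.example.com/platform-webapp:v1.4.2
          ports: [{ name: http, containerPort: 80 }]
          # Built static; nginx serves it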
Or — simpler — let the Omnitron daemon serve the webapp from its HTTP port (9800) and point ingress there.
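If the standalone nginx variant is used, the ingress needs a Service to target. A minimal sketch, assuming the webapp pods carry the label app: webapp and expose a container port named http:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: platform
spec:
  selector: { app: webapp }   # ASSUMES the webapp pods are labeled app: webapp
  ports:
    - { name: http, port: 80, targetPort: http }
```

The app.example.com rule in the Ingress would then point at service webapp, port 80, instead of omnitron, port 9800.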
Observability
- Logs → stdout → cluster log aggregator (Loki / Datadog / ELK).
- Metrics → scrape /metrics from each pod via Prometheus annotations:
    annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/port: '3001'
      prometheus.io/path: '/metrics'
- Traces → OTel collector as DaemonSet; apps export to it.
- Alerts → OmnitronAlerts + PrometheusRules.
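As an example of the PrometheusRules side, the sketch below alerts on repeated restarts of the api container. It assumes the Prometheus Operator CRDs and kube-state-metrics are installed; the rule name, threshold, and severity label are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-alerts              # illustrative name
  namespace: platform
spec:
  groups:
    - name: api
      rules:
        - alert: ApiPodRestarting
          # Restart counter exported by kube-state-metrics
          expr: increase(kube_pod_container_status_restarts_total{namespace="platform", container="api"}[15m]) > 2
          for: 5m
          labels: { severity: warning }
          annotations:
            summary: api container restarted more than twice in 15 minutes
```

Depending on how the operator is configured, the rule may also need labels that match its ruleSelector before it is picked up.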
Disaster recovery
Backup priorities:
- Postgres: managed snapshots + WAL archiving.
- Omnitron daemon state (the state PVC, named state-omnitron-0 by the StatefulSet above): snapshot regularly; losing it loses the uptime-bar history.
- Secrets: backed up via the secret manager itself.
The Postgres data is the only catastrophic loss. Everything else can be reconstructed.
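For the daemon's volume, a CSI VolumeSnapshot is one way to take a snapshot. The sketch assumes the cluster's CSI driver supports snapshots and that a VolumeSnapshotClass exists; the class name and snapshot name are hypothetical. The PVC name follows the StatefulSet convention of claim template name, StatefulSet name, and ordinal.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: omnitron-state-manual               # HYPOTHETICAL snapshot name
  namespace: platform
spec:
  volumeSnapshotClassName: csi-snapclass    # HYPOTHETICAL class; depends on the CSI driver
  source:
    persistentVolumeClaimName: state-omnitron-0   # state (template) + omnitron (StatefulSet) + 0
```

Scheduling this regularly needs extra tooling, such as a CronJob or a backup operator, which is outside the scope of this sketch.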
Cost-savings tips
- Spot / preemptible for worker pools; on-demand for the daemon.
- Requests below limits for both memory and CPU, as in the api Deployment above (a request can never exceed its limit): low requests bin-pack efficiently, higher limits give headroom under burst.
- One daemon per cluster, not per app.
- titan-cache L1 only for read-heavy apps to avoid Redis hops.
See also
- Deployment overview
- PaaS — for smaller setups
- Cluster + Fleet — multi-cluster patterns
- Observability