Unkey Deploy is currently in private beta. To get access, reach out on Discord or email support@unkey.com.
SIGTERM) to give your app time to drain connections and finish in-flight requests before the instance stops.
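One way an app might use that window is sketched below in Python, using only the standard library. The handler, the `DrainingServer` class, and the helper names are illustrative assumptions, not part of Unkey. The length of the grace period is not stated here, so the sketch simply stops accepting new connections and waits for in-flight requests.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Placeholder handler; a real app would serve its own routes."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


class DrainingServer(ThreadingHTTPServer):
    # Non-daemon request threads are joined by server_close(),
    # so in-flight requests finish before the process exits.
    daemon_threads = False


def request_shutdown(server: ThreadingHTTPServer) -> None:
    # shutdown() blocks until serve_forever() returns. A signal handler runs
    # in the main thread, so if serve_forever() is also running there, calling
    # shutdown() directly would deadlock. Hand it to a helper thread instead.
    threading.Thread(target=server.shutdown, daemon=True).start()


def serve_until_sigterm(server: ThreadingHTTPServer) -> None:
    """Serve until SIGTERM arrives, then stop accepting and drain."""
    signal.signal(signal.SIGTERM, lambda signum, frame: request_shutdown(server))
    server.serve_forever()   # returns once shutdown() has been requested
    server.server_close()    # closes the socket and waits for request threads
```

An entry point could then call `serve_until_sigterm(DrainingServer(("0.0.0.0", 8080), Handler))`, where port 8080 is a placeholder.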
How instances fit in
Instances sit at the bottom of the Unkey Deploy hierarchy. A deployment produces instances across your configured regions.
Configure instance count
Set the maximum number of instances per region in Settings > Runtime settings > Instances. The default maximum is one. Unkey starts with one instance and scales up automatically based on CPU load, up to the maximum you configure. Running multiple instances in a region provides redundancy and distributes load. If one instance fails, the rest keep serving requests without interruption.
During the beta, the maximum is 4 instances per region. Contact support@unkey.com if you need more.
Resource allocation
Each instance has configurable limits for CPU and memory. The two resources work differently for billing:
- CPU is a maximum limit. Your instance can burst up to the configured amount, but Unkey only charges for the CPU time actually used.
- Memory is a dedicated allocation. The configured amount is reserved for your instance, and you are charged for the full allocation.
- CPU: 1/4 vCPU to 2 vCPU (default 1/4 vCPU, billed on usage)
- Memory: 256 MiB to 4 GiB (default 256 MiB, billed on allocation)
These limits apply during the beta. Larger instance sizes are planned once the beta ends. Contact support@unkey.com if you need more resources now.
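The difference between the two billing models can be illustrated with a small calculation. This is only a sketch: the billing units (vCPU-seconds and GiB-hours) and both price parameters are assumptions supplied by the caller, not Unkey's published pricing.

```python
def estimate_charge(
    cpu_seconds_used: float,
    memory_gib_allocated: float,
    hours_running: float,
    price_per_vcpu_second: float,  # hypothetical rate, supplied by the caller
    price_per_gib_hour: float,     # hypothetical rate, supplied by the caller
) -> float:
    """Estimate a charge under usage-billed CPU and allocation-billed memory."""
    # CPU: only the time actually consumed counts, regardless of the limit.
    cpu_charge = cpu_seconds_used * price_per_vcpu_second
    # Memory: the full allocation counts for every hour the instance runs.
    memory_charge = memory_gib_allocated * hours_running * price_per_gib_hour
    return cpu_charge + memory_charge
```

Under this model, an idle instance still incurs the memory charge, while raising the CPU limit costs nothing unless the extra CPU is actually used.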
Load balancing
The Sentinel gateway in each region distributes incoming requests uniformly at random across all running instances. Each instance has an equal probability of handling any given request. If no running instances exist in the local region, traffic reroutes to the nearest region that has healthy instances.
Health checks
Unkey sends periodic HTTP requests to a health check endpoint in your app to verify each instance can receive traffic. Instances that fail consecutive checks are removed from the load-balancing pool until they recover or the deployment is replaced. Without a health check configured, Unkey considers an instance healthy as long as its container process is running. See App settings for the full list of configuration options.
Design a health check endpoint
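To see why the endpoint's behavior matters, it helps to model how failed checks and routing interact. The sketch below combines the two rules described above: instances leave the pool after consecutive failed checks, and requests go to a uniformly random healthy instance, falling back to the nearest region that has one. The failure threshold of 3 and all names are assumptions for illustration. This is not Sentinel's implementation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3  # assumed value; the actual threshold is not stated


@dataclass
class Instance:
    name: str
    consecutive_failures: int = 0

    @property
    def in_pool(self) -> bool:
        return self.consecutive_failures < FAILURE_THRESHOLD

    def record_check(self, passed: bool) -> None:
        # A passing check resets the streak, which models recovery.
        self.consecutive_failures = 0 if passed else self.consecutive_failures + 1


def pick_instance(
    regions: list[list[Instance]], rng: random.Random
) -> Optional[Instance]:
    """Pick an instance; `regions` is ordered nearest first (local region at index 0)."""
    for instances in regions:
        healthy = [i for i in instances if i.in_pool]
        if healthy:
            # Uniform random choice: every healthy instance is equally likely.
            return rng.choice(healthy)
    return None  # no healthy instance in any region
```

A health check that fails whenever a downstream dependency is slow would, under this model, pull every instance out of the pool at once.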
Your health check endpoint should confirm that the instance can handle requests, not that every downstream dependency is reachable. A simple endpoint that returns 200 OK is enough.
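A minimal sketch of such an endpoint, using Python's standard library. The `/health` path and port 8080 are placeholders. Use whatever path and port you configure in App settings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/health"  # hypothetical path; match your configured endpoint


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            # Deliberately no database or upstream calls here: the check
            # only proves this process can accept and answer a request.
            self._reply(200, b"OK")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep periodic health checks out of the request log


def make_server(port: int = 8080) -> HTTPServer:
    """Build a server that answers the health check on the given port."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the server. In a real app, the health route would sit alongside your existing routes.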
Autoscaling
Autoscaling is enabled by default. Unkey automatically adjusts the number of instances in each region between 1 and the maximum you configure in the dashboard, targeting 80% CPU utilization. When CPU usage exceeds 80%, Unkey adds instances. When it drops below that threshold, Unkey removes instances down to a minimum of one.
Additional autoscaling configuration options (memory thresholds, RPS thresholds, custom minimums) are not yet available in the dashboard.
Observability
Each instance reports metrics independently. In the dashboard, you can view per-instance data under the Network tab:
- Requests per second (RPS) per instance
- CPU and memory utilization
- Instance status and region
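CPU utilization is also the signal that autoscaling targets (80%, as described in the Autoscaling section), so these metrics can help you anticipate scaling behavior. The sketch below shows how an observed average utilization might map to an instance count. It assumes a proportional rule similar to the one used by the Kubernetes Horizontal Pod Autoscaler. Unkey's actual algorithm is not documented here, and the function name is illustrative.

```python
import math

TARGET_CPU = 0.80  # target utilization, from the Autoscaling section


def desired_instances(current: int, avg_cpu: float, max_instances: int) -> int:
    """Estimate the instance count that would bring average CPU near the target.

    Assumes a proportional rule: scale the current count by the ratio of
    observed to target utilization, then clamp to [1, max_instances].
    """
    proposed = math.ceil(current * avg_cpu / TARGET_CPU)
    return max(1, min(max_instances, proposed))
```

For example, under this assumed rule, one instance averaging 95% CPU with a configured maximum of 4 yields `desired_instances(1, 0.95, 4) == 2`.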
Next steps
App settings
Configure CPU, memory, and other runtime settings
Regions
Choose where your instances run

