Unkey Deploy is currently in private beta. To get access, reach out on Discord or email support@unkey.com.
An instance is a single running container of your app in a specific region. When Unkey deploys your app, it creates one or more instances per region based on your configuration. Each instance runs the same container image with the same variables and serves traffic independently. Instances are stateless and ephemeral: local disk does not persist across deployments or restarts. During deployments and scale-down, Unkey sends the configured shutdown signal (default SIGTERM) to give your app time to drain connections and finish in-flight requests before the instance stops.
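As a rough illustration of the drain behavior described above, here is a minimal Python sketch that uses only the standard library. It assumes the default SIGTERM signal. The names `DrainingServer`, `Handler`, `serve_until`, and `main`, and port 8080, are illustrative choices and not part of Unkey.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingServer(ThreadingHTTPServer):
    # Non-daemon request threads let server_close() wait for in-flight requests.
    daemon_threads = False


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_until(stop: threading.Event, server) -> None:
    """Serve in a background thread until `stop` is set, then drain and exit."""
    worker = threading.Thread(target=server.serve_forever)
    worker.start()
    stop.wait()            # blocks until the shutdown signal arrives
    server.shutdown()      # stop the accept loop; no new requests are picked up
    server.server_close()  # close the socket and join in-flight request threads
    worker.join()


def main() -> None:
    stop = threading.Event()
    # Assumes the default shutdown signal; register a different one if you changed it.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    serve_until(stop, DrainingServer(("0.0.0.0", 8080), Handler))
```

Calling `main()` from your container entrypoint starts the server. The accept loop runs in a worker thread because `shutdown()` must be called from a different thread than `serve_forever()`; the main thread only waits for the signal. Because local disk is not persisted, anything that must survive the restart should be written to external storage before the process exits.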

How instances fit in

Instances sit at the bottom of the Unkey Deploy hierarchy. A deployment produces instances across your configured regions:
Workspace
  └── Project
        └── App
              └── Environment (production, preview)
                    └── Deployment (specific version)
                          └── Instances (1+ per region)
A deployment with two regions and two instances per region creates four instances total. Each instance receives traffic through the Sentinel gateway running in the same region.

Configure instance count

Set the maximum number of instances per region in Settings > Runtime settings > Instances. The default maximum is one. Unkey starts with one instance and scales up automatically based on CPU load, up to the maximum you configure. Running multiple instances in a region provides redundancy and distributes load. If one instance fails, the rest keep serving requests without interruption.
During the beta, the maximum is 4 instances per region. Contact support@unkey.com if you need more.

Resource allocation

Each instance has configurable limits for CPU and memory. The two resources work differently for billing:
  • CPU is a maximum limit. Your instance can burst up to the configured amount, but Unkey only charges for the CPU time actually used.
  • Memory is a dedicated allocation. The configured amount is reserved for your instance, and you are charged for the full allocation.
Configure limits in Settings > Runtime settings:
  • CPU: 1/4 vCPU to 2 vCPU (default 1/4 vCPU, billed on usage)
  • Memory: 256 MiB to 4 GiB (default 256 MiB, billed on allocation)
All instances in a deployment share the same resource configuration. Changing these values takes effect on the next deployment.
These limits apply during the beta. Larger instance sizes are planned once the beta ends. Contact support@unkey.com if you need more resources now.
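The limits and the two billing bases above can be summarized in a short Python sketch. `RuntimeSettings`, `validate`, and `billable_usage` are hypothetical names for illustration, and the metering units (vCPU-seconds and MiB-seconds) are an assumption; this page does not state billing units or prices.

```python
from dataclasses import dataclass

# Beta limits quoted in the text above.
CPU_RANGE_VCPU = (0.25, 2.0)
MEMORY_RANGE_MIB = (256, 4096)  # 256 MiB to 4 GiB
MAX_INSTANCES_PER_REGION = 4


@dataclass(frozen=True)
class RuntimeSettings:
    """Hypothetical container for the dashboard values; defaults match the text."""
    cpu_vcpu: float = 0.25
    memory_mib: int = 256
    max_instances: int = 1


def validate(settings: RuntimeSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the beta limits."""
    problems = []
    if not CPU_RANGE_VCPU[0] <= settings.cpu_vcpu <= CPU_RANGE_VCPU[1]:
        problems.append("cpu_vcpu must be between 0.25 and 2")
    if not MEMORY_RANGE_MIB[0] <= settings.memory_mib <= MEMORY_RANGE_MIB[1]:
        problems.append("memory_mib must be between 256 and 4096")
    if not 1 <= settings.max_instances <= MAX_INSTANCES_PER_REGION:
        problems.append("max_instances must be between 1 and 4")
    return problems


def billable_usage(settings: RuntimeSettings, cpu_seconds_used: float, uptime_seconds: float):
    """CPU is metered on what was used; memory on the full allocation for the uptime."""
    # Usage can never exceed the configured CPU limit multiplied by uptime.
    cpu_vcpu_seconds = min(cpu_seconds_used, settings.cpu_vcpu * uptime_seconds)
    memory_mib_seconds = settings.memory_mib * uptime_seconds
    return cpu_vcpu_seconds, memory_mib_seconds
```

For example, a default instance that runs for an hour and uses 100 seconds of CPU time is metered for those 100 vCPU-seconds only, but for its full 256 MiB of memory across all 3,600 seconds.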
See App settings for details on each option.

Load balancing

The Sentinel gateway in each region distributes incoming requests uniformly at random across all running instances. Each instance has an equal probability of handling any given request. If no running instances exist in the local region, traffic reroutes to the nearest region that has healthy instances.
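The routing rule above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Sentinel's implementation; `pick_instance`, the region names, and the idea that the caller supplies other regions already ordered by distance are all assumptions.

```python
import random


def pick_instance(local_region, instances_by_region, regions_by_distance, rng=random):
    """Choose a target for one request.

    instances_by_region: dict mapping region -> list of healthy, running instance IDs.
    regions_by_distance: the other regions, nearest first (hypothetical input).
    Returns (region, instance_id), or None if no region has a healthy instance.
    """
    for region in [local_region, *regions_by_distance]:
        candidates = instances_by_region.get(region, [])
        if candidates:
            # Uniform random choice: every instance has the same probability.
            return region, rng.choice(candidates)
    return None


# Hypothetical fleet: two instances in one region, one in another.
fleet = {"us-east": ["a", "b"], "eu-central": ["c"]}
local_pick = pick_instance("us-east", fleet, ["eu-central"])
fallback_pick = pick_instance("ap-south", fleet, ["eu-central", "us-east"])
```

Because the choice is uniform and ignores current load, a slow instance receives the same share of requests as a fast one; that is one reason health checks matter for removing instances that cannot keep up.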

Health checks

Unkey sends periodic HTTP requests to a health check endpoint in your app to verify each instance can receive traffic. Instances that fail consecutive checks are removed from the load-balancing pool until they recover or the deployment is replaced. Without a health check configured, Unkey considers an instance healthy as long as its container process is running. See App settings for the full list of configuration options.
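The "consecutive failures" rule can be modeled as a small counter. The sketch below is illustrative only: `HealthTracker` is a hypothetical name, the threshold of 3 is a placeholder (see App settings for the real options), and treating a single passing check as recovery is an assumption.

```python
class HealthTracker:
    """Tracks one instance's health check results (illustrative model)."""

    def __init__(self, failure_threshold: int = 3):  # placeholder value, not Unkey's default
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> None:
        # A passing check resets the streak; a failing check extends it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def in_pool(self) -> bool:
        # The instance keeps receiving traffic until the streak reaches the threshold.
        return self.consecutive_failures < self.failure_threshold


tracker = HealthTracker(failure_threshold=3)
for result in (False, False):
    tracker.record(result)
still_serving = tracker.in_pool   # two failures: below the threshold
tracker.record(False)
removed = not tracker.in_pool     # third consecutive failure: out of the pool
tracker.record(True)
recovered = tracker.in_pool       # one passing check: back in the pool
```

Counting consecutive failures, instead of any single failure, keeps one slow response from pulling an otherwise healthy instance out of rotation.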

Design a health check endpoint

Your health check endpoint should confirm the instance can handle requests, not that every downstream dependency is reachable. A simple endpoint that returns 200 OK is enough:
GET /health → 200 { "status": "ok" }
Avoid checking databases or third-party APIs in your health endpoint. A temporary outage in a shared dependency would mark every instance unhealthy at once and take your entire app offline, when only the requests that rely on that dependency needed to fail.
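A dependency-free health endpoint along these lines might look like the following Python sketch, which uses only the standard library. `health_response`, `HealthHandler`, `main`, and port 8080 are illustrative; the `/health` path and response body follow the example above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def health_response(path: str) -> tuple[int, bytes]:
    """Return (status, body) for a GET request. Touches no database or external API."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, json.dumps({"error": "not found"}).encode()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when matching the path.
        status, body = health_response(self.path.split("?", 1)[0])
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Port 8080 is an arbitrary choice for this sketch.
    ThreadingHTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Keeping the response logic in a plain function makes it easy to unit test without starting a server. In a real app you would add the same route to your existing web framework instead of running a separate server.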

Autoscaling

Autoscaling is enabled by default. Unkey automatically adjusts the number of instances in each region between 1 and the maximum you configure in the dashboard, targeting 80% CPU utilization. When CPU usage exceeds 80%, Unkey adds instances. When it drops below that threshold, Unkey removes instances down to a minimum of one. Additional autoscaling configuration options (memory thresholds, RPS thresholds, custom minimums) are not yet available in the dashboard.
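To build intuition for target-based scaling, the sketch below applies a proportional rule of the kind used by the Kubernetes Horizontal Pod Autoscaler. This page does not document Unkey's actual algorithm, so treat the formula as an assumption; only the 80% target, the minimum of one, and the configured maximum come from the text. `desired_instances` is a hypothetical name.

```python
import math

TARGET_CPU = 0.80  # target utilization stated in the text


def desired_instances(current: int, avg_cpu: float, max_instances: int,
                      min_instances: int = 1, target: float = TARGET_CPU) -> int:
    """Proportional scaling rule (assumed): size the fleet so average CPU lands near the target.

    avg_cpu is the average utilization across the current instances, as a fraction (0.95 = 95%).
    """
    raw = math.ceil(current * avg_cpu / target)
    # Clamp to the configured range: never below the minimum or above the maximum.
    return max(min_instances, min(max_instances, raw))


scale_up = desired_instances(current=2, avg_cpu=0.95, max_instances=4)
scale_down = desired_instances(current=3, avg_cpu=0.20, max_instances=4)
capped = desired_instances(current=4, avg_cpu=1.00, max_instances=4)
```

Under this rule, two instances at 95% CPU grow to three, three instances at 20% CPU shrink to one, and a fleet already at its maximum stays there no matter how high utilization climbs.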

Observability

Each instance reports metrics independently. In the dashboard, you can view per-instance data under the Network tab:
  • Requests per second (RPS) per instance
  • CPU and memory utilization
  • Instance status and region
This visibility helps you identify whether a performance issue affects a single instance or the entire deployment. See Observability for more details.

Next steps

  • App settings: Configure CPU, memory, and other runtime settings
  • Regions: Choose where your instances run
Last modified on April 2, 2026