Instances

Deploying applications on Unkey is in public beta. To try it, open the product switcher in the top-left of the dashboard and select Deploy. During beta, deployed resources are free. We’re eager for feedback, so let us know what you think on Discord, X, or email support@unkey.com.

An instance is a single running container of your app in a specific region. When Unkey deploys your app, it creates one or more instances per region based on your configuration. Each instance runs the same container image with the same variables and serves traffic independently. Instances are ephemeral, and their ephemeral storage does not persist across deployments or restarts. During deployments and scale-down, Unkey sends the configured shutdown signal (default SIGTERM) to give your app time to drain connections and finish in-flight requests before the instance stops.
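The shutdown behavior described above can be sketched in Python, assuming the default SIGTERM. The names `shutdown_requested`, `handle_shutdown`, `install_shutdown_handler`, and `serve` are illustrative and not part of any Unkey SDK; most web frameworks ship their own graceful-shutdown hooks, which are usually the better choice.

```python
import signal
import threading

# Set once the platform asks this instance to stop.
shutdown_requested = threading.Event()


def handle_shutdown(signum, frame):
    """Signal handler: only record the request so the serving loop can exit cleanly."""
    shutdown_requested.set()


def install_shutdown_handler(sig=signal.SIGTERM):
    """Register the handler. Must be called from the main thread."""
    signal.signal(sig, handle_shutdown)


def serve(handle_next_request, drain):
    """Hypothetical serving loop: accept work until shutdown, then drain."""
    while not shutdown_requested.is_set():
        handle_next_request()
    # Finish in-flight requests and close connections before exiting.
    drain()
```

Call `install_shutdown_handler()` once at startup, before entering the serving loop. Keeping the handler to a single flag update avoids doing real work inside a signal handler.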

How instances fit in

Instances sit at the bottom of the deployment hierarchy. A deployment produces instances across your configured regions:

Workspace
  └── Project
        └── App
              └── Environment (production, preview)
                    └── Deployment (specific version)
                          └── Instances (1+ per region)

A deployment with two regions and two instances per region creates four instances total. Each instance receives traffic through the Sentinel gateway running in the same region.
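As a quick check of that arithmetic, here is a small Python sketch. The function name is illustrative, and it assumes every region runs the same number of instances, which may not hold while autoscaling is adjusting counts per region.

```python
def total_instances(regions: int, instances_per_region: int) -> int:
    """Instances created by one deployment when every region runs the same count."""
    if regions < 1 or instances_per_region < 1:
        raise ValueError("need at least one region and one instance per region")
    return regions * instances_per_region


print(total_instances(regions=2, instances_per_region=2))  # prints 4
```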

Configure instance count

Set the maximum number of instances per region in Settings > Runtime settings > Instances. The default maximum is one. Unkey starts with one instance and scales up automatically based on CPU load, up to the maximum you configure. Running multiple instances in a region provides redundancy and distributes load. If one instance fails, the rest keep serving requests without interruption.

During the beta, the maximum is 4 instances per region. Contact support@unkey.com if you need more.

Resource allocation

Each instance has configurable limits for CPU, memory, and storage. CPU and memory work differently for billing:

CPU is a maximum limit. Your instance can burst up to the configured amount, but Unkey only charges for the CPU time actually used.
Memory is a dedicated allocation. The configured amount is reserved for your instance, and you are charged for the full allocation.
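The difference between the two billing models can be illustrated with a short Python sketch. The units (vCPU-seconds and MiB-seconds), the sampling approach, and the function name are assumptions made for illustration; they are not Unkey's actual metering, and no prices are implied.

```python
def billable_quantities(cpu_samples_vcpu, sample_interval_s, memory_allocated_mib):
    """Return (cpu_vcpu_seconds, memory_mib_seconds) for one instance.

    cpu_samples_vcpu: vCPU actually in use during each sampling interval.
    CPU is metered on what was used; memory on what was reserved.
    """
    runtime_s = len(cpu_samples_vcpu) * sample_interval_s
    cpu_vcpu_seconds = sum(cpu_samples_vcpu) * sample_interval_s
    memory_mib_seconds = memory_allocated_mib * runtime_s
    return cpu_vcpu_seconds, memory_mib_seconds


# An instance that is idle half the time still pays for all of its memory.
cpu, mem = billable_quantities([0.0, 0.25, 0.0, 0.25], 60, 256)
print(cpu, mem)  # prints 30.0 61440
```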

Configure limits in Settings > Runtime settings:

CPU: 1/4 vCPU to 2 vCPU (default 1/4 vCPU, billed on usage)
Memory: 256 MiB to 4 GiB (default 256 MiB, billed on allocation)
Storage: None to 10 GiB ephemeral disk (default none, mounted at /data)
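If you keep your intended settings in code or review them in CI, a range check against the beta limits listed above might look like the following Python sketch. `ResourceConfig` and `validate` are hypothetical names, and the sketch only checks ranges; it does not know which intermediate values the dashboard accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceConfig:
    """Hypothetical container for the three limits; defaults mirror the list above."""
    cpu_vcpu: float = 0.25
    memory_mib: int = 256
    storage_gib: int = 0  # 0 means no ephemeral disk


def validate(config: ResourceConfig) -> list:
    """Return a list of human-readable problems; an empty list means in range."""
    errors = []
    if not 0.25 <= config.cpu_vcpu <= 2:
        errors.append("CPU must be between 1/4 and 2 vCPU")
    if not 256 <= config.memory_mib <= 4096:
        errors.append("memory must be between 256 MiB and 4 GiB")
    if not 0 <= config.storage_gib <= 10:
        errors.append("storage must be between 0 and 10 GiB")
    return errors
```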

A /tmp directory backed by memory is always available for scratch files. For larger temporary storage needs, configure ephemeral storage to attach a dedicated disk volume per instance. All instances in a deployment share the same resource configuration. Changing these values takes effect on the next deployment.

These limits apply during the beta. Larger instance sizes are planned once the beta ends. Contact support@unkey.com if you need more resources now.

See App settings for details on each option.

Load balancing

The Sentinel gateway in each region distributes incoming requests uniformly at random across all running instances. Each instance has an equal probability of handling any given request. If no running instances exist in the local region, traffic reroutes to the nearest region that has healthy instances.
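The selection rule can be sketched in a few lines of Python. This is an illustration of uniform random choice with regional fallback, not Sentinel's implementation; the function name, the data shapes, and the region names in the usage example are assumptions.

```python
import random


def pick_instance(healthy_by_region, regions_nearest_first, rng=random):
    """Pick an instance uniformly at random from the nearest region that has one.

    healthy_by_region: dict mapping region name to a list of healthy instance IDs.
    regions_nearest_first: the local region first, then others by increasing distance.
    Returns None when no region has a healthy instance.
    """
    for region in regions_nearest_first:
        instances = healthy_by_region.get(region, [])
        if instances:
            return rng.choice(instances)
    return None


healthy = {"us-east": [], "eu-west": ["i-3"]}
print(pick_instance(healthy, ["us-east", "eu-west"]))  # prints i-3
```

Because the choice ignores current load, a long-running request on one instance does not steer new requests away from it; over many requests, each instance still receives roughly the same share.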

Health checks

Unkey sends periodic HTTP requests to a health check endpoint in your app to verify each instance can receive traffic. Instances that fail consecutive checks are removed from the load-balancing pool until they recover or the deployment is replaced. Without a health check configured, Unkey considers an instance healthy as long as its container process is running. See App settings for the full list of configuration options.
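A minimal Python sketch of the consecutive-failure rule follows. The threshold of three and the choice to restore an instance after a single passing check are assumptions for illustration; the actual values come from your health check configuration.

```python
class HealthTracker:
    """Tracks one instance's health check results (illustrative only)."""

    def __init__(self, failure_threshold=3):
        # Assumed threshold; the real value is part of the health check settings.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def in_pool(self):
        """True while the instance should keep receiving traffic."""
        return self.consecutive_failures < self.failure_threshold

    def record(self, check_passed):
        """Record one check result and return whether the instance is in the pool."""
        if check_passed:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.in_pool
```

Counting only consecutive failures means a single slow response does not remove an instance from the pool.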

Design a health check endpoint

Your health check endpoint should confirm the instance can handle requests, not that every downstream dependency is reachable. A simple endpoint that returns 200 OK is enough:

GET /health → 200 { "status": "ok" }

Avoid checking databases or third-party APIs in your health endpoint. A temporary outage in a dependency would mark all instances unhealthy, taking your entire app offline instead of returning errors only for the requests that depend on it.
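Here is one way such an endpoint could look, using only the Python standard library. The `/health` path and response body follow the example above; the port, the 404 body, and the function names are assumptions. The response is built in a plain function that touches no database or external API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Return (status, body) without contacting any downstream dependency."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, json.dumps({"error": "not found"}).encode()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Serve until the process is stopped. The port is an assumed example value."""
    HTTPServer(("", port), Handler).serve_forever()
```

Call `run()` from your entry point. In a real app you would add the route to your existing web framework rather than start a separate server.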

Autoscaling

Autoscaling is enabled by default. Unkey automatically adjusts the number of instances in each region between 1 and the maximum you configure in the dashboard, targeting 80% CPU utilization. When CPU usage exceeds 80%, Unkey adds instances. When it drops below that threshold, Unkey removes instances down to a minimum of one. Additional autoscaling configuration options (memory thresholds, RPS thresholds, custom minimums) are not yet available in the dashboard.
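Unkey does not document the exact scaling algorithm here, so the following Python sketch substitutes the proportional rule familiar from the Kubernetes Horizontal Pod Autoscaler to show how an 80% CPU target translates into an instance count. The function name and the use of average CPU percent per region are assumptions.

```python
import math

TARGET_CPU_PERCENT = 80  # target utilization stated above


def desired_instances(current, cpu_percent, max_instances, min_instances=1):
    """Proportional scaling: size the pool so average CPU lands near the target.

    current: instances running in the region now.
    cpu_percent: average CPU utilization across those instances.
    """
    proportional = math.ceil(current * cpu_percent / TARGET_CPU_PERCENT)
    # Clamp to the configured maximum and the fixed minimum of one.
    return max(min_instances, min(max_instances, proportional))


print(desired_instances(current=2, cpu_percent=95, max_instances=4))  # prints 3
print(desired_instances(current=3, cpu_percent=40, max_instances=4))  # prints 2
```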

Observability

Each instance reports metrics independently. In the dashboard, you can view per-instance data under the Network tab:

Requests per second (RPS) per instance
CPU and memory utilization
Instance status and region

This visibility helps you identify whether a performance issue affects a single instance or the entire deployment. See Observability for more details.
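To make the single-instance versus whole-deployment distinction concrete, here is a small Python sketch that flags instances whose CPU is well above the median of their peers. The input is a hypothetical mapping of instance IDs to CPU percentages, for example values read off the dashboard; the function name and the 1.5x factor are arbitrary illustrative choices.

```python
from statistics import median


def outlier_instances(cpu_by_instance, factor=1.5):
    """Return the IDs of instances whose CPU exceeds factor times the median."""
    typical = median(cpu_by_instance.values())
    return sorted(
        name for name, cpu in cpu_by_instance.items() if cpu > factor * typical
    )


print(outlier_instances({"i-1": 30, "i-2": 32, "i-3": 90}))  # prints ['i-3']
print(outlier_instances({"i-1": 90, "i-2": 92, "i-3": 88}))  # prints []
```

An empty result alongside high readings everywhere suggests the load affects the whole deployment rather than one instance.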

Next steps

App settings

Regions