Scale behind a load balancer

Last updated: June 21, 2026

You followed Deploy a Next.js app with your AI assistant (or the Docker Compose recipe) and your app is live on one VM. Now it's getting more traffic than one box can handle, or you simply want redundancy so a single reboot doesn't take the site down.

The fix is to run two identical VMs behind a load balancer: a load balancer rule on your network's public IP spreads incoming requests across both backends, and either one can go down without the site going with it. This recipe hands that whole job to your AI assistant. With the American Cloud MCP server connected, it can inspect the VM you already have, clone it, replicate the app over SSH, wire up the load balancer, and verify both backends are healthy.

Why Claude Code for this

Any MCP client can create the infrastructure. But scaling out also means replicating your app setup on the new server over SSH. Claude Code combines the American Cloud tools with your terminal, so one session can provision the second VM and configure it without you switching tools. That's the setup this recipe assumes. Cursor and the other clients work too — you'll just run the SSH steps yourself when the assistant tells you to.

Everything here is a write operation. You'll need a read-write API key from console.americancloud.com/api-keys and the --allow-writes flag on the MCP server. See the overview for setup and safety details.

Before you start

  • One working VM running your app, created from the deploy recipe (you can SSH in, and the app serves on a local port behind nginx).
  • The SSH key from that first deploy — your assistant needs it to log into both the existing VM and the new one.
  • The MCP server connected to your assistant with a read-write key and --allow-writes.

The one prompt that does it

Open your project in Claude Code and paste this. Read the cost estimate it shows you before approving anything.

text
My app is live on one VM on American Cloud and I want to scale it out behind a
load balancer for capacity and redundancy.
 
Plan it out first, then walk through it step by step:
 
1. Find my existing VM, its region, package, size, image, and the network it's
   attached to. Identify the public IP on that network.
2. Show me a monthly cost estimate for a second identical VM before creating
   anything. Wait for me to confirm.
3. Create a second VM with the same package, size, and image, attached to the
   SAME network as the first, with my SSH key installed. Wait until it's
   fully running.
4. Over SSH, replicate the app setup from the first VM onto the second: same
   runtime, same build, same service. Confirm the app answers on its local
   port on the new VM.
5. List the port forwarding rules on the public IP. The first deploy created
   forwards for the app's public ports to VM 1 — remove those (and only
   those; leave the SSH forward alone), because a load balancer rule and a
   port forward can't share the same public port. Tell me before you remove
   anything.
6. Create a load balancer rule on the network's public IP that balances the
   app's public port across backends, then assign BOTH VMs to it. Do this
   immediately after step 5 so the port is only briefly unserved.
7. Verify both VMs are listed as backends on the rule, then confirm the site
   still serves correctly through the public IP.
 
My DNS already points at that public IP, so no DNS change should be needed —
tell me if that assumption is wrong.

What your assistant will do

Grounded in real MCP tools, here's the sequence:

  1. Inspect what you have. It calls list_vms to find your VM, then get_vm for the full detail — region, package, the size (vCPU and memory), the image label, and the network it's attached to. It calls list_public_ips_by_isolated_network (or list_public_ips) to find the public IP serving that network, the one your domain already resolves to.
  2. Price the second VM first. It calls get_cost_estimate_vm with the exact same region, package, size, and image as the original, and shows you the hourly and monthly numbers before creating anything. Two identical VMs cost roughly twice one — no surprises.
  3. Clone the box. On your go-ahead, create_vm provisions the second VM with the same vmPackage, vmSpecs, and image, and — critically — the same network UUID as the first, so both VMs share one private network and one public IP. It installs your SSH key via keypairs. It does not add inbound app-port rules here: the load balancer rule (next) is what exposes the app on the public IP. The VM provisions asynchronously, so the assistant polls get_vm until its status reaches STARTED.
  4. Replicate the app over SSH. Now in the terminal, it mirrors the first VM's setup onto the second — same runtime, same build, same systemd service listening on the same local port. If you wrote a deploy script in the first recipe, this is just running it against the new host.
  5. Clear the old path to VM 1. Your first deploy exposed the app with port forwarding: the public IP's app ports (80/443) currently forward straight to VM 1. A load balancer rule can't share a public port with a port forward, so the assistant calls list_port_forwarding_rules on the IP, shows you the forwards for the app ports, and — with your okay — removes them with delete_port_forwarding_rule. It leaves the SSH forward in place. (The firewall rules for 80/443 stay too — they're still what admits the traffic.)
  6. Create the load balancer rule right away. It calls create_load_balancer_rule on the public IP (ipId) with a publicPort and privatePort matching your app (for example "443" → "443", or "80" → the local port nginx listens on), an algorithm, and protocol tcp. The algorithm choices are roundrobin (even distribution, the usual default), leastconn (send each request to the backend with the fewest active connections, which suits long-lived connections), or source (pin each client IP to the same backend, a simple way to keep a user on one box). Steps 5 and 6 happen back-to-back, so the app port is unserved for seconds, not minutes.
  7. Assign both backends. It calls assign_vms_to_load_balancer with the rule ID and both VM UUIDs, so the rule fans traffic out to both.
  8. Verify. It calls list_load_balancer_instances to confirm both VMs are attached to the rule, and checks each is STARTED with get_vm. Then it confirms the site still serves through the public IP.

When it's done, requests hitting your public IP are spread across two VMs, and either one can be rebooted or fail without taking the site offline.
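To make the three algorithm choices from step 6 concrete, here is a minimal Python sketch of how each one picks a backend. It is an illustration only, not the load balancer's actual implementation: the function names, the backend labels, and the use of SHA-256 for the source hash are all assumptions made for the example.

```python
import hashlib
from itertools import cycle


def round_robin(backends):
    """Return an endless iterator that hands out backends in a fixed rotation."""
    return cycle(backends)


def least_conn(active):
    """Pick the backend with the fewest active connections.

    `active` maps a backend name to its current connection count.
    On a tie, the backend listed first in the mapping wins.
    """
    return min(active, key=active.get)


def source_hash(client_ip, backends):
    """Pin a client IP to one backend by hashing the address (illustrative hash)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

With two backends, round_robin alternates between them, least_conn favors whichever box is less busy at that moment, and source_hash returns the same backend every time for a given client IP, which is why it behaves like sticky sessions.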

A public port is served by either a port-forwarding rule or a load balancer rule — never both. Your single-VM deploy used forwards for the app ports, which is why step 5 removes them before the load balancer rule takes over. The same applies later: never add a forward back on a load-balanced port.
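The one-owner-per-port rule is easy to check mechanically. The sketch below assumes each rule is available as a dictionary with a publicPort field; that shape is a guess for illustration (publicPort is the parameter name used when creating a load balancer rule, but the real list responses may be structured differently).

```python
def conflicting_ports(forwards, lb_rules):
    """Return the public ports claimed by both a port forward and a load balancer rule.

    Each rule is assumed to be a dict with a "publicPort" key (hypothetical shape).
    Ports are compared as strings so "443" and 443 count as the same port.
    """
    forwarded = {str(rule["publicPort"]) for rule in forwards}
    balanced = {str(rule["publicPort"]) for rule in lb_rules}
    return sorted(forwarded & balanced, key=int)
```

An empty result means no public port has two owners. Anything else names a port where the forward has to go before the load balancer rule can serve it.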

Shared state: the part that breaks if you skip it

Two VMs behind a load balancer means a request can land on either box. Anything one VM remembers that the other doesn't will produce confusing, intermittent bugs — a file that uploaded fine but 404s on the next request, a user who's logged in on one page and logged out on the next. Before you send real traffic to two backends, move shared state off the individual boxes. Your assistant can do each of these in one prompt.

Uploads go to object storage

With one VM, user uploads could live on its local disk. With two, an upload that lands on VM A isn't on VM B — so half your requests can't find it. Move file uploads to object storage, which both VMs read and write over the network:

text
My app stores user uploads on the VM's local disk, but now there are two VMs
behind the load balancer. Set up an American Cloud object storage bucket for
uploads, give me the credentials, and update the app on BOTH VMs to read and
write uploads to the bucket instead of local disk. Migrate any existing files
on the first VM into the bucket.

The database moves to its own VM on the private network

A database that lives on one of the app VMs is only reachable by that VM, and it competes with the app for resources. Give it its own VM on the same private network, reachable by both app VMs over the internal network and exposed to nothing public:

text
Right now PostgreSQL runs on my first app VM. Create a separate small VM on the
SAME private network for the database, install PostgreSQL on it, and migrate my
data over. Configure it to listen only on the private network so both app VMs
can reach it, but it isn't exposed to the internet. Update both app VMs' config
to point at the database VM's private address, and restart the app on each.

This keeps the database private — it's reachable across the internal network by the app VMs, and there's no public rule pointing at it.
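A small guard can catch a database host that accidentally points at a public address. This sketch assumes the private network hands out RFC 1918 addresses, which may not match your network's actual range, and the function name is hypothetical.

```python
import ipaddress

# RFC 1918 private ranges. Assumption: the private network uses one of these.
PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private_lan_address(host):
    """Return True only if `host` is a literal IP inside an RFC 1918 range.

    Hostnames, loopback, and public addresses all return False.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP address (for example, a hostname)
    return any(addr in net for net in PRIVATE_RANGES)
```

Running a check like this on the database host in each app VM's config before restarting the service catches a stray public address or a leftover 127.0.0.1 from the single-VM setup.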

Sessions become stateless or shared

If your app keeps login sessions in the memory of a single process, a user bounced to the other VM appears logged out. Make sessions survive the hop:

text
My app stores login sessions in memory on a single VM. Behind the load balancer
that logs users out when they hit the other box. Switch sessions to either
signed cookies (stateless) or a shared session store on the database VM, so a
user stays logged in no matter which backend serves the request. Apply the
change to both VMs.

The source algorithm (sticky sessions by client IP) can paper over in-memory session state, but it isn't a substitute for the fixes above — a client's IP can change, and pinning traffic undermines even balancing. Use it as a convenience, not as your state strategy.
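For the signed-cookie option, the core idea fits in a few lines: the server signs the session data with a secret that both VMs share, so either backend can verify a cookie the other one issued. The sketch below uses only the Python standard library. The function names and cookie format are made up for illustration, it has no expiry handling, and in a real app your framework's session library is the better choice.

```python
import base64
import hashlib
import hmac
import json


def sign_session(data, secret):
    """Serialize `data` and append an HMAC-SHA256 tag computed with `secret`."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode("utf-8"))
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    return (payload + b"." + tag).decode("ascii")


def verify_session(cookie, secret):
    """Return the session data if the tag is valid, otherwise None."""
    try:
        payload, tag = cookie.encode("ascii").rsplit(b".", 1)
    except (UnicodeEncodeError, ValueError):
        return None  # malformed cookie
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(tag, expected):
        return None  # tampered with, or signed with a different secret
    return json.loads(base64.urlsafe_b64decode(payload))
```

The design point is that no VM stores anything: as long as both backends are configured with the same secret, a cookie signed on one verifies on the other, and a cookie modified by the client fails verification on both.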

Rolling deploys with two backends

Two VMs also unlock rolling deploys: take one backend out of rotation, update it, put it back, then repeat on the other. The load balancer keeps serving from whichever VM is still in. (Requests already in flight to a VM at the moment it's removed can be cut — for typical short web requests that passes unnoticed, but it's worth knowing.)

text
Do a rolling deploy of the latest committed code:
 
1. Remove the first VM from the load balancer rule so it stops getting traffic.
   Confirm the second VM is still serving.
2. Deploy the latest code to the first VM and restart its service. Check it
   responds correctly on its local port.
3. Re-assign the first VM to the rule.
4. Repeat the same steps for the second VM.
 
At no point should both VMs be out of rotation at the same time.

Under the hood the assistant uses remove_vms_from_load_balancer to take a VM out of rotation, deploys to it, then assign_vms_to_load_balancer to return it — one VM at a time, so there's always a live backend.
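The ordering and the "never both out at once" invariant can be sketched as a small loop. The four callables here are hypothetical stand-ins: remove and assign would wrap the remove_vms_from_load_balancer and assign_vms_to_load_balancer tools, while deploy and health_check stand for whatever you run over SSH.

```python
def rolling_deploy(backends, remove, deploy, health_check, assign):
    """Update backends one at a time, never emptying the rotation.

    All four callables take a single VM identifier. They are placeholders
    for the real MCP tool calls and SSH steps.
    """
    in_rotation = set(backends)
    for vm in backends:
        if len(in_rotation) < 2:
            raise RuntimeError(f"refusing to remove {vm}: it is the last backend in rotation")
        remove(vm)
        in_rotation.discard(vm)
        deploy(vm)
        if not health_check(vm):
            # Stop here: the remaining backend keeps serving the old version.
            raise RuntimeError(f"{vm} failed its health check; left out of rotation")
        assign(vm)
        in_rotation.add(vm)
```

If the first VM fails its check after the deploy, the loop stops before touching the second one, so the site keeps serving from the backend that still runs the previous version.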

Triage: is traffic balanced and are both backends healthy?

When something looks off, hand the assistant this:

text
Is traffic actually balanced and are both backends healthy? Check that:
- The load balancer rule on my public IP is ACTIVE.
- Both VMs are assigned to the rule and both are STARTED.
- Each VM's app service is up and answering on its local port over SSH.
- The rule's algorithm, public port, and private port match what the app
  listens on.
Tell me which backend, if any, is the problem.

It works through list_load_balancer_rules and list_load_balancer_instances for the rule and its backends, get_vm for each VM's power state, and an SSH check of each app service. A rule that won't go ACTIVE usually means no healthy VM is attached, or the ports don't match what the app is listening on.
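The checks in that prompt reduce to a short decision function. The sketch below takes the gathered facts as plain values. How you collect them (tool calls, SSH) is left out, and the parameter names are hypothetical.

```python
def triage(rule_state, assigned, vm_states, app_ok, expected_backends):
    """Return a list of human-readable problems; an empty list means all checks passed.

    rule_state: state string of the load balancer rule, e.g. "ACTIVE"
    assigned: collection of VM names currently attached to the rule
    vm_states: dict of VM name -> power state, e.g. "STARTED"
    app_ok: dict of VM name -> True if the app answered on its local port
    expected_backends: VM names that should be serving traffic
    """
    problems = []
    if rule_state != "ACTIVE":
        problems.append(f"rule is {rule_state}, not ACTIVE")
    for vm in expected_backends:
        if vm not in assigned:
            problems.append(f"{vm} is not assigned to the rule")
        if vm_states.get(vm) != "STARTED":
            problems.append(f"{vm} is {vm_states.get(vm, 'unknown')}, not STARTED")
        if not app_ok.get(vm, False):
            problems.append(f"{vm} app is not answering on its local port")
    return problems
```

Because every message names the VM it refers to, the output points straight at the backend to look at first.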

Reaching individual VMs directly

Traffic on the load-balanced port flows through the rule on the public IP, so you don't open that port per VM. But you'll still want to SSH into each box. The first VM already has port 22 reachable from the deploy recipe. For the second, your assistant can add a port-forwarding rule on the shared public IP with a distinct public port (for example, forward public port 2202 → private port 22 on the second VM), since both VMs sit behind one IP. Ask it to "give me SSH access to the second VM on the shared public IP using a separate public port, locked to my IP address." It creates the forward with create_port_forwarding_rule and adds a matching firewall rule restricted to your source CIDR rather than open to the world.

When to step up to Kubernetes instead

Two VMs behind a load balancer is the right shape for a steady workload you scale by hand: predictable traffic, a deploy you trigger yourself, a handful of backends you can reason about individually. It's simple, and you own every box.

If your workload is spikier — bursty traffic you want to absorb automatically, many small services, rolling deploys and self-healing as a default rather than a script you run — that's the shape Kubernetes is built for. American Cloud runs managed Kubernetes, and your assistant can provision and operate a cluster with the same MCP tools. Move to it when you find yourself wanting the cluster to add and replace backends on its own.

Next steps