TopInsight .co

Building a 3-node Proxmox HA cluster: the homelab high-availability path

A 3-node Proxmox cluster with HA is achievable for under $1,500 in 2025. Here is the working build — hardware picks, network setup, and the gotchas.

Charles Lin

A 3-node Proxmox cluster with high availability — meaning a VM keeps running even if one node fails — used to require enterprise hardware and a serious budget. In 2025 it’s achievable for under $1,500 with mini PCs.

This is the working build from my own 3-node cluster running through 2024–2025, plus the gotchas that the docs don’t spell out.

Why bother with HA in a homelab

Real reasons:

  • Run services your family / household depends on (Home Assistant, Plex, etc.) — node failure shouldn’t kill the dashboard
  • Run a self-hosted alternative to a SaaS (Nextcloud, Vaultwarden) — uptime parity with the SaaS you’re replacing
  • Learn the patterns that matter in real production environments

Non-reasons:

  • “It’s cool” (this is fine but be honest)
  • Workloads that don’t need HA (Plex isn’t critical; let it have a 30-minute outage during a reboot)

If your honest assessment is that nothing you run needs HA, build a single Proxmox node and skip this guide.

The hardware

For a 3-node mini-PC cluster (the realistic 2025 homelab build):

  • 3× mini PCs: Beelink S12 Pro, Minisforum UM790, or similar, at ~$400 each. Aim for 4-6 CPU cores, 32GB RAM, and dual M.2 NVMe slots; check the exact model’s RAM ceiling and slot count before buying.
  • Network switch — managed gigabit at minimum (Mikrotik CRS305, Unifi USW-Lite-8). 2.5GbE preferred for cluster traffic.
  • Two physical networks recommended: one for client traffic, one for cluster heartbeat + replication. Can share one switch with VLANs.
  • Total: ~$1,200-1,500

Per-node storage layout:

  • 1× M.2 NVMe (256GB minimum) for Proxmox OS — install with ZFS root mirror if you have two slots
  • 1× M.2 NVMe (1TB+) for VM/LXC storage as part of a ZFS pool
  • Shared storage: either Ceph across the three nodes (HA-friendly) or a separate NAS exporting NFS

The cluster runs fine without shared storage, but live migration and HA work better with it: without shared or replicated storage, a VM’s disk exists only on its home node.

The cluster setup

The official Proxmox docs cover the mechanics. Here are the non-obvious bits:

Hostnames and DNS: every node must resolve every other node by hostname BEFORE you initialise the cluster. Add to /etc/hosts on all three:

192.168.10.1 pve1
192.168.10.2 pve2
192.168.10.3 pve3

Skipping this is the #1 source of “cluster won’t form” issues.
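A quick way to catch a missing or mistyped entry before forming the cluster is to compare each node’s hosts file against the table you expect. The sketch below is plain Python, not anything Proxmox ships. The function names (parse_hosts, check_hosts) are made up for illustration, and the expected addresses are simply the three from the example above.

```python
# Expected node table, copied from the /etc/hosts example in this guide.
NODES = {
    "pve1": "192.168.10.1",
    "pve2": "192.168.10.2",
    "pve3": "192.168.10.3",
}


def parse_hosts(text):
    """Map hostname -> IP from hosts-file text, ignoring comments and blank lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # The first entry for a name wins, as it does for the system resolver.
            table.setdefault(name, ip)
    return table


def check_hosts(text, expected=NODES):
    """Return a list of problems; an empty list means every node maps to its expected IP."""
    table = parse_hosts(text)
    problems = []
    for name, ip in expected.items():
        found = table.get(name)
        if found is None:
            problems.append(f"{name}: missing")
        elif found != ip:
            problems.append(f"{name}: {found}, expected {ip}")
    return problems
```

On each node, something like print(check_hosts(open("/etc/hosts").read())) should print an empty list before you go any further.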

Cluster network: when you run pvecm create on pve1, point it at the dedicated cluster network if you have one (the --link0 option takes the node’s address on that network). The cluster heartbeat (Corosync) is sensitive to latency, and sharing a network with client traffic can produce false-positive node-failure detections.

Two-node clusters are a trap: with only 2 nodes, when one goes down the survivor holds 1 of 2 votes, loses quorum, and refuses to do HA failover. Always have 3. If you can’t afford 3, run a QDevice on a Raspberry Pi as the tiebreaker vote.
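The arithmetic behind the two-node trap is majority voting. Here is a minimal sketch, assuming the default of one vote per node and one vote for the QDevice; the helper names are illustrative only.

```python
def votes_needed(total_votes):
    """Strict majority of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes, votes_online):
    """True if the members still online hold a majority."""
    return votes_online >= votes_needed(total_votes)


# (description, total configured votes, votes still online after one node fails)
scenarios = [
    ("2 nodes, one down", 2, 1),
    ("3 nodes, one down", 3, 2),
    ("2 nodes + QDevice, one node down", 3, 2),
]

for label, total, online in scenarios:
    print(f"{label}: need {votes_needed(total)}, have {online}, "
          f"quorum={has_quorum(total, online)}")
```

Only the first scenario loses quorum, which is why the third vote matters whether it comes from a real node or a QDevice.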

High availability config

Once the cluster is formed, HA is configured per VM:

  1. In the web UI: Datacenter → HA → Resources → Add
  2. Select the VM, set the group (defaults are fine), set restart count
  3. Make sure the VM’s storage is on a shared backend (Ceph or NFS) or replicated (ZFS replication every X minutes; with replication, a failover loses any writes made since the last sync)

The HA manager will automatically restart the VM on another node if its current node fails.
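The same registration can be scripted with the ha-manager CLI instead of the web UI. The sketch below only builds the command line; the helper name ha_add_command and VM ID 100 are hypothetical, and the restart and relocate counts of 1 are placeholders, not recommendations.

```python
import shlex


def ha_add_command(vmid, max_restart=1, max_relocate=1):
    """Build the ha-manager argv that registers a VM as an HA resource."""
    return [
        "ha-manager", "add", f"vm:{vmid}",
        "--state", "started",                # keep the VM running
        "--max_restart", str(max_restart),   # restart attempts on the same node
        "--max_relocate", str(max_relocate), # attempts to move to another node
    ]


cmd = ha_add_command(100)  # 100 is a hypothetical VM ID
print(shlex.join(cmd))
```

To apply it, you would run the printed command as root on a cluster node, or pass cmd to subprocess.run(cmd, check=True).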

The honest caveat: even with HA, failover typically takes 60-120 seconds, because the cluster has to detect the failure, fence the dead node, and restart the VM elsewhere. For “no downtime ever” you need clustered services within the VM (HAProxy, Kubernetes, etc.). HA means “minutes of downtime,” not “zero downtime.”

What can go wrong (and how to recover)

After a year, the failure modes I’ve actually hit:

Network partition (one node loses network briefly): the cluster may fence the node as if it were dead, even though it’s fine. Fix: longer Corosync token timeouts and a dedicated cluster network.

Disk failure on a node: VMs on that node’s local storage are stuck. If you used replicated ZFS or shared storage, HA restarts them on another node. If you used local-only storage for a VM, that VM is down until you fix the disk.

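Quorum loss after a node reboot: rare, but if Corosync state gets confused, running pvecm expected 2 on the surviving nodes lowers the expected vote count and can temporarily restore quorum so you can recover. Treat it as a recovery step, not a permanent setting.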

Update timing: don’t update all three nodes at once. Update one, verify cluster is healthy, then the next.
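One way to make “verify cluster is healthy” concrete is to check the Quorate field that pvecm status prints before moving on to the next node. This is a rough sketch, not a full health check: the function names are illustrative, and it assumes the field appears on a line of the form “Quorate: Yes”.

```python
import re
import subprocess


def is_quorate(status_text):
    """True if the given pvecm status output reports 'Quorate: Yes'."""
    match = re.search(r"^Quorate:\s*(\S+)", status_text, re.MULTILINE)
    return bool(match) and match.group(1).lower() == "yes"


def cluster_is_quorate():
    """Run pvecm status on this node and check the Quorate field."""
    result = subprocess.run(
        ["pvecm", "status"], capture_output=True, text=True, check=True
    )
    return is_quorate(result.stdout)
```

A rolling update would call cluster_is_quorate() after each node comes back, and stop if it returns False. It says nothing about VM or storage health, so check those separately.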

The bigger pattern

A 3-node Proxmox cluster is the homelab equivalent of “small business private cloud.” Run 6-10 VMs / LXCs total, spread across the nodes for load distribution, configure HA on the 2-3 that matter, accept the rest as “lower priority.”

The cost — about $1,500 — is comparable to a single decent enterprise server but the operational properties are dramatically better. Live migration alone (moving a VM from one node to another with no downtime) is worth the setup work for the “update without losing services” workflow.

For the broader hypervisor context, see our Proxmox VE review. For the storage architecture, see the ZFS pool design guide. For the quiet build that pairs with this cluster, see Building a quiet homelab server.

Sources

Every reference behind this piece. If we make a claim, it's because at least one of these said so — or we lived it ourselves.

  1. Firsthand: built and operated a 3-node Proxmox cluster for over a year
  2. Docs: Proxmox VE cluster documentation (Proxmox)
  3. Blog: cluster setup discussions (r/homelab)
  4. YouTube: Proxmox cluster tutorials (Lawrence Systems and Techno Tim)