PostgreSQL High Availability: Failover, Replication, and Backups Compared

Data-Specs

June 17, 2026

7 min read

A practical comparison of PostgreSQL high-availability building blocks, from streaming replicas and failover to PITR backups and logical replication.

ArmorDB Engineering

PostgreSQL, High Availability, Replication

High availability is often discussed as if it were one feature. In PostgreSQL, it is really a set of separate controls: replication to keep another server close to current, failover to promote that server, backups to recover from human error or storage loss, and restore testing to prove the plan works.

The practical question is not whether a database is highly available in the abstract. The question is which failure you are trying to survive, how much data you can afford to lose, how quickly the application must recover, and who is responsible for operating the moving parts at 03:00. A founder running a small SaaS does not need the same design as a payments platform, but both need to understand the difference between a replica and a backup.

Start with the failure mode

PostgreSQL gives you several reliable primitives, but each one protects against a different kind of incident. A hot standby can take read traffic and can be promoted after the primary fails. Continuous archiving and point-in-time recovery help restore the database to a chosen moment after corruption, a bad migration, or an accidental delete. Logical replication can copy selected tables or support low-downtime moves, but it is not a full physical copy of the cluster.

That distinction matters because the most expensive outages are often not simple server crashes. A replica faithfully replays bad writes along with good ones. If a migration drops the wrong table and the change is replayed, the standby now has the same problem. Backups and archived WAL are the safety net for those cases, while failover is the response to primary unavailability.

Comparison: which PostgreSQL HA tool solves which problem?

Building block	Best for	What it does not solve	Operational note
Streaming physical replica	Fast recovery from primary host failure and read scaling	Accidental writes, schema mistakes, or corruption already replicated	Monitor replication lag and promotion readiness, not just whether the process is running
Automated failover	Reducing manual recovery time when the primary disappears	Application retry logic, split-brain prevention by itself, or bad data recovery	Requires clear leadership, fencing, and connection endpoint changes
Continuous archiving plus PITR	Restoring to a clean point before a mistake or outage	Instant failover with no restore time	Archive every required WAL segment and regularly test restores
Logical replication	Selective table movement, online migrations, cross-version migration patterns	Full cluster HA, schema (DDL) changes, or sequence values, which are not replicated by default	Track replication slots, apply lag, and unsupported object changes
Dumps and scheduled backups	Portable recovery, small databases, and human-readable exports	Tight RPO on busy systems or rapid failover	Good complement to PITR, but slow for large production databases

The table is intentionally not a ranking. A healthy production posture usually combines at least two rows. A managed PostgreSQL service may hide much of the machinery, but the application owner still needs to choose the right recovery objective and test the path back to service.

Replication improves availability, but it is not a backup

PostgreSQL streaming replication ships WAL records from a primary to one or more standby servers. With hot standby enabled, replicas can serve read-only queries while they continue recovery. This is the usual foundation for read replicas and failover targets because the standby is already a PostgreSQL cluster in recovery, not a dump waiting to be restored.

The tradeoff is that replication optimizes for keeping another copy current. That is exactly what you want when a VM, disk, or availability zone fails. It is not what you want when the primary accepts a damaging transaction. If the transaction is durable on the primary and replayed by the standby, failover gives you a highly available copy of the same bad state.

Replication lag is the number to watch. Low lag means a promoted standby should lose little or no acknowledged data in asynchronous setups. High lag means the standby may be minutes behind, which can turn a failover into a visible data-loss event. Synchronous replication can reduce that risk, but it makes commit latency and primary availability depend on the synchronous standby configuration. For many SaaS workloads, asynchronous replication with an honestly stated RPO is simpler to operate, and it keeps the primary accepting writes when a standby is slow or unreachable.
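Lag can be measured in several ways. One common approach on the primary is to compare `pg_current_wal_lsn()` with the `replay_lsn` column of `pg_stat_replication`; both report LSNs as text in the form `high/low` hexadecimal. The Python sketch below assumes you have already fetched those two strings and shows only the arithmetic, which mirrors what `pg_wal_lsn_diff()` computes server-side. The warning and critical thresholds are hypothetical: bytes of WAL do not map directly to seconds of lost data, so tune them against your own RPO.

```python
# Sketch: turn PostgreSQL text LSNs into a byte lag and a coarse status.
# The warning/critical thresholds below are hypothetical placeholders.


def parse_lsn(lsn: str) -> int:
    """Convert a text LSN such as '16/B374D848' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replay_lsn: str) -> int:
    """WAL bytes generated on the primary but not yet replayed on the standby."""
    return max(parse_lsn(primary_lsn) - parse_lsn(replay_lsn), 0)


def lag_status(
    primary_lsn: str,
    replay_lsn: str,
    warn_bytes: int = 16 * 1024 * 1024,   # hypothetical: 16 MiB
    crit_bytes: int = 256 * 1024 * 1024,  # hypothetical: 256 MiB
) -> str:
    """Classify standby lag as 'ok', 'warning', or 'critical'."""
    lag = lag_bytes(primary_lsn, replay_lsn)
    if lag >= crit_bytes:
        return "critical"
    if lag >= warn_bytes:
        return "warning"
    return "ok"
```

For example, a primary at `16/B374D848` and a standby that has replayed up to `16/B3000000` differ by about 7.3 MiB, which this sketch reports as `ok`.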

Failover is a process, not just a promote command

Promoting a PostgreSQL standby is technically straightforward. Operating a safe failover is harder. The platform must decide that the primary is truly unavailable, prevent two writable primaries from accepting traffic, promote the correct standby, update connection routing, and allow applications to reconnect cleanly.

This is where managed PostgreSQL can remove a large burden. The hard parts are less about SQL syntax and more about orchestration, monitoring, and tested runbooks. If you self-host, the failover design should name the decision maker, describe how the old primary is fenced, define what happens to replicas that followed the old timeline, and document how applications find the new writer.
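The ordering of those steps matters more than any single command, so it helps to write them down with explicit guards. This Python outline is a sketch under stated assumptions: the decision that the primary is truly down has already been made elsewhere, and every hook (`fence_old_primary`, `promote`, `point_writer_endpoint`) is hypothetical. In practice `promote` might call PostgreSQL's `pg_promote()` function on the chosen standby. The sketch does not re-point the remaining replicas to the new timeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a text LSN such as '0/5000000' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Standby:
    name: str
    healthy: bool
    replay_lsn: str  # last WAL position replayed, as reported by the standby


def choose_candidate(standbys: List[Standby]) -> Optional[Standby]:
    """Prefer the healthy standby that has replayed the most WAL (least data loss)."""
    healthy = [s for s in standbys if s.healthy]
    if not healthy:
        return None
    return max(healthy, key=lambda s: parse_lsn(s.replay_lsn))


def run_failover(
    standbys: List[Standby],
    fence_old_primary: Callable[[], bool],         # hypothetical: True only if the old primary can no longer accept writes
    promote: Callable[[str], None],                # hypothetical: e.g. runs pg_promote() on the chosen node
    point_writer_endpoint: Callable[[str], None],  # hypothetical: updates DNS, proxy, or pooler target
) -> str:
    # Fence first: promoting while the old primary may still accept writes risks split brain.
    if not fence_old_primary():
        raise RuntimeError("old primary not confirmed fenced; refusing to promote")
    candidate = choose_candidate(standbys)
    if candidate is None:
        raise RuntimeError("no healthy standby available to promote")
    promote(candidate.name)
    # Move application traffic only after the promote step has returned.
    point_writer_endpoint(candidate.name)
    return candidate.name
```

The refusal to promote without confirmed fencing is deliberate: a slower failover is usually cheaper than two writable primaries.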

Application behavior is part of the design. Short transactions, idempotent job processing, reasonable client timeouts, and retry logic make failover survivable. Long-running transactions, sticky connections, and pools that never refresh DNS can make a clean database promotion still look like an application outage. If you use PgBouncer, confirm how your app reconnects through the pooler during writer changes; ArmorDB documents its connection approach in the PgBouncer guide at /docs/pgbouncer.
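The reconnect side can stay small. This Python sketch retries only the connection step, using capped exponential backoff with jitter so that many clients do not hammer the new writer at the same instant. The `connect` callable is a hypothetical wrapper around your driver, and catching Python's built-in `ConnectionError` is an assumption: real drivers raise their own types (for example, `psycopg.OperationalError`), so adapt the `except` clause.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry opening a connection with capped exponential backoff and full jitter.

    `connect` is a hypothetical zero-argument wrapper around a driver's connect
    call. Only the connection attempt is retried; re-running a transaction is
    safe only if that transaction is idempotent.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:  # swap in your driver's connection error type
            last_error = exc
            if attempt == attempts - 1:
                break
            # Cap the exponential delay, then sleep a random fraction of it so
            # clients reconnecting to a new writer do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise ConnectionError(f"could not connect after {attempts} attempts") from last_error
```

The `sleep` parameter exists so the backoff can be exercised in tests without real waiting.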

PITR is the recovery path for mistakes

Point-in-time recovery combines a base backup with archived WAL so PostgreSQL can replay changes until a chosen recovery target. It is the feature you reach for after a destructive deployment, an accidental delete, or a data import that overwrote good rows. The goal is not merely having a backup file. The goal is being able to restore a consistent cluster to a specific time and then verify that the restored data is the one you want.
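Two decisions drive a PITR restore: which base backup to start from, and where replay should stop. The Python sketch below picks the newest base backup that finished before the recovery target, then renders the recovery parameters. `restore_command`, `recovery_target_time`, and `recovery_target_action` are real PostgreSQL settings; since version 12 they live in `postgresql.conf`, together with a `recovery.signal` file in the data directory. The archive path and timestamps are hypothetical. Note that `pause` holds recovery open for inspection only when `hot_standby` is on; otherwise it behaves like `shutdown`.

```python
from datetime import datetime, timezone
from typing import List


def pick_base_backup(backup_end_times: List[datetime], target: datetime) -> datetime:
    """Choose the newest base backup that finished before the recovery target.

    Starting from the newest eligible backup minimizes the WAL that must be replayed.
    """
    eligible = [t for t in backup_end_times if t <= target]
    if not eligible:
        raise ValueError("no base backup finished before the recovery target")
    return max(eligible)


def recovery_settings(target: datetime, restore_command: str) -> str:
    """Render recovery parameters in postgresql.conf syntax (PostgreSQL 12+)."""
    settings = {
        "restore_command": restore_command,
        "recovery_target_time": target.isoformat(sep=" "),
        # Pause at the target so the restored data can be checked before promotion.
        "recovery_target_action": "pause",
    }
    # Single quotes inside a postgresql.conf string value are escaped by doubling them.
    return "\n".join(
        "{} = '{}'".format(name, value.replace("'", "''")) for name, value in settings.items()
    )


if __name__ == "__main__":
    # Hypothetical backup end times and incident time, all in UTC.
    backups = [
        datetime(2026, 6, 16, 1, 0, tzinfo=timezone.utc),
        datetime(2026, 6, 17, 1, 0, tzinfo=timezone.utc),
    ]
    target = datetime(2026, 6, 17, 9, 30, tzinfo=timezone.utc)
    print("restore from base backup ending", pick_base_backup(backups, target))
    print(recovery_settings(target, "cp /mnt/server/archivedir/%f %p"))
```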

A useful backup policy has two numbers: RPO and RTO. RPO is how much data you can lose; RTO is how long recovery can take. Daily logical dumps may be enough for a prototype, but a production application with active writes usually needs continuous WAL archiving or a managed equivalent. The RTO is shaped by database size, restore bandwidth, replay volume, and the time needed to validate the restored system before redirecting traffic.
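Those RTO factors can be turned into a back-of-the-envelope estimate before any incident happens. This Python sketch is a deliberately rough model: it assumes the base backup copy, WAL replay, and validation run one after another at constant throughput, and it ignores provisioning and traffic cutover. Every input is something you would measure for your own system.

```python
def estimate_rto_minutes(
    db_size_gb: float,
    restore_mb_per_s: float,
    wal_to_replay_gb: float,
    replay_mb_per_s: float,
    validation_minutes: float,
) -> float:
    """Rough RTO model: copy the base backup, replay WAL, then validate.

    Assumes sequential phases at constant throughput (1 GB = 1024 MB here) and
    excludes provisioning and cutover, so check the result against a real restore.
    """
    restore_s = db_size_gb * 1024 / restore_mb_per_s
    replay_s = wal_to_replay_gb * 1024 / replay_mb_per_s
    return (restore_s + replay_s) / 60 + validation_minutes
```

With hypothetical inputs of a 500 GB database restored at 200 MB/s (2,560 s), 50 GB of WAL replayed at 100 MB/s (512 s), and 30 minutes of validation, the estimate is about 81 minutes. Against a stated one-hour RTO, that gap surfaces on paper rather than during an outage.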

The most common backup failure is discovering during an incident that the restore was never tested. A backup dashboard is not proof. A periodic restore into a separate environment is proof. ArmorDB users should also review the backup behavior documented at /docs/backups and match the plan to the product risk, not only to database size.

Logical replication fits migrations more than classic HA

Logical replication publishes changes from selected tables and applies them to subscribers. It is extremely useful for staged migrations, selective data movement, some reporting patterns, and version transitions where physical replication is not the right fit. Because it operates at a logical level, it can be more flexible than byte-for-byte physical replication.

That flexibility has edges. Logical replication does not automatically make every database object behave like a full cluster copy. Schema changes, sequences, large objects, permissions, and extension behavior need deliberate handling. Replication slots can also retain WAL if subscribers fall behind, so monitoring apply lag and slot retention is part of the operating model.
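One way to watch slot retention, sketched in Python: compare each slot's `restart_lsn` from the `pg_replication_slots` view with `pg_current_wal_lsn()` on the publisher, and flag slots that pin more WAL than a budget you choose. The budget is an assumption. PostgreSQL 13 and later can also cap retention server-side with `max_slot_wal_keep_size`, at the cost of invalidating a slot that falls too far behind.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a text LSN such as '0/FF000000' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slots_over_limit(
    current_lsn: str,
    restart_lsns: Dict[str, Optional[str]],
    limit_bytes: int,
) -> Dict[str, int]:
    """Return {slot_name: retained_bytes} for slots holding back more WAL than allowed.

    restart_lsns maps slot_name -> restart_lsn as read from pg_replication_slots.
    A NULL restart_lsn (None here) is skipped: that slot is not reserving WAL,
    though an invalidated slot still deserves attention separately.
    """
    current = parse_lsn(current_lsn)
    over: Dict[str, int] = {}
    for name, restart_lsn in restart_lsns.items():
        if restart_lsn is None:
            continue
        retained = current - parse_lsn(restart_lsn)
        if retained > limit_bytes:
            over[name] = retained
    return over
```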

For a SaaS migration, logical replication often works best as a bridge: take an initial copy, replicate ongoing changes, validate counts and critical queries, pause writes briefly if needed, and cut the application over. For HA, it is usually a complement rather than the primary answer.
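The validation step before cutover can be expressed as a small gate. This Python sketch assumes you have already collected per-table row counts from both sides (for example with `SELECT count(*)` on each critical table) and an apply-lag figure in bytes; the table names in any real use are your own. Because sequence values are not replicated by default, advancing sequences on the target (for instance with `setval`) remains a separate cutover step that this check does not cover.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CutoverCheck:
    ready: bool
    reasons: List[str] = field(default_factory=list)


def cutover_ready(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
    apply_lag_bytes: int,
    max_lag_bytes: int = 0,
) -> CutoverCheck:
    """Gate the final cutover on matching row counts and a drained apply queue.

    Counts are only meaningful once writes to the source are paused; otherwise
    they can change between the two reads.
    """
    reasons: List[str] = []
    for table, expected in sorted(source_counts.items()):
        actual = target_counts.get(table)
        if actual is None:
            reasons.append(f"{table}: missing on target")
        elif actual != expected:
            reasons.append(f"{table}: source={expected} target={actual}")
    if apply_lag_bytes > max_lag_bytes:
        reasons.append(f"subscriber still {apply_lag_bytes} bytes behind")
    return CutoverCheck(ready=not reasons, reasons=reasons)
```

Returning the reasons, not just a boolean, keeps the cutover decision reviewable when someone asks later why it was delayed.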

A practical decision guide

For an early product, start with managed backups, a clear restore procedure, and connection behavior that can tolerate restarts. That gives you a real recovery path without prematurely operating a distributed database system. As the product becomes production-critical, add a standby or managed failover option so an infrastructure failure does not require restoring from scratch.

For read-heavy workloads, read replicas can reduce pressure on the primary, but they should not become a hidden consistency problem. Route only queries that can tolerate lag. User-facing flows that read immediately after writing usually belong on the primary unless the application explicitly handles stale reads.
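A minimal routing sketch in Python, under stated assumptions: the caller labels each query as lag-tolerant or not, and supplies a current replica lag in seconds (on PostgreSQL this might come from the `replay_lag` column of `pg_stat_replication` on the primary). The class, thresholds, and time-window rule are hypothetical. The window is a heuristic that narrows the stale-read problem after a write; it does not guarantee consistency.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Decide whether a read may go to a replica.

    Reads go to the primary when the query cannot tolerate lag, when the user
    wrote recently (a read-your-writes heuristic), or when the replica is
    further behind than allowed.
    """

    def __init__(
        self,
        max_replica_lag_s: float = 5.0,           # hypothetical lag budget
        read_your_writes_window_s: float = 10.0,  # hypothetical stickiness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_replica_lag_s = max_replica_lag_s
        self.window_s = read_your_writes_window_s
        self.clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        """Remember when this user last wrote, to keep their next reads on the primary."""
        self._last_write[user_id] = self.clock()

    def route(self, user_id: str, tolerates_lag: bool, replica_lag_s: float) -> str:
        if not tolerates_lag:
            return "primary"
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window_s:
            return "primary"
        if replica_lag_s > self.max_replica_lag_s:
            return "primary"
        return "replica"
```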

For regulated or high-value data, make recovery drills part of normal operations. Record the time to restore, the exact recovery target used, and the validation queries that proved the restore was usable. A quarterly restore drill is more valuable than a long policy document that nobody has executed.
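Those three items fit in a small record that can be compared across drills. The Python sketch below is one hypothetical shape for it; the field names and the pass rule (every validation succeeded and the restore finished within the RTO) are assumptions to adapt, not a standard format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class RestoreDrill:
    """One restore drill: what was targeted, how long it took, and what was checked."""

    recovery_target: str  # e.g. the recovery_target_time value that was used
    started_at: datetime
    finished_at: datetime
    # Maps each validation query (as text) to whether its result matched expectations.
    validations: Dict[str, bool] = field(default_factory=dict)

    @property
    def restore_time(self) -> timedelta:
        return self.finished_at - self.started_at

    def passed(self, rto: timedelta) -> bool:
        # A drill with no validation queries proves nothing, so it does not pass.
        return bool(self.validations) and all(self.validations.values()) and self.restore_time <= rto
```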

Takeaway

PostgreSQL high availability is strongest when each tool has a clear job. Replication and failover keep the service running through infrastructure failures. PITR and backups recover from bad data states. Logical replication helps with selective movement and low-downtime changes. The right design is the smallest combination that meets your RPO, RTO, operational skill, and budget.

If you are choosing between running this yourself and using a managed service, compare the full operating responsibility, not only the monthly database price. ArmorDB pricing is intentionally simple at /pricing, but the more important decision is whether your team wants to own backups, restore drills, pooler behavior, monitoring, and failover orchestration directly.

Sources and further reading

PostgreSQL documentation: High Availability, Load Balancing, and Replication, https://www.postgresql.org/docs/current/high-availability.html
PostgreSQL documentation: Log-Shipping Standby Servers, https://www.postgresql.org/docs/current/warm-standby.html
PostgreSQL documentation: Continuous Archiving and Point-in-Time Recovery, https://www.postgresql.org/docs/current/continuous-archiving.html
PostgreSQL documentation: Logical Replication, https://www.postgresql.org/docs/current/logical-replication.html

About the author

ArmorDB Engineering writes about PostgreSQL operations, security, and infrastructure decisions for teams building production apps on ArmorDB.
