How to configure PostgreSQL HA with Patroni?

PostgreSQL is widely used in modern applications to store data efficiently and securely. Being open source, PostgreSQL offers the flexibility to modify and extend its feature set, unlike traditional enterprise database systems such as Oracle or DB2.

As an open source distribution, PostgreSQL needs extra libraries and extensions to build a production-grade setup for high availability, disaster recovery, and backup and restore. In this article we will see how to set up PostgreSQL high availability using an open source third-party tool called Patroni.


What is High Availability and why is it required?
Say you have deployed an application backed by PostgreSQL and the database server suddenly crashes, or someone accidentally deletes the database files. Your application is down until the database is fully recovered.

To avoid such scenarios in production, we need two or more in-sync copies of the database and a mechanism that automatically switches from a failed database to a running one. This is referred to as an HA setup. The ultimate aim is to avoid a single point of failure (SPOF) for the whole system.

Why use Patroni?
There are several solutions available for setting up PostgreSQL with HA; the PostgreSQL documentation has a full comparison of replication solutions. The most popular are:

  • PostgreSQL Automatic Failover(PAF)
  • Replication Manager (Repmgr)
  • Patroni

You can compare these solutions on factors such as failure detection, failure recovery, automatic failover, alert management, and ease of setup and maintenance. Considering all these points, Patroni is the best candidate for an HA and DR setup. Also, if you know a little Python, you can easily read the code and adapt it to your needs. Patroni also provides REST APIs to automate things on top of the existing functionality.

Patroni Overview
Patroni is an open source HA template for PostgreSQL, written in Python, that can be deployed easily on Kubernetes or VMs. It can be integrated with etcd, Consul or ZooKeeper as its consensus store.

It is developed and maintained by Zalando; you can find the source code on GitHub.

Solution Architecture:
The figure below shows the architecture of the complete solution with all its components. You can tweak it according to your needs and hardware constraints. Here we use 8 VMs to avoid a SPOF and achieve high availability for Postgres.

High Availability Solution for PostgreSQL with Patroni
PostgreSQL HA with Patroni, ETCD and HAProxy

Components/Software used:
The Patroni solution needs a DCS such as etcd, Consul or ZooKeeper to store the configuration and current status of each node.

  • Distributed Consensus Store(DCS) like ETCD/Consul/Zookeeper
  • Load Balancer like HAProxy
  • PostgreSQL 9.5 and above

Hardware used in the Solution:

Component          | Number of VMs | Configuration | Comment
-------------------|---------------|---------------|--------------------------------------------
ETCD Cluster (DCS) | 3             | 2 Cores/4GB   | For DCS you need an odd number of machines
HAProxy            | 2             | 2 Cores/8GB   | Load balancer
Postgres + Patroni | 3             | 4 Cores/16GB  | PG nodes

  • ETCD Cluster (DCS):
    etcd is the DCS and needs an odd number of servers to reach consensus. We use three nodes instead of one to avoid a single point of failure.

    These nodes can have minimal configuration, as they only store Patroni's cluster state.
  • HAProxy (Load Balancer):
    You can start with a minimal configuration similar to the etcd machines and scale up later as needed.
  • Postgres + Patroni:
    Patroni is installed on the machines running the actual PostgreSQL database. The hardware configuration for these nodes depends on the database size; 4 cores and 16GB RAM is a reasonable starting point.

The VM configuration above is given as a reference; tweak it according to your needs.

Let's dive in and build a production-grade system. Below is the /etc/hosts file used in the setup.

#etcd hosts
192.168.56.201 etcd1.bootvar.com etcd1
192.168.56.202 etcd2.bootvar.com etcd2
192.168.56.203 etcd3.bootvar.com etcd3

#postgres hosts
192.168.56.204 pgdb1.bootvar.com pgdb1
192.168.56.205 pgdb2.bootvar.com pgdb2
192.168.56.206 pgdb3.bootvar.com pgdb3

#haproxy hosts
192.168.56.207 haproxy1.bootvar.com haproxy1
192.168.56.208 haproxy2.bootvar.com haproxy2
/etc/hosts

ETCD Installation and Configuration (etcd1, etcd2, etcd3)

  • Download the etcd binaries from the GitHub releases page
wget https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
  • Once downloaded, extract the archive and copy the binaries to /usr/bin
tar -xvf etcd-v3.5.0-linux-amd64.tar.gz
cd etcd-v3.5.0-linux-amd64/
cp etcd etcdctl etcdutl /usr/bin
  • Make sure /usr/bin is in your PATH variable. You can run the command below to check that etcd is installed correctly.
[root@etcd1 ~]# etcd --version
etcd Version: 3.5.0
Git SHA: 946a5a6f2
Go Version: go1.16.3
Go OS/Arch: linux/amd64
  • Repeat the same steps on all three etcd hosts

Configure and run 3 node ETCD Cluster:

  • Create an etcd user and group to run the etcd service
groupadd --system etcd
useradd -s /bin/bash --system -g etcd etcd
  • Create the data and configuration directories
sudo mkdir -p /var/lib/etcd/
sudo mkdir /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd/ /etc/etcd
  • On each node, create /etc/etcd/etcd.conf with that node's name and IP. The systemd unit below reads this file via EnvironmentFile (systemd does not read .bash_profile and does not perform command substitution). For example, on etcd1:
ETCD_NAME=etcd1
ETCD_HOST_IP=192.168.56.201
  • Create the etcd service unit in /etc/systemd/system/etcd.service, replacing the IP addresses with your corresponding machine IPs
[Unit]
Description=etcd
Documentation=https://github.com/etcd-io/etcd

[Service]
Type=notify
User=etcd
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/bin/etcd \
  --name ${ETCD_NAME} \
  --data-dir=/var/lib/etcd \
  --initial-advertise-peer-urls http://${ETCD_HOST_IP}:2380 \
  --listen-peer-urls http://${ETCD_HOST_IP}:2380 \
  --listen-client-urls http://${ETCD_HOST_IP}:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://${ETCD_HOST_IP}:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster etcd1=http://192.168.56.201:2380,etcd2=http://192.168.56.202:2380,etcd3=http://192.168.56.203:2380 \
  --initial-cluster-state new

[Install]
WantedBy=multi-user.target
/etc/systemd/system/etcd.service
  • Once the service is created, enable and start it on all three servers
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
  • You can verify the cluster is working with the following command:
[root@etcd1 ~]# etcdctl member list --write-out=table
+------------------+---------+-------+----------------------------+----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
| a79bc423e9248ab8 | started | etcd2 | http://192.168.56.202:2380 | http://192.168.56.202:2379 |      false |
| b6820e5fa6807659 | started | etcd3 | http://192.168.56.203:2380 | http://192.168.56.203:2379 |      false |
| b82998c4f9249433 | started | etcd1 | http://192.168.56.201:2380 | http://192.168.56.201:2379 |      false |
+------------------+---------+-------+----------------------------+----------------------------+------------+
  • To find the current leader, check the endpoint status:
etcdctl endpoint status --write-out=table --endpoints=etcd1:2379,etcd2:2379,etcd3:2379
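As a quick sanity check of the store itself, you can write a key through one member and read it back through another, using standard etcdctl v3 commands (the key name /sanity-check is arbitrary):

```shell
# Write a key via the local member, then read it back through another endpoint
etcdctl put /sanity-check "ok"
etcdctl get /sanity-check --endpoints=etcd2:2379

# Health of all three members
etcdctl endpoint health --endpoints=etcd1:2379,etcd2:2379,etcd3:2379
```

If the get returns the value written on a different node, replication and consensus are working.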
Note: By default etcd does not enable the v2 API; if Patroni fails to start with an API error, add the --enable-v2 flag to the etcd service

Patroni and Postgres Installation (pgdb1, pgdb2, pgdb3):

  • Configure the PostgreSQL yum repo (it now includes Patroni as well)
sudo yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
  • Install Postgres using the commands below:
dnf -y install postgresql12 postgresql12-server
  • Install Patroni service
dnf install patroni
  • Install the extra Python package required to connect to etcd
pip3 install python-etcd
  • Enable Patroni service
systemctl enable patroni
  • Create the configuration file and required directories for Patroni:
mkdir -p /etc/patroni/logs/ #directory to store logs
chmod 777 /etc/patroni/logs 
touch /etc/patroni/patroni.yml
  • Create the config file for Patroni as below (/etc/patroni/patroni.yml); this example is for pgdb1
scope: bootvar_cluster
name: pgdb1

log:
  traceback_level: INFO
  level: INFO
  dir: /etc/patroni/logs/
  file_num: 5

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.56.204:8008

etcd:
  protocol: http
  hosts: 192.168.56.201:2379,192.168.56.202:2379,192.168.56.203:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_keep_segments: 100
        #add other postgres DB parameters to start with

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5

postgresql:
  listen: 192.168.56.204:5432
  connect_address: 192.168.56.204:5432
  data_dir: /var/lib/pgsql/bootvar/pgdb1/data
  bin_dir: /usr/pgsql-12/bin
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
/etc/patroni/patroni.yml
  • Start Patroni
service patroni start
  • Repeat the same procedure on all three nodes, updating the node-specific values (name, restapi.connect_address, postgresql.listen, postgresql.connect_address and data_dir). For troubleshooting, set log.level and log.traceback_level to DEBUG
  • Once all nodes are up and running you can check status of patroni cluster using patronictl utility.
patronictl -c /etc/patroni/patroni.yml list
+ Cluster: bootvar_cluster (6974438395379920074) --+-----------+
| Member | Host           | Role    | State   | TL | Lag in MB |
+--------+----------------+---------+---------+----+-----------+
| pgdb1  | 192.168.56.204 | Leader  | running | 10 |           |
| pgdb2  | 192.168.56.205 | Replica | running | 10 |         0 |
| pgdb3  | 192.168.56.206 | Replica | running | 10 |         0 |
+--------+----------------+---------+---------+----+-----------+

If you are not familiar with patronictl, check our guide on Patroni commands.

Now the Patroni cluster is ready to use; you can start playing around with replication and failover tests.
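Besides patronictl, each node's Patroni REST API (port 8008, as configured in restapi.listen) can be queried directly. The endpoints below are part of Patroni's documented API, though the exact JSON output varies by version:

```shell
# Full cluster topology as JSON (Patroni 2.x and later)
curl -s http://pgdb1:8008/cluster

# Role checks: the endpoint returns HTTP 200 only when the node holds that role
curl -s -o /dev/null -w "%{http_code}\n" http://pgdb1:8008/leader    # 200 only on the leader
curl -s -o /dev/null -w "%{http_code}\n" http://pgdb2:8008/replica   # 200 only on a healthy replica
```

These role endpoints are exactly what the load balancer will probe in the next section.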

After this we need to set up a load balancer pointing to the active (leader) Postgres database. For this you need two HAProxy servers; if you are setting this up in the cloud, you can use the load balancer provided by your cloud provider.

Install the load balancer (HAProxy - haproxy1, haproxy2):

  • Install HAProxy on both servers:
dnf install haproxy
  • Configure /etc/haproxy/haproxy.cfg to redirect all traffic to the active Postgres leader.
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen bootvar_cluster
    bind *:5432
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pgdb1_5432 192.168.56.204:5432 maxconn 100 check port 8008
    server pgdb2_5432 192.168.56.205:5432 maxconn 100 check port 8008
    server pgdb3_5432 192.168.56.206:5432 maxconn 100 check port 8008
/etc/haproxy/haproxy.cfg

Note: HAProxy checks port 8008 on the pgdb servers; Patroni's REST API returns HTTP 200 on that port only on the leader, so all traffic is redirected there. Port 8008 is the restapi port configured in Patroni.

  • Start haproxy on both nodes
service haproxy start
  • Once HAProxy is started you can check its status page at http://haproxy1:7000

All connections to haproxy:5432 will be redirected to pgdb1:5432; you can verify that pgdb1 is indeed the leader.

Patroni Leader replica Status
Patroni Leader Status

Now try connecting to the cluster via the HAProxy host; the connection should be redirected to the leader.
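A quick way to verify the redirection from the command line (assuming the postgres superuser password from patroni.yml): inet_server_addr() reports which backend actually served the session, and pg_is_in_recovery() is false only on the leader.

```shell
# Should print the leader's IP (192.168.56.204 while pgdb1 leads) and 'f'
psql -h haproxy1 -p 5432 -U postgres -c "SELECT inet_server_addr(), pg_is_in_recovery();"
```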

Patroni - HAproxy redirection
Patroni - HAproxy redirection

Now you can run some failover tests and hand the setup over to the application team.
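A simple failover exercise could look like the sketch below; note that depending on your Patroni version the switchover option for naming the current leader is --master or --leader.

```shell
# Planned switchover to a chosen replica (patronictl asks for confirmation)
patronictl -c /etc/patroni/patroni.yml switchover --candidate pgdb2

# Unplanned-failure simulation: stop Patroni on the current leader...
systemctl stop patroni
# ...then, from another node, watch a replica get promoted
patronictl -c /etc/patroni/patroni.yml list
```

In both cases, connections through HAProxy should follow the new leader automatically.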

Application-side Configuration:

Since we have two HAProxy servers, the application should be configured to point to both of them and submit requests to whichever is available. If the application does not support multiple hosts, you need to set up a virtual IP that always points to an available HAProxy server.

You can handover below details to application team:

Host: haproxy1 (haproxy1.bootvar.com), haproxy2 (haproxy2.bootvar.com)
Port: 5432
Credentials: App user credentials
Connection details for Application
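If the application stack uses libpq (PostgreSQL 10+) or the PostgreSQL JDBC driver, it can also be given both HAProxy hosts directly instead of a virtual IP; the user app_user and database appdb below are placeholder names.

```shell
# libpq tries the hosts in order and connects to the first one that answers
psql "postgresql://app_user@haproxy1.bootvar.com:5432,haproxy2.bootvar.com:5432/appdb?target_session_attrs=any"
```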

Read-only applications such as reporting, analytics, and dashboards can use the standby Postgres nodes. To support them, create an HAProxy listener on a different port.
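Such a read-only listener could look like the sketch below (port 5433 is an arbitrary choice): Patroni's /replica endpoint returns HTTP 200 only on healthy standbys, so traffic is spread across the replicas and never hits the leader.

```
listen bootvar_cluster_ro
    bind *:5433
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pgdb1_5432 192.168.56.204:5432 maxconn 100 check port 8008
    server pgdb2_5432 192.168.56.205:5432 maxconn 100 check port 8008
    server pgdb3_5432 192.168.56.206:5432 maxconn 100 check port 8008
```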

Things to consider when using Patroni:

  • You must administer PostgreSQL through Patroni, e.g. changing DB parameters or starting and stopping the database.
  • Sometimes you need to write your own automation for functionality that is not available out of the box.
  • Patroni is an open source project and does not come with enterprise support; you depend on the open source community for unforeseen issues or bugs, although managed services for Patroni are available.
Suhas Adhav