Patroni and ETCD make a great combination for building a PostgreSQL cluster. In a nutshell: Patroni manages the cluster, while ETCD stores the cluster's configuration data. The diagram below shows how they interact with PostgreSQL:
So let's get started with the configuration of ETCD first. To configure ETCD, we need to:
1. Install the ETCD package on every node.
2. Configure the cluster members.

Both tasks are fairly simple.
Let's assume we have three nodes, as in the picture above: node00, node01, and node02 (I like starting from “0”, I know). To install the package, simply use the package provider: yum, apt-get, etc.:
Install ETCD package
node02:$ yum -y install etcd
node02: --> Running transaction check
node02: ---> Package etcd.x86_64 0:3.3.11-2.el7.centos will be installed
node02: --> Finished Dependency Resolution
node02:
node02: Dependencies Resolved
node02:
node02: ================================================================================
node02:  Package        Arch           Version                     Repository    Size
node02: ================================================================================
node02: Installing:
node02:  etcd           x86_64         3.3.11-2.el7.centos         extras        10 M
node02:
node02: Transaction Summary
node02: ================================================================================
node02: Install  1 Package
node02:
node02: Total download size: 10 M
node02: Installed size: 45 M
node02: Downloading packages:
node02: Running transaction check
node02: Running transaction test
node02: Transaction test succeeded
node02: Running transaction
node02:   Installing : etcd-3.3.11-2.el7.centos.x86_64        1/1
node02:   Verifying  : etcd-3.3.11-2.el7.centos.x86_64        1/1
node02:
node02: Installed:
node02:   etcd.x86_64 0:3.3.11-2.el7.centos
node02:
node02: Complete!
You need to do that on all nodes.
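Since the package has to go on every node, a small loop can fan the install out over ssh. This is just a sketch: it assumes passwordless root ssh to the node names used above, and the `echo` is left in so you only preview the commands instead of running them.

```shell
# Sketch: preview the etcd install command for every node.
# Assumes root ssh access to node00..node02; drop the 'echo' to actually run it.
cmds=$(for node in node00 node01 node02; do
  echo ssh root@"$node" "yum -y install etcd"
done)
echo "$cmds"
```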
To configure the cluster, use the following script:
Configure ETCD
mkdir -p /db/etcd
host=`hostname`
cat > /etc/etcd/etcd.conf <<EOF
name: ${host}
data-dir: /db/etcd
initial-advertise-peer-urls: http://${host}:2380
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://${host}:2379
initial-cluster: "node00=http://node00:2380,node01=http://node01:2380,node02=http://node02:2380"
EOF
mkdir -p /var/log/postgresql
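The longest line in that file is initial-cluster; if your node list ever changes, you can generate it with a small loop instead of hand-editing. A minimal sketch, assuming the same node names and peer port as above:

```shell
# Build the initial-cluster value from a list of peer hostnames.
nodes="node00 node01 node02"
cluster=""
for n in $nodes; do
  # append "name=http://name:2380", comma-separated
  cluster="${cluster:+${cluster},}${n}=http://${n}:2380"
done
echo "$cluster"
```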
Again, you need to run that on all three nodes. After that, we can start ETCD:
Start ETCD
[root@node00 ~]# etcd --config-file /etc/etcd/etcd.conf &>/var/log/postgresql/etcd.log &
[1] 5536
[root@node00 ~]#
Again, on all nodes. After that, we can check the status of the cluster:
Check status
[root@node00 postgresql]# etcdctl cluster-health
member 4fd4358d80b0e45e is healthy: got healthy result from http://node00:2379
member 825e916a3f48ca3f is healthy: got healthy result from http://node01:2379
member ce1dafd3ad7d9971 is healthy: got healthy result from http://node02:2379
cluster is healthy
[root@node00 postgresql]#
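If you want to script that check (for monitoring, say), the final line of the v2 output carries the overall verdict. A sketch that parses a captured sample instead of calling a live cluster:

```shell
# Parse etcdctl (v2 API) cluster-health output; the last line is the
# overall verdict. A captured sample stands in for the live command here.
health_output='member 4fd4358d80b0e45e is healthy: got healthy result from http://node00:2379
cluster is healthy'
if echo "$health_output" | tail -n1 | grep -q '^cluster is healthy$'; then
  status=ok
else
  status=bad
fi
echo "$status"
```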
If you use API v3, you have to use the new command:
Check status
[root@lpara etcd]# etcdctl --cluster=true endpoint health
http://lpara:2379 is healthy: successfully committed proposal: took = 5.287583ms
http://lparb:2379 is healthy: successfully committed proposal: took = 6.793576ms
[root@lpara etcd]#
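Note that the v3 output has no summary line, so a scripted check has to count healthy endpoints itself and compare against the expected cluster size. A sketch over a captured sample (the expected count is an assumption you set; it is 2 for the sample below, 3 for this article's cluster):

```shell
# Count healthy endpoints in v3 'endpoint health' output and compare
# against the expected member count (a captured sample is used here).
expected=2
sample='http://lpara:2379 is healthy: successfully committed proposal: took = 5.287583ms
http://lparb:2379 is healthy: successfully committed proposal: took = 6.793576ms'
healthy=$(echo "$sample" | grep -c 'is healthy')
if [ "$healthy" -eq "$expected" ]; then verdict="cluster ok"; else verdict="degraded"; fi
echo "$verdict"
```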
You can also test the cluster by publishing messages (key-value pairs):
Test ETCD
#On 2nd node:
[root@node01 ~]# etcdctl put ha-cookbook-2 "Hello World"
OK
[root@node01 ~]#

#On 1st node:
[root@node00 etcd]# etcdctl get ha-cookbook-2
ha-cookbook-2
Hello World
[root@node00 etcd]#
So we have our key-value pair propagated across the cluster. Nice.
Patroni is really the brain behind the cluster: it manages the nodes and coordinates with the distributed key-value store (in our case ETCD, although other solutions such as Consul or Kubernetes are possible as well). Patroni is a fork of Governor, and its job is to manage the cluster; with Patroni you can also reinitialize a node, as we will see below. To set up Patroni, you can use the following script:
Setup Patroni
yum -y install centos-release-scl-rh
yum -y install llvm-toolset-7-clang gcc python3-devel postgresql15-devel.x86_64
pip3 install patroni
pip3 install python-etcd
mkdir /etc/patroni
mkdir -p /db/pgdata
mkdir -p /var/log/postgresql/
chown -R postgres:postgres /etc/patroni
chown postgres:postgres /db/pgdata
chown postgres:postgres /var/log/postgresql/
chown postgres:postgres /db
host=`hostname`
binexec=`find / -name "pg_config"`
bindir=`${binexec} --bindir`
ipaddr=`cat /etc/hosts | grep "${host}" | cut -d' ' -f1`
subnet=`echo ${ipaddr} | cut -d'.' -f1-3`
cat > /etc/patroni/stampede.yml <<EOF
scope: stampede
name: ${host}

restapi:
  listen: ${host}:8008
  connect_address: ${host}:8008

etcd:
  host: ${host}:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: logical
        wal_log_hints: "on"
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication rep_user ${ipaddr}/24 md5
  - host all all ${subnet}.0/24 md5
  users:
    admin:
      password: adminpass
      options:
        - createrole
        - createdb

postgresql:
  listen: ${host}:5432
  connect_address: ${host}:5432
  data_dir: /db/pgdata
  bin_dir: ${bindir}
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: rep_user
      password: newpass
    superuser:
      username: postgres
      password: newpass
  parameters:
    unix_socket_directories: '/var/run/postgresql'
    external_pid_file: '/var/run/postgresql/15-main.pid'
    logging_collector: "on"
    log_directory: "/var/log/postgresql"
    log_filename: "postgresql-15-main.log"
    archive_command: 'test ! -f /db/wal_archive/%f && cp %p /db/wal_archive/%f'
    archive_mode: 'on'
EOF
chown -R postgres:postgres /etc/patroni
chown postgres:postgres /db/pgdata
chown -R postgres:postgres /var/log/postgresql/
chown postgres:postgres /db
chmod 600 /etc/patroni/stampede.yml
chmod 750 /db/pgdata
chown postgres:postgres /var/log/postgresql/patroni.log
patroni /etc/patroni/stampede.yml &> /var/log/postgresql/patroni.log &
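Two of the values the script derives are worth sanity-checking before you trust them in pg_hba: the node's IP taken from /etc/hosts and the /24 subnet prefix. The derivation boils down to the following standalone sketch, using a sample hosts line in place of the real file:

```shell
# Reproduce the ip/subnet derivation from the setup script,
# using a sample /etc/hosts line instead of the real file.
hosts_line="10.100.10.200 node00"
ipaddr=$(echo "$hosts_line" | cut -d' ' -f1)       # first field: the IP
subnet=$(echo "$ipaddr" | cut -d'.' -f1-3)          # first three octets
echo "$ipaddr $subnet"
```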
Execute that script on all servers in the cluster. You don't have to substitute anything; everything is self-generated (I like tweaking with bash). On the first server, you just need to create the replication user:
Create replication User
postgres=# create role rep_user with replication login password 'newpass';
CREATE ROLE
postgres=#
After that, you should have all 3 nodes working together:
Check up
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml list stampede
+ Cluster: stampede --------+---------+----+-----------+
| Member | Host   | Role    | State   | TL | Lag in MB |
+--------+--------+---------+---------+----+-----------+
| node00 | node00 | Leader  | running |  2 |           |
| node01 | node01 | Replica | running |  2 |         0 |
| node02 | node02 | Replica | running |  2 |         0 |
+--------+--------+---------+---------+----+-----------+
-bash-4.2$
Now we can see that node00 is the Leader and the rest are Replicas.
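If a script ever needs the Leader's name (say, to point a backup job at the primary), it can be scraped out of the patronictl table. A sketch over a captured sample row; field 2 of the ASCII table is the member name (newer patronictl versions also offer machine-readable output formats, which would be more robust):

```shell
# Extract the Leader member name from captured patronictl list output.
table='| node00 | node00 | Leader  | running |  2 |           |
| node01 | node01 | Replica | running |  2 |         0 |'
# Split on '|': field 2 is Member, field 4 is Role.
leader=$(echo "$table" | awk -F'|' '$4 ~ /Leader/ {gsub(/ /, "", $2); print $2}')
echo "$leader"
```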
Let's test a failover. We will halt the first node:
Halt a node
[root@node00 wal]# halt
Don't forget that Patroni checks for a Leader on every loop (loop_wait, 10 seconds in our configuration). So it should soon see that we are leaderless and promote a replica:
Auto-Promoting Replica
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml list stampede
+ Cluster: stampede --------+---------+----+-----------+
| Member | Host   | Role    | State   | TL | Lag in MB |
+--------+--------+---------+---------+----+-----------+
| node00 | node00 | Leader  | stopped |    |           |
| node01 | node01 | Replica | running |  2 |         0 |
| node02 | node02 | Replica | running |  2 |         0 |
+--------+--------+---------+---------+----+-----------+
We see that the old Leader is stopped, so let's check again:
Auto-Promoting Replica P2
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml list stampede
+ Cluster: stampede --------+---------+----+-----------+
| Member | Host   | Role    | State   | TL | Lag in MB |
+--------+--------+---------+---------+----+-----------+
| node00 | node00 | Replica | stopped |    |   unknown |
| node01 | node01 | Replica | running |  2 |         0 |
| node02 | node02 | Leader  | running |  2 |           |
+--------+--------+---------+---------+----+-----------+
Yey, after some time we have a new Leader. Off we go.
Of course, as with every piece of software, we have to manage it. So let's see what issues I had. Well, I had many, but one of the most important ones was that node02 was ahead of node00 for some reason, so I had to reinitialize node02.
IT IS IMPORTANT TO NOTE THAT IF YOU UPDATE THE PATRONI YAML FILE, YOU NEED TO RELOAD AND RESTART THE CLUSTER!
Reinitializing is quite simple: Patroni will essentially take a pg_basebackup from the Leader and restore it on the node:
Reinitialize a node
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml reinit stampede node02
+ Cluster: stampede --------+---------+----+-----------+
| Member | Host   | Role    | State   | TL | Lag in MB |
+--------+--------+---------+---------+----+-----------+
| node00 | node00 | Leader  | running |  2 |           |
| node01 | node01 | Replica | running |  2 |         0 |
| node02 | node02 | Replica | running |  2 |         0 |
+--------+--------+---------+---------+----+-----------+
Are you sure you want to reinitialize members node02? [y/N]: y
Success: reinitialize for member node02
-bash-4.2$
Restart the cluster
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml restart stampede
+ Cluster: stampede (7337687292879936167) ---+-----------+
| Member | Host   | Role    | State     | TL | Lag in MB |
+--------+--------+---------+-----------+----+-----------+
| node00 | node00 | Leader  | running   |  1 |           |
| node01 | node01 | Replica | streaming |  1 |         0 |
| node02 | node02 | Replica | streaming |  1 |         0 |
+--------+--------+---------+-----------+----+-----------+
When should the restart take place (e.g. 2024-02-20T20:19)  [now]:
Are you sure you want to restart members node01, node02, node00? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []:
Success: restart on member node01
Success: restart on member node02
Success: restart on member node00
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml list stampede
+ Cluster: stampede (7337687292879936167) ---+-----------+
| Member | Host   | Role    | State     | TL | Lag in MB |
+--------+--------+---------+-----------+----+-----------+
| node00 | node00 | Leader  | running   |  1 |           |
| node01 | node01 | Replica | streaming |  1 |         0 |
| node02 | node02 | Replica | streaming |  1 |         0 |
+--------+--------+---------+-----------+----+-----------+
The reload is needed if you have changed anything in the configuration file, EXCEPT the DCS section.
Reload cluster
-bash-4.2$ patronictl -c /etc/patroni/stampede.yml reload stampede
+ Cluster: stampede (7337687292879936167) ---+-----------+
| Member | Host   | Role    | State     | TL | Lag in MB |
+--------+--------+---------+-----------+----+-----------+
| node00 | node00 | Leader  | running   |  1 |           |
| node01 | node01 | Replica | streaming |  1 |         0 |
| node02 | node02 | Replica | streaming |  1 |         0 |
+--------+--------+---------+-----------+----+-----------+
Are you sure you want to reload members node01, node02, node00? [y/N]: y
Reload request received for member node01 and will be processed within 10 seconds
Reload request received for member node02 and will be processed within 10 seconds
Reload request received for member node00 and will be processed within 10 seconds
-bash-4.2$
If you want to edit the DCS section, you have to either reinitialize the server or use the REST API:
Update DCS part
$ curl -s -XPATCH -d \
    '{"postgresql":{"parameters":{"archive_command":"test ! -f /db/wal_archive/%f && cp %p /db/wal_archive/%f"}}}' \
    http://localhost:8008/config
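To confirm the PATCH landed, you can read the configuration back from the same endpoint (curl -s http://localhost:8008/config) and pull the parameter out of the JSON. A sketch that parses a sample response fragment instead of hitting a live REST API:

```shell
# Pull archive_command out of a GET /config response. A sample JSON
# fragment stands in for the live call here.
response='{"postgresql":{"parameters":{"archive_command":"test ! -f /db/wal_archive/%f && cp %p /db/wal_archive/%f"}}}'
value=$(echo "$response" | sed -n 's/.*"archive_command":"\([^"]*\)".*/\1/p')
echo "$value"
```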
The whole code to build that cluster using Vagrant is below.
$allow_shell = <<SCRIPT
sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
systemctl restart sshd.service
SCRIPT

$edit_hosts_file = <<SCRIPT
cat > /etc/hosts <<EOF
127.0.0.1     localhost
10.100.10.200 node00
10.100.10.201 node01
10.100.10.202 node02
EOF
SCRIPT

$install_etcd = <<SCRIPT
yum -y update
yum -y install python3-psycopg2 python3-pip PyYAML
yum -y install etcd
mkdir -p /db/etcd
# Install PostgreSQL:
sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install -y postgresql15-server
chown postgres:postgres /db/etcd
#Modify etcd
host=`hostname`
cat > /etc/etcd/etcd.conf <<EOF
name: ${host}
data-dir: /db/etcd
initial-advertise-peer-urls: http://${host}:2380
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://${host}:2379
initial-cluster: "node00=http://node00:2380,node01=http://node01:2380,node02=http://node02:2380"
EOF
mkdir -p /var/log/postgresql
etcd --config-file /etc/etcd/etcd.conf &>/var/log/postgresql/etcd.log &
SCRIPT

$install_patroni = <<SCRIPT
yum -y install centos-release-scl-rh
yum -y install llvm-toolset-7-clang gcc python3-devel postgresql15-devel.x86_64
pip3 install patroni
pip3 install python-etcd
mkdir /etc/patroni
mkdir -p /db/pgdata
mkdir -p /var/log/postgresql/
chown -R postgres:postgres /etc/patroni
chown postgres:postgres /db/pgdata
chmod 600 /etc/patroni/stampede.yml
chown postgres:postgres /var/log/postgresql/
chown postgres:postgres /db
host=`hostname`
binexec=`find / -name "pg_config"`
bindir=`${binexec} --bindir`
ipaddr=`cat /etc/hosts | grep "${host}" | cut -d' ' -f1`
subnet=`echo ${ipaddr} | cut -d'.' -f1-3`
cat > /etc/patroni/stampede.yml <<EOF
scope: stampede
name: ${host}

restapi:
  listen: ${host}:8008
  connect_address: ${host}:8008

etcd:
  host: ${host}:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: logical
        wal_log_hints: "on"
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication rep_user ${ipaddr}/24 md5
  - host all all ${subnet}.0/24 md5
  users:
    admin:
      password: adminpass
      options:
        - createrole
        - createdb

postgresql:
  listen: ${host}:5432
  connect_address: ${host}:5432
  data_dir: /db/pgdata
  bin_dir: ${bindir}
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: rep_user
      password: newpass
    superuser:
      username: postgres
      password: newpass
  parameters:
    unix_socket_directories: '/var/run/postgresql'
    external_pid_file: '/var/run/postgresql/15-main.pid'
    logging_collector: "on"
    log_directory: "/var/log/postgresql"
    log_filename: "postgresql-15-main.log"
EOF
chmod 750 /db/pgdata
chown postgres:postgres /var/log/postgresql/patroni.log
patroni /etc/patroni/stampede.yml &> /var/log/postgresql/patroni.log &
SCRIPT

$install_ansible = <<SCRIPT
sudo yum -y install epel-release
sudo yum -y install ansible
cat > /etc/ansible/hosts <<EOF
[patroni]
10.100.10.200
10.100.10.201
10.100.10.202

[patroni:vars]
ansible_user=root
ansible_password=vagrant
ansible_connection=ssh
EOF
sed -i 's/#host_key_checking = False/host_key_checking = False/g' /etc/ansible/ansible.cfg
SCRIPT

$disable_selinux = <<SCRIPT
setenforce 0
SCRIPT

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  (0..2).each do |i|
    config.vm.define "node0#{i}" do |node|
      node.vm.box = "centos/7"
      node.vm.box_check_update = true
      node.vm.network :private_network, ip: "10.100.10.20#{i}"
      node.vm.hostname = "node0#{i}"
      node.vm.provision "shell", inline: $allow_shell, privileged: true
      node.vm.provision "shell", inline: $disable_selinux, privileged: true
      node.vm.provision "shell", inline: $install_ansible, privileged: true
      node.vm.provision "shell", inline: $edit_hosts_file, privileged: true
      node.vm.provision "shell", inline: $install_etcd, privileged: true
      node.vm.provision "shell", inline: $install_patroni, privileged: true
      node.vm.provider "virtualbox" do |vb|
        vb.name = "node0#{i}"
        vb.memory = "1024"
      end
    end
  end
end
That code will create a 3-node cluster with everything configured.