postgresql_replication_ha

Pgpool-II is an open-source package which provides a lot of features for PostgreSQL, such as:

  • Load balancing
  • Session management
  • Automatic failover
  • Others

In a nutshell, Pgpool-II is an external program which connects to the master and the slave(s). It can detect when the master is down, and it can tell which queries only read and which also write, directing read-only sessions to the slaves and writing sessions to the master.

Below you can see the basic architecture:

From the architecture it is visible that Pgpool-II assigns a Virtual IP (VIP) which the client (application) will use. If the master server fails, that VIP is migrated to another server; this is the "failover", and a "failover command" will be needed to perform it.

So, what are we waiting for? Let's reuse the configuration we already have.

Master
  • postgresqlmaster - 192.168.0.178
Slaves
  • postgresqlslaveone - 192.168.0.199
  • postgresqlslavetwo - 192.168.0.200

This is the configuration we built together in the previous sections. Now I will set up a new VM for Pgpool-II; name resolution between all of these hosts is assumed to work, as sketched below.

pgPool
  • postgresqlpgpool VIP - 192.168.0.220
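
Throughout this guide the nodes are addressed by hostname, so make sure name resolution works on every server before going further. Below is a minimal sketch assuming you rely on /etc/hosts rather than DNS; adjust it to your own environment.

Hosts file

[all servers]# cat >> /etc/hosts << EOF
192.168.0.178   postgresqlmaster
192.168.0.199   postgresqlslaveone
192.168.0.200   postgresqlslavetwo
EOF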

At the time of writing, the current stable version of Pgpool-II is 4.1, so let's use that one.

We are on Linux 7 (RHEL/CentOS 7), so we can download the RPM from the Pgpool-II yum repository (www.pgpool.net). Once we download it, we have to download one more package as well.

Let's start by downloading the packages on the VM (PostgresqlPGPool). We need two additional packages besides Pgpool-II itself:

  • libevent
  • libmemcached

Download required Libraries

Using username "root".
Last login: Thu Jan 23 10:06:12 2020
[root@postgresqlslaveone ~]# wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/libmemcached-1.0.18-1.el7.art.x86_64.rpm
--2020-01-23 10:07:52--  http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/libmemcached-1.0.18-1.el7.art.x86_64.rpm
Resolving www6.atomicorp.com (www6.atomicorp.com)... 51.79.80.20
Connecting to www6.atomicorp.com (www6.atomicorp.com)|51.79.80.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 247408 (242K) [application/x-rpm]
Saving to: ‘libmemcached-1.0.18-1.el7.art.x86_64.rpm’

100%[=================================================================================================================================================================================>] 247,408      177KB/s   in 1.4s

2020-01-23 10:07:54 (177 KB/s) - ‘libmemcached-1.0.18-1.el7.art.x86_64.rpm’ saved [247408/247408]

[root@postgresqlslaveone ~]# wget https://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
--2020-01-23 10:08:00--  https://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
Resolving www.pgpool.net (www.pgpool.net)... 202.32.10.40
Connecting to www.pgpool.net (www.pgpool.net)|202.32.10.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1409060 (1.3M) [application/x-rpm]
Saving to: ‘pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm’

100%[=================================================================================================================================================================================>] 1,409,060    130KB/s   in 12s

2020-01-23 10:08:13 (113 KB/s) - ‘pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm’ saved [1409060/1409060]

[root@postgresqlslaveone ~]#

After that we can install them

Install

[root@postgresqlslaveone ~]# yum install libevent
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: ftp.cvut.cz
 * extras: ftp.cvut.cz
 * updates: ftp.cvut.cz
base                                                                                                                                                                                                | 3.6 kB  00:00:00
extras                                                                                                                                                                                              | 2.9 kB  00:00:00
pgdg10                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg11                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg94                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg95                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg96                                                                                                                                                                                              | 3.6 kB  00:00:00
updates                                                                                                                                                                                             | 2.9 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package libevent.x86_64 0:2.0.21-4.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================================================================================================================================================
 Package                                              Arch                                               Version                                                    Repository                                        Size
===========================================================================================================================================================================================================================
Installing:
 libevent                                             x86_64                                             2.0.21-4.el7                                               base                                             214 k

Transaction Summary
===========================================================================================================================================================================================================================
Install  1 Package

Total download size: 214 k
Installed size: 725 k
Is this ok [y/d/N]: y
Downloading packages:
libevent-2.0.21-4.el7.x86_64.rpm                                                                                                                                                                    | 214 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : libevent-2.0.21-4.el7.x86_64                                                                                                                                                                            1/1
  Verifying  : libevent-2.0.21-4.el7.x86_64                                                                                                                                                                            1/1

Installed:
  libevent.x86_64 0:2.0.21-4.el7

Complete!
[root@postgresqlslaveone ~]# rpm -Uvh libmemcached-1.0.18-1.el7.art.x86_64.rpm
warning: libmemcached-1.0.18-1.el7.art.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID 4520afa9: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:libmemcached-1.0.18-1.el7.art    ################################# [100%]
[root@postgresqlslaveone ~]# rpm -Uvh pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
warning: pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 60ae0e48: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:pgpool-II-pg10-4.1.0-2pgdg.rhel7 ################################# [100%]
postgres ALL=NOPASSWD: /sbin/ip
postgres ALL=NOPASSWD: /usr/sbin/arping
[root@postgresqlslaveone ~]#

After that we could finally install Pgpool-II itself, as you can see above. Note the two sudoers entries for /sbin/ip and /usr/sbin/arping printed during the installation; the postgres user needs them so that Pgpool-II can manage the VIP later (if_up_cmd, if_down_cmd and arping_cmd below).

Now we start with the interesting part :)

Firstly, we have to set up passwordless SSH between all servers for the root and postgres users:

Allow passwordless connection for: root and postgres users

[all servers]# cd ~/.ssh
[all servers]# ssh-keygen -t rsa
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlpgpool
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlmaster
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlslaveone
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlslavetwo

[all servers]# su - postgres
[all servers]$ cd ~/.ssh
[all servers]$ ssh-keygen -t rsa
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlpgpool
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlmaster
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlslaveone
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlslavetwo
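
A quick way to confirm that the keys were copied correctly is to force a non-interactive login from every node to every other node; if any of these commands prompts for a password, the failover and recovery scripts will hang later. Note also that the appendix scripts below reference ~/.ssh/id_rsa_pgpool as the key file, so either generate the key under that name or adjust the scripts to point at id_rsa.

Verify passwordless SSH

[all servers]$ ssh -o BatchMode=yes postgres@postgresqlmaster hostname
[all servers]$ ssh -o BatchMode=yes postgres@postgresqlslaveone hostname
[all servers]$ ssh -o BatchMode=yes postgres@postgresqlslavetwo hostname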

For security reasons, we create a user repl used solely for replication, and a user pgpool for the streaming replication delay check and health check of Pgpool-II.

Create user on the Master

 [postgresqlmaster]# psql -U postgres -p 5432
 postgres=# SET password_encryption = 'scram-sha-256';
 postgres=# CREATE ROLE pgpool WITH LOGIN;
 postgres=# \password pgpool
 postgres=# \password postgres
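
Since pg_hba.conf and pool_hba.conf will use scram-sha-256 below, it is worth confirming that the passwords were really stored as SCRAM verifiers and not as old MD5 hashes. A minimal check on the master; each returned value should start with SCRAM-SHA-256$, otherwise the password was set before password_encryption was switched.

Verify SCRAM passwords

[postgresqlmaster]# psql -U postgres -p 5432 -c "SELECT rolname, left(rolpassword, 14) AS verifier_type FROM pg_authid WHERE rolname IN ('pgpool', 'postgres');"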

Then we have to configure the settings, which will be common for all servers:

Common settings in pgpool.conf

listen_addresses = '*'
sr_check_user = 'pgpool'
sr_check_password = ''
health_check_period = 5
health_check_timeout = 30
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3
backend_hostname0 = 'postgresqlmaster'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/10/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'postgresqlslaveone'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/10/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = 'postgresqlslavetwo'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/10/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'postgresqlmaster'
backend_application_name1 = 'postgresqlslaveone'
backend_application_name2 = 'postgresqlslavetwo'
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'
enable_pool_hba = on
use_watchdog = on
delegate_IP = '192.168.0.220'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev enp0s3 label enp0s3:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev enp0s3'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I enp0s3'
if_cmd_path = '/sbin'
arping_path = '/usr/sbin'
log_destination = 'syslog'
# Where to log
# Valid values are combinations of stderr,
# and syslog. Default to stderr.
syslog_facility = 'LOCAL1'
# Syslog local facility. Default to LOCAL0
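
Two assumptions are baked into the watchdog commands above: the network interface is called enp0s3, and the user running Pgpool-II may run ip and arping through sudo without a password (the two sudoers rules printed during the RPM installation). A quick sanity check; replace enp0s3 everywhere if your VMs use a different interface name.

Verify interface and sudo rules

[all servers]# ip addr show
[all servers]# sudo -l -U postgres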

These values are the same on all servers; now we have to specify the host-specific settings. These include the watchdog, heartbeat and other per-node parameters:

PostgreSQLMaster settings

wd_hostname = 'postgresqlmaster'
wd_port = 9000

# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'postgresqlslaveone'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslavetwo'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlslaveone'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslavetwo'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''

PostgreSQLSlaveOne settings

wd_hostname = 'postgresqlslaveone'
wd_port = 9000
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'postgresqlmaster'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslavetwo'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlmaster'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslavetwo'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''

PostgreSQLSlaveTwo settings

wd_hostname = 'postgresqlslavetwo'
wd_port = 9000
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'postgresqlmaster'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslaveone'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlmaster'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslaveone'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''

All these settings are specified in the pgpool.conf file under /etc/pgpool-II. After that is done, let's create the log destination:

Create log destination

[all servers]# mkdir /var/log/pgpool-II
[all servers]# touch /var/log/pgpool-II/pgpool.log
[all servers]# echo "LOCAL1.*                                                /var/log/pgpool-II/pgpool.log" >> /etc/rsyslog.conf
[all servers]# vi /etc/logrotate.d/syslog
...
/var/log/messages
/var/log/pgpool-II/pgpool.log
/var/log/secure
[all servers]# systemctl restart rsyslog
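
Because log_destination is syslog with the LOCAL1 facility, it is worth confirming the rsyslog routing before Pgpool-II is even started. A minimal test: emit a message on that facility and check that it ends up in the new log file.

Test syslog routing

[all servers]# logger -p local1.info "pgpool-II syslog routing test"
[all servers]# tail -n 1 /var/log/pgpool-II/pgpool.log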

Since user authentication is required to use the PCP commands, we specify a user name and MD5-hashed password in pcp.conf. Here we create the hashed password for the pgpool user and add the "username:encrypted password" pair to /etc/pgpool-II/pcp.conf.

PCP password

[all servers]# echo 'pgpool:'`pg_md5 PCP_password` >> /etc/pgpool-II/pcp.conf

Since the follow_master_command script has to execute PCP commands without entering a password, we create .pcppass (format hostname:port:username:password) in the home directory of the Pgpool-II startup user (the root user). The password stored there must be the same one whose hash we just put in pcp.conf.

[all servers]# echo 'localhost:9898:pgpool:pgpool' > ~/.pcppass
[all servers]# chmod 600 ~/.pcppass
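
Once Pgpool-II is up and running (we start it further below), you can check that pcp.conf and ~/.pcppass agree by running any PCP command with -w, which suppresses the password prompt and reads the password from .pcppass instead. A password prompt or an authentication failure here usually means the two files contain different passwords.

Test PCP authentication

[all servers]# pcp_node_count -h localhost -p 9898 -U pgpool -w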

In Pgpool-II 4.0+, when SCRAM authentication is used, the only password formats that work in pool_passwd are AES-encrypted or clear text. It goes without saying that clear-text passwords aren't advised.

So we have to:

  1. Modify: pg_hba.conf and pool_hba.conf
  2. Encrypt our password
  3. Store the decryption key in the home directory of the process owner on all servers

To modify pg_hba.conf (in the PostgreSQL data directory, /var/lib/pgsql/10/data) and pool_hba.conf (in /etc/pgpool-II), you can use any text editor you want, but you should follow the syntax:

Edit hba conf files

host    all         pgpool           0.0.0.0/0          scram-sha-256
host    all         postgres         0.0.0.0/0          scram-sha-256

Once this is done, we have to provide an encryption/decryption key for the passwords. As you know, AES is a symmetric type of encryption, which means it uses the same key for encryption and decryption.

Create encryption/decryption key

[all servers]# echo 'some secret string' > ~/.pgpoolkey 
[all servers]# chmod 600 ~/.pgpoolkey

Encrypt the password

[all servers]# pg_enc -m -k /root/.pgpoolkey -u pgpool -p
db password: [pgpool user's password]
[all servers]# pg_enc -m -k /root/.pgpoolkey -u postgres -p
db password: [postgres user's password]

# cat /etc/pgpool-II/pool_passwd 
pgpool:AESheq2ZMZjynddMWk5sKP/Rw==
postgres:AESHs/pWL5rtXy2IwuzroHfqg==

Please be sure you have done this on all servers and that the .pgpoolkey file is present on every Pgpool-II server, in the home directory of the user which runs the service (in our case: postgres). Otherwise you will get the following error:

No .pgpoolkey available

SCRAM authentication failed
unable to decrypt password from pool_passwd
verify the valid pool_key exists

P.S. Be sure to restart Pgpool-II afterwards :) and reload PostgreSQL (pg_ctl reload) so the hba changes take effect.
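
A minimal sketch of those two steps, using the paths from this guide: reload PostgreSQL so it re-reads pg_hba.conf, and restart Pgpool-II so it picks up pool_hba.conf and pool_passwd.

Reload hba changes

[all servers]# su - postgres -c "/usr/pgsql-10/bin/pg_ctl reload -D /var/lib/pgsql/10/data"
[all servers]# systemctl restart pgpool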

We installed Pgpool-II on top of an existing replication setup, but Pgpool-II can also be used to restore and recover nodes and thus build the replication configuration itself. Let's follow the process. Firstly, please create both files from the appendix at the following locations on postgresqlmaster: /var/lib/pgsql/10/data/recovery_1st_stage and /var/lib/pgsql/10/data/pgpool_remote_start

Start PGPool

[all servers]# service pgpool start
Redirecting to /bin/systemctl start pgpool.service
[all servers]# service pgpool status
Redirecting to /bin/systemctl status pgpool.service
● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-01-27 12:45:33 EST; 3s ago
  Process: 6190 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 6196 (pgpool)
   CGroup: /system.slice/pgpool.service
           ├─6196 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─6198 pgpool: watchdog
           ├─6199 pgpool: lifecheck
           ├─6200 pgpool: wait for connection request
           ├─6201 pgpool: wait for connection request
           ├─6202 pgpool: wait for connection request
           ├─6203 pgpool: wait for connection request
           ├─6204 pgpool: wait for connection request
           ├─6205 pgpool: wait for connection request
           ├─6206 pgpool: wait for connection request
           ├─6207 pgpool: wait for connection request
           ├─6208 pgpool: wait for connection request
           ├─6209 pgpool: wait for connection request
           ├─6210 pgpool: wait for connection request
           ├─6211 pgpool: wait for connection request
           ├─6212 pgpool: wait for connection request
           ├─6213 pgpool: wait for connection request
           ├─6214 pgpool: wait for connection request
           ├─6215 pgpool: wait for connection request
           ├─6216 pgpool: wait for connection request
           ├─6217 pgpool: wait for connection request
           ├─6218 pgpool: wait for connection request
           ├─6219 pgpool: wait for connection request
           ├─6220 pgpool: wait for connection request
           ├─6221 pgpool: wait for connection request
           ├─6222 pgpool: wait for connection request
           ├─6223 pgpool: wait for connection request
           ├─6224 pgpool: wait for connection request
           ├─6225 pgpool: wait for connection request
           ├─6226 pgpool: wait for connection request
           ├─6227 pgpool: wait for connection request
           ├─6228 pgpool: wait for connection request
           ├─6229 pgpool: wait for connection request
           ├─6230 pgpool: wait for connection request
           ├─6231 pgpool: wait for connection request
           ├─6232 pgpool: PCP: wait for connection request
           ├─6233 pgpool: worker process
           ├─6234 pgpool: health check process(0)
           ├─6235 pgpool: health check process(1)
           ├─6236 pgpool: health check process(2)
           ├─6237 pgpool: heartbeat receiver
           ├─6238 pgpool: heartbeat sender
           ├─6239 pgpool: heartbeat receiver
           └─6240 pgpool: heartbeat sender

Jan 27 12:45:35 postgresqlslavetwo pgpool[6238]: [10-2] 2020-01-27 12:45:35: pid 6238: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [9-1] 2020-01-27 12:45:35: pid 6237: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [10-1] 2020-01-27 12:45:35: pid 6237: LOG:  creating watchdog heartbeat receive socket.
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [10-2] 2020-01-27 12:45:35: pid 6237: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [9-1] 2020-01-27 12:45:35: pid 6239: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [10-1] 2020-01-27 12:45:35: pid 6239: LOG:  creating watchdog heartbeat receive socket.
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [10-2] 2020-01-27 12:45:35: pid 6239: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [9-1] 2020-01-27 12:45:35: pid 6240: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [10-1] 2020-01-27 12:45:35: pid 6240: LOG:  creating socket for sending heartbeat
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [10-2] 2020-01-27 12:45:35: pid 6240: DETAIL:  set SO_REUSEPORT
[root@postgresqlslavetwo pgpool-II]#

Restore the database using the VIP:

Recover the standby nodes

# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 1
Password: 
pcp_recovery_node -- Command Successful

# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 2
Password: 
pcp_recovery_node -- Command Successful

I was going to kill myself a couple of times, seriously. I was head-banging for at least a couple of weeks because I couldn't make PostgreSQL fail over. So let me tell you a couple of things which I found out THE VERY HARD and PAINFUL way.

With Pgpool-II, you can either let PostgreSQL streaming replication move the data for you, OR use Pgpool-II's own replication mode. The default setup assumes Pgpool-II is the FIRST thing you install (and that it builds the replication itself), HOWEVER in our case it was installed second, on top of an existing streaming replication, so a little modification is needed.

Please ENSURE the following parameters are set on the nodes:

master_slave_mode = on
master_slave_sub_mode = 'stream'

And the following ones, turned off:

-bash-4.2$ cat pgpool.conf | grep replication
replication_mode = off
                                   # Activate replication mode
                                   # when in replication mode
                                   # replication mode, specify table name to

The two modes are mutually exclusive; only one can be active at a time.

Setting master_slave_mode = on with the 'stream' sub mode indicates that you ALREADY have streaming replication in place and that you take care of it yourself.

After that, let's see our current state of the cluster:

State
[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres:
 node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | postgresqlmaster   | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-07 10:12:26
 1       | postgresqlslaveone | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-07 10:12:26
(2 rows)
 
[root@postgresqlslaveone tmp]

That clearly states that postgresqlmaster is the master and postgresqlslaveone is the slave :) I know, stupid naming, but bear with me :)

So what happens after I shut down the first database?

Master Database
[root@postgresqlmaster pgpool-II]# su - postgres
Last login: Fri Feb  7 10:01:53 EST 2020 on pts/3
-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ -l logfile stop
waiting for server to shut down..... done
server stopped
-bash-4.2$

Well…what can happen, the database will go down of course :)

Master Database
2020-02-07 10:14:36.882 EST [15508] LOG:  received fast shutdown request
2020-02-07 10:14:36.907 EST [15508] LOG:  aborting any active transactions
2020-02-07 10:14:36.909 EST [15508] LOG:  worker process: logical replication launcher (PID 15517) exited with exit code 1
2020-02-07 10:14:36.909 EST [15511] LOG:  shutting down
2020-02-07 10:14:38.176 EST [15508] LOG:  database system is shut down

And, even more shockingly, PGPool will FINALLY recognize it:

Reaction on Master
10:15:09 postgresqlmaster pgpool[16840]: [10-1] 2020-02-07 10:15:09: pid 16840: LOG:  failed to connect to PostgreSQL server on "postgresqlmaster:5432", getsockopt() detected error "Connection refused"
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [11-1] 2020-02-07 10:15:09: pid 16840: LOG:  received degenerate backend request for node_id: 0 from pid [16840]
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [26-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [27-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [28-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-1] 2020-02-07 10:15:09: pid 16802: LOG:  we do not need quorum to hold to proceed with failover
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-2] 2020-02-07 10:15:09: pid 16802: DETAIL:  proceeding with the failover
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-3] 2020-02-07 10:15:09: pid 16802: HINT:  failover_when_quorum_exists is set to false
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-1] 2020-02-07 10:15:09: pid 16840: FATAL:  failed to create a backend connection
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-2] 2020-02-07 10:15:09: pid 16840: DETAIL:  executing failover on backend
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [15-1] 2020-02-07 10:15:09: pid 16800: LOG:  Pgpool-II parent process has received failover request
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [30-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [31-1] 2020-02-07 10:15:09: pid 16802: LOG:  received the failover indication from Pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [32-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is informed of failover start by the main process
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [16-1] 2020-02-07 10:15:09: pid 16800: LOG:  starting degeneration. shutdown host postgresqlmaster(5432)
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [17-1] 2020-02-07 10:15:09: pid 16800: LOG:  Restart all children
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [18-1] 2020-02-07 10:15:09: pid 16800: LOG:  execute command: /etc/pgpool-II/failover.sh 0 postgresqlmaster 5432 /var/lib/pgsql/10/data 1 postgresqlslaveone 0 0 5432 /var/lib/pgsql/10/data postgresqlmaster 5432
Feb  7 10:15:11 postgresqlmaster postgres[17112]: follow_master.sh: start: Standby node 0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_HOST=postgresqlmaster
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PORT=5432
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PGDATA=/var/lib/pgsql/10/data
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_ID=1
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_MASTER_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_HOST=postgresqlslaveone
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_PRIMARY_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PORT=5432
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/10/data
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGHOME=/usr/pgsql-10
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ARCHIVEDIR=/walshared
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + REPLUSER=repl
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_USER=postgres
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGPOOL_PATH=/usr/bin
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_PORT=9898
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + logger -i -p local1.info follow_master.sh: start: Standby node 0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlslaveone -i /var/lib/pgsql/.ssh/id_rsa ls /tmp
Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlslaveone,192.168.0.199' (ECDSA) to the list of known hosts.
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 0 -ne 0 ']'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ /usr/pgsql-10/bin/initdb -V
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ awk '{print $3}'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\..*//'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGVERSION=10
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 10 -ge 12 ']'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + RECOVERYCONF=/var/lib/pgsql/10/data/recovery.conf
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlmaster -i /var/lib/pgsql/.ssh/id_rsa /usr/pgsql-10/bin/pg_ctl -w -D /var/lib/pgsq
l/10/data status
Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlmaster,192.168.0.178' (ECDSA) to the list of known hosts.

FINALLY, the failover script was executed after countless blood stains on the wall

You can also see it in the new master node's database log:

Reaction on Slave(new master)
2020-01-28 14:40:19.701 EST [22926] LOG:  received promote request
2020-01-28 14:40:19.728 EST [22926] LOG:  redo done at 0/1C000028
cp: cannot stat ‘/walshared/00000001000000000000001C’: No such file or directory
cp: cannot stat ‘/walshared/00000002.history’: No such file or directory
2020-01-28 14:40:19.767 EST [22926] LOG:  selected new timeline ID: 2
2020-01-28 14:40:19.876 EST [22926] LOG:  archive recovery complete
cp: cannot stat ‘/walshared/00000001.history’: No such file or directory
2020-01-28 14:40:20.012 EST [22924] LOG:  database system is ready to accept connections

Now, on the slave (new master) you won't see anything from Pgpool-II until you shut down the pgpool on the old master too. Because usually, when a master fails, the entire server is dead :)

PGPool on Master(old)
[root@postgresqlmaster pgpool-II]# service pgpool stop
Redirecting to /bin/systemctl stop pgpool.service
[root@postgresqlmaster pgpool-II]#

Once you do, the VIP will be migrated as well :)

Reaction on Slave
Jan 28 15:33:47 postgresqlslaveone pgpool[25956]: [50-1] 2020-01-28 15:33:47: pid 25956: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
Jan 28 15:33:51 postgresqlslaveone pgpool[25956]: [51-1] 2020-01-28 15:33:51: pid 25956: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am the only alive node in the watchdog cluster
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-2] 2020-01-28 15:33:52: pid 25956: HINT:  skipping stand for coordinator state
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [53-1] 2020-01-28 15:33:52: pid 25956: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [54-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am announcing my self as master/coordinator watchdog node
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-2] 2020-01-28 15:33:56: pid 25956: DETAIL:  our declare coordinator message is accepted by all nodes
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [56-1] 2020-01-28 15:33:56: pid 25956: LOG:  setting the local node "postgresqlslaveone:9999 Linux postgresqlslaveone" as watchdog cluster master
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [57-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node. Starting escalation process
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [58-1] 2020-01-28 15:33:56: pid 25956: LOG:  escalation process started with PID:29053
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [59-1] 2020-01-28 15:33:56: pid 25956: LOG:  new IPC connection received
Jan 28 15:33:56 postgresqlslaveone pgpool[29053]: [58-1] 2020-01-28 15:33:56: pid 29053: LOG:  watchdog: escalation started
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [24-1] 2020-01-28 15:33:59: pid 26021: LOG:  forked new pcp worker, pid=29066 socket=8
Jan 28 15:33:59 postgresqlslaveone pgpool[25956]: [60-1] 2020-01-28 15:33:59: pid 25956: LOG:  new IPC connection received
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [25-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exit with SUCCESS.
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [26-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exits with status 0
Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-1] 2020-01-28 15:34:00: pid 29053: LOG:  successfully acquired the delegate IP:"192.168.0.220"
Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-2] 2020-01-28 15:34:00: pid 29053: DETAIL:  'if_up_cmd' returned with success
Jan 28 15:34:00 postgresqlslaveone pgpool[25956]: [61-1] 2020-01-28 15:34:00: pid 25956: LOG:  watchdog escalation process with pid: 29053 exit with SUCCESS.
Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-1] 2020-01-28 15:34:16: pid 26293: LOG:  pool_reuse_block: blockid: 0
Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-2] 2020-01-28 15:34:16: pid 26293: CONTEXT:  while searching system catalog, When relcache is missed
Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-1] 2020-01-28 15:34:19: pid 25985: LOG:  informing the node status change to watchdog
Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-2] 2020-01-28 15:34:19: pid 25985: DETAIL:  node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [62-1] 2020-01-28 15:34:19: pid 25956: LOG:  new IPC connection received
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-1] 2020-01-28 15:34:19: pid 25956: LOG:  received node status change ipc message
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-2] 2020-01-28 15:34:19: pid 25956: DETAIL:  No heartbeat signal from node
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [64-1] 2020-01-28 15:34:19: pid 25956: LOG:  remote node "postgresqlmaster:9999 Linux postgresqlmaster" is shutting down

After all this is done, we can check the new status of the cluster :)

Failover
[root@postgresqlslaveone tmp]#  pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
Password:
2 YES postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone
 
postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 4 MASTER
postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 10 SHUTDOWN
[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres:
 node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | postgresqlmaster   | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-01-28 14:40:20
 1       | postgresqlslaveone | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-01-28 15:34:16
(2 rows)
 
[root@postgresqlslaveone tmp]#

After executing the pcp_recovery_node command, verify that postgresqlslaveone and postgresqlslavetwo are started as PostgreSQL standby servers.

Verify PGPool

# psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres
node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0       | postgresqlmaster  | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-01-28 05:18:09
1       | postgresqlslaveone  | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-01-28 05:18:09
2       | postgresqlslavetwo  | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-01-28 05:18:09
(3 rows)
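
If you prefer the PCP interface over SQL, roughly the same per-node information can be pulled with pcp_node_info; the example below queries node 1 through the VIP (output omitted).

Verify a single node

# pcp_node_info -h 192.168.0.220 -p 9898 -U pgpool -n 1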

You can also verify the watchdog daemon as follows:

Verify WatchDog

[root@postgresqlmaster ~]# pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
Password:
3 YES postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster

postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 4 MASTER
postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 7 STANDBY
postgresqlslavetwo:9999 Linux postgresqlslavetwo postgresqlslavetwo 9999 9000 7 STANDBY
[root@postgresqlmaster ~]#

recovery_1st_stage

#!/bin/bash
# This script is executed by "recovery_1st_stage" to recover a Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_ID="$5"
DEST_NODE_PORT="$6"

PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/pgsql-10
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl

logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi

## Create replication slot "${DEST_NODE_HOST}"
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}');
EOQ

## Execute pg_basebackup to recover the Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "

    set -o errexit

    rm -rf $DEST_NODE_PGDATA
    rm -rf $ARCHIVEDIR/*

    ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
    fi

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${DEST_NODE_HOST}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
            touch ${DEST_NODE_PGDATA}/standby.signal
    else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"

if [ $? -ne 0 ]; then

    ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
EOQ

    logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
    exit 1
fi

logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0

pgpool_remote_start

#!/bin/bash
# This script is run after recovery_1st_stage to start Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PGHOME=/usr/pgsql-10
DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"


logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info pgpool_remote_start: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Start Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
    $PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA start
"

if [ $? -ne 0 ]; then
    logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed.
    exit 1
fi

logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
exit 0

/etc/pgpool-II/failover.sh

#!/bin/bash
# This script is run by failover_command.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

PGHOME=/usr/pgsql-10


logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
    logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
	exit 0
fi

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## If Standby node is down, skip failover.
if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then
    logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed
        exit 1
    fi

    exit 0
fi

## Promote Standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.

ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote

if [ $? -ne 0 ]; then
    logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
    exit 1
fi

logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0

/etc/pgpool-II/follow_master.sh

#!/bin/bash
# This script is run after failover_command to synchronize the Standby with the new Primary.
# First try pg_rewind. If pg_rewind failed, use pg_basebackup.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
OLD_MASTER_NODE_ID="$6"
NEW_MASTER_NODE_HOST="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"

PGHOME=/usr/pgsql-10
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
PCP_USER=pgpool
PGPOOL_PATH=/usr/bin
PCP_PORT=9898

logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID}

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info follow_master.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`

if [ $PGVERSION -ge 12 ]; then
RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
else
RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
fi

## Check the status of Standby
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status


## If Standby is running, synchronize it with the new Primary.
if [ $? -eq 0 ]; then

    logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID

    # Create replication slot "${FAILED_NODE_HOST}"
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\"
    "

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "

        set -o errexit

        ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop

        cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

        if [ ${PGVERSION} -ge 12 ]; then
            touch ${FAILED_NODE_PGDATA}/standby.signal
        else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
        fi

        ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"

    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.

        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
             
            set -o errexit

            # Execute pg_basebackup
            rm -rf ${FAILED_NODE_PGDATA}
            rm -rf ${ARCHIVEDIR}/*
            ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream

            if [ ${PGVERSION} -ge 12 ]; then
                sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                       -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
            fi
     
            cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

            if [ ${PGVERSION} -ge 12 ]; then
                    touch ${FAILED_NODE_PGDATA}/standby.signal
            else
                    echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
            fi
        "

        if [ $? -ne 0 ]; then
            # drop replication slot
            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
                ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
            "

            logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
            exit 1
        fi
    fi

    # start Standby node on ${FAILED_NODE_HOST}
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
            postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start

    # If start Standby successfully, attach this node
    if [ $? -eq 0 ]; then

        # Run pcp_attach_node to attach the Standby node to Pgpool-II.
        ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}

        if [ $? -ne 0 ]; then
                logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
                exit 1
        fi

    # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
    else

        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"

        logger -i -p local1.error follow_master.sh: end: follow master command failed
        exit 1
    fi

else
    logger -i -p local1.info follow_master.sh: failed_node_id=${FAILED_NODE_ID} is not running. skipping follow master command
    exit 0
fi

logger -i -p local1.info follow_master.sh: end: follow master command complete
exit 0

Pgpool-II can also be deployed on Kubernetes, configured either 1) via environment variables or 2) via ConfigMaps.

In this case, we will use a config map:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pgpool-config
  namespace: db-test
  labels:
    app: pgpool-config
data:
  pgpool.conf: |-
    listen_addresses = '*'
    port = 9999
    socket_dir = '/var/run/postgresql'
    pcp_listen_addresses = '*'
    pcp_port = 9898
    pcp_socket_dir = '/var/run/postgresql'
    backend_hostname0 = experience-db-cluster-alinma-rw
    backend_port0 = 5432
    backend_weight0 = 1
    backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'
    backend_auth_method0 = 'scram-sha-256'
    backend_password0 = 'experience_db'
    backend_hostname1 = experience-db-cluster-alinma-ro
    backend_port1 = 5432
    backend_weight1 = 1
    backend_flag1 = 'DISALLOW_TO_FAILOVER'
    backend_password1 = 'experience_db'
    backend_auth_method1 = 'scram-sha-256'
    backend_hostname2 = experience-db-cluster-alinma-ro
    backend_port2 = 5432
    backend_weight2 = 2
    backend_flag2 = 'DISALLOW_TO_FAILOVER'
    backend_password2 = 'experience_db'
    backend_auth_method2 = 'scram-sha-256'
    sr_check_user = 'experience_db'
    sr_check_password = 'experience_db'
    sr_check_period = 10
    enable_pool_hba = on
    master_slave_mode = on
    num_init_children = 32
    max_pool = 4
    child_life_time = 300
    child_max_connections = 0
    connection_life_time = 0
    client_idle_limit = 0
    connection_cache = on
    load_balance_mode = on
    PGPOOL_PCP_USER = 'experience_db'
    PGPOOL_PCP_PASSWORD = 'experience_db'
  pcp.conf: |-
    experience_db:be22aea2ca31a561e65894d88a2bad32
  pool_passwd: |-
    experience_db:be22aea2ca31a561e65894d88a2bad32
  pool_hba.conf: |-
    local   all         all                               trust
    host    all         all         127.0.0.1/32          trust
    host    all         all         ::1/128               trust
    host    all         all         0.0.0.0/0             scram-sha-256
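
The value stored in pcp.conf (and reused in pool_passwd here) is an MD5 hash of the PCP user's password, presumably generated with pg_md5. If you change the password, regenerate the hash and update the ConfigMap; a sketch, assuming pg_md5 is available from any Pgpool-II installation:

Regenerate the pcp.conf hash

# pg_md5 experience_db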

After we create that configmap with:

Create configmap

kk apply -f configmap.yaml

We can create the deployment and the service now:

Create Deployment and Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgpool
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pgpool
  template:
    metadata:
      labels:
        app: pgpool
    spec:
      containers:
      - name: pgpool
        image: pgpool/pgpool
        env:
        - name: POSTGRES_USERNAME
          value: "experience_db"
        - name: POSTGRES_PASSWORD
          value: "experience_db"
        - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
          value: "scram-sha-256"
        - name: PGPOOL_ENABLE_POOL_PASSWD
          value: "true"
        - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
          value: "false"
        # The following settings are not required when not using the Pgpool-II PCP command.
        # To enable the following settings, you must define a secret that stores the PCP user's
        # username and password.
        #- name: PGPOOL_PCP_USER
        #  valueFrom:
        #    secretKeyRef:
        #      name: pgpool-pcp-secret
        #      key: username
        #- name: PGPOOL_PCP_PASSWORD
        #  valueFrom:
        #    secretKeyRef:
        #      name: pgpool-pcp-secret
        #      key: password
        volumeMounts:
        - name: pgpool-config
          mountPath: /config
        #- name: pgpool-tls
        #  mountPath: /config/tls
      volumes:
      - name: pgpool-config
        configMap:
          name: pgpool-config
      # Configure your own TLS certificate.
      # If not set, Pgpool-II will automatically generate the TLS certificate if ssl = on.
      #- name: pgpool-tls
      #  secret:
      #    secretName: pgpool-tls
---
apiVersion: v1
kind: Service
metadata:
  name: pgpool
spec:
  selector:
    app: pgpool
  ports:
  - name: pgpool-port
    protocol: TCP
    port: 9999
    targetPort: 9999
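
Once the Deployment and Service are applied, a quick end-to-end check is to run a throwaway psql client pod against the pgpool Service. The manifest file name, namespace and client image below are assumptions; adjust them to your environment.

Test the pgpool Service

kubectl -n db-test apply -f pgpool-deployment.yaml
kubectl -n db-test run psql-client --rm -it --image=postgres:13 -- \
  psql -h pgpool -p 9999 -U experience_db postgres -c "show pool_nodes"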