postgresql_replication_ha

Pgpool-II is an open-source package which provides many features for PostgreSQL, such as:

  • Load balancing
  • Session management
  • Automatic failover
  • Others

In a nutshell, Pgpool-II is an external program which connects to the master and the slave(s). It can detect when the master is down, and it can tell which queries only read and which ones also write, directing read-only sessions to the slaves and the rest to the master.
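
For illustration: once everything below is configured, the application connects to Pgpool-II (port 9999 by default) instead of PostgreSQL's port 5432, and Pgpool-II routes each statement on its own. A minimal sketch, assuming the VIP we configure later in this section:

# The client talks only to Pgpool-II on the VIP, never to a backend directly:
$ psql -h 192.168.0.220 -p 9999 -U postgres postgres
postgres=# SELECT now();                 -- read-only: may be load-balanced to a slave
postgres=# CREATE TABLE t (id integer);  -- write: always routed to the master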

The basic architecture is as follows: Pgpool-II assigns a Virtual IP (VIP) which the client (application) will use. In case the current master server fails, that VIP is migrated to another server, thus "failing over"; this is why a failover command will be needed.

So, what are we waiting for? Let's use the configuration which we already have:

Master
  • postgresqlmaster - 192.168.0.178
Slaves
  • postgresqlslaveone - 192.168.0.199
  • postgresqlslavetwo - 192.168.0.200

This is the configuration we built together in the previous sections. Now let's add the Pgpool-II layer on top of it: Pgpool-II will run on each of the three servers (in watchdog mode) and expose a single delegate Virtual IP:

pgPool
  • postgresqlpgpool (VIP) - 192.168.0.220

Pgpool-II's current stable version is 4.1, so let's use that one.

We are on Enterprise Linux 7, so we can download the RPMs from the pgpool.net yum repository. Let's start by downloading the packages on every server that will run Pgpool-II. Besides Pgpool-II itself, we need two additional packages:

  • libevent
  • libmemcached

Download required Libraries

Using username "root".
Last login: Thu Jan 23 10:06:12 2020
[root@postgresqlslaveone ~]# wget http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/libmemcached-1.0.18-1.el7.art.x86_64.rpm
--2020-01-23 10:07:52--  http://www6.atomicorp.com/channels/atomic/centos/7/x86_64/RPMS/libmemcached-1.0.18-1.el7.art.x86_64.rpm
Resolving www6.atomicorp.com (www6.atomicorp.com)... 51.79.80.20
Connecting to www6.atomicorp.com (www6.atomicorp.com)|51.79.80.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 247408 (242K) [application/x-rpm]
Saving to: ‘libmemcached-1.0.18-1.el7.art.x86_64.rpm’

100%[=================================================================================================================================================================================>] 247,408      177KB/s   in 1.4s

2020-01-23 10:07:54 (177 KB/s) - ‘libmemcached-1.0.18-1.el7.art.x86_64.rpm’ saved [247408/247408]

[root@postgresqlslaveone ~]# wget https://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
--2020-01-23 10:08:00--  https://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
Resolving www.pgpool.net (www.pgpool.net)... 202.32.10.40
Connecting to www.pgpool.net (www.pgpool.net)|202.32.10.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1409060 (1.3M) [application/x-rpm]
Saving to: ‘pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm’

100%[=================================================================================================================================================================================>] 1,409,060    130KB/s   in 12s

2020-01-23 10:08:13 (113 KB/s) - ‘pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm’ saved [1409060/1409060]

[root@postgresqlslaveone ~]#

After that, we can install them:

Install

[root@postgresqlslaveone ~]# yum install libevent
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: ftp.cvut.cz
 * extras: ftp.cvut.cz
 * updates: ftp.cvut.cz
base                                                                                                                                                                                                | 3.6 kB  00:00:00
extras                                                                                                                                                                                              | 2.9 kB  00:00:00
pgdg10                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg11                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg94                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg95                                                                                                                                                                                              | 3.6 kB  00:00:00
pgdg96                                                                                                                                                                                              | 3.6 kB  00:00:00
updates                                                                                                                                                                                             | 2.9 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package libevent.x86_64 0:2.0.21-4.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================================================================================================================================================
 Package                                              Arch                                               Version                                                    Repository                                        Size
===========================================================================================================================================================================================================================
Installing:
 libevent                                             x86_64                                             2.0.21-4.el7                                               base                                             214 k

Transaction Summary
===========================================================================================================================================================================================================================
Install  1 Package

Total download size: 214 k
Installed size: 725 k
Is this ok [y/d/N]: y
Downloading packages:
libevent-2.0.21-4.el7.x86_64.rpm                                                                                                                                                                    | 214 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : libevent-2.0.21-4.el7.x86_64                                                                                                                                                                            1/1
  Verifying  : libevent-2.0.21-4.el7.x86_64                                                                                                                                                                            1/1

Installed:
  libevent.x86_64 0:2.0.21-4.el7

Complete!
[root@postgresqlslaveone ~]# rpm -Uvh libmemcached-1.0.18-1.el7.art.x86_64.rpm
warning: libmemcached-1.0.18-1.el7.art.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID 4520afa9: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:libmemcached-1.0.18-1.el7.art    ################################# [100%]
[root@postgresqlslaveone ~]# rpm -Uvh pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm
warning: pgpool-II-pg10-4.1.0-2pgdg.rhel7.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 60ae0e48: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:pgpool-II-pg10-4.1.0-2pgdg.rhel7 ################################# [100%]
postgres ALL=NOPASSWD: /sbin/ip
postgres ALL=NOPASSWD: /usr/sbin/arping
[root@postgresqlslaveone ~]#

After that, we could finally install Pgpool-II, as you can see above. Note the two "postgres ALL=NOPASSWD" lines shown at the end of the output: these are the sudo rules the watchdog relies on, since the postgres user must be able to run /sbin/ip and /usr/sbin/arping without a password in order to move the VIP.
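
If those sudo rules were not added automatically on your system, a minimal sketch of adding them yourself (assuming sudo is configured to read /etc/sudoers.d):

[all servers]# cat > /etc/sudoers.d/pgpool <<'EOF'
postgres ALL=NOPASSWD: /sbin/ip
postgres ALL=NOPASSWD: /usr/sbin/arping
EOF
[all servers]# chmod 440 /etc/sudoers.d/pgpool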

Now we start with the interesting part :)

Firstly, we have to set up passwordless SSH connections between all servers, for the root and postgres users:

Allow passwordless connection for: root and postgres users

[all servers]# cd ~/.ssh
[all servers]# ssh-keygen -t rsa
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlpgpool
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlmaster
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlslaveone
[all servers]# ssh-copy-id -i id_rsa.pub root@postgresqlslavetwo

[all servers]# su - postgres
[all servers]$ cd ~/.ssh
[all servers]$ ssh-keygen -t rsa
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlpgpool
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlmaster
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlslaveone
[all servers]$ ssh-copy-id -i id_rsa.pub postgres@postgresqlslavetwo
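
Note that the failover and recovery scripts in the appendix connect with -i ~/.ssh/id_rsa_pgpool. If you follow them verbatim, also generate and distribute a key with that exact name for the postgres user (a sketch; the key name is simply carried over from those scripts):

[all servers]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa_pgpool -N ''
[all servers]$ ssh-copy-id -i ~/.ssh/id_rsa_pgpool.pub postgres@postgresqlmaster
[all servers]$ ssh-copy-id -i ~/.ssh/id_rsa_pgpool.pub postgres@postgresqlslaveone
[all servers]$ ssh-copy-id -i ~/.ssh/id_rsa_pgpool.pub postgres@postgresqlslavetwo
# Must return the remote hostname without any password prompt:
[all servers]$ ssh -i ~/.ssh/id_rsa_pgpool postgres@postgresqlmaster hostname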

For security reasons, we use a dedicated user repl solely for replication purposes (we already have it from the replication setup in the previous sections), and we create a user pgpool for the streaming replication delay checks and health checks of Pgpool-II.

Create user on the Master

 [postgresqlmaster]# psql -U postgres -p 5432
 postgres=# SET password_encryption = 'scram-sha-256';
 postgres=# CREATE ROLE pgpool WITH LOGIN;
 postgres=# \password pgpool
 postgres=# \password postgres
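
The appendix scripts point the standbys at passfile=''/var/lib/pgsql/.pgpass'', and pg_rewind connects as postgres, so streaming replication and recovery can run unattended. A sketch of that file (replace the placeholders with the passwords you just set):

[all servers]# su - postgres
[all servers]$ cat > ~/.pgpass <<EOF
postgresqlmaster:5432:replication:repl:<repl password>
postgresqlslaveone:5432:replication:repl:<repl password>
postgresqlslavetwo:5432:replication:repl:<repl password>
postgresqlmaster:5432:postgres:postgres:<postgres password>
postgresqlslaveone:5432:postgres:postgres:<postgres password>
postgresqlslavetwo:5432:postgres:postgres:<postgres password>
EOF
[all servers]$ chmod 600 ~/.pgpass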

Then we have to configure the settings that will be common to all servers:

Common settings in pgpool.conf

listen_addresses = '*'
sr_check_user = 'pgpool'
sr_check_password = ''
health_check_period = 5
health_check_timeout = 30
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3
backend_hostname0 = 'postgresqlmaster'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/10/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'postgresqlslaveone'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/10/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = 'postgresqlslavetwo'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/10/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'postgresqlmaster'
backend_application_name1 = 'postgresqlslaveone'
backend_application_name2 = 'postgresqlslavetwo'
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'
enable_pool_hba = on
use_watchdog = on
delegate_IP = '192.168.0.220'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev enp0s3 label enp0s3:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev enp0s3'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I enp0s3'
if_cmd_path = '/sbin'
arping_path = '/usr/sbin'
log_destination = 'syslog'
# Where to log
# Valid values are combinations of stderr,
# and syslog. Default to stderr.
syslog_facility = 'LOCAL1'
# Syslog local facility. Default to LOCAL0

These values are identical on all servers. Now we have to specify the host-specific settings, which cover the watchdog and the heartbeat between the Pgpool-II instances:

PostgreSQLMaster settings

wd_hostname = 'postgresqlmaster'
wd_port = 9000

# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'postgresqlslaveone'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslavetwo'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlslaveone'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslavetwo'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''

PostgreSQLSlaveOne settings

wd_hostname = 'postgresqlslaveone'
wd_port = 9000
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'postgresqlmaster'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslavetwo'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlmaster'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslavetwo'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''

PostgreSQLSlaveTwo settings

wd_hostname = 'postgresqlslavetwo'
wd_port = 9000
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'postgresqlmaster'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'postgresqlslaveone'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
heartbeat_destination0 = 'postgresqlmaster'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'postgresqlslaveone'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
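
Since only wd_hostname, the other_pgpool_* entries and the heartbeat_destination* entries differ between the hosts, one convenient approach (just a suggestion; the .common and per-host file names are hypothetical) is to keep the shared settings in a template and assemble pgpool.conf on each server:

[all servers]# cat /etc/pgpool-II/pgpool.conf.common \
               /etc/pgpool-II/pgpool.conf.$(hostname -s) > /etc/pgpool-II/pgpool.conf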

All these settings are specified in the pgpool.conf file in /etc/pgpool-II. After that is done, let's create the log destination:

Create log destination

[all servers]# mkdir /var/log/pgpool-II
[all servers]# touch /var/log/pgpool-II/pgpool.log
[all servers]# echo "LOCAL1.*                                                /var/log/pgpool-II/pgpool.log" >> /etc/rsyslog.conf
[all servers]# vi /etc/logrotate.d/syslog
...
/var/log/messages
/var/log/pgpool-II/pgpool.log
/var/log/secure
[all servers]# systemctl restart rsyslog
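
Before going further, you can confirm that the LOCAL1 facility really lands in the new file:

[all servers]# logger -p local1.info "pgpool logging test"
[all servers]# tail -n 1 /var/log/pgpool-II/pgpool.log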

Since user authentication is required to use the PCP commands, specify a user name and md5-encrypted password in pcp.conf. Here we create the encrypted password for the pgpool user and add "username:encrypted password" to /etc/pgpool-II/pcp.conf.

PCP password

[all servers]# echo 'pgpool:'`pg_md5 PCP_password` >> /etc/pgpool-II/pcp.conf

Since the follow_master_command script has to execute PCP commands without entering a password, we create .pcppass in the home directory of the Pgpool-II startup user (the root user):

[all servers]# echo 'localhost:9898:pgpool:PCP_password' > ~/.pcppass
[all servers]# chmod 600 ~/.pcppass
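
Later, once Pgpool-II is running, you can confirm that the PCP credentials and ~/.pcppass work; the -w flag makes the PCP commands read the password from ~/.pcppass instead of prompting:

[all servers]# pcp_node_count -h localhost -p 9898 -U pgpool -w
# should print 3, the number of configured backends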

In Pgpool-II 4.0+, the stored passwords can only be either AES-encrypted or clear text (an md5 hash cannot be used together with SCRAM authentication). It goes without saying that clear-text passwords are not advised.

So we have to:

  1. Modify pg_hba.conf and pool_hba.conf
  2. Encrypt our password
  3. Store the decryption key in the home directory of the process owner on all servers

To modify pg_hba.conf and pool_hba.conf, you can use any text editor you want, but you should follow this syntax:

Edit hba conf files

host    all         pgpool           0.0.0.0/0          scram-sha-256
host    all         postgres         0.0.0.0/0          scram-sha-256

Once this is done, we have to provide the encryption/decryption key for the password. As you know, AES is a symmetric type of encryption, which means it uses the same key for encryption and decryption.

Create encryption/decryption key

[all servers]# echo 'some secret string' > ~/.pgpoolkey 
[all servers]# chmod 600 ~/.pgpoolkey

Encrypt the password

[all servers]# pg_enc -m -k /root/.pgpoolkey -u pgpool -p
db password: [pgpool user's password]
[all servers]# pg_enc -m -k /root/.pgpoolkey -u postgres -p
db password: [postgres user's password]

# cat /etc/pgpool-II/pool_passwd 
pgpool:AESheq2ZMZjynddMWk5sKP/Rw==
postgres:AESHs/pWL5rtXy2IwuzroHfqg==

Please be sure you have done this on all servers and that the .pgpoolkey file is present, on every server running Pgpool-II, in the home directory of the user which runs the service (in our case: postgres). Otherwise you will get the following error:

No .pgpoolkey available

SCRAM authentication failed
unable to decrypt password from pool_passwd
verify the valid pool_key exists

P.S. Be sure to restart Pgpool-II afterwards :) or reload the HBA files (e.g. pg_ctl reload for pg_hba.conf on the PostgreSQL side).

We installed Pgpool-II on top of an existing replication, but Pgpool-II can also restore and recover nodes, and thus create the replication configuration itself. Let's follow the process. Firstly, please create both files from the appendix at the following locations on postgresqlmaster:

  • /var/lib/pgsql/10/data/recovery_1st_stage
  • /var/lib/pgsql/10/data/pgpool_remote_start
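
Both scripts must be owned and executable by postgres. Online recovery additionally needs the pgpool_recovery extension installed in template1 on the primary; it ships separately (on this platform presumably in the pgpool-II-pg10-extensions package, which is an assumption here):

[postgresqlmaster]# chown postgres:postgres /var/lib/pgsql/10/data/recovery_1st_stage /var/lib/pgsql/10/data/pgpool_remote_start
[postgresqlmaster]# chmod 755 /var/lib/pgsql/10/data/recovery_1st_stage /var/lib/pgsql/10/data/pgpool_remote_start
[postgresqlmaster]# su - postgres -c 'psql template1 -c "CREATE EXTENSION pgpool_recovery"'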

Start PGPool

[all servers]# service pgpool start
Redirecting to /bin/systemctl start pgpool.service
[all servers]# service pgpool status
Redirecting to /bin/systemctl status pgpool.service
● pgpool.service - Pgpool-II
   Loaded: loaded (/usr/lib/systemd/system/pgpool.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-01-27 12:45:33 EST; 3s ago
  Process: 6190 ExecStop=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $STOP_OPTS stop (code=exited, status=0/SUCCESS)
 Main PID: 6196 (pgpool)
   CGroup: /system.slice/pgpool.service
           ├─6196 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
           ├─6198 pgpool: watchdog
           ├─6199 pgpool: lifecheck
           ├─6200 pgpool: wait for connection request
           ├─6201 pgpool: wait for connection request
           ├─6202 pgpool: wait for connection request
           ├─6203 pgpool: wait for connection request
           ├─6204 pgpool: wait for connection request
           ├─6205 pgpool: wait for connection request
           ├─6206 pgpool: wait for connection request
           ├─6207 pgpool: wait for connection request
           ├─6208 pgpool: wait for connection request
           ├─6209 pgpool: wait for connection request
           ├─6210 pgpool: wait for connection request
           ├─6211 pgpool: wait for connection request
           ├─6212 pgpool: wait for connection request
           ├─6213 pgpool: wait for connection request
           ├─6214 pgpool: wait for connection request
           ├─6215 pgpool: wait for connection request
           ├─6216 pgpool: wait for connection request
           ├─6217 pgpool: wait for connection request
           ├─6218 pgpool: wait for connection request
           ├─6219 pgpool: wait for connection request
           ├─6220 pgpool: wait for connection request
           ├─6221 pgpool: wait for connection request
           ├─6222 pgpool: wait for connection request
           ├─6223 pgpool: wait for connection request
           ├─6224 pgpool: wait for connection request
           ├─6225 pgpool: wait for connection request
           ├─6226 pgpool: wait for connection request
           ├─6227 pgpool: wait for connection request
           ├─6228 pgpool: wait for connection request
           ├─6229 pgpool: wait for connection request
           ├─6230 pgpool: wait for connection request
           ├─6231 pgpool: wait for connection request
           ├─6232 pgpool: PCP: wait for connection request
           ├─6233 pgpool: worker process
           ├─6234 pgpool: health check process(0)
           ├─6235 pgpool: health check process(1)
           ├─6236 pgpool: health check process(2)
           ├─6237 pgpool: heartbeat receiver
           ├─6238 pgpool: heartbeat sender
           ├─6239 pgpool: heartbeat receiver
           └─6240 pgpool: heartbeat sender

Jan 27 12:45:35 postgresqlslavetwo pgpool[6238]: [10-2] 2020-01-27 12:45:35: pid 6238: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [9-1] 2020-01-27 12:45:35: pid 6237: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [10-1] 2020-01-27 12:45:35: pid 6237: LOG:  creating watchdog heartbeat receive socket.
Jan 27 12:45:35 postgresqlslavetwo pgpool[6237]: [10-2] 2020-01-27 12:45:35: pid 6237: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [9-1] 2020-01-27 12:45:35: pid 6239: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [10-1] 2020-01-27 12:45:35: pid 6239: LOG:  creating watchdog heartbeat receive socket.
Jan 27 12:45:35 postgresqlslavetwo pgpool[6239]: [10-2] 2020-01-27 12:45:35: pid 6239: DETAIL:  set SO_REUSEPORT
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [9-1] 2020-01-27 12:45:35: pid 6240: LOG:  set SO_REUSEPORT option to the socket
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [10-1] 2020-01-27 12:45:35: pid 6240: LOG:  creating socket for sending heartbeat
Jan 27 12:45:35 postgresqlslavetwo pgpool[6240]: [10-2] 2020-01-27 12:45:35: pid 6240: DETAIL:  set SO_REUSEPORT
[root@postgresqlslavetwo pgpool-II]#

Now let's recover the standby nodes through the VIP:

Recover the standby nodes

# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 1
Password: 
pcp_recovery_node -- Command Successful

# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 2
Password: 
pcp_recovery_node -- Command Successful

After executing the pcp_recovery_node commands, verify that postgresqlslaveone and postgresqlslavetwo have started as PostgreSQL standby servers.

Verify PGPool

# psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres: 
 node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | postgresqlmaster   | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-01-28 05:18:09
 1       | postgresqlslaveone | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-01-28 05:18:09
 2       | postgresqlslavetwo | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-01-28 05:18:09
(3 rows)
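
With all three nodes up, a simple way to exercise the failover path (on a test system!) is to stop PostgreSQL on the master and watch a standby get promoted:

[postgresqlmaster]# su - postgres -c '/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data -m immediate stop'
# After health_check_period * health_check_max_retries, re-run "show pool_nodes":
# the failed node shows status "down" and one standby takes over the "primary" role.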

You can also verify the watchdog daemon as follows:

Verify WatchDog

[root@postgresqlmaster ~]# pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
Password:
3 YES postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster

postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 4 MASTER
postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 7 STANDBY
postgresqlslavetwo:9999 Linux postgresqlslavetwo postgresqlslavetwo 9999 9000 7 STANDBY
[root@postgresqlmaster ~]#
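
The delegate VIP itself can be verified on whichever node currently holds the watchdog MASTER role; with the if_up_cmd we configured, it should appear as a secondary address labelled enp0s3:0:

[root@postgresqlmaster ~]# ip addr show enp0s3 | grep 192.168.0.220
# expect: inet 192.168.0.220/24 ... enp0s3:0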

recovery_1st_stage

#!/bin/bash
# This script is executed by "recovery_1st_stage" to recover a Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_ID="$5"
DEST_NODE_PORT="$6"

PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/pgsql-10    # PostgreSQL 10 binaries, matching our installation
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl

logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi

## Create replication slot "${DEST_NODE_HOST}"
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}');
EOQ

## Execute pg_basebackup to recover the Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "

    set -o errexit

    rm -rf $DEST_NODE_PGDATA
    rm -rf $ARCHIVEDIR/*

    ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
    fi

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${DEST_NODE_HOST}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
            touch ${DEST_NODE_PGDATA}/standby.signal
    else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"

if [ $? -ne 0 ]; then

    ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
EOQ

    logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
    exit 1
fi

logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0

pgpool_remote_start

#!/bin/bash
# This script is run after recovery_1st_stage to start the Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PGHOME=/usr/pgsql-10    # PostgreSQL 10 binaries, matching our installation
DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"


logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info pgpool_remote_start: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Start Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
    $PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA start
"

if [ $? -ne 0 ]; then
    logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed.
    exit 1
fi

logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
exit 0

/etc/pgpool-II/failover.sh

#!/bin/bash
# This script is run by failover_command.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

PGHOME=/usr/pgsql-10    # PostgreSQL 10 binaries, matching our installation


logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
    logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
    exit 0
fi

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## If Standby node is down, skip failover.
if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then
    logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed
        exit 1
    fi

    exit 0
fi

## Promote Standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.

ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote

if [ $? -ne 0 ]; then
    logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
    exit 1
fi

logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0

/etc/pgpool-II/follow_master.sh

#!/bin/bash
# This script is run after failover_command to synchronize the Standby with the new Primary.
# First try pg_rewind. If pg_rewind failed, use pg_basebackup.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
OLD_MASTER_NODE_ID="$6"
NEW_MASTER_NODE_HOST="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"

PGHOME=/usr/pgsql-10    # PostgreSQL 10 binaries, matching our installation
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
PCP_USER=pgpool
PGPOOL_PATH=/usr/bin
PCP_PORT=9898

logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID}

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info follow_master.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please set up passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`

if [ $PGVERSION -ge 12 ]; then
RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
else
RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
fi

## Check the status of Standby
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status


## If Standby is running, synchronize it with the new Primary.
if [ $? -eq 0 ]; then

    logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID

    # Create replication slot "${FAILED_NODE_HOST}"
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\"
    "

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "

        set -o errexit

        ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop

        cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

        if [ ${PGVERSION} -ge 12 ]; then
            touch ${FAILED_NODE_PGDATA}/standby.signal
        else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
        fi

        ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"

    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.

        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
             
            set -o errexit

            # Execute pg_basebackup
            rm -rf ${FAILED_NODE_PGDATA}
            rm -rf ${ARCHIVEDIR}/*
            ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream

            if [ ${PGVERSION} -ge 12 ]; then
                sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                       -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
            fi
     
            cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

            if [ ${PGVERSION} -ge 12 ]; then
                    touch ${FAILED_NODE_PGDATA}/standby.signal
            else
                    echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
            fi
        "

        if [ $? -ne 0 ]; then
            # drop replication slot
            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
                ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
            "

            logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
            exit 1
        fi
    fi

    # start Standby node on ${FAILED_NODE_HOST}
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
            postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start

    # If start Standby successfully, attach this node
    if [ $? -eq 0 ]; then

        # Run pcp_attach_node to attach the Standby node to Pgpool-II.
        ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}

        if [ $? -ne 0 ]; then
                logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
                exit 1
        fi

    # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
    else

        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"

        logger -i -p local1.error follow_master.sh: end: follow master command failed
        exit 1
    fi

else
    logger -i -p local1.info follow_master.sh: failed_node_id=${FAILED_NODE_ID} is not running. skipping follow master command
    exit 0
fi

logger -i -p local1.info follow_master.sh: end: follow master command complete
exit 0