postgresql_replication_ha (last revised 2024/11/09 19:13 by andonovj)
  * postgresqlpgpool VIP - 192.168.0.220
  
=====Configure=====
PGpool's current stable version is 4.1, so let's use that one.
  
  * libmemcached
  
<Code:none| Download required Libraries>
Using username "root".
Last login: Thu Jan 23 10:06:12 2020
  
[root@postgresqlslaveone ~]#
</Code>
  
After that we can install them:
  
=====Install packages=====
<Code:none | Install>
[root@postgresqlslaveone ~]# yum install libevent
Loaded plugins: fastestmirror, langpacks
postgres ALL=NOPASSWD: /usr/sbin/arping
[root@postgresqlslaveone ~]#
</Code>
  
After that we can finally install PGPool, as you can see above.
  
  
=====Setup=====
Now we start with the interesting part :)
  
  
  
=====Setup Replication=====
We installed PGPool on top of existing replication, but PGPool can also be used to restore and recover nodes and thus create the replication configuration itself.
Let's follow the process. First, please create both files from the appendix in the following locations on postgresqlmaster:
Line 392: Line 391:
/var/lib/pgsql/10/data/pgpool_remote_start
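To stage those scripts, something like the following works. This is a minimal sketch: it writes placeholder bodies (the real content is in the appendix), and it defaults PGDATA to a scratch directory so you can try it anywhere; on postgresqlmaster it would be /var/lib/pgsql/10/data.

```shell
# Sketch: stage the two appendix scripts into the PostgreSQL data directory
# and make them executable (Pgpool invokes them as the postgres user).
# PGDATA defaults to a scratch directory here so the sketch is safe to try.
PGDATA="${PGDATA:-$(mktemp -d)}"
for f in recovery_1st_stage pgpool_remote_start; do
    printf '#!/bin/bash\n' > "$PGDATA/$f"   # placeholder body; use the appendix content
    chmod 700 "$PGDATA/$f"                  # must be executable
done
ls -l "$PGDATA"
```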
  
====Start the Pgpool service====
<Code:none |Start PGPool>
[all servers]# service pgpool start
Redirecting to /bin/systemctl start pgpool.service
</Code>
  
Restore the database using the VIP:
<Code:none |Restore Nodes>
# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 1
Password:
pcp_recovery_node -- Command Successful

# pcp_recovery_node -h 192.168.0.220 -p 9898 -U pgpool -n 2
Password:
pcp_recovery_node -- Command Successful
</Code>
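If you're not sure which node ids need recovering, you can read them off "show pool_nodes". A small sketch on canned output; on a live cluster, replace the here-document with the real command shown in the comment:

```shell
# Sketch: list node ids that pgpool reports as "down"; these are the ids
# to pass to pcp_recovery_node -n. Canned sample below; on a live cluster use:
#   psql -h 192.168.0.220 -p 9999 -U postgres -Atc "show pool_nodes"
cat <<'EOF' | awk -F'|' '$4 == "down" { print $1 }'
0|postgresqlmaster|5432|up|0.500000|primary|0|true
1|postgresqlslaveone|5432|down|0.500000|standby|0|false
EOF
# prints: 1
```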


=====Failover=====
I was ready to give up a couple of times, seriously. I was banging my head against this for at least a couple of weeks because I couldn't make PostgreSQL fail over. So let me tell you a couple of things which I found out THE VERY HARD and PAINFUL way.

With Pgpool, you can either let PostgreSQL streaming replication move the data, OR use Pgpool's own replication. By default, a fresh Pgpool installation assumes its own replication will do the job; HOWEVER, in our case the streaming replication already existed, so a little modification is needed.

Please ENSURE the following parameters are set on the nodes:

<code:none|Pgpool.conf>
master_slave_mode = on
master_slave_sub_mode = 'stream'
</code>

And the following one turned off:

<code:none|Pgpool.conf>
-bash-4.2$ cat pgpool.conf | grep replication
replication_mode = off
                                   # Activate replication mode
                                   # when in replication mode
                                   # replication mode, specify table name to

</code>
The two settings are mutually exclusive; only one can be active at a time.

This tells Pgpool that you ALREADY have streaming replication and that you take care of it yourself.
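A quick way to double-check all three settings on a node is a simple grep. This sketch generates a sample pgpool.conf so it runs anywhere; on a real node, point conf at /etc/pgpool-II/pgpool.conf instead:

```shell
# Sketch: confirm streaming mode is on and Pgpool's native replication is off.
# A sample config is written here; on a real node use /etc/pgpool-II/pgpool.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
master_slave_mode = on
master_slave_sub_mode = 'stream'
replication_mode = off
EOF
grep -E '^(master_slave_mode|master_slave_sub_mode|replication_mode)' "$conf"
```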

====Current State====
After that, let's see the current state of the cluster:

<code:none|Current State>
[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres:
 node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
       0 | postgresqlmaster   | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-07 10:12:26
       1 | postgresqlslaveone | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-07 10:12:26
(2 rows)

[root@postgresqlslaveone tmp]#
</code>

That clearly states that postgresqlmaster is the master and postgresqlslaveone is the slave :) I know, stupid naming, but bear with me :)
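By the way, you don't have to eyeball the table: psql's unaligned output of "show pool_nodes" is easy to parse. A sketch on canned output; on the live cluster, replace the here-document with `psql -h 192.168.0.220 -p 9999 -U postgres -Atc "show pool_nodes"`:

```shell
# Sketch: extract the hostname of the current primary from "show pool_nodes".
# The role is the 6th pipe-separated field in psql's unaligned (-At) output.
cat <<'EOF' | awk -F'|' '$6 == "primary" { print $2 }'
0|postgresqlmaster|5432|up|0.500000|primary|0|true|0|||2020-02-07 10:12:26
1|postgresqlslaveone|5432|up|0.500000|standby|0|false|0|||2020-02-07 10:12:26
EOF
# prints: postgresqlmaster
```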

====Database Failover====
So what happens after I shut down the first database?

<code:none|Shutdown Master Database>
[root@postgresqlmaster pgpool-II]# su - postgres
Last login: Fri Feb  7 10:01:53 EST 2020 on pts/3
-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ -l logfile stop
waiting for server to shut down..... done
server stopped
-bash-4.2$
</code>

Well... what can happen? The database will go down, of course :)

<code:none|Master Database Log>
2020-02-07 10:14:36.882 EST [15508] LOG:  received fast shutdown request
2020-02-07 10:14:36.907 EST [15508] LOG:  aborting any active transactions
2020-02-07 10:14:36.909 EST [15508] LOG:  worker process: logical replication launcher (PID 15517) exited with exit code 1
2020-02-07 10:14:36.909 EST [15511] LOG:  shutting down
2020-02-07 10:14:38.176 EST [15508] LOG:  database system is shut down
</code>


And even more shockingly, PGPool will FINALLY recognize it:
<code:none|PGPool's Reaction on Master>
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [10-1] 2020-02-07 10:15:09: pid 16840: LOG:  failed to connect to PostgreSQL server on "postgresqlmaster:5432", getsockopt() detected error "Connection refused"
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [11-1] 2020-02-07 10:15:09: pid 16840: LOG:  received degenerate backend request for node_id: 0 from pid [16840]
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [26-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [27-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [28-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-1] 2020-02-07 10:15:09: pid 16802: LOG:  we do not need quorum to hold to proceed with failover
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-2] 2020-02-07 10:15:09: pid 16802: DETAIL:  proceeding with the failover
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-3] 2020-02-07 10:15:09: pid 16802: HINT:  failover_when_quorum_exists is set to false
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-1] 2020-02-07 10:15:09: pid 16840: FATAL:  failed to create a backend connection
Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-2] 2020-02-07 10:15:09: pid 16840: DETAIL:  executing failover on backend
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [15-1] 2020-02-07 10:15:09: pid 16800: LOG:  Pgpool-II parent process has received failover request
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [30-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [31-1] 2020-02-07 10:15:09: pid 16802: LOG:  received the failover indication from Pgpool-II on IPC interface
Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [32-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is informed of failover start by the main process
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [16-1] 2020-02-07 10:15:09: pid 16800: LOG:  starting degeneration. shutdown host postgresqlmaster(5432)
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [17-1] 2020-02-07 10:15:09: pid 16800: LOG:  Restart all children
Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [18-1] 2020-02-07 10:15:09: pid 16800: LOG:  execute command: /etc/pgpool-II/failover.sh 0 postgresqlmaster 5432 /var/lib/pgsql/10/data 1 postgresqlslaveone 0 0 5432 /var/lib/pgsql/10/data postgresqlmaster 5432
Feb  7 10:15:11 postgresqlmaster postgres[17112]: follow_master.sh: start: Standby node 0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_HOST=postgresqlmaster
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PORT=5432
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PGDATA=/var/lib/pgsql/10/data
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_ID=1
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_MASTER_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_HOST=postgresqlslaveone
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_PRIMARY_NODE_ID=0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PORT=5432
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/10/data
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGHOME=/usr/pgsql-10
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ARCHIVEDIR=/walshared
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + REPLUSER=repl
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_USER=postgres
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGPOOL_PATH=/usr/bin
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_PORT=9898
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + logger -i -p local1.info follow_master.sh: start: Standby node 0
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlslaveone -i /var/lib/pgsql/.ssh/id_rsa ls /tmp
Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlslaveone,192.168.0.199' (ECDSA) to the list of known hosts.
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 0 -ne 0 ']'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ /usr/pgsql-10/bin/initdb -V
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ awk '{print $3}'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\..*//'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGVERSION=10
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 10 -ge 12 ']'
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + RECOVERYCONF=/var/lib/pgsql/10/data/recovery.conf
Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlmaster -i /var/lib/pgsql/.ssh/id_rsa /usr/pgsql-10/bin/pg_ctl -w -D /var/lib/pgsql/10/data status
Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlmaster,192.168.0.178' (ECDSA) to the list of known hosts.
</code>

FINALLY, the failover script was executed, after countless blood stains on the wall.


You can also see the reaction in the new master node's database log:

<code:none|Database Reaction on Slave(new master)>
2020-01-28 14:40:19.701 EST [22926] LOG:  received promote request
2020-01-28 14:40:19.728 EST [22926] LOG:  redo done at 0/1C000028
cp: cannot stat ‘/walshared/00000001000000000000001C’: No such file or directory
cp: cannot stat ‘/walshared/00000002.history’: No such file or directory
2020-01-28 14:40:19.767 EST [22926] LOG:  selected new timeline ID: 2
2020-01-28 14:40:19.876 EST [22926] LOG:  archive recovery complete
cp: cannot stat ‘/walshared/00000001.history’: No such file or directory
2020-01-28 14:40:20.012 EST [22924] LOG:  database system is ready to accept connections
</code>


====PGPool Failover====
Now, on the slave (new master) you won't see anything in Pgpool until you shut down the Pgpool on the old master too. That's because usually, when a master fails over, the entire server is dead :)

<code:none|Shutdown PGPool on Master(old)>
[root@postgresqlmaster pgpool-II]# service pgpool stop
Redirecting to /bin/systemctl stop pgpool.service
[root@postgresqlmaster pgpool-II]#
</code>

Once you do, the VIP will migrate as well :)
<code:none|PGPool's Reaction on Slave>
Jan 28 15:33:47 postgresqlslaveone pgpool[25956]: [50-1] 2020-01-28 15:33:47: pid 25956: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
Jan 28 15:33:51 postgresqlslaveone pgpool[25956]: [51-1] 2020-01-28 15:33:51: pid 25956: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am the only alive node in the watchdog cluster
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-2] 2020-01-28 15:33:52: pid 25956: HINT:  skipping stand for coordinator state
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [53-1] 2020-01-28 15:33:52: pid 25956: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [54-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am announcing my self as master/coordinator watchdog node
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-2] 2020-01-28 15:33:56: pid 25956: DETAIL:  our declare coordinator message is accepted by all nodes
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [56-1] 2020-01-28 15:33:56: pid 25956: LOG:  setting the local node "postgresqlslaveone:9999 Linux postgresqlslaveone" as watchdog cluster master
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [57-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node. Starting escalation process
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [58-1] 2020-01-28 15:33:56: pid 25956: LOG:  escalation process started with PID:29053
Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [59-1] 2020-01-28 15:33:56: pid 25956: LOG:  new IPC connection received
Jan 28 15:33:56 postgresqlslaveone pgpool[29053]: [58-1] 2020-01-28 15:33:56: pid 29053: LOG:  watchdog: escalation started
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [24-1] 2020-01-28 15:33:59: pid 26021: LOG:  forked new pcp worker, pid=29066 socket=8
Jan 28 15:33:59 postgresqlslaveone pgpool[25956]: [60-1] 2020-01-28 15:33:59: pid 25956: LOG:  new IPC connection received
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [25-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exit with SUCCESS.
Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [26-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exits with status 0
Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-1] 2020-01-28 15:34:00: pid 29053: LOG:  successfully acquired the delegate IP:"192.168.0.220"
Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-2] 2020-01-28 15:34:00: pid 29053: DETAIL:  'if_up_cmd' returned with success
Jan 28 15:34:00 postgresqlslaveone pgpool[25956]: [61-1] 2020-01-28 15:34:00: pid 25956: LOG:  watchdog escalation process with pid: 29053 exit with SUCCESS.
Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-1] 2020-01-28 15:34:16: pid 26293: LOG:  pool_reuse_block: blockid: 0
Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-2] 2020-01-28 15:34:16: pid 26293: CONTEXT:  while searching system catalog, When relcache is missed
Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-1] 2020-01-28 15:34:19: pid 25985: LOG:  informing the node status change to watchdog
Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-2] 2020-01-28 15:34:19: pid 25985: DETAIL:  node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [62-1] 2020-01-28 15:34:19: pid 25956: LOG:  new IPC connection received
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-1] 2020-01-28 15:34:19: pid 25956: LOG:  received node status change ipc message
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-2] 2020-01-28 15:34:19: pid 25956: DETAIL:  No heartbeat signal from node
Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [64-1] 2020-01-28 15:34:19: pid 25956: LOG:  remote node "postgresqlmaster:9999 Linux postgresqlmaster" is shutting down
</code>
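A quick way to see whether the delegate IP landed on the node you are on (a sketch using the VIP from this guide; run it on each node):

```shell
# Sketch: check whether this node currently holds the delegate IP (VIP).
VIP=192.168.0.220
if ip -o addr 2>/dev/null | grep -q "$VIP"; then
    echo "VIP here"
else
    echo "VIP elsewhere"
fi
```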

====After failover====
After all this is done, we can check the new status of the cluster :)

<code:none|After Failover>
[root@postgresqlslaveone tmp]#  pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
Password:
2 YES postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone

postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 4 MASTER
postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 10 SHUTDOWN
[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres:
 node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
       0 | postgresqlmaster   | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-01-28 14:40:20
       1 | postgresqlslaveone | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-01-28 15:34:16
(2 rows)

[root@postgresqlslaveone tmp]#
</code>


====Verify====
After executing the pcp_recovery_node command, verify that postgresqlslaveone and postgresqlslavetwo are started as PostgreSQL standby servers.

<Code:none |Verify PGPool>
# psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
Password for user postgres
 node_id | hostname            | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | postgresqlmaster    | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-01-28 05:18:09
 1       | postgresqlslaveone  | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-01-28 05:18:09
 2       | postgresqlslavetwo  | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-01-28 05:18:09
(3 rows)
</Code>

You can also verify the Watchdog daemon as follows:

<Code:none |Verify WatchDog>
[root@postgresqlmaster ~]# pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
Password:
3 YES postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster

postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 4 MASTER
postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 7 STANDBY
postgresqlslavetwo:9999 Linux postgresqlslavetwo postgresqlslavetwo 9999 9000 7 STANDBY
[root@postgresqlmaster ~]#
</Code>
=====Appendix=====
<Code:none |recovery_1st_stage>
  
PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/pgsql-10
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
exec > >(logger -i -p local1.info) 2>&1

PGHOME=/usr/pgsql-10
DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"
logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
exit 0
</Code>


<Code:none |/etc/pgpool-II/failover.sh>
#!/bin/bash
# This script is run by failover_command.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

PGHOME=/usr/pgsql-10


logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
    logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
    exit 0
fi

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## If Standby node is down, skip failover.
if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then
    logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed
        exit 1
    fi

    exit 0
fi

## Promote Standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.

ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote

if [ $? -ne 0 ]; then
    logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
    exit 1
fi

logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0
</Code>
 +
 +<Code:none |/etc/pgpool-II/follow_master.sh>
 +#!/bin/bash
 +# This script is run after failover_command to synchronize the Standby with the new Primary.
 +# First try pg_rewind. If pg_rewind failed, use pg_basebackup.
 +
 +set -o xtrace
 +exec > >(logger -i -p local1.info) 2>&1
 +
 +# Special values:
 +#   %d = failed node id
 +#   %h = failed node hostname
 +#   %p = failed node port number
 +#   %D = failed node database cluster path
 +#   %m = new master node id
 +#   %H = new master node hostname
 +#   %M = old master node id
 +#   %P = old primary node id
 +#   %r = new master port number
 +#   %R = new master database cluster path
 +#   %N = old primary node hostname
 +#   %S = old primary node port number
 +#   %% = '%' character
 +
 +FAILED_NODE_ID="$1"
 +FAILED_NODE_HOST="$2"
 +FAILED_NODE_PORT="$3"
 +FAILED_NODE_PGDATA="$4"
 +NEW_MASTER_NODE_ID="$5"
 +OLD_MASTER_NODE_ID="$6"
 +NEW_MASTER_NODE_HOST="$7"
 +OLD_PRIMARY_NODE_ID="$8"
 +NEW_MASTER_NODE_PORT="$9"
 +NEW_MASTER_NODE_PGDATA="${10}"
 +
 +PGHOME=/usr/pgsql-10
 +ARCHIVEDIR=/var/lib/pgsql/archivedir
 +REPLUSER=repl
 +PCP_USER=pgpool
 +PGPOOL_PATH=/usr/bin
 +PCP_PORT=9898
 +
 +logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID}
 +
 +## Test passwrodless SSH
 +ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null
 +
 +if [ $? -ne 0 ]; then
 +    logger -i -p local1.info follow_master.sh: passwrodless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwrodless SSH.
 +    exit 1
 +fi
 +
 +## Get PostgreSQL major version
 +PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
 +
 +if [ $PGVERSION -ge 12 ]; then
 +RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
 +else
 +RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
 +fi
 +
 +## Check the status of Standby
 +ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
 +postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status
 +
 +
 +## If Standby is running, synchronize it with the new Primary.
 +if [ $? -eq 0 ]; then
 +
 +    logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID
 +
 +    # Create replication slot "${FAILED_NODE_HOST}"
 +    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 +        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\"
 +    "
 +
 +    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 +
 +        set -o errexit
 +
 +        ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop
 +
 +        cat > ${RECOVERYCONF} << EOT
 +primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
 +recovery_target_timeline = 'latest'
 +restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
 +primary_slot_name = '${FAILED_NODE_HOST}'
 +EOT
 +
 +        if [ ${PGVERSION} -ge 12 ]; then
 +            touch ${FAILED_NODE_PGDATA}/standby.signal
 +        else
 +            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
 +        fi
 +
 +        ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"
 +
 +    "
 +
 +    if [ $? -ne 0 ]; then
 +        logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.
 +
 +        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 +             
 +            set -o errexit
 +
 +            # Execute pg_basebackup
 +            rm -rf ${FAILED_NODE_PGDATA}
 +            rm -rf ${ARCHIVEDIR}/*
 +            ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream
 +
 +            if [ ${PGVERSION} -ge 12 ]; then
 +                sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
 +                       -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
 +            fi
 +     
 +            cat > ${RECOVERYCONF} << EOT
 +primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
 +recovery_target_timeline = 'latest'
 +restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
 +primary_slot_name = '${FAILED_NODE_HOST}'
 +EOT
 +
 +            if [ ${PGVERSION} -ge 12 ]; then
 +                    touch ${FAILED_NODE_PGDATA}/standby.signal
 +            else
 +                    echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
 +            fi
 +        "
 +
 +        if [ $? -ne 0 ]; then
 +            # drop replication slot
 +            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 +                ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
 +            "
 +
 +            logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
 +            exit 1
 +        fi
 +    fi
 +
 +    # start Standby node on ${FAILED_NODE_HOST}
 +    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
 +            postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start
 +
 +    # If start Standby successfully, attach this node
 +    if [ $? -eq 0 ]; then
 +
 +        # Run pcp_attact_node to attach Standby node to Pgpool-II.
 +        ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}
 +
 +        if [ $? -ne 0 ]; then
 +                logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
 +                exit 1
 +        fi
 +
 +    # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
 +    else
 +
 +        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
 +        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"
 +
 +        logger -i -p local1.error follow_master.sh: end: follow master command failed
 +        exit 1
 +    fi
 +
 +else
 +    logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command
 +    exit 0
 +fi
 +
 +logger -i -p local1.info follow_master.sh: end: follow master command complete
 +exit 0
 +</Code>
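The tail of follow_master.sh shown above only runs if pgpool.conf points follow_master_command at the script. A typical wiring looks like the line below; the path is an assumption, and the % escapes are Pgpool-II's standard substitutions (%h failed host, %D failed node's data directory, %m new master node id, %H new master host, and so on):

```ini
# pgpool.conf -- the script path is an assumption for this setup
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %H %M %P %r %R'
```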
 +
 +
 +=====Implementation with Kubernetes=====
 +In Kubernetes, Pgpool-II can be configured in one of two ways:
 +  * via environment variables
 +  * via a ConfigMap
 +
 +In this case, we will use a ConfigMap:
 +
 +<Code:Configmap>
 +apiVersion: v1
 +kind: ConfigMap
 +metadata:
 +  name: pgpool-config
 +  namespace: db-test
 +  labels:
 +    app: pgpool-config
 +data:
 +  pgpool.conf: |-
 +    listen_addresses = '*'
 +    port = 9999
 +    socket_dir = '/var/run/postgresql'
 +    pcp_listen_addresses = '*'
 +    pcp_port = 9898
 +    pcp_socket_dir = '/var/run/postgresql'
 +    backend_hostname0 = experience-db-cluster-alinma-rw
 +    backend_port0 = 5432
 +    backend_weight0 = 1
 +    backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'
 +    backend_auth_method0 = 'scram-sha-256'
 +    backend_password0 = 'experience_db'
 +    backend_hostname1 = experience-db-cluster-alinma-ro
 +    backend_port1 = 5432
 +    backend_weight1 = 1
 +    backend_flag1 = 'DISALLOW_TO_FAILOVER'
 +    backend_password1 = 'experience_db'
 +    backend_auth_method1 = 'scram-sha-256'
 +    backend_hostname2 = experience-db-cluster-alinma-ro
 +    backend_port2 = 5432
 +    backend_weight2 = 2
 +    backend_flag2 = 'DISALLOW_TO_FAILOVER'
 +    backend_password2 = 'experience_db'
 +    backend_auth_method2 = 'scram-sha-256'
 +    sr_check_user = 'experience_db'
 +    sr_check_password = 'experience_db'
 +    sr_check_period = 10
 +    enable_pool_hba = on
 +    master_slave_mode = on
 +    num_init_children = 32
 +    max_pool = 4
 +    child_life_time = 300
 +    child_max_connections = 0
 +    connection_life_time = 0
 +    client_idle_limit = 0
 +    connection_cache = on
 +    load_balance_mode = on
 +    PGPOOL_PCP_USER = 'experience_db'
 +    PGPOOL_PCP_PASSWORD = 'experience_db'
 +  pcp.conf: |-
 +    experience_db:be22aea2ca31a561e65894d88a2bad32
 +  pool_passwd: |-
 +    experience_db:be22aea2ca31a561e65894d88a2bad32
 +  pool_hba.conf: |-
 +    local   all         all                               trust
 +    host    all         all         127.0.0.1/32          trust
 +    host    all         all         ::1/128               trust
 +    host    all         all         0.0.0.0/0             scram-sha-256
 +</Code>
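A note on the pcp.conf entry above: its format is username:MD5(password). Assuming the plain-text password is known, the hash can be produced either with the pg_md5 utility shipped with Pgpool-II or with plain md5sum; a minimal sketch (the password here is hypothetical, only the user name is taken from the ConfigMap):

```shell
# Build a pcp.conf entry of the form username:md5-of-password.
# Equivalent to running: pg_md5 <password>
PCP_USER=experience_db      # user name from the ConfigMap above
PCP_PASS='s3cret'           # hypothetical plain-text password
HASH=$(printf '%s' "$PCP_PASS" | md5sum | awk '{print $1}')
echo "${PCP_USER}:${HASH}"
```

The resulting line is what goes into the pcp.conf key of the ConfigMap.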
 +
 +
 +After that, we create the ConfigMap:
 +
 +<Code:bash|Create ConfigMap>
 +kubectl apply -f configmap.yaml
 +</Code>
 +
 +Now we can create the Deployment and the Service:
 +
 +<Code:yaml|Create Deployment and Service>
 +apiVersion: apps/v1
 +kind: Deployment
 +metadata:
 +  name: pgpool
 +spec:
 +  replicas: 3
 +  selector:
 +    matchLabels:
 +      app: pgpool
 +  template:
 +    metadata:
 +      labels:
 +        app: pgpool
 +    spec:
 +      containers:
 +      - name: pgpool
 +        image: pgpool/pgpool
 +        env:
 +        - name: POSTGRES_USERNAME
 +          value: "experience_db"
 +        - name: POSTGRES_PASSWORD
 +          value: "experience_db"
 +        - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
 +          value: "scram-sha-256"
 +        - name: PGPOOL_ENABLE_POOL_PASSWD
 +          value: "true"
 +        - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
 +          value: "false"
 +        # The following settings are not required when not using the Pgpool-II PCP command.
 +        # To enable the following settings, you must define a secret that stores the PCP user's
 +        # username and password.
 +        #- name: PGPOOL_PCP_USER
 +        #  valueFrom:
 +        #    secretKeyRef:
 +        #      name: pgpool-pcp-secret
 +        #      key: username
 +        #- name: PGPOOL_PCP_PASSWORD
 +        #  valueFrom:
 +        #    secretKeyRef:
 +        #      name: pgpool-pcp-secret
 +        #      key: password
 +        volumeMounts:
 +        - name: pgpool-config
 +          mountPath: /config
 +        #- name: pgpool-tls
 +        #  mountPath: /config/tls
 +      volumes:
 +      - name: pgpool-config
 +        configMap:
 +          name: pgpool-config
 +      # Configure your own TLS certificate.
 +      # If not set, Pgpool-II will automatically generate the TLS certificate if ssl = on.
 +      #- name: pgpool-tls
 +      #  secret:
 +      #    secretName: pgpool-tls
 +---
 +apiVersion: v1
 +kind: Service
 +metadata:
 +  name: pgpool
 +spec:
 +  selector:
 +    app: pgpool
 +  ports:
 +  - name: pgpool-port
 +    protocol: TCP
 +    port: 9999
 +    targetPort: 9999
 </Code> </Code>
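One sizing note for this setup: each Pgpool-II pod holds at most num_init_children × max_pool cached backend connections, and the Deployment runs three replicas, so max_connections on the PostgreSQL side has to cover the product. With the values configured above:

```shell
# Worst-case backend connections opened against PostgreSQL,
# using the values from pgpool.conf and the Deployment above.
NUM_INIT_CHILDREN=32   # pgpool.conf: num_init_children
MAX_POOL=4             # pgpool.conf: max_pool
REPLICAS=3             # Deployment: spec.replicas
PER_POD=$((NUM_INIT_CHILDREN * MAX_POOL))
TOTAL=$((PER_POD * REPLICAS))
echo "per pod:       $PER_POD"    # 128
echo "cluster-wide:  $TOTAL"      # 384
```

If this exceeds max_connections on the backends, lower num_init_children or max_pool (or raise max_connections).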