--- postgresql_replication_ha [2020/01/28 14:16] – andonovj
+++ postgresql_replication_ha [2024/11/09 19:13] (current) – andonovj
@@ Line 471: / Line 471: @@
 pcp_recovery_node -- Command Successful
 </Code>
+=====Failover=====
+I nearly lost it a couple of times, seriously. I was banging my head against the wall for at least a couple of weeks because I couldn't make PostgreSQL fail over. So let me tell you a couple of things I found out THE VERY HARD and PAINFUL way.
+With Pgpool, you can either use PostgreSQL's own (streaming) replication to move the data, OR Pgpool's replication mode. Normally Pgpool is the FIRST thing you install; in our case it came second, after replication was already in place, so a little modification is needed.
+Please ENSURE the following parameters are set on the nodes:
+<code:none|Pgpool.conf>
+master_slave_mode = on
+master_slave_sub_mode = 'stream'
+</code>
+And the following one is turned off:
+<code:none|Pgpool.conf>
+-bash-4.2$ cat pgpool.conf | grep replication
+replication_mode = off
+                                   # Activate replication mode
+                                   # when in replication mode
+                                   # replication mode, specify table name to
+</code>
+The two modes are mutually exclusive: only one can be active at a time.
+Setting them this way tells Pgpool that you ALREADY have streaming replication and that you manage it yourself.
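Since getting these three settings wrong is easy to miss, here is a minimal sketch of a sanity check. The helper names `conf_value` and `check_streaming_mode` are made up for illustration, and the parsing assumes simple `key = value` lines like the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective value of a pgpool.conf parameter.
# The last uncommented assignment wins; quotes and whitespace are stripped.
conf_value() {
    local key="$1" file="$2"
    sed -e 's/#.*//' "$file" \
        | awk -F'=' -v k="$key" '
            { name = $1; gsub(/[ \t]/, "", name) }
            name == k { val = $2; gsub(/^[ \t\047]+|[ \t\047]+$/, "", val); out = val }
            END { print out }'
}

# Hypothetical helper: succeed only when streaming-replication mode is on
# and Pgpool's own replication mode is off.
check_streaming_mode() {
    local file="$1"
    [ "$(conf_value master_slave_mode "$file")" = "on" ] &&
    [ "$(conf_value master_slave_sub_mode "$file")" = "stream" ] &&
    [ "$(conf_value replication_mode "$file")" = "off" ]
}
```

It could be run on each node as `check_streaming_mode /etc/pgpool-II/pgpool.conf && echo OK`, adjusting the path to wherever your pgpool.conf lives.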
+====Current State====
+After that, let's see our current state of the cluster:
+<code:none|Current State>
+[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
+Password for user postgres:
+ node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
+---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
+ 0       | postgresqlmaster   | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-07 10:12:26
+ 1       | postgresqlslaveone | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-07 10:12:26
+(2 rows)
+[root@postgresqlslaveone tmp]
+</code>
+That clearly shows that postgresqlmaster is the master and postgresqlslaveone is the slave :) I know, stupid naming, but bear with me :)
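If you want to pick the primary out of that table in a script, something like the following sketch may help. The function name `primary_from_pool_nodes` is invented here, and it assumes the column order shown above (hostname second, status fourth, role sixth), which can differ between Pgpool versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read psql's aligned "show pool_nodes" output on stdin
# and print the hostname of every node that is both "up" and "primary".
primary_from_pool_nodes() {
    awk -F'|' '
        NF >= 6 {
            host = $2; status = $4; role = $6
            gsub(/[ \t]/, "", host)
            gsub(/[ \t]/, "", status)
            gsub(/[ \t]/, "", role)
            if (role == "primary" && status == "up") print host
        }'
}
```

Usage would be along the lines of `psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes" | primary_from_pool_nodes`.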
+====Database Failover====
+So what happens after I shut down the first database?
+<code:none|Shutdown Master Database>
+[root@postgresqlmaster pgpool-II]# su - postgres
+Last login: Fri Feb  7 10:01:53 EST 2020 on pts/3
+-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ -l logfile stop
+waiting for server to shut down..... done
+server stopped
+-bash-4.2$
+</code>
+Well... what can happen? The database goes down, of course :)
+<code:none|Shutdown Master Database>
+2020-02-07 10:14:36.882 EST [15508] LOG:  received fast shutdown request
+2020-02-07 10:14:36.907 EST [15508] LOG:  aborting any active transactions
+2020-02-07 10:14:36.909 EST [15508] LOG:  worker process: logical replication launcher (PID 15517) exited with exit code 1
+2020-02-07 10:14:36.909 EST [15511] LOG:  shutting down
+2020-02-07 10:14:38.176 EST [15508] LOG:  database system is shut down
+</code>
+And, even more shockingly, Pgpool will FINALLY recognize it:
+<code:none|PGPool's Reaction on Master>
+Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [10-1] 2020-02-07 10:15:09: pid 16840: LOG:  failed to connect to PostgreSQL server on "postgresqlmaster:5432", getsockopt() detected error "Connection refused"
+Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [11-1] 2020-02-07 10:15:09: pid 16840: LOG:  received degenerate backend request for node_id: 0 from pid [16840]
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [26-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [27-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [28-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-1] 2020-02-07 10:15:09: pid 16802: LOG:  we do not need quorum to hold to proceed with failover
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-2] 2020-02-07 10:15:09: pid 16802: DETAIL:  proceeding with the failover
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [29-3] 2020-02-07 10:15:09: pid 16802: HINT:  failover_when_quorum_exists is set to false
+Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-1] 2020-02-07 10:15:09: pid 16840: FATAL:  failed to create a backend connection
+Feb  7 10:15:09 postgresqlmaster pgpool[16840]: [12-2] 2020-02-07 10:15:09: pid 16840: DETAIL:  executing failover on backend
+Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [15-1] 2020-02-07 10:15:09: pid 16800: LOG:  Pgpool-II parent process has received failover request
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [30-1] 2020-02-07 10:15:09: pid 16802: LOG:  new IPC connection received
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [31-1] 2020-02-07 10:15:09: pid 16802: LOG:  received the failover indication from Pgpool-II on IPC interface
+Feb  7 10:15:09 postgresqlmaster pgpool[16802]: [32-1] 2020-02-07 10:15:09: pid 16802: LOG:  watchdog is informed of failover start by the main process
+Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [16-1] 2020-02-07 10:15:09: pid 16800: LOG:  starting degeneration. shutdown host postgresqlmaster(5432)
+Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [17-1] 2020-02-07 10:15:09: pid 16800: LOG:  Restart all children
+Feb  7 10:15:09 postgresqlmaster pgpool[16800]: [18-1] 2020-02-07 10:15:09: pid 16800: LOG:  execute command: /etc/pgpool-II/failover.sh 0 postgresqlmaster 5432 /var/lib/pgsql/10/data 1 postgresqlslaveone 0 0 5432 /var/lib/pgsql/10/data postgresqlmaster 5432
+Feb  7 10:15:11 postgresqlmaster postgres[17112]: follow_master.sh: start: Standby node 0
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_ID=0
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_HOST=postgresqlmaster
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PORT=5432
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + FAILED_NODE_PGDATA=/var/lib/pgsql/10/data
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_ID=1
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_MASTER_NODE_ID=0
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_HOST=postgresqlslaveone
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + OLD_PRIMARY_NODE_ID=0
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PORT=5432
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + NEW_MASTER_NODE_PGDATA=/var/lib/pgsql/10/data
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGHOME=/usr/pgsql-10
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ARCHIVEDIR=/walshared
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + REPLUSER=repl
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_USER=postgres
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGPOOL_PATH=/usr/bin
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PCP_PORT=9898
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + logger -i -p local1.info follow_master.sh: start: Standby node 0
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlslaveone -i /var/lib/pgsql/.ssh/id_rsa ls /tmp
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlslaveone,192.168.0.199' (ECDSA) to the list of known hosts.
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 0 -ne 0 ']'
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ /usr/pgsql-10/bin/initdb -V
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ awk '{print $3}'
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\..*//'
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: ++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + PGVERSION=10
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + '[' 10 -ge 12 ']'
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + RECOVERYCONF=/var/lib/pgsql/10/data/recovery.conf
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@postgresqlmaster -i /var/lib/pgsql/.ssh/id_rsa /usr/pgsql-10/bin/pg_ctl -w -D /var/lib/pgsql/10/data status
+Feb  7 10:15:11 postgresqlmaster postgres[17113]: Warning: Permanently added 'postgresqlmaster,192.168.0.178' (ECDSA) to the list of known hosts.
+</code>
+FINALLY, the failover script was executed, after countless blood stains on the wall.
+You can also see the promotion in the database log on the new master node:
+<code:none|Database Reaction on Slave(new master)>
+2020-01-28 14:40:19.701 EST [22926] LOG:  received promote request
+2020-01-28 14:40:19.728 EST [22926] LOG:  redo done at 0/1C000028
+cp: cannot stat ‘/walshared/00000001000000000000001C’: No such file or directory
+cp: cannot stat ‘/walshared/00000002.history’: No such file or directory
+2020-01-28 14:40:19.767 EST [22926] LOG:  selected new timeline ID: 2
+2020-01-28 14:40:19.876 EST [22926] LOG:  archive recovery complete
+cp: cannot stat ‘/walshared/00000001.history’: No such file or directory
+2020-01-28 14:40:20.012 EST [22924] LOG:  database system is ready to accept connections
+</code>
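The failover.sh call in the Pgpool log passes twelve positional arguments. Below is a rough sketch of the decision such a script has to make before promoting anything. It is not the script used here: `failover_action` is a made-up name, and the meaning given to positions 1, 5, 6 and 8 (failed node id, new master id, new master host, old primary id) is an assumption based on the placeholder order `%d %h %p %D %m %H %M %P ...`, which matches the values in the logged call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide what a failover script should do with the
# positional arguments Pgpool passes. Prints "promote <host>" or "skip".
failover_action() {
    local failed_node_id="$1"       # assumed %d, "0" in the logged call
    local new_master_id="$5"        # assumed %m, "1" in the logged call
    local new_master_host="$6"      # assumed %H, "postgresqlslaveone"
    local old_primary_node_id="$8"  # assumed %P, "0" in the logged call

    # No surviving node to promote.
    if [ "$new_master_id" -lt 0 ]; then
        echo "skip"
        return 0
    fi
    # A standby went down: the primary is untouched, nothing to promote.
    if [ "$failed_node_id" != "$old_primary_node_id" ]; then
        echo "skip"
        return 0
    fi
    echo "promote $new_master_host"
}
```

With the arguments from the log, `failover_action 0 postgresqlmaster 5432 /var/lib/pgsql/10/data 1 postgresqlslaveone 0 0 5432 /var/lib/pgsql/10/data postgresqlmaster 5432` prints `promote postgresqlslaveone`. A real script would then run the promotion on that host, for example over ssh.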
+====PGPool Failover====
+Now, on the slave (new master) you won't see anything in the Pgpool log until you shut down Pgpool on the old master too. That is the realistic case anyway, because when a master fails over, usually the entire server is dead :)
+<code:none|Shutdown PGPool on Master(old)>
+[root@postgresqlmaster pgpool-II]# service pgpool stop
+Redirecting to /bin/systemctl stop pgpool.service
+[root@postgresqlmaster pgpool-II]#
+</code>
+Once you do, the VIP will migrate to the surviving node as well :)
+<code:none|PGPool's Reaction on Slave>
+Jan 28 15:33:47 postgresqlslaveone pgpool[25956]: [50-1] 2020-01-28 15:33:47: pid 25956: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
+Jan 28 15:33:51 postgresqlslaveone pgpool[25956]: [51-1] 2020-01-28 15:33:51: pid 25956: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
+Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am the only alive node in the watchdog cluster
+Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [52-2] 2020-01-28 15:33:52: pid 25956: HINT:  skipping stand for coordinator state
+Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [53-1] 2020-01-28 15:33:52: pid 25956: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
+Jan 28 15:33:52 postgresqlslaveone pgpool[25956]: [54-1] 2020-01-28 15:33:52: pid 25956: LOG:  I am announcing my self as master/coordinator watchdog node
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [55-2] 2020-01-28 15:33:56: pid 25956: DETAIL:  our declare coordinator message is accepted by all nodes
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [56-1] 2020-01-28 15:33:56: pid 25956: LOG:  setting the local node "postgresqlslaveone:9999 Linux postgresqlslaveone" as watchdog cluster master
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [57-1] 2020-01-28 15:33:56: pid 25956: LOG:  I am the cluster leader node. Starting escalation process
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [58-1] 2020-01-28 15:33:56: pid 25956: LOG:  escalation process started with PID:29053
+Jan 28 15:33:56 postgresqlslaveone pgpool[25956]: [59-1] 2020-01-28 15:33:56: pid 25956: LOG:  new IPC connection received
+Jan 28 15:33:56 postgresqlslaveone pgpool[29053]: [58-1] 2020-01-28 15:33:56: pid 29053: LOG:  watchdog: escalation started
+Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [24-1] 2020-01-28 15:33:59: pid 26021: LOG:  forked new pcp worker, pid=29066 socket=8
+Jan 28 15:33:59 postgresqlslaveone pgpool[25956]: [60-1] 2020-01-28 15:33:59: pid 25956: LOG:  new IPC connection received
+Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [25-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exit with SUCCESS.
+Jan 28 15:33:59 postgresqlslaveone pgpool[26021]: [26-1] 2020-01-28 15:33:59: pid 26021: LOG:  PCP process with pid: 29066 exits with status 0
+Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-1] 2020-01-28 15:34:00: pid 29053: LOG:  successfully acquired the delegate IP:"192.168.0.220"
+Jan 28 15:34:00 postgresqlslaveone pgpool[29053]: [59-2] 2020-01-28 15:34:00: pid 29053: DETAIL:  'if_up_cmd' returned with success
+Jan 28 15:34:00 postgresqlslaveone pgpool[25956]: [61-1] 2020-01-28 15:34:00: pid 25956: LOG:  watchdog escalation process with pid: 29053 exit with SUCCESS.
+Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-1] 2020-01-28 15:34:16: pid 26293: LOG:  pool_reuse_block: blockid: 0
+Jan 28 15:34:16 postgresqlslaveone pgpool[26293]: [31-2] 2020-01-28 15:34:16: pid 26293: CONTEXT:  while searching system catalog, When relcache is missed
+Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-1] 2020-01-28 15:34:19: pid 25985: LOG:  informing the node status change to watchdog
+Jan 28 15:34:19 postgresqlslaveone pgpool[25985]: [9-2] 2020-01-28 15:34:19: pid 25985: DETAIL:  node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
+Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [62-1] 2020-01-28 15:34:19: pid 25956: LOG:  new IPC connection received
+Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-1] 2020-01-28 15:34:19: pid 25956: LOG:  received node status change ipc message
+Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [63-2] 2020-01-28 15:34:19: pid 25956: DETAIL:  No heartbeat signal from node
+Jan 28 15:34:19 postgresqlslaveone pgpool[25956]: [64-1] 2020-01-28 15:34:19: pid 25956: LOG:  remote node "postgresqlmaster:9999 Linux postgresqlmaster" is shutting down
+</code>
+====After failover====
+After all this is done, we can check the new status of the cluster :)
+<code:none|After Failover>
+[root@postgresqlslaveone tmp]#  pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres
+Password:
+YES postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone
+postgresqlslaveone:9999 Linux postgresqlslaveone postgresqlslaveone 9999 9000 4 MASTER
+postgresqlmaster:9999 Linux postgresqlmaster postgresqlmaster 9999 9000 10 SHUTDOWN
+[root@postgresqlslaveone tmp]#  psql -h 192.168.0.220 -p 9999 -U postgres postgres -c "show pool_nodes"
+Password for user postgres:
+ node_id |      hostname      | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
+---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
+ 0       | postgresqlmaster   | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-01-28 14:40:20
+ 1       | postgresqlslaveone | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-01-28 15:34:16
+(2 rows)
+[root@postgresqlslaveone tmp]#
+</code>
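To check from a script which Pgpool node currently leads the watchdog cluster, a small sketch like this one might do. `watchdog_leader` is a hypothetical name, and it assumes the line layout printed above, with the state name as the last field and `MASTER` as the leader state (other Pgpool versions may word this differently).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pcp_watchdog_info output on stdin and print the
# host part of the node whose state (last field) is MASTER.
watchdog_leader() {
    awk '$NF == "MASTER" { split($1, parts, ":"); print parts[1] }'
}
```

It would be used as `pcp_watchdog_info -p 9898 -h 192.168.0.220 -U postgres | watchdog_leader`.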
 ====Verify====
@@ Line 880: / Line 1080: @@
 logger -i -p local1.info follow_master.sh: end: follow master command complete
 exit 0
+</Code>
+=====Implementation with Kubernetes=====
+Pgpool can be configured on Kubernetes in one of two ways:
+1) Via environment variables
+2) Via a ConfigMap
+In this case, we will use a ConfigMap:
+<Code:Configmap>
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: pgpool-config
+  namespace: db-test
+  labels:
+    app: pgpool-config
+data:
+  pgpool.conf: |-
+    listen_addresses = '*'
+    port = 9999
+    socket_dir = '/var/run/postgresql'
+    pcp_listen_addresses = '*'
+    pcp_port = 9898
+    pcp_socket_dir = '/var/run/postgresql'
+    backend_hostname0 = experience-db-cluster-alinma-rw
+    backend_port0 = 5432
+    backend_weight0 = 1
+    backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'
+    backend_auth_method0 = 'scram-sha-256'
+    backend_password0 = 'experience_db'
+    backend_hostname1 = experience-db-cluster-alinma-ro
+    backend_port1 = 5432
+    backend_weight1 = 1
+    backend_flag1 = 'DISALLOW_TO_FAILOVER'
+    backend_password1 = 'experience_db'
+    backend_auth_method1 = 'scram-sha-256'
+    backend_hostname2 = experience-db-cluster-alinma-ro
+    backend_port2 = 5432
+    backend_weight2 = 2
+    backend_flag2 = 'DISALLOW_TO_FAILOVER'
+    backend_password2 = 'experience_db'
+    backend_auth_method2 = 'scram-sha-256'
+    sr_check_user = 'experience_db'
+    sr_check_password = 'experience_db'
+    sr_check_period = 10
+    enable_pool_hba = on
+    master_slave_mode = on
+    num_init_children = 32
+    max_pool = 4
+    child_life_time = 300
+    child_max_connections = 0
+    connection_life_time = 0
+    client_idle_limit = 0
+    connection_cache = on
+    load_balance_mode = on
+    PGPOOL_PCP_USER = 'experience_db'
+    PGPOOL_PCP_PASSWORD = 'experience_db'
+  pcp.conf: |-
+    experience_db:be22aea2ca31a561e65894d88a2bad32
+  pool_passwd: |-
+    experience_db:be22aea2ca31a561e65894d88a2bad32
+  pool_hba.conf: |-
+    local   all         all                               trust
+    host    all         all         127.0.0.1/32          trust
+    host    all         all         ::1/128               trust
+    host    all         all         0.0.0.0/0             scram-sha-256
+</Code>
+After we create that configmap with:
+<Code:bash|Create configmap>
+kubectl apply -f configmap.yaml
+</Code>
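The pcp.conf entry in the ConfigMap has the form `username:md5hash`, where the hash is the MD5 of the password. Pgpool ships the `pg_md5` utility for producing it; where that tool is not at hand, the sketch below should give the same kind of line. The function name `pcp_conf_entry` is made up, and it assumes `md5sum` from coreutils is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a pcp.conf line ("username:md5hex") for the
# given user and password.
pcp_conf_entry() {
    local user="$1" password="$2"
    # printf '%s' avoids hashing a trailing newline, which echo would add.
    printf '%s:%s\n' "$user" "$(printf '%s' "$password" | md5sum | cut -d' ' -f1)"
}
```

For example, `pcp_conf_entry admin password` prints `admin:5f4dcc3b5aa765d61d8327deb882cf99`. In a real cluster the password belongs in a Secret, not in a plain ConfigMap.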
+We can create the deployment and the service now:
+<Code:yaml|Create Deployment and Service>
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: pgpool
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: pgpool
+  template:
+    metadata:
+      labels:
+        app: pgpool
+    spec:
+      containers:
+      - name: pgpool
+        image: pgpool/pgpool
+        env:
+        - name: POSTGRES_USERNAME
+          value: "experience_db"
+        - name: POSTGRES_PASSWORD
+          value: "experience_db"
+        - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
+          value: "scram-sha-256"
+        - name: PGPOOL_ENABLE_POOL_PASSWD
+          value: "true"
+        - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
+          value: "false"
+        # The following settings are not required when not using the Pgpool-II PCP command.
+        # To enable the following settings, you must define a secret that stores the PCP user's
+        # username and password.
+        #- name: PGPOOL_PCP_USER
+        #  valueFrom:
+        #    secretKeyRef:
+        #      name: pgpool-pcp-secret
+        #      key: username
+        #- name: PGPOOL_PCP_PASSWORD
+        #  valueFrom:
+        #    secretKeyRef:
+        #      name: pgpool-pcp-secret
+        #      key: password
+        volumeMounts:
+        - name: pgpool-config
+          mountPath: /config
+        #- name: pgpool-tls
+        #  mountPath: /config/tls
+      volumes:
+      - name: pgpool-config
+        configMap:
+          name: pgpool-config
+      # Configure your own TLS certificate.
+      # If not set, Pgpool-II will automatically generate the TLS certificate if ssl = on.
+      #- name: pgpool-tls
+      #  secret:
+      #    secretName: pgpool-tls
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: pgpool
+spec:
+  selector:
+    app: pgpool
+  ports:
+  - name: pgpool-port
+    protocol: TCP
+    port: 9999
+    targetPort: 9999
 </Code>