Differences

This shows you the differences between two versions of the page.

--- postgresql_repmgr_installation [2021/09/16 12:10] – andonovj
+++ postgresql_repmgr_installation [2021/09/28 11:30] (current) – andonovj
@@ Line 9: / Line 9: @@
   * 2nd Quandrant - For Rempgr
+{{:1_g-5e4zhde9lojvmbdsiptg.png?600|}}
 =====Virtual Environment Config=====
@@ Line 69: / Line 71: @@
-====Server Config====
+=====Server Config=====
 For the servers we have to:
@@ Line 80: / Line 82: @@
-===Modify /etc/hosts ===
+====Modify /etc/hosts ====
 That is pretty trivial, just add the IPs and the hostnames to /etc/hosts:
@@ Line 94: / Line 96: @@
-===Configure Repositories===
+====Configure Repositories====
 I decided to use 2nd quadrant packages, as they are the ones I use for work as well :)
 First thing first, we have to update our repos and pull the latest metadata from them.
@@ Line 133: / Line 135: @@
 </Code>
-===Install PostgreSQL===
+====Install PostgreSQL====
 Then we can proceed with installation of PostgreSQL
 <Code:bash|Install PostgreSQL>
@@ Line 177: / Line 179: @@
 Do this on both machines: server1 and server2
-===Install REPMGR Repos===
+====Install REPMGR Repos====
 Then we can proceed with adding the repo for REPMGR
@@ Line 210: / Line 212: @@
 </Code>
-===Configure SSH keys===
+====Configure SSH keys====
 To configure passwordless connection, we have to:
@@ Line 296: / Line 298: @@
 </Code>
-====Cluster Environment====
+=====Cluster Environment=====
 After we've installed and configured all the packages, we can proceed with cluster configuration.
 Firstly, we have to configure the database and the users:
-===Configure User===
+====Configure User====
 The user to be used for the replication can be configured as so:
@@ Line 316: / Line 318: @@
 After, we have done that, we can configure security connection to the PostgreSQL in pg_hba.conf
-===Connection Configuration===
+====Connection Configuration====
 To allow repmgr user to be able to connect to PostgreSQL, we need to append to the pg_hba.conf file the following entries.
@@ Line 342: / Line 344: @@
 repmgr=#
+</Code>
+Finally we have to set pre-loaded library for repmgr in the postgresql.conf
+<Code:bash|Set Pre-loaded Library>
+shared_preload_libraries = 'repmgr'
 </Code>
@@ Line 347: / Line 355: @@
 Now, finally we can configure the REPMGR
-====REPMGR====
+=====REPMGR=====
 The default configuration of Repmgr is located in:
@@ Line 355: / Line 363: @@
 We can do a basic configuration as follows:
+====Configuration====
 <Code:bash|Repmgr.conf>
 cluster=cluster
@@ Line 398: / Line 407: @@
 </Code>
+Furthermore, we have to make a symlink to the correct pg_ctl
+<Code:bash|Create symlink to pg_ctl>
+root@server2:~#  ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
+</Code>
+We have to do that on both servers.
+====Cluster Initialization====
+To initialize a cluster, we can do it as so:
+<Code:bash|Initialize a cluster>
+postgres@server1:~$ repmgr primary register
+WARNING: the following problems were found in the configuration file:
+  parameter "cluster" is deprecated and will be ignored
+  parameter "logfile" has been renamed to "log_file"
+  parameter "loglevel" has been renamed to "log_level"
+INFO: connecting to primary database...
+NOTICE: attempting to install extension "repmgr"
+NOTICE: "repmgr" extension successfully installed
+NOTICE: primary node record (ID: 1) registered
+postgres@server1:~$
+</Code>
+This will initialize the server and we can check that:
+<Code:bash|Check cluster status>
+postgres@server1:~$ repmgr cluster show
+WARNING: the following problems were found in the configuration file:
+  parameter "cluster" is deprecated and will be ignored
+  parameter "logfile" has been renamed to "log_file"
+  parameter "loglevel" has been renamed to "log_level"
+ ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
+----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
+  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
+postgres@server1:~$
+</Code>
+After that is done, we can set-up the standby server (e.g. server2)
+====Setup Standby====
+To setup standby, we have to delete everything from the server and clone it:
+<Code:bash|Setup standby server>
+root@server2:~# rm -rf /var/lib/postgresql/12/main/
+root@server2:~# su - postgres
+postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
+WARNING: the following problems were found in the configuration file:
+  parameter "cluster" is deprecated and will be ignored
+  parameter "logfile" has been renamed to "log_file"
+  parameter "loglevel" has been renamed to "log_level"
+NOTICE: destination directory "/var/lib/postgresql/12/main" provided
+INFO: connecting to source node
+DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
+DETAIL: current installation size is 31 MB
+NOTICE: checking for available walsenders on the source node (2 required)
+NOTICE: checking replication connections can be made to the source server (2 required)
+INFO: creating directory "/var/lib/postgresql/12/main"...
+NOTICE: starting backup (using pg_basebackup)...
+HINT: this may take some time; consider using the -c/--fast-checkpoint option
+INFO: executing:
+  pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
+NOTICE: standby clone (using pg_basebackup) complete
+NOTICE: you can now start your PostgreSQL server
+HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
+HINT: after starting the server, you need to register this standby with "repmgr standby register"
+postgres@server2:~$
+</Code>
+After that, we should be able to see the replication working:
+<Code:bash|Check status of REPMGR>
+postgres@server2:~$ repmgr cluster show
+WARNING: the following problems were found in the configuration file:
+  parameter "cluster" is deprecated and will be ignored
+  parameter "logfile" has been renamed to "log_file"
+  parameter "loglevel" has been renamed to "log_level"
+ ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
+----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
+  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
+  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
+postgres@server2:~$
+</Code>
+We can test the connection as follows:
+====Test Configuration====
+Firstly, we will test the normal functionality and then we will test the failove.
+For the functionality, we can create a table on primary and see if it is present on the standby to test:
+==On Primary==
+<Code:bash|Test Functionality>
+repmgr=# create table test(id integer);
+CREATE TABLE
+repmgr=# insert into test values(1);
+INSERT 0 1
+repmgr=# commit;
+WARNING:  there is no transaction in progress
+COMMIT
+repmgr=#
+</Code>
+==Standby==
+Now, we can test it on the standby:
+<Code:bash|Test Functionality>
+postgres=# \c repmgr
+You are now connected to database "repmgr" as user "postgres".
+repmgr=# select * from test;
+ id
+----
+(1 row)
+repmgr=#
+</Code>
+Now, let's test the failover:
+====Automatic Failover====
+To test that, we can shutdown the primary and see if it is failover automatically (as per our settings)
+Now, I changed a little bit the config, so please check Appendix A.
+===Modify Postgres user===
+Firstly, we need to add the postgres user to the sudo group:
+<Code:bash|Add user to Sudoers group>
+### Server 1
+root@server2:/var/log/postgresql# usermod -aG sudo postgres
+root@server2:/var/log/postgresql#
+### Server 2
+root@server2:/var/log/postgresql# usermod -aG sudo postgres
+root@server2:/var/log/postgresql#
+</Code>
+===Test the failover===
+Now, we can finally test the failover.
+<Code:bash|Test Failover>
+postgres@server1:/var/log/postgresql$ repmgr cluster show
+ ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
+----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------
+  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
+  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
+</Code>
+We can simply shutdown the prod:
+<Code:bash|Shutdown production>
+root@server2:/var/log/postgresql# service postgresql restart
+</Code>
+Once, this is done, we can see the log file from server1:
+<Code:bash|Check the log on standby>
+[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
+[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
+[2021-09-28 10:46:45] [DEBUG] election result: WON
+[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
+[2021-09-28 10:46:45] [INFO] promote_command is:
+  "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
+[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
+[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
+[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
+[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
+[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
+[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
+[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
+[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
+[2021-09-28 10:46:46] [INFO] 0 followers to notify
+[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
+[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
+</Code>
+As we can see, server1 has been promoted to primary, that however leaves us to deal with the aftermath.
+===Dealing with the aftermath===
+Now, we have one primary and 1 failed primary, we need to re-establish connection.
+<Code:bash|Re-establish the connection>
+postgres@server1:/var/log/postgresql$ repmgr cluster show
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
+ ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
+----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
+  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
+  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
+WARNING: following issues were detected
+  - unable to connect to node "server2" (ID: 2)
+HINT: execute with --verbose option to see connection error messages
+postgres@server1:
+</Code>
+We can re-establish the connection as follows:
+. Clear the old path and Resnap:
+<Code:bash|Resnap failed server's data>
+root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
+root@server2:/var/log/postgresql# su - postgres
+postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
+WARNING: following problems with command line parameters detected:
+  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
+NOTICE: destination directory "/var/lib/postgresql/12/main" provided
+INFO: connecting to source node
+DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
+DETAIL: current installation size is 31 MB
+DEBUG: 2 node records returned by source node
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+DEBUG: upstream_node_id determined as 1
+NOTICE: checking for available walsenders on the source node (2 required)
+NOTICE: checking replication connections can be made to the source server (2 required)
+INFO: creating directory "/var/lib/postgresql/12/main"...
+DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
+NOTICE: starting backup (using pg_basebackup)...
+HINT: this may take some time; consider using the -c/--fast-checkpoint option
+INFO: executing:
+  /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+NOTICE: standby clone (using pg_basebackup) complete
+NOTICE: you can now start your PostgreSQL server
+HINT: for example: sudo systemctl start postgresql
+HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
+postgres@server2:~$
+</Code>
+. Startup the postgresql
+. Re-register
+<Code:bash|Startup PostgreSQL and Re-register>
+postgres@server2:~$ logout
+root@server2:/var/log/postgresql# service postgresql start
+root@server2:/var/log/postgresql# su - postgres
+postgres@server2:~$ repmgr standby register --force
+INFO: connecting to local node "server2" (ID: 2)
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
+INFO: connecting to primary database
+DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
+INFO: standby registration complete
+postgres@server2:~$
+</Code>
+Now we can check our status as follows:
+<Code:bash|Check the newly recovered cluster>
+postgres@server2:~$ repmgr cluster show
+ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
+----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
+  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
+  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
+postgres@server2:~$
+</Code>
+For that configuration I have used the following appendix:
+===Appendix===
+<Code:bash|repmgrd.conf>
+node_id='2'                    # A unique integer greater than zero (1/2 depends on the node)
+node_name='server2'                       # An arbitrary (but unique) string; we recommend. (unique number for the cluster)
+conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'         # Database connection information as a conninfo string.
+data_directory='/var/lib/postgresql/12/main'             # The node's data directory. This is needed by repmgr
+config_directory='/etc/postgresql/12/main'               # If configuration files are located outside the data
+replication_user='repmgr'        # User to make replication connections with, if not set
+replication_type='physical'      # Must be one of "physical" or "bdr".
+location='default'               # An arbitrary string defining the location of the node; this
+use_replication_slots='yes'      # whether to use physical replication slots
+witness_sync_interval=15         # interval (in seconds) to synchronise node records
+log_level='DEBUG'                 # Log level: possible values are DEBUG, INFO, NOTICE,
+log_facility='STDERR'            # Logging facility: possible values are STDERR, or for
+log_file='/var/log/postgresql/repmgr.log'
+log_status_interval=300  # interval (in seconds) for repmgrd to log a status message
+pg_bindir='/usr/lib/postgresql/12/bin'                          # Path to PostgreSQL binary directory (location
+repmgr_bindir='/usr/lib/postgresql/12/bin'                      # Path to repmgr binary directory (location of the repmgr
+use_primary_conninfo_password=false     # explicitly set "password" in "primary_conninfo"
+passfile='/var/lib/postgresql/12/main/.pgpass'       # path to .pgpass file to include in "primary_conninfo"
+ssh_options='-o StrictHostKeyChecking=no'       # Options to append to "ssh"
+promote_check_timeout=60                # The length of time (in seconds) to wait
+promote_check_interval=1                # The interval (in seconds) to check whether
+primary_follow_timeout=60               # The max length of time (in seconds) to wait
+standby_follow_timeout=15               # The max length of time (in seconds) to wait
+shutdown_check_timeout=60               # The max length of time (in seconds) to wait for the demotion
+standby_reconnect_timeout=60            # The max length of time (in seconds) to wait
+node_rejoin_timeout=60          # The maximum length of time (in seconds) to wait for
+failover='automatic'                    # one of 'automatic', 'manual'.
+priority=100                            # indicates a preferred priority for promoting nodes;
+reconnect_attempts=6                    # Number of attempts which will be made to reconnect to an unreachable
+reconnect_interval=10                   # Interval between attempts to reconnect to an unreachable
+promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'  # command repmgrd executes when promoting a new primary; use something like:
+follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'  # command repmgrd executes when instructing a standby to follow a new primary;
+primary_notification_timeout=60                 # Interval (in seconds) which repmgrd on a standby
+repmgrd_standby_startup_timeout=60              # Interval (in seconds) which repmgrd on a standby will wait
+monitoring_history=no           # Whether to write monitoring data to the "montoring_history" table
+monitor_interval_secs=2                 # Interval (in seconds) at which to write monitoring data
+degraded_monitoring_timeout=-1          # Interval (in seconds) after which repmgrd will terminate if the
+async_query_timeout=60          # Interval (in seconds) which repmgrd will wait before
+standby_disconnect_on_failover=false    # If "true", in a failover situation wait for all standbys to
+primary_visibility_consensus=false      # If "true", only continue with failover if no standbys have seen
+service_start_command ='sudo systemctl start postgresql'
+service_stop_command ='sudo systemctl stop postgresql'
+service_restart_command ='sudo systemctl restart postgresql'
+service_reload_command ='sudo systemctl reload postgresql'
+archive_ready_warning=16                # repmgr node check --archive-ready
+archive_ready_critical=128              #
+replication_lag_warning=300             # repmgr node check --replication-lag
+replication_lag_critical=600            #
+bdr_local_monitoring_only=false         # Only monitor the local node; no checks will be
+bdr_recovery_timeout=30                 # If a BDR node was offline and has become available
+</Code>
+====Manual Failover====