Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
postgresql_repmgr_installation [2021/09/16 12:10] andonovjpostgresql_repmgr_installation [2021/09/28 11:30] (current) andonovj
Line 9: Line 9:
   * 2nd Quandrant - For Rempgr   * 2nd Quandrant - For Rempgr
  
 +
 +{{:1_g-5e4zhde9lojvmbdsiptg.png?600|}}
  
 =====Virtual Environment Config===== =====Virtual Environment Config=====
Line 69: Line 71:
  
  
-====Server Config====+=====Server Config=====
 For the servers we have to: For the servers we have to:
  
Line 80: Line 82:
  
  
-===Modify /etc/hosts ===+====Modify /etc/hosts ====
 That is pretty trivial, just add the IPs and the hostnames to /etc/hosts: That is pretty trivial, just add the IPs and the hostnames to /etc/hosts:
  
Line 94: Line 96:
  
  
-===Configure Repositories===+====Configure Repositories====
 I decided to use 2nd quadrant packages, as they are the ones I use for work as well :) I decided to use 2nd quadrant packages, as they are the ones I use for work as well :)
 First thing first, we have to update our repos and pull the latest metadata from them. First thing first, we have to update our repos and pull the latest metadata from them.
Line 133: Line 135:
 </Code> </Code>
  
-===Install PostgreSQL===+====Install PostgreSQL====
 Then we can proceed with installation of PostgreSQL Then we can proceed with installation of PostgreSQL
 <Code:bash|Install PostgreSQL> <Code:bash|Install PostgreSQL>
Line 177: Line 179:
 Do this on both machines: server1 and server2 Do this on both machines: server1 and server2
  
-===Install REPMGR Repos===+====Install REPMGR Repos====
 Then we can proceed with adding the repo for REPMGR Then we can proceed with adding the repo for REPMGR
  
Line 210: Line 212:
 </Code> </Code>
  
-===Configure SSH keys===+====Configure SSH keys====
 To configure passwordless connection, we have to: To configure passwordless connection, we have to:
  
Line 296: Line 298:
 </Code> </Code>
  
-====Cluster Environment====+=====Cluster Environment=====
 After we've installed and configured all the packages, we can proceed with cluster configuration.  After we've installed and configured all the packages, we can proceed with cluster configuration. 
 Firstly, we have to configure the database and the users: Firstly, we have to configure the database and the users:
  
-===Configure User===+====Configure User====
 The user to be used for the replication can be configured as so: The user to be used for the replication can be configured as so:
  
Line 316: Line 318:
 After, we have done that, we can configure security connection to the PostgreSQL in pg_hba.conf After, we have done that, we can configure security connection to the PostgreSQL in pg_hba.conf
  
-===Connection Configuration===+====Connection Configuration====
 To allow repmgr user to be able to connect to PostgreSQL, we need to append to the pg_hba.conf file the following entries. To allow repmgr user to be able to connect to PostgreSQL, we need to append to the pg_hba.conf file the following entries.
  
Line 342: Line 344:
  
 repmgr=# repmgr=#
 +</Code>
 +
 +Finally we have to set pre-loaded library for repmgr in the postgresql.conf
 +
 +<Code:bash|Set Pre-loaded Library>
 +shared_preload_libraries = 'repmgr'
 </Code> </Code>
  
Line 347: Line 355:
 Now, finally we can configure the REPMGR Now, finally we can configure the REPMGR
  
-====REPMGR====+=====REPMGR=====
 The default configuration of Repmgr is located in: The default configuration of Repmgr is located in:
  
Line 355: Line 363:
 We can do a basic configuration as follows: We can do a basic configuration as follows:
  
 +====Configuration====
 <Code:bash|Repmgr.conf> <Code:bash|Repmgr.conf>
 cluster=cluster cluster=cluster
Line 398: Line 407:
 </Code> </Code>
  
 +Furthermore, we have to make a symlink to the correct pg_ctl
 +
 +
 +<Code:bash|Create symlink to pg_ctl>
 +root@server2:~#  ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
 +</Code>
 +
 +We have to do that on both servers.
 +====Cluster Initialization====
 +To initialize a cluster, we can do it as so:
 +
 +
 +<Code:bash|Initialize a cluster>
 +postgres@server1:~$ repmgr primary register
 +WARNING: the following problems were found in the configuration file:
 +  parameter "cluster" is deprecated and will be ignored
 +  parameter "logfile" has been renamed to "log_file"
 +  parameter "loglevel" has been renamed to "log_level"
 +INFO: connecting to primary database...
 +NOTICE: attempting to install extension "repmgr"
 +NOTICE: "repmgr" extension successfully installed
 +NOTICE: primary node record (ID: 1) registered
 +postgres@server1:~$
 +</Code>
 +
 +This will initialize the server and we can check that:
 +
 +<Code:bash|Check cluster status>
 +postgres@server1:~$ repmgr cluster show
 +WARNING: the following problems were found in the configuration file:
 +  parameter "cluster" is deprecated and will be ignored
 +  parameter "logfile" has been renamed to "log_file"
 +  parameter "loglevel" has been renamed to "log_level"
 + ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
 +----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 +  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 +postgres@server1:~$
 +</Code>
 +
 +After that is done, we can set-up the standby server (e.g. server2)
 +
 +
 +====Setup Standby====
 +To setup standby, we have to delete everything from the server and clone it:
 +
 +
 +<Code:bash|Setup standby server>
 +root@server2:~# rm -rf /var/lib/postgresql/12/main/
 +root@server2:~# su - postgres
 +postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
 +WARNING: the following problems were found in the configuration file:
 +  parameter "cluster" is deprecated and will be ignored
 +  parameter "logfile" has been renamed to "log_file"
 +  parameter "loglevel" has been renamed to "log_level"
 +NOTICE: destination directory "/var/lib/postgresql/12/main" provided
 +INFO: connecting to source node
 +DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
 +DETAIL: current installation size is 31 MB
 +NOTICE: checking for available walsenders on the source node (2 required)
 +NOTICE: checking replication connections can be made to the source server (2 required)
 +INFO: creating directory "/var/lib/postgresql/12/main"...
 +NOTICE: starting backup (using pg_basebackup)...
 +HINT: this may take some time; consider using the -c/--fast-checkpoint option
 +INFO: executing:
 +  pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
 +NOTICE: standby clone (using pg_basebackup) complete
 +NOTICE: you can now start your PostgreSQL server
 +HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
 +HINT: after starting the server, you need to register this standby with "repmgr standby register"
 +postgres@server2:~$
 +</Code>
 +
 +After that, we should be able to see the replication working:
 +
 +<Code:bash|Check status of REPMGR>
 +postgres@server2:~$ repmgr cluster show
 +WARNING: the following problems were found in the configuration file:
 +  parameter "cluster" is deprecated and will be ignored
 +  parameter "logfile" has been renamed to "log_file"
 +  parameter "loglevel" has been renamed to "log_level"
 + ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
 +----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 +  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 +  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
 +postgres@server2:~$
 +</Code>
 +
 +
 +We can test the connection as follows:
 +
 +====Test Configuration====
 +Firstly, we will test the normal functionality and then we will test the failove.
 +
 +
 +For the functionality, we can create a table on primary and see if it is present on the standby to test:
 +
 +
 +==On Primary==
 +<Code:bash|Test Functionality>
 +repmgr=# create table test(id integer);
 +CREATE TABLE
 +repmgr=# insert into test values(1);
 +INSERT 0 1
 +repmgr=# commit;
 +WARNING:  there is no transaction in progress
 +COMMIT
 +repmgr=#
 +</Code>
 +
 +==Standby==
 +Now, we can test it on the standby:
 +
 +<Code:bash|Test Functionality>
 +postgres=# \c repmgr
 +You are now connected to database "repmgr" as user "postgres".
 +repmgr=# select * from test;
 + id
 +----
 +  1
 +(1 row)
 +
 +repmgr=#
 +</Code>
 +
 +Now, let's test the failover:
 +
 +====Automatic Failover====
 +To test that, we can shutdown the primary and see if it is failover automatically (as per our settings)
 +Now, I changed a little bit the config, so please check Appendix A.
 +
 +===Modify Postgres user===
 +Firstly, we need to add the postgres user to the sudo group:
 +
 +<Code:bash|Add user to Sudoers group>
 +### Server 1
 +root@server2:/var/log/postgresql# usermod -aG sudo postgres
 +root@server2:/var/log/postgresql#
 +
 +### Server 2
 +root@server2:/var/log/postgresql# usermod -aG sudo postgres
 +root@server2:/var/log/postgresql#
 +</Code>
 +
 +
 +===Test the failover===
 +Now, we can finally test the failover.
 +
 +<Code:bash|Test Failover>
 +postgres@server1:/var/log/postgresql$ repmgr cluster show
 + ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
 +----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------
 +  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr 
 +  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
 +</Code>
 +
 +
 +We can simply shutdown the prod:
 +
 +<Code:bash|Shutdown production>
 +root@server2:/var/log/postgresql# service postgresql restart
 +</Code>
 +
 +Once, this is done, we can see the log file from server1:
 +
 +<Code:bash|Check the log on standby>
 +[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
 +[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
 +[2021-09-28 10:46:45] [DEBUG] election result: WON
 +[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
 +[2021-09-28 10:46:45] [INFO] promote_command is:
 +  "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
 +[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
 +
 +[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 +[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
 +[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
 +[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
 +[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
 +[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
 +[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
 +[2021-09-28 10:46:46] [INFO] 0 followers to notify
 +[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
 +[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
 +</Code>
 +
 +As we can see, server1 has been promoted to primary, that however leaves us to deal with the aftermath.
 +
 +===Dealing with the aftermath===
 +Now, we have one primary and 1 failed primary, we need to re-establish connection.
 +
 +<Code:bash|Re-establish the connection>
 +postgres@server1:/var/log/postgresql$ repmgr cluster show
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 + ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
 +----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 +  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 +  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
 +
 +WARNING: following issues were detected
 +  - unable to connect to node "server2" (ID: 2)
 +HINT: execute with --verbose option to see connection error messages
 +postgres@server1:
 +</Code>
 +
 +We can re-establish the connection as follows:
 +
 +1. Clear the old path and Resnap:
 +
 +<Code:bash|Resnap failed server's data>
 +root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
 +root@server2:/var/log/postgresql# su - postgres
 +postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
 +WARNING: following problems with command line parameters detected:
 +  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
 +NOTICE: destination directory "/var/lib/postgresql/12/main" provided
 +INFO: connecting to source node
 +DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
 +DETAIL: current installation size is 31 MB
 +DEBUG: 2 node records returned by source node
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +DEBUG: upstream_node_id determined as 1
 +NOTICE: checking for available walsenders on the source node (2 required)
 +NOTICE: checking replication connections can be made to the source server (2 required)
 +INFO: creating directory "/var/lib/postgresql/12/main"...
 +DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
 +NOTICE: starting backup (using pg_basebackup)...
 +HINT: this may take some time; consider using the -c/--fast-checkpoint option
 +INFO: executing:
 +  /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +NOTICE: standby clone (using pg_basebackup) complete
 +NOTICE: you can now start your PostgreSQL server
 +HINT: for example: sudo systemctl start postgresql
 +HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
 +postgres@server2:~$
 +</Code>
 +
 +2. Startup the postgresql
 +3. Re-register
 +
 +<Code:bash|Startup PostgreSQL and Re-register>
 +postgres@server2:~$ logout
 +root@server2:/var/log/postgresql# service postgresql start
 +root@server2:/var/log/postgresql# su - postgres
 +postgres@server2:~$ repmgr standby register --force
 +INFO: connecting to local node "server2" (ID: 2)
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 +INFO: connecting to primary database
 +DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
 +INFO: standby registration complete
 +postgres@server2:~$
 +</Code>
 +
 +
 +Now we can check our status as follows:
 +
 +
 +<Code:bash|Check the newly recovered cluster>
 +postgres@server2:~$ repmgr cluster show
 +ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
 +----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 +  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr 
 +  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
 +postgres@server2:~$
 +</Code>
 +
 +
 +For that configuration I have used the following appendix:
 +===Appendix===
 +<Code:bash|repmgrd.conf>
 +node_id='2'                    # A unique integer greater than zero (1/2 depends on the node)
 +node_name='server2'                       # An arbitrary (but unique) string; we recommend. (unique number for the cluster)
 +conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'         # Database connection information as a conninfo string.
 +data_directory='/var/lib/postgresql/12/main'             # The node's data directory. This is needed by repmgr
 +config_directory='/etc/postgresql/12/main'               # If configuration files are located outside the data
 +replication_user='repmgr'        # User to make replication connections with, if not set
 +replication_type='physical'      # Must be one of "physical" or "bdr".
 +location='default'               # An arbitrary string defining the location of the node; this
 +use_replication_slots='yes'      # whether to use physical replication slots
 +witness_sync_interval=15         # interval (in seconds) to synchronise node records
 +log_level='DEBUG'                 # Log level: possible values are DEBUG, INFO, NOTICE,
 +log_facility='STDERR'            # Logging facility: possible values are STDERR, or for
 +log_file='/var/log/postgresql/repmgr.log'
 +log_status_interval=300  # interval (in seconds) for repmgrd to log a status message
 +pg_bindir='/usr/lib/postgresql/12/bin'                          # Path to PostgreSQL binary directory (location
 +repmgr_bindir='/usr/lib/postgresql/12/bin'                      # Path to repmgr binary directory (location of the repmgr
 +use_primary_conninfo_password=false     # explicitly set "password" in "primary_conninfo"
 +passfile='/var/lib/postgresql/12/main/.pgpass'       # path to .pgpass file to include in "primary_conninfo"
 +ssh_options='-o StrictHostKeyChecking=no'       # Options to append to "ssh"
 +promote_check_timeout=60                # The length of time (in seconds) to wait
 +promote_check_interval=1                # The interval (in seconds) to check whether
 +primary_follow_timeout=60               # The max length of time (in seconds) to wait
 +standby_follow_timeout=15               # The max length of time (in seconds) to wait
 +shutdown_check_timeout=60               # The max length of time (in seconds) to wait for the demotion
 +standby_reconnect_timeout=60            # The max length of time (in seconds) to wait
 +node_rejoin_timeout=60          # The maximum length of time (in seconds) to wait for
 +failover='automatic'                    # one of 'automatic', 'manual'.
 +priority=100                            # indicates a preferred priority for promoting nodes;
 +reconnect_attempts=6                    # Number of attempts which will be made to reconnect to an unreachable
 +reconnect_interval=10                   # Interval between attempts to reconnect to an unreachable
 +promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'  # command repmgrd executes when promoting a new primary; use something like:
 +follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'  # command repmgrd executes when instructing a standby to follow a new primary;
 +primary_notification_timeout=60                 # Interval (in seconds) which repmgrd on a standby
 +repmgrd_standby_startup_timeout=60              # Interval (in seconds) which repmgrd on a standby will wait
 +monitoring_history=no           # Whether to write monitoring data to the "montoring_history" table
 +monitor_interval_secs=2                 # Interval (in seconds) at which to write monitoring data
 +degraded_monitoring_timeout=-1          # Interval (in seconds) after which repmgrd will terminate if the
 +async_query_timeout=60          # Interval (in seconds) which repmgrd will wait before
 +standby_disconnect_on_failover=false    # If "true", in a failover situation wait for all standbys to
 +primary_visibility_consensus=false      # If "true", only continue with failover if no standbys have seen
 +service_start_command ='sudo systemctl start postgresql'
 +service_stop_command ='sudo systemctl stop postgresql'
 +service_restart_command ='sudo systemctl restart postgresql'
 +service_reload_command ='sudo systemctl reload postgresql'
 +archive_ready_warning=16                # repmgr node check --archive-ready
 +archive_ready_critical=128              #
 +replication_lag_warning=300             # repmgr node check --replication-lag
 +replication_lag_critical=600            #
 +bdr_local_monitoring_only=false         # Only monitor the local node; no checks will be
 +bdr_recovery_timeout=30                 # If a BDR node was offline and has become available
 +</Code>
 +====Manual Failover====
  
  • postgresql_repmgr_installation.1631794223.txt.gz
  • Last modified: 2021/09/16 12:10
  • by andonovj