{{:1_g-5e4zhde9lojvmbdsiptg.png?600|}}

=====Virtual Environment Config=====
To configure the Virtual Environment, we will use Vagrant and Ubuntu machines.
====Automatic Failover====
To test that, we can shut down the primary and see whether it fails over automatically (as per our settings). Note that I have changed the configuration a little for this test, so please check the Appendix at the end of this section.
  
===Modify Postgres user===
Firstly, we need to add the postgres user to the sudo group on both servers, so that repmgrd can start and stop PostgreSQL via the service_*_command settings:

<Code:bash|Add user to Sudoers group>
### Server 1
root@server1:/var/log/postgresql# usermod -aG sudo postgres
root@server1:/var/log/postgresql#

### Server 2
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
</Code>
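
Being in the sudo group alone may not be enough for repmgrd, because it runs the service_*_command entries non-interactively and the default Ubuntu sudo policy still asks for a password. One possible way to handle that is a dedicated sudoers rule; the snippet below is only a sketch, and the systemctl path may differ on your system:

<Code:bash|Optional: passwordless service control for postgres (sketch)>
### Hypothetical /etc/sudoers.d/postgres entry, added with "visudo -f /etc/sudoers.d/postgres".
### Allows the postgres user to control only the postgresql service without a password,
### which matches the service_*_command settings in the Appendix.
postgres ALL=(root) NOPASSWD: /usr/bin/systemctl start postgresql, /usr/bin/systemctl stop postgresql, /usr/bin/systemctl restart postgresql, /usr/bin/systemctl reload postgresql
</Code>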

===Test the failover===
Now, we can finally test the failover. This is how the cluster looks before the test:

<Code:bash|Test Failover>
postgres@server1:/var/log/postgresql$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------
 1  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr 
 2  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
</Code>
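
Before pulling the plug, it is worth confirming that repmgrd is actually running on both nodes. A quick sketch, assuming repmgr 5.x (which provides the "service status" subcommand):

<Code:bash|Check that repmgrd is running (sketch)>
### The daemon process itself
postgres@server1:~$ ps -ef | grep '[r]epmgrd'

### repmgr 5.x: summary of repmgrd status on all registered nodes
postgres@server1:~$ repmgr -f /etc/repmgr.conf service status
</Code>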
  
  
We can now simply take down the current primary (server2); here the PostgreSQL service is restarted to simulate the outage:

<Code:bash|Shutdown production>
root@server2:/var/log/postgresql# service postgresql restart
</Code>
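
While this happens, we can follow the repmgr log on the standby; with the settings from the Appendix (reconnect_attempts=6, reconnect_interval=10) the promotion should start roughly a minute after the primary becomes unreachable:

<Code:bash|Follow the repmgr log on the standby (sketch)>
### Follows the log file configured via log_file in repmgr.conf
postgres@server1:~$ tail -f /var/log/postgresql/repmgr.log
</Code>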
  
Once this is done, we can check the repmgr log file on server1:
  
<Code:bash|Check the log on standby>
[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
[2021-09-28 10:46:45] [DEBUG] election result: WON
[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
[2021-09-28 10:46:45] [INFO] promote_command is:
  "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"

[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
[2021-09-28 10:46:46] [INFO] 0 followers to notify
[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
</Code>
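
As an extra sanity check, we can confirm that server1 is really out of recovery; a quick sketch using the standard pg_is_in_recovery() function:

<Code:bash|Confirm server1 is no longer in recovery (sketch)>
### Returns "f" on a primary and "t" on a standby
postgres@server1:~$ psql -c "SELECT pg_is_in_recovery();"
</Code>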

As we can see, server1 has been promoted to primary. That, however, leaves us to deal with the aftermath.

===Dealing with the aftermath===
We now have one running primary and one failed ex-primary, so we need to re-establish replication.

<Code:bash|Cluster state after the failover>
postgres@server1:/var/log/postgresql$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "server2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
postgres@server1:
</Code>
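
As a side note, instead of re-cloning from scratch, repmgr also offers "repmgr node rejoin", which can often reattach the failed ex-primary (optionally using pg_rewind). The sketch below is only an alternative and assumes wal_log_hints or data checksums are enabled for --force-rewind; in this guide we re-clone instead:

<Code:bash|Alternative (not used here): rejoin the old primary (sketch)>
### Run on server2 while its PostgreSQL instance is stopped.
### -d points at the current primary; --force-rewind runs pg_rewind if the timelines diverged.
postgres@server2:~$ repmgr -f /etc/repmgr.conf node rejoin \
      -d 'host=server1 user=repmgr dbname=repmgr' --force-rewind
</Code>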

We can re-establish the replication as follows:

1. Clear the old data directory and re-clone (resnap) the failed server from the new primary:

<Code:bash|Resnap failed server's data>
root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
DEBUG: 2 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgresql
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
postgres@server2:~$
</Code>

2. Start up PostgreSQL.
3. Re-register the standby:

<Code:bash|Startup PostgreSQL and Re-register>
postgres@server2:~$ logout
root@server2:/var/log/postgresql# service postgresql start
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr standby register --force
INFO: connecting to local node "server2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
INFO: standby registration complete
postgres@server2:~$
</Code>
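
As an extra check on the new primary, the standard pg_stat_replication view should now show server2 streaming again; a quick sketch:

<Code:bash|Verify streaming replication on the new primary (sketch)>
### Should return one row for server2 with state = streaming
postgres@server1:~$ psql -x -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
</Code>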

Now we can check our status as follows:

<Code:bash|Check the newly recovered cluster>
postgres@server2:~$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string    
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr 
 2  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
postgres@server2:~$
</Code>
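
Optionally, "repmgr node check" gives a more detailed per-node health report; a quick sketch:

<Code:bash|Optional per-node health check (sketch)>
### Runs a series of checks (role, replication lag, replication slots, ...) against the local node
postgres@server2:~$ repmgr -f /etc/repmgr.conf node check
</Code>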

For this setup I used the following repmgr configuration:
===Appendix===
<Code:bash|repmgr.conf (/etc/repmgr.conf, shown for node 2 / server2)>
node_id='2'                                   # A unique integer greater than zero (1 or 2, depending on the node)
node_name='server2'                           # An arbitrary (but unique) string identifying the node
conninfo='host=server2 user=repmgr dbname=repmgr connect_timeout=2'   # Conninfo string the other nodes use to reach THIS node (host must match the node)
data_directory='/var/lib/postgresql/12/main'  # The node's data directory; needed by repmgr
config_directory='/etc/postgresql/12/main'    # Needed because the configuration files are located outside the data directory
replication_user='repmgr'        # User to make replication connections with
replication_type='physical'      # Must be one of "physical" or "bdr"
location='default'               # An arbitrary string defining the location of the node
use_replication_slots='yes'      # Whether to use physical replication slots
witness_sync_interval=15         # Interval (in seconds) to synchronise node records
log_level='DEBUG'                # Log level: DEBUG, INFO, NOTICE, WARNING, ...
log_facility='STDERR'            # Logging facility: STDERR or a syslog facility
log_file='/var/log/postgresql/repmgr.log'
log_status_interval=300          # Interval (in seconds) for repmgrd to log a status message
pg_bindir='/usr/lib/postgresql/12/bin'        # Path to PostgreSQL binary directory
repmgr_bindir='/usr/lib/postgresql/12/bin'    # Path to repmgr binary directory
use_primary_conninfo_password=false           # Explicitly set "password" in "primary_conninfo"
passfile='/var/lib/postgresql/12/main/.pgpass'   # Path to .pgpass file to include in "primary_conninfo"
ssh_options='-o StrictHostKeyChecking=no'        # Options to append to "ssh"
promote_check_timeout=60                # Length of time (in seconds) to wait for the promotion to complete
promote_check_interval=1                # Interval (in seconds) to check whether the promotion has completed
primary_follow_timeout=60               # Max length of time (in seconds) to wait for the new primary
standby_follow_timeout=15               # Max length of time (in seconds) to wait for the standby to follow
shutdown_check_timeout=60               # Max length of time (in seconds) to wait for the demotion
standby_reconnect_timeout=60            # Max length of time (in seconds) to wait for the standby to reconnect
node_rejoin_timeout=60                  # Max length of time (in seconds) to wait for a node rejoin
failover='automatic'                    # One of 'automatic', 'manual'
priority=100                            # Indicates a preferred priority for promoting nodes
reconnect_attempts=6                    # Number of attempts to reconnect to an unreachable upstream node
reconnect_interval=10                   # Interval between attempts to reconnect to an unreachable upstream node
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'   # Command repmgrd executes when promoting a new primary
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'   # Command repmgrd executes when instructing a standby to follow the new primary
primary_notification_timeout=60         # Interval (in seconds) which repmgrd on a standby will wait for notification from the new primary
repmgrd_standby_startup_timeout=60      # Interval (in seconds) which repmgrd on a standby will wait for PostgreSQL to start
monitoring_history=no                   # Whether to write monitoring data to the "monitoring_history" table
monitor_interval_secs=2                 # Interval (in seconds) at which to write monitoring data
degraded_monitoring_timeout=-1          # Interval (in seconds) after which repmgrd will terminate if the monitored node stays unavailable (-1 = never)
async_query_timeout=60                  # Interval (in seconds) repmgrd will wait before cancelling an asynchronous query
standby_disconnect_on_failover=false    # If "true", in a failover situation wait for all standbys to disconnect their WAL receivers
primary_visibility_consensus=false      # If "true", only continue with failover if no standbys have seen the primary recently
service_start_command='sudo systemctl start postgresql'
service_stop_command='sudo systemctl stop postgresql'
service_restart_command='sudo systemctl restart postgresql'
service_reload_command='sudo systemctl reload postgresql'
archive_ready_warning=16                # Warning threshold for "repmgr node check --archive-ready"
archive_ready_critical=128              # Critical threshold for "repmgr node check --archive-ready"
replication_lag_warning=300             # Warning threshold (seconds) for "repmgr node check --replication-lag"
replication_lag_critical=600            # Critical threshold (seconds) for "repmgr node check --replication-lag"
bdr_local_monitoring_only=false         # Only monitor the local BDR node
bdr_recovery_timeout=30                 # Time (in seconds) to wait for a BDR node that was offline to become available again
</Code>
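
After editing this file, repmgrd has to be restarted to pick up the new settings. The commands below are a sketch assuming the stock Debian/Ubuntu repmgr packaging, where the daemon is managed by the "repmgrd" service and enabled via /etc/default/repmgrd:

<Code:bash|Restart repmgrd after config changes (sketch)>
### The Debian/Ubuntu package only starts the daemon if REPMGRD_ENABLED=yes in /etc/default/repmgrd
root@server1:~# grep -E 'REPMGRD_(ENABLED|CONF)' /etc/default/repmgrd
root@server1:~# systemctl restart repmgrd
</Code>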
====Manual Failover====
  