Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
postgresql_repmgr_installation [2021/09/17 06:25] – andonovj | postgresql_repmgr_installation [2021/09/28 11:30] (current) – andonovj | ||
---|---|---|---|
Line 9: | Line 9: | ||
* 2nd Quandrant - For Rempgr | * 2nd Quandrant - For Rempgr | ||
+ | |||
+ | {{: | ||
=====Virtual Environment Config===== | =====Virtual Environment Config===== | ||
Line 533: | Line 535: | ||
====Automatic Failover==== | ====Automatic Failover==== | ||
To test that, we can shutdown the primary and see if it is failover automatically (as per our settings) | To test that, we can shutdown the primary and see if it is failover automatically (as per our settings) | ||
+ | Now, I changed a little bit the config, so please check Appendix A. | ||
+ | ===Modify Postgres user=== | ||
+ | Firstly, we need to add the postgres user to the sudo group: | ||
+ | |||
+ | < | ||
+ | ### Server 1 | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | |||
+ | ### Server 2 | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | </ | ||
+ | |||
+ | |||
+ | ===Test the failover=== | ||
+ | Now, we can finally test the failover. | ||
< | < | ||
+ | postgres@server1:/ | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------- | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | We can simply shutdown the prod: | ||
+ | |||
+ | < | ||
+ | root@server2:/ | ||
</ | </ | ||
+ | Once, this is done, we can see the log file from server1: | ||
+ | < | ||
+ | [2021-09-28 10:46:45] [INFO] primary and this node have the same location (" | ||
+ | [2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default | ||
+ | [2021-09-28 10:46:45] [DEBUG] election result: WON | ||
+ | [2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself | ||
+ | [2021-09-28 10:46:45] [INFO] promote_command is: | ||
+ | " | ||
+ | [2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/ | ||
+ | |||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [NOTICE] promoting standby to primary | ||
+ | [2021-09-28 10:46:45] [DETAIL] promoting server " | ||
+ | [2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter " | ||
+ | [2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed | ||
+ | [2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful | ||
+ | [2021-09-28 10:46:46] [DETAIL] server " | ||
+ | [2021-09-28 10:46:46] [INFO] 0 followers to notify | ||
+ | [2021-09-28 10:46:46] [INFO] switching to primary monitoring mode | ||
+ | [2021-09-28 10:46:46] [NOTICE] monitoring cluster primary " | ||
+ | </ | ||
+ | |||
+ | As we can see, server1 has been promoted to primary, that however leaves us to deal with the aftermath. | ||
+ | |||
+ | ===Dealing with the aftermath=== | ||
+ | Now, we have one primary and 1 failed primary, we need to re-establish connection. | ||
+ | |||
+ | < | ||
+ | postgres@server1:/ | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: connecting to: " | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------- | ||
+ | | ||
+ | | ||
+ | |||
+ | WARNING: following issues were detected | ||
+ | - unable to connect to node " | ||
+ | HINT: execute with --verbose option to see connection error messages | ||
+ | postgres@server1: | ||
+ | </ | ||
+ | |||
+ | We can re-establish the connection as follows: | ||
+ | |||
+ | 1. Clear the old path and Resnap: | ||
+ | |||
+ | < | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | postgres@server2: | ||
+ | WARNING: following problems with command line parameters detected: | ||
+ | " | ||
+ | NOTICE: destination directory "/ | ||
+ | INFO: connecting to source node | ||
+ | DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr | ||
+ | DETAIL: current installation size is 31 MB | ||
+ | DEBUG: 2 node records returned by source node | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: upstream_node_id determined as 1 | ||
+ | NOTICE: checking for available walsenders on the source node (2 required) | ||
+ | NOTICE: checking replication connections can be made to the source server (2 required) | ||
+ | INFO: creating directory "/ | ||
+ | DEBUG: create_replication_slot(): | ||
+ | NOTICE: starting backup (using pg_basebackup)... | ||
+ | HINT: this may take some time; consider using the -c/ | ||
+ | INFO: executing: | ||
+ | / | ||
+ | DEBUG: connecting to: " | ||
+ | NOTICE: standby clone (using pg_basebackup) complete | ||
+ | NOTICE: you can now start your PostgreSQL server | ||
+ | HINT: for example: sudo systemctl start postgresql | ||
+ | HINT: after starting the server, you need to re-register this standby with " | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | 2. Startup the postgresql | ||
+ | 3. Re-register | ||
+ | |||
+ | < | ||
+ | postgres@server2: | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | postgres@server2: | ||
+ | INFO: connecting to local node " | ||
+ | DEBUG: connecting to: " | ||
+ | INFO: connecting to primary database | ||
+ | DEBUG: connecting to: " | ||
+ | INFO: standby registration complete | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | Now we can check our status as follows: | ||
+ | |||
+ | |||
+ | < | ||
+ | postgres@server2: | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------ | ||
+ | | ||
+ | | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | For that configuration I have used the following appendix: | ||
+ | ===Appendix=== | ||
+ | < | ||
+ | node_id=' | ||
+ | node_name=' | ||
+ | conninfo=' | ||
+ | data_directory='/ | ||
+ | config_directory='/ | ||
+ | replication_user=' | ||
+ | replication_type=' | ||
+ | location=' | ||
+ | use_replication_slots=' | ||
+ | witness_sync_interval=15 | ||
+ | log_level=' | ||
+ | log_facility=' | ||
+ | log_file='/ | ||
+ | log_status_interval=300 | ||
+ | pg_bindir='/ | ||
+ | repmgr_bindir='/ | ||
+ | use_primary_conninfo_password=false | ||
+ | passfile='/ | ||
+ | ssh_options=' | ||
+ | promote_check_timeout=60 | ||
+ | promote_check_interval=1 | ||
+ | primary_follow_timeout=60 | ||
+ | standby_follow_timeout=15 | ||
+ | shutdown_check_timeout=60 | ||
+ | standby_reconnect_timeout=60 | ||
+ | node_rejoin_timeout=60 | ||
+ | failover=' | ||
+ | priority=100 | ||
+ | reconnect_attempts=6 | ||
+ | reconnect_interval=10 | ||
+ | promote_command=' | ||
+ | follow_command=' | ||
+ | primary_notification_timeout=60 | ||
+ | repmgrd_standby_startup_timeout=60 | ||
+ | monitoring_history=no | ||
+ | monitor_interval_secs=2 | ||
+ | degraded_monitoring_timeout=-1 | ||
+ | async_query_timeout=60 | ||
+ | standby_disconnect_on_failover=false | ||
+ | primary_visibility_consensus=false | ||
+ | service_start_command ='sudo systemctl start postgresql' | ||
+ | service_stop_command ='sudo systemctl stop postgresql' | ||
+ | service_restart_command ='sudo systemctl restart postgresql' | ||
+ | service_reload_command ='sudo systemctl reload postgresql' | ||
+ | archive_ready_warning=16 | ||
+ | archive_ready_critical=128 | ||
+ | replication_lag_warning=300 | ||
+ | replication_lag_critical=600 | ||
+ | bdr_local_monitoring_only=false | ||
+ | bdr_recovery_timeout=30 | ||
+ | </ | ||
====Manual Failover==== | ====Manual Failover==== | ||