Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
postgresql_repmgr_installation [2021/09/16 09:39] – [Cluster Environment] andonovj | postgresql_repmgr_installation [2021/09/28 11:30] (current) – andonovj | ||
---|---|---|---|
Line 9: | Line 9: | ||
* 2nd Quandrant - For Rempgr | * 2nd Quandrant - For Rempgr | ||
+ | |||
+ | {{: | ||
=====Virtual Environment Config===== | =====Virtual Environment Config===== | ||
Line 69: | Line 71: | ||
- | ====Server Config==== | + | =====Server Config===== |
For the servers we have to: | For the servers we have to: | ||
Line 80: | Line 82: | ||
- | ===Modify /etc/hosts === | + | ====Modify / |
That is pretty trivial, just add the IPs and the hostnames to /etc/hosts: | That is pretty trivial, just add the IPs and the hostnames to /etc/hosts: | ||
Line 94: | Line 96: | ||
- | ===Configure Repositories=== | + | ====Configure Repositories==== |
I decided to use 2nd quadrant packages, as they are the ones I use for work as well :) | I decided to use 2nd quadrant packages, as they are the ones I use for work as well :) | ||
First thing first, we have to update our repos and pull the latest metadata from them. | First thing first, we have to update our repos and pull the latest metadata from them. | ||
Line 133: | Line 135: | ||
</ | </ | ||
- | ===Install PostgreSQL=== | + | ====Install PostgreSQL==== |
Then we can proceed with installation of PostgreSQL | Then we can proceed with installation of PostgreSQL | ||
< | < | ||
Line 177: | Line 179: | ||
Do this on both machines: server1 and server2 | Do this on both machines: server1 and server2 | ||
- | ===Install REPMGR Repos=== | + | ====Install REPMGR Repos==== |
Then we can proceed with adding the repo for REPMGR | Then we can proceed with adding the repo for REPMGR | ||
Line 210: | Line 212: | ||
</ | </ | ||
- | ===Configure SSH keys=== | + | ====Configure SSH keys==== |
To configure passwordless connection, we have to: | To configure passwordless connection, we have to: | ||
Line 296: | Line 298: | ||
</ | </ | ||
- | ====Cluster Environment==== | + | =====Cluster Environment===== |
After we've installed and configured all the packages, we can proceed with cluster configuration. | After we've installed and configured all the packages, we can proceed with cluster configuration. | ||
Firstly, we have to configure the database and the users: | Firstly, we have to configure the database and the users: | ||
- | ===Configure User=== | + | ====Configure User==== |
The user to be used for the replication can be configured as so: | The user to be used for the replication can be configured as so: | ||
Line 316: | Line 318: | ||
After, we have done that, we can configure security connection to the PostgreSQL in pg_hba.conf | After, we have done that, we can configure security connection to the PostgreSQL in pg_hba.conf | ||
- | ===Connection Configuration=== | + | ====Connection Configuration==== |
To allow repmgr user to be able to connect to PostgreSQL, we need to append to the pg_hba.conf file the following entries. | To allow repmgr user to be able to connect to PostgreSQL, we need to append to the pg_hba.conf file the following entries. | ||
Line 342: | Line 344: | ||
repmgr=# | repmgr=# | ||
+ | </ | ||
+ | |||
+ | Finally we have to set pre-loaded library for repmgr in the postgresql.conf | ||
+ | |||
+ | < | ||
+ | shared_preload_libraries = ' | ||
</ | </ | ||
Do that on both servers, to verify the configuration. | Do that on both servers, to verify the configuration. | ||
+ | Now, finally we can configure the REPMGR | ||
+ | |||
+ | =====REPMGR===== | ||
+ | The default configuration of Repmgr is located in: | ||
+ | |||
+ | * / | ||
+ | * / | ||
+ | |||
+ | We can do a basic configuration as follows: | ||
+ | |||
+ | ====Configuration==== | ||
+ | < | ||
+ | cluster=cluster | ||
+ | node_id=(1/ | ||
+ | node_name=server2 | ||
+ | conninfo=' | ||
+ | failover=automatic | ||
+ | promote_command=' | ||
+ | follow_command=' | ||
+ | logfile='/ | ||
+ | loglevel=NOTICE | ||
+ | reconnect_attempts=4 | ||
+ | reconnect_interval=5 | ||
+ | data_directory='/ | ||
+ | </ | ||
+ | |||
+ | |||
+ | This files, defines the structure of the replication. Including the commands for promotion and follow-up command. | ||
+ | The next file, will define either will be enabled or not, and which will file it will use. | ||
+ | |||
+ | < | ||
+ | # default settings for repmgrd. This file is source by /bin/sh from | ||
+ | # / | ||
+ | |||
+ | # disable repmgrd by default so it won't get started upon installation | ||
+ | # valid values: yes/no | ||
+ | REPMGRD_ENABLED=yes | ||
+ | |||
+ | # configuration file (required) | ||
+ | REPMGRD_CONF="/ | ||
+ | |||
+ | # additional options | ||
+ | # | ||
+ | |||
+ | # user to run repmgrd as | ||
+ | # | ||
+ | |||
+ | # repmgrd binary | ||
+ | # | ||
+ | |||
+ | # pid file | ||
+ | # | ||
+ | </ | ||
+ | |||
+ | Furthermore, | ||
+ | |||
+ | |||
+ | < | ||
+ | root@server2: | ||
+ | </ | ||
+ | |||
+ | We have to do that on both servers. | ||
+ | ====Cluster Initialization==== | ||
+ | To initialize a cluster, we can do it as so: | ||
+ | |||
+ | |||
+ | < | ||
+ | postgres@server1: | ||
+ | WARNING: the following problems were found in the configuration file: | ||
+ | parameter " | ||
+ | parameter " | ||
+ | parameter " | ||
+ | INFO: connecting to primary database... | ||
+ | NOTICE: attempting to install extension " | ||
+ | NOTICE: " | ||
+ | NOTICE: primary node record (ID: 1) registered | ||
+ | postgres@server1: | ||
+ | </ | ||
+ | |||
+ | This will initialize the server and we can check that: | ||
+ | |||
+ | < | ||
+ | postgres@server1: | ||
+ | WARNING: the following problems were found in the configuration file: | ||
+ | parameter " | ||
+ | parameter " | ||
+ | parameter " | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------- | ||
+ | | ||
+ | postgres@server1: | ||
+ | </ | ||
+ | |||
+ | After that is done, we can set-up the standby server (e.g. server2) | ||
+ | |||
+ | |||
+ | ====Setup Standby==== | ||
+ | To setup standby, we have to delete everything from the server and clone it: | ||
+ | |||
+ | |||
+ | < | ||
+ | root@server2: | ||
+ | root@server2: | ||
+ | postgres@server2: | ||
+ | WARNING: the following problems were found in the configuration file: | ||
+ | parameter " | ||
+ | parameter " | ||
+ | parameter " | ||
+ | NOTICE: destination directory "/ | ||
+ | INFO: connecting to source node | ||
+ | DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr | ||
+ | DETAIL: current installation size is 31 MB | ||
+ | NOTICE: checking for available walsenders on the source node (2 required) | ||
+ | NOTICE: checking replication connections can be made to the source server (2 required) | ||
+ | INFO: creating directory "/ | ||
+ | NOTICE: starting backup (using pg_basebackup)... | ||
+ | HINT: this may take some time; consider using the -c/ | ||
+ | INFO: executing: | ||
+ | pg_basebackup -l " | ||
+ | NOTICE: standby clone (using pg_basebackup) complete | ||
+ | NOTICE: you can now start your PostgreSQL server | ||
+ | HINT: for example: pg_ctl -D / | ||
+ | HINT: after starting the server, you need to register this standby with " | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | After that, we should be able to see the replication working: | ||
+ | |||
+ | < | ||
+ | postgres@server2: | ||
+ | WARNING: the following problems were found in the configuration file: | ||
+ | parameter " | ||
+ | parameter " | ||
+ | parameter " | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------- | ||
+ | | ||
+ | | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | We can test the connection as follows: | ||
+ | |||
+ | ====Test Configuration==== | ||
+ | Firstly, we will test the normal functionality and then we will test the failove. | ||
+ | |||
+ | |||
+ | For the functionality, | ||
+ | |||
+ | |||
+ | ==On Primary== | ||
+ | < | ||
+ | repmgr=# create table test(id integer); | ||
+ | CREATE TABLE | ||
+ | repmgr=# insert into test values(1); | ||
+ | INSERT 0 1 | ||
+ | repmgr=# commit; | ||
+ | WARNING: | ||
+ | COMMIT | ||
+ | repmgr=# | ||
+ | </ | ||
+ | |||
+ | ==Standby== | ||
+ | Now, we can test it on the standby: | ||
+ | |||
+ | < | ||
+ | postgres=# \c repmgr | ||
+ | You are now connected to database " | ||
+ | repmgr=# select * from test; | ||
+ | id | ||
+ | ---- | ||
+ | 1 | ||
+ | (1 row) | ||
+ | |||
+ | repmgr=# | ||
+ | </ | ||
+ | |||
+ | Now, let's test the failover: | ||
+ | |||
+ | ====Automatic Failover==== | ||
+ | To test that, we can shutdown the primary and see if it is failover automatically (as per our settings) | ||
+ | Now, I changed a little bit the config, so please check Appendix A. | ||
+ | |||
+ | ===Modify Postgres user=== | ||
+ | Firstly, we need to add the postgres user to the sudo group: | ||
+ | |||
+ | < | ||
+ | ### Server 1 | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | |||
+ | ### Server 2 | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | </ | ||
+ | |||
+ | |||
+ | ===Test the failover=== | ||
+ | Now, we can finally test the failover. | ||
+ | |||
+ | < | ||
+ | postgres@server1:/ | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------- | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | |||
+ | We can simply shutdown the prod: | ||
+ | |||
+ | < | ||
+ | root@server2:/ | ||
+ | </ | ||
+ | |||
+ | Once, this is done, we can see the log file from server1: | ||
+ | |||
+ | < | ||
+ | [2021-09-28 10:46:45] [INFO] primary and this node have the same location (" | ||
+ | [2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default | ||
+ | [2021-09-28 10:46:45] [DEBUG] election result: WON | ||
+ | [2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself | ||
+ | [2021-09-28 10:46:45] [INFO] promote_command is: | ||
+ | " | ||
+ | [2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/ | ||
+ | |||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [DEBUG] connecting to: " | ||
+ | [2021-09-28 10:46:45] [NOTICE] promoting standby to primary | ||
+ | [2021-09-28 10:46:45] [DETAIL] promoting server " | ||
+ | [2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter " | ||
+ | [2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed | ||
+ | [2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful | ||
+ | [2021-09-28 10:46:46] [DETAIL] server " | ||
+ | [2021-09-28 10:46:46] [INFO] 0 followers to notify | ||
+ | [2021-09-28 10:46:46] [INFO] switching to primary monitoring mode | ||
+ | [2021-09-28 10:46:46] [NOTICE] monitoring cluster primary " | ||
+ | </ | ||
+ | |||
+ | As we can see, server1 has been promoted to primary, that however leaves us to deal with the aftermath. | ||
+ | |||
+ | ===Dealing with the aftermath=== | ||
+ | Now, we have one primary and 1 failed primary, we need to re-establish connection. | ||
+ | |||
+ | < | ||
+ | postgres@server1:/ | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: connecting to: " | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------- | ||
+ | | ||
+ | | ||
+ | |||
+ | WARNING: following issues were detected | ||
+ | - unable to connect to node " | ||
+ | HINT: execute with --verbose option to see connection error messages | ||
+ | postgres@server1: | ||
+ | </ | ||
+ | |||
+ | We can re-establish the connection as follows: | ||
+ | |||
+ | 1. Clear the old path and Resnap: | ||
+ | |||
+ | < | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | postgres@server2: | ||
+ | WARNING: following problems with command line parameters detected: | ||
+ | " | ||
+ | NOTICE: destination directory "/ | ||
+ | INFO: connecting to source node | ||
+ | DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr | ||
+ | DETAIL: current installation size is 31 MB | ||
+ | DEBUG: 2 node records returned by source node | ||
+ | DEBUG: connecting to: " | ||
+ | DEBUG: upstream_node_id determined as 1 | ||
+ | NOTICE: checking for available walsenders on the source node (2 required) | ||
+ | NOTICE: checking replication connections can be made to the source server (2 required) | ||
+ | INFO: creating directory "/ | ||
+ | DEBUG: create_replication_slot(): | ||
+ | NOTICE: starting backup (using pg_basebackup)... | ||
+ | HINT: this may take some time; consider using the -c/ | ||
+ | INFO: executing: | ||
+ | / | ||
+ | DEBUG: connecting to: " | ||
+ | NOTICE: standby clone (using pg_basebackup) complete | ||
+ | NOTICE: you can now start your PostgreSQL server | ||
+ | HINT: for example: sudo systemctl start postgresql | ||
+ | HINT: after starting the server, you need to re-register this standby with " | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | 2. Startup the postgresql | ||
+ | 3. Re-register | ||
+ | |||
+ | < | ||
+ | postgres@server2: | ||
+ | root@server2:/ | ||
+ | root@server2:/ | ||
+ | postgres@server2: | ||
+ | INFO: connecting to local node " | ||
+ | DEBUG: connecting to: " | ||
+ | INFO: connecting to primary database | ||
+ | DEBUG: connecting to: " | ||
+ | INFO: standby registration complete | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | Now we can check our status as follows: | ||
+ | |||
+ | |||
+ | < | ||
+ | postgres@server2: | ||
+ | ID | Name | Role | Status | ||
+ | ----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------ | ||
+ | | ||
+ | | ||
+ | postgres@server2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | For that configuration I have used the following appendix: | ||
+ | ===Appendix=== | ||
+ | < | ||
+ | node_id=' | ||
+ | node_name=' | ||
+ | conninfo=' | ||
+ | data_directory='/ | ||
+ | config_directory='/ | ||
+ | replication_user=' | ||
+ | replication_type=' | ||
+ | location=' | ||
+ | use_replication_slots=' | ||
+ | witness_sync_interval=15 | ||
+ | log_level=' | ||
+ | log_facility=' | ||
+ | log_file='/ | ||
+ | log_status_interval=300 | ||
+ | pg_bindir='/ | ||
+ | repmgr_bindir='/ | ||
+ | use_primary_conninfo_password=false | ||
+ | passfile='/ | ||
+ | ssh_options=' | ||
+ | promote_check_timeout=60 | ||
+ | promote_check_interval=1 | ||
+ | primary_follow_timeout=60 | ||
+ | standby_follow_timeout=15 | ||
+ | shutdown_check_timeout=60 | ||
+ | standby_reconnect_timeout=60 | ||
+ | node_rejoin_timeout=60 | ||
+ | failover=' | ||
+ | priority=100 | ||
+ | reconnect_attempts=6 | ||
+ | reconnect_interval=10 | ||
+ | promote_command=' | ||
+ | follow_command=' | ||
+ | primary_notification_timeout=60 | ||
+ | repmgrd_standby_startup_timeout=60 | ||
+ | monitoring_history=no | ||
+ | monitor_interval_secs=2 | ||
+ | degraded_monitoring_timeout=-1 | ||
+ | async_query_timeout=60 | ||
+ | standby_disconnect_on_failover=false | ||
+ | primary_visibility_consensus=false | ||
+ | service_start_command ='sudo systemctl start postgresql' | ||
+ | service_stop_command ='sudo systemctl stop postgresql' | ||
+ | service_restart_command ='sudo systemctl restart postgresql' | ||
+ | service_reload_command ='sudo systemctl reload postgresql' | ||
+ | archive_ready_warning=16 | ||
+ | archive_ready_critical=128 | ||
+ | replication_lag_warning=300 | ||
+ | replication_lag_critical=600 | ||
+ | bdr_local_monitoring_only=false | ||
+ | bdr_recovery_timeout=30 | ||
+ | </ | ||
+ | ====Manual Failover==== | ||
+ |