In this section, we will configure the virtual machines and install and configure the environment. In the remaining sections, we will play with that environment.
To configure the virtual environment, we will use Vagrant and Ubuntu machines. Initializing and bringing up the environment looks as follows:
Vagrant Config
C:\Virtual Box\REPMGR>vagrant init
A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.

C:\Virtual Box\REPMGR>vagrant status
Current machine states:

server1                   not created (virtualbox)
server2                   not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed above with their current state. For more information about a specific VM, run `vagrant status NAME`.

C:\Virtual Box\REPMGR>vagrant up
Bringing machine 'server1' up with 'virtualbox' provider...
Bringing machine 'server2' up with 'virtualbox' provider...
==> server1: Box 'consumerlab/ubuntu-server-16-04-LTS' could not be found. Attempting to find and install...
    server1: Box Provider: virtualbox
    server1: Box Version: >= 0
==> server1: Loading metadata for box 'consumerlab/ubuntu-server-16-04-LTS'
    server1: URL: https://vagrantcloud.com/consumerlab/ubuntu-server-16-04-LTS
==> server1: Adding box 'consumerlab/ubuntu-server-16-04-LTS' (v1.0.2) for provider: virtualbox
    server1: Downloading: https://vagrantcloud.com/consumerlab/boxes/ubuntu-server-16-04-LTS/versions/1.0.2/providers/virtualbox.box
***************************************
The Vagrantfile is the following:
Vagrant file
Vagrant.configure("2") do |config|
  (1..2).each do |n|
    config.vm.define "server#{n}" do |define|
      define.vm.box = "bento/ubuntu-20.04"
      define.ssh.insert_key = false
      define.vm.hostname = "server#{n}"
      define.vm.network :private_network, ip: "10.0.15.2#{n}"
      define.vm.provider "virtualbox" do |node|
        node.name = "server#{n}"
      end
    end
  end
end
Once the machines are up, we can start with the installation. Bear in mind that this Vagrantfile will automatically configure the IPs as well, but it will not configure /etc/hosts. That is something we will do now.
For the servers, we have to configure name resolution, the package repositories, and the software itself. So, let's get going.
That is pretty trivial, just add the IPs and the hostnames to /etc/hosts:
Hosts
10.0.15.21 server1
10.0.15.22 server2
That will allow the hosts to see and ping each other.
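These entries can also be added idempotently with a small shell helper. This is only a sketch: it writes to a local demo file (`demo_hosts.txt`) instead of `/etc/hosts`, so point `HOSTS_FILE` at the real file (as root) on the actual servers.

```shell
# Sketch: append the cluster host entries only if they are not already present.
# demo_hosts.txt stands in for /etc/hosts here; use the real file on the servers.
HOSTS_FILE="demo_hosts.txt"
for entry in "10.0.15.21 server1" "10.0.15.22 server2"; do
    grep -qxF "$entry" "$HOSTS_FILE" 2>/dev/null || echo "$entry" >> "$HOSTS_FILE"
done
```

Running it a second time leaves the file unchanged, so it is safe to bake into a provisioning script.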
Then we can configure the repositories:
I decided to use the 2ndQuadrant packages, as they are the ones I use at work as well :) First things first, we have to update our repos and pull the latest metadata from them.
Update repos
root@server1:~# apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1,213 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main i386 Packages [282 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main i386 Packages [535 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [866 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main Translation-en [259 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/restricted i386 Packages [19.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [443 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [167 kB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [406 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [63.5 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe i386 Packages [633 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/restricted i386 Packages [18.0 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [58.3 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/universe i386 Packages [507 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [638 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [855 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [182 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [24.6 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/multiverse i386 Packages [8,232 B]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [6,776 B]
Get:25 http://archive.ubuntu.com/ubuntu focal-backports/universe i386 Packages [4,736 B]
Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [5,800 B]
Get:27 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [101 kB]
Get:28 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [21.9 kB]
Get:29 http://security.ubuntu.com/ubuntu focal-security/multiverse Translation-en [4,948 B]
Fetched 7,652 kB in 7s (1,048 kB/s)
Reading package lists... Done
Then we can proceed with the installation of PostgreSQL.
Install PostgreSQL
root@server1:~# apt-get install -y postgresql
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
Suggested packages:
  lm-sensors postgresql-doc postgresql-doc-12 libjson-perl openssl-blacklist isag
The following NEW packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
0 upgraded, 11 newly installed, 0 to remove and 54 not upgraded.
Need to get 15.1 MB/30.6 MB of archives.
After this operation, 122 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.8-0ubuntu0.20.04.1 [116 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.8-0ubuntu0.20.04.1 [1,046 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-12 amd64 12.8-0ubuntu0.20.04.1 [13.5 MB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 sysstat amd64 12.2.0-2ubuntu0.1 [448 kB]
Fetched 15.1 MB in 15s (1,015 kB/s)
Preconfiguring packages ...
***************************************************
Success. You can now start the database server using:
    pg_ctlcluster 12 main start
Ver Cluster Port Status Owner    Data directory              Log file
12  main    5432 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: using /usr/share/postgresql/12/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Setting up sysstat (12.2.0-2ubuntu0.1) ...
Creating config file /etc/default/sysstat with new version
update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
Created symlink /etc/systemd/system/multi-user.target.wants/sysstat.service → /lib/systemd/system/sysstat.service.
Setting up postgresql (12+214ubuntu0.1) ...
Processing triggers for systemd (245.4-4ubuntu3.11) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Do this on both machines: server1 and server2
Then we can proceed with adding the repo for REPMGR. Note that the repository is signed with key A47615A0D3FA41F6, which has to be imported before apt will trust it, as the NO_PUBKEY error below shows.
Add REPMGR Repo
root@server1:~# sudo add-apt-repository 'deb http://packages.2ndquadrant.com/repmgr3/apt/ xenial-2ndquadrant main'
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

root@server1:~# sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Get:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~#
To configure a passwordless SSH connection between the servers, we have to generate a key pair on each server, exchange the public keys, and test the connection. So let's get going:
Generate Public/Private Key pair
postgres@server2:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:KYxUhOKyll2anPYivR3ZrrRw/Kz7xDMVU0AtovXY4bA postgres@server2
The key's randomart image is:
+---[RSA 3072]----+
|  oo .oo.        |
| . .. + o..      |
|  . .. o Ooo     |
|. .. = E =o      |
| o+ * o S.       |
|.o B. +..        |
|. o..* *         |
| . o=.B o        |
|  ..o*==         |
+----[SHA256]-----+
This generates two files: a private key and a public key. The private key, NEVER SHARE :)
Keys
postgres@server2:~/.ssh$ ls -alrt
total 20
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:58 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:58 id_rsa
Take the content of the public key from server2 and put it in the authorized_keys file of server1, and vice versa:
Add public key to server1
postgres@server1:~/.ssh$ ls -alrt
total 24
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:57 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:57 id_rsa
-rw-rw-r-- 1 postgres postgres  571 Sep 16 08:59 authorized_keys
-rw-r--r-- 1 postgres postgres  444 Sep 16 08:59 known_hosts
drwx------ 2 postgres postgres 4096 Sep 16 08:59 .
drwxr-xr-x 5 postgres postgres 4096 Sep 16 09:12 ..
postgres@server1:~/.ssh$ tail -100 authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDc7WdWJWmc6ArXUkFFhArVa/42//MPue0u1HmgBK389GkpLnKZ8dBfNRcFfrvocCsQyJZu5MsfK/sRR2/JycjifnL45bavw+qqejmyiuJnMmiNEJI2sbturdE/y3vABw1QjDaCG+LEDJzGT3DzY3HvxcU+33vBZYpNx7YKSI+LQnyeT6iGNNoy5hNzI+/O8jM3pg4ReN+XC6NJ77maBmQ+EaLhLsxk3vmcnP6C7TCDxez1iTZHZpV9tNGFx/OCABAOgV6ELpzRPsS/3MBSfif8eeJ6Sc8JPVQSruOzf3lYYAouYZTLLJYtDZOw3rc0+v7fn17jVk9w981Zn4zzkacI5FbVHmtM6oAnCIkSUUJwXlQzZprF3C+Z+Z10mmtBlS9/NFdZklpK3rHh1T1a3Fu1SXWlkHZxKdLxr5ytUL89ubGAPSa4UkBXRalBTUFjcYhvqk5hPqO6a+KQO1ehcNmn7/ffdEPrattoOxnGCfKaWLbXyxFmt3WFCq5Jy/wZoq8= postgres@server2
postgres@server1:~/.ssh$
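The whole exchange can be sketched as below. This sketch runs locally in a demo directory (`demo_ssh/`) for illustration; on the real machines you would run `ssh-keygen` as the postgres user on each server and append the *other* server's `id_rsa.pub` to `authorized_keys`. The permissions matter: sshd refuses keys in group- or world-writable files.

```shell
# Sketch: generate a passphrase-less key pair and install a public key into
# authorized_keys. demo_ssh/ stands in for /var/lib/postgresql/.ssh/.
mkdir -p demo_ssh && chmod 700 demo_ssh
ssh-keygen -t rsa -b 3072 -N "" -f demo_ssh/id_rsa -q
# On server1 this line would receive server2's id_rsa.pub, and vice versa.
cat demo_ssh/id_rsa.pub >> demo_ssh/authorized_keys
chmod 600 demo_ssh/authorized_keys
```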
After you do that on both servers, you can connect without password:
Test Passwordless Configuration
postgres@server1:~/.ssh$ ssh server2
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-80-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu 16 Sep 2021 09:30:52 AM UTC

  System load:  0.08    Processes:               128
  Usage of /:   2.8% of 61.31GB    Users logged in: 1
  Memory usage: 25%     IPv4 address for eth0:   10.0.2.15
  Swap usage:   0%      IPv4 address for eth1:   10.0.15.22

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Thu Sep 16 08:59:34 2021 from 10.0.15.21
postgres@server2:~$
After we've installed and configured all the packages, we can proceed with the cluster configuration. First, we have to configure the database and the users:
The user to be used for the replication can be configured as so:
Replication User Configuration
root@server1:~# su - postgres
postgres@server1:~$ createuser --replication --createdb --createrole --superuser repmgr
postgres@server1:~$ psql -c 'ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;'
ALTER ROLE
postgres@server1:~$ createdb repmgr --owner=repmgr
postgres@server1:~$
Do that on both servers.
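For reference, the `createuser`/`createdb` invocations above boil down to roughly the following SQL; this is an equivalent shown for clarity, not an extra step to run.

```sql
-- Equivalent of: createuser --replication --createdb --createrole --superuser repmgr
CREATE ROLE repmgr LOGIN SUPERUSER CREATEDB CREATEROLE REPLICATION;
ALTER ROLE repmgr SET search_path TO repmgr_test, "$user", public;
-- Equivalent of: createdb repmgr --owner=repmgr
CREATE DATABASE repmgr OWNER repmgr;
```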
After we have done that, we can configure client authentication for PostgreSQL in pg_hba.conf. To allow the repmgr user to connect to PostgreSQL, we need to append the following entries to the pg_hba.conf file:
Configure pg_hba.conf
host repmgr      repmgr 10.0.15.21/32 trust
host replication repmgr 10.0.15.21/32 trust
host repmgr      repmgr 10.0.15.22/32 trust
host replication repmgr 10.0.15.22/32 trust
We can test the settings as follows:
Test connections
postgres@server2:/etc/postgresql/12/main$ psql 'host=server2 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=# \q
postgres@server2:/etc/postgresql/12/main$ psql 'host=server1 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=#
Finally, we have to set the preloaded library for repmgr in postgresql.conf:
Set Pre-loaded Library
shared_preload_libraries = 'repmgr'
Do that on both servers and restart PostgreSQL afterwards, since shared_preload_libraries only takes effect at server start. Now we can finally configure REPMGR itself.
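If you prefer to script the change, a `sed` one-liner works. This sketch edits a local stand-in file (`demo_postgresql.conf`) rather than the real `/etc/postgresql/12/main/postgresql.conf`:

```shell
# Sketch: uncomment/set shared_preload_libraries in a copy of postgresql.conf.
# Remember: this parameter needs a full server restart, not just a reload.
CONF="demo_postgresql.conf"
echo "#shared_preload_libraries = ''" > "$CONF"   # stand-in for the real file
sed -i "s/^#\?shared_preload_libraries.*/shared_preload_libraries = 'repmgr'/" "$CONF"
```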
The default configuration of repmgr is located in /etc/repmgr.conf.
We can do a basic configuration as follows:
Repmgr.conf
cluster=cluster
node_id=1                  # 1 for server1, 2 for server2
node_name=server1          # server1 or server2, matching node_id
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'   # host matches node_name
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
logfile='/var/log/postgresql/repmgr.log'
loglevel=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
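Since only `node_id`, `node_name`, and the `conninfo` host differ between the two servers, the file can be generated from the node number. A sketch: it writes `repmgr.conf` to the current directory, `NODE_ID` would be 1 on server1 and 2 on server2, and it uses the newer `log_file`/`log_level` spellings that repmgr's deprecation warnings ask for.

```shell
# Sketch: render repmgr.conf for a given node number.
NODE_ID=1   # set to 1 on server1, 2 on server2
cat > repmgr.conf <<EOF
node_id=${NODE_ID}
node_name=server${NODE_ID}
conninfo='host=server${NODE_ID} user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
EOF
```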
This repmgr.conf file defines the structure of the replication, including the commands for promotion and for following a new primary. The next file, /etc/default/repmgrd, defines whether the repmgrd daemon is enabled and which configuration file it uses.
Repmgrd
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
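As the comment notes, the package ships with `REPMGRD_ENABLED=no`, so flipping it is the one change we actually make; that too can be scripted. A sketch against a local stand-in file (`demo_repmgrd_default`) rather than the real `/etc/default/repmgrd`:

```shell
# Sketch: enable repmgrd in a copy of /etc/default/repmgrd.
CONF="demo_repmgrd_default"
printf 'REPMGRD_ENABLED=no\nREPMGRD_CONF="/etc/repmgr.conf"\n' > "$CONF"  # stand-in
sed -i 's/^REPMGRD_ENABLED=.*/REPMGRD_ENABLED=yes/' "$CONF"
```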
Furthermore, we have to make a symlink to the correct pg_ctl:
Create symlink to pg_ctl
root@server2:~# ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
We have to do that on both servers.
To initialize the cluster, we register the primary node:
Initialize a cluster
postgres@server1:~$ repmgr primary register
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
postgres@server1:~$
This will initialize the cluster, and we can verify it:
Check cluster status
postgres@server1:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
postgres@server1:~$
After that is done, we can set up the standby server (here, server2). To do so, we delete everything from its data directory and clone from the primary:
Setup standby server
root@server2:~# rm -rf /var/lib/postgresql/12/main/
root@server2:~# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
postgres@server2:~$
After that, we should be able to see the replication working:
Check status of REPMGR
postgres@server2:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
postgres@server2:~$
We can test the connection as follows:
First, we will test the normal functionality, and then we will test the failover. For the functionality test, we can create a table on the primary and check whether it shows up on the standby:
Test Functionality
repmgr=# create table test(id integer);
CREATE TABLE
repmgr=# insert into test values(1);
INSERT 0 1
repmgr=# commit;
WARNING: there is no transaction in progress
COMMIT
repmgr=#
Now, we can test it on the standby:
Test Functionality
postgres=# \c repmgr
You are now connected to database "repmgr" as user "postgres".
repmgr=# select * from test;
 id
----
  1
(1 row)

repmgr=#
Now, let's test the failover:
To test that, we can shut down the primary and see whether failover happens automatically (as per our settings). Note that I have changed the configuration a little bit at this point, so please check Appendix A.
Firstly, we need to add the postgres user to the sudo group:
Add user to Sudoers group
### Server 1
root@server1:/var/log/postgresql# usermod -aG sudo postgres
root@server1:/var/log/postgresql#

### Server 2
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
Now, we can finally test the failover.
Test Failover
postgres@server1:/var/log/postgresql$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
 1  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
We can simply shut down the current primary:
Shutdown production
root@server2:/var/log/postgresql# service postgresql restart
Once this is done, we can check the log file on server1:
Check the log on standby
[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
[2021-09-28 10:46:45] [DEBUG] election result: WON
[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
[2021-09-28 10:46:45] [INFO] promote_command is: "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
[2021-09-28 10:46:46] [INFO] 0 followers to notify
[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
As we can see, server1 has been promoted to primary; that, however, leaves us to deal with the aftermath. We now have one running primary and one failed primary, so we need to re-establish replication.
Re-establish the connection
postgres@server1:/var/log/postgresql$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "server2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
postgres@server1:
We can re-establish the connection as follows:
1. Clear the old data directory and resnap:
Resnap failed server's data
root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
DEBUG: 2 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgresql
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
postgres@server2:~$
2. Start up PostgreSQL. 3. Re-register the standby:
Startup PostgreSQL and Re-register
postgres@server2:~$ logout
root@server2:/var/log/postgresql# service postgresql start
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr standby register --force
INFO: connecting to local node "server2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
INFO: standby registration complete
postgres@server2:~$
Now we can check our status as follows:
Check the newly recovered cluster
postgres@server2:~$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
postgres@server2:~$
For the failover test, I have used the following full configuration (Appendix A):
Repmgr.conf (Appendix A)
node_id='2'                        # A unique integer greater than zero (1 for server1, 2 for server2)
node_name='server2'                # An arbitrary (but unique) string; we recommend
conninfo='host=server2 user=repmgr dbname=repmgr connect_timeout=2'  # Database connection information as a conninfo string (host matches node_name)
data_directory='/var/lib/postgresql/12/main'  # The node's data directory. This is needed by repmgr
config_directory='/etc/postgresql/12/main'    # If configuration files are located outside the data
replication_user='repmgr'          # User to make replication connections with, if not set
replication_type='physical'        # Must be one of "physical" or "bdr".
location='default'                 # An arbitrary string defining the location of the node
use_replication_slots='yes'        # whether to use physical replication slots
witness_sync_interval=15           # interval (in seconds) to synchronise node records
log_level='DEBUG'                  # Log level: possible values are DEBUG, INFO, NOTICE
log_facility='STDERR'              # Logging facility: possible values are STDERR, or
log_file='/var/log/postgresql/repmgr.log'
log_status_interval=300            # interval (in seconds) for repmgrd to log a status message
pg_bindir='/usr/lib/postgresql/12/bin'       # Path to PostgreSQL binary directory
repmgr_bindir='/usr/lib/postgresql/12/bin'   # Path to repmgr binary directory
use_primary_conninfo_password=false          # explicitly set "password" in "primary_conninfo"
passfile='/var/lib/postgresql/12/main/.pgpass'  # path to .pgpass file to include in "primary_conninfo"
ssh_options='-o StrictHostKeyChecking=no'    # Options to append to "ssh"
promote_check_timeout=60           # The length of time (in seconds) to wait
promote_check_interval=1           # The interval (in seconds) to check whether
primary_follow_timeout=60          # The max length of time (in seconds) to wait
standby_follow_timeout=15          # The max length of time (in seconds) to wait
shutdown_check_timeout=60          # The max length of time (in seconds) to wait for the demotion
standby_reconnect_timeout=60       # The max length of time (in seconds) to wait
node_rejoin_timeout=60             # The maximum length of time (in seconds) to wait for
failover='automatic'               # one of 'automatic', 'manual'
priority=100                       # indicates a preferred priority for promoting nodes
reconnect_attempts=6               # Number of attempts which will be made to reconnect to an unreachable
reconnect_interval=10              # Interval between attempts to reconnect to an unreachable
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
primary_notification_timeout=60    # Interval (in seconds) which repmgrd on a standby
repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
monitoring_history=no              # Whether to write monitoring data to the "monitoring_history" table
monitor_interval_secs=2            # Interval (in seconds) at which to write monitoring data
degraded_monitoring_timeout=-1     # Interval (in seconds) after which repmgrd will terminate if the
async_query_timeout=60             # Interval (in seconds) which repmgrd will wait before
standby_disconnect_on_failover=false  # If "true", in a failover situation wait for all standbys to
primary_visibility_consensus=false    # If "true", only continue with failover if no standbys have seen
service_start_command='sudo systemctl start postgresql'
service_stop_command='sudo systemctl stop postgresql'
service_restart_command='sudo systemctl restart postgresql'
service_reload_command='sudo systemctl reload postgresql'
archive_ready_warning=16           # repmgr node check --archive-ready
archive_ready_critical=128
replication_lag_warning=300        # repmgr node check --replication-lag
replication_lag_critical=600
bdr_local_monitoring_only=false    # Only monitor the local node; no checks will be
bdr_recovery_timeout=30            # If a BDR node was offline and has become available