In this section, we will configure the virtual machines and install and configure the environment. In the remaining sections, we will play with that environment.
To configure the virtual environment, we will use Vagrant and Ubuntu machines. Initializing and bringing up the environment looks as follows:
Vagrant Config
C:\Virtual Box\REPMGR>vagrant init
A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.

C:\Virtual Box\REPMGR>vagrant status
Current machine states:

server1                   not created (virtualbox)
server2                   not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed above with their current state. For more information about a specific VM, run `vagrant status NAME`.

C:\Virtual Box\REPMGR>vagrant up
Bringing machine 'server1' up with 'virtualbox' provider...
Bringing machine 'server2' up with 'virtualbox' provider...
==> server1: Box 'consumerlab/ubuntu-server-16-04-LTS' could not be found. Attempting to find and install...
    server1: Box Provider: virtualbox
    server1: Box Version: >= 0
==> server1: Loading metadata for box 'consumerlab/ubuntu-server-16-04-LTS'
    server1: URL: https://vagrantcloud.com/consumerlab/ubuntu-server-16-04-LTS
==> server1: Adding box 'consumerlab/ubuntu-server-16-04-LTS' (v1.0.2) for provider: virtualbox
    server1: Downloading: https://vagrantcloud.com/consumerlab/boxes/ubuntu-server-16-04-LTS/versions/1.0.2/providers/virtualbox.box
***************************************
The Vagrantfile is the following:
Vagrant file
Vagrant.configure("2") do |config|
  (1..2).each do |n|
    config.vm.define "server#{n}" do |define|
      define.vm.box = "bento/ubuntu-20.04"
      define.ssh.insert_key = false
      define.vm.hostname = "server#{n}"
      define.vm.network :private_network, ip: "10.0.15.2#{n}"
      define.vm.provider "virtualbox" do |node|
        node.name = "server#{n}"
      end
    end
  end
end
Once the machines are up, we can start with the installation. Bear in mind that this Vagrantfile will automatically configure the IPs as well, but it will not configure /etc/hosts. That is something we will do now.
For the servers, we have to configure name resolution, the package repositories, and the software itself. So, let's get going.
That is pretty trivial, just add the IPs and the hostnames to /etc/hosts:
Hosts
10.0.15.21 server1
10.0.15.22 server2
That will allow the hosts to see and ping each other.
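These entries can also be added idempotently with a small shell helper. This is only a sketch: it writes to a local demo file (`demo_hosts.txt`) instead of `/etc/hosts`, so point `HOSTS_FILE` at the real file (as root) on the actual servers.

```shell
# Sketch: append the cluster host entries only if they are not already present.
# demo_hosts.txt stands in for /etc/hosts here; use the real file on the servers.
HOSTS_FILE="demo_hosts.txt"
for entry in "10.0.15.21 server1" "10.0.15.22 server2"; do
    grep -qxF "$entry" "$HOSTS_FILE" 2>/dev/null || echo "$entry" >> "$HOSTS_FILE"
done
```

Running it a second time leaves the file unchanged, so it is safe to bake into a provisioning script.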
Then we can configure the repositories:
I decided to use the 2ndQuadrant packages, as they are the ones I use at work as well :) First things first, we have to update our repos and pull the latest metadata from them.
Update repos
root@server1:~# apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1,213 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main i386 Packages [282 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main i386 Packages [535 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [866 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main Translation-en [259 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/restricted i386 Packages [19.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [443 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [167 kB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [406 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [63.5 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe i386 Packages [633 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/restricted i386 Packages [18.0 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [58.3 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/universe i386 Packages [507 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [638 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [855 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [182 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [24.6 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/multiverse i386 Packages [8,232 B]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [6,776 B]
Get:25 http://archive.ubuntu.com/ubuntu focal-backports/universe i386 Packages [4,736 B]
Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [5,800 B]
Get:27 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [101 kB]
Get:28 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [21.9 kB]
Get:29 http://security.ubuntu.com/ubuntu focal-security/multiverse Translation-en [4,948 B]
Fetched 7,652 kB in 7s (1,048 kB/s)
Reading package lists... Done
Then we can proceed with the installation of PostgreSQL.
Install PostgreSQL
root@server1:~# apt-get install -y postgresql
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
Suggested packages:
  lm-sensors postgresql-doc postgresql-doc-12 libjson-perl openssl-blacklist isag
The following NEW packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
0 upgraded, 11 newly installed, 0 to remove and 54 not upgraded.
Need to get 15.1 MB/30.6 MB of archives.
After this operation, 122 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.8-0ubuntu0.20.04.1 [116 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.8-0ubuntu0.20.04.1 [1,046 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-12 amd64 12.8-0ubuntu0.20.04.1 [13.5 MB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 sysstat amd64 12.2.0-2ubuntu0.1 [448 kB]
Fetched 15.1 MB in 15s (1,015 kB/s)
Preconfiguring packages ...
***************************************************
Success. You can now start the database server using:
    pg_ctlcluster 12 main start
Ver Cluster Port Status Owner    Data directory              Log file
12  main    5432 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: using /usr/share/postgresql/12/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Setting up sysstat (12.2.0-2ubuntu0.1) ...
Creating config file /etc/default/sysstat with new version
update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
Created symlink /etc/systemd/system/multi-user.target.wants/sysstat.service → /lib/systemd/system/sysstat.service.
Setting up postgresql (12+214ubuntu0.1) ...
Processing triggers for systemd (245.4-4ubuntu3.11) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Do this on both machines: server1 and server2
Then we can proceed with adding the repo for REPMGR. Note that the repository is signed with key A47615A0D3FA41F6, which has to be imported before apt will trust it, as the NO_PUBKEY error below shows.
Add REPMGR Repo
root@server1:~# sudo add-apt-repository 'deb http://packages.2ndquadrant.com/repmgr3/apt/ xenial-2ndquadrant main'
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

root@server1:~# sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Get:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~#
To configure a passwordless SSH connection between the servers, we have to generate a key pair on each server, exchange the public keys, and test the connection. So let's get going:
Generate Public/Private Key pair
postgres@server2:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:KYxUhOKyll2anPYivR3ZrrRw/Kz7xDMVU0AtovXY4bA postgres@server2
The key's randomart image is:
+---[RSA 3072]----+
|  oo .oo.        |
| . .. + o..      |
|  . .. o Ooo     |
|. .. = E =o      |
| o+ * o S.       |
|.o B. +..        |
|. o..* *         |
| . o=.B o        |
|  ..o*==         |
+----[SHA256]-----+
This generates two files: a private key and a public key. The private key, NEVER SHARE :)
Keys
postgres@server2:~/.ssh$ ls -alrt
total 20
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:58 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:58 id_rsa
Take the content of the public key from server2 and put it in the authorized_keys file of server1, and vice versa:
Add public key to server1
postgres@server1:~/.ssh$ ls -alrt
total 24
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:57 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:57 id_rsa
-rw-rw-r-- 1 postgres postgres  571 Sep 16 08:59 authorized_keys
-rw-r--r-- 1 postgres postgres  444 Sep 16 08:59 known_hosts
drwx------ 2 postgres postgres 4096 Sep 16 08:59 .
drwxr-xr-x 5 postgres postgres 4096 Sep 16 09:12 ..
postgres@server1:~/.ssh$ tail -100 authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDc7WdWJWmc6ArXUkFFhArVa/42//MPue0u1HmgBK389GkpLnKZ8dBfNRcFfrvocCsQyJZu5MsfK/sRR2/JycjifnL45bavw+qqejmyiuJnMmiNEJI2sbturdE/y3vABw1QjDaCG+LEDJzGT3DzY3HvxcU+33vBZYpNx7YKSI+LQnyeT6iGNNoy5hNzI+/O8jM3pg4ReN+XC6NJ77maBmQ+EaLhLsxk3vmcnP6C7TCDxez1iTZHZpV9tNGFx/OCABAOgV6ELpzRPsS/3MBSfif8eeJ6Sc8JPVQSruOzf3lYYAouYZTLLJYtDZOw3rc0+v7fn17jVk9w981Zn4zzkacI5FbVHmtM6oAnCIkSUUJwXlQzZprF3C+Z+Z10mmtBlS9/NFdZklpK3rHh1T1a3Fu1SXWlkHZxKdLxr5ytUL89ubGAPSa4UkBXRalBTUFjcYhvqk5hPqO6a+KQO1ehcNmn7/ffdEPrattoOxnGCfKaWLbXyxFmt3WFCq5Jy/wZoq8= postgres@server2
postgres@server1:~/.ssh$
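The whole exchange can be sketched as below. This sketch runs locally in a demo directory (`demo_ssh/`) for illustration; on the real machines you would run `ssh-keygen` as the postgres user on each server and append the *other* server's `id_rsa.pub` to `authorized_keys`. The permissions matter: sshd refuses keys in group- or world-writable files.

```shell
# Sketch: generate a passphrase-less key pair and install a public key into
# authorized_keys. demo_ssh/ stands in for /var/lib/postgresql/.ssh/.
mkdir -p demo_ssh && chmod 700 demo_ssh
ssh-keygen -t rsa -b 3072 -N "" -f demo_ssh/id_rsa -q
# On server1 this line would receive server2's id_rsa.pub, and vice versa.
cat demo_ssh/id_rsa.pub >> demo_ssh/authorized_keys
chmod 600 demo_ssh/authorized_keys
```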
After you do that on both servers, you can connect without password:
Test Passwordless Configuration
postgres@server1:~/.ssh$ ssh server2
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-80-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu 16 Sep 2021 09:30:52 AM UTC

  System load:  0.08    Processes:               128
  Usage of /:   2.8% of 61.31GB    Users logged in: 1
  Memory usage: 25%     IPv4 address for eth0:   10.0.2.15
  Swap usage:   0%      IPv4 address for eth1:   10.0.15.22

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Thu Sep 16 08:59:34 2021 from 10.0.15.21
postgres@server2:~$
After we've installed and configured all the packages, we can proceed with the cluster configuration. First, we have to configure the database and the users:
The user to be used for the replication can be configured as so:
Replication User Configuration
root@server1:~# su - postgres
postgres@server1:~$ createuser --replication --createdb --createrole --superuser repmgr
postgres@server1:~$ psql -c 'ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;'
ALTER ROLE
postgres@server1:~$ createdb repmgr --owner=repmgr
postgres@server1:~$
Do that on both servers.
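For reference, the `createuser`/`createdb` invocations above boil down to roughly the following SQL; this is an equivalent shown for clarity, not an extra step to run.

```sql
-- Equivalent of: createuser --replication --createdb --createrole --superuser repmgr
CREATE ROLE repmgr LOGIN SUPERUSER CREATEDB CREATEROLE REPLICATION;
ALTER ROLE repmgr SET search_path TO repmgr_test, "$user", public;
-- Equivalent of: createdb repmgr --owner=repmgr
CREATE DATABASE repmgr OWNER repmgr;
```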
After we have done that, we can configure client authentication for PostgreSQL in pg_hba.conf. To allow the repmgr user to connect to PostgreSQL, we need to append the following entries to the pg_hba.conf file:
Configure pg_hba.conf
host repmgr      repmgr 10.0.15.21/32 trust
host replication repmgr 10.0.15.21/32 trust
host repmgr      repmgr 10.0.15.22/32 trust
host replication repmgr 10.0.15.22/32 trust
We can test the settings as follows:
Test connections
postgres@server2:/etc/postgresql/12/main$ psql 'host=server2 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=# \q
postgres@server2:/etc/postgresql/12/main$ psql 'host=server1 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=#
Finally, we have to set the preloaded library for repmgr in postgresql.conf:
Set Pre-loaded Library
shared_preload_libraries = 'repmgr'
Do that on both servers and restart PostgreSQL afterwards, since shared_preload_libraries only takes effect at server start. Now we can finally configure REPMGR itself.
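If you prefer to script the change, a `sed` one-liner works. This sketch edits a local stand-in file (`demo_postgresql.conf`) rather than the real `/etc/postgresql/12/main/postgresql.conf`:

```shell
# Sketch: uncomment/set shared_preload_libraries in a copy of postgresql.conf.
# Remember: this parameter needs a full server restart, not just a reload.
CONF="demo_postgresql.conf"
echo "#shared_preload_libraries = ''" > "$CONF"   # stand-in for the real file
sed -i "s/^#\?shared_preload_libraries.*/shared_preload_libraries = 'repmgr'/" "$CONF"
```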
The default configuration of repmgr is located in /etc/repmgr.conf.
We can do a basic configuration as follows:
Repmgr.conf
cluster=cluster
node_id=1                  # 1 for server1, 2 for server2
node_name=server1          # server1 or server2, matching node_id
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'   # host matches node_name
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
logfile='/var/log/postgresql/repmgr.log'
loglevel=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
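Since only `node_id`, `node_name`, and the `conninfo` host differ between the two servers, the file can be generated from the node number. A sketch: it writes `repmgr.conf` to the current directory, `NODE_ID` would be 1 on server1 and 2 on server2, and it uses the newer `log_file`/`log_level` spellings that repmgr's deprecation warnings ask for.

```shell
# Sketch: render repmgr.conf for a given node number.
NODE_ID=1   # set to 1 on server1, 2 on server2
cat > repmgr.conf <<EOF
node_id=${NODE_ID}
node_name=server${NODE_ID}
conninfo='host=server${NODE_ID} user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
EOF
```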
This repmgr.conf file defines the structure of the replication, including the commands for promotion and for following a new primary. The next file, /etc/default/repmgrd, defines whether the repmgrd daemon is enabled and which configuration file it uses.
Repmgrd
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
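As the comment notes, the package ships with `REPMGRD_ENABLED=no`, so flipping it is the one change we actually make; that too can be scripted. A sketch against a local stand-in file (`demo_repmgrd_default`) rather than the real `/etc/default/repmgrd`:

```shell
# Sketch: enable repmgrd in a copy of /etc/default/repmgrd.
CONF="demo_repmgrd_default"
printf 'REPMGRD_ENABLED=no\nREPMGRD_CONF="/etc/repmgr.conf"\n' > "$CONF"  # stand-in
sed -i 's/^REPMGRD_ENABLED=.*/REPMGRD_ENABLED=yes/' "$CONF"
```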
Furthermore, we have to make a symlink to the correct pg_ctl:
Create symlink to pg_ctl
root@server2:~# ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
We have to do that on both servers.
To initialize the cluster, we register the primary node:
Initialize a cluster
postgres@server1:~$ repmgr primary register
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
postgres@server1:~$
This will initialize the cluster, and we can verify it:
Check cluster status
postgres@server1:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
postgres@server1:~$
After that is done, we can set up the standby server (here, server2). To do so, we delete everything from its data directory and clone from the primary:
Setup standby server
root@server2:~# rm -rf /var/lib/postgresql/12/main/
root@server2:~# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
postgres@server2:~$
After that, we should be able to see the replication working:
Check status of REPMGR
postgres@server2:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
postgres@server2:~$
We can test the connection as follows:
First, we will test the normal functionality, and then we will test the failover. For the functionality test, we can create a table on the primary and check whether it shows up on the standby:
Test Functionality
repmgr=# create table test(id integer);
CREATE TABLE
repmgr=# insert into test values(1);
INSERT 0 1
repmgr=# commit;
WARNING: there is no transaction in progress
COMMIT
repmgr=#
Now, we can test it on the standby:
Test Functionality
postgres=# \c repmgr
You are now connected to database "repmgr" as user "postgres".
repmgr=# select * from test;
 id
----
  1
(1 row)

repmgr=#
Now, let's test the failover:
To test that, we can shut down the primary and see whether failover happens automatically (as per our settings). Note that I have changed the configuration a little bit at this point, so please check Appendix A.
Firstly, we need to add the postgres user to the sudo group:
Add user to Sudoers group
### Server 1
root@server1:/var/log/postgresql# usermod -aG sudo postgres
root@server1:/var/log/postgresql#

### Server 2
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
Now, we can finally test the failover.
Test Failover
postgres@server1:/var/log/postgresql$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
 1  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
We can simply shut down the current primary:
Shutdown production
root@server2:/var/log/postgresql# service postgresql restart
Once this is done, we can check the log file on server1:
Check the log on standby
[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
[2021-09-28 10:46:45] [DEBUG] election result: WON
[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
[2021-09-28 10:46:45] [INFO] promote_command is: "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
[2021-09-28 10:46:46] [INFO] 0 followers to notify
[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
As we can see, server1 has been promoted to primary; that, however, leaves us to deal with the aftermath. We now have one running primary and one failed primary, so we need to re-establish replication.
Re-establish the connection
postgres@server1:/var/log/postgresql$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "server2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
postgres@server1:
We can re-establish the connection as follows:
1. Clear the old data directory and resnap:
Resnap failed server's data
root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
DEBUG: 2 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgresql
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
postgres@server2:~$
2. Start up PostgreSQL. 3. Re-register the standby:
Startup PostgreSQL and Re-register
postgres@server2:~$ logout
root@server2:/var/log/postgresql# service postgresql start
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr standby register --force
INFO: connecting to local node "server2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
INFO: standby registration complete
postgres@server2:~$
Now we can check our status as follows:
Check the newly recovered cluster
postgres@server2:~$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
postgres@server2:~$
For the failover test, I have used the following full configuration (Appendix A):
Repmgr.conf (Appendix A)
node_id='2'                        # A unique integer greater than zero (1 for server1, 2 for server2)
node_name='server2'                # An arbitrary (but unique) string; we recommend
conninfo='host=server2 user=repmgr dbname=repmgr connect_timeout=2'  # Database connection information as a conninfo string (host matches node_name)
data_directory='/var/lib/postgresql/12/main'  # The node's data directory. This is needed by repmgr
config_directory='/etc/postgresql/12/main'    # If configuration files are located outside the data
replication_user='repmgr'          # User to make replication connections with, if not set
replication_type='physical'        # Must be one of "physical" or "bdr".
location='default'                 # An arbitrary string defining the location of the node
use_replication_slots='yes'        # whether to use physical replication slots
witness_sync_interval=15           # interval (in seconds) to synchronise node records
log_level='DEBUG'                  # Log level: possible values are DEBUG, INFO, NOTICE
log_facility='STDERR'              # Logging facility: possible values are STDERR, or
log_file='/var/log/postgresql/repmgr.log'
log_status_interval=300            # interval (in seconds) for repmgrd to log a status message
pg_bindir='/usr/lib/postgresql/12/bin'       # Path to PostgreSQL binary directory
repmgr_bindir='/usr/lib/postgresql/12/bin'   # Path to repmgr binary directory
use_primary_conninfo_password=false          # explicitly set "password" in "primary_conninfo"
passfile='/var/lib/postgresql/12/main/.pgpass'  # path to .pgpass file to include in "primary_conninfo"
ssh_options='-o StrictHostKeyChecking=no'    # Options to append to "ssh"
promote_check_timeout=60           # The length of time (in seconds) to wait
promote_check_interval=1           # The interval (in seconds) to check whether
primary_follow_timeout=60          # The max length of time (in seconds) to wait
standby_follow_timeout=15          # The max length of time (in seconds) to wait
shutdown_check_timeout=60          # The max length of time (in seconds) to wait for the demotion
standby_reconnect_timeout=60       # The max length of time (in seconds) to wait
node_rejoin_timeout=60             # The maximum length of time (in seconds) to wait for
failover='automatic'               # one of 'automatic', 'manual'
priority=100                       # indicates a preferred priority for promoting nodes
reconnect_attempts=6               # Number of attempts which will be made to reconnect to an unreachable
reconnect_interval=10              # Interval between attempts to reconnect to an unreachable
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
primary_notification_timeout=60    # Interval (in seconds) which repmgrd on a standby
repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
monitoring_history=no              # Whether to write monitoring data to the "monitoring_history" table
monitor_interval_secs=2            # Interval (in seconds) at which to write monitoring data
degraded_monitoring_timeout=-1     # Interval (in seconds) after which repmgrd will terminate if the
async_query_timeout=60             # Interval (in seconds) which repmgrd will wait before
standby_disconnect_on_failover=false  # If "true", in a failover situation wait for all standbys to
primary_visibility_consensus=false    # If "true", only continue with failover if no standbys have seen
service_start_command='sudo systemctl start postgresql'
service_stop_command='sudo systemctl stop postgresql'
service_restart_command='sudo systemctl restart postgresql'
service_reload_command='sudo systemctl reload postgresql'
archive_ready_warning=16           # repmgr node check --archive-ready
archive_ready_critical=128
replication_lag_warning=300        # repmgr node check --replication-lag
replication_lag_critical=600
bdr_local_monitoring_only=false    # Only monitor the local node; no checks will be
bdr_recovery_timeout=30            # If a BDR node was offline and has become available