=====Overview=====
In this section, we will configure the virtual machines and install and configure the environment. In the following sections we will work with the environment. We will use:

  * Vagrant - for virtualization
  * Ubuntu - for the operating system
  * 2ndQuadrant - for repmgr

{{:1_g-5e4zhde9lojvmbdsiptg.png?800|}}

=====Virtual Environment Config=====
To configure the virtual environment, we will use Vagrant and Ubuntu machines. We initialize Vagrant and bring the machines up as follows:

<Code:bash|Vagrant Config>
C:\Virtual Box\REPMGR>vagrant init

A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.

C:\Virtual Box\REPMGR>vagrant status
Current machine states:

server1                   not created (virtualbox)
server2                   not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed above with their current state. For more information about a specific VM, run `vagrant status NAME`.

C:\Virtual Box\REPMGR>vagrant up
Bringing machine 'server1' up with 'virtualbox' provider...
Bringing machine 'server2' up with 'virtualbox' provider...
==> server1: Box 'consumerlab/ubuntu-server-16-04-LTS' could not be found. Attempting to find and install...
    server1: Box Provider: virtualbox
    server1: Box Version: >= 0
==> server1: Loading metadata for box 'consumerlab/ubuntu-server-16-04-LTS'
    server1: URL: https://vagrantcloud.com/consumerlab/ubuntu-server-16-04-LTS
==> server1: Adding box 'consumerlab/ubuntu-server-16-04-LTS' (v1.0.2) for provider: virtualbox
    server1: Downloading: https://vagrantcloud.com/consumerlab/boxes/ubuntu-server-16-04-LTS/versions/1.0.2/providers/virtualbox.box
***************************************
</Code>

The Vagrantfile is the following:

<Code:bash| Vagrant file>
Vagrant.configure("2") do |config|
  (1..2).each do |n|
    config.vm.define "server#{n}" do |define|
      define.vm.box = "bento/ubuntu-20.04"
      define.ssh.insert_key = false
      define.vm.hostname = "server#{n}"
      define.vm.network :private_network, ip: "10.0.15.2#{n}"
      define.vm.provider "virtualbox" do |node|
        node.name = "server#{n}"
      end
    end
  end
end
</Code>

Once the machines are up, we can start with the installation. Bear in mind that this Vagrantfile will automatically configure the IPs as well, but it will not configure /etc/hosts, which we will do now.

=====Server Config=====
For the servers we have to:

  - Configure /etc/hosts
  - Configure the 2ndQuadrant repository
  - Install the 2ndQuadrant packages

So, let's get going.

====Modify /etc/hosts====
That is pretty trivial; just add the IPs and the hostnames to /etc/hosts:

<Code:bash|Hosts>
10.0.15.21 server1
10.0.15.22 server2
</Code>

That will allow the hosts to see and ping each other. Then we can configure the repositories:

====Configure Repositories====
I decided to use the 2ndQuadrant packages, as they are the ones I use for work as well :) First things first, we have to update our repos and pull the latest metadata from them.
<Code:bash|Update repos>
root@server1:~# apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1,213 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main i386 Packages [282 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main i386 Packages [535 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [866 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main Translation-en [259 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/restricted i386 Packages [19.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [443 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [167 kB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [406 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [63.5 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe i386 Packages [633 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/restricted i386 Packages [18.0 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [58.3 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/universe i386 Packages [507 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [638 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [855 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [182 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [24.6 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/multiverse i386 Packages [8,232 B]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [6,776 B]
Get:25 http://archive.ubuntu.com/ubuntu focal-backports/universe i386 Packages [4,736 B]
Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [5,800 B]
Get:27 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [101 kB]
Get:28 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [21.9 kB]
Get:29 http://security.ubuntu.com/ubuntu focal-security/multiverse Translation-en [4,948 B]
Fetched 7,652 kB in 7s (1,048 kB/s)
Reading package lists... Done
</Code>

====Install PostgreSQL====
Then we can proceed with the installation of PostgreSQL:

<Code:bash|Install PostgreSQL>
root@server1:~# apt-get install -y postgresql
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
Suggested packages:
  lm-sensors postgresql-doc postgresql-doc-12 libjson-perl openssl-blacklist isag
The following NEW packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
0 upgraded, 11 newly installed, 0 to remove and 54 not upgraded.
Need to get 15.1 MB/30.6 MB of archives.
After this operation, 122 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.8-0ubuntu0.20.04.1 [116 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.8-0ubuntu0.20.04.1 [1,046 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-12 amd64 12.8-0ubuntu0.20.04.1 [13.5 MB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 sysstat amd64 12.2.0-2ubuntu0.1 [448 kB]
Fetched 15.1 MB in 15s (1,015 kB/s)
Preconfiguring packages ...
***************************************************
Success. You can now start the database server using:
    pg_ctlcluster 12 main start
Ver Cluster Port Status Owner    Data directory              Log file
12  main    5432 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: using /usr/share/postgresql/12/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Setting up sysstat (12.2.0-2ubuntu0.1) ...
Creating config file /etc/default/sysstat with new version
update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
Created symlink /etc/systemd/system/multi-user.target.wants/sysstat.service → /lib/systemd/system/sysstat.service.
Setting up postgresql (12+214ubuntu0.1) ...
Processing triggers for systemd (245.4-4ubuntu3.11) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
</Code>

Do this on both machines: server1 and server2.

====Install REPMGR Repos====
Then we can proceed with adding the repo for REPMGR:

<Code:bash|Add REPMGR Repo>
root@server1:~# sudo add-apt-repository 'deb http://packages.2ndquadrant.com/repmgr3/apt/ xenial-2ndquadrant main'
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~# sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Get:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~#
</Code>

Note the NO_PUBKEY error above: apt will refuse to install packages from this repository until its signing key has been imported.

====Configure SSH keys====
To configure a passwordless connection, we have to:

  - Generate a public/private key pair
  - Put the public key in the authorized keys on the other host (e.g. the public key of server1 in the authorized_keys file on server2)

So let's get going:

<Code:bash|Generate Public/Private Key pair>
postgres@server2:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:KYxUhOKyll2anPYivR3ZrrRw/Kz7xDMVU0AtovXY4bA postgres@server2
The key's randomart image is:
+---[RSA 3072]----+
|     oo .oo.     |
|    . .. + o..   |
|   . .. o Ooo    |
|.  .. = E =o     |
|  o+ * o S.      |
|.o B. +..        |
|. o..* *         |
| . o=.B o        |
|   ..o*==        |
+----[SHA256]-----+
</Code>

This will generate two files: a private key and a public key.
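The manual copy performed in the next step can also be automated with `ssh-copy-id postgres@server1`; underneath, it simply appends the public key line to the remote `~/.ssh/authorized_keys` and relies on that file having strict permissions. A minimal local simulation of that append (the key string is a hypothetical example and a temporary file stands in for the remote authorized_keys):

```shell
# Hypothetical stand-ins: an example public-key line and a temp file
# in place of the remote ~/.ssh/authorized_keys.
pubkey="ssh-rsa AAAAB3Nza...EXAMPLE... postgres@server2"
authfile=$(mktemp)

# This append is effectively what ssh-copy-id does on the remote side:
echo "$pubkey" >> "$authfile"
# sshd ignores authorized_keys files with lax permissions:
chmod 600 "$authfile"

grep -c 'postgres@server2' "$authfile"   # prints 1 when the key line is present
```

The permissions step matters: with a group- or world-writable authorized_keys, sshd will silently fall back to password authentication.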
Never share the private key :)

<Code:bash|Keys>
postgres@server2:~/.ssh$ ls -alrt
total 20
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:58 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:58 id_rsa
</Code>

Take the content of the public key from one server and put it in the authorized_keys file of the other (e.g. server2's public key into server1's authorized_keys), and vice versa:

<Code:bash| Add public key to server1>
postgres@server1:~/.ssh$ ls -alrt
total 24
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:57 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:57 id_rsa
-rw-rw-r-- 1 postgres postgres  571 Sep 16 08:59 authorized_keys
-rw-r--r-- 1 postgres postgres  444 Sep 16 08:59 known_hosts
drwx------ 2 postgres postgres 4096 Sep 16 08:59 .
drwxr-xr-x 5 postgres postgres 4096 Sep 16 09:12 ..
postgres@server1:~/.ssh$ tail -100 authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDc7WdWJWmc6ArXUkFFhArVa/42//MPue0u1HmgBK389GkpLnKZ8dBfNRcFfrvocCsQyJZu5MsfK/sRR2/JycjifnL45bavw+qqejmyiuJnMmiNEJI2sbturdE/y3vABw1QjDaCG+LEDJzGT3DzY3HvxcU+33vBZYpNx7YKSI+LQnyeT6iGNNoy5hNzI+/O8jM3pg4ReN+XC6NJ77maBmQ+EaLhLsxk3vmcnP6C7TCDxez1iTZHZpV9tNGFx/OCABAOgV6ELpzRPsS/3MBSfif8eeJ6Sc8JPVQSruOzf3lYYAouYZTLLJYtDZOw3rc0+v7fn17jVk9w981Zn4zzkacI5FbVHmtM6oAnCIkSUUJwXlQzZprF3C+Z+Z10mmtBlS9/NFdZklpK3rHh1T1a3Fu1SXWlkHZxKdLxr5ytUL89ubGAPSa4UkBXRalBTUFjcYhvqk5hPqO6a+KQO1ehcNmn7/ffdEPrattoOxnGCfKaWLbXyxFmt3WFCq5Jy/wZoq8= postgres@server2
postgres@server1:~/.ssh$
</Code>

After you do that on both servers, you can connect without a password:

<Code:bash|Test Passwordless Configuration>
postgres@server1:~/.ssh$ ssh server2
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-80-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu 16 Sep 2021 09:30:52 AM UTC

  System load:  0.08              Processes:               128
  Usage of /:   2.8% of 61.31GB   Users logged in:         1
  Memory usage: 25%               IPv4 address for eth0:   10.0.2.15
  Swap usage:   0%                IPv4 address for eth1:   10.0.15.22

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento

Last login: Thu Sep 16 08:59:34 2021 from 10.0.15.21
postgres@server2:~$
</Code>

=====Cluster Environment=====
After we've installed and configured all the packages, we can proceed with cluster configuration. Firstly, we have to configure the database and the users:

====Configure User====
The user to be used for the replication can be configured as follows:

<Code:bash|Replication User Configuration>
root@server1:~# su - postgres
postgres@server1:~$ createuser --replication --createdb --createrole --superuser repmgr
postgres@server1:~$ psql -c 'ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;'
ALTER ROLE
postgres@server1:~$ createdb repmgr --owner=repmgr
postgres@server1:~$
</Code>

Do that on both servers. After we have done that, we can configure client authentication for PostgreSQL in pg_hba.conf.

====Connection Configuration====
To allow the repmgr user to connect to PostgreSQL, we need to append the following entries to the pg_hba.conf file:

<Code:bash|Configure pg_hba.conf>
host    repmgr       repmgr  10.0.15.21/32  trust
host    replication  repmgr  10.0.15.21/32  trust
host    repmgr       repmgr  10.0.15.22/32  trust
host    replication  repmgr  10.0.15.22/32  trust
</Code>

We can test the settings as follows:

<Code:bash|Test connections>
postgres@server2:/etc/postgresql/12/main$ psql 'host=server2 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=# \q
postgres@server2:/etc/postgresql/12/main$ psql 'host=server1 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
repmgr=#
</Code>

Finally, we have to set the preloaded library for repmgr in postgresql.conf:

<Code:bash|Set Pre-loaded Library>
shared_preload_libraries = 'repmgr'
</Code>

Do that on both servers and restart PostgreSQL, as changes to shared_preload_libraries only take effect on restart. Now, finally, we can configure REPMGR.

=====REPMGR=====
The default configuration of Repmgr is located in:

  * /etc/repmgr.conf
  * /etc/default/repmgrd

We can do a basic configuration as follows:

====Configuration====
<Code:bash|Repmgr.conf>
cluster=cluster
node_id=(1/2 - 1 for server1 and 2 for server2)
node_name=server2
conninfo='host=server(1/2, the same as before) user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
logfile='/var/log/postgresql/repmgr.log'
loglevel=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
</Code>

This file defines the structure of the replication, including the commands for promotion and for following a new primary. The next file defines whether the repmgrd daemon will be enabled and which configuration file it will use.

<Code:bash|Repmgrd>
# default settings for repmgrd. This file is source by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
</Code>

Furthermore, we have to make a symlink to the correct pg_ctl:

<Code:bash|Create symlink to pg_ctl>
root@server2:~# ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
</Code>

We have to do that on both servers.
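To make the template concrete, here is a filled-in `/etc/repmgr.conf` for server1 as it might look in this setup. This is a sketch using this guide's values, with the `log_file`/`log_level` parameter names that repmgr 4+ expects in place of the deprecated `logfile`/`loglevel`; on server2, change `node_id`, `node_name`, and the `conninfo` host accordingly:

```shell
# Sketch: /etc/repmgr.conf for server1 (values assumed from this guide)
node_id=1
node_name=server1
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
```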
====Cluster Initialization====
To initialize a cluster, we can do it as follows:

<Code:bash|Initialize a cluster>
postgres@server1:~$ repmgr primary register
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
postgres@server1:~$
</Code>

This will initialize the server, and we can check that:

<Code:bash|Check cluster status>
postgres@server1:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
postgres@server1:~$
</Code>

After that is done, we can set up the standby server (e.g. server2).

====Setup Standby====
To set up the standby, we have to delete the existing data directory on the standby and clone from the primary:

<Code:bash|Setup standby server>
root@server2:~# rm -rf /var/lib/postgresql/12/main/
root@server2:~# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
postgres@server2:~$
</Code>

After that, we should be able to see the replication working:

<Code:bash|Check status of REPMGR>
postgres@server2:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
postgres@server2:~$
</Code>

We can test the configuration as follows:

====Test Configuration====
Firstly, we will test the normal functionality, and then we will test the failover.
For the functionality test, we can create a table on the primary and see if it is present on the standby:

==On Primary==
<Code:bash|Test Functionality>
repmgr=# create table test(id integer);
CREATE TABLE
repmgr=# insert into test values(1);
INSERT 0 1
repmgr=# commit;
WARNING: there is no transaction in progress
COMMIT
repmgr=#
</Code>

==Standby==
Now, we can test it on the standby:

<Code:bash|Test Functionality>
postgres=# \c repmgr
You are now connected to database "repmgr" as user "postgres".
repmgr=# select * from test;
 id
----
  1
(1 row)

repmgr=#
</Code>

Now, let's test the failover:

====Automatic Failover====
To test that, we can shut down the primary and see whether the failover happens automatically (as per our settings). Note that I changed the config a little bit at this point, so please check the Appendix.

===Modify Postgres user===
Firstly, we need to add the postgres user to the sudo group:

<Code:bash|Add user to Sudoers group>
### Server 1
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
### Server 2
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
</Code>

===Test the failover===
Now, we can finally test the failover.
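When repeating failover tests, it can be handy to extract just the node/role pairs from the `repmgr cluster show` output (as seen below) instead of eyeballing the table. A small sketch; the sample output is embedded in a heredoc for illustration, whereas in practice you would pipe in the real command's output:

```shell
# Sample `repmgr cluster show`-style output, embedded for illustration:
cluster_show=$(cat <<'EOF'
 ID | Name    | Role    | Status    | Upstream
----+---------+---------+-----------+----------
 1  | server1 | primary | * running |
 2  | server2 | standby |   running | server1
EOF
)

# Skip the two header lines, split on '|', strip spaces, print "name role":
echo "$cluster_show" | awk -F'|' 'NR>2 {gsub(/ /,"",$2); gsub(/ /,"",$3); print $2, $3}'
```

This prints one `name role` pair per line, which is easy to compare before and after each failover run.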
<Code:bash|Test Failover>
postgres@server1:/var/log/postgresql$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------
 1  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
</Code>

We can simply shut down the current primary:

<Code:bash|Shutdown production>
root@server2:/var/log/postgresql# service postgresql restart
</Code>

Once this is done, we can check the log file on server1:

<Code:bash|Check the log on standby>
[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
[2021-09-28 10:46:45] [DEBUG] election result: WON
[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
[2021-09-28 10:46:45] [INFO] promote_command is: "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
[2021-09-28 10:46:46] [INFO] 0 followers to notify
[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
</Code>

As we can see, server1 has been promoted to primary. That, however, leaves us to deal with the aftermath.

===Dealing with the aftermath===
Now we have one running primary and one failed primary, and we need to re-establish the replication:

<Code:bash|Re-establish the connection>
postgres@server1:/var/log/postgresql$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "server2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
postgres@server1:
</Code>

We can re-establish the connection as follows:

  - Clear the old data directory and resnap (re-clone) from the new primary
  - Start PostgreSQL
  - Re-register the standby

First, clear the old path and resnap:

<Code:bash|Resnap failed server's data>
root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
DEBUG: 2 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgresql
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
postgres@server2:~$
</Code>

Then start PostgreSQL and re-register the standby:

<Code:bash|Startup PostgreSQL and Re-register>
postgres@server2:~$ logout
root@server2:/var/log/postgresql# service postgresql start
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr standby register --force
INFO: connecting to local node "server2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
INFO: standby registration complete
postgres@server2:~$
</Code>

Now we can check our status as follows:

<Code:bash|Check the newly recovered cluster>
postgres@server2:~$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
postgres@server2:~$
</Code>

For this configuration I used the repmgr.conf in the following appendix (note that conninfo should point at the node itself, so on node 2 that would be host=server2):

===Appendix===
<Code:bash|repmgr.conf>
node_id='2' # A unique integer greater than zero (1/2 depends on the node)
node_name='server2' # An arbitrary (but unique) string; we recommend. (unique number for the cluster)
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2' # Database connection information as a conninfo string.
data_directory='/var/lib/postgresql/12/main' # The node's data directory. This is needed by repmgr
config_directory='/etc/postgresql/12/main' # If configuration files are located outside the data
replication_user='repmgr' # User to make replication connections with, if not set
replication_type='physical' # Must be one of "physical" or "bdr".
location='default' # An arbitrary string defining the location of the node; this
use_replication_slots='yes' # whether to use physical replication slots
witness_sync_interval=15 # interval (in seconds) to synchronise node records
log_level='DEBUG' # Log level: possible values are DEBUG, INFO, NOTICE,
log_facility='STDERR' # Logging facility: possible values are STDERR, or for
log_file='/var/log/postgresql/repmgr.log'
log_status_interval=300 # interval (in seconds) for repmgrd to log a status message
pg_bindir='/usr/lib/postgresql/12/bin' # Path to PostgreSQL binary directory (location
repmgr_bindir='/usr/lib/postgresql/12/bin' # Path to repmgr binary directory (location of the repmgr
use_primary_conninfo_password=false # explicitly set "password" in "primary_conninfo"
passfile='/var/lib/postgresql/12/main/.pgpass' # path to .pgpass file to include in "primary_conninfo"
ssh_options='-o StrictHostKeyChecking=no' # Options to append to "ssh"
promote_check_timeout=60 # The length of time (in seconds) to wait
promote_check_interval=1 # The interval (in seconds) to check whether
primary_follow_timeout=60 # The max length of time (in seconds) to wait
standby_follow_timeout=15 # The max length of time (in seconds) to wait
shutdown_check_timeout=60 # The max length of time (in seconds) to wait for the demotion
standby_reconnect_timeout=60 # The max length of time (in seconds) to wait
node_rejoin_timeout=60 # The maximum length of time (in seconds) to wait for
failover='automatic' # one of 'automatic', 'manual'.
priority=100 # indicates a preferred priority for promoting nodes;
reconnect_attempts=6 # Number of attempts which will be made to reconnect to an unreachable
reconnect_interval=10 # Interval between attempts to reconnect to an unreachable
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file' # command repmgrd executes when promoting a new primary; use something like:
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n' # command repmgrd executes when instructing a standby to follow a new primary;
primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
monitoring_history=no # Whether to write monitoring data to the "montoring_history" table
monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data
degraded_monitoring_timeout=-1 # Interval (in seconds) after which repmgrd will terminate if the
async_query_timeout=60 # Interval (in seconds) which repmgrd will wait before
standby_disconnect_on_failover=false # If "true", in a failover situation wait for all standbys to
primary_visibility_consensus=false # If "true", only continue with failover if no standbys have seen
service_start_command ='sudo systemctl start postgresql'
service_stop_command ='sudo systemctl stop postgresql'
service_restart_command ='sudo systemctl restart postgresql'
service_reload_command ='sudo systemctl reload postgresql'
archive_ready_warning=16 # repmgr node check --archive-ready
archive_ready_critical=128 #
replication_lag_warning=300 # repmgr node check --replication-lag
replication_lag_critical=600 #
bdr_local_monitoring_only=false # Only monitor the local node; no checks will be
bdr_recovery_timeout=30 # If a BDR node was offline and has become available
</Code>

====Manual Failover====

postgresql_repmgr.txt · Last modified: 2024/06/22 15:55 by andonovj