=====Overview=====
In this section, we will configure the virtual machines and install and configure the environment. In the following sections we will work with the environment. We will use:

  * Vagrant - for virtualization
  * Ubuntu - for the operating system
  * 2ndQuadrant - for repmgr

{{:1_g-5e4zhde9lojvmbdsiptg.png?800|}}

=====Virtual Environment Config=====
To configure the virtual environment, we will use Vagrant and Ubuntu machines. We initialize Vagrant and bring the machines up as follows:

<Code:bash|Vagrant Config>
C:\Virtual Box\REPMGR>vagrant init

A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.

C:\Virtual Box\REPMGR>vagrant status
Current machine states:

server1                   not created (virtualbox)
server2                   not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed above with their current state. For more information about a specific VM, run `vagrant status NAME`.

C:\Virtual Box\REPMGR>vagrant up
Bringing machine 'server1' up with 'virtualbox' provider...
Bringing machine 'server2' up with 'virtualbox' provider...
==> server1: Box 'consumerlab/ubuntu-server-16-04-LTS' could not be found. Attempting to find and install...
    server1: Box Provider: virtualbox
    server1: Box Version: >= 0
==> server1: Loading metadata for box 'consumerlab/ubuntu-server-16-04-LTS'
    server1: URL: https://vagrantcloud.com/consumerlab/ubuntu-server-16-04-LTS
==> server1: Adding box 'consumerlab/ubuntu-server-16-04-LTS' (v1.0.2) for provider: virtualbox
    server1: Downloading: https://vagrantcloud.com/consumerlab/boxes/ubuntu-server-16-04-LTS/versions/1.0.2/providers/virtualbox.box
***************************************
</Code>

The Vagrantfile is the following:

<Code:bash| Vagrant file>
Vagrant.configure("2") do |config|
  (1..2).each do |n|
    config.vm.define "server#{n}" do |define|
      define.vm.box = "bento/ubuntu-20.04"
      define.ssh.insert_key = false
      define.vm.hostname = "server#{n}"
      define.vm.network :private_network, ip: "10.0.15.2#{n}"
      define.vm.provider "virtualbox" do |node|
        node.name = "server#{n}"
      end
    end
  end
end
</Code>

Once the machines are up, we can start with the installation. Bear in mind that this Vagrantfile will automatically configure the IPs as well, but it will not configure /etc/hosts, which we will do now.

=====Server Config=====
For the servers we have to:

  - Configure /etc/hosts
  - Configure the 2ndQuadrant repository
  - Install the 2ndQuadrant packages

So, let's get going.

====Modify /etc/hosts====
That is pretty trivial; just add the IPs and the hostnames to /etc/hosts:

<Code:bash|Hosts>
10.0.15.21 server1
10.0.15.22 server2
</Code>

That will allow the hosts to see and ping each other. Then we can configure the repositories:

====Configure Repositories====
I decided to use the 2ndQuadrant packages, as they are the ones I use for work as well :) First things first, we have to update our repos and pull the latest metadata from them.
<Code:bash|Update repos>
root@server1:~# apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1,213 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main i386 Packages [282 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main i386 Packages [535 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [866 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main Translation-en [259 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/restricted i386 Packages [19.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [443 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [167 kB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [406 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [63.5 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe i386 Packages [633 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/restricted i386 Packages [18.0 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [58.3 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/universe i386 Packages [507 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [638 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [855 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [182 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [24.6 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/multiverse i386 Packages [8,232 B]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [6,776 B]
Get:25 http://archive.ubuntu.com/ubuntu focal-backports/universe i386 Packages [4,736 B]
Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [5,800 B]
Get:27 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [101 kB]
Get:28 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [21.9 kB]
Get:29 http://security.ubuntu.com/ubuntu focal-security/multiverse Translation-en [4,948 B]
Fetched 7,652 kB in 7s (1,048 kB/s)
Reading package lists... Done
</Code>

====Install PostgreSQL====
Then we can proceed with the installation of PostgreSQL:

<Code:bash|Install PostgreSQL>
root@server1:~# apt-get install -y postgresql
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
Suggested packages:
  lm-sensors postgresql-doc postgresql-doc-12 libjson-perl openssl-blacklist isag
The following NEW packages will be installed:
  libllvm10 libpq5 libsensors-config libsensors5 postgresql postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
0 upgraded, 11 newly installed, 0 to remove and 54 not upgraded.
Need to get 15.1 MB/30.6 MB of archives.
After this operation, 122 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.8-0ubuntu0.20.04.1 [116 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.8-0ubuntu0.20.04.1 [1,046 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 postgresql-12 amd64 12.8-0ubuntu0.20.04.1 [13.5 MB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 sysstat amd64 12.2.0-2ubuntu0.1 [448 kB]
Fetched 15.1 MB in 15s (1,015 kB/s)
Preconfiguring packages ...
***************************************************
Success. You can now start the database server using:
    pg_ctlcluster 12 main start
Ver Cluster Port Status Owner    Data directory              Log file
12  main    5432 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: using /usr/share/postgresql/12/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Setting up sysstat (12.2.0-2ubuntu0.1) ...
Creating config file /etc/default/sysstat with new version
update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
Created symlink /etc/systemd/system/multi-user.target.wants/sysstat.service → /lib/systemd/system/sysstat.service.
Setting up postgresql (12+214ubuntu0.1) ...
Processing triggers for systemd (245.4-4ubuntu3.11) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
</Code>

Do this on both machines: server1 and server2.

====Install REPMGR Repos====
Then we can proceed with adding the repo for REPMGR:

<Code:bash|Add REPMGR Repo>
root@server1:~# sudo add-apt-repository 'deb http://packages.2ndquadrant.com/repmgr3/apt/ xenial-2ndquadrant main'
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:2 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~# sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Get:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease [5,044 B]
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Err:3 http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
Reading package lists... Done
W: GPG error: http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A47615A0D3FA41F6
E: The repository 'http://packages.2ndquadrant.com/repmgr3/apt xenial-2ndquadrant InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
root@server1:~#
</Code>

Note the NO_PUBKEY error above: apt will refuse to install packages from this repository until its signing key has been imported.

====Configure SSH keys====
To configure a passwordless connection, we have to:

  - Generate a public/private key pair
  - Put the public key in the authorized keys on the other host (e.g. the public key of server1 in the authorized_keys file on server2)

So let's get going:

<Code:bash|Generate Public/Private Key pair>
postgres@server2:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:KYxUhOKyll2anPYivR3ZrrRw/Kz7xDMVU0AtovXY4bA postgres@server2
The key's randomart image is:
+---[RSA 3072]----+
|     oo .oo.     |
|    . .. + o..   |
|   . .. o Ooo    |
|.  .. = E =o     |
|  o+ * o S.      |
|.o B. +..        |
|. o..* *         |
| . o=.B o        |
|   ..o*==        |
+----[SHA256]-----+
</Code>

This will generate two files: a private key and a public key.
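The manual copy performed in the next step can also be automated with `ssh-copy-id postgres@server1`; underneath, it simply appends the public key line to the remote `~/.ssh/authorized_keys` and relies on that file having strict permissions. A minimal local simulation of that append (the key string is a hypothetical example and a temporary file stands in for the remote authorized_keys):

```shell
# Hypothetical stand-ins: an example public-key line and a temp file
# in place of the remote ~/.ssh/authorized_keys.
pubkey="ssh-rsa AAAAB3Nza...EXAMPLE... postgres@server2"
authfile=$(mktemp)

# This append is effectively what ssh-copy-id does on the remote side:
echo "$pubkey" >> "$authfile"
# sshd ignores authorized_keys files with lax permissions:
chmod 600 "$authfile"

grep -c 'postgres@server2' "$authfile"   # prints 1 when the key line is present
```

The permissions step matters: with a group- or world-writable authorized_keys, sshd will silently fall back to password authentication.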
Never share the private key :)

<Code:bash|Keys>
postgres@server2:~/.ssh$ ls -alrt
total 20
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:58 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:58 id_rsa
</Code>

Take the content of the public key from one server and put it in the authorized_keys file of the other (e.g. server2's public key into server1's authorized_keys), and vice versa:

<Code:bash| Add public key to server1>
postgres@server1:~/.ssh$ ls -alrt
total 24
-rw-r--r-- 1 postgres postgres  570 Sep 16 08:57 id_rsa.pub
-rw------- 1 postgres postgres 2602 Sep 16 08:57 id_rsa
-rw-rw-r-- 1 postgres postgres  571 Sep 16 08:59 authorized_keys
-rw-r--r-- 1 postgres postgres  444 Sep 16 08:59 known_hosts
drwx------ 2 postgres postgres 4096 Sep 16 08:59 .
drwxr-xr-x 5 postgres postgres 4096 Sep 16 09:12 ..
postgres@server1:~/.ssh$ tail -100 authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDc7WdWJWmc6ArXUkFFhArVa/42//MPue0u1HmgBK389GkpLnKZ8dBfNRcFfrvocCsQyJZu5MsfK/sRR2/JycjifnL45bavw+qqejmyiuJnMmiNEJI2sbturdE/y3vABw1QjDaCG+LEDJzGT3DzY3HvxcU+33vBZYpNx7YKSI+LQnyeT6iGNNoy5hNzI+/O8jM3pg4ReN+XC6NJ77maBmQ+EaLhLsxk3vmcnP6C7TCDxez1iTZHZpV9tNGFx/OCABAOgV6ELpzRPsS/3MBSfif8eeJ6Sc8JPVQSruOzf3lYYAouYZTLLJYtDZOw3rc0+v7fn17jVk9w981Zn4zzkacI5FbVHmtM6oAnCIkSUUJwXlQzZprF3C+Z+Z10mmtBlS9/NFdZklpK3rHh1T1a3Fu1SXWlkHZxKdLxr5ytUL89ubGAPSa4UkBXRalBTUFjcYhvqk5hPqO6a+KQO1ehcNmn7/ffdEPrattoOxnGCfKaWLbXyxFmt3WFCq5Jy/wZoq8= postgres@server2
postgres@server1:~/.ssh$
</Code>

After you do that on both servers, you can connect without a password:

<Code:bash|Test Passwordless Configuration>
postgres@server1:~/.ssh$ ssh server2
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-80-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu 16 Sep 2021 09:30:52 AM UTC

  System load:  0.08              Processes:               128
  Usage of /:   2.8% of 61.31GB   Users logged in:         1
  Memory usage: 25%               IPv4 address for eth0:   10.0.2.15
  Swap usage:   0%                IPv4 address for eth1:   10.0.15.22

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento

Last login: Thu Sep 16 08:59:34 2021 from 10.0.15.21
postgres@server2:~$
</Code>

=====Cluster Environment=====
After we've installed and configured all the packages, we can proceed with cluster configuration. Firstly, we have to configure the database and the users:

====Configure User====
The user to be used for the replication can be configured as follows:

<Code:bash|Replication User Configuration>
root@server1:~# su - postgres
postgres@server1:~$ createuser --replication --createdb --createrole --superuser repmgr
postgres@server1:~$ psql -c 'ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;'
ALTER ROLE
postgres@server1:~$ createdb repmgr --owner=repmgr
postgres@server1:~$
</Code>

Do that on both servers. After we have done that, we can configure client authentication for PostgreSQL in pg_hba.conf.

====Connection Configuration====
To allow the repmgr user to connect to PostgreSQL, we need to append the following entries to the pg_hba.conf file:

<Code:bash|Configure pg_hba.conf>
host    repmgr       repmgr  10.0.15.21/32  trust
host    replication  repmgr  10.0.15.21/32  trust
host    repmgr       repmgr  10.0.15.22/32  trust
host    replication  repmgr  10.0.15.22/32  trust
</Code>

We can test the settings as follows:

<Code:bash|Test connections>
postgres@server2:/etc/postgresql/12/main$ psql 'host=server2 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

repmgr=# \q
postgres@server2:/etc/postgresql/12/main$ psql 'host=server1 dbname=repmgr user=repmgr'
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
repmgr=#
</Code>

Finally, we have to set the preloaded library for repmgr in postgresql.conf:

<Code:bash|Set Pre-loaded Library>
shared_preload_libraries = 'repmgr'
</Code>

Do that on both servers and restart PostgreSQL, as changes to shared_preload_libraries only take effect on restart. Now, finally, we can configure REPMGR.

=====REPMGR=====
The default configuration of Repmgr is located in:

  * /etc/repmgr.conf
  * /etc/default/repmgrd

We can do a basic configuration as follows:

====Configuration====
<Code:bash|Repmgr.conf>
cluster=cluster
node_id=(1/2 - 1 for server1 and 2 for server2)
node_name=server2
conninfo='host=server(1/2, the same as before) user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
logfile='/var/log/postgresql/repmgr.log'
loglevel=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
</Code>

This file defines the structure of the replication, including the commands for promotion and for following a new primary. The next file defines whether the repmgrd daemon will be enabled and which configuration file it will use.

<Code:bash|Repmgrd>
# default settings for repmgrd. This file is source by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
</Code>

Furthermore, we have to make a symlink to the correct pg_ctl:

<Code:bash|Create symlink to pg_ctl>
root@server2:~# ln -s /usr/lib/postgresql/12/bin/pg_ctl /usr/bin/pg_ctl
</Code>

We have to do that on both servers.
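To make the template concrete, here is a filled-in `/etc/repmgr.conf` for server1 as it might look in this setup. This is a sketch using this guide's values, with the `log_file`/`log_level` parameter names that repmgr 4+ expects in place of the deprecated `logfile`/`loglevel`; on server2, change `node_id`, `node_name`, and the `conninfo` host accordingly:

```shell
# Sketch: /etc/repmgr.conf for server1 (values assumed from this guide)
node_id=1
node_name=server1
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
data_directory='/var/lib/postgresql/12/main'
```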
====Cluster Initialization====
To initialize a cluster, we can do it as follows:

<Code:bash|Initialize a cluster>
postgres@server1:~$ repmgr primary register
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
postgres@server1:~$
</Code>

This will initialize the server, and we can check that:

<Code:bash|Check cluster status>
postgres@server1:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
postgres@server1:~$
</Code>

After that is done, we can set up the standby server (e.g. server2).

====Setup Standby====
To set up the standby, we have to delete the existing data directory on the standby and clone from the primary:

<Code:bash|Setup standby server>
root@server2:~# rm -rf /var/lib/postgresql/12/main/
root@server2:~# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/12/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
postgres@server2:~$
</Code>

After that, we should be able to see the replication working:

<Code:bash|Check status of REPMGR>
postgres@server2:~$ repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
  parameter "logfile" has been renamed to "log_file"
  parameter "loglevel" has been renamed to "log_level"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 1        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | standby |   running | server1  | default  | 100      | 1        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
postgres@server2:~$
</Code>

We can test the configuration as follows:

====Test Configuration====
Firstly, we will test the normal functionality, and then we will test the failover.
For the functionality test, we can create a table on the primary and see if it is present on the standby:

==On Primary==
<Code:bash|Test Functionality>
repmgr=# create table test(id integer);
CREATE TABLE
repmgr=# insert into test values(1);
INSERT 0 1
repmgr=# commit;
WARNING: there is no transaction in progress
COMMIT
repmgr=#
</Code>

==Standby==
Now, we can test it on the standby:

<Code:bash|Test Functionality>
postgres=# \c repmgr
You are now connected to database "repmgr" as user "postgres".
repmgr=# select * from test;
 id
----
  1
(1 row)

repmgr=#
</Code>

Now, let's test the failover:

====Automatic Failover====
To test that, we can shut down the primary and see whether the failover happens automatically (as per our settings). Note that I changed the config a little bit at this point, so please check the Appendix.

===Modify Postgres user===
Firstly, we need to add the postgres user to the sudo group:

<Code:bash|Add user to Sudoers group>
### Server 1
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
### Server 2
root@server2:/var/log/postgresql# usermod -aG sudo postgres
root@server2:/var/log/postgresql#
</Code>

===Test the failover===
Now, we can finally test the failover.
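When repeating failover tests, it can be handy to extract just the node/role pairs from the `repmgr cluster show` output (as seen below) instead of eyeballing the table. A small sketch; the sample output is embedded in a heredoc for illustration, whereas in practice you would pipe in the real command's output:

```shell
# Sample `repmgr cluster show`-style output, embedded for illustration:
cluster_show=$(cat <<'EOF'
 ID | Name    | Role    | Status    | Upstream
----+---------+---------+-----------+----------
 1  | server1 | primary | * running |
 2  | server2 | standby |   running | server1
EOF
)

# Skip the two header lines, split on '|', strip spaces, print "name role":
echo "$cluster_show" | awk -F'|' 'NR>2 {gsub(/ /,"",$2); gsub(/ /,"",$3); print $2, $3}'
```

This prints one `name role` pair per line, which is easy to compare before and after each failover run.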
<Code:bash|Test Failover>
postgres@server1:/var/log/postgresql$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------
 1  | server1 | standby | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | primary |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
</Code>

We can simply shut down the current primary:

<Code:bash|Shutdown production>
root@server2:/var/log/postgresql# service postgresql restart
</Code>

Once this is done, we can check the log file on server1:

<Code:bash|Check the log on standby>
[2021-09-28 10:46:45] [INFO] primary and this node have the same location ("default")
[2021-09-28 10:46:45] [INFO] no other sibling nodes - we win by default
[2021-09-28 10:46:45] [DEBUG] election result: WON
[2021-09-28 10:46:45] [NOTICE] this node is the only available candidate and will now promote itself
[2021-09-28 10:46:45] [INFO] promote_command is: "repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-09-28 10:46:45] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgr.log"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [DEBUG] connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
[2021-09-28 10:46:45] [NOTICE] promoting standby to primary
[2021-09-28 10:46:45] [DETAIL] promoting server "server1" (ID: 1) using pg_promote()
[2021-09-28 10:46:45] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-09-28 10:46:46] [DEBUG] setting node 1 as primary and marking existing primary as failed
[2021-09-28 10:46:46] [NOTICE] STANDBY PROMOTE successful
[2021-09-28 10:46:46] [DETAIL] server "server1" (ID: 1) was successfully promoted to primary
[2021-09-28 10:46:46] [INFO] 0 followers to notify
[2021-09-28 10:46:46] [INFO] switching to primary monitoring mode
[2021-09-28 10:46:46] [NOTICE] monitoring cluster primary "server1" (ID: 1)
</Code>

As we can see, server1 has been promoted to primary. That, however, leaves us to deal with the aftermath.

===Dealing with the aftermath===
Now we have one running primary and one failed primary, and we need to re-establish the replication:

<Code:bash|Re-establish the connection>
postgres@server1:/var/log/postgresql$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr connect_timeout=2
 2  | server2 | primary | - failed  |          | default  | 100      | ?        | host=server2 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "server2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
postgres@server1:
</Code>

We can re-establish the connection as follows:

  - Clear the old data directory and resnap (re-clone) from the new primary
  - Start PostgreSQL
  - Re-register the standby

First, clear the old path and resnap:

<Code:bash|Resnap failed server's data>
root@server2:/var/log/postgresql# rm -rf /var/lib/postgresql/12/main/
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr -h server1 -U repmgr -d repmgr standby clone
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/var/lib/postgresql/12/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=server1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
DEBUG: 2 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/var/lib/postgresql/12/main"...
DEBUG: create_replication_slot(): creating slot "repmgr_slot_2" on upstream
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/lib/postgresql/12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/12/main -h server1 -p 5432 -U repmgr -X stream -S repmgr_slot_2
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgresql
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
postgres@server2:~$
</Code>

Then start PostgreSQL and re-register the standby:

<Code:bash|Startup PostgreSQL and Re-register>
postgres@server2:~$ logout
root@server2:/var/log/postgresql# service postgresql start
root@server2:/var/log/postgresql# su - postgres
postgres@server2:~$ repmgr standby register --force
INFO: connecting to local node "server2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server2 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=server1 fallback_application_name=repmgr"
INFO: standby registration complete
postgres@server2:~$
</Code>

Now we can check our status as follows:

<Code:bash|Check the newly recovered cluster>
postgres@server2:~$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | server1 | primary | * running |          | default  | 100      | 3        | host=server1 user=repmgr dbname=repmgr
 2  | server2 | standby |   running | server1  | default  | 100      | 3        | host=server2 user=repmgr dbname=repmgr
postgres@server2:~$
</Code>

For this configuration I used the repmgr.conf in the following appendix (note that conninfo should point at the node itself, so on node 2 that would be host=server2):

===Appendix===
<Code:bash|repmgr.conf>
node_id='2' # A unique integer greater than zero (1/2 depends on the node)
node_name='server2' # An arbitrary (but unique) string; we recommend. (unique number for the cluster)
conninfo='host=server1 user=repmgr dbname=repmgr connect_timeout=2' # Database connection information as a conninfo string.
data_directory='/var/lib/postgresql/12/main' # The node's data directory. This is needed by repmgr
config_directory='/etc/postgresql/12/main' # If configuration files are located outside the data
replication_user='repmgr' # User to make replication connections with, if not set
replication_type='physical' # Must be one of "physical" or "bdr".
location='default' # An arbitrary string defining the location of the node; this
use_replication_slots='yes' # whether to use physical replication slots
witness_sync_interval=15 # interval (in seconds) to synchronise node records
log_level='DEBUG' # Log level: possible values are DEBUG, INFO, NOTICE,
log_facility='STDERR' # Logging facility: possible values are STDERR, or for
log_file='/var/log/postgresql/repmgr.log'
log_status_interval=300 # interval (in seconds) for repmgrd to log a status message
pg_bindir='/usr/lib/postgresql/12/bin' # Path to PostgreSQL binary directory (location
repmgr_bindir='/usr/lib/postgresql/12/bin' # Path to repmgr binary directory (location of the repmgr
use_primary_conninfo_password=false # explicitly set "password" in "primary_conninfo"
passfile='/var/lib/postgresql/12/main/.pgpass' # path to .pgpass file to include in "primary_conninfo"
ssh_options='-o StrictHostKeyChecking=no' # Options to append to "ssh"
promote_check_timeout=60 # The length of time (in seconds) to wait
promote_check_interval=1 # The interval (in seconds) to check whether
primary_follow_timeout=60 # The max length of time (in seconds) to wait
standby_follow_timeout=15 # The max length of time (in seconds) to wait
shutdown_check_timeout=60 # The max length of time (in seconds) to wait for the demotion
standby_reconnect_timeout=60 # The max length of time (in seconds) to wait
node_rejoin_timeout=60 # The maximum length of time (in seconds) to wait for
failover='automatic' # one of 'automatic', 'manual'.
priority=100 # indicates a preferred priority for promoting nodes;
reconnect_attempts=6 # Number of attempts which will be made to reconnect to an unreachable
reconnect_interval=10 # Interval between attempts to reconnect to an unreachable
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file' # command repmgrd executes when promoting a new primary; use something like:
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n' # command repmgrd executes when instructing a standby to follow a new primary;
primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
monitoring_history=no # Whether to write monitoring data to the "montoring_history" table
monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data
degraded_monitoring_timeout=-1 # Interval (in seconds) after which repmgrd will terminate if the
async_query_timeout=60 # Interval (in seconds) which repmgrd will wait before
standby_disconnect_on_failover=false # If "true", in a failover situation wait for all standbys to
primary_visibility_consensus=false # If "true", only continue with failover if no standbys have seen
service_start_command ='sudo systemctl start postgresql'
service_stop_command ='sudo systemctl stop postgresql'
service_restart_command ='sudo systemctl restart postgresql'
service_reload_command ='sudo systemctl reload postgresql'
archive_ready_warning=16 # repmgr node check --archive-ready
archive_ready_critical=128 #
replication_lag_warning=300 # repmgr node check --replication-lag
replication_lag_critical=600 #
bdr_local_monitoring_only=false # Only monitor the local node; no checks will be
bdr_recovery_timeout=30 # If a BDR node was offline and has become available
</Code>

====Manual Failover====

postgresql_repmgr.txt · Last modified: 2024/06/22 15:55 by andonovj