Overview
InnoDB Cluster is a modern solution that provides group replication for MySQL. In this configuration, one node is the primary (R/W) while the rest are secondaries (R/O).
Of course our IPs will look different because I am waaaay too lazy to care for proper pictures.
Installation
I am not a big fan of manual work, ask my grandma about it. So I automated the whole shebang using Terraform + Ansible. I will not show the Terraform because it is strictly environment-specific, but I will show you the Ansible.
Preparation
- name: Configure MySQL Libraries
  vars:
    ansible_user: opc
    ansible_become: yes
    ansible_become_method: sudo
    ansible_python_interpreter: /usr/bin/python
  hosts: dbservers
  tasks:
    - block:
        - name: Install the necessary packages
          yum:
            name:
              - mysql-server
              - mysql-shell
              - mysql-devel
              - mysql-common
              - mysql-libs
              - mysql
            state: present
        - name: Disable Selinux
          ansible.builtin.command: setenforce 0
        - name: Stop firewall
          service:
            name: firewalld
            state: stopped
Configuration
After that, you can configure the rest using the following Ansible playbook:
Configuration
- name: Configure MySQL Libraries
  vars:
    ansible_user: opc
    ansible_become: yes
    ansible_become_method: sudo
  hosts: dbservers
  tasks:
    - block:
        - name: Start the MySQL Service
          ansible.builtin.service:
            name: mysqld
            state: started
        - name: Check the existence of a mysql file
          stat:
            path: /root/.my.cnf
          register: mysql_file
        - name: Obtain & Change temporal password
          shell: |
            temporal_password=`awk '/A temporary password is generated for/ {a=$0} END{ print a }' /var/log/mysqld.log | awk '{print $(NF)}'`
            mysql -uroot -p${temporal_password} -e "SET GLOBAL super_read_only = 0; SET GLOBAL read_only = 0;" --connect-expired-password
            mysql -uroot -p${temporal_password} -e "ALTER USER 'root'@'localhost' IDENTIFIED BY '{{ root_password }}'; flush privileges;" --connect-expired-password
            echo "[client]
            user=root
            password=\"{{ root_password }}\"" > /root/.my.cnf
          when: not mysql_file.stat.exists
        - name: Generate random server_id
          shell: |
            echo $((0x$(echo `hostname` | md5sum | cut -f 1 -d " " | cut -c 1-3))) | cut -c 1-3
          register: mysql_server_id
        - name: Stop MySQL If running
          service:
            name: mysqld
            state: stopped
        - name: Define the primary host (e.g. first host in the group)
          set_fact:
            primary_node: "{{ groups['dbservers'][0] }}"
        - name: Define the secondary nodes
          set_fact:
            secondary_nodes: "{{ groups['dbservers'][1:] }}"
        - name: Get secondary hosts
          debug:
            var: secondary_nodes
        - name: Start MySQL
          service:
            name: mysqld
            state: started
        - name: Template bootstrapping js for mysqlsh
          template:
            dest: "/tmp/bootstrap_cluster.js"
            src: templates/bootstrap_cluster.js.j2
            owner: "mysql"
            group: "mysql"
            mode: '0644'
        - name: Template InnoDB Cluster
          template:
            dest: "/tmp/add_instances.js"
            src: templates/add_instances.js.j2
            owner: "mysql"
            group: "mysql"
            mode: '0644'
        - name: Bootstrap the cluster (new deployment)
          command: mysqlsh -f /tmp/bootstrap_cluster.js
        - name: Wait to start
          wait_for:
            timeout: 5
        - name: Add the other instances
          command: mysqlsh -f /tmp/add_instances.js
          when: inventory_hostname == groups['dbservers'][0]
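By the way, if the bootstrap step ever fails, you can run the same readiness checks by hand from mysqlsh on the misbehaving node. This is just a minimal sketch, not part of the playbooks; the host name is one of ours, and the password placeholder is whatever you set as root_password:
Manual sanity check (optional)
// not part of the playbooks, just a manual check; replace the password placeholder
shell.connect('root@mysql-cluster-b:3306', 'your_root_password');
var result = dba.checkInstanceConfiguration();   // checks the instance of the current session
print(result.status + '\n');                     // "ok" means the node is ready for InnoDB Cluster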
Congrats, now we have the cluster running:
Cluster status
MySQL 10.14.2.193:33060+ ssl JS > var cluster = dba.getCluster()
MySQL 10.14.2.193:33060+ ssl JS > cluster.status();
{
    "clusterName": "InnoDBCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-cluster-a:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-cluster-a:3306": {
                "address": "mysql-cluster-a:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-b:3306": {
                "address": "mysql-cluster-b:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-c:3306": {
                "address": "mysql-cluster-c:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-cluster-a:3306"
}
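If you just want a quick look at the topology without all the health details, the same mysqlsh session can give you a shorter summary (purely optional):
Shorter topology view
// optional: a compact description of the same cluster
var cluster = dba.getCluster();
cluster.describe();   // members and their roles, without status/lag details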
Templates
Because we want to look professional (only to look like one, actually being one is boring), here are the template files:
Templates
# my.cnf.js.j2
[mysqld]
server_id={{ mysql_server_id.stdout }}
binlog_checksum=NONE
gtid_mode=ON
enforce_gtid_consistency=ON
log_bin
log_slave_updates=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
skip-host-cache
skip-name-resolve
binlog_transaction_dependency_tracking=WRITESET

# bootstrap_cluster.js.j2
var cluster_user_password = "{{ cluster_user_password }}";
var clusterName = "{{ cluster_name }}";
var secondary_nodes = "{{ secondary_nodes }}";
var root_password = "{{ root_password }}";
var cluster_user = "{{ cluster_user }}";
var mysql = require('mysql');
try {
    dba.configureInstance('root@localhost', {
        password: `${root_password}`,
        restart: true,
        interactive: false,
        clusterAdmin: '{{ cluster_user }}',
        clusterAdminPassword: `${cluster_user_password}`
    });
} catch(e) {
    print('\n The innodb cluster admin user could not be created\n' + e.message + '\n');
}

# add_instances.js.j2 (adding the instances)
var cluster_user = "{{ cluster_user }}";
var cluster_user_password = "{{ cluster_user_password }}";
var clusterName = "{{ cluster_name }}";
var secondary_nodes = "{{ secondary_nodes }}";
var secondary_nodes = secondary_nodes.replaceAll("u","").replaceAll("'","\"").replaceAll("\"","").replaceAll("[","").replaceAll("]","").replaceAll(" ","").split(',');
var mysql = require('mysql');
try {
    print('Adding instances...\n');
    shell.connect('{{ cluster_user }}@{{ primary_node }}', cluster_user_password);
    var cluster = dba.createCluster(clusterName);
    for (let i = 0; i < {{ secondary_nodes|length }}; i++) {
        cluster.addInstance(`${cluster_user}@${secondary_nodes[i]}`, {
            password: `${cluster_user_password}`,
            recoveryMethod: "clone"
        });
    }
} catch(e) {
    print('\nThe InnoDB cluster could not be created.\n\nError: ' + e.message + '\n');
}
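A small note on the replaceAll() chain in add_instances.js.j2: it only exists to turn the Python-style list that Ansible renders into a JS array. If you don't mind touching the template, a cleaner sketch (an assumption on my side, not what the playbook ships) is to render the list as JSON and parse it:
Alternative list handling
// render the secondaries as proper JSON instead of scrubbing the Python repr by hand
var secondary_nodes = JSON.parse('{{ secondary_nodes | to_json }}');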
Management
It is all cool and stuff, but let's go through several scenarios with our cluster:
1. Loss of a member
2. Backup
3. Recovery
First things first: what do we do if the connection between the primary and a secondary fails? Well, we have to rejoin the lost member, so let's see how.
Loss of a member
To do that, we will simulate someone stopping our service.
Loss of a node
[root@mysql-cluster-b ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service
[root@mysql-cluster-b ~]#
Our cluster could tolerate one failure; after losing this member, it cannot tolerate any more:
Cluster check
MySQL 10.14.2.193:33060+ ssl JS > var cluster = dba.getCluster();
MySQL 10.14.2.193:33060+ ssl JS > cluster.status();
{
    "clusterName": "InnoDBCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-cluster-a:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE_PARTIAL",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
        "topology": {
            "mysql-cluster-a:3306": {
                "address": "mysql-cluster-a:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-b:3306": {
                "address": "mysql-cluster-b:3306",
                "memberRole": "SECONDARY",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003: Could not open connection to 'mysql-cluster-b:3306': Can't connect to MySQL server on 'mysql-cluster-b:3306' (111)",
                "status": "(MISSING)"
            },
            "mysql-cluster-c:3306": {
                "address": "mysql-cluster-c:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-cluster-a:3306"
}
We see that the node is down. So let's simulate the connection being restored:
Start the service
[root@mysql-cluster-b ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
[root@mysql-cluster-b ~]#
Now we can check again:
Check again
{ "clusterName": "InnoDBCluster", "defaultReplicaSet": { "name": "default", "primary": "mysql-cluster-a:3306", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "mysql-cluster-a:3306": { "address": "mysql-cluster-a:3306", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.37" }, "mysql-cluster-b:3306": { "address": "mysql-cluster-b:3306", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.37" }, "mysql-cluster-c:3306": { "address": "mysql-cluster-c:3306", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.37" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "mysql-cluster-a:3306" }
Now we see that the node rejoined and everything is working again (once an instance has been configured for the cluster, Group Replication starts automatically on boot). If for some reason it doesn't rejoin on its own, we can try to force it using:
Rejoin and rescan
cluster.rejoinInstance(instance)
cluster.rescan()
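For completeness, against the nodes from this walkthrough it would look roughly like this (run from a mysqlsh session connected to the primary with an account that has the cluster admin grants):
Rejoin example
// a rough sketch using the nodes from this walkthrough
var cluster = dba.getCluster();
cluster.rejoinInstance("mysql-cluster-b:3306");   // push the lost member back into the group
cluster.rescan();                                 // refresh the metadata if anything looks stale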
It is important to note that if the server UUID has changed, we have to remove the instance from the cluster metadata before adding it back:
In case UUID is changed
cluster.removeInstance("root@instanceWithOldUUID:3306", {force: true})
cluster.rescan()
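Put together, recovering a member that came back with a new server UUID looks roughly like this; the node name is just the one from this walkthrough, and clone is simply the recovery method the playbook already uses:
Full flow after a UUID change
// a rough sketch: drop the stale metadata entry, then add the node back as a fresh member
var cluster = dba.getCluster();
cluster.removeInstance("root@mysql-cluster-b:3306", {force: true});
cluster.rescan();
cluster.addInstance("root@mysql-cluster-b:3306", {recoveryMethod: "clone"});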