=====Overview=====

InnoDB Cluster is a modern solution that provides group replication for MySQL. In this configuration, one node is the primary (R/W) while the rest are ONLINE secondaries (R/O).

{{:mysql_innodb_cluster_overview.png?800|}}

Of course, our IPs will look different, because I am waaaay too lazy to care about proper pictures.

=====Installation=====

I am not a big fan of manual work, ask my grandma about it. So I automated the whole shebang using Terraform + Ansible. I will not show the Terraform because it is strictly environment-specific, but I will show you the Ansible:

<code yaml>
- name: Configure MySQL Libraries
  vars:
    ansible_user: opc
    ansible_become: yes
    ansible_become_method: sudo
    ansible_python_interpreter: /usr/bin/python
  hosts: dbservers
  tasks:
    - block:
        - name: Install the necessary packages
          yum:
            name:
              - mysql-server
              - mysql-shell
              - mysql-devel
              - mysql-common
              - mysql-libs
              - mysql
            state: present

        - name: Disable SELinux
          ansible.builtin.command: setenforce 0

        - name: Stop firewall
          service:
            name: firewalld
            state: stopped
</code>

=====Configuration=====

After that, you can configure the rest using the following Ansible:

<code yaml>
- name: Configure MySQL InnoDB Cluster
  vars:
    ansible_user: opc
    ansible_become: yes
    ansible_become_method: sudo
  hosts: dbservers
  tasks:
    - block:
        - name: Start the MySQL Service
          ansible.builtin.service:
            name: mysqld
            state: started

        - name: Check the existence of a mysql credentials file
          stat:
            path: /root/.my.cnf
          register: mysql_file

        - name: Obtain & change the temporary password
          shell: |
            temporary_password=`awk '/A temporary password is generated for/ {a=$0} END{ print a }' /var/log/mysqld.log | awk '{print $(NF)}'`
            mysql -uroot -p${temporary_password} -e "SET GLOBAL super_read_only = 0; SET GLOBAL read_only = 0;" --connect-expired-password
            mysql -uroot -p${temporary_password} -e "ALTER USER 'root'@'localhost' IDENTIFIED BY '{{ root_password }}'; FLUSH PRIVILEGES;" --connect-expired-password
            printf '[client]\nuser=root\npassword="{{ root_password }}"\n' > /root/.my.cnf
          when: not mysql_file.stat.exists

        - name: Generate random server_id
          shell: |
            echo $((0x$(hostname | md5sum | cut -f 1 -d " " | cut -c 1-3))) | cut -c 1-3
          register: mysql_server_id

        - name: Stop MySQL if running
          service:
            name: mysqld
            state: stopped

        # NOTE: the playbook as originally published registered mysql_server_id
        # but never wrote the rendered my.cnf anywhere; a task along these
        # lines is assumed (the destination path may differ on your distro).
        - name: Template my.cnf
          template:
            dest: /etc/my.cnf.d/innodb-cluster.cnf
            src: templates/my.cnf.j2
            owner: root
            group: root
            mode: '0644'

        - name: Define the primary host (e.g. first host in the group)
          set_fact:
            primary_node: "{{ groups['dbservers'][0] }}"

        - name: Define the secondary nodes
          set_fact:
            secondary_nodes: "{{ groups['dbservers'][1:] }}"

        - name: Get secondary hosts
          debug:
            var: secondary_nodes

        - name: Start MySQL
          service:
            name: mysqld
            state: started

        - name: Template bootstrapping js for mysqlsh
          template:
            dest: "/tmp/bootstrap_cluster.js"
            src: templates/bootstrap_cluster.js.j2
            owner: "mysql"
            group: "mysql"
            mode: '0644'

        - name: Template InnoDB Cluster instance addition js
          template:
            dest: "/tmp/add_instances.js"
            src: templates/add_instances.js.j2
            owner: "mysql"
            group: "mysql"
            mode: '0644'

        - name: Bootstrap the cluster (new deployment)
          command: mysqlsh -f /tmp/bootstrap_cluster.js

        - name: Wait to start
          wait_for:
            timeout: 5

        - name: Add the other instances
          command: mysqlsh -f /tmp/add_instances.js
          when: inventory_hostname == groups['dbservers'][0]
</code>
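Before (or after) letting the playbook bootstrap everything, you can pre-flight a node by hand from mysqlsh. A minimal sketch in JS mode; ''icadmin'' and the host are stand-ins for whatever your ''cluster_user'' variable and inventory hold:

<code js>
// Pre-flight check: does this server meet Group Replication requirements?
// "icadmin" is a placeholder for the cluster admin user that
// dba.configureInstance() creates in the bootstrap template below;
// mysqlsh will prompt for the password.
var result = dba.checkInstanceConfiguration('icadmin@mysql-cluster-a:3306');

// result is a dictionary: {"status": "ok"} when the node is ready;
// otherwise it carries a config_errors list describing what to fix.
print(result.status + '\n');
</code>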
Congrats, now we have the cluster running:

<code>
 MySQL  10.14.2.193:33060+ ssl  JS > var cluster = dba.getCluster()
 MySQL  10.14.2.193:33060+ ssl  JS > cluster.status();
{
    "clusterName": "InnoDBCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-cluster-a:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-cluster-a:3306": {
                "address": "mysql-cluster-a:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-b:3306": {
                "address": "mysql-cluster-b:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-c:3306": {
                "address": "mysql-cluster-c:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-cluster-a:3306"
}
</code>
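The AdminAPI view can also be cross-checked at the SQL level, since Group Replication publishes its membership in ''performance_schema''. A minimal sketch, assuming a classic (port 3306) session in JS mode:

<code js>
// Cross-check membership straight from Group Replication's own view,
// independently of the InnoDB Cluster metadata.
var res = session.runSql(
    "SELECT member_host, member_state, member_role " +
    "FROM performance_schema.replication_group_members");

// Each row should show ONLINE and the expected PRIMARY/SECONDARY role.
var row;
while ((row = res.fetchOne())) {
    print(row[0] + '\t' + row[1] + '\t' + row[2] + '\n');
}
</code>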
=====Templates=====

Because we want to look (only to look like, being one is boring) professional, here are the template files:

<code ini>
# templates/my.cnf.j2
[mysqld]
server_id={{ mysql_server_id.stdout }}
binlog_checksum=NONE
gtid_mode=ON
enforce_gtid_consistency=ON
log_bin
log_slave_updates=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
skip-host-cache
skip-name-resolve
binlog_transaction_dependency_tracking=WRITESET
</code>

<code js>
// templates/bootstrap_cluster.js.j2
var root_password = "{{ root_password }}";
var cluster_user = "{{ cluster_user }}";
var cluster_user_password = "{{ cluster_user_password }}";

try {
    // Prepare this instance for Group Replication and create the
    // cluster admin user; restart applies the changed settings.
    dba.configureInstance('root@localhost', {
        password: `${root_password}`,
        restart: true,
        interactive: false,
        clusterAdmin: `${cluster_user}`,
        clusterAdminPassword: `${cluster_user_password}`
    });
} catch (e) {
    print('\nThe InnoDB cluster admin user could not be created\n' + e.message + '\n');
}
</code>

<code js>
// templates/add_instances.js.j2
var cluster_user = "{{ cluster_user }}";
var cluster_user_password = "{{ cluster_user_password }}";
var clusterName = "{{ cluster_name }}";
// Render the Ansible list as a comma-separated string; parsing the
// Python repr of the list is fragile (stripping u'' prefixes would
// also eat the letter "u" in hostnames such as "mysql-cluster-b").
var secondary_nodes = "{{ secondary_nodes | join(',') }}".split(',');

try {
    print('Adding instances...\n');
    shell.connect(`${cluster_user}@{{ primary_node }}`, cluster_user_password);
    var cluster = dba.createCluster(clusterName);
    for (var i = 0; i < secondary_nodes.length; i++) {
        cluster.addInstance(`${cluster_user}@${secondary_nodes[i]}`, {
            password: `${cluster_user_password}`,
            recoveryMethod: "clone"
        });
    }
} catch (e) {
    print('\nThe InnoDB cluster could not be created.\n\nError: ' + e.message + '\n');
}
</code>

=====Management=====

It is all cool and stuff, but let's go through several scenarios with our cluster:

  - Loss of a member
  - Backup
  - Restore

First things first: what do we do if the connection between the primary and a secondary fails? Well, we have to rejoin it; let's see how.

====Loss of a member====

To simulate that, we will stop the service: someone stopped our service :-).

<code>
[root@mysql-cluster-b ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service
[root@mysql-cluster-b ~]#
</code>

Our cluster could tolerate one failure; now that a member is down, it cannot tolerate any more:

<code>
 MySQL  10.14.2.193:33060+ ssl  JS > var cluster = dba.getCluster();
 MySQL  10.14.2.193:33060+ ssl  JS > cluster.status();
{
    "clusterName": "InnoDBCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-cluster-a:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE_PARTIAL",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
        "topology": {
            "mysql-cluster-a:3306": {
                "address": "mysql-cluster-a:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-b:3306": {
                "address": "mysql-cluster-b:3306",
                "memberRole": "SECONDARY",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003: Could not open connection to 'mysql-cluster-b:3306': Can't connect to MySQL server on 'mysql-cluster-b:3306' (111)",
                "status": "(MISSING)"
            },
            "mysql-cluster-c:3306": {
                "address": "mysql-cluster-c:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-cluster-a:3306"
}
</code>

We see that the node is down. So let's simulate a restoration of the connection:

<code>
[root@mysql-cluster-b ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
[root@mysql-cluster-b ~]#
</code>

Now we can check again:

<code>
{
    "clusterName": "InnoDBCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-cluster-a:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-cluster-a:3306": {
                "address": "mysql-cluster-a:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-b:3306": {
                "address": "mysql-cluster-b:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            },
            "mysql-cluster-c:3306": {
                "address": "mysql-cluster-c:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.37"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-cluster-a:3306"
}
</code>

Now we see that it is working again. If for some reason a member doesn't rejoin on its own, we can try to force it using:

<code js>
cluster.rejoinInstance(instance)
cluster.rescan()
</code>

It is important to note that if the server's UUID has changed, we have to remove the instance from the metadata before adding it back:

<code js>
cluster.removeInstance("root@instanceWithOldUUID:3306", {force: true})
cluster.rescan()
</code>
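Tying this together, the whole check-and-rejoin dance can be scripted from mysqlsh. A minimal sketch, assuming the failed members are reachable again:

<code js>
// Walk the status report and try to bring back every member
// that is not ONLINE.
var cluster = dba.getCluster();
var topology = cluster.status()["defaultReplicaSet"]["topology"];

for (var address in topology) {
    if (topology[address]["status"] != "ONLINE") {
        try {
            print("Rejoining " + address + "...\n");
            cluster.rejoinInstance(address);   // assumes mysqld is back up there
        } catch (e) {
            print(address + " could not rejoin: " + e.message + "\n");
        }
    }
}

// Reconcile the metadata afterwards (e.g. after UUID changes).
cluster.rescan();
</code>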
====Backup====

====Restore====
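Spoiler until I write these two up: MySQL Shell ships dump-and-load utilities that cover both. A minimal sketch only, not a full procedure; the path and thread count are assumptions, and ''util.loadDump()'' requires ''local_infile=ON'' on the target server:

<code js>
// Backup: logical dump of the whole instance, taken on the primary.
// "/backups/full" is an assumed path; any directory writable by mysqlsh works.
util.dumpInstance("/backups/full", {threads: 4});

// Restore: load the dump into a freshly initialized server, then
// rebuild the cluster around it (configureInstance / createCluster /
// addInstance as above). Requires local_infile=ON on the target.
util.loadDump("/backups/full", {threads: 4});
</code>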