Making my MySQL InnoDB Cluster safe from naughtiness

TL;DR: Make sure to run "SET persist_only disabled_storage_engines='MyISAM', persist sql_generate_invisible_primary_key=ON;" on all instances in your MySQL InnoDB Cluster and then restart each one.

Ok, what does "safe from naughtiness" mean?
– Stopping anyone from creating tables that aren't InnoDB; after all, it is an "InnoDB" cluster.
– Making sure all tables have a Primary Key (invisible or not; see the quick sketch below).
– Making sure that my (invisible) primary keys are visible to the cluster, as it will rightfully complain if they aren't!
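
As a quick sketch of those last two points: with GIPK enabled, a table created without a Primary Key silently gets an invisible one. Schema and table names here are hypothetical, just for illustration:

-- Assumes sql_generate_invisible_primary_key=ON and a throwaway schema:
CREATE TABLE test.no_pk (c1 INT, c2 VARCHAR(10));
SHOW CREATE TABLE test.no_pk;
-- With show_gipk_in_create_table_and_information_schema=ON, the output
-- should include something like:
--   `my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
--   PRIMARY KEY (`my_row_id`)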

This basically means that once you've got it all up and running, you won't run into that horrible situation whereby someone, somewhere, creates a MyISAM table without a Primary Key and leaves you with a broken cluster.

E.g.:

MySQL rtnode-01:3306 ssl JS > vlc.status()
{
 "clusterName": "VLC",
 "clusterRole": "PRIMARY",
 "defaultReplicaSet": {
   "name": "default",
   "primary": "dbnode-01:3306",
   "ssl": "REQUIRED",
   "status": "OK_NO_TOLERANCE_PARTIAL",
   "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
   "topology": {
      "dbnode-01:3306": {
        "address": "dbnode-01:3306",
        "memberRole": "PRIMARY",
        "mode": "R/W",
        "readReplicas": {},
        "role": "HA",
        "status": "ONLINE",
        "version": "8.0.36"
        },
      "dbnode-02:3306": {
        "address": "dbnode-02:3306",
        "instanceErrors": [
"ERROR: applier thread of Group Replication Applier channel stopped with an error: Worker 1 failed executing transaction 'a4144720-d7c2-11ee-bd73-0050569c0136:1760879'; Error executing row event: 'Table 'myschema.mytab' doesn't exist' (1146) at 2024-05-15 07:37:18.734720",
"ERROR: group_replication has stopped with an error."
        ],
        "memberRole": "SECONDARY",
        "memberState": "ERROR",
        "mode": "n/a",
        "readReplicas": {},
        "role": "HA",
        "status": "(MISSING)",
        "version": "8.0.36"
      },
      "dbnode-03:3306": {
        "address": "dbnode-03:3306",
        "memberRole": "SECONDARY",
        "mode": "R/O",
        "readReplicas": {},
        "replicationLagFromImmediateSource": "",
        "replicationLagFromOriginalSource": "",
        "role": "HA",
        "status": "ONLINE",
        "version": "8.0.36"
      }
    },
    "topologyMode": "Single-Primary"
  },
  "domainName": "csVLC",
  "groupInformationSourceMember": "dbnode-01:3306"
}
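
By the way, the same picture is visible from plain SQL on any member, straight out of Group Replication's own tables:

SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;
-- dbnode-02 would show MEMBER_STATE = ERROR here (or vanish from the
-- list entirely, depending on where exactly it failed).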

When this happens in Dev it's not a biggie, but when production starts complaining, the words "proverbial", "fan" and "hit" come to mind. 😦

So,

  • Worst-case scenario: we have to remove the instance and re-provision it.
  • Not-so-bad scenario: we can make the changes that are missing (create the table with a Primary Key and ENGINE=InnoDB) so Group Replication continues applying the pending transactions.

I thought I'd go evil / nuclear first, before edging gently down the negativity slope, so we all understand what's entailed in scrapping an instance. Troubleshooting, GTID-mining the binlogs and bringing the cluster up to the latest GTID after fixing the instance's issues is time consuming too. With this nuclear perspective, we know what to expect, and we're not harming anyone, really.

Again, you should balance the trade-off between troubleshooting time and the time it takes to copy the instance's data from the Primary to the naughty node.

First remove the naughty instance from the cluster, then stop and wipe it on the naughty node itself:

# either remove it explicitly:
mysqlsh --login-path=icadmin -hrtnode-01 -- cluster remove-instance icadmin@dbnode-02:3306
# or let a rescan find it and prompt you interactively:
mysqlsh --login-path=icadmin -hrtnode-01 -- cluster rescan

systemctl stop mysqld
rm -rf /etc/my.cnf*
rm -rf /var/lib/mysql/*
rm -rf /var/log/mysql*.log

I disabled auto-updates on the MySQL yum repo, so for a reinstall, I’ll have to undo this:

vi + /etc/yum.conf
#exclude=mysql*

yum remove -y mysql-community-server-0:8.0.36-1.el8.x86_64
yum install -y mysql-community-server-0:8.0.36-1.el8.x86_64

vi + /etc/yum.conf
exclude=mysql*
systemctl start mysqld
systemctl enable mysqld
pswd=$(grep -oP '(?<= A temporary password is generated for root@localhost: ).*' /var/log/mysqld.log | tail -1)
mysql -uroot -p"$pswd" -S /var/lib/mysql/mysql.sock
 alter user 'root'@'localhost' identified by 'Contr4sen!A';
 flush privileges;
 SET sql_log_bin = OFF;
 create user 'icadmin'@'localhost' identified by 'Contr4sen!A';
 grant all on *.* to 'icadmin'@'localhost' with grant option;
 create user 'icadmin'@'%' identified by 'Contr4sen!A';
 grant all on *.* to 'icadmin'@'%' with grant option;
 flush privileges;
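
And a quick sanity check that both accounts exist with the grants we expect (nothing fancy, just eyeballing):

SELECT user, host FROM mysql.user WHERE user = 'icadmin';
SHOW GRANTS FOR 'icadmin'@'%';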

Here's where I went wrong before, so hopefully someone can learn from my mistake. These settings were configured on the other nodes but slipped by on dbnode-02:

SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;

SET persist_only disabled_storage_engines='MyISAM', persist sql_generate_invisible_primary_key=ON;

SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;

I'm not going to execute RESTART; here, as I have to run configure-local-instance in a few steps' time, which will do that for me. But remember: persist_only means the variable only takes effect upon the next startup.
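
If you want to double-check what's been queued up for the next startup, the persisted settings are visible in performance_schema:

SELECT * FROM performance_schema.persisted_variables;
-- Expect disabled_storage_engines and sql_generate_invisible_primary_key
-- to be listed here once persisted.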

Security time:

mysql_config_editor reset
mysql_config_editor set --login-path=icadmin --host=% --user=icadmin --password
<you enter the password here>

Prepare the instance for Group Replication / InnoDB Cluster:

mysqlsh --login-path=icadmin -hlocalhost -- dba check-instance-configuration
mysqlsh --login-path=icadmin -hlocalhost -- dba configure-local-instance --restart=true --interactive=FALSE
mysqlsh --login-path=icadmin -hlocalhost -- dba check-instance-configuration
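
If you're curious about what configure-local-instance actually touched, the usual Group Replication prerequisites can be eyeballed directly (a non-exhaustive sketch; the exact checklist varies by version):

SELECT @@server_id, @@gtid_mode, @@enforce_gtid_consistency,
       @@log_bin, @@log_replica_updates, @@binlog_checksum;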

Clone / add the instance from the Primary’s data:

mysqlsh --login-path=icadmin -hrtnode-01 -- cluster status --extended=0
mysqlsh --login-path=icadmin -hrtnode-01 -- cluster add-instance icadmin@dbnode-02:3306 --recoveryMethod=clone
mysqlsh --login-path=icadmin -hrtnode-01 -- cluster status --extended=0
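
Cloning a large dataset can take a while; on the joining node, the clone plugin exposes its progress in performance_schema, so something like this lets you watch it:

SELECT STAGE, STATE, BEGIN_TIME, END_TIME
FROM performance_schema.clone_progress;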

Now double-check that all instances have the same settings w.r.t. GIPK and disabled storage engines:

mysqlsh --login-path=icadmin -hdbnode-01 -E --sqlc -e "SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;"
mysqlsh --login-path=icadmin -hdbnode-02 -E --sqlc -e "SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;"
mysqlsh --login-path=icadmin -hdbnode-03 -E --sqlc -e "SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;"

You should see something like:

*************************** 1. row ***************************
@@sql_generate_invisible_primary_key: 1
@@show_gipk_in_create_table_and_information_schema: 1
@@disabled_storage_engines: MyISAM

Sorry MyISAM, your name's not down, you're not coming in!
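
For the record, here's the kind of refusal to expect once the settings are live and the instance restarted (hypothetical table, obviously):

CREATE TABLE test.old_school (c1 INT) ENGINE=MyISAM;
-- ERROR 3161 (HY000): Storage engine MyISAM is disabled
-- (Table creation is disallowed).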

Ok, nuclear done and dusted; what about making some changes on a semi-naughty node instead?

We will first need to scan the binlogs on the node for the GTID that’s mentioned in performance_schema.replication_applier_status_by_worker’s last_error_message:

select CHANNEL_NAME, WORKER_ID, THREAD_ID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP, LAST_APPLIED_TRANSACTION, APPLYING_TRANSACTION, APPLYING_TRANSACTION_RETRIES_COUNT
from performance_schema.replication_applier_status_by_worker
order by CHANNEL_NAME, WORKER_ID, THREAD_ID \G

*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                         WORKER_ID: 1
                         THREAD_ID: NULL
                     SERVICE_STATE: OFF
                 LAST_ERROR_NUMBER: 1146
                LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'a4144720-d7c2-11ee-bd73-0050569c0136:1760879'; Error executing row event: 'Table 'myschema.mytab' doesn't exist'
              LAST_ERROR_TIMESTAMP: 2024-05-15 07:37:18.734720
          LAST_APPLIED_TRANSACTION: a4144720-d7c2-11ee-bd73-0050569c0136:1760878
              APPLYING_TRANSACTION: a4144720-d7c2-11ee-bd73-0050569c0136:1760879
APPLYING_TRANSACTION_RETRIES_COUNT: 0
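
To see exactly which transactions the node has received but not yet applied, GTID_SUBTRACT() does the set arithmetic for us, using the received set from the connection status table:

SELECT GTID_SUBTRACT(
         (SELECT RECEIVED_TRANSACTION_SET
            FROM performance_schema.replication_connection_status
           WHERE CHANNEL_NAME = 'group_replication_applier'),
         @@global.gtid_executed) AS not_yet_applied;
-- Ideally this returns just the one GTID from the error above.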

Look into the binlogs to make sure that’s the only thing missing:

mysqlbinlog --base64-output=decode-rows --include-gtids='a4144720-d7c2-11ee-bd73-0050569c0136:1760879' dbnode-02-relay-bin-group_replication_applier.000009 > gr_error.log
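
If you'd rather not leave the server, SHOW RELAYLOG EVENTS can walk the same relay log from SQL (the log name and LIMIT here are just examples):

SHOW RELAYLOG EVENTS
  IN 'dbnode-02-relay-bin-group_replication_applier.000009'
  LIMIT 10
  FOR CHANNEL 'group_replication_applier';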

Persist the settings on the current naughty node, RESTART THE NODE so they take effect, then make it writable, make the missing changes, and lock it down again:
(You do this at your own risk; remember why I went nuclear from t=0? You're potentially playing with the GTID sequence here, so be careful.)

SELECT @@global.read_only, @@global.super_read_only, @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;
SET persist_only disabled_storage_engines='MyISAM', persist sql_generate_invisible_primary_key=ON;
RESTART;
-- reconnect: the persist_only setting only takes effect now, and any
-- SET GLOBAL done before the restart would not have survived it anyway.
SELECT @@global.read_only, @@global.super_read_only, @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema, @@disabled_storage_engines;
SET GLOBAL read_only = 0;
SET GLOBAL super_read_only = 0;
create table myschema.mytab … engine=InnoDB … ;
SET GLOBAL read_only = 1;
SET GLOBAL super_read_only = 1;
SELECT @@global.read_only, @@global.super_read_only;
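
One extra precaution, at your own risk: if you want the fix to stay strictly local and not generate a new GTID on this node (which is exactly the sequence-tampering the warning above is about), you can switch off binary logging for your session while creating the table:

SET SESSION sql_log_bin = OFF;
-- Use the exact definition from the Primary (SHOW CREATE TABLE it there):
CREATE TABLE myschema.mytab ( … ) ENGINE=InnoDB;
SET SESSION sql_log_bin = ON;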

And then see if an addInstance() allows you to bring the instance up to speed by applying the pending transactions. If it doesn't, then "GOTO 10" and be nuclear, my friend.

PS/ I opened Bug #114685: "Let InnoDB Cluster configure GIPK=ON and disabled_storage_engines='MyISAM'", because if we can use Shell to configure the InnoDB Cluster to work, it would be even better if it were protected in this sense. Maybe via an additional option such as "mysqlsh icadmin@localhost:3306 -- dba configure-local-instance --restart=true --interactive=FALSE --enforce=DisableMyISAM,GIPK_On".

As always, I know it’s not rocket science, but I do hope I’ve helped someone out there.
