InnoDB Cluster: setting up Production… for disaster! (2/2)

Ok, so now we’ve got our InnoDB Cluster a-clustering and MySQL Router a-routing; now we need some disaster to be a-disaster-recovering…

A foreword first.

If you’re looking to use Enterprise Backup to recover a single node and restore it back into an existing InnoDB Cluster, LeFred takes you through that one nicely here.

Preparing for backup

On our single primary server, the one that allows writes, which was ic2/10.0.0.12 in my case:

mysql -uroot -poracle << EOF 
SET sql_log_bin = OFF; 
 create user 'backup'@'%' identified by 'oracle';
 grant all on *.* to 'backup'@'%';
SET sql_log_bin = ON; 
EOF

Let’s create something to backup (if you haven’t already done so of course):

mysqlsh --uri root:oracle@localhost:3306 --sql

create database nexus;

Let’s run a full backup to start with:

mysqlbackup -ubackup -poracle --socket=/var/lib/mysql/mysql.sock \
--backup-dir=/home/vagrant/backup/full \
--backup-image=full_backup.img  backup-to-image

ls /home/vagrant/backup/full/full_backup.img

mysqlbackup --backup-image=/home/vagrant/backup/full/full_backup.img validate

Ok, everything checks out.

Enterprise Backup needs to create two tables that use the CSV storage engine (mysql.backup_history & mysql.backup_progress), which is why we commented out the disabled_storage_engines parameter in the my.cnf before. If we hadn’t commented it out, there would be warnings saying that these tables can’t be created, and we’d have to drive the incremental backup with end_lsn instead of “history:last_backup”, which is a more manual process.
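If you do leave disabled_storage_engines in place, the incremental has to be driven by LSN instead. A rough sketch of that manual route, assuming MEB’s meta/backup_variables.txt layout (the file contents and paths here are made up for illustration, and the real mysqlbackup call is left commented out):

```shell
# Sketch: manual incremental backups without the backup_history table.
# Parse end_lsn from the previous backup's meta/backup_variables.txt
# (illustrative contents below) and feed it to --start-lsn.
PREV=/tmp/demo_prev_backup
mkdir -p "$PREV/meta"
cat > "$PREV/meta/backup_variables.txt" <<'EOF'
start_lsn=2654208
end_lsn=2718351
EOF

END_LSN=$(awk -F= '/^end_lsn/ { print $2 }' "$PREV/meta/backup_variables.txt")
echo "next incremental starts at LSN $END_LSN"

# The real run would then look something like (not executed here):
# mysqlbackup -ubackup -poracle --socket=/var/lib/mysql/mysql.sock \
#   --incremental --start-lsn=$END_LSN \
#   --backup-dir=/home/vagrant/backup/incre2 \
#   --backup-image=incr_backup2.mbi backup-to-image
```

With history:last_backup, mysqlbackup looks this value up for you in mysql.backup_history, which is the whole point of allowing those CSV tables.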

Let’s create something for our incremental backup to pick up:

create table nexus.replicant (
id INT(11) NOT NULL AUTO_INCREMENT ,
`First name` varchar(40) not null default '',
`Last name` varchar(40) not null default '',
`Replicant` enum('Yes','No') not null default 'Yes',
PRIMARY KEY (id)
) engine=InnoDB row_format=COMPACT;

Let’s back it up:

mysqlbackup --user=backup -poracle --socket=/var/lib/mysql/mysql.sock \
 --incremental=optimistic \
--incremental-base=history:last_backup \
 --backup-dir=/home/vagrant/backup/incre \
 --backup-image=incr_backup1.mbi  backup-to-image

And validate it too:

mysqlbackup --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi validate

 

Disaster Time

On all 3 nodes, it’s time to create a disaster:

sudo -i
systemctl stop mysqld.service
rm -rf /var/lib/mysql/*
rm -f /var/log/mysqld.log

 

Now, on ic2, where we took the backup (from the primary), restore the full backup:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/full/full_backup.img \
 --backup-dir=/tmp/mysql_backup_pitr --datadir=/var/lib/mysql \
 --with-timestamp copy-back-and-apply-log

And then, restore the incremental:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi \
 --incremental-backup-dir=/tmp/mysql_backup_pitr_incr \
 --datadir=/var/lib/mysql --incremental --with-timestamp \
 copy-back-and-apply-log

 

Still on ic2 (from where the backup was run), and as the o.s. root user:

chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service


mysql -uroot -poracle << EOF
SET sql_log_bin = OFF; 
reset master;
# How To Manually Remove InnoDB Cluster Metadata Directory (Doc ID 2274320.1)
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON; 
EOF

We do the previous step because the restored instance still carries the old InnoDB Cluster metadata even though the cluster itself has been lost; to return to a safe InnoDB Cluster environment, we drop the metadata schema and rebuild. The slight difference from a fresh setup is that we still have the /etc/my.cnf file.

Now to configure our InnoDB Cluster again:

mysqlsh --uri ic:oracle@10.0.0.12:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.12:3306')
var cluster = dba.createCluster( 'myCluster');
cluster.status();

Now that we have 1 instance back, we can reuse the same backups (full & incremental) to restore the other 2 nodes. Run the same commands as before, on both ic1 & ic3:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/full/full_backup.img \
 --backup-dir=/tmp/mysql_backup_pitr --datadir=/var/lib/mysql \
 --with-timestamp copy-back-and-apply-log

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi \
 --incremental-backup-dir=/tmp/mysql_backup_pitr_incr \
 --datadir=/var/lib/mysql --incremental --with-timestamp \
 copy-back-and-apply-log

Now for some node-specific steps.

On ic1:

cd /var/lib/mysql
rm ic2-bin*

chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service

mysql -uroot -poracle << EOF
SET sql_log_bin = OFF; 
reset master;
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON; 
EOF

mysqlsh --uri ic:oracle@10.0.0.11:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.11:3306')

And on ic3:

cd /var/lib/mysql
rm ic2-bin*

chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service

mysql -uroot -poracle << EOF
SET sql_log_bin = OFF; 
reset master;
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON; 
EOF

mysqlsh --uri ic:oracle@10.0.0.13:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.13:3306')

Now, of course, this seems like a lot of manual work. Alternatively, we could simply have taken a fresh full backup from the restored & reestablished ic2, and then used “full/meta/backup_gtid_executed.sql” to prime the other instances before starting them in the cluster. Doing it as above, though, lets us reuse the original full and incremental backups.
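For reference, the backup_gtid_executed.sql file MEB produces is tiny: essentially one SET @@GLOBAL.gtid_purged statement recording the GTIDs the backup covers. A sketch of that alternative flow, with made-up GTID values (the real file sits under the backup’s meta directory):

```shell
# Sketch: priming a restored node with the backup's GTID set.
# MEB writes meta/backup_gtid_executed.sql alongside the backup;
# its contents look roughly like this (GTID values are illustrative):
META=/tmp/demo_meta
mkdir -p "$META"
cat > "$META/backup_gtid_executed.sql" <<'EOF'
-- GTID state at the end of the backup
SET @@GLOBAL.gtid_purged='8b2c4d6e-0000-1111-2222-333344445555:1-42';
EOF
cat "$META/backup_gtid_executed.sql"

# After restoring the full backup on ic1/ic3 we'd reset and replay it
# (not executed here, needs a live server):
# mysql -uroot -poracle -e "RESET MASTER; SOURCE $META/backup_gtid_executed.sql;"
```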

Back on ic2:

mysqlsh --uri ic:oracle@10.0.0.12:3306

var cluster = dba.getCluster('myCluster')
cluster.addInstance('ic@10.0.0.11:3306')
cluster.addInstance('ic@10.0.0.13:3306')
cluster.status()

We’re back in business.

 


InnoDB Cluster: setting up Production… for disaster! (1/2)

Want to setup InnoDB Cluster and be prepared for a Disaster Recovery scenario? Get ready:

Here’s a way to set up InnoDB Cluster across 3 servers, on Oracle Linux 7.2, with MySQL Commercial Server 5.7.19, MySQL Shell 8.0.3 DMR and MySQL Router. As this is the first blog post of a complete disaster recovery scenario for InnoDB Cluster, we’ll also be installing MySQL Enterprise Backup.

If you’re new to InnoDB Cluster then I’d highly recommend looking at the following to understand how it works and what Group Replication, Shell & Router are:

So, to start installing, we’ll need to include a few things first:

  • Python is a must, if it’s not already installed, so:
sudo yum install python
  • And numa awareness is needed for 5.7 MySQL Server:
sudo yum install numactl

As I’m installing the Enterprise software (I’ll need Enterprise Backup), I’ve previously downloaded the rpm’s for OLinux 7.2 from http://edelivery.oracle.com and left them in a directory on each of the 3 servers I’m installing on.

Therefore I can run:

sudo yum install -y mysql-commercial-common-5.7.19-1.1.el7.x86_64.rpm \
mysql-commercial-client-5.7.19-1.1.el7.x86_64.rpm \
mysql-commercial-libs-5.7.19-1.1.el7.x86_64.rpm \
mysql-commercial-server-5.7.19-1.1.el7.x86_64.rpm \
mysql-router-commercial-2.1.4-1.1.el7.x86_64.rpm

And, as I downloaded the following from http://www.mysql.com/downloads:

sudo yum install -y mysql-shell-8.0.3-0.1.dmr.el7.x86_64.rpm \
meb-4.1.0-el7.x86_64.rpm

Remember, this is done on all 3 servers (in my case ic1, ic2 & ic3 / 10.0.0.11, 10.0.0.12 & 10.0.0.13).

Start the mysql servers. As this is the first time, the data directory is initialized on this first start, and a randomly generated root password is created (we’ll need it shortly).

sudo systemctl start mysqld.service

Enable the service to automatically start on server reboot:

sudo systemctl enable mysqld.service

And, in my scenario, as you will see, it’s really not a Production environment, so I’m going to take it easy with my password & security convention. I would NOT recommend this for your production environment.

Insert the following at the end of the config file, to avoid having to create complex passwords:

sudo vi + /etc/my.cnf
 ..
 ..
 validate-password=OFF

Then restart mysql so the parameter is taken:

sudo systemctl restart mysqld.service

Change the root password (if you really want to):

sudo grep 'A temporary password is generated for root@localhost' /var/log/mysqld.log |tail -1
mysql -uroot -p

SET sql_log_bin = OFF; 
alter user 'root'@'localhost' identified by 'oracle';
SET sql_log_bin = ON; 
exit

I don’t want these changes being replicated across the nodes, because I’m actually doing this on all servers, hence I use “SET sql_log_bin = OFF” so the changes aren’t reflected in the bin logs.

Now, change to root and run MySQL Shell to start configuring the InnoDB Cluster environment:

sudo -i
mysqlsh
MySQL JS > dba.configureLocalInstance();
MySQL JS > \q

This is where we get asked a couple of questions:

Detecting the configuration file...
Found configuration file at standard location: /etc/my.cnf

Do you want to modify this file? [Y|n]: [Y|n]: 
MySQL user 'root' cannot be verified to have access to other hosts in the network.

1) Create root@% with necessary grants
2) Create account with different name
3) Continue without creating account
4) Cancel

Please select an option [1]: 2
Please provide an account name (e.g: icroot@%) to have it created with the necessary
privileges or leave empty and press Enter to cancel.

Account Name: ic
Password for new account: ******
Confirm password: ******
Validating instance...

As you can see, I chose option 2, to create the user ic/oracle for my cluster. You can choose the name you wish here.

It’s quite normal for the output to tell you to restart the mysql server for the changes in the config to take place. We’ll do it after the next step.

Now, MEB needs a couple of CSV-engine tables to run the full and incremental backups (mysql.backup_history & mysql.backup_progress). We want MEB to run automatically rather than having to script the incremental backup by hand (using end_lsn), we want those tables available to monitor from Enterprise Monitor, and we want to reduce the warnings produced on each backup run. So we’ll comment out the “disabled_storage_engines” entry in the config file:

sudo vi + /etc/my.cnf
 ..
 ..
 #disabled_storage_engines = MyISAM,BLACKHOLE,FEDERATED,CSV,ARCHIVE
 ..
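If you’d rather script that change than edit it with vi on each node, a sed one-liner does the same job. Sketched here against a scratch copy rather than the real /etc/my.cnf:

```shell
# Sketch: comment out disabled_storage_engines non-interactively.
# Demonstrated on a scratch copy; point it at /etc/my.cnf for real.
CNF=/tmp/demo_my.cnf
cat > "$CNF" <<'EOF'
[mysqld]
datadir=/var/lib/mysql
disabled_storage_engines = MyISAM,BLACKHOLE,FEDERATED,CSV,ARCHIVE
EOF

sed -i 's/^disabled_storage_engines/#disabled_storage_engines/' "$CNF"
grep '^#disabled_storage_engines' "$CNF"
```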

And restart (remember, we’re still doing this on all 3 nodes):

systemctl restart mysqld.service
mysqlsh
 dba.checkInstanceConfiguration('ic:oracle@10.0.0.12:3306');

We should get an “OK” here. Older versions of Shell might tell you otherwise.

Once the mysql server has restarted, run the following. Specific privileges are required for our “ic” user:

mysql -uroot -poracle << EOF 
SET sql_log_bin = OFF; 
 GRANT SELECT ON *.* TO ic@'%' WITH GRANT OPTION;
 FLUSH PRIVILEGES;
SET sql_log_bin = ON; 
EOF

So, everything we’ve done up until now, has been done on all 3 OL servers.

Now, on just one server, doesn’t matter which, let’s create our InnoDB Cluster:

mysqlsh --uri ic:oracle@10.0.0.12:3306

var cluster = dba.createCluster( 'myCluster');

cluster.status();

Now we’ve got 1 instance in our cluster, let’s add the other 2 that we’ve configured at the same time as this one.

cluster.addInstance('ic:oracle@10.0.0.11:3306');
cluster.addInstance('ic:oracle@10.0.0.13:3306');

Remember, by default we’re in single primary mode, i.e. there will only be 1 RW server; the other 2, the ones we just added, will be in SUPER READ ONLY mode, i.e. no writes allowed. If you want to go to multi-primary mode, that’s done with a different createCluster() option.

Now to check that all 3 are in the same cluster:

cluster.status();

Ok, a slight word of warning (in case you haven’t paid attention to the documentation referenced at the beginning): if you come out of the mysqlsh session and want to go back in and get the cluster status, you’ll need to run this first:

var cluster = dba.getCluster("myCluster")

Otherwise shell will just complain saying that it doesn’t understand what “cluster.” is.

And so, after that, we should see something like the following:

{
   "clusterName": "myCluster", 
   "defaultReplicaSet": {
       "name": "default", 
       "primary": "10.0.0.12:3306", 
       "ssl": "REQUIRED", 
       "status": "OK", 
       "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
       "topology": {
           "10.0.0.11:3306": {
               "address": "10.0.0.11:3306", 
               "mode": "R/O", 
               "readReplicas": {}, 
               "role": "HA", 
               "status": "ONLINE"
           }, 
           "10.0.0.12:3306": {
               "address": "10.0.0.12:3306", 
               "mode": "R/W", 
               "readReplicas": {}, 
               "role": "HA", 
               "status": "ONLINE"
           }, 
           "10.0.0.13:3306": {
               "address": "10.0.0.13:3306", 
               "mode": "R/O", 
               "readReplicas": {}, 
               "role": "HA", 
               "status": "ONLINE"
           }
   }
 }
}

Looking good! Before we jump for joy, we have to make the changes persistent in the my.cnf. This has to be done locally on each machine:

On ic2:

mysqlsh --uri ic:oracle@10.0.0.12:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

On ic1:
Say Y when it asks you to override SUPER READ ONLY mode, otherwise no changes will occur.

mysqlsh --uri ic:oracle@10.0.0.11:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

On ic3:
Again, say Y when it asks you to override SUPER READ ONLY mode, otherwise no changes will occur.

mysqlsh --uri ic:oracle@10.0.0.13:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

And, now, just to make sure, back on the master server, or even from any of the 3 nodes:

mysqlsh --uri ic:oracle@10.0.0.12:3306
var cluster = dba.getCluster( 'myCluster');
cluster.status();

With our InnoDB Cluster up and running, all we need to do now is to let the application connect seamlessly using MySQL Router.

I’m installing on the same servers as the instances, but the normal setup would be to run each mysqlrouter process on the app server itself.

And to make things clearer, I’ll only install 2 mysqlrouters, on ic1 & ic3:

On ic1:

sudo mkdir /opt/mysql/router
sudo chown -R mysql:mysql /opt/mysql/router
mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --user=mysql 
/opt/mysql/router/start.sh

On ic3:

sudo mkdir /opt/mysql/router
sudo chown -R mysql:mysql /opt/mysql/router
mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --user=mysql --force
/opt/mysql/router/start.sh

As we’re on the same servers as the MySQL instance, we could even forget tcp/ip access and use sockets:

mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --conf-use-sockets

In fact, maybe installing a 3rd mysqlrouter process away from the app servers isn’t a bad idea: the connect string from the app could choose the IPs & sockets of the first 2 mysqlrouters, and if they fail, there’s a fallback mysqlrouter connection.

So, now they’re configured, let’s test them. Try this on ic3 (or ic1):

mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname"
mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname"
mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname"

Remember, by default, port 6446 is for RW connections and port 6447 is for read-only:

mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname"
mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname"
mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname"

# And on ic1 or ic3:

mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname"
mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname"
mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname"

mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname"
mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname"
mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname"

Obviously, you can personalize mysqlrouter. Let’s read from all 3 instances, independently of which is primary or secondary in the metadata cache:

vi /opt/mysql/router/mysqlrouter.conf

..
[metadata_cache:myCluster]
router_id=2
bootstrap_server_addresses=mysql://10.0.0.12:3306,mysql://10.0.0.11:3306,mysql://10.0.0.13:3306
user=mysql_router2_r7wlgz3p4fct
metadata_cluster=myCluster
ttl=300
#ttl=30
..
[routing:myCluster_default_ro]
bind_address=0.0.0.0
bind_port=6447
#destinations=metadata-cache://myCluster/default?role=SECONDARY
destinations=ic1,ic2,ic3
mode=read-only
protocol=classic
..

Thanks for reading. Hopefully this has helped someone…

…. Next up, a complete disaster recovery situation …


MySQL Cluster in Docker: quick step by step

I just wanted to share the ‘how to’ in getting to know MySQL Cluster a bit more, via Docker, as mentioned in the docker store install guide:

https://store.docker.com/community/images/mysql/mysql-cluster.

All I’m doing here is replicating what’s mentioned there. Hopefully someone might find it interesting if they’re considering getting a MySQL Cluster up and running to see how it works.

For more info on MySQL in Docker, I’d highly recommend looking at this blog:

http://datacharmer.blogspot.com/2015/10/mysql-docker-operations-part-1-getting.html

(Thanks to Giuseppe here!)

So, what’s needed? Here we go.

Get the cluster image from docker:

$ docker pull mysql/mysql-cluster

Create the private network we’ll need:

$ docker network create cluster --subnet=192.168.0.0/16

Start the management node:

$ docker run -d --net=cluster --name=management1 --ip=192.168.0.2 mysql/mysql-cluster ndb_mgmd

Start the datanodes:

$ docker run -d --net=cluster --name=ndb1 --ip=192.168.0.3 mysql/mysql-cluster ndbd
$ docker run -d --net=cluster --name=ndb2 --ip=192.168.0.4 mysql/mysql-cluster ndbd

And finally the MySQL server node:

$ docker run -d --net=cluster --name=mysql1 --ip=192.168.0.10 -e MYSQL_RANDOM_ROOT_PASSWORD=true mysql/mysql-cluster mysqld

Change the root password on the mysqld node:

$ docker logs mysql1 2>&1 | grep PASSWORD
$ docker exec -it mysql1 mysql -uroot -p
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass';

And last but not least, see how the cluster is running from the management node:

$ docker run -it --net=cluster mysql/mysql-cluster ndb_mgm

ndb_mgm> show

Obviously, now we can connect to the sqlnode and create a database using the NDB storage engine, and enter some rows in the tables and take it from there:

$ docker exec -it mysql1 mysql -uroot -p

mysql> create database nexus; use nexus;

mysql> create table replicant ... engine=NDBCLUSTER;

Hope this helps someone! Thanks for reading.


Taming a ‘wild’ NDB 7.3 with Cluster Manager 1.4.3 & direct upgrade to 7.5.

Well, after working with outdated clusters and upgrade paths that quickly become obsolete, as in my last post, Migrating/importing NDB to Cluster Manager w/ version upgrade, I wanted to share that we can also use Cluster Manager, mcm, to upgrade NDB Cluster directly from 7.3 to 7.5. That lets us start using new mcm features like autotune, which helps guide us towards some Cluster tuning, and 7.5 features like READ_BACKUP or FULLY_REPLICATED tables. Sometimes table comments can be so important…

So, as with the last scenario, a 7.3.8 ‘wild’ cluster. A reminder of what we had:

sn1:
 ndb_mgmd --configdir=/usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64/conf -f /usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64/conf/config.ini --config-cache=false --ndb-nodeid=1
dn1 & dn2:
 ndbmtd --ndb-nodeid=3 -c 10.0.0.10
 ndbmtd --ndb-nodeid=4 -c 10.0.0.10
sn1 & sn2:
 mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql --ndb-nodeid=10 &
 mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql --ndb-nodeid=11 &

Double-checking everything is up and running (as the ‘mysql’ os user… not root):

ndb_mgm -e show
 Connected to Management Server at: localhost:1186
 Cluster Configuration
 ---------------------
 [ndbd(NDB)] 2 node(s)
 id=3 @10.0.0.12 (mysql-5.6.22 ndb-7.3.8, Nodegroup: 0, *)
 id=4 @10.0.0.13 (mysql-5.6.22 ndb-7.3.8, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
 id=1 @10.0.0.10 (mysql-5.6.22 ndb-7.3.8)

[mysqld(API)] 5 node(s)
 id=10 @10.0.0.10 (mysql-5.6.22 ndb-7.3.8)
 id=11 @10.0.0.11 (mysql-5.6.22 ndb-7.3.8)
 id=12 (not connected, accepting connect from any host)
 id=13 (not connected, accepting connect from any host)
 id=14 (not connected, accepting connect from any host)

As before, the ‘mcmd’@’localhost’ user exists on both sqlnodes.

Time to change the PID files:

sqlnodes:

cd /opt/mysql/738/data
 more sn1.pid
 more sn2.pid

ps -ef | grep mysqld | grep -v grep | awk '{ print $2 }'

cp sn1.pid ndb_10.pid
cp sn2.pid ndb_11.pid

dnodes:

cd /opt/mysql/738/ndbd_data/
more *.pid

kill -9 6523
kill -9 2608

sed -i 's/6523/6524/' ndb_3.pid
sed -i 's/2608/2609/' ndb_4.pid

more *.pid
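The idea behind those sed lines: after killing the angel process, each .pid file has to be rewritten to hold the pid of the ndbmtd process that’s still running. A small sketch with dummy pid values (on a real host you’d take the surviving pid from ps or pgrep, not assume pid+1):

```shell
# Sketch: make an ndbmtd pid file match the surviving process.
# Dummy pid file and values here for illustration only.
PIDFILE=/tmp/demo_ndb_3.pid
echo 6523 > "$PIDFILE"     # old angel pid recorded at startup

NEW_PID=6524               # illustrative: the ndbmtd child left running
sed -i "s/^6523$/$NEW_PID/" "$PIDFILE"
cat "$PIDFILE"
```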

With our wild cluster semi-tamed, it’s time for the upgrade.

 

Now time to create the 7.5.7 MCM cluster to import into.

On all hosts:

mkdir -p /opt/mysql/757/mcm_data
cd /opt/mysql/757/mcm_data
chown -R mysql:mysql .

Add paths to MCM binaries to the os user ‘mysql’ .bash_profile:

vi ~/.bash_profile
  export PATH=$PATH:/usr/local/mcm1.4.3/bin

Now to uncompress the latest version of cluster:

cd /usr/local
 tar zxvf mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64.tar.gz
 cd mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64
 chown -R mysql:mysql .

Now to point mcm at a directory where we have read-write permissions, and that is also meaningful to us:

cd ../mcm1.4.3/
 vi etc/mcmd.ini
   ...
   log-file = /opt/mysql/757/mcm_data/mcmd.log
   ...
   manager-directory = /opt/mysql/757/mcm_data

Now to get mcm running. Remember to do this on all nodes:

su - mysql
 mcmd --defaults-file=/usr/local/mcm1.4.3/etc/mcmd.ini --daemon

Creating the cluster in mcm to import our wild cluster into, as in the previous scenario:

mcm> create site --hosts=10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 mysite;

Add cluster binaries & create cluster ready for IMPORT:

mcm> add package --basedir=/usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64 cluster738;
mcm> create cluster --import --package=cluster738
 --processhosts=ndb_mgmd:1@10.0.0.10,ndbmtd:3@10.0.0.12,ndbmtd:4@10.0.0.13 mycluster;

mcm> show status -r mycluster;

mcm> add process --processhosts=mysqld:10@10.0.0.10,mysqld:11@10.0.0.11 mycluster;
mcm> add process --processhosts=ndbapi:12@*,ndbapi:13@*,ndbapi:14@* mycluster;

Ok, looking good. “What about the previous config?” I hear you say:

mcm> import config --dryrun mycluster;

Ok, MCM complains about the mysqld’s processes running as root at the os level, so we can make this change via mcm on both mysqld’s:

mcm> set user:mysqld=mysql mycluster;

And re-run the dryrun config check:

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Command result                                                                                                                                                         |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Import checks passed. Please check /opt/mysql/757/mcm_data/clusters/mycluster/tmp/import_config.e40f3e52_97_3.mcm on host 10.0.0.13 for settings that will be applied. |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 1 row in set (5.91 sec)

As you can see, this message is slightly different to the one received in MCM 1.4.0. Makes it easier to check.

So, if we’re happy, an importing we go:

mcm> import config mycluster;
 +--------------------------------------------------------------------------------------------+
 | Command result                                                                             |
 +--------------------------------------------------------------------------------------------+
 | Configuration imported successfully. Please manually verify the settings before proceeding |
 +--------------------------------------------------------------------------------------------+
 1 row in set (5.90 sec)

That went well. Now for the real McCoy:

mcm> import cluster mycluster;
 +-------------------------------+
 | Command result                |
 +-------------------------------+
 | Cluster imported successfully |
 +-------------------------------+
 1 row in set (3.32 sec)

Making sure all’s up and running:

mcm> show status -r mycluster;
 +--------+----------+-----------+---------+-----------+------------+
 | NodeId | Process  | Host      | Status  | Nodegroup | Package    |
 +--------+----------+-----------+---------+-----------+------------+
 | 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster738 |
 | 3      | ndbmtd   | 10.0.0.12 | running | 0         | cluster738 |
 | 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster738 |
 | 10     | mysqld   | 10.0.0.10 | running |           | cluster738 |
 | 11     | mysqld   | 10.0.0.11 | running |           | cluster738 |
 | 12     | ndbapi   | *         | added   |           |            |
 | 13     | ndbapi   | *         | added   |           |            |
 | 14     | ndbapi   | *         | added   |           |            |
 +--------+----------+-----------+---------+-----------+------------+
 8 rows in set (0.06 sec)

Just to set expectations here: my restart / upgrade / import times are short because my cluster data set is almost non-existent. The DataMemory & IndexMemory in use, the cores available and the disk speed all determine how fast each node restart is, and hence how long the import & upgrade process takes.
Upgrade time again: Direct from 7.3.8 to 7.5.7.

This is how we check we’re still ok, i.e., only have 1 binary package known to mcm:

mcm> list packages mysite;

And now add the new 7.5 binaries:

mcm> add package --basedir=/usr/local/mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64 cluster757;

ERROR 7018 (00MGR): Sync spawning '/usr/local/mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64/bin/mysqld' exited with 127. Started in , with stdout='' and stderr='/usr/local/mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64/bin/mysqld: error while loading shared libraries: libnuma.so.1: cannot open shared object file: No such file or directory'

Make sure you’ve got your version of libnuma installed (hint: https://www.rpmfind.net/linux/rpm2html/search.php?query=libnuma.so.1()(64bit)). This is needed for MySQL Server 5.7, so it’s not really a MySQL Cluster or Cluster Manager error in itself.

I’m on CentOS 6.5 so:

# sudo yum install numactl

And retry:

mcm> add package --basedir=/usr/local/mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64 cluster757;
 +----------------------------+
 | Command result             |
 +----------------------------+
 | Package added successfully |
 +----------------------------+
 1 row in set (0.66 sec)

mcm> list packages mysite;
 +------------+----------------------------------------------------------------+-----------------------------------------+
 | Package    | Path                                                           | Hosts                                   |
 +------------+----------------------------------------------------------------+-----------------------------------------+
 | cluster738 | /usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64       | 10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 |
 | cluster757 | /usr/local/mysql-cluster-advanced-7.5.7-linux-glibc2.12-x86_64 | 10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 |
 +------------+----------------------------------------------------------------+-----------------------------------------+
 2 rows in set (0.05 sec)

Ok, so now we can upgrade:

mcm> upgrade cluster --package=cluster757 mycluster;

 +-------------------------------+
 | Command result                |
 +-------------------------------+
 | Cluster upgraded successfully |
 +-------------------------------+
 1 row in set (2 min 3.69 sec)

Double check mcm is now using the latest binaries. You can also check at o.s. level via ‘ps -ef | grep mysqld’ / ‘ps -ef | grep ndbmtd’.

mcm> show status -r mycluster;

So, now to use one of mcm’s new features:

mcm> autotune --dryrun --writeload=low realtime mycluster;

 +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Command result                                                                                                                                                                |
 +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Autotuning calculation complete. Please check /opt/mysql/757/mcm_data/clusters/mycluster/tmp/autotune.e40f3e52_314_3.mcm on host 10.0.0.13 for settings that will be applied. |
 +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 1 row in set (1.28 sec)

And we cat that file to see if contains:

# The following will be applied to the current cluster config:
 set HeartbeatIntervalDbDb:ndbmtd=1500 mycluster;
 set HeartbeatIntervalDbApi:ndbmtd=1500 mycluster;
 set RedoBuffer:ndbmtd=32M mycluster;
 set SendBufferMemory:ndbmtd+ndbmtd=2M mycluster;
 set ReceiveBufferMemory:ndbmtd+ndbmtd=2M mycluster;
 set SendBufferMemory:ndb_mgmd+ndbmtd=2M mycluster;
 set ReceiveBufferMemory:ndb_mgmd+ndbmtd=2M mycluster;
 set SendBufferMemory:mysqld+ndbmtd=2M mycluster;
 set ReceiveBufferMemory:mysqld+ndbmtd=2M mycluster;
 set SendBufferMemory:ndbapi+ndbmtd=2M mycluster;
 set ReceiveBufferMemory:ndbapi+ndbmtd=2M mycluster;
 set SharedGlobalMemory:ndbmtd=20M mycluster;
 set FragmentLogFileSize:ndbmtd=256M mycluster;
 set NoOfFragmentLogFiles:ndbmtd=3 mycluster;

Obviously now we can also look at using READ_BACKUP and other NDB 7.5 new features. An upgrade process that’s far from painful.
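As a quick taster of one of those 7.5 features: READ_BACKUP is enabled per table through the table comment (which is what I meant about table comments being important). A sketch, with an illustrative table definition; the NDB_TABLE=READ_BACKUP=1 comment is the documented 7.5 mechanism:

```shell
# Sketch: enabling READ_BACKUP on an NDB table via its table comment.
# Table name and columns are illustrative; the statement would be run
# against the live 7.5 cluster through any sql node.
cat > /tmp/demo_read_backup.sql <<'EOF'
CREATE TABLE nexus.replicant (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(40) NOT NULL
) ENGINE=NDBCLUSTER COMMENT='NDB_TABLE=READ_BACKUP=1';
EOF
cat /tmp/demo_read_backup.sql

# On a sql node: mysql -uroot -p < /tmp/demo_read_backup.sql
```

With READ_BACKUP=1, committed reads can be served from any replica rather than only the primary fragment, at the cost of slightly later commit acknowledgement.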

Hope this helps someone.


Migrating/importing NDB to Cluster Manager w/ version upgrade.

I’ve had some questions from people using MySQL Cluster GPL and wanting to move to using MySQL Cluster Carrier Grade Edition, i.e., they want to use MySQL Cluster Manager, MCM, to make their lives much easier, in particular, upgrading (as well as config change ease and backup history).

All I want to do here is share my personal experience of migrating what’s considered a ‘wild’ NDB Cluster to an MCM-managed cluster. It’s really just the manual chapter Importing a Cluster into MySQL Cluster Manager in practice, so at least you can see how I did it, and it might help someone.

[ If you’re not migrating but just looking for further information on NDB Cluster, and came across this post, please please PLEASE look at the MySQL Cluster: Getting Started page, in particular the Evaluation Guide. ]

So, what’s the scenario:

  • NDB 7.3 (migrating to 7.4)
  • Centos 6.5
  • 4 vm’s / 4 servers: 2x datanodes (dn1 & dn2), 2x sqlnodes (sn1 & sn2) & 1x mgmt node (colocated on sn1).

Technical objective:

  • To upgrade NDB to 7.4 (or follow the procedure to go from/to any ‘consecutive’ versions).

Personal objective:

  • Share that this process can be done, how it’s done, and share the issues that I came across in my non-production environment.

Let’s take for granted that your wild cluster is installed and working with a similar configuration to:

  • Installed as o.s. root user (and all files are owned:grouped as such…).
  • mysql o.s. user exists.
  • tarball install / binaries are all found at /usr/local
  • Datadirs are at /opt/mysql/738: mgm_data (for management node logs), ndbd_data (for datanode info) & data (for sqlnodes).
  • config.ini & my.cnf are found at $BASEDIR/conf (created this dir myself ‘cos I’m like that).
  • All other things are default, i.e. sqlnode / datanode / mgmtnode ports, sockets are located in datadirs, etc.
  • There are some changes in the config files, as would be in a prod env.

And the cluster setup looks something like this

# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.0.0.12 (mysql-5.6.22 ndb-7.3.8, Nodegroup: 0, *)
id=4 @10.0.0.13 (mysql-5.6.22 ndb-7.3.8, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @10.0.0.10 (mysql-5.6.22 ndb-7.3.8)

[mysqld(API)] 5 node(s)
id=10 @10.0.0.10 (mysql-5.6.22 ndb-7.3.8)
id=11 @10.0.0.11 (mysql-5.6.22 ndb-7.3.8)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)

Getting ready:

Remember, all I’m doing here is sharing my experience of following Importing a Cluster into MySQL Cluster Manager so hopefully it will help someone see it in a different env, and for a specific use (upgrade, go CGE, etc.).

Preparing wild cluster:

In order to import the wild cluster into MCM, we need to ‘shape’ or ‘tame’ the wild cluster, that is, adjust certain configuration not normally found in a cluster environment, so that the import process can succeed.
On both of the sqlnodes, we need a mcmd user, otherwise it’s going to be impossible for MCM to manage them:

mysql> CREATE USER 'mcmd'@'localhost' IDENTIFIED BY 'super';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'mcmd'@'localhost' WITH GRANT OPTION;

Every node of the wild cluster needs to have been started with its node ID specified via the --ndb-nodeid option on the command line. As the ps output shows, ours weren’t:

# ps -ef | grep ndb
root 3025    1 0 Aug16 ? 00:00:25 ndbmtd -c 10.0.0.10 --initial
root 3026 3025 8 Aug16 ? 01:24:30 ndbmtd -c 10.0.0.10 --initial

Also, the management node has to be started without caching the config. Changing / restarting each process (both sqlnodes & dnodes) can be done at this point, making both changes in the same restart:

# ndb_mgmd --configdir=/usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64/conf \
-f /usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64/conf/config.ini \
--config-cache=false --ndb-nodeid=1
MySQL Cluster Management Server mysql-5.6.22 ndb-7.3.8
2017-08-17 08:35:57 [MgmtSrvr] INFO -- Skipping check of config directory since config cache is disabled.

# ps -ef | grep ndb
root 3560    1 0 08:42 ? 00:00:00 ndbmtd --ndb-nodeid=3 -c 10.0.0.10
root 3561 3560 8 08:42 ? 00:00:48 ndbmtd --ndb-nodeid=3 -c 10.0.0.10

root 3694    1 0 08:38 ? 00:00:00 ndbmtd --ndb-nodeid=4 -c 10.0.0.10
root 3695 3694 8 08:38 ? 00:00:16 ndbmtd --ndb-nodeid=4 -c 10.0.0.10

# ps -ef | grep mysqld
mysql 3755 2975 7 08:49 pts/0 00:00:00 mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql --ndb-nodeid=10

mysql 3694 2984 2 08:50 pts/0 00:00:01 mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql --ndb-nodeid=11

Ok, all well and good up to here.

Just a word of warning: the MCM daemon, mcmd, which runs on all servers of a cluster platform, cannot be run as the o.s. user ‘root’, hence it’s normally run as ‘mysql’. This means that all the other processes of our wild cluster should be run as ‘mysql’ too; otherwise, the mcmd daemon can’t bring these processes under control (stop/start, etc.). This also impacts the file permissions (data files, pid files, logs, directories), so maybe it’s a good time to do that now, as we’re talking about a production environment. SOOOOOOO, when we stop the management node to restart it, let’s make that change: first the file & directory perms, and then starting the process itself. As you probably won’t have separate binaries for each process, and the management node(s) will/should be sharing binaries with the sqlnodes, changing ownership of the files / dirs shouldn’t be too much of a problem.

eg. On sn1 (for the management node restart only):

# cd /opt/mysql/738
# chown -R mysql:mysql .
# ls -lrt
total 16
drwxr-xr-x. 7 mysql mysql 4096 ago 17 11:40 data
drwxr-xr-x. 2 mysql mysql 4096 ago 17 13:01 mgm_data
# cd /usr/local/mysql
# chown -R mysql:mysql .
# ls -lrt 
total 180
-rw-r--r--.  1 mysql mysql   2496 ene 9 2015 README
-rw-r--r--.  1 mysql mysql  17987 ene 9 2015 COPYING
-rw-r--r--.  1 mysql mysql 101821 ene 9 2015 INSTALL-BINARY
drwxr-xr-x.  4 mysql mysql   4096 ago 16 15:25 include
drwxr-xr-x.  2 mysql mysql   4096 ago 16 15:26 bin
drwxr-xr-x.  3 mysql mysql   4096 ago 16 15:26 lib
drwxr-xr-x. 32 mysql mysql   4096 ago 16 15:26 share
drwxr-xr-x.  2 mysql mysql   4096 ago 16 15:26 support-files
drwxr-xr-x.  2 mysql mysql   4096 ago 16 15:26 docs
drwxr-xr-x.  3 mysql mysql   4096 ago 16 15:26 data
drwxr-xr-x.  4 mysql mysql   4096 ago 16 15:26 man
drwxr-xr-x.  4 mysql mysql   4096 ago 16 15:26 sql-bench
drwxr-xr-x.  2 mysql mysql   4096 ago 16 15:26 scripts
drwxr-xr-x. 10 mysql mysql   4096 ago 16 15:26 mysql-test
-rw-r--r--.  1 mysql mysql    943 ago 16 16:04 my.cnf
-rw-r--r--.  1 mysql mysql    943 ago 16 16:04 my-new.cnf
drwxr-xr-x.  2 mysql mysql   4096 ago 16 16:19 conf
# pwd
/usr/local/mysql
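Since the same ownership fix has to happen on every host, a dry-run helper can print the commands for review first. This is just a sketch: the `ownership_cmds` name is mine, and the hosts and paths are the ones from this particular setup.

```shell
# ownership_cmds: hypothetical helper that prints the chown command
# for each host given, so it can be reviewed before piping to sh/ssh.
ownership_cmds() {
  for h in "$@"; do
    echo "ssh root@$h \"chown -R mysql:mysql /usr/local/mysql* /opt/mysql/738\""
  done
}

# Review the output first; when happy, pipe it to sh.
ownership_cmds 10.0.0.10 10.0.0.11 10.0.0.12 10.0.0.13
```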

Now it’s time to kill the dnode angel process in preparation for MCM to control the processes (otherwise, when MCM stops a process, the angel process tries to restart it outside of MCM’s control, and that’s when we hit problems):

# ps -ef | grep ndb
root 3560 1 0 08:42 ? 00:00:02 ndbmtd --ndb-nodeid=3 -c 10.0.0.10
root 3561 3560 8 08:42 ? 00:08:32 ndbmtd --ndb-nodeid=3 -c 10.0.0.10

# kill -9 3560
# ps -ef | grep ndb
root 3561 1 8 08:42 ? 00:08:38 ndbmtd --ndb-nodeid=3 -c 10.0.0.10

# ps aux | grep ndb
root 3561 8.6 63.9 1007296 474120 ? SLl 08:42 8:38 ndbmtd --ndb-nodeid=3 -c 10.0.0.10

Do on both dnodes.

Now we have to adjust the .pid file within each dnode’s datadir, as it holds the angel process ID, and it needs to hold the surviving worker’s PID instead:

# cd /opt/mysql/738/ndbd_data/
# ls -lrt
-rw-r--r--. 1 mysql mysql 4 Aug 17 13:20 ndb_3.pid
# sed -i 's/3560/3561/' ndb_3.pid
# more ndb_3.pid
3561

Again, to be done on both dnodes.
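The manual kill-and-sed dance above can be scripted. A sketch, assuming one ndbmtd angel/worker pair per host, with the angel parented by init (the `find_angel`/`find_worker` helpers are my own, and the destructive steps are left commented out for review):

```shell
# find_angel / find_worker read "pid ppid comm" lines on stdin;
# the angel is the ndbmtd whose parent is init (ppid 1), the worker
# is the ndbmtd whose parent is the angel.
find_angel()  { awk '$3 == "ndbmtd" && $2 == 1 { print $1 }'; }
find_worker() { awk -v a="$1" '$3 == "ndbmtd" && $2 == a { print $1 }'; }

snapshot=$(ps -eo pid,ppid,comm)
angel=$(printf '%s\n' "$snapshot" | find_angel)
worker=$(printf '%s\n' "$snapshot" | find_worker "$angel")

# Once verified by eye, kill the angel and record the worker's pid
# (path / node id as in this setup):
# kill -9 "$angel"
# printf '%s\n' "$worker" > /opt/mysql/738/ndbd_data/ndb_3.pid
echo "angel=$angel worker=$worker"
```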

Now for the sqlnodes, we have to rename the actual pid file, not change its contents:

# cd /opt/mysql/738/data/
# ls -lrt *.pid
-rw-rw----. 1 mysql mysql 5 ago 17 08:49 sn1.pid

# more *.pid
3755

# ps -ef | grep mysqld | grep -v grep
mysql 3755 2975 1 08:49 pts/0 00:02:27 mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql --ndb-nodeid=10

Time to copy, not replace, the .pid file to the cluster-ready naming convention (ndb_<nodeid>.pid):

# cp sn1.pid ndb_10.pid

On the other sqlnode:

# cd /opt/mysql/738/data/
# cp sn2.pid ndb_11.pid
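To avoid hard-coding the node id on each sqlnode, it can be pulled out of the running mysqld’s --ndb-nodeid argument. A sketch (the `nodeid_from_args` helper is mine, and it assumes mysqld’s default host-named pid file, as in sn1.pid / sn2.pid above):

```shell
# nodeid_from_args: prints the value of --ndb-nodeid found in the
# command line read on stdin (hypothetical helper).
nodeid_from_args() { sed -n 's/.*--ndb-nodeid=\([0-9]*\).*/\1/p'; }

# Usage on a sqlnode, from its datadir:
#   id=$(ps -eo args | grep '[m]ysqld --defaults-file' | nodeid_from_args)
#   cp "$(hostname -s).pid" "ndb_${id}.pid"
```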

Now time to create the MCM-managed Cluster to import into.

First up, we’ll need an MCM datadir, to store the datadirs, logs, etc. (if you want to change this later, it’s so much easier from MCM, using the “set” command, so just do it then):

# mkdir /opt/mysql/748/mcm_data
# cd /opt/mysql/748/mcm_data
# chown -R mysql:mysql .

As mcmd needs to run as mysql, change permissions of the binaries and also add the manager-directory of your choice:

# cd /usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/mcm1.4.0/etc
# chgrp -R mysql .

# vi mcmd.ini
   ..  
   manager-directory = /opt/mysql/748/mcm_data

Let’s make it easier in our env to execute all this:

# su - mysql
# vi .bash_profile
   ..
   export PATH=$PATH:/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/mcm1.4.0/bin

And let’s start MySQL Cluster Manager, i.e. mcmd, as the o.s. user ‘mysql’:

# mcmd --defaults-file=/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/mcm1.4.0/etc/mcmd.ini --daemon

Ok. All working fine.

Let’s create the cluster to import into.

Ok, so I used the o.s. root user to create everything. I suppose I’m used to it, but feel free to do it however you see fit. It won’t make any difference to mcmd as that’s running as mysql so carry on:

# sudo -i
# export PATH=$PATH:/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/mcm1.4.0/bin

# mcm

MCM needs a site with all the hosts that make up the cluster. By the way, if you forget a command, or want to see what else is there, “list commands;”:

mcm> create site --hosts=10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 mysite;

Add the path to the cluster binaries that we’re using now:

mcm> add package --basedir=/usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64 cluster738;

A little test here, thinking: “if mcm is so clever, maybe it can detect that we’re on 7.3.8, let us create the cluster with the 7.4.8 package to import into, and we’ve upgraded and imported in one fell swoop!“. Alas, a nice idea, but after creating the cluster with the 748 package for import, adding processes / nodes to the cluster, and running the dryrun config check, it errors out:

mcm> add package --basedir=/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/cluster cluster748;
mcm> import config --dryrun mycluster;
ERROR 5307 (00MGR): Imported process version 7.3.8 does not match configured process mysqld 11 version 7.4.8 for cluster mycluster

So, back to the 7.3.8 binaries / package:

mcm> add package --basedir=/usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64 cluster738;
mcm> create cluster --import --package=cluster738 
--processhosts=ndb_mgmd:1@10.0.0.10,ndbmtd:3@10.0.0.12,ndbmtd:4@10.0.0.13,
mysqld:10@10.0.0.10,mysqld:11@10.0.0.11,ndbapi:12@*,ndbapi:13@*,ndbapi:14@* mycluster;

Now, this seems simple, right? Ok, well it is, but you have to match the processhosts entries to those that appear in the “ndb_mgm -e show” output, i.e. all the ndbapi / mysqld API entries that appear there. So if you have 8 rows returned from that, you’ll need 8 entries in the --processhosts option. It will complain otherwise.
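With more nodes this matching gets fiddly, so the list can be generated from the “ndb_mgm -e show” output. A sketch of my own (not an MCM feature): it assumes the output format shown earlier, maps any unconnected slot to ndbapi:N@*, and assumes ndbmtd for the data nodes (the show output doesn’t distinguish ndbd from ndbmtd), so the result still deserves a manual check.

```shell
# build_processhosts: reads "ndb_mgm -e show" output on stdin and
# prints a --processhosts value. Unconnected slots become ndbapi:N@*.
build_processhosts() {
  awk '
    /^\[ndbd/     { type = "ndbmtd" }
    /^\[ndb_mgmd/ { type = "ndb_mgmd" }
    /^\[mysqld/   { type = "mysqld" }
    /^id=/ {
      split($1, a, "=")                        # a[2] = node id
      host = ($2 ~ /^@/) ? substr($2, 2) : "*"
      t = (host == "*") ? "ndbapi" : type
      list = list sep t ":" a[2] "@" host
      sep = ","
    }
    END { print list }'
}

# Usage: ndb_mgm -e show | build_processhosts
```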

mcm> show status -r mycluster;
+--------+----------+-----------+--------+-----------+------------+
| NodeId | Process  | Host      | Status | Nodegroup | Package    |
+--------+----------+-----------+--------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | import |           | cluster738 |
| 3      | ndbmtd   | 10.0.0.12 | import | n/a       | cluster738 |
| 4      | ndbmtd   | 10.0.0.13 | import | n/a       | cluster738 |
| 10     | mysqld   | 10.0.0.10 | import |           | cluster738 |
| 11     | mysqld   | 10.0.0.11 | import |           | cluster738 |
| 12     | ndbapi   | *         | import |           |            |
| 13     | ndbapi   | *         | import |           |            |
| 14     | ndbapi   | *         | import |           |            |
+--------+----------+-----------+--------+-----------+------------+
8 rows in set (0,05 sec)

Let’s do a test now we’ve got the 7.3.8 binaries assigned to the site and a shell cluster created:

mcm> import config --dryrun mycluster; 
ERROR 5321 (00MGR): No permissions for pid 3700 running on sn1

This means that processes are being executed / run by someone who isn’t the mcmd user, e.g. root.
So I had to go back to each process and restart it as mysql (killing the angel processes again, etc.).
Also remember that mcmd itself can’t be run by root; we fixed that at the beginning of this post (DIDN’T WE?), so I hope you don’t hit that one. With everything running as mysql, the dryrun passes:

+---------------------------------------------------------------------------+
| Command result                                                            |
+---------------------------------------------------------------------------+
| Import checks passed. Please check log for settings that will be applied. |
+---------------------------------------------------------------------------+
1 row in set (5.55 sec)

Now that all the checks have passed, it leads me to think: what about all the personalized config that I have in the config.ini & my.cnf files? Well, we could run:

mcm> set DataMemory:ndbmtd=20M, IndexMemory:ndbmtd=10M, DiskPageBufferMemory:ndbmtd=4M, StringMemory:ndbmtd=5, MaxNoOfConcurrentOperations:ndbmtd=2K, MaxNoOfConcurrentTransactions:ndbmtd=2K, SharedGlobalMemory:ndbmtd=500K, MaxParallelScansPerFragment:ndbmtd=16, MaxNoOfAttributes:ndbmtd=100, MaxNoOfTables:ndbmtd=20, MaxNoOfOrderedIndexes:ndbmtd=20, HeartbeatIntervalDbDb:ndbmtd=500, HeartbeatIntervalDbApi:ndbmtd=500, TransactionInactiveTimeout:ndbmtd=500, LockPagesInMainMemory:ndbmtd=1, ODirect:ndbmtd=1, MaxNoOfExecutionThreads:ndbmtd=4, RedoBuffer:ndbmtd=32M mycluster ;

+-----------------------------------+
| Command result                    |
+-----------------------------------+
| Cluster reconfigured successfully |
+-----------------------------------+
1 row in set (0.60 sec)

But there is no need, because MCM will do all that for you. As the refman mentions, we just need to look in the MCM manager directory, and there it is:

# cd /opt/mysql/748/mcm_data/clusters/mycluster/tmp
# ls -lrt
-rw-rw-rw-. 1 mysql mysql 3284 Aug 17 13:43 import_config.a6905b23_225_3.mcm
mcm> import config mycluster;
+--------------------------------------------------------------------------------------------+
| Command result                                                                             |
+--------------------------------------------------------------------------------------------+
| Configuration imported successfully. Please manually verify the settings before proceeding |
+--------------------------------------------------------------------------------------------+
1 row in set (5.58 sec)

Now to import:

mcm> import cluster mycluster;

+-------------------------------+
| Command result                |
+-------------------------------+
| Cluster imported successfully |
+-------------------------------+
1 row in set (3.04 sec)

Let’s make sure the Status has changed from import:

mcm> show status -r mycluster;
+--------+----------+-----------+---------+-----------+------------+
| NodeId | Process  | Host      | Status  | Nodegroup | Package    |
+--------+----------+-----------+---------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster738 |
| 3      | ndbmtd   | 10.0.0.12 | running | 0         | cluster738 |
| 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster738 |
| 10     | mysqld   | 10.0.0.10 | running |           | cluster738 |
| 11     | mysqld   | 10.0.0.11 | running |           | cluster738 |
| 12     | ndbapi   | *         | added   |           |            |
| 13     | ndbapi   | *         | added   |           |            |
| 14     | ndbapi   | *         | added   |           |            |
+--------+----------+-----------+---------+-----------+------------+
8 rows in set (0.06 sec)

I know you’re probably eager to see if MCM 1.4 autotune works for 7.3 NDB’s. Well, it doesn’t, I’m afraid:

mcm> autotune --dryrun --writeload=low realtime mycluster;
ERROR 5402 (00MGR): Autotuning is not supported for cluster version 7.3.8

Upgrade time:

mcm> list packages mysite;
+------------+---------------------------------------------------------------------+-----------------------------------------+
| Package    | Path                                                                | Hosts                                   |
+------------+---------------------------------------------------------------------+-----------------------------------------+
| cluster738 | /usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64            | 10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 |
+------------+---------------------------------------------------------------------+-----------------------------------------+

mcm> add package --basedir=/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/cluster cluster748;

mcm> list packages mysite;
+------------+---------------------------------------------------------------------+-----------------------------------------+
| Package    | Path                                                                | Hosts                                   |
+------------+---------------------------------------------------------------------+-----------------------------------------+
| cluster738 | /usr/local/mysql-cluster-gpl-7.3.8-linux-glibc2.5-x86_64            | 10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 |
| cluster748 | /usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/cluster | 10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13 |
+------------+---------------------------------------------------------------------+-----------------------------------------+
mcm> upgrade cluster --package=cluster748 mycluster;

In another window:

mcm> show status -r mycluster;
+--------+----------+-----------+---------+-----------+------------+
| NodeId | Process  | Host      | Status  | Nodegroup | Package    |
+--------+----------+-----------+---------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster748 |
| 3      | ndbmtd   | 10.0.0.12 | stopped | 0         | cluster738 |
| 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster738 |
| 10     | mysqld   | 10.0.0.10 | running |           | cluster738 |
| 11     | mysqld   | 10.0.0.11 | running |           | cluster738 |
| 12     | ndbapi   | *         | added   |           |            |
| 13     | ndbapi   | *         | added   |           |            |
| 14     | ndbapi   | *         | added   |           |            |
+--------+----------+-----------+---------+-----------+------------+
8 rows in set (0.05 sec)

Ok, all going well:

mcm> show status -r mycluster;
 +--------+----------+-----------+----------+-----------+------------+
 | NodeId | Process  | Host      | Status   | Nodegroup | Package    |
 +--------+----------+-----------+----------+-----------+------------+
 | 1      | ndb_mgmd | 10.0.0.10 | running  |           | cluster748 |
 | 3      | ndbmtd   | 10.0.0.12 | running  | 0         | cluster748 |
 | 4      | ndbmtd   | 10.0.0.13 | starting | 0         | cluster748 |
 | 10     | mysqld   | 10.0.0.10 | running  |           | cluster738 |
 | 11     | mysqld   | 10.0.0.11 | running  |           | cluster738 |
 | 12     | ndbapi   | *         | added    |           |            |
 | 13     | ndbapi   | *         | added    |           |            |
 | 14     | ndbapi   | *         | added    |           |            |
 +--------+----------+-----------+----------+-----------+------------+
 8 rows in set (0.08 sec)

But, life is never as good as it is in fairy tales:

ERROR 7006 (00MGR): Process error: Node 10 : 17-08-17 17:18:13 4449 [Note] NDB Util: Starting...
 2017-08-17 17:18:13 4449 [Note] NDB Util: Wait for server start completed
 2017-08-17 17:18:13 4449 [ERROR] Aborting

2017-08-17 17:18:13 4449 [Note] Binlog end
 2017-08-17 17:18:13 4449 [Note] NDB Util: Stop
 2017-08-17 17:18:13 4449 [Note] NDB Util: Wakeup
 2017-08-17 17:18:13 4449 [Note] NDB Index Stat: Starting...
 2017-08-17 17:18:13 4449 [Note] NDB Index Stat: Wait for server start completed
mcm> show status -r mycluster;
+--------+----------+-----------+---------+-----------+------------+
| NodeId | Process  | Host      | Status  | Nodegroup | Package    |
+--------+----------+-----------+---------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster748 |
| 3      | ndbmtd   | 10.0.0.12 | running | 0         | cluster748 |
| 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster748 |
| 10     | mysqld   | 10.0.0.10 | failed  |           | cluster748 |
| 11     | mysqld   | 10.0.0.11 | running |           | cluster738 |
| 12     | ndbapi   | *         | added   |           |            |
| 13     | ndbapi   | *         | added   |           |            |
| 14     | ndbapi   | *         | added   |           |            |
+--------+----------+-----------+---------+-----------+------------+
8 rows in set (0.05 sec)

On sn1, the error log mysqld_738.err reads:

2017-08-17 17:25:20 4518 [Note] NDB Util: Wait for server start completed
2017-08-17 17:25:20 4518 [ERROR] Aborting

The mcmd.log:

2017-08-17 17:25:20 4518 [Note] NDB Util: Stop
2017-08-17 17:25:20 4518 [Note] NDB Util: Wakeup
2017-08-17 17:25:20 4518 [Note] NDB Index Stat: Starting...
2017-08-17 17:25:20 4518 [Note] NDB Index Stat: Wait for server start completed

2017-08-17 17:25:24.452: (message) [T0x1b1a050 CMGR ]: Got new message mgr_cluster_process_status {a6905b23 396 0} 10 failed
2017-08-17 17:25:24.457: (message) [T0x1b1a050 CMGR ]: Got new message x_trans {a6905b23 397 0} abort_trans pc=19
2017-08-17 17:25:24.459: (message) [T0x1b1a050 CMGR ]: Got new message mgr_process_operationstatus {0 0 0} 10 failed
2017-08-17 17:25:24.461: (message) [T0x1b1a050 CMGR ]: req_id 80 Operation finished with failure for configversion {a6905b23 385 3}
2017-08-17 17:25:24.466: (warning) [T0x1b1a050 CMGR ]: Operation failed : 7006 Process error: Node 10 : 17-08-17 17:25:20 4518 [Note] NDB Util: Starting...
2017-08-17 17:25:20 4518 [Note] NDB Util: Wait for server start completed
2017-08-17 17:25:20 4518 [ERROR] Aborting

And reviewing the my.cnf, the following needed to be changed as they reference the old binaries.

But most importantly, StopOnError=0 is required. That was my gotcha!

set lc_messages_dir:mysqld:10=/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/cluster/share mycluster;
set lc_messages_dir:mysqld:11=/usr/local/mcm-1.4.0-cluster-7.4.8-linux-glibc2.5-x86-64bit/cluster/share mycluster;

set StopOnError:ndbmtd=0 mycluster;

This last command restarts the cluster, without upgrading it, leaving us:

mcm> show status -r mycluster;
+--------+----------+-----------+---------+-----------+------------+
| NodeId | Process  | Host      | Status  | Nodegroup | Package    |
+--------+----------+-----------+---------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster748 |
| 3      | ndbmtd   | 10.0.0.12 | running | 0         | cluster748 |
| 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster748 |
| 10     | mysqld   | 10.0.0.10 | running |           | cluster748 |
| 11     | mysqld   | 10.0.0.11 | running |           | cluster748 |
| 12     | ndbapi   | *         | added   |           |            |
| 13     | ndbapi   | *         | added   |           |            |
| 14     | ndbapi   | *         | added   |           |            |
+--------+----------+-----------+---------+-----------+------------+
8 rows in set (0.05 sec)

Looks like it’s upgraded. But we know it wasn’t able to run the full checks and upgrade process, so let’s do it properly (and remember for the future that we need to review our config.ini and params):

mcm> upgrade cluster --package=cluster748 mycluster;

+-------------------------------+
| Command result                |
+-------------------------------+
| Cluster upgraded successfully |
+-------------------------------+
1 row in set (1 min 53.72 sec)

Whilst MCM upgrades, it normally restarts each process in its correct order, one by one, on its own, without any need for human intervention.

However, as the upgrade process had previously been run and failed at the sqlnode step, that step still needed to be carried out; hence, when it stopped and started each sqlnode, it did so twice, ensuring that the configuration changes we made are reflected in the MCM config.

mcm> show status -r mycluster;
+--------+----------+-----------+---------+-----------+------------+
| NodeId | Process  | Host      | Status  | Nodegroup | Package    |
+--------+----------+-----------+---------+-----------+------------+
| 1      | ndb_mgmd | 10.0.0.10 | running |           | cluster748 |
| 3      | ndbmtd   | 10.0.0.12 | running | 0         | cluster748 |
| 4      | ndbmtd   | 10.0.0.13 | running | 0         | cluster748 |
| 10     | mysqld   | 10.0.0.10 | running |           | cluster748 |
| 11     | mysqld   | 10.0.0.11 | running |           | cluster748 |
| 12     | ndbapi   | *         | added   |           |            |
| 13     | ndbapi   | *         | added   |           |            |
| 14     | ndbapi   | *         | added   |           |            |
+--------+----------+-----------+---------+-----------+------------+
8 rows in set (0.01 sec)

And we’re done. Imported from the wild into an mcm-managed env, and upgraded with mcm. So much simpler.

Happy mcm’ing!


Posted in CGE, Cluster 7.3, Cluster Manager, MySQL, MySQL Cluster, Oracle, Uncategorized, upgrade

NDB 7.4 & SYS schema: When getting locks, detecting the guilty SQL & o.s.pid.

Here’s a way to detect the SQL query causing a lock or a session to fail, and also to identify the o.s. pid if need be (btw, no rocket science). “A” way… I’m sure there are many others, so feel free to suggest, please.

So, we’re using MCM, and have created a MySQL Cluster like mentioned in the cluster intro session (in Spanish I’m afraid), using 7.4.6, which comes with 5.6.24.

With the env up and running, set up a schema, some data and run a few queries:

mysql> create database world;
mysql> use world;
Database changed
mysql> source world_ndb.sql

(world_ndb.sql, as you might guess, is the world_innodb tables script, with a little adjustment as to which storage engine to use.)

Once created, let’s lock things up in Cluster:

mysql -uroot -h127.0.0.1 -P3306
mysql> use world; begin; select * from City where Name = 'Valencia' for update; update City set
Name = 'Bilbao' where ID=3;

As there’s no commit, this will sit there, and we can see the single transaction:

select * from ndbinfo.cluster_transactions ;
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
| node_id | block_instance | transid           | state   | count_operations | outstanding_operations | inactive_seconds | client_node_id | client_block_ref |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
|       1 |              0 | 36083772600358092 | Started |                4 |                      0 |               55 |             50 |            32776 |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
1 row in set (0,04 sec)

Now for the 2nd session:

mysql -uroot -h127.0.0.1 -P3307
mysql> use world; begin; select * from City for update; update City set Name = 'Sevilla' where
ID=2;

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Warning (Code 1297): Got temporary error 274 'Time-out in NDB, probably caused by deadlock' from NDB
Warning (Code 1297): Got temporary error 274 'Time-out in NDB, probably caused by deadlock' from NDB
Error (Code 1205): Lock wait timeout exceeded; try restarting transaction
Query OK, 1 row affected (0,00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now we’ll see the 2 transactions:

select * from ndbinfo.cluster_transactions ;
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
| node_id | block_instance | transid           | state   | count_operations | outstanding_operations | inactive_seconds | client_node_id | client_block_ref |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
|       1 |              0 | 36083772600358092 | Started |                4 |                      0 |              176 |             50 |            32776 |
|       2 |              0 | 36084872111980548 | Started |                1 |                      0 |                2 |             51 |            32776 |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
2 rows in set (0,03 sec)

Note that the client_node_id reflects the cluster node_id numbers, 50 & 51, of the sqlnodes.

If this were InnoDB, some people might start looking into “show engine innodb status;”, and there is a “show engine ndb status;”. But of course, this is SQL running on the sqlnode, i.e. it will only give us the info pertaining to cluster from this session, on this sqlnode. We’d have to connect to each and every mysqld to get more info. Not practical.

Let’s see what cluster operations are being run (and compare to what we already know):

select * from ndbinfo.cluster_operations;
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
| node_id | block_instance | transid           | operation_type | state    | tableid | fragmentid | client_node_id | client_block_ref | tc_node_id | tc_block_no | tc_block_instance |
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
|       1 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       2 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       2 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          1 |             50 |            32776 |          1 |         245 |                 0 |
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
5 rows in set (0,02 sec)

In order to see the operations on the actual server, we’d need to connect to the appropriate server (ndbinfo.cluster_transactions.client_node_id) and run:

select * from ndbinfo.server_operations;
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
| mysql_connection_id | node_id | block_instance | transid           | operation_type | state    | tableid | fragmentid | client_node_id | client_block_ref | tc_node_id | tc_block_no | tc_block_instance |
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
|                   4 |       1 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       2 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       2 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          1 |             50 |            32776 |          1 |         245 |                 0 |
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
5 rows in set (0,00 sec)

So, now that we know the mysql_connection_id, a “show processlist” / “show full processlist” shows us the client port:

+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
| Id | User        | Host            | db    | Command | Time | State                             | Info             |
+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
|  1 | system user |                 |       | Daemon  |    0 | Waiting for event from ndbcluster | NULL             |
|  4 | root        | localhost:50493 | world | Sleep   |    1 |                                   | NULL             |
|  7 | root        | localhost:50559 | NULL  | Query   |    0 | init                              | show processlist |
|  8 | root        | localhost:50621 | NULL  | Sleep   |   42 |                                   | NULL             |
+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
4 rows in set (0,00 sec)

And now, with an OS command, get the offending process id:

$ netstat -pt | grep 50493
tcp        0      0 localhost:50493         localhost:mysql         ESTABLISHED 26352/mysql     
tcp6       0      0 localhost:mysql         localhost:50493         ESTABLISHED 24483/mysqld

26352 is the OS PID of the client. We could kill it on this SQL node, but first let’s see what it’s doing; if we don’t find out what it was running, how can we fix it?
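That port-to-PID lookup can also be scripted. Here’s a minimal Python sketch, purely illustrative: it assumes the `netstat -pt` output format shown above, and `client_pid_for_port` is a hypothetical helper, not part of any MySQL tooling.

```python
from typing import Optional

# Sample `netstat -pt` output as shown above; in practice this would come
# from e.g. subprocess.run(["netstat", "-pt"], capture_output=True).
NETSTAT_OUTPUT = """\
tcp        0      0 localhost:50493         localhost:mysql         ESTABLISHED 26352/mysql
tcp6       0      0 localhost:mysql         localhost:50493         ESTABLISHED 24483/mysqld
"""

def client_pid_for_port(netstat_text: str, port: int) -> Optional[int]:
    """Return the PID owning the connection whose *local* port matches."""
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue
        local_addr, pid_prog = fields[3], fields[6]
        if local_addr.endswith(":%d" % port):
            # The PID/Program column looks like "26352/mysql"
            return int(pid_prog.split("/")[0])
    return None

print(client_pid_for_port(NETSTAT_OUTPUT, 50493))  # -> 26352
```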

To get the SQL text, it’s going to be far easier to use the SYS schema and activate the data collection:

:: via Workbench 6.1 upwards.
:: via https://github.com/MarkLeith/mysql-sys.

and using Cluster Manager, set performance schema instrumentation on:

mcm> set performance-schema-instrument:mysqld='%=ON' mycluster;

and let’s see what happened:

select a.mysql_connection_id, a.operation_type, a.client_node_id, b.inactive_seconds
   from ndbinfo.server_operations a, ndbinfo.cluster_transactions b
   where a.transid = b.transid
   and b.inactive_seconds > 2;

+---------------------+----------------+----------------+------------------+
| mysql_connection_id | operation_type | client_node_id | inactive_seconds |
+---------------------+----------------+----------------+------------------+
|                  14 |        READ-SH |             50 |             1501 |
|                  14 |         UPDATE |             50 |             1501 |
|                  14 |        READ-SH |             50 |             1501 |
|                  14 |         UPDATE |             50 |             1501 |
|                  14 |        READ-SH |             50 |             1501 |
+---------------------+----------------+----------------+------------------+
5 rows in set (0,03 sec)

SELECT a.thread_id, a.SQL_TEXT, b.processlist_id
FROM performance_schema.events_statements_history a, performance_schema.threads b
WHERE a.thread_id = b.thread_id
AND b.processlist_id =
  (SELECT DISTINCT a.mysql_connection_id
   FROM ndbinfo.server_operations a, ndbinfo.cluster_transactions b
   WHERE a.transid = b.transid
   AND b.inactive_seconds > 2)
ORDER BY TIMER_START;

+-----------+-------------------------------------------------------+----------------+
| thread_id | SQL_TEXT                                              | processlist_id |
+-----------+-------------------------------------------------------+----------------+
|        32 | NULL                                                  |             14 |
|        32 | begin                                                 |             14 |
|        32 | select * from City where Name = 'Valencia' for update |             14 |
|        32 | update City set Name = 'Bilbao' where ID=3            |             14 |
|        32 | commit                                                |             14 |
|        32 | SELECT DATABASE()                                     |             14 |
|        32 | NULL                                                  |             14 |
|        32 | begin                                                 |             14 |
|        32 | select * from City where Name = 'Valencia' for update |             14 |
|        32 | update City set Name = 'Bilbao' where ID=3            |             14 |
+-----------+-------------------------------------------------------+----------------+
10 rows in set (0,05 sec)

You got me: I ran it twice in the same session. The second transaction was begun but never committed, which is what’s been holding the locks. So now we can go and fix the code, and add that little “commit” where it’s needed.
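Spotting the culprit in that history is mechanical: it’s the “begin” with no matching “commit”. A purely illustrative Python sketch over those SQL_TEXT values (assuming the rows arrive in TIMER_START order, as queried):

```python
# The SQL_TEXT values from events_statements_history, as shown above.
HISTORY = [
    None,
    "begin",
    "select * from City where Name = 'Valencia' for update",
    "update City set Name = 'Bilbao' where ID=3",
    "commit",
    "SELECT DATABASE()",
    None,
    "begin",
    "select * from City where Name = 'Valencia' for update",
    "update City set Name = 'Bilbao' where ID=3",
]

def open_transaction(statements):
    """Return the statements of a transaction that was begun but never
    committed or rolled back; empty list if everything was closed."""
    current, in_txn = [], False
    for stmt in statements:
        if stmt is None:
            continue
        s = stmt.strip().lower()
        if s == "begin":
            in_txn, current = True, []
        elif s in ("commit", "rollback"):
            in_txn, current = False, []
        elif in_txn:
            current.append(stmt)
    return current if in_txn else []

for stmt in open_transaction(HISTORY):
    print(stmt)  # the two statements left hanging without a commit
```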

Hope this has helped someone.

Posted in Cluster Manager, MySQL Cluster

MySQL in a USB, with a little help of Fabric

If you’ve wanted to see how well MySQL Fabric actually works, and how easy it makes setting up replication and switching Masters & Slaves back and forth, with a sample use case where the master is stored on a USB stick, then feel free to have a look at:

http://www.slideshare.net/keithhollman5/mysql-fabric-in-a-usb

or the Spanish version:

http://www.slideshare.net/keithhollman5/mysql-fabric-en-un-usb

Please let me know what you think.
