NDB 7.4 & SYS schema: When getting locks, detecting the guilty SQL & o.s.pid.

Here’s a way to detect the sql query causing a lock or a session to fail, and also to identify the o.s.pid if need be (btw, no rocket science). “a” way.. I’m sure there are many others, so feel free to suggest, please.

So, we’re using MCM, and have created a MySQL Cluster like mentioned in the cluster intro session (in Spanish I’m afraid), using 7.4.6, which comes with 5.6.24.

With the env up and running, set up a schema, some data and run a few queries:

mysql> create database world;
mysql> use world;
Database changed
mysql> source world_ndb.sql

(world_ndb.sql, as you might guess, is the world_innodb tables script, with a little adjustment as to which storage engine to be used.)

Once created, let’s lock things up in Cluster:

mysql -uroot -h127.0.0.1 -P3306
mysql> use test; begin; select * from City where Name = 'Valencia' for update; update City set
Name = 'Bilbao' where ID=3;

As there’s no ‘end’, this will sit there, and we can see the single transaction:

select * from ndbinfo.cluster_transactions ;
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
| node_id | block_instance | transid           | state   | count_operations | outstanding_operations | inactive_seconds | client_node_id | client_block_ref |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
|       1 |              0 | 36083772600358092 | Started |                4 |                      0 |               55 |             50 |            32776 |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
1 row in set (0,04 sec)

Now for the 2nd session:

mysql -uroot -h127.0.0.1 -P3307
mysql> use world; begin; select * from City for update; update City set Name = 'Sevilla' where
ID=2;

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Warning (Code 1297): Got temporary error 274 'Time-out in NDB, probably caused by deadlock' from NDB
Warning (Code 1297): Got temporary error 274 'Time-out in NDB, probably caused by deadlock' from NDB
Error (Code 1205): Lock wait timeout exceeded; try restarting transaction
Query OK, 1 row affected (0,00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now we’ll see the 2 transactions:

select * from ndbinfo.cluster_transactions ;
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
| node_id | block_instance | transid           | state   | count_operations | outstanding_operations | inactive_seconds | client_node_id | client_block_ref |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
|       1 |              0 | 36083772600358092 | Started |                4 |                      0 |              176 |             50 |            32776 |
|       2 |              0 | 36084872111980548 | Started |                1 |                      0 |                2 |             51 |            32776 |
+---------+----------------+-------------------+---------+------------------+------------------------+------------------+----------------+------------------+
2 rows in set (0,03 sec)

Note that the client_node_id reflects clusters node_id number, 50 & 51 for the sqlnodes.

If this were InnoDB, some people might start looking into “show engine innodb status;”, like “show engine ndb status;”. But of course, this is SQL running on the sqlnode, i.e. wil lonly give us the info pertaining to cluster from this session, on this sqlnode. We’d have to connect to each and every mysqld to get more info. Not practical.

Let’s see what cluster operations are being run (and compare to what we already know):

select * from ndbinfo.cluster_operations;
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
| node_id | block_instance | transid           | operation_type | state    | tableid | fragmentid | client_node_id | client_block_ref | tc_node_id | tc_block_no | tc_block_instance |
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
|       1 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       2 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|       2 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          1 |             50 |            32776 |          1 |         245 |                 0 |
+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
5 rows in set (0,02 sec)

In order to see the operations on the actual server, we’d need to connect to the appropriate server (ndbinfo.cluster_transactions.client_node_id) and run:

select * from ndbinfo.server_operations;
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
| mysql_connection_id | node_id | block_instance | transid           | operation_type | state    | tableid | fragmentid | client_node_id | client_block_ref | tc_node_id | tc_block_no | tc_block_instance |
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
|                   4 |       1 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       1 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       2 |              1 | 36083772600358092 | UPDATE         | Prepared |      12 |          0 |             50 |            32776 |          1 |         245 |                 0 |
|                   4 |       2 |              1 | 36083772600358092 | READ-SH        | Prepared |      12 |          1 |             50 |            32776 |          1 |         245 |                 0 |
+---------------------+---------+----------------+-------------------+----------------+----------+---------+------------+----------------+------------------+------------+-------------+-------------------+
5 rows in set (0,00 sec)

So, now we know the mysql_connection_id, via a “show processlist” / “show full processlist” we can see the client port:

+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
| Id | User        | Host            | db    | Command | Time | State                             | Info             |
+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
|  1 | system user |                 |       | Daemon  |    0 | Waiting for event from ndbcluster | NULL             |
|  4 | root        | localhost:50493 | world | Sleep   |    1 |                                   | NULL             |
|  7 | root        | localhost:50559 | NULL  | Query   |    0 | init                              | show processlist |
|  8 | root        | localhost:50621 | NULL  | Sleep   |   42 |                                   | NULL             |
+----+-------------+-----------------+-------+---------+------+-----------------------------------+------------------+
4 rows in set (0,00 sec)
And now, with an o.s. command, get the offending process id:

$ netstat -pt | grep 50493
tcp        0      0 localhost:50493         localhost:mysql         ESTABLISHED 26352/mysql     
tcp6       0      0 localhost:mysql         localhost:50493         ESTABLISHED 24483/mysqld

26352 is the os PID. We could kill this one on this sqlnode, but first, let’s see what it’s doing, as if we don’t find out what it’s running, how can we fix it.

To get the sql text, it’s going to be far easier to use SYS schema and activate the data collection:

:: via Workbench 6.1 upwards.
:: via https://github.com/MarkLeith/mysql-sys.

and using Cluster Manager, set performance schema instrumentation on:

mcm> set performance-schema-instrument:mysqld='%=ON' mycluster;

and let’s see what happened:

select a.mysql_connection_id, a.operation_type, a.client_node_id, b.inactive_seconds
   from ndbinfo.server_operations a, ndbinfo.cluster_transactions b
   where a.transid = b.transid
   and b.inactive_seconds > 2;

+---------------------+----------------+----------------+------------------+
| mysql_connection_id | operation_type | client_node_id | inactive_seconds |
+---------------------+----------------+----------------+------------------+
|                  14 |        READ-SH |             50 |             1501 |
|                  14 |         UPDATE |             50 |             1501 |
|                  14 |        READ-SH |             50 |             1501 |
|                  14 |         UPDATE |             50 |             1501 |
|                  14 |        READ-SH |             50 |             1501 |
+---------------------+----------------+----------------+------------------+
5 rows in set (0,03 sec)

SELECT a.thread_id, a.SQL_TEXT, b.processlist_id
FROM     performance_schema.events_statements_history a, performance_schema.threads b
WHERE     a.thread_id = b.thread_id
and     b.processlist_id =
  (select distinct a.mysql_connection_id
   from ndbinfo.server_operations a, ndbinfo.cluster_transactions b
   where a.transid = b.transid
   and b.inactive_seconds > 2)
order by TIMER_START;

+-----------+-------------------------------------------------------+----------------+
| thread_id | SQL_TEXT                                              | processlist_id |
+-----------+-------------------------------------------------------+----------------+
|        32 | NULL                                                  |             14 |
|        32 | begin                                                 |             14 |
|        32 | select * from City where Name = 'Valencia' for update |             14 |
|        32 | update City set Name = 'Bilbao' where ID=3            |             14 |
|        32 | commit                                                |             14 |
|        32 | SELECT DATABASE()                                     |             14 |
|        32 | NULL                                                  |             14 |
|        32 | begin                                                 |             14 |
|        32 | select * from City where Name = 'Valencia' for update |             14 |
|        32 | update City set Name = 'Bilbao' where ID=3            |             14 |
+-----------+-------------------------------------------------------+----------------+
10 rows in set (0,05 sec)

You got me, I ran it twice in the same session. So now we can go and fix the code, and add that little “end” where it’s needed.

Hope this has helped someone.

3 thoughts on “NDB 7.4 & SYS schema: When getting locks, detecting the guilty SQL & o.s.pid.

  1. Not sure if my comment mad it…
    What about connection attributes in performance_schema? There is an _pid attribute.
    This should also be in sys.processlist if you have a recent sys version

    • Hi Daniël,
      Very good point, and apologies for not replying sooner, holidays, but more importantly, I wanted to reply with a confirmation: you’re absolutely right. I upgraded my old workbench to 6.3.4, which comes with SYS 1.4.0 and hey presto, 19 new tables in SYS, of which:
      +————–+———————+————–+
      | column_name | table_name | table_schema |
      +————–+———————+————–+
      | waiting_pid | innodb_lock_waits | sys |
      | blocking_pid | innodb_lock_waits | sys |
      | waiting_pid | x$innodb_lock_waits | sys |
      | blocking_pid | x$innodb_lock_waits | sys |
      +————–+———————+————–+
      I’ll see if we can get/propose something similar for NDB tables & locks…
      Thanks.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s