Wednesday, July 01, 2009

Problems with .FRM files, auto-discovery and MySQL Cluster

There are some bug reports on the auto-discovery protocol in MySQL Cluster.
The idea of the auto-discovery protocol is to fetch the .frm files for the NDB tables stored in the data dictionary of the data nodes, and put them in the data directory of the mysql server.

However, sometimes (not always, which makes it more difficult to reproduce and hence fix), the auto-discovery seems to make strange things (from this bug report):
After shuting down and restoring my cluster I get the following error.

090211 9:59:26 [Note] NDB: mismatch in frm for panel.gatewayquestions, discovering...
090211 9:59:26 [Note] NDB Binlog: DISCOVER TABLE Event: REPL$panel/gatewayquestions
090211 9:59:26 [Note] NDB Binlog: logging ./panel/gatewayquestions (UPDATED,USE_WRITE)

This is due to the files already being in the mysql data directory. After the error the
frm does not match the data in memory this causes the following.
When running select count(*) from tablename;
You will get an accurate count.
When running select * from table name;
You get an error Can't find record in tablename.
I have seen it as well at some customers usually with bigger installations and many tables.

My current recommendation (work around) is to delete the FRM files associated with the NDB tables in the mysql server data directory before you start the mysql server(s).

So this is what i always include in my MySQL server startup scripts (and is included in the Configurator scripts):
files=`find $mysql_datadir  -name "*.ndb"`
for f in $files
do
x=`basename $f .ndb`
#make sure we leave out ndb_binlog_index and ndb_schema since they are myisam tables
if [ "$x" == "ndb_binlog_index" ] || [ "$x" == "ndb_schema" ] ;
then
echo "Ignoring $x"
else
y=`echo $f | sed -e 's#ndb#frm#'`
rm -rf $f
rm -rf $y
fi
done
#start the mysqld here
If I want to restore data I usually:
  1. stop the cluster
  2. start the data nodes with --initial
  3. stop the mysql servers (make sure they are not started)
  4. restore the cluster data
  5. start the mysql servers (clearing out whatever .frm files coming from the ndb tables)

2 comments:

Brian Aker said...

Hi!

We solved/solving this problem in Drizzle by allowing engines to be the maser for their data dictionary information.

I don't think Monty or Stewart have cluster ported yet, but I suspect that they might do it at some point.

Cheers,
-Brian

Johan Andersson said...

Hooray! That is great. Have you managed to get the privileges system pluggable?

b.r,
-j