Thursday, November 26, 2009

Scaling - going from 2 to X data nodes

When moving from two data nodes to a bigger cluster, it is not necessarily true that you will get better performance. In fact, performance can get worse.

Here are some things to think about:
  • Database load (traffic to the Cluster) - if you can handle query load X on a two data node cluster and move the same load X to a four data node cluster, you will likely get new_load_X=0.8X, i.e., a performance degradation. This is caused by 1) buffers not filling up fast enough, so the data nodes fall back to "timeout"-based sending, or 2) access patterns that do not scale. To correct 1) you need to increase the load on the cluster so that the internal communication buffers fill up faster.
Access-pattern-related "problems":

  • Primary key operations (reads, updates, deletes) always go directly to the data node that holds the data, no matter how many nodes you have. So there are no problems with these types of ops.

  • An index scan on a secondary index always scans all data nodes. This can be expensive if the number of records you retrieve is "small" (roughly <=64 records), since setting up the index scan on every data node is expensive, while sending the actual data is relatively cheap. This can be a cause of performance degradation when moving from 2 data nodes to a bigger cluster. There is not much you can do about it, unless you can redesign your schema so that the most important requests fit the "index scan on part of the PK" way of accessing data.

  • For index scans on part of the PK there are two options - index scan all data nodes, or index scan only one data node. Choosing the wrong one can also cause performance degradation when moving from 2 data nodes to a bigger cluster.

Example with index scans on part of PK:
CREATE TABLE t1 (
  userid integer,
  serviceid integer,
  data varbinary(512),
  INDEX (serviceid),
  PRIMARY KEY (userid, serviceid)
) ENGINE=ndbcluster;
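To illustrate the first bullet above: a read that binds the full primary key is a PK operation and is routed straight to the one data node holding the row, while binding only part of the PK falls back to an index scan. A hedged sketch against this table:

```sql
-- Full PK bound: NDB hashes the key and goes to exactly one data node.
SELECT data FROM t1 WHERE userid = 1 AND serviceid = 2;

-- Only part of the PK bound: this becomes an ordered index scan instead.
SELECT data FROM t1 WHERE userid = 1;
```
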
  • This table is by default partitioned on the primary key (every table in Cluster is by default partitioned on the primary key). In a four data node cluster this means that the records of a single user can be spread out over 4 different partitions.
  • You may have this data
<userid,serviceid,data>
<1,1,aaa>
<1,2,aaa>
<1,3,aaa>
<1,4,aaa>

The record <1,1,aaa> may be on one data node, <1,2,aaa> may be on another, and so on. The placement is effectively random, since it is based on a hash of the full primary key.
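Assuming MySQL 5.1 or later, you can load the sample rows above and inspect how they spread over the partitions via information_schema (note that the row counts reported for NDB tables may be approximate):

```sql
-- Load the sample rows from the example above.
INSERT INTO t1 VALUES (1,1,'aaa'), (1,2,'aaa'), (1,3,'aaa'), (1,4,'aaa');

-- Per-partition row counts for t1.
SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_schema = DATABASE() AND table_name = 't1';
```
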

If you now do a SELECT * FROM t1 WHERE userid=<value>,
then every data node will be scanned for data, and the results are then merged in the NDBAPI (mysqld).

Good or bad? It depends: if your user has >64 services (the exact threshold is CPU and network dependent), then it is better to scan all data nodes.

If your user has <=64 services then it is better to involve only one data node.

You can then create the table as follows:
CREATE TABLE t1 (
  userid integer,
  serviceid integer,
  data varbinary(512),
  INDEX (serviceid),
  PRIMARY KEY (userid, serviceid)
) ENGINE=ndbcluster
PARTITION BY KEY (userid);
- You may still have this data:
<userid,serviceid,data>
<1,1,aaa>
<1,2,aaa>
<1,3,aaa>
<1,4,aaa>
All records with userid=1 are now located in one data node (one partition), no matter how many data nodes you have!

Now, the primary key is still <userid, serviceid>, but the data is placed on the partitions by looking at <userid> only! This ensures that a user will be on only one partition, irrespective of how many services the user has. If you query on <userid>, then only one partition will be involved in the search, since for a particular <userid> in the table t1, all data for that user is in one partition.

If you now do a SELECT * FROM t1 WHERE userid=<value> then only one data node will be scanned.

For completeness of the example: SELECT * FROM t1 WHERE serviceid=<value> will cause all data nodes to perform an index scan on the index (serviceid), since the data is not partitioned by serviceid, and serviceid is not the leftmost column of the primary key (the columns of an index are used left to right).
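Assuming MySQL 5.1 or later, EXPLAIN PARTITIONS shows which partitions a query touches, so you can verify this behaviour yourself. A sketch, using the PARTITION BY KEY (userid) version of t1:

```sql
-- The partitions column should list a single partition:
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE userid = 1;

-- The partitions column should list all partitions, since the
-- table is not partitioned by serviceid:
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE serviceid = 1;
```
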

The main point is that the fixed cost of setting up an ordered index scan is high, but retrieving the records is then cheap. So if you return many rows, reading from all data nodes is fine.

If you later plan to increase the number of data nodes, e.g. from 4 to 8, then using PARTITION BY KEY where possible makes the upgrade (adding more data nodes) easier and less susceptible to performance decreases.
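For such an upgrade, a hedged sketch, assuming MySQL Cluster 7.0 or later (which supports adding data nodes online): after the new data nodes have been added to the configuration and started, existing tables can be redistributed onto them without downtime:

```sql
-- Spread the existing partitions of t1 over the newly added
-- data nodes, online (MySQL Cluster 7.0+):
ALTER ONLINE TABLE t1 REORGANIZE PARTITION;
```
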

This is how it looks on my system (as you can see, partitioned searches (using PARTITION BY KEY) are faster up to ~64 records).
