[Gluster-devel] Performance tuning for MySQL

Gordan Bobic gordan at bobich.net
Wed Feb 11 09:18:23 UTC 2009


David Sickmiller wrote:

> I'm running 2.0rc1 with the 2.6.27 kernel.  I have a 2-node cluster.  
> GlusterFS runs on both nodes, and MySQL runs on the active node.  If the 
> active node fails or is put on standby, MySQL fires up on the other 
> node.  Unlike MySQL Replication with its slave lag, I know my data 
> changes are durable in the event of a server failure.  Most people use 
> DRBD for this, but I'm hoping to enjoy GlusterFS's benefits of handling 
> split-brain situations at the file level instead of the volume level, 
> future scalability avenues, and general ease of use.  Hopefully DRBD 
> doesn't have unmatchable performance advantages I'm overlooking.

Note that DRBD resync is more efficient - it only resyncs dirty blocks, 
which, in the case of big databases, can be much faster. Gluster's 
self-heal will copy the whole file.
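
(For what it's worth, one common way to kick off that whole-file 
self-heal after a failed node rejoins is to stat every file on the 
mount from a client, e.g.

    find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null

where the mount point is just a placeholder. With a multi-gigabyte 
InnoDB tablespace that heal can take a while.)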

> I'm going to report my testing in order, because the changes were 
> cumulative.  I used server-side io-threads from the start.  Before I 
> started recording the speed, I discovered that running in single process 
> mode was dramatically faster.  At that time, I also configured 
> read-subvolume to use the local server.  At this point I started measuring:
> 
>     * Printing schema: 18s
>     * Compressed export: 2m45s
> 
> For a benchmark, I moved MySQL's datafiles to the local ext3 disk (but 
> kept writing the export to GlusterFS).  It was 10-100X faster!
> 
>     * Printing schema: 0.2s
>     * Compressed export: 28s

Did you flush the caches between runs? What is the network 
connection between the nodes?
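
If the caches weren't flushed, something like this on both nodes before 
each timed run keeps the comparison honest (a minimal sketch, assumes 
root; "other-node" is just a placeholder):

    # drop page/dentry/inode caches so every run starts cold
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # quick sanity check of latency on the link between the nodes
    ping -c 5 other-node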

> There were no appreciable changes from installing fuse-2.7.4glfs11, using 
> Booster, or running blockdev to increase readahead from 256 to 16384.
> 
> Adding the io-cache client-side translator didn't affect printing the 
> schema but cut the export in half:
> 
>     * Compressed export: 1m10s
> 
> Going off on a tangent, I shut down the remote node.  This increased the 
> performance by an order of magnitude:
> 
>     * Printing schema: 2s
>     * Compressed export: 24s

What is the ping time between the servers? Have you measured the 
throughput between the servers with something like FTP on big files? Is 
it the writes or the reads that slow down? Try dumping to ext3 while 
reading the data from Gluster.
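
Something along these lines would separate the two (a rough sketch - 
host names, file names and mount points are only placeholders):

    # raw network throughput: push a ~1GB file between the nodes
    dd if=/dev/zero of=/tmp/testfile bs=1M count=1024
    scp /tmp/testfile other-node:/tmp/

    # data files on the Gluster mount, dump written to local ext3
    time mysqldump --all-databases | gzip > /var/tmp/dump.sql.gz

If that dump stays slow, the reads from Gluster are the bottleneck; if 
it speeds up to near your local-disk numbers, it was the writes to the 
Gluster mount that were hurting.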

> I resumed testing with both servers running.  Switching the I/O 
> scheduler to deadline had no appreciable effect.  Neither did adding 
> client-side io-threads or server-side write-behind.  Surprisingly, I 
> found that changing read-subvolume to the remote server had only a minor 
> penalty.

Are you using single process client/server on each node, or separate 
client and server processes on both nodes?
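
By single process I mean one glusterfs process per node that both 
exports the local brick and mounts the replicated volume, roughly along 
these lines (an untested sketch - volume names, the directory, the 
cache size and the host name are placeholders, adjust to your setup):

    volume posix
      type storage/posix
      option directory /data/export
    end-volume

    volume locks
      type features/locks
      subvolumes posix
    end-volume

    volume remote
      type protocol/client
      option transport-type tcp
      option remote-host other-node
      option remote-subvolume locks
    end-volume

    volume afr0
      type cluster/replicate
      # prefer the local brick for reads
      option read-subvolume locks
      subvolumes locks remote
    end-volume

    volume iocache
      type performance/io-cache
      option cache-size 64MB
      subvolumes afr0
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option auth.addr.locks.allow *
      subvolumes locks
    end-volume

The separate-process variant splits this in two: glusterfsd runs just 
the posix/locks/server part, and a second glusterfs client process 
mounts a volfile with two protocol/client volumes under the replicate.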

> Then I noticed that the remote server was listed first in the volfile, 
> which means that it gets used for the lock server.  Swapping the order 
> in the volfile on one server seemed to cause split-brain errors -- does 
> the order need to be the same on both servers?

Yes, the first server listed is the lock server. If you list them in a 
different order on each node, locking will break. The order listed is 
also the lock server fail-over order.
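
In volfile terms (node1/node2 are just placeholder names):

    volume afr0
      type cluster/replicate
      # the first subvolume listed acts as the lock server;
      # this order must be identical in the volfiles on both nodes
      subvolumes node1 node2
    end-volume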

> When I changed both 
> servers' volfiles to use the active MySQL server as the lock server, 
> there was a dramatic performance increase, to roughly the 2s/24s 
> speed I saw with one server down.  (I lost the exact stats.)
> 
> In summary, running in single process mode, client-side io-cache, and a 
> local lock file were the changes that made a significant difference.

That makes sense, especially the local lock file. The time it takes 
to write a lock to the page cache is going to be some orders of 
magnitude faster than the ping time, even on gigabit ethernet.
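
To put rough numbers on it: a gigabit LAN round trip is typically in 
the 0.1-0.2ms range, while a write into the local page cache is a few 
microseconds. If a dump issues on the order of ten thousand lock and 
metadata operations, that alone is a second or two of pure network 
wait, which is in the same ballpark as the 2s vs 0.2s schema figures 
above.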

> Since I'm only going to have one server writing to the filesystem at a 
> time, I could mount it read-only (or not at all) on the other server.  
> Would that mean I could safely set data-lock-server-count=0 and 
> entry-lock-server-count=0 because I can be confident that there won't be 
> any conflicting writes?  I don't want to take unnecessary risks, but it 
> seems like unnecessary overhead for my use case.

Hmm... If the 1st server fails, the lock server role will fail over to 
the next one, and that is where you then fire up MySQL. I thought you 
said it was only the 2nd server that suffers the penalty. Since the 2nd 
server will take over locking from the 1st if the 1st fails, the 
performance should be the same after fail-over: the active server will 
still be the lock server.
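
If you do decide to try it anyway, those knobs sit on the replicate 
volume (an untested sketch - only defensible on the assumption that 
exactly one node ever writes; put the defaults back if MySQL can ever 
run on both nodes at once):

    volume afr0
      type cluster/replicate
      # relax locking: no lock servers for data and directory entries
      option data-lock-server-count 0
      option entry-lock-server-count 0
      subvolumes node1 node2
    end-volume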

Gordan
