[Gluster-users] extreme slow recover rate
Wei Dong
wdong.pku at gmail.com
Thu Aug 20 19:16:01 UTC 2009
There's one thing obviously wrong in the previous email. GlusterFS is
able to use more than one CPU core because we are running 4x4 I/O
threads on each node. I got the 99% figure from the ganglia output, which
is not correct.
Sorry about that.
- Wei
Wei Dong wrote:
> Hi All,
>
> I'm experiencing an extremely slow auto-heal rate with glusterfs, and
> I'd like to hear from you whether it seems reasonable or whether
> something's wrong.
>
> I reconfigured the glusterfs installation running on our lab cluster.
> Originally we had 25 nodes, each providing one 1.5T SATA disk, and the
> data were not replicated. Now we have expanded the cluster to 66 nodes,
> and I decided to use all 4 disks of each node and have the data
> replicated 3 times on different machines. What I did was leave the
> original data untouched and reconfigure glusterfs so that the
> original data volumes are paired up with new empty mirror volumes. On
> the server side, each node exports one volume for each of its 4
> disks, with an IO-threads translator with 4 threads running on top of
> each disk and no other performance translators. On the client side,
> which is mounted on each of the 66 nodes, I group all the volumes into
> mirrors of 3 volumes each, then aggregate the mirrors with one
> DHT and put a write-behind translator with window-size 1MB on top of
> that.
>
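A minimal sketch of the grouping arithmetic behind the layout described above, not a glusterfs volfile. The function name build_mirrors and the (node, disk) tuples are hypothetical, and the sketch ignores the constraint that each original data volume has to be paired with new empty volumes; it only shows one simple way to form mirrors of 3 whose members sit on different machines.

```python
NODES = 66           # from the post
DISKS_PER_NODE = 4   # from the post
REPLICAS = 3         # from the post


def build_mirrors(nodes=NODES, disks=DISKS_PER_NODE, replicas=REPLICAS):
    """Group (node, disk) bricks into mirrors whose members are on different nodes."""
    if nodes % replicas != 0:
        # The slicing below only keeps a mirror on one disk index (and so on
        # distinct nodes) when the node count is a multiple of the replica count.
        raise ValueError("this simple scheme needs nodes divisible by replicas")
    # Disk-major order: neighbours in the list are the same disk index on
    # consecutive nodes, so each slice of `replicas` bricks spans distinct nodes.
    bricks = [(node, disk) for disk in range(disks) for node in range(nodes)]
    return [bricks[i:i + replicas] for i in range(0, len(bricks), replicas)]


mirrors = build_mirrors()
print(len(mirrors))   # 264 bricks / 3 replicas = 88 mirrors
print(mirrors[0])     # [(0, 0), (1, 0), (2, 0)]
```

With these numbers the DHT layer would sit on top of 88 mirrors, which gives a sense of how many subvolumes each client has to track.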
> The files are all images, roughly 200K each.
>
> To trigger auto-heal, I split a list of all the files (saved
> before the reconfiguration) among the 66 nodes and have 4 threads on
> each node running the stat command on the files. The overall rate is
> about 50 files per second, which I think is very low. And while the
> auto-heal is running, all operations like cd and ls on the glusterfs
> client become extremely slow; each takes about a minute to finish.
>
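A minimal sketch of the stat-driven approach described above, assuming the saved file list is a plain text file with one path per line under the glusterfs mount. The names shard, stat_all and heal_shard are hypothetical, and the round-robin split is only one possible way to divide the list; the original script is not shown in this thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def shard(paths, node_index, node_count):
    """Return every node_count-th path starting at node_index (round-robin split)."""
    return paths[node_index::node_count]


def stat_all(paths, threads=4):
    """stat() each path from a small thread pool; return (ok, failed) counts."""
    def try_stat(path):
        try:
            os.stat(path)
            return True
        except OSError:
            return False

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(try_stat, paths))
    ok = sum(results)
    return ok, len(results) - ok


def heal_shard(list_file, node_index, node_count=66, threads=4):
    """Read the saved file list and stat this node's share of it."""
    with open(list_file) as f:
        paths = [line.rstrip("\n") for line in f if line.strip()]
    return stat_all(shard(paths, node_index, node_count), threads)
```

Each node would call heal_shard with its own index (0 to 65). Dividing the returned counts by the elapsed wall-clock time gives a per-node rate to compare against the cluster-wide 50 files per second.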
> On
> http://www.gluster.org/docs/index.php/GlusterFS_2.0_I/O_Benchmark_Results,
> which uses 5 servers with one RAID0 disk per node, with 5 threads
> running on 3 clients, about 61K 1M files can be created within 10 min,
> at an average rate of 130 files/second. The glusterfsd processes are
> each taking about 99% CPU time (we have 8 cores / node, but it seems
> glusterfsd is able to use only 1).
>
> Their only advantage is that they use RAID0 disks and that clients and
> servers are different machines. Other than that, we have more
> servers/clients and more disks on each node, I have configured
> IO-threads and write-behind (which I don't think helps auto-heal), and
> our files are only about 1/5 their size. Even if I count each file as
> 3 for replication, I'm only achieving throughput similar to that of a
> much smaller cluster. I don't know why it's like this; the following
> are my hypotheses:
>
> 1. The overall throughput of glusterfs doesn't scale up to so many nodes;
> 2. The auto-heal operation is slower than creating the file from
> scratch;
> 3. Glusterfs is CPU bound -- it seems to be the case, but I'm
> unwilling to accept that a filesystem is CPU bound;
> 4. The network bandwidth is saturated. I think we have 1 gigabit
> Ethernet on the nodes, which are on two racks and connected by some
> Foundry switch with 4 gigabit aggregate inter-rack bandwidth;
> 5. Replication is slow.
>
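A rough back-of-the-envelope check of the figures quoted above, as a sketch only. It assumes "200K" means 200 KB and "1M" means 1000 KB (consistent with "about 1/5 their size"), counts each healed file 3 times as in the text, and ignores protocol overhead and per-file round trips, which may well dominate for small files.

```python
FILE_KB = 200              # "roughly 200K each"
HEAL_FILES_PER_S = 50      # observed overall auto-heal rate
BENCH_FILE_KB = 1000       # benchmark "1M" files, assumed to be 1000 KB
BENCH_FILES_PER_S = 130    # benchmark rate as quoted in the post
REPLICAS = 3               # count each file 3 times, as in the text
NODE_LINK_MB_S = 1000 / 8   # 1 gigabit Ethernet is 125 MB/s
INTER_RACK_MB_S = 4000 / 8  # 4 gigabit aggregate is 500 MB/s


def mb_per_s(files_per_s, file_kb, copies=1):
    """Payload throughput in MB/s for a given file rate and file size."""
    return files_per_s * file_kb * copies / 1000.0


heal_mb_s = mb_per_s(HEAL_FILES_PER_S, FILE_KB, REPLICAS)
bench_mb_s = mb_per_s(BENCH_FILES_PER_S, BENCH_FILE_KB)

print(f"heal payload:      {heal_mb_s} MB/s")     # 30.0 MB/s
print(f"benchmark payload: {bench_mb_s} MB/s")    # 130.0 MB/s
print(f"heal / benchmark:  {heal_mb_s / bench_mb_s:.2f}")
print(f"heal / one 1 Gbit link: {heal_mb_s / NODE_LINK_MB_S:.2f}")
print(f"heal / inter-rack link: {heal_mb_s / INTER_RACK_MB_S:.2f}")
```

Counted in files per second the two setups look similar (150 versus 130), but counted in bytes the heal moves under a quarter of the benchmark's payload. Under these assumptions the cluster-wide heal payload is also well below a single gigabit link, so hypothesis 4 looks unlikely on payload volume alone.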
> Finally, the good news is that all the servers and client processes
> have been running for about 2 hours under stress, which I didn't
> expect in the beginning. Good job you glusterfs people!
>
> We are desperate for shared storage, and I'm eager to hear any
> suggestions you have to make glusterfs perform better.
>
> - Wei
>