[Gluster-users] gluster map/reduce performance..

Venky Shankar venky at gluster.com
Thu Oct 20 06:57:25 UTC 2011


Did you test GlusterFS write performance (using 'dd') *only* from the client mount?

I ask because the GlusterFS Hadoop plugin does a FUSE mount on *every* node in the cluster. So during the map phase, when jobs get assigned to slaves, all I/O (which is mostly reads) goes through FUSE. Similarly, during the reduce phase, the reduce jobs write to the FUSE mount on their respective nodes.

Can you try running the 'dd' test on all nodes in the cluster in parallel (on the FUSE mount) on the 2x2 Distribute-Replicate setup and let us know the numbers? Throughput numbers from all nodes would be helpful, if possible.
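
If 'dd' is inconvenient to script across nodes, a rough stand-in for the per-node write test could look like the sketch below. It is only an illustration, not part of the GlusterFS or Hadoop tooling: the function name, the scratch file name, and the 64 MiB default size are assumptions made here, and the mount point is whatever path the FUSE mount uses on your nodes. It writes zero-filled blocks, calls fsync before stopping the clock, and prints the throughput together with the host name so the numbers from each node can be told apart.

```python
import os
import socket
import sys
import time


def write_throughput_mib_s(directory, total_mib=64, block_kib=1024):
    """Write total_mib MiB of zeros into `directory` and return MiB/s.

    Hypothetical helper: a rough equivalent of a 'dd' write from
    /dev/zero followed by an fsync, not an official benchmark.
    """
    block = b"\0" * (block_kib * 1024)
    blocks = (total_mib * 1024) // block_kib
    # Per-host, per-process file name so parallel runs do not collide.
    name = "ddtest.%s.%d" % (socket.gethostname(), os.getpid())
    path = os.path.join(directory, name)
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the flush to storage in the timing
        elapsed = time.monotonic() - start
    finally:
        # Remove the scratch file whether or not the write succeeded.
        if os.path.exists(path):
            os.remove(path)
    written_mib = blocks * block_kib / 1024
    return written_mib / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Usage: python ddtest.py <path to the FUSE mount on this node>
    mount = sys.argv[1]
    rate = write_throughput_mib_s(mount)
    print("%s: %.1f MiB/s" % (socket.gethostname(), rate))
```

To match the request above, the script would have to be started on all nodes at roughly the same time, each pointing at its own FUSE mount, and the per-node output lines collected.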

Write performance in HDFS is exceptionally good because of its aggressive client-side caching (HDFS relaxes a POSIX requirement to get higher write throughput).


From: 공용준(yongjoon kong)/Cloud Computing 기술담당/SKCC [andrew.kong at sk.com]
Sent: Wednesday, October 19, 2011 11:04 PM
To: Venky Shankar; andrew; gluster-users at gluster.org
Subject: RE: [Gluster-users] gluster map/reduce performance..

Yes, I used the GlusterFS plugin.

The Gluster version is 3.3 beta 2.

For the volumes:
 Distributed-mirroring volume: using 4 servers in a 2 (brick) x 2 (replica) configuration
 Stripe-mirroring volume: using 4 servers in a 4 (stripe count) x 2 (replica) configuration

For the Map/Reduce system I used 6 servers (4 are the brick servers and the other 2 are for Map/Reduce only).

I checked your source file, but I can't find any clue to the performance degradation in the merging stage. (I think it is connected with writing.)

Actually, in the write test, Gluster was quite good, so I'm a little confused right now.


From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Venky Shankar
Sent: Thursday, October 20, 2011 1:35 AM
To: andrew; gluster-users at gluster.org
Subject: Re: [Gluster-users] gluster map/reduce performance..

Hi there,

We would appreciate it if you could share the following info with us:

* Are you using the GlusterFS Hadoop plugin (which is here http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm and is still in beta), or are you using GlusterFS as an additional layer below Hadoop's FileSystem (HDFS)?

The latter basically means configuring Hadoop to use a GlusterFS mount point (e.g. a FUSE mount) as the data directory for Hadoop's DFS.

Let us know your setup (including GlusterFS version) to debug further.

From: gluster-users-bounces at gluster.org [gluster-users-bounces at gluster.org] on behalf of andrew [sstrato.kong at gmail.com]
Sent: Wednesday, October 19, 2011 6:15 PM
To: gluster-users at gluster.org
Subject: [Gluster-users] gluster map/reduce performance..

Hi, all,

I am trying to check the Map/Reduce performance of the Gluster file system.

Mapper-side speed is quite good, and it is sometimes faster than Hadoop's map job.

But the reduce-side job is much slower than Hadoop's.

I analyzed the results and found that the primary reason for the slow speed is bad performance in the merging stage.

Would you have any suggestions for this issue?

FYI, check the blog http://storage4com.blogspot.com/
