[Gluster-users] (Fixed) Re: Can Hadoop run on gluster in 1 JT, N TT setup or only works for 1 JT+TT?

Fermín Galán Márquez fermin at tid.es
Tue Jan 31 18:23:00 UTC 2012


Dear Venky,

On 31/01/2012 6:45, Venky Shankar wrote:
[snip]

 *   Are you co-locating gluster cluster peers with TT nodes (I mean, is each one of the 8 TT nodes also a gluster peer in the cluster) or is the gluster cluster running on separate nodes?

Yes, you are right. Each TaskTracker node is a gluster peer in the cluster.


 *   In case the answer to the question above is that they are co-located, which fs.glusterfs.server are you using in each TT?

For the TaskTracker, fs.glusterfs.server would be _any_ one of the gluster peers (i.e. any one of the 8 machines, considering you have a 1 JT + 8 TT setup). For simplicity, stick to one hostname/IP for this, since that makes deployment easier (no need to edit core-site.xml on every machine).

I'm asking because I have in mind a configuration like this:

TT1-> fs.glusterfs.server @ core-site.xml in TT1= IP_TT1
TT2-> fs.glusterfs.server @ core-site.xml in TT2= IP_TT2
...
TTn-> fs.glusterfs.server @ core-site.xml in TTn= IP_TTn

    This will definitely work for you, but as I said, stick to one hostname/IP. So for each of (TT1, TT2 .. TTn), use IP_TT1.
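
For reference, a core-site.xml fragment along these lines might look like the sketch below. Only fs.glusterfs.server comes from this thread; the fs.default.name property, the glusterfs:// scheme and the port are assumptions about the glusterfs-hadoop plugin, so check the plugin's README for the exact set:

  <!-- core-site.xml fragment (sketch); the same file on every node -->
  <property>
    <name>fs.default.name</name>            <!-- assumed property name/scheme -->
    <value>glusterfs://IP_TT1:9000</value>  <!-- hypothetical port -->
  </property>
  <property>
    <name>fs.glusterfs.server</name>        <!-- from this thread -->
    <value>IP_TT1</value>                   <!-- one hostname/IP everywhere -->
  </property>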


So, each TT mounts "itself", which I suppose achieves data locality similar to the one achieved with HDFS (assuming the gluster driver is clever enough to use the local disk when the data is located on the same node). Does this configuration make sense?

Exactly! Each TT node (and the JT too) does a GlusterFS FUSE mount to get a _view_ of the entire namespace of the FS. The JobTracker schedules jobs to TaskTracker nodes. When a job runs on a TT node, all I/O is done through the GlusterFS mount. Data locality is a bit of a catch here: since all I/O calls go through the mount, each call has to take the route of client translator(s) -> server translator(s) before it hits the posix layer (even if the client and the server are on the same node, the TT in this case).
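
Roughly, every I/O call from a job takes this path, even when the client and server sit on the same TT:

  job read()/write() -> GlusterFS FUSE mount -> client translator(s)
      -> server translator(s) -> posix layer -> brick on local disk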

To optimize this we introduced a configurable option, "quick.slave.io". This is essentially a "short circuit" for the case I just mentioned above. When the job wants to read from a particular offset in the file, the GlusterFS Hadoop plugin checks whether the (offset, length) in question is present in the backend file system. If yes, it satisfies the read directly from the backend FS instead of going through the FUSE mount, thereby saving context switches, translator overhead, etc.
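
As a rough sketch of that short circuit (hypothetical class, field and path names; this is not the plugin's actual API, just an illustration of the idea under the assumption that each file is reachable both via the brick's backend path and via the FUSE mount):

  // Illustrative sketch of the quick.slave.io idea; all names are made up.
  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;

  class QuickSlaveRead {
      private final File backendCopy;  // e.g. the file under /export/brick1 (assumed layout)
      private final File fuseCopy;     // the same file via the FUSE mount, e.g. /mnt/glusterfs

      QuickSlaveRead(File backendCopy, File fuseCopy) {
          this.backendCopy = backendCopy;
          this.fuseCopy = fuseCopy;
      }

      int read(long offset, byte[] buf, int len) throws IOException {
          // If the requested (offset, length) is present on the local backend
          // FS, read it directly and skip the FUSE mount, avoiding the context
          // switches and translator overhead described above.
          File src = (backendCopy.exists() && backendCopy.length() >= offset + len)
                  ? backendCopy
                  : fuseCopy;  // otherwise fall back to the regular mount path
          RandomAccessFile raf = new RandomAccessFile(src, "r");
          try {
              raf.seek(offset);
              return raf.read(buf, 0, len);
          } finally {
              raf.close();
          }
      }
  }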

A bit more info: this option is not well tested, so we default it to "Off" in core-site.xml. If you do try it out, please let us know if you hit any bugs (and please file them too!).
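
Assuming the value is the literal On/Off pair mentioned above, enabling it would look something like this in core-site.xml:

  <property>
    <name>quick.slave.io</name>
    <value>On</value>  <!-- default is Off; experimental -->
  </property>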

Thank you very much for your clarifications! I will follow your recommendation of sticking to just one IP/hostname in all the core-site.xml files across the cluster, and I will use the quick.slave.io option (I will report any bugs I find).

However, after reading your mail, I wonder whether the Hadoop plugin for gluster implements location-based job scheduling similar to that of Hadoop on HDFS. I mean, in Hadoop on HDFS the JT coordinates with the NN (which knows where every file block is located within the cluster), so each map task is scheduled to the TT closest to the input it has to process (ideally, co-located). In Hadoop on gluster I understand there is no NN equivalent, but is there any mechanism by which the JT can know which nodes in the cluster have the actual data in their respective backend filesystems, so that the JT tries to schedule each map task to a TT on one of those nodes? If not, how does the JT select the TT on which to schedule each map task (round-robin, randomly, etc.)?

My question is probably very basic, but I haven't found a clear and direct answer in the documentation, sorry...

Thanks!

Best regards,

------
Fermín
