[Gluster-users] Gluster and Cloudera's Hadoop

Jay Vyas jayunit100 at gmail.com
Fri Apr 12 22:44:16 UTC 2013


Hi James, and thanks for reporting this .staging permissions problem to
us.  It came full circle today: we saw it manifest itself in a different
context, which led us to a pretty significant fix :)

We now have a branch available that fixes this, and there is also an easy
temporary workaround: change the permissions yourself, or use the umask to
change the default permissions.

** some interesting details about this bug **

The problem is that the gluster plugin was not applying the permissions
assigned through the Hadoop API on ** writes ** of directories and files.
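
To illustrate the class of bug, here is a minimal sketch in plain Java
(java.nio, not the Hadoop or gluster plugin API).  The class and method
names are hypothetical and this is not the actual plugin fix; it only shows
the idea of explicitly applying the caller's requested permissions after
creating a directory, instead of keeping whatever the umask produced.  It
assumes a POSIX filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class ApplyRequestedPermissions {

    // Hypothetical helper (not the plugin's real code): create a directory,
    // then explicitly apply the permissions the caller asked for.
    public static Path mkdirs(Path dir, String requested) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(requested);
        Files.createDirectories(dir);
        // Setting the permissions after creation behaves like chmod, so the
        // result on the leaf directory does not depend on the process umask.
        Files.setPosixFilePermissions(dir, perms);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("staging-demo");
        Path staging = mkdirs(base.resolve(".staging"), "rwx------");
        System.out.println(
            PosixFilePermissions.toString(Files.getPosixFilePermissions(staging)));
    }
}
```

Skipping the explicit set step is roughly what the plugin was doing: the
directory gets created, but with default permissions instead of the
requested ones.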

It turns out that newer releases of Hadoop (branch-1) actually fix this
for you, although for a different purpose (to avoid a race condition).

By contrasting these two files, you can see that newer hadoop (branch-1)
versions actually defensively set the permissions correctly:

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/JobSubmissionFiles.java

Whereas older hadoop versions do not:

http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/mapreduce/JobSubmissionFiles.java.html
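
As a rough illustration of what "defensively setting the permissions"
means, the following sketch checks an existing directory and resets it to
rwx------ if it differs.  It is written against java.nio rather than
Hadoop's FileSystem API, the class and method names are hypothetical, and
the message text is modeled on the log output James quoted, so treat it as
an approximation of the behaviour and not as the code in
JobSubmissionFiles.java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class StagingDirCheck {

    // Owner-only access, i.e. mode 700.
    static final Set<PosixFilePermission> EXPECTED =
        PosixFilePermissions.fromString("rwx------");

    // Hypothetical helper: returns true if the permissions had to be corrected.
    public static boolean ensureOwnerOnly(Path stagingDir) throws IOException {
        if (Files.notExists(stagingDir)) {
            // New directory: create it and set the expected mode explicitly.
            Files.createDirectories(stagingDir);
            Files.setPosixFilePermissions(stagingDir, EXPECTED);
            return false;
        }
        Set<PosixFilePermission> actual = Files.getPosixFilePermissions(stagingDir);
        if (actual.equals(EXPECTED)) {
            return false;
        }
        System.out.println("Permissions on staging directory " + stagingDir
            + " are incorrect: " + PosixFilePermissions.toString(actual)
            + ". Fixing permissions to correct value "
            + PosixFilePermissions.toString(EXPECTED));
        Files.setPosixFilePermissions(stagingDir, EXPECTED);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging-check");
        // Deliberately open the directory up so the check has something to fix.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxrwxrwx"));
        System.out.println("corrected: " + ensureOwnerOnly(dir));
    }
}
```

A check like this on the job submission side hides a filesystem that
ignores the requested permissions at creation time, which would explain why
the problem only shows up with the older Hadoop versions.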

** The official ticket is here **

https://bugzilla.redhat.com/show_bug.cgi?id=951305

Hope this helps.


On Mon, Apr 8, 2013 at 6:07 PM, Jay Vyas <jayunit100 at gmail.com> wrote:

> Hi James:
>
> It looks like standard Hadoop wants to keep the files at permission 700,
> just as you mention in your email:
>
>
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/JobSubmissionFiles.java
>
> Just a guess, but maybe it will work if you try submitting the job
> from the same machine that is running your jobtracker?  I've seen this
> error before when submitting jobs from other machines.
>
> Again, the above is more of a guess than anything else, until we look
> further into it.
>
>
>
>
> On Mon, Apr 8, 2013 at 4:01 PM, Jay Vyas <jvyas at redhat.com> wrote:
>
>> Hi James!
>>
>> 1) Yes, right now, we run as root.  Thanks for noticing :) ... We are
>> working on changing this in the very near future.  The problem is that
>> the plugin attempts to mount a filesystem, but we have recently discussed
>> that the auto-mount behaviour may be a superfluous feature, since mounting
>> can easily be automated for nodes in a cluster.
>>
>> 2) You're right: the previous version of the gluster hadoop filesystem
>> implementation did not deal correctly with privileges.
>> This is now fixed, however.  You can get a "bleeding edge" jar which
>> fixes your permissions error from the
>> glusterfs-hadoop github repository:
>> https://github.com/gluster/hadoop-glusterfs, where these fixes have been
>> merged into head.
>>
>> Also we can get you this jar prebuilt if you want, just let me know!
>>
>> Thanks for trying out the GlusterFileSystem and keep the feedback coming !
>>
>> ----- Original Message -----
>> From: "James Gurtowski" <gurtowskij at gmail.com>
>> To: jvyas at redhat.com
>> Cc: gluster-users at gluster.org
>> Sent: Monday, April 8, 2013 2:17:44 PM
>> Subject: Gluster and Cloudera's Hadoop
>>
>> Hello,
>>
>> It seems the gluster hadoop plugin assumes all hadoop daemons/commands are
>> run as root? I was having trouble getting the jobtracker to start because
>> every time the fs is initialized a system call "mount -t glusterfs ..." is
>> issued. Cloudera runs all daemons as the mapred user who is not allowed to
>> run mount, so this is failing. I modified GlusterFileSystem.java (see
>> attached diff) and set fs.glusterfs.automount to false in core-site.xml so
>> this wouldn't happen.
>> That fixed the initial issue of getting daemons to start.
>>
>> My next issue is getting hadoop jobs to run. I get an error:
>>
>> File /mnt/glusterfs/user/james/.staging/job_201304081221_0013/job.xml does
>> not exist.
>>
>> I believe this to be a permissions issue. I can access this file fine from
>> my account, but the .staging directory is only accessible by the user who
>> launches the job:
>>
>> drwx------ 8 james james 870 Apr  8 14:10 .staging
>>
>> If I change the permissions, they are changed back (by Cloudera's hadoop)
>> when I launch a job:
>> Permissions on staging directory
>> glusterfs://node001:9000/user/james/.staging are incorrect: rwxrwxrwx.
>> Fixing permissions to correct value rwx------
>>
>> Any ideas for a workaround would be greatly appreciated.
>>
>> Thanks,
>> James
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>



-- 
Jay Vyas
http://jayunit100.blogspot.com