[Gluster-devel] [Gluster-users] Enabling Apache Hadoop on GlusterFS: glusterfs-hadoop 2.1 released

Thu Sep 5 23:18:04 UTC 2013

On Thu, Sep 5, 2013 at 2:53 PM, Stephen Watt <swatt at redhat.com> wrote:

> Hi Folks
>
> We are pleased to announce a major update to the glusterfs-hadoop project
> with the release of version 2.1. The glusterfs-hadoop project provides an
> Apache-licensed Hadoop FileSystem plugin which enables Apache Hadoop 1.x
> and 2.x to run directly on top of GlusterFS. This release includes a
> re-architected plugin which now extends existing functionality within
> Hadoop to run on local and POSIX File Systems.
>
> -- Overview --
>
> Apache Hadoop has a pluggable FileSystem Architecture. This means that if
> you have a filesystem or object store that you would like to use with
> Hadoop, you can create a Hadoop FileSystem plugin for it which will act as
> a mediator between the generic Hadoop FileSystem interface and your
> filesystem of choice. A popular example is Amazon S3: over a million Hadoop
> clusters are spun up on Amazon every year, many of which use Amazon S3 as
> the Hadoop FileSystem.
>
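To make the "mediator" idea above concrete, here is a minimal sketch in Python rather than Hadoop's own Java. It is not the glusterfs-hadoop code and does not use any Hadoop API: the names FileSystem, InMemoryFileSystem, REGISTRY, register and open_uri are all hypothetical, chosen only to show a generic interface with interchangeable backends selected by URI scheme.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a generic filesystem interface."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents of the file at `path`."""


class InMemoryFileSystem(FileSystem):
    """Toy backend; a real plugin would talk to GlusterFS, S3, and so on."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


# Registry mapping a URI scheme to the plugin instance that serves it
# (an assumption of this sketch, not how Hadoop stores its configuration).
REGISTRY = {}


def register(scheme, fs):
    """Associate a URI scheme with a FileSystem implementation."""
    REGISTRY[scheme] = fs


def open_uri(uri):
    """Dispatch a read to whichever plugin is registered for the scheme."""
    parsed = urlparse(uri)
    return REGISTRY[parsed.scheme].read(parsed.path)


register("glusterfs", InMemoryFileSystem({"/data/input.txt": b"hello"}))
print(open_uri("glusterfs:///data/input.txt"))
```

The caller only ever sees open_uri and the abstract interface; swapping the backend means registering a different object under the same scheme.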
> To use the plugin, a specific deployment configuration is required: the
> Hadoop JobTracker and TaskTrackers (or the Hadoop 2.x equivalents) must be
> installed on servers within the gluster trusted storage pool for a given
> gluster volume. The
> JobTracker uses the plugin to query the extended attributes for job input
> files in gluster to ascertain file placement as well as the distribution of
> file replicas across the cluster. The TaskTrackers use the plugin to
> leverage a local FUSE mount of the gluster volume in order to access the
> data required for the tasks. When the JobTracker receives a Hadoop job, it
> uses the locality information it ascertains via the plugin to send the
> tasks for the Hadoop Job to Hadoop TaskTrackers on servers that have the
> data required for the task within their local bricks. This ensures data is
> read from disk and not over the network. Please see the attached diagram
> which provides an overview of the entire solution for a Hadoop 1.x
> deployment.
>
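The locality-based placement described above can be sketched roughly as follows. This is a toy greedy assignment in Python, not the JobTracker's actual scheduler: the function name assign_tasks, the host names, and the assumption that replica locations are already available as a plain dictionary are all hypothetical. How the plugin really derives those locations from extended attributes is not shown here.

```python
def assign_tasks(file_hosts, tracker_hosts):
    """Map each input file to a TaskTracker host, preferring local replicas.

    file_hosts:    dict of file path -> list of hosts holding a replica
    tracker_hosts: non-empty list of hosts running a TaskTracker
    """
    load = {host: 0 for host in tracker_hosts}
    assignment = {}
    for path in sorted(file_hosts):
        # Hosts that both run a TaskTracker and hold a replica of the file.
        local = [h for h in file_hosts[path] if h in load]
        # Fall back to any tracker (a network read) if no replica is local.
        candidates = local or list(load)
        # Pick the least-loaded candidate; ties are broken by host name.
        chosen = min(candidates, key=lambda h: (load[h], h))
        load[chosen] += 1
        assignment[path] = chosen
    return assignment


replicas = {
    "/a": ["n1", "n2"],
    "/b": ["n1", "n2"],
    "/c": ["n3"],
}
print(assign_tasks(replicas, ["n1", "n2", "n3"]))
# {'/a': 'n1', '/b': 'n2', '/c': 'n3'}
```

Every task in this example lands on a host that holds a replica of its input, which is the property the announcement relies on to keep reads on local disk.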
> The community project, along with the documentation and available
> releases, is hosted within the Gluster Forge at
> http://forge.gluster.org/hadoop. The glusterfs-hadoop project will also
> be available within the Fedora 20 release later this year, alongside fellow
> Fedora newcomer Apache Hadoop and the already available gluster project.
> The glusterfs-hadoop project team welcomes contributions and participation
> from the broader community.
>
> Stay tuned for upcoming posts around GlusterFS integration into the Apache
> Ambari and Fedora projects.
>
> Regards
> The glusterfs-hadoop project team

Congratulations! This is great news!!

Avati