[Gluster-devel] Follow: GSoC Proposal for a RESTful/JSON API and server for GlusterFS similar to WebHDFS

RJ Nowling rnowling at gmail.com
Wed Mar 19 01:20:31 UTC 2014


Hi all,

I wanted to follow up.  I drafted a proposal for creating a RESTful/JSON
API and server for GlusterFS similar to WebHDFS.  As the number of big data
processing and storage systems explodes, integration is becoming more
important.  A language- and operating-system-agnostic RESTful/JSON API and
server could be helpful for easing integration efforts.

I've pasted the proposal below.  Is there any interest in the Gluster
community?  Would anyone be willing to serve as a mentor?

Thank you,
RJ

RESTful/JSON API and Server for GlusterFS

Overview of proposal:
The goal of the proposal is to create a RESTful/JSON API and server
(similar to WebHDFS) for GlusterFS.

Need it fulfills:
Following on the popularity of Hadoop, a number of "big data" processing
systems (e.g., Berkeley Data Analytics Stack, Storm, Stratosphere, Disco)
are being created and adopted.  These systems are written in a wide range
of languages such as Java, Scala, Python, and Erlang.

These systems are rarely used in isolation. Maintaining separate
distributed file systems and databases is laborious, costly, and wasteful.
Migrating data between separate distributed file systems or databases is
difficult, error prone, and limits easy access to data when it is needed.
As a result, there is great interest in integration as exemplified by
projects such as the Gluster plugin for Hadoop.

Gluster's existing clients (FUSE, libgfapi) are limited to specific
operating systems (Linux) and/or require bindings for each programming
language of interest.  RESTful/JSON APIs and servers such as WebHDFS offer
a more general solution that is independent of the client's operating
system and programming language.  WebHDFS has proven popular and is being
used by systems such as Disco to add support for HDFS.  A RESTful/JSON
interface and server could offer similar benefits for Gluster and has the
potential to be just as popular as WebHDFS.

Any relevant experience you have:
I am familiar with WebHDFS and the Hadoop Gluster plugin. Through my Ph.D.
research and TA'ing experience, I am familiar with distributed systems
(e.g., WorkQueue), client-server systems, and RESTful/JSON APIs.  I have
some experience with CherryPy, a Python web service framework, and using it
to create RESTful/JSON servers. I am also familiar with the work in Disco
to add HDFS support through WebHDFS.

How you intend to implement your proposal:
Aim 1: Design a RESTful/JSON interface that supports the semantics of
Gluster.
The ability to report data locality information will be important for other
projects that use that information for scheduling workers and tasks.
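
To illustrate what reporting data locality could look like, here is a
minimal sketch of a JSON status record and of a scheduler using it to pick
a worker.  The field names ("FileStatus", "locations") and both function
names are assumptions made for this example; they are not taken from any
existing Gluster or WebHDFS interface.

```python
import json


def file_status_with_locality(path, length, brick_hosts):
    """Build a JSON-serializable status record for one file.

    All field names are hypothetical and chosen only for illustration.
    """
    return {
        "FileStatus": {
            "path": path,
            "length": length,
            "type": "FILE",
            # Hosts that store a copy of the file, so a scheduler can
            # place a task on (or near) one of them.
            "locations": sorted(brick_hosts),
        }
    }


def pick_local_worker(status, workers):
    """Return the first worker whose host holds the file, else None."""
    locations = set(status["FileStatus"]["locations"])
    for worker in workers:
        if worker in locations:
            return worker
    return None


if __name__ == "__main__":
    status = file_status_with_locality("/data/a.txt", 10, ["node2", "node1"])
    print(json.dumps(status, indent=2))
    print(pick_local_worker(status, ["node3", "node2"]))
```

A scheduler with no worker on any listed host would get None back and
could fall back to reading the file remotely.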

Aim 2: Create a RESTful/JSON server.
I will use Python and its libraries such as CherryPy or Flask to develop a
RESTful server. My preferred option will be to use Python bindings to
libgfapi as a backend, but I will fall back to using the Gluster FUSE
client if I run into problems.  A dummy backend that uses the local file
system will be created for testing purposes. (It would be good to support
multiple backends.)
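
One way to support multiple backends is a small abstract interface that
the HTTP layer calls into.  The sketch below shows only that interface and
the local file system dummy backend; the class and method names are
assumptions for this example, and the libgfapi and FUSE backends and the
CherryPy/Flask layer are left out.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical storage interface the REST server would call into."""

    @abstractmethod
    def read(self, path):
        """Return the file contents as bytes."""

    @abstractmethod
    def write(self, path, data):
        """Create or overwrite the file with the given bytes."""

    @abstractmethod
    def listdir(self, path):
        """Return the sorted entry names of a directory."""


class LocalBackend(Backend):
    """Dummy backend that maps request paths onto a local directory."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def _resolve(self, path):
        # Treat the request path as relative to the backend root and
        # reject anything (e.g. "..") that would resolve outside of it.
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if os.path.commonpath([self.root, full]) != self.root:
            raise ValueError("path escapes backend root: %r" % (path,))
        return full

    def read(self, path):
        with open(self._resolve(path), "rb") as handle:
            return handle.read()

    def write(self, path, data):
        full = self._resolve(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as handle:
            handle.write(data)

    def listdir(self, path):
        return sorted(os.listdir(self._resolve(path)))


if __name__ == "__main__":
    backend = LocalBackend(tempfile.mkdtemp())
    backend.write("/data/a.txt", b"hello")
    print(backend.listdir("/data"), backend.read("/data/a.txt"))
```

With this shape, a Gluster-backed class would implement the same three
methods, and the server could be tested against LocalBackend without a
running volume.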

Aim 3: Create a RESTful/JSON Python library.
I will create a Python library that uses the RESTful/JSON interface as a
backend.
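
A rough sketch of what such a client library might look like, using only
the Python standard library.  The class name, the "op" query parameter, the
operation names, and the base URL are assumptions for this example (the
"op" style is borrowed from WebHDFS); the real interface would come out of
Aim 1.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


class RestClient:
    """Hypothetical client for a Gluster RESTful/JSON server."""

    def __init__(self, base_url, send=None):
        self.base_url = base_url.rstrip("/")
        # "send" can be replaced by a stub so the client is testable
        # without a running server.
        self._send = send or self._send_http

    def _url(self, path, op, **params):
        query = urlencode(dict(op=op, **params))
        return "%s/%s?%s" % (self.base_url, quote(path.lstrip("/")), query)

    @staticmethod
    def _send_http(method, url, body=None):
        request = Request(url, data=body, method=method)
        with urlopen(request) as response:
            return response.read()

    def read(self, path):
        """Return the raw bytes of a file."""
        return self._send("GET", self._url(path, "OPEN"))

    def list_status(self, path):
        """Return the decoded JSON listing of a directory."""
        raw = self._send("GET", self._url(path, "LISTSTATUS"))
        return json.loads(raw.decode("utf-8"))
```

Usage would be along the lines of
RestClient("http://localhost:8080/v1").list_status("/data"), where the host,
port, and path prefix are placeholders.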

Aim 4: Create Unit Tests and Benchmarks for Several Use Cases
As part of my effort, I will write unit tests to ensure that the server and
client library are implemented correctly.  Because good performance will be
important for adoption, I will also document several use cases and perform
benchmarks to evaluate the performance of the RESTful/JSON server compared
with the standard FUSE client.
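
The benchmarks could start from a simple timing harness like the sketch
below, which runs the same operation several times and reports the best
and mean wall-clock time.  The function name and the result keys are
assumptions for this example; the two operations to compare (a read
through the REST server and a read through a FUSE mount) would be passed in
as callables.

```python
import time


def benchmark(label, func, repeat=5):
    """Time func() `repeat` times and summarize the wall-clock results."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "label": label,
        "runs": len(timings),
        "best": min(timings),
        "mean": sum(timings) / len(timings),
    }


if __name__ == "__main__":
    # Placeholder workload; a real run would read the same file once
    # through the REST client and once through the FUSE mount.
    result = benchmark("noop", lambda: sum(range(1000)))
    print("%(label)s: best %(best).6fs, mean %(mean).6fs" % result)
```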

Aim 5: (Optional and time permitting) Work on integration with a big data
system as a proof of concept
Option 1: Integrate with Hadoop by mimicking the WebHDFS API so that the
Hadoop WebHDFS client can transparently use the Gluster RESTful API as a
backend

Option 2: Integrate with Disco, an Erlang/Python MapReduce framework.
 Support for HDFS is currently being added using the WebHDFS interface.
 The WebHDFS work provides a good template for adding Gluster support.
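
For Option 1, mimicking WebHDFS mostly means accepting its URL layout
(/webhdfs/v1/<path>?op=...) and HTTP methods.  The sketch below parses such
a request into an operation, a path, and the remaining parameters.  Only a
handful of WebHDFS operations are listed, the function name is an
assumption, and upper-casing the op value is a choice made for this
example.

```python
from urllib.parse import parse_qs, unquote, urlsplit

WEBHDFS_PREFIX = "/webhdfs/v1"

# A subset of WebHDFS operations and the HTTP method each one uses.
OPERATIONS = {
    "OPEN": "GET",
    "GETFILESTATUS": "GET",
    "LISTSTATUS": "GET",
    "CREATE": "PUT",
    "MKDIRS": "PUT",
    "DELETE": "DELETE",
}


def parse_request(method, url):
    """Split a WebHDFS-style request into (op, path, params)."""
    parts = urlsplit(url)
    if not parts.path.startswith(WEBHDFS_PREFIX + "/"):
        raise ValueError("not a WebHDFS-style path: %r" % (parts.path,))
    path = unquote(parts.path[len(WEBHDFS_PREFIX):])

    query = parse_qs(parts.query)
    ops = query.get("op")
    if not ops:
        raise ValueError("missing op parameter")
    op = ops[0].upper()  # normalized to upper case in this sketch

    expected = OPERATIONS.get(op)
    if expected is None:
        raise ValueError("unsupported op: %s" % op)
    if method.upper() != expected:
        raise ValueError("%s requires HTTP %s" % (op, expected))

    # Keep the first value of every other query parameter.
    params = {key: values[0] for key, values in query.items() if key != "op"}
    return op, path, params
```

The returned tuple could then be dispatched to whichever storage backend
the server is configured with.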

-- 
em rnowling at gmail.com
c 954.496.2314