[Gluster-users] glusterfs for cloud storage

Dai Qizhi daiqizhi at gmail.com
Sun Aug 23 12:24:22 UTC 2009


Hi Wei Dong,

   In the source code I found xlators/cluster/map/. Is this what you
are looking for?

On Fri, Aug 21, 2009 at 12:15 AM, Wei Dong <wdong.pku at gmail.com> wrote:
> Hi All,
>
> We are using glusterfs on our lab cluster as shared storage for a
> large number of image files, about 30 million at the moment.  We use Hadoop
> for distributed computing, but we are reluctant to store small files in
> Hadoop because of its low throughput on small files and its non-standard
> filesystem interface (e.g. we wouldn't be able to run convert on each image to
> produce a thumbnail if the files were stored in Hadoop).  What we do now is
> store a list of paths to all images in Hadoop and use Hadoop streaming
> to pipe the paths to a script, which then reads the images from the
> glusterfs filesystem and does the processing.  This has been working for a
> while, as long as glusterfs doesn't hang, but the problem is that we
> basically lose all data locality.  We have 66 nodes, so the chance that a
> needed file is on the local disk is only 1/66, and 65/66 of file I/O has to go
> through the network, which makes me very uncomfortable.  I'm wondering if there's
> a better way of making glusterfs and Hadoop work together to take
> advantage of data locality.
>
> I know that there's the nufa translator, which gives strong preference to the
> local drive.  This is good enough if the assignment of files to nodes is fixed.
> But if we want to assign files to nodes for processing according to where each
> file is stored, what interface should we use to get the physical location of a file?
>
> I appreciate all your suggestions.
>
> - Wei Dong
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
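
The pipeline described in the quoted message (paths stored in Hadoop, piped through Hadoop streaming to a script that reads each image from the glusterfs mount) can be sketched as a streaming mapper. This is a minimal sketch, not the actual script from the thread: it assumes one absolute path per input line, and `process_image` is a hypothetical placeholder that only reads the file and reports its size and MD5 digest, where the real job would do something like running convert to make a thumbnail.

```python
import hashlib
import sys
from typing import Iterable, TextIO


def process_image(path: str) -> str:
    """HYPOTHETICAL placeholder for the real per-image work (e.g. a thumbnail).

    It only streams the file from the glusterfs mount and returns
    'size<TAB>md5', which is enough to exercise the read path.
    """
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large images are not loaded all at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"{size}\t{digest.hexdigest()}"


def run_mapper(lines: Iterable[str], out: TextIO) -> int:
    """Process one path per input line and emit 'path<TAB>result' records.

    Unreadable paths (e.g. a file missing from the volume) are logged to
    stderr, which Hadoop streaming keeps in the task logs, and counted
    instead of aborting the whole task. Returns the number of failures.
    """
    failures = 0
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the path list
        try:
            result = process_image(path)
        except OSError as exc:
            failures += 1
            print(f"failed to read {path}: {exc}", file=sys.stderr)
            continue
        out.write(f"{path}\t{result}\n")
    return failures
```

As a mapper script it would end with `run_mapper(sys.stdin, sys.stdout)` and be passed to Hadoop streaming through the `-mapper` option. Every `open` here goes through the glusterfs client, so with files spread over 66 nodes roughly 65 in 66 reads are served by another machine, which is the locality problem the question is about.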

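The closing question asks how to send each file to the node that stores it. Assuming some lookup from a path on the glusterfs mount to the host holding that file were available (the hypothetical `locate` callable below; nothing in the thread says GlusterFS exposes one, and the map translator in xlators/cluster/map/ is not assumed to), the scheduling side reduces to bucketing paths by host so that each node's tasks read only local files. The sketch also computes the remote-read share for location-blind placement, 1 - 1/n, which is 65/66 for 66 nodes.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_by_host(paths: Iterable[str],
                  locate: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket paths by the host that stores each file.

    `locate` is a HYPOTHETICAL lookup (path on the glusterfs mount -> name of
    the server holding that file); it is not a GlusterFS API.
    """
    buckets: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        buckets[locate(path)].append(path)
    return dict(buckets)


def local_share(paths: Iterable[str],
                locate: Callable[[str], str],
                host: Optional[str] = None) -> List[str]:
    """Return only the paths stored on `host` (default: this machine)."""
    if host is None:
        host = socket.gethostname()
    return group_by_host(paths, locate).get(host, [])


def expected_remote_fraction(num_nodes: int) -> float:
    """Share of reads that cross the network when files are spread evenly
    over `num_nodes` nodes and tasks ignore where files live."""
    return 1 - 1 / num_nodes
```

This mirrors what Hadoop already does for HDFS blocks, where map tasks are scheduled near the block locations; doing the same over glusterfs depends entirely on having a location lookup like the one the question asks for.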