[Gluster-devel] disperse volume file to subvolume mapping

Tue Apr 19 09:24:17 UTC 2016

I am copying 10,000 files to a gluster volume using mapreduce on the
clients. Each map process takes one file at a time and copies it to the
gluster volume.
My disperse volume consists of 78 subvolumes of 16+4 disks each. So if I
copy more than 78 files in parallel, I expect each file to go to a
different subvolume, right?
In my tests with fio I can see every file going to a different
subvolume, but when I start the mapreduce process from the clients,
only 78/3=26 subvolumes are used for writing files.
I can see that clearly from the network traffic. Mapreduce on the client
side can run multi-threaded. I tested with 1, 5 and 10 threads on each
client, but every time only 26 subvolumes were used.
How can I debug the issue further?
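
As a sanity check on the "more than 78 files in parallel" expectation above,
here is a minimal sketch of what a purely random placement would give. It
assumes each file independently lands on one of the 78 subvolumes with equal
probability, which is a simplification and not a description of how dht
actually places files. The function name is made up for this illustration.

```python
def expected_used_subvolumes(parallel_files: int, subvolumes: int = 78) -> float:
    """Expected number of distinct subvolumes that receive at least one file.

    Assumption (not taken from Gluster itself): every file picks one of
    `subvolumes` targets independently and uniformly at random.
    """
    # Probability that one particular subvolume is missed by every file.
    miss = (1.0 - 1.0 / subvolumes) ** parallel_files
    return subvolumes * (1.0 - miss)


if __name__ == "__main__":
    # Files in flight at once: 50 clients x 1 thread, 78, 50 x 5, 50 x 10.
    for n in (50, 78, 250, 500):
        used = expected_used_subvolumes(n)
        print(f"{n:4d} files in flight -> about {used:.1f} of 78 subvolumes")
```

Under this model, 78 files in flight are expected to touch roughly 49 to 50
subvolumes rather than all 78, and the number should rise towards 78 as the
thread count grows. A count that stays at 26 for 1, 5 and 10 threads is
therefore not what random placement alone would produce.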

On Tue, Apr 19, 2016 at 11:22 AM, Xavier Hernandez
<xhernandez at datalab.es> wrote:
> Hi Serkan,
>
> On 19/04/16 09:18, Serkan Çoban wrote:
>>
>> Hi, I just reinstalled a fresh 3.7.11 and I am seeing the same behavior.
>> 50 clients are copying files named part-0-xxxx to gluster using
>> mapreduce, with one thread per server, and they are using only 20
>> servers out of 60. On the other hand, fio tests use all the servers. Is
>> there anything I can do to solve the issue?
>
>
> Distribution of files to ec sets is done by dht. In theory, if you create
> many files, each ec set will receive roughly the same number of files.
> However, when the number of files is small, the distribution can be
> noticeably uneven.
>
> Not sure what you are doing exactly, but a mapreduce procedure generally
> creates only a single output. In that case it makes sense that only one ec
> set is used. If you want to use all ec sets for a single file, you should
> enable sharding (I haven't tested that) or split the result into multiple
> files.
>
> Xavi
>
>
>>
>> Thanks,
>> Serkan
>>
>>
>> ---------- Forwarded message ----------
>> From: Serkan Çoban <cobanserkan at gmail.com>
>> Date: Mon, Apr 18, 2016 at 2:39 PM
>> Subject: disperse volume file to subvolume mapping
>> To: Gluster Users <gluster-users at gluster.org>
>>
>>
>> Hi, I have a problem where clients are using only 1/3 of the nodes in
>> a disperse volume for writing.
>> I am testing from 50 clients using 1 to 10 threads with file names
>> part-0-xxxx.
>> What I see is that the clients only use 20 nodes for writing. How is the
>> file name to subvolume hashing done? Is this related to the file names
>> being similar?
>>
>> My cluster is 3.7.10 with 60 nodes, each with 26 disks. The disperse
>> volume is 78 x (16+4). Only 26 out of 78 subvolumes are used during
>> writes.
>>
>
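
On the hashing question in the original message: as far as I understand it,
dht hashes the file name to a 32-bit value and each subvolume owns a range of
that hash space for a given directory. The sketch below only illustrates that
idea. It assumes 78 equal-sized ranges and uses zlib.crc32 as a stand-in hash,
which is not the hash function Gluster uses, so the actual placement on a real
volume will differ. The function names are hypothetical.

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32  # size of a 32-bit hash space


def subvolume_for(name: str, subvolumes: int = 78) -> int:
    """Map a file name to a subvolume index by hash range.

    Assumptions: crc32 stands in for the real hash, and the hash space
    is split into `subvolumes` equal contiguous ranges.
    """
    h = zlib.crc32(name.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return h * subvolumes // HASH_SPACE


def spread(names, subvolumes: int = 78) -> Counter:
    """Count how many of the given names fall into each subvolume range."""
    return Counter(subvolume_for(n, subvolumes) for n in names)


if __name__ == "__main__":
    names = [f"part-0-{i:04d}" for i in range(10000)]
    counts = spread(names)
    print(f"{len(counts)} of 78 ranges received at least one file")
    print("busiest ranges:", counts.most_common(3))
```

The same counting could be applied to the real file names from the mapreduce
job to see whether their similarity matters under a given hash, keeping in
mind that only the real dht hash and the real directory layout decide where
files go.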