[Gluster-devel] Handling huge number of file read requests

Amrik Singh asingh at ideeinc.com
Fri May 4 14:55:38 UTC 2007


Sorry, a correction there: these are not 15 million files. That was a 
different data set I got confused with.

The data set I am talking about is around 3.5 TB. I do not have the 
exact number of files here.

Sorry for the confusion.

Amrik 



Amrik Singh wrote:
> Well, that makes things quite clear. We are using gig/e. All the 
> clients load different files. We have around 15-20 million files as of 
> now. The situation that I described happens only at peak load, 
> which can last from 1-4 hours at a time.
>
> So I realize that we would need to distribute our files across a lot 
> more bricks. Does 18 * 8 mean 144, or is it an expression that I did 
> not get?
>
>
>
> thanks a lot...
>
> Amrik
>
>
> Anand Avati wrote:
>> Amrik,
>>  some quick math: you have 300 servers each asking for 20-40 images
>> (avg 30) of 2MB per second, so your aggregate I/O requirement is 18
>> GByte/sec. How many servers are you using with glusterfs to distribute
>> this load? It would be unreasonable to expect a single server to handle
>> this kind of load with *any* filesystem. Also, what is the interconnect
>> you are using? If you are using gig/e, you need 18 * 8 nodes to handle
>> this load smoothly (fewer nodes reduce performance by a proportional
>> factor).
>>
>> Also tell me the pattern of usage. Do all the clients read different
>> files or the same file? How many images do you have in total?
>>
>> Looking forward to your answers
>>
>> regards,
>> avati
>>
>>
>> On 5/4/07, Amrik Singh <asingh at ideeinc.com> wrote:
>>> Hi Guys,
>>>
>>> We are hoping that glusterfs would help us in the particular problem
>>> that we are facing with our cluster. We have a visual search
>>> application that runs on a cluster with around 300 processors. These
>>> compute nodes run a search for images that are hosted on an NFS
>>> server. In certain circumstances all these compute nodes are sending
>>> requests for query images at extremely high rates (20-40 images per
>>> second). When 300 nodes send 20-40 requests per second for these
>>> images, the NFS server just can't cope with it and we start seeing a
>>> lot of retransmissions and a very high wait time on the server as
>>> well as on the nodes. The images are sized at around 2MB each.
>>>
>>> With the current application we are not in a position where we can
>>> quickly change the way things are being done, so we are looking for a
>>> file system that can handle this kind of situation. We tried glusterfs
>>> with the default settings but we did not see any improvement. Is
>>> there a way to tune glusterfs to handle this kind of situation?
>>>
>>> I can provide more details about our setup as needed.
>>>
>>>
>>> thanks
>>>
>>> -- 
>>> Amrik Singh
>>> Idée Inc.
>>> http://www.ideeinc.com
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at nongnu.org
>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>
>
>
>
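
The quick math quoted above (300 clients, 20-40 images of 2MB each per
second, gig/e links) can be checked with a small sketch. The function names
are hypothetical, and the sketch assumes decimal units (1 GByte = 1000 MB),
one gig/e link per server running at its nominal 1 Gbit/s with no protocol
overhead, and perfectly even load distribution. It is a rough sizing aid,
not a statement about how glusterfs behaves.

```python
def aggregate_mbit_per_sec(clients, images_per_sec, image_mbyte):
    """Total read demand in Mbit/s (1 MByte = 8 Mbit)."""
    return clients * images_per_sec * image_mbyte * 8


def servers_needed(demand_mbit, link_mbit=1000):
    """Servers required if each one saturates a single link.

    link_mbit=1000 is the nominal gig/e line rate (an assumption; real
    throughput is lower once protocol overhead is counted).
    """
    # Integer ceiling division, avoids floating-point rounding.
    return -(-demand_mbit // link_mbit)


if __name__ == "__main__":
    # Low end, average, and high end of the 20-40 images/sec range.
    for rate in (20, 30, 40):
        demand = aggregate_mbit_per_sec(300, rate, 2)
        gbyte = demand / 8 / 1000
        print(f"{rate} img/s: {gbyte:g} GByte/s, "
              f"{servers_needed(demand)} gig/e servers")
```

Under these assumptions the average case works out to 18 GByte/sec, that is
144 Gbit/sec, so "18 * 8" is the byte-to-bit conversion and does mean roughly
144 servers with one gig/e link each.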





