[Gluster-users] Giving up [ was: Re: read-subvolume]

Wed Jul 10 19:18:31 UTC 2013

On 07/10/2013 11:51 AM, Joe Landman wrote:
> On 07/10/2013 02:36 PM, Joe Julian wrote:
>
>> 1) http://www.solarflare.com makes sub microsecond latency adapters that
>> can utilize a userspace driver pinned to the cpu doing the request
>> eliminating a context switch
>
> We've used open-onload in the past on Solarflare hardware.  And with 
> GlusterFS.
>
> Just say no.  Seriously.  You don't want to go there.
Bummer. That sounded like an interesting idea.
>
>> 2) http://www.aristanetworks.com/en/products/7100t is a 2.5 microsecond
>> switch
>
> Neither choice will impact overall performance much for GlusterFS, 
> even in heavily loaded situations.
>
> What impacts performance more than anything else is node/brick design, 
> implementation, and specific choices in that mix.  Storage latency, 
> bandwidth, and overall design will be more impactful than low latency 
> networking.  Distribution, kernel and filesystem choices (including 
> layout, lower level features, etc.) will matter significantly more 
> than low latency networking.  You can completely remove the networking 
> impact by trying your changes out on localhost and seeing what 
> impact your design changes have.
>
> If you don't start out with a fast box, you are not going to have fast 
> aggregated storage.  This observation has not changed since the 
> pre-2.0 GlusterFS days (it's as true today as it was years ago).
>
The "small file" complaint is all about latency, though. There's very 
little disk overhead (all inode lookups) in doing a self-heal check. Run 
"ls -l" on a directory of 50k files and nearly all the delay comes from 
network RTT for the self-heal checks (you can check that with Wireshark).
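
A rough way to put numbers on that, as a sketch only: if each file costs at least one serial round trip for its self-heal check, listing time is bounded below by files x round trips x RTT. The RTT values, the one-round-trip-per-file figure, and both function names below are assumptions made for illustration, not measurements and not anything taken from GlusterFS itself. The second helper simply stats every entry in a directory, roughly what "ls -l" does, so the same directory can be timed on a local brick and on a client mount.

```python
import os
import time


def estimated_listing_seconds(n_files, rtt_seconds, round_trips_per_file=1):
    """Lower-bound estimate: every file costs serial network round trips.

    round_trips_per_file=1 is an assumption; the real count depends on the
    volume layout and the translators in use.
    """
    return n_files * round_trips_per_file * rtt_seconds


def timed_stat_listing(path):
    """Stat every entry in `path`, roughly what "ls -l" does.

    Returns (entry_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # One stat per entry, without following symlinks (like lstat).
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical RTTs: 200 us for an ordinary LAN, 10 us for low-latency gear.
    for rtt in (200e-6, 10e-6):
        secs = estimated_listing_seconds(50_000, rtt)
        print(f"rtt={rtt * 1e6:.0f} us: about {secs:.2f} s for 50k files")

    # Point this at a brick directory and then at the client mount to compare.
    count, elapsed = timed_stat_listing(".")
    print(f"stat'ed {count} entries in {elapsed:.4f} s")
```

Under those assumed numbers the estimate works out to about 10 seconds at 200 us and about half a second at 10 us, which is why the RTT matters for this particular workload even if it does not for bulk throughput.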