[Gluster-devel] seeking advice: how to upgrade from 1.3.0pre4 to tla patch628?

Mickey Mazarick mic at digitaltadpole.com
Tue Jan 8 16:39:47 UTC 2008


I just wanted to comment on the multiple NS volumes too. We have a 
similar setup and found that when we restarted the servers in sequence for 
updates, it took a few minutes for the NS to update, so with only 2 we 
accidentally took the whole cluster down by restarting too quickly. 
(restart the server with NS1, wait a few minutes, then restart NS2)
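
In case it helps, this is roughly the staggered restart we do now. It is only 
a sketch of our own setup: the hostnames ns1/ns2, the init script path and the 
3-minute wait are placeholders, not anything prescribed by GlusterFS.

    for host in ns1 ns2; do
        # restart the glusterfsd that carries this NS brick
        ssh "$host" '/etc/init.d/glusterfsd restart'
        # give the NS brick time to come back up and resync before
        # touching the next one (180s is an arbitrary guess)
        sleep 180
    done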
Two questions come out of this, though:
1) Is there a way (besides parsing the logs) to determine if the NS 
servers are up to date or healing?
2) Is there a significant speed benefit to having fewer? (other than 
file creation, I mean)  All our tests used dd and single files, so we 
didn't notice much.

Thanks! And as always, thanks for this wonderful contribution to the IT 
world!

-Mic

Sascha Ottolski wrote:
> On Tuesday, 8 January 2008 10:06:33, Anand Avati wrote:
>   
>> Sascha,
>>  few points -
>>
>> 1. do you really want 4 copies of the NS with AFR? I personally think that
>> is overkill. 2 should be sufficient.
>>     
>
> at least that would give us the freedom to take any server out without having 
> to think about it. however, we also tried with only one namespace and no afr, 
> same bad result :-( (with 1.3.2)
>
>
>   
>> 2. as you rightly mentioned, it might be the self-heal which is slowing
>> down. Do you have directories with a LOT of files at the immediate level?
>>     
>
> not so many: we have 6 nested directory levels, only the leaves carry files. each 
> directory level can have up to 16 sub-directories. in a leaf, up to 255 * X 
> files with 0 < X < 4, with X most likely being near 2.
>
> that is, at most 256 * 4 = 1024 files per dir.
>
>
>   
>> the self-heal is being heavily reworked to be more memory and CPU efficient
>> and will be completed very soon. If you do have a LOT of files in a
>> directory (not subdirs), then it would help to recreate the NS offline and
>> slip it in with the upgraded glusterfs. one half-efficient way:
>>
>> on each server:
>> mkdir /partial-ns-tmp
>> (cd /data/export/dir ; find . -type d) | (cd /partial-ns-tmp ; xargs mkdir -p)
>> (cd /data/export/dir ; find . -type f) | (cd /partial-ns-tmp; xargs touch)
>>
>> now tar /partial-ns-tmp on each server and extract the tarballs over each other
>> on the namespace server. I assume you do not have special fifo and device files;
>> if you do, recreate them the same way as the mkdir step :)
>>     
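
Putting the quoted steps together, this is roughly what the end-to-end procedure 
could look like. It is only a sketch: /data/export/dir and /partial-ns-tmp follow 
the example above, while /data/export/ns and the ns-server hostname are placeholders 
for your own namespace export, and the -print0 / xargs -0 flags are an extra 
precaution for awkward filenames that was not part of the original commands.

On each data server:

    mkdir /partial-ns-tmp
    # recreate the directory tree, then zero-length copies of every file
    (cd /data/export/dir && find . -type d -print0) | (cd /partial-ns-tmp && xargs -0 mkdir -p)
    (cd /data/export/dir && find . -type f -print0) | (cd /partial-ns-tmp && xargs -0 touch)
    # pack up this server's partial namespace and ship it to the NS machine
    tar -C /partial-ns-tmp -czf /tmp/partial-ns-$(hostname).tar.gz .
    scp /tmp/partial-ns-$(hostname).tar.gz ns-server:/tmp/

Then on the namespace server, extract the archives over each other so the trees 
merge into a single namespace:

    for f in /tmp/partial-ns-*.tar.gz; do
        tar -C /data/export/ns -xzf "$f"
    done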
>
> thanks for the hint. still, in an earlier attempt, we forced a self heal and 
> waited until it was finished after 24 h, but even then the load stayed 
> high and the webservers were barely responding (as said above, with only one 
> namespace brick).
>
>
>   
>> the updated self-heal should handle such cases much better (assuming your
>> problem is LOTS of files in the same dir and/or LOTS of such dirs).
>>     
>
> can't wait to test it :-))
>
>
> Thanks a lot, 
>
> Sascha
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>   

