[Gluster-devel] Questions

Gerry Reno greno at verizon.net
Fri Apr 6 01:36:11 UTC 2007


Anand Babu Periasamy wrote:
> Gerry Reno writes:
>> How does GlusterFS behave in the following scenarios:
>> =================================
>> In a multi-brick cluster using AFR, a node goes down and is
>> later brought back online.
>> ACTUAL BEHAVIOR:
>>
>> DESIRED BEHAVIOR:
>> GlusterFS sees the node restart and then begins syncing its
>> bricks from the transaction log; once synced, the node is put
>> back into the cluster.
>>
>> =================================
> This is what the self-heal functionality in 1.4 is supposed to do.
> Each translator will contribute its piece of context-aware healing
> functionality to the overall recovery process.
>
> Self-heal will involve multiple techniques. The key ones are:
> * journaled-recovery: It will maintain a journal of operations that
> need to be performed on a failed brick, for example directory-related
> operations and all I/O operations for AFR ... (This is exactly what
> you described above.)
> * lazy-recovery: Certain errors will be extremely time-consuming to
> detect. Instead of searching for them while the brick is offline,
> GlusterFS will resume normal operation immediately. If it finds a
> fault at run-time, self-heal will heal on demand (say, duplicate
> files or a missing directory on a brick). It is OK if a directory
> is missing on one of the bricks, since it can be fixed at the time
> of access.
> You can also initiate a forceful recovery by deliberately triggering
> faults: running "find /mnt/glusterfs -type f -exec file {} \;" will
> walk the entire directory tree and access each file, which should be
> enough to convert many lazy checks into instant ones. A
> glusterfs-fsck tool would then be a matter of a shell script, as
> sketched below.
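>
> A minimal sketch of such a script (the mount point and exact
> commands here are illustrative, not a shipped tool):
>
>   #!/bin/sh
>   # glusterfs-fsck sketch: walk a mounted GlusterFS volume and
>   # touch every entry so lazy self-heal checks run immediately.
>   MOUNT=${1:-/mnt/glusterfs}
>   # Listing each directory lets self-heal notice a directory
>   # missing on a brick; reading each file heals file contents.
>   find "$MOUNT" -type d -exec ls -l {} \; > /dev/null
>   find "$MOUNT" -type f -exec file {} \; > /dev/null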
>
>> =================================
>> Expand/Contract a GlusterFS cluster.
>> ACTUAL BEHAVIOR:
>>
>> DESIRED BEHAVIOR:
>> GlusterFS allows cluster members to be dynamically
>> hot-added/hot-removed from a running cluster.
>>
>> =================================
> As of now, adding bricks requires a restart of GlusterFS:
> http://www.gluster.org/docs/index.php/GlusterFS_FAQ#How_do_I_add_a_new_node_to_an_already_running_cluster_of_GlusterFS
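>
> In outline, the procedure looks like this (file names, mount
> point, and spec contents here are illustrative):
>
>   # 1. Add a protocol/client volume for the new server to the
>   #    client spec file and list it in the AFR subvolumes line.
>   $EDITOR /etc/glusterfs/glusterfs-client.vol
>
>   # 2. Remount so the client reads the new spec -- this is the
>   #    restart that hot-add/remove would eliminate.
>   umount /mnt/glusterfs
>   glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
>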
> Hot-add/remove functionality is part of our road map. We are
> introducing a server-notification framework in 1.4. With that in
> place, implementing hot-add/remove is a cakewalk.
>
> Do you think this feature is important for 1.4? I want to have 1.4
> released as soon as possible.
>
For us hot-add/remove is very desirable. Just as with a RAID array, we 
would like to be able to add/remove Gluster servers at will from a 
running cluster for things like maintenance, hardware replacement, etc. 
This is essential in a production environment so that our field 
workforce is not idle whenever such tasks need to occur. If it would 
cause a big delay, then postpone it; but if the delay is small, it 
would be good to have it in 1.4.

Gerry