[Gluster-devel] Questions

Fri Apr 6 01:50:44 UTC 2007

Gerry Reno writes:

> Anand Babu Periasamy wrote:
>> Gerry Reno writes:
>>> How does GlusterFS behave in the following scenarios:
>>> =================================
>>> In a multi-brick cluster using AFR a node goes down and then later
>>> is brought back online
>>> ACTUAL BEHAVIOR:
>>>
>>> DESIRED BEHAVIOR:
>>> GlusterFS sees the node restart and then begins syncing its
>>> bricks from the transaction log. Once it is synced, it is put back
>>> into the cluster.
>>>
>>> =================================
>> This is what self-heal functionality in 1.4 is supposed to do. Each
>> translator will contribute its piece of context-aware healing
>> functionality to the overall recovery process.
>>
>> self-heal will involve multiple techniques. The key ones are:
>> * journaled-recovery: It will maintain a journal of operations that
>> need to be performed on a failed brick, for example dir-related
>> operations and all I/O operations for AFR ... (This is exactly what
>> you described above).
>> * lazy-recovery: Certain errors will be extremely time consuming to
>> detect. Instead of looking out for them (when the brick is offline),
>> GlusterFS will resume normal operation immediately. If it finds any
>> fault at run-time, self-heal will heal on demand (say duplicate
>> files, a missing directory on a brick, ...). It is OK if a dir is
>> missing on one of the bricks, when it can be fixed at the time of access.
>> You can also initiate a forceful recovery by just triggering
>> faults (say "find /mnt/glusterfs -type f -exec file {} \;" will
>> navigate the entire dir tree and access each file. This should be
>> sufficient to convert many lazy checks to instant ones). A
>> glusterfs-fsck tool would then be a matter of a shell script.
>>
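
The forced walk described above could be wrapped in a small shell function. This is only a sketch under stated assumptions: the name heal_walk is hypothetical, the default mount point is the /mnt/glusterfs path from the example, and it reads one byte of each file with head rather than running file. Whether an open and a short read is enough to trigger every lazy check depends on the self-heal implementation, which is not specified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): walk a mounted volume and
# touch every regular file so that on-demand checks get a chance to run.

heal_walk() {
    # Assumed default: the mount point used in the example above.
    mount_point=${1:-/mnt/glusterfs}

    if [ ! -d "$mount_point" ]; then
        echo "heal_walk: $mount_point is not a directory" >&2
        return 1
    fi

    # find descends through every directory; head opens each regular
    # file and reads its first byte. The data itself is discarded.
    find "$mount_point" -type f -exec head -c 1 {} \; > /dev/null
}
```

Running "heal_walk /mnt/glusterfs" once a failed brick is back would visit the whole tree in one pass. On a large volume the walk can take a long time, since it opens every file.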
>>> =================================
>>> Expand/Contract a GlusterFS cluster.
>>> ACTUAL BEHAVIOR:
>>>
>>> DESIRED BEHAVIOR:
>>> GlusterFS allows cluster members to be dynamically
>>> hot-added/hot-removed from a running cluster.
>>>
>>> =================================
>> As of now, adding bricks requires a restart of GlusterFS.
>> http://www.gluster.org/docs/index.php/GlusterFS_FAQ#How_do_I_add_a_new_node_to_an_already_running_cluster_of_GlusterFS 
>>
>>
>> Hot-add/remove functionality is part of our road map. We are
>> introducing a server-notification framework in 1.4. With this feature,
>> implementing hot-add/remove is a cakewalk.
>>
>> Do you think this feature is important for 1.4? I want to have 1.4
>> released as soon as possible.
>>
> For us hot-add/remove is very desirable. Just like with a RAID array, we
> would like to be able to add/remove gluster servers at will from a
> running cluster for things like maintenance, hardware replacements, etc.
> This is essential in a production environment so that our field
> workforce is not idle whenever such tasks need to occur. If it would
> cause a big delay, then postpone it, but if the delay is small, it
> would be good to have it in 1.4.
> 
> Gerry

OK, let me plan for it in 1.4. I am curious: what are you using
GlusterFS for?

-- 
Anand Babu
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]