[Gluster-devel] Questions

Fri Apr 6 01:55:32 UTC 2007

Anand Babu Periasamy wrote:
> Gerry Reno writes:
>
>> Anand Babu Periasamy wrote:
>>> Gerry Reno writes:
>>>> How does GlusterFS behave in the following scenarios:
>>>> =================================
>>>> In a multi-brick cluster using AFR, a node goes down and is later
>>>> brought back online
>>>> ACTUAL BEHAVIOR:
>>>>
>>>> DESIRED BEHAVIOR:
>>>> GlusterFS sees the node restart and begins syncing its bricks
>>>> from the transaction log. Once the node is synced, it is put back
>>>> into the cluster.
>>>>
>>>> =================================
>>> This is what the self-heal functionality in 1.4 is supposed to do.
>>> Each translator will contribute its piece of context-aware healing
>>> functionality to the overall recovery process.
>>>
>>> Self-heal will involve multiple techniques. The key ones are:
>>> * journaled-recovery: It will maintain a journal of operations that
>>> need to be performed on a failed brick, for example dir-related
>>> operations, all I/O operations for AFR ... (This is exactly what
>>> you described above.)
>>> * lazy-recovery: Certain errors will be extremely time-consuming to
>>> detect. Instead of looking for them (when the brick is offline),
>>> GlusterFS will resume normal operation immediately. If it finds any
>>> fault at run time, self-heal will heal on demand (say, duplicate
>>> files or a missing directory on a brick). It is OK if a dir is
>>> missing on one of the bricks when it can be fixed at the time of
>>> access. You can also force a recovery by deliberately triggering
>>> faults: say, "find /mnt/glusterfs -type f -exec file {} \;" will
>>> navigate the entire dir tree and access each file. This should be
>>> sufficient to convert many lazy checks to instant ones. A
>>> glusterfs-fsck tool would then be a matter of a shell script.
>>>
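The forced-recovery walk quoted above can also be sketched in Python. This is only an illustration, not part of GlusterFS: the function name touch_all_files is hypothetical, and it assumes (as the reply suggests) that simply accessing a file through the mount is enough to trigger on-demand healing. It opens each regular file and reads one byte, which is roughly what running "file" on every path does.

```python
import os


def touch_all_files(mount_point):
    """Walk mount_point and read one byte from every regular file.

    Hypothetical helper: it only generates file accesses. Whether an
    access triggers healing depends on the filesystem, not on this code.
    Returns (number of files read, list of (path, error) failures).
    """
    visited = 0
    failed = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror "find -type f": skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    handle.read(1)  # a single read is enough to access the file
                visited += 1
            except OSError as err:
                # Record the failure and keep walking the rest of the tree.
                failed.append((path, err))
    return visited, failed
```

Calling touch_all_files("/mnt/glusterfs") would visit the same mount point used in the find example; the returned failure list gives a starting point for deciding which paths need a closer look.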
>>>> =================================
>>>> Expand/Contract a GlusterFS cluster.
>>>> ACTUAL BEHAVIOR:
>>>>
>>>> DESIRED BEHAVIOR:
>>>> GlusterFS allows cluster members to be dynamically
>>>> hot-added/hot-removed from a running cluster.
>>>>
>>>> =================================
>>> As of now, adding bricks requires a restart of GlusterFS.
>>> http://www.gluster.org/docs/index.php/GlusterFS_FAQ#How_do_I_add_a_new_node_to_an_already_running_cluster_of_GlusterFS 
>>>
>>>
>>> Hot-add/remove functionality is part of our roadmap. We are
>>> introducing a server-notification framework in 1.4. With this
>>> feature, implementing hot-add/remove is a cakewalk.
>>>
>>> Do you think this feature is important for 1.4? I want to have 1.4
>>> released as soon as possible.
>>>
>> For us hot-add/remove is very desirable. Just like with a RAID array,
>> we would like to be able to add/remove gluster servers at will from a
>> running cluster for things like maintenance, hardware replacements,
>> etc. This is essential in a production environment so that our
>> field workforce is not idle whenever such tasks need to occur. If it
>> would cause a big delay, then postpone it, but if the delay is small
>> it would be good to have it in 1.4.
>>
>> Gerry
>
> OK, let me plan for it in 1.4. I am curious: what are you using
> GlusterFS for?
>
Short answer: webapp farm.