[Gluster-devel] Self-heal with partial files
Kevan Benson
kbenson at a-1networks.com
Thu Oct 4 17:32:25 UTC 2007
Krishna Srinivas wrote:
> On 10/4/07, Kevan Benson <kbenson at a-1networks.com> wrote:
>
>> Is self heal supposed to work with partial files? I have an issue where
>> self-heal isn't happening on some servers with AFR and unify in a HA
>> setup I developed. Two servers, two clients, all AFR and unify done on
>> client side.
>>
>> If I kill a connection while a large file is being written, the
>> glusterfs mount waits the appropriate timeout period (10 seconds in my
>> case) and then finishes writing the file to the still active server.
>> This results in a full file on one server and a partial file on the
>> other (the one I stopped traffic to temporarily to simulate a
>> crash/network problem). If I then enable the disabled server and read
>> data from the problematic file, it doesn't self-heal and copy
>> the full file to the server with the partial file.
>>
>> Anything written entirely while a server is offline (i.e. the offline
>> server has no knowledge of it) is correctly created on read from the
>> file, so the problem seems to be related to files that are partially
>> written to one server.
>>
>> Can someone comment on the particular conditions that trigger a
>> self-heal? Is there something I can do to force a self-heal at this
>> point? (I repeat that reading data from the file does not work.) I know
>> I can use rsync and some foo to fix this, but that becomes less and less
>> feasible as the mount size grows and the time for rsync to compare sides
>> lengthens.
>
> Hi Kevan,
>
> It should have worked fine in your case. What version of glusterfs are
> you using? Just before you do the second read (or open rather) which
> should have triggered self-heal can you do getfattr -n trusted.afr.version <>
> on the partial file and also the full file in the backend and give the output?
>
> Thanks
> Krishna
>
>
Glusterfs TLA 504, fuse-2.7.0-gfs4.

The trusted.afr.version attribute doesn't exist on the partial file, but
it does exist on the complete file (with value "1"). From what I just
tested, it doesn't look like the attribute is set until the file
operation is complete (it doesn't exist during writing). Are files
without this attribute assumed to have a value of "0" or something to
ensure that they participate in self-heal correctly?
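
For anyone who wants to repeat the check on both backend copies without typing getfattr twice, here is a rough Python sketch of the same lookup. It is only an illustration: the helper names read_afr_version and compare_copies are made up for this message, it relies on os.getxattr (Linux only), and it assumes the attribute value is stored as decimal text, which is how the "1" above appears in getfattr output. Reading trusted.* attributes usually needs root on the backend.

```python
import errno
import os

# Attribute name taken from this thread.
AFR_VERSION_XATTR = "trusted.afr.version"


def read_afr_version(path, getxattr=None):
    """Return the attribute as an int, or None if it is not set on the file.

    `getxattr` is injectable so the function can be exercised without a
    real backend; by default it uses os.getxattr (Linux only).
    """
    if getxattr is None:
        getxattr = os.getxattr
    try:
        raw = getxattr(path, AFR_VERSION_XATTR)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return None  # attribute missing, as on the partial file
        raise
    # Assumption: the value is ASCII decimal text, possibly NUL-terminated.
    return int(raw.decode("ascii").rstrip("\x00"))


def compare_copies(paths, getxattr=None):
    """Map each backend path to its version (None when the attribute is missing)."""
    return {path: read_afr_version(path, getxattr) for path in paths}
```

Calling compare_copies with the backend path of the file on each server gives one dictionary showing which copy has a version and which one is missing the attribute.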

It doesn't look like it: if I append data to the file, the partial
version gets assigned trusted.afr.version=1, while the complete file's
trusted.afr.version is incremented to 2. Self-heal now works for that
file, and on a read of file data the partial file is updated with all
the data and its trusted.afr.version is set to 2.
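
To make the sequence above easier to follow, here is a toy Python model of what I observed. It is not the AFR implementation, only a restatement of the behaviour in this message: the function names and the server1/server2 labels are hypothetical, and treating copies with equal versions as "nothing to heal" is my own assumption.

```python
def heal_source(versions):
    """Pick the copy that would be used as the heal source, or None.

    `versions` maps a backend name to its trusted.afr.version value,
    with None meaning the attribute is missing on that copy.
    """
    if any(v is None for v in versions.values()):
        # Observed: with the attribute missing on one copy, no heal happened.
        return None
    if len(set(versions.values())) <= 1:
        # Assumption: equal versions mean the copies are considered in sync.
        return None
    # Observed: the copy with the higher version overwrote the other one.
    return max(versions, key=versions.get)


def after_append(versions):
    """Observed effect of an append: missing becomes 1, existing is incremented."""
    return {name: 1 if v is None else v + 1 for name, v in versions.items()}


before = {"server1": 1, "server2": None}  # full file, partial file
after = after_append(before)              # {"server1": 2, "server2": 1}
```

In this model heal_source(before) returns None, matching the failed self-heal, while heal_source(after) returns "server1", matching the heal that worked after the append.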