[Gluster-users] "no gfid found" errors stall fix-layout
Dan Bretherton
d.a.bretherton at reading.ac.uk
Mon Jan 23 17:17:58 UTC 2012
On 01/18/2012 02:24 AM, Pranith Kumar K wrote:
> On 01/17/2012 05:54 PM, Dan Bretherton wrote:
>> Dear All-
>> I have been having problems with rebalance ... self-heal again with
>> Glusterfs version 3.2.5, this time related to "no gfid found"
>> errors. A fix-layout operation has stalled because errors like the
>> following are being reported for a large number of files.
>>
>> [2012-01-17 10:48:02.138837] W
>> [fuse-resolve.c:273:fuse_resolve_deep_cbk] 0-fuse:
>> /users/mvc/WORK/ORCA1/ORCA1-MV01-DIMGPROC/RUNTMP_Exp61/ORCA1-MV01_2D_y2007m01d05.dimgproc.020:
>> no gfid found
>>
>> I thought GFID errors were being fixed in version 3.2.5. How can I
>> fix these errors to allow rebalance...fix-layout to run normally? I
>> am also very worried that the lack of GFID entries for files and
>> directories could stop file replication and other GlusterFS
>> operations from working properly. All comments and suggestions would
>> be much appreciated.
>>
>> Regards
>> Dan.
>>
> Dan,
> We would like to reproduce this problem in house, could you give
> more details on how to get into this situation.
>
> Pranith
Hello Pranith,

The errors were probably the result of a server that became unresponsive
for a few hours and had to be restarted. While it was unresponsive the
server still showed as Connected in the output of "gluster peer status",
but its load was growing very high and it was impossible to log on. I
restarted the server and triggered a self-heal operation on all the
volumes in case any files had not been copied correctly to the
unresponsive server.

Later on I noticed some layout-related error messages mentioning
"anomalies", so I started a fix-layout operation to correct them. The
fix-layout didn't complete the first time because of the "no gfid found"
errors that I reported to the mailing list. A couple of days later I
stopped fix-layout and started it again on another server, and that time
it ran to completion. I then re-ran the self-heal operation and didn't
find any new layout errors.

I don't know whether the second fix-layout attempt worked because it was
performed on a different server, or because the "no gfid found" errors
had been corrected automatically by GlusterFS in the days between the two
attempts. Either way I am very relieved, and I apologise for the false
alarm. GlusterFS version 3.2.5 does appear to be able to correct GFID
errors automatically, although the process seems to take a long time.
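
For anyone who wants to gauge how many files are affected before
re-running fix-layout, something along these lines might help. It is only
a minimal sketch in Python, not a GlusterFS tool. It assumes that each
warning occupies a single line in the client log (the example quoted
above is wrapped by the mail client) and that it follows the same format
as that example. The function name paths_without_gfid is made up for
illustration.

```python
import re
from collections import Counter

# Assumed single-line form of the warning quoted in this thread:
# [2012-01-17 10:48:02.138837] W [fuse-resolve.c:273:fuse_resolve_deep_cbk] 0-fuse: /some/path: no gfid found
NO_GFID = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+W\s+\[fuse-resolve\.c:\d+:\w+\]\s+\S+:\s+"
    r"(?P<path>/.*): no gfid found\s*$"
)


def paths_without_gfid(lines):
    """Count "no gfid found" warnings per path in an iterable of log lines.

    Hypothetical helper: lines that do not match the assumed format
    are silently ignored.
    """
    counts = Counter()
    for line in lines:
        match = NO_GFID.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

Passing an open log file to the function, for example
paths_without_gfid(open(client_log)), where client_log is the path to the
client log on your own system, returns a Counter keyed by file path.
Comparing the counts from before and after a self-heal run should show
whether the errors are actually going away.
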
Regards
-Dan.