[Gluster-users] "no gfid found" errors stall fix-layout

Dan Bretherton d.a.bretherton at reading.ac.uk
Mon Jan 23 17:17:58 UTC 2012


On 01/18/2012 02:24 AM, Pranith Kumar K wrote:
> On 01/17/2012 05:54 PM, Dan Bretherton wrote:
>> Dear All-
>> I have been having problems with rebalance ... self-heal again with 
>> Glusterfs version 3.2.5, this time related to "no gfid found" 
>> errors.  A fix-layout operation has stalled because errors like the 
>> following are being reported for a large number of files.
>>
>> [2012-01-17 10:48:02.138837] W 
>> [fuse-resolve.c:273:fuse_resolve_deep_cbk] 0-fuse: 
>> /users/mvc/WORK/ORCA1/ORCA1-MV01-DIMGPROC/RUNTMP_Exp61/ORCA1-MV01_2D_y2007m01d05.dimgproc.020: 
>> no gfid found
>>
>> I thought GFID errors were being fixed in version 3.2.5.  How can I 
>> fix these errors to allow rebalance...fix-layout to run normally?  I 
>> am also very worried that the lack of GFID entries for files and 
>> directories could stop file replication and other GlusterFS 
>> operations from working properly.  All comments and suggestions would 
>> be much appreciated.
>>
>> Regards
>> Dan.
>>
> Dan,
>       We would like to reproduce this problem in house. Could you give 
> more details on how you got into this situation?
>
> Pranith

Hello Pranith,
The errors were probably the result of a server that became unresponsive 
for a few hours and had to be restarted. When the server was not 
responding properly it was still showing as Connected in the output of 
"gluster peer status", but the load was growing quite large and it was 
impossible to log on.  I restarted the server and triggered a self-heal 
operation on all the volumes in case any files had not been copied 
correctly to the unresponsive server.  Later on I noticed some 
layout-related error messages mentioning "anomalies", so I started a fix-layout 
operation to correct them. The fix-layout didn't complete the first time 
because of "no gfid found" errors, as I reported to the mailing list.  A 
couple of days later I stopped fix-layout and started it again on 
another server, and that time it ran to completion.  I then re-ran the 
self-heal operation and didn't find any new layout errors.  I don't know 
if the second fix-layout attempt worked because it was performed on a 
different server, or if the "no gfid found" errors had been corrected 
automatically by GlusterFS in the days between the two fix-layout 
attempts.  Either way I am very relieved, and I apologise for the false 
alarm.  GlusterFS version 3.2.5 does appear to be able to correct GFID 
errors automatically, although the process seems to take a long time.
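
One way to judge whether the "no gfid found" warnings have really stopped 
is to count them per day in the client log.  The following is only a 
minimal sketch in Python: the regular expression is modelled on the single 
warning quoted earlier in this thread (assuming each warning occupies one 
unwrapped line in the log itself), and the function names 
find_gfid_warnings and warnings_per_day are hypothetical helpers, not part 
of GlusterFS.

```python
import re
from collections import Counter

# Pattern modelled on the one warning quoted in this thread (an assumption;
# other GlusterFS versions may format the line differently):
# [timestamp] W [fuse-resolve.c:NNN:fuse_resolve_deep_cbk] 0-fuse: PATH: no gfid found
GFID_WARNING = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+W\s+"
    r"\[fuse-resolve\.c:\d+:fuse_resolve_deep_cbk\]\s+\S+:\s+"
    r"(?P<path>.+): no gfid found\s*$"
)


def find_gfid_warnings(lines):
    """Yield (timestamp, path) for every 'no gfid found' warning line."""
    for line in lines:
        match = GFID_WARNING.match(line)
        if match:
            yield match.group("timestamp"), match.group("path")


def warnings_per_day(lines):
    """Count warnings per calendar day (first 10 characters of the timestamp)."""
    return Counter(ts[:10] for ts, _ in find_gfid_warnings(lines))
```

Passing an open log file to warnings_per_day gives a day-by-day count; if 
the count falls to zero between two fix-layout attempts, that would be 
consistent with the errors having been corrected in the meantime, though 
it would not prove what corrected them.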

Regards
-Dan.


