[Bugs] [Bug 1151384] Rebalance fails to complete - stale file handles after 202,908 files

bugzilla at redhat.com bugzilla at redhat.com
Fri Oct 17 05:02:12 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1151384

Raghavendra G <rgowdapp at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rgowdapp at redhat.com
              Flags|                            |needinfo?(alex.smith at hp.com
                   |                            |)



--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
[2014-10-09 10:55:52.723798] E [dht-rebalance.c:1402:gf_defrag_fix_layout] 0-videostore-las-dht: Lookup failed on /las/data2/videos/cf/f6/cff65c574d724d373af7f6d90b265b78/1-7d3ec80abe0a304cb6c30d65590f0713
[2014-10-09 10:55:52.723860] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2/videos/cf/f6/cff65c574d724d373af7f6d90b265b78/1-7d3ec80abe0a304cb6c30d65590f0713
[2014-10-09 10:55:52.724217] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2/videos/cf/f6/cff65c574d724d373af7f6d90b265b78
[2014-10-09 10:55:52.724629] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2/videos/cf/f6
[2014-10-09 10:55:52.725084] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2/videos/cf
[2014-10-09 10:55:52.725573] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2/videos
[2014-10-09 10:55:52.725979] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2
[2014-10-09 10:55:52.726385] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las
[2014-10-09 10:55:52.726817] I [dht-rebalance.c:1800:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 4357.00 secs
[2014-10-09 10:55:52.726859] I [dht-rebalance.c:1803:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 7, skipped: 0
[2014-10-09 10:55:52.727287] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x2abb0e5c5fcd] (-->/lib64/libpthread.so.0 [0x2abb0df8983d] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x138) [0x4052c8]))) 0-: received signum (15), shutting down

@Alex,

Is the following a valid path on your setup? Do you have files or directories
named cff65c574d724d373af7f6d90b265b78? The path in question is:
/las/data2/videos/cf/f6/cff65c574d724d373af7f6d90b265b78/1-7d3ec80abe0a304cb6c30d65590f0713

When rebalance tries to look up the above path, the lookup seems to fail. There
are not enough logs to help us identify why it failed (maybe the file was
removed?).
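
To make it easier to see which paths are involved, the error entries in the log
excerpt above can be pulled out with a small script. This is only a sketch: the
regular expression is inferred from the entries quoted in this comment, the
names ENTRY and error_paths are hypothetical, and it assumes each entry sits on
one line and that paths contain no spaces.

```python
import re

# Two entries copied from the log above, each joined onto a single line.
LOG = """\
[2014-10-09 10:55:52.723798] E [dht-rebalance.c:1402:gf_defrag_fix_layout] 0-videostore-las-dht: Lookup failed on /las/data2/videos/cf/f6/cff65c574d724d373af7f6d90b265b78/1-7d3ec80abe0a304cb6c30d65590f0713
[2014-10-09 10:55:52.725979] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-videostore-las-dht: Fix layout failed for /las/data2
"""

# Layout inferred from the quoted entries (an assumption, not a format spec):
# [timestamp] LEVEL [file:line:function] translator: message
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


def error_paths(text):
    """Return (message, path) pairs for every E-level entry in text."""
    found = []
    for raw in text.splitlines():
        match = ENTRY.match(raw)
        if match is None or match.group("level") != "E":
            continue
        # Both error messages end with the affected path.
        head, _, path = match.group("msg").rpartition(" ")
        found.append((head, path))
    return found


for message, path in error_paths(LOG):
    print(message, "->", path)
```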

On the other hand, stopping the entire rebalance when a lookup on a single file
fails seems like an extreme measure. It is possible that the file was removed
in the window after readdir but before migration was attempted.
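
The window described above can be illustrated outside GlusterFS with plain
Python. This is not the rebalance code; it is a minimal sketch on a local
directory, and scan_dir and its before_lookup hook are hypothetical names
introduced only to make the race reproducible. It lists a directory, lets
something remove an entry, then looks each entry up and skips the ones that
have vanished instead of aborting the whole scan.

```python
import errno
import os


def scan_dir(directory, before_lookup=None):
    """List directory, then look up each entry.

    Entries that disappear between the listing and the lookup are
    recorded as skipped rather than treated as a fatal error.
    """
    names = sorted(os.listdir(directory))  # the "readdir" step
    if before_lookup is not None:
        before_lookup()  # stands in for a client acting inside the window
    processed, skipped = [], []
    for name in names:
        path = os.path.join(directory, name)
        try:
            os.lstat(path)  # the "lookup" step
        except OSError as exc:
            # A missing or stale entry is tolerated; anything else is re-raised.
            if exc.errno in (errno.ENOENT, errno.ESTALE):
                skipped.append(path)
                continue
            raise
        processed.append(path)
    return processed, skipped
```

Whether skipping is the right policy for rebalance itself is a separate
question; the sketch only shows that a removal inside this window surfaces as
an ordinary lookup error that a scan can count and move past.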

I need information on the following:
1. Is the above path a valid one?
2. Is there a possibility that the file was removed while rebalance was
running? To broaden the scope of the question, what operations were being
performed from mount points/clients while rebalance was going on?

regards,
Raghavendra.
