[Bugs] [Bug 1618932] New: dht-selfheal.c: Directory selfheal failed

Sat Aug 18 12:49:41 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1618932

            Bug ID: 1618932
           Summary: dht-selfheal.c: Directory selfheal failed
           Product: GlusterFS
           Version: mainline
         Component: dht2
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: frostyplanet at gmail.com
                CC: bugs at gluster.org

Created attachment 1476762
  --> https://bugzilla.redhat.com/attachment.cgi?id=1476762&action=edit
gfapi log

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Multiple applications use gfapi to concurrently create files in the same
directory.
(e51fd83622674cc9) and (e21ea6832d2b13d0) mark log lines from different
application processes.
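
The access pattern described above (several writers ensure a shared directory
exists, then each creates its own file in it) can be sketched as follows. This
is only an illustration under stated assumptions, not the reporter's
application: it uses plain os calls against an ordinary path instead of gfapi,
threads stand in for the separate application processes, and the names
create_in_shared_dir and run_writers are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_in_shared_dir(base, dirname, filename):
    """Ensure the shared directory exists, then create one file in it."""
    target_dir = os.path.join(base, dirname)
    # Several writers may race here; exist_ok tolerates losing the race.
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w") as fh:
        fh.write("payload\n")
    return path


def run_writers(base, dirname, count=4):
    """Start `count` concurrent writers that target the same directory."""
    names = ["%04d_tmp" % i for i in range(count)]
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(lambda n: create_in_shared_dir(base, dirname, n), names))


if __name__ == "__main__":
    # A temporary local directory is used here; pointing `base` at a mounted
    # volume would be needed to exercise a real cluster (assumption).
    with tempfile.TemporaryDirectory() as base:
        for created in run_writers(base, "shared.m"):
            print(created)
```

On a local filesystem this is expected to succeed every time; it only shows the
shape of the workload under which the error in this report was seen.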

application log  
-----------------
timezone is GMT+8

2018-08-18 19:35:03,703 DEBUG -31021968- writing to file cluster=4
FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88
 (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,734 DEBUG -32369552- writing to file cluster=4
FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88
 (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:03,786 DEBUG -31022448- Create new directory
[FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m]
on cluster [4] ((unknown file):           0)(e51fd83622674cc9)
2018-08-18 19:35:03,795 CRITICAL -31021968- Failed to open cluster [4] object
[FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/
                          0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w]:
[[Errno 5] Input/output error] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,903 DEBUG -32366672- Directory
[FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m]
exists on cluster [4] ((unknown file):               0)(e21ea6832d2b13d0)
2018-08-18 19:35:03,945 DEBUG -32369552- Open cluster [4] file
[FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88]
   with mode [w] (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,127 DEBUG -31021968- Open cluster [4] file
[FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88]
   with mode [w] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:04,391 INFO -32369552- Rename file: cluster=4
src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88
dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001
(app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,485 INFO -31021968- Rename file: cluster=4
src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88
dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004
(app:1461)(e51fd83622674cc9)
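
Since the application log above is in GMT+8, its timestamps have to be shifted
before they can be lined up with other logs. A minimal sketch, assuming the
attached gfapi log is written in UTC (not stated in this report); the helper
name app_log_time_to_utc is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# "timezone is GMT+8" per the report.
APP_TZ = timezone(timedelta(hours=8))


def app_log_time_to_utc(stamp):
    """Convert an application log timestamp such as
    '2018-08-18 19:35:03,795' to an aware UTC datetime."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=APP_TZ)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Timestamp of the CRITICAL "Failed to open" line.
    print(app_log_time_to_utc("2018-08-18 19:35:03,795").isoformat())
    # prints 2018-08-18T11:35:03.795000+00:00
```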

Actual results:

An I/O error happened when creating a file; the create succeeded after a retry.

A dht-selfheal failure is observed in the gfapi log, and an unmatched inode
unlock request is reported from the brick.
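
The retry mentioned above can be sketched as follows. This is a hypothetical
helper, not the application's actual code: `opener` stands for whatever
callable opens the file (its signature is an assumption), and the attempt
count and delay are arbitrary.

```python
import errno
import time


def open_with_retry(opener, path, mode="w", attempts=3, delay=0.2):
    """Call opener(path, mode), retrying only on EIO ([Errno 5])."""
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, mode)
        except OSError as exc:
            # Re-raise anything that is not EIO, and EIO on the last attempt.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            time.sleep(delay)
```

Retrying only hides the symptom on the client side; the failed first attempt
is still the behavior this bug is about.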

Expected results:

Additional info:

"gluster volume status" output is all OK,
but running "gluster volume heal vol0 info" blocks and produces no output.

gluster volume info
--------------------

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 18e1c05d-570a-4c97-aa91-ef984881c4f2
Status: Started
Snapshot Count: 0
Number of Bricks: 36 x 3 = 108
Transport-type: tcp

Options Reconfigured:
locks.trace: false
client.event-threads: 6
cluster.self-heal-daemon: enable
performance.write-behind: True
transport.keepalive: True
cluster.rebal-throttle: lazy
server.event-threads: 4
performance.io-cache: False
nfs.disable: True
cluster.quorum-type: auto
network.ping-timeout: 120
features.cache-invalidation: False
performance.read-ahead: False
performance.client-io-threads: True
cluster.server-quorum-type: none
performance.md-cache-timeout: 0
performance.readdir-ahead: True
