[Bugs] [Bug 1284863] New: Full heal of volume fails on some nodes "Commit failed on X", and glustershd logs "Couldn't get xlator xl-0"
bugzilla at redhat.com
Tue Nov 24 11:10:20 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1284863
Bug ID: 1284863
Summary: Full heal of volume fails on some nodes "Commit failed
on X", and glustershd logs "Couldn't get xlator xl-0"
Product: GlusterFS
Version: 3.7.6
Component: glusterd
Severity: medium
Assignee: bugs at gluster.org
Reporter: bugs at medgen.ugent.be
CC: bugs at gluster.org, gluster-bugs at redhat.com
Description of problem:
-----------------------
Problems with unsuccessful full heals on all volumes started after upgrading
the 6-node cluster from 3.7.2 to 3.7.6 on Ubuntu Trusty (kernel
3.13.0-49-generic).
On a Distributed-Replicate volume named test (volume info below), executing
`gluster volume heal test full` is unsuccessful and returns different
messages/errors depending on which node the command is executed on (a
reproduction sketch follows the per-node results below):
- When run from node *a*,*d* or *e* the cli tool returns:
> Launching heal operation to perform full self heal on volume test has been unsuccessful
with the following error logged by glustershd on the node where the command
was run (no log entries on the other nodes):
> E [glusterfsd-mgmt.c:619:glusterfs_handle_translator_op] 0-glusterfs: Couldn't get xlator xl-0
==> /var/log/glusterfs/cli.log
> I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.6
> I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> W [socket.c:588:__socket_rwv] 0-glusterfs: readv on /var/run/gluster/quotad.socket failed (Invalid argument)
> I [cli-rpc-ops.c:8348:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
> I [input.c:36:cli_batch] 0-: Exiting with: -2
==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
> I [MSGID: 106533] [glusterd-volume-ops.c:861:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume test
- When run from node *b* the cli tool returns:
> Commit failed on d.storage. Please check log file for details.
> Commit failed on e.storage. Please check log file for details.
No errors in any log file on any node at that point in time (only the info
messages "starting full sweep on subvol" and "finished full sweep on subvol"
on the other 4 nodes, for which no commit-failed message was returned by the
cli).
- When run from node *c* the cli tool returns:
> Commit failed on e.storage. Please check log file for details.
> Commit failed on a.storage. Please check log file for details.
No errors in any log file on any node at that point in time (only the info
messages "starting full sweep on subvol" and "finished full sweep on subvol"
on the other 4 nodes, for which no commit-failed message was returned by the
cli).
- When run from node *f* the cli tool returns:
> Commit failed on a.storage. Please check log file for details.
> Commit failed on d.storage. Please check log file for details.
No errors in any log file on any node at that point in time (only the info
messages "starting full sweep on subvol" and "finished full sweep on subvol"
on the other 4 nodes, for which no commit-failed message was returned by the
cli).
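For reference, the per-node behaviour above can be reproduced with something
like the following (node names as in the volume info below; log paths are the
default locations on these Ubuntu nodes):

  # Issue a full heal from every node in turn and record the CLI result
  for n in a b c d e f; do
      echo "== heal full issued from ${n}.storage =="
      ssh ${n}.storage "gluster volume heal test full"
  done

  # On a node where the CLI reports "unsuccessful", check the self-heal
  # daemon log for the xlator lookup failure and tail the other logs
  grep "Couldn't get xlator" /var/log/glusterfs/glustershd.log
  tail -n 50 /var/log/glusterfs/cli.log /var/log/glusterfs/etc-glusterfs-glusterd.vol.log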
Additional info:
----------------
**Volume info**
Volume Name: test
Type: Distributed-Replicate
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: a.storage:/storage/bricks/test/brick
Brick2: b.storage:/storage/bricks/test/brick
Brick3: c.storage:/storage/bricks/test/brick
Brick4: d.storage:/storage/bricks/test/brick
Brick5: e.storage:/storage/bricks/test/brick
Brick6: f.storage:/storage/bricks/test/brick
Options Reconfigured:
performance.readdir-ahead: on
features.trash: off
nfs.disable: off
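For anyone trying to reproduce: the volume layout corresponds to roughly the
following creation/option commands (brick paths as listed above; the exact
original commands are not recorded):

  gluster volume create test replica 2 transport tcp \
      a.storage:/storage/bricks/test/brick b.storage:/storage/bricks/test/brick \
      c.storage:/storage/bricks/test/brick d.storage:/storage/bricks/test/brick \
      e.storage:/storage/bricks/test/brick f.storage:/storage/bricks/test/brick
  gluster volume start test
  # options as shown under "Options Reconfigured"
  gluster volume set test performance.readdir-ahead on
  gluster volume set test features.trash off
  gluster volume set test nfs.disable off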
**Volume status info**
Status of volume: test
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick a.storage:/storage/bricks/test/brick   49156     0          Y       783
Brick b.storage:/storage/bricks/test/brick   49160     0          Y       33394
Brick c.storage:/storage/bricks/test/brick   49156     0          Y       545
Brick d.storage:/storage/bricks/test/brick   49158     0          Y       14983
Brick e.storage:/storage/bricks/test/brick   49156     0          Y       22585
Brick f.storage:/storage/bricks/test/brick   49155     0          Y       2397
NFS Server on localhost                      2049      0          Y       49084
Self-heal Daemon on localhost                N/A       N/A        Y       49092
NFS Server on b.storage                      2049      0          Y       20138
Self-heal Daemon on b.storage                N/A       N/A        Y       20146
NFS Server on f.storage                      2049      0          Y       37158
Self-heal Daemon on f.storage                N/A       N/A        Y       37180
NFS Server on a.storage                      2049      0          Y       35744
Self-heal Daemon on a.storage                N/A       N/A        Y       35749
NFS Server on c.storage                      2049      0          Y       35479
Self-heal Daemon on c.storage                N/A       N/A        Y       35485
NFS Server on e.storage                      2049      0          Y       8512
Self-heal Daemon on e.storage                N/A       N/A        Y       8520
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
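Heal state and daemon status can be cross-checked with the standard commands,
e.g.:

  gluster volume status test             # daemon/brick overview (output above)
  gluster volume heal test info          # per-brick list of entries pending heal
  gluster volume heal test statistics    # results of previous heal crawls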