[Gluster-users] healing never ends (or never starts?) on replicated volume with virtual block device

Roman romeo.r at gmail.com
Thu Nov 6 11:12:42 UTC 2014


Hi,

another odd but interesting situation:

root@stor1:~# gluster volume heal HA-WIN-TT-1T info
Brick stor1:/exports/NFS-WIN/1T/
/disk - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/NFS-WIN/1T/
/test
/disk - Possibly undergoing heal
Number of entries: 2

As part of testing I brought the stor1 port down on the switch and then brought it
back up. One of the volumes (the one holding virtual machines) then recovered and
healed successfully, while the other one has been reporting a heal in progress for
about 2 hours now, even though there is no traffic between the servers, nor between
client and server.

/test is just a new file I created while stor1 was down.
/disk is a 900 GB sparse virtual block device (created from /dev/null) that is
exported to a Windows server via iscsitarget :). It seems the heal will never
finish, as if Gluster cannot decide which copy of the file is the right one?
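
For reference, the /disk file was created and exported roughly like this (the mount
path and IQN below are placeholders, written from memory rather than the exact
commands/config I used):

# create the 900 GB sparse file on the mounted Gluster volume
# (equivalent to: truncate -s 900G /mnt/HA-WIN-TT-1T/disk)
dd if=/dev/null of=/mnt/HA-WIN-TT-1T/disk bs=1 count=0 seek=900G

# /etc/iet/ietd.conf on the client, exporting that file as an iSCSI LUN
Target iqn.2014-11.local.stor:ha-win-tt-1t
    Lun 0 Path=/mnt/HA-WIN-TT-1T/disk,Type=fileio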

Logs from the Gluster client machine, where the volume backing the iSCSI target is mounted:
[2014-11-06 08:19:36.949092] W
[client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0:
remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.949148] W
[client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0:
remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.951202] W
[client-rpc-fops.c:1580:client3_3_finodelk_cbk] 0-HA-WIN-TT-1T-client-0:
remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:57.682937] W [socket.c:522:__socket_rwv] 0-glusterfs:
readv on 10.250.0.1:24007 failed (Connection timed out)
[2014-11-06 08:20:17.950981] E [socket.c:2161:socket_connect_finish]
0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
[2014-11-06 08:20:40.062928] E [socket.c:2161:socket_connect_finish]
0-HA-WIN-TT-1T-client-0: connection to 10.250.0.1:24007 failed (Connection
timed out)
[2014-11-06 08:30:15.638197] W [dht-diskusage.c:232:dht_is_subvol_filled]
0-HA-WIN-TT-1T-dht: disk space on subvolume 'HA-WIN-TT-1T-replicate-0' is
getting full (95.00 %), consider adding more nodes
[2014-11-06 08:36:18.385659] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2014-11-06 08:36:18.386573] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-11-06 08:36:18.387182] I
[client-handshake.c:1677:select_server_supported_programs]
0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2014-11-06 08:36:18.387414] I
[client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0:
Connected to 10.250.0.1:49160, attached to remote volume
'/exports/NFS-WIN/1T'.
[2014-11-06 08:36:18.387433] I
[client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:36:18.387446] I
[client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-0: 1
fds open - Delaying child_up until they are re-opened
[2014-11-06 08:36:18.387730] I
[client-handshake.c:936:client_child_up_reopen_done]
0-HA-WIN-TT-1T-client-0: last fd open'd/lock-self-heal'd - notifying
CHILD-UP
[2014-11-06 08:36:18.387862] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0:
Server lk version = 1

brick log on stor1:

[2014-11-06 08:38:04.269503] I
[client-handshake.c:1677:select_server_supported_programs]
0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2014-11-06 08:38:04.269908] I
[client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1:
Connected to 10.250.0.2:49160, attached to remote volume
'/exports/NFS-WIN/1T'.
[2014-11-06 08:38:04.269962] I
[client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:38:04.270560] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1:
Server lk version = 1
[2014-11-06 08:39:33.277219] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:49:33.327786] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:59:33.375835] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:09:33.430726] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:19:33.486488] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:29:33.541596] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:39:33.595242] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:49:33.648526] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:59:33.702368] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:09:33.756633] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:19:33.810984] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:29:33.865172] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:39:33.918765] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:49:33.973283] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:59:34.028836] I
[afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
Another crawl is in progress for HA-WIN-TT-1T-client-0

The same messages appear on stor2.
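
If it helps with diagnosing this, I can run something along these lines on both
bricks to show the pending AFR changelog xattrs on the stuck file and whether
Gluster reports it as split-brain (commands sketched from the docs, not output I
have yet):

# on stor1 and stor2: dump the AFR changelog xattrs of the file in question
getfattr -d -m . -e hex /exports/NFS-WIN/1T/disk

# ask gluster explicitly whether the file is considered split-brain
gluster volume heal HA-WIN-TT-1T info split-brain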

-- 
Best regards,
Roman.