[Gluster-users] healing never ends (or never starts?) on replicated volume with virtual block device

Roman romeo.r at gmail.com
Thu Nov 6 12:40:53 UTC 2014


Oh, never mind. It is synced now. It just took a LOT of time :)


2014-11-06 13:12 GMT+02:00 Roman <romeo.r at gmail.com>:

> Hi,
>
> another stupid/interesting situation:
>
> root at stor1:~# gluster volume heal HA-WIN-TT-1T info
> Brick stor1:/exports/NFS-WIN/1T/
> /disk - Possibly undergoing heal
> Number of entries: 1
>
> Brick stor2:/exports/NFS-WIN/1T/
> /test
> /disk - Possibly undergoing heal
> Number of entries: 2
>
> For testing, I brought the stor1 port down on the switch and then brought
> it back up. After that, one of the volumes (the one with virtual machines)
> restored and healed successfully, while the other has been claiming a heal
> is in progress for about 2 hours now, even though there is no traffic
> between the servers or between client and server.
>
> /test is just a new file I made while stor1 was down.
> /disk is a simple virtual block device of 900 GB, made from /dev/null and
> mounted on a Windows server via iscsitarget :). It seems it won't ever
> stop healing, as if it can't decide which file is right?
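The heal info output above can also be checked programmatically, e.g. from a monitoring script. Below is a minimal sketch (the `parse_heal_info` helper is hypothetical; it assumes the GlusterFS 3.x `gluster volume heal <vol> info` output format shown above) that turns that output into per-brick lists of pending entries:

```python
import re

def parse_heal_info(text):
    """Parse `gluster volume heal <vol> info` output (3.x format)
    into {brick: [pending entries]}."""
    bricks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Brick (.+)", line)
        if m:
            current = m.group(1)
            bricks[current] = []
        elif line.startswith("/") and current is not None:
            # Drop the optional " - Possibly undergoing heal" suffix.
            bricks[current].append(line.split(" - ")[0])
    return bricks

# Sample taken verbatim from the heal info output above.
sample = """\
Brick stor1:/exports/NFS-WIN/1T/
/disk - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/NFS-WIN/1T/
/test
/disk - Possibly undergoing heal
Number of entries: 2
"""
print(parse_heal_info(sample))
# -> {'stor1:/exports/NFS-WIN/1T/': ['/disk'],
#     'stor2:/exports/NFS-WIN/1T/': ['/test', '/disk']}
```

A monitoring loop could alert whenever any brick's entry list stays non-empty across several polls, which is exactly the symptom described here.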
>
> Logs from the Gluster client machine where the volume for the iSCSI
> target is mounted:
> [2014-11-06 08:19:36.949092] W
> [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0:
> remote operation failed: Transport endpoint is not connected
> [2014-11-06 08:19:36.949148] W
> [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0:
> remote operation failed: Transport endpoint is not connected
> [2014-11-06 08:19:36.951202] W
> [client-rpc-fops.c:1580:client3_3_finodelk_cbk] 0-HA-WIN-TT-1T-client-0:
> remote operation failed: Transport endpoint is not connected
> [2014-11-06 08:19:57.682937] W [socket.c:522:__socket_rwv] 0-glusterfs:
> readv on 10.250.0.1:24007 failed (Connection timed out)
> [2014-11-06 08:20:17.950981] E [socket.c:2161:socket_connect_finish]
> 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
> [2014-11-06 08:20:40.062928] E [socket.c:2161:socket_connect_finish]
> 0-HA-WIN-TT-1T-client-0: connection to 10.250.0.1:24007 failed
> (Connection timed out)
> [2014-11-06 08:30:15.638197] W [dht-diskusage.c:232:dht_is_subvol_filled]
> 0-HA-WIN-TT-1T-dht: disk space on subvolume 'HA-WIN-TT-1T-replicate-0' is
> getting full (95.00 %), consider adding more nodes
> [2014-11-06 08:36:18.385659] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2014-11-06 08:36:18.386573] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
> 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
> [2014-11-06 08:36:18.387182] I
> [client-handshake.c:1677:select_server_supported_programs]
> 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
> [2014-11-06 08:36:18.387414] I
> [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0:
> Connected to 10.250.0.1:49160, attached to remote volume
> '/exports/NFS-WIN/1T'.
> [2014-11-06 08:36:18.387433] I
> [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> [2014-11-06 08:36:18.387446] I
> [client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-0: 1
> fds open - Delaying child_up until they are re-opened
> [2014-11-06 08:36:18.387730] I
> [client-handshake.c:936:client_child_up_reopen_done]
> 0-HA-WIN-TT-1T-client-0: last fd open'd/lock-self-heal'd - notifying
> CHILD-UP
> [2014-11-06 08:36:18.387862] I
> [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0:
> Server lk version = 1
>
> brick log on stor1:
>
> [2014-11-06 08:38:04.269503] I
> [client-handshake.c:1677:select_server_supported_programs]
> 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
> [2014-11-06 08:38:04.269908] I
> [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1:
> Connected to 10.250.0.2:49160, attached to remote volume
> '/exports/NFS-WIN/1T'.
> [2014-11-06 08:38:04.269962] I
> [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> [2014-11-06 08:38:04.270560] I
> [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1:
> Server lk version = 1
> [2014-11-06 08:39:33.277219] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 08:49:33.327786] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 08:59:33.375835] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:09:33.430726] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:19:33.486488] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:29:33.541596] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:39:33.595242] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:49:33.648526] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 09:59:33.702368] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:09:33.756633] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:19:33.810984] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:29:33.865172] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:39:33.918765] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:49:33.973283] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
> [2014-11-06 10:59:34.028836] I
> [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0:
> Another crawl is in progress for HA-WIN-TT-1T-client-0
>
> The same messages appear on stor2.
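The repeating "Another crawl is in progress" messages are logged on a fixed cycle; pulling out their timestamps confirms the self-heal daemon is re-crawling roughly every 10 minutes rather than being stuck. A minimal sketch (the `crawl_intervals` helper is hypothetical; it assumes each log record fits on one line in the bracketed-timestamp format above):

```python
import re
from datetime import datetime

def crawl_intervals(log):
    """Return gaps in seconds between successive
    'Another crawl is in progress' log records."""
    stamps = []
    for line in log.splitlines():
        if "Another crawl is in progress" not in line:
            continue
        # Capture the timestamp up to whole seconds, e.g. [2014-11-06 08:39:33.277219]
        m = re.match(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Three records condensed from the brick log above (one line each).
sample = (
    "[2014-11-06 08:39:33.277219] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] "
    "0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0\n"
    "[2014-11-06 08:49:33.327786] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] "
    "0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0\n"
    "[2014-11-06 08:59:33.375835] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] "
    "0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0\n"
)
print(crawl_intervals(sample))
# -> [600.0, 600.0]
```

A steady 600-second cadence with no other heal activity matches the "possibly undergoing heal" state that eventually resolved on its own.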
>
> --
> Best regards,
> Roman.
>



-- 
Best regards,
Roman.

