[Gluster-users] geo-replication not syncing files...

Dietmar Putz putz at 3qmedien.net
Thu Nov 12 17:11:37 UTC 2015


Some new findings...
Yesterday evening I saved the output of a 'find /gluster-export -type f
-exec ls -lisa {} \;' on all master nodes to look for stale linkfiles.
Afterwards the geo-replication restarted several times; the latest
restart was caused by the following (entire sequence below):

[ 16:06:45 ] - root at gluster-ger-ber-09 
/var/log/glusterfs/geo-replication/ger-ber-01 $cat 
*gluster-export.gluster.log | egrep "No such file|stale linkfile" | tail -1
[2015-11-12 15:58:19.491418] I 
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of 
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on 
ger-ber-01-replicate-0 (following linkfile) failed (No such file or 
directory)
[ 16:07:18 ] - root at gluster-ger-ber-09 
/var/log/glusterfs/geo-replication/ger-ber-01 $

In yesterday evening's output the mentioned linkfile exists on
ger-ber-09 and ger-ber-10 (i.e. ger-ber-01-replicate-1), but not on
ger-ber-01-replicate-0.

root at dp-server:/home/tron/gluster# grep 
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg 
ls-lisa-gluster-export-files-gluster-ger-ber-*
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out:86082371834 0 
---------T 2 root root 0 Apr 18  2014 
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out:86085383674 0 
---------T 2 root root 0 Apr 18  2014 
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster#
root at dp-server:/home/tron/gluster# grep 86082371834 
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out
86082371834 0 ---------T 2 root root 0 Apr 18  2014 
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86082371834 0 ---------T 2 root root 0 Apr 18  2014 
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster# grep 86085383674 
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out
86085383674 0 ---------T 2 root root 0 Apr 18  2014 
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86085383674 0 ---------T 2 root root 0 Apr 18  2014 
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster#
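As the listing above shows, each inode appears twice with link count 2: the visible path and the .glusterfs/<gfid> entry are hardlinks to the same inode, so removing one of them only drops the link count. A toy sketch of that bookkeeping (plain files in a temp dir; the gfid-style name is shortened and made up):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.glusterfs/61/de"
touch "$tmp/file.jpeg"
# gluster keeps a gfid-named hardlink under .glusterfs/ for every file
ln "$tmp/file.jpeg" "$tmp/.glusterfs/61/de/61def2a4"
stat -c '%i %h %n' "$tmp/file.jpeg" "$tmp/.glusterfs/61/de/61def2a4"
# both lines show the same inode and link count 2, as in the ls -lisa output
rm "$tmp/file.jpeg"
stat -c '%h' "$tmp/.glusterfs/61/de/61def2a4"   # prints 1: only the gfid link remains
```

This is why deleting a stale linkfile by its visible path alone would still leave the gfid hardlink behind on the brick.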

After the restart the linkfile/gfid no longer exists... apparently some
consistency check is running in the background, and the geo-replication
process dies whenever such a linkfile problem occurs.
Should I have run a 'rebalance' on that volume before starting
geo-replication?

Currently I'm running a script which finds those linkfiles in the saved
file listings.
I think I have to delete them manually and then observe how the
geo-replication behaves once it has been running for some hours without
frequent restarts...
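For reference, the pattern such a script has to match is: zero-size files whose mode bits are exactly the sticky bit (shown by ls as ---------T). A find along these lines should locate the candidates (sketched against a temp dir here; on a real brick it would walk /gluster-export and skip .glusterfs):

```shell
# fake a linkfile-style entry: size 0, mode 1000 (ls shows ---------T)
tmp=$(mktemp -d)
touch "$tmp/linkfile.jpeg" "$tmp/regular.jpeg"
chmod 1000 "$tmp/linkfile.jpeg"

# candidates: zero-size files whose permission bits are exactly 1000
find "$tmp" -type f -perm 1000 -size 0
# on a brick this would be something like:
#   find /gluster-export -path '*/.glusterfs' -prune -o \
#        -type f -perm 1000 -size 0 -print
```

A zero-size mode-1000 file is only a candidate, of course; before deleting one I would still check its trusted.glusterfs.dht.linkto xattr on the brick.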

best regards
dietmar



[2015-11-12 13:45:14.295292] I [fuse-bridge.c:5520:fini] 0-fuse: 
Unmounting '/tmp/gsyncd-aux-mount-M_LLid'.
[2015-11-12 15:58:19.491418] I 
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of 
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on 
ger-ber-01-replicate-0 (following linkfile) failed (No such file or 
directory)
[2015-11-12 15:58:19.493244] I 
[dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 0-ger-ber-01-dht: 
Returned with op_ret 0 and op_errno 0 for 
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
[2015-11-12 15:58:31.409954] I [glusterfsd.c:1959:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6 
(/usr/sbin/glusterfs --aux-gfid-mount 
--log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%
4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log 
--volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 
/tmp/gsyncd-aux-mount-iDrtTp)
[2015-11-12 15:58:31.431290] I [socket.c:3645:socket_init] 0-glusterfs: 
SSL support is NOT enabled
[2015-11-12 15:58:31.431321] I [socket.c:3660:socket_init] 0-glusterfs: 
using system polling thread
[2015-11-12 15:58:31.495912] I [dht-shared.c:314:dht_init_regex] 
0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-12 15:58:31.497671] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-12 15:58:31.497694] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-5: using system polling thread
[2015-11-12 15:58:31.497815] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-12 15:58:31.497820] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-4: using system polling thread
[2015-11-12 15:58:31.497939] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-12 15:58:31.497945] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-3: using system polling thread
[2015-11-12 15:58:31.498059] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-12 15:58:31.498063] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-2: using system polling thread
[2015-11-12 15:58:31.498184] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-12 15:58:31.498189] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-1: using system polling thread
[2015-11-12 15:58:31.498298] I [socket.c:3645:socket_init] 
0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-12 15:58:31.498303] I [socket.c:3660:socket_init] 
0-ger-ber-01-client-0: using system polling thread
[2015-11-12 15:58:31.498325] I [client.c:2294:notify] 
0-ger-ber-01-client-0: parent translators are ready, attempting connect 
on transport
[2015-11-12 15:58:31.498603] I [client.c:2294:notify] 
0-ger-ber-01-client-1: parent translators are ready, attempting connect 
on transport
[2015-11-12 15:58:31.498738] I [client.c:2294:notify] 
0-ger-ber-01-client-2: parent translators are ready, attempting connect 
on transport
[2015-11-12 15:58:31.498904] I [client.c:2294:notify] 
0-ger-ber-01-client-3: parent translators are ready, attempting connect 
on transport
[2015-11-12 15:58:31.499064] I [client.c:2294:notify] 
0-ger-ber-01-client-4: parent translators are ready, attempting connect 
on transport
[2015-11-12 15:58:31.499192] I [client.c:2294:notify] 
0-ger-ber-01-client-5: parent translators are ready, attempting connect 
on transport
Final graph:
+------------------------------------------------------------------------------+
   1: volume ger-ber-01-client-0
   2:     type protocol/client
   3:     option remote-host gluster-ger-ber-11-int
   4:     option remote-subvolume /gluster-export
   5:     option transport-type socket
   6:     option username f2720838-abbd-4629-bd24-8040ceb794c1
   7:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
   8:     option ping-timeout 10
   9:     option send-gids true
  10: end-volume
  11:
  12: volume ger-ber-01-client-1
  13:     type protocol/client
  14:     option remote-host gluster-ger-ber-12-int
  15:     option remote-subvolume /gluster-export
  16:     option transport-type socket
  17:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  18:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  19:     option ping-timeout 10
  20:     option send-gids true
  21: end-volume
  22:
  23: volume ger-ber-01-replicate-0
  24:     type cluster/replicate
  25:     subvolumes ger-ber-01-client-0 ger-ber-01-client-1
  26: end-volume
  27:
  28: volume ger-ber-01-client-2
  29:     type protocol/client
  30:     option remote-host gluster-ger-ber-09-int
  31:     option remote-subvolume /gluster-export
  32:     option transport-type socket
  33:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  34:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  35:     option ping-timeout 10
  36:     option send-gids true
  37: end-volume
  38:
  39: volume ger-ber-01-client-3
  40:     type protocol/client
  41:     option remote-host gluster-ger-ber-10-int
  42:     option remote-subvolume /gluster-export
  43:     option transport-type socket
  44:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  45:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  46:     option ping-timeout 10
  47:     option send-gids true
  48: end-volume
  49:
  50: volume ger-ber-01-replicate-1
  51:     type cluster/replicate
  52:     subvolumes ger-ber-01-client-2 ger-ber-01-client-3
  53: end-volume
  54:
  55: volume ger-ber-01-client-4
  56:     type protocol/client
  57:     option remote-host gluster-ger-ber-07-int
  58:     option remote-subvolume /gluster-export
  59:     option transport-type socket
  60:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  61:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  62:     option ping-timeout 10
  63:     option send-gids true
  64: end-volume
  65:
  66: volume ger-ber-01-client-5
  67:     type protocol/client
  68:     option remote-host gluster-ger-ber-08-int
  69:     option remote-subvolume /gluster-export
  70:     option transport-type socket
  71:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  72:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  73:     option ping-timeout 10
  74:     option send-gids true
  75: end-volume
  76:
  77: volume ger-ber-01-replicate-2
  78:     type cluster/replicate
  79:     subvolumes ger-ber-01-client-4 ger-ber-01-client-5
  80: end-volume
  81:
  82: volume ger-ber-01-dht
  83:     type cluster/distribute
  84:     option min-free-disk 200GB
  85:     subvolumes ger-ber-01-replicate-0 ger-ber-01-replicate-1 
ger-ber-01-replicate-2
  86: end-volume
  87:
  88: volume ger-ber-01-write-behind
  89:     type performance/write-behind
  90:     subvolumes ger-ber-01-dht
  91: end-volume
  92:
  93: volume ger-ber-01-read-ahead
  94:     type performance/read-ahead
  95:     subvolumes ger-ber-01-write-behind
  96: end-volume
  97:
  98: volume ger-ber-01-io-cache
  99:     type performance/io-cache
100:     option min-file-size 0
101:     option cache-timeout 2
102:     option cache-size 1024MB
103:     subvolumes ger-ber-01-read-ahead
104: end-volume
105:
106: volume ger-ber-01-quick-read
107:     type performance/quick-read
108:     option cache-size 1024MB
109:     subvolumes ger-ber-01-io-cache
110: end-volume
111:
112: volume ger-ber-01-md-cache
113:     type performance/md-cache
114:     subvolumes ger-ber-01-quick-read
115: end-volume
116:
117: volume ger-ber-01
118:     type debug/io-stats
119:     option latency-measurement off
120:     option count-fop-hits off
121:     subvolumes ger-ber-01-md-cache
122: end-volume
123:
124: volume gfid-access-autoload
125:     type features/gfid-access
126:     subvolumes ger-ber-01
127: end-volume
128:
+------------------------------------------------------------------------------+
[2015-11-12 15:58:31.499624] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-2: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500026] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-2: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.500138] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-4: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500160] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-5: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500565] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-2: 
Connected to 10.0.1.101:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.500576] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-2: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.500611] I [afr-common.c:4487:afr_notify] 
0-ger-ber-01-replicate-1: Subvolume 'ger-ber-01-client-2' came back up; 
going online.
[2015-11-12 15:58:31.500629] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-3: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500659] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-0: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500673] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 
0-ger-ber-01-client-1: changing port to 49152 (from 0)
[2015-11-12 15:58:31.501206] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-2: Server lk version = 1
[2015-11-12 15:58:31.501239] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-4: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.501380] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-5: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.501479] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-0: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.501540] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-4: 
Connected to 10.0.1.147:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501546] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-4: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501566] I [afr-common.c:4487:afr_notify] 
0-ger-ber-01-replicate-2: Subvolume 'ger-ber-01-client-4' came back up; 
going online.
[2015-11-12 15:58:31.501576] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-3: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.501649] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-5: 
Connected to 10.0.1.149:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501654] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-5: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501729] I 
[client-handshake.c:1677:select_server_supported_programs] 
0-ger-ber-01-client-1: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2015-11-12 15:58:31.501753] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-4: Server lk version = 1
[2015-11-12 15:58:31.501778] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-0: 
Connected to 10.0.1.105:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501783] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501797] I [afr-common.c:4487:afr_notify] 
0-ger-ber-01-replicate-0: Subvolume 'ger-ber-01-client-0' came back up; 
going online.
[2015-11-12 15:58:31.501820] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-5: Server lk version = 1
[2015-11-12 15:58:31.501934] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-3: 
Connected to 10.0.1.103:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501939] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-3: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501982] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-0: Server lk version = 1
[2015-11-12 15:58:31.502061] I 
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-1: 
Connected to 10.0.1.107:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.502067] I 
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-1: 
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.504554] I [fuse-bridge.c:5022:fuse_graph_setup] 
0-fuse: switched to graph 0
[2015-11-12 15:58:31.504588] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-1: Server lk version = 1
[2015-11-12 15:58:31.504595] I 
[client-handshake.c:450:client_set_lk_version_cbk] 
0-ger-ber-01-client-3: Server lk version = 1
[2015-11-12 15:58:31.504657] I [fuse-bridge.c:3959:fuse_init] 
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 
kernel 7.22
[ 16:13:28 ] - root at gluster-ger-ber-09 
/var/log/glusterfs/geo-replication/ger-ber-01 $



On 12.11.2015 at 13:23, Dietmar Putz wrote:
> Hello Aravinda,
>
> thank you for the reply...
> answers inside...
>
> best regards
> dietmar
>
> On 12.11.2015 at 07:22, Aravinda wrote:
>> Please paste some lines from log files where you see "Operation not 
>> permitted" errors.
>
> the "Operation not permitted" warnings on the active slave are so
> frequent that there is no corresponding message on the master...
> What I observe is that 1270 different gfids are processed again and
> again...
>
> [2015-11-12 10:53:55.355550] W [fuse-bridge.c:1942:fuse_create_cbk] 
> 0-glusterfs-fuse: 29348: /.gfid/f5f57c10-381d-4537-8988-f22e16fab523 
> => -1 (Operation not permitted)
> [ 10:56:59 ] - root at gluster-wien-05 
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log 
> *.gluster.log.1 | grep "Operation not permitted" | wc -l
> 54756
> [ 11:02:28 ] - root at gluster-wien-05 
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log 
> *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' | 
> head -2
> /.gfid/f857ac77-6afc-494f-88c7-304aec6cc8b5
> /.gfid/f87e0361-3915-495f-bdb2-0c87155f4ddd
> [ 11:02:48 ] - root at gluster-wien-05 
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log 
> *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' | 
> sort | uniq | wc -l
> 1270
> [ 11:03:14 ] - root at gluster-wien-05 
> /var/log/glusterfs/geo-replication-slaves $
>
> These gfids fill up the geo-replication logfile on the master after
> each restart...
> According to the master_gfid_file.txt, the files behind these gfids
> are available on both master and slave.
>
>
> I observe many geo-replication restarts on the master node, and as far
> as I can see they are caused by two different errors: "No such file or
> directory" and "attempting deletion of stale linkfile".
>
> [ 11:30:08 ] - root at gluster-ger-ber-09 
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat 
> *gluster-export.gluster.log | grep "No such file or directory" | tail -1
>
> [2015-11-12 09:26:35.942478] I 
> [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup 
> of /cache/img/img/articles/508e370a62a2e4.87470810.jpeg on 
> ger-ber-01-replicate-0 (following linkfile) failed (No such file or 
> directory)
>
> [ 11:30:31 ] - root at gluster-ger-ber-09 
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat 
> *gluster-export.gluster.log | grep "deletion of stale linkfile" | tail -1
>
> [2015-11-11 23:08:44.487825] I 
> [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht: 
> attempting deletion of stale linkfile 
> /cache/img/download/508813939afa33.63956784.pdf.gz on 
> ger-ber-01-replicate-1
>
> [ 11:31:10 ] - root at gluster-ger-ber-09 
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat 
> *gluster-export.gluster.log | egrep -i "no such file or 
> directory|deletion of stale linkfile" | wc -l
> 115
>
> Every time such an error appears the geo-replication process
> restarts... meanwhile 115 times since Nov 8th.
>
>
> "attempting to delete stale linkfile"
> example :
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz
>
> [2015-11-10 23:48:14.273733] W [glusterfsd.c:1095:cleanup_and_exit] 
> (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbf8be6c47d] 
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbf8c13f182] 
> (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fbf8cc39265]))) 
> 0-: received signum (15), shutting down
> [2015-11-10 23:48:14.273768] I [fuse-bridge.c:5520:fini] 0-fuse: 
> Unmounting '/tmp/gsyncd-aux-mount-QLm2DI'.
> [2015-11-11 01:14:26.972164] I 
> [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht: 
> attempting deletion of stale linkfile 
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz on 
> ger-ber-01-replicate-1
> [2015-11-11 01:14:26.999234] I 
> [dht-common.c:825:dht_lookup_unlink_cbk] 0-ger-ber-01-dht: 
> lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for 
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz
> [2015-11-11 01:14:39.060592] I [glusterfsd.c:1959:main] 
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 
> 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount 
> --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log 
> --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 
> /tmp/gsyncd-aux-mount-zXLm3a)
> [2015-11-11 01:14:39.093942] I [socket.c:3645:socket_init] 
> 0-glusterfs: SSL support is NOT enabled
> [2015-11-11 01:14:39.093969] I [socket.c:3660:socket_init] 
> 0-glusterfs: using system polling thread
> [2015-11-11 01:14:39.171506] I [dht-shared.c:314:dht_init_regex] 
> 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-11-11 01:14:39.173291] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-5: SSL support is NOT enabled
> [2015-11-11 01:14:39.173304] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-5: using system polling thread
> [2015-11-11 01:14:39.173426] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-4: SSL support is NOT enabled
> [2015-11-11 01:14:39.173430] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-4: using system polling thread
> [2015-11-11 01:14:39.173545] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-3: SSL support is NOT enabled
> [2015-11-11 01:14:39.173550] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-3: using system polling thread
> [2015-11-11 01:14:39.173667] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-2: SSL support is NOT enabled
> [2015-11-11 01:14:39.173671] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-2: using system polling thread
> [2015-11-11 01:14:39.173795] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-1: SSL support is NOT enabled
> [2015-11-11 01:14:39.173800] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-1: using system polling thread
> [2015-11-11 01:14:39.173916] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-0: SSL support is NOT enabled
> [2015-11-11 01:14:39.173921] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-0: using system polling thread
> [2015-11-11 01:14:39.173944] I [client.c:2294:notify] 
> 0-ger-ber-01-client-0: parent translators are ready, attempting 
> connect on transport
> [2015-11-11 01:14:39.174163] I [client.c:2294:notify] 
> 0-ger-ber-01-client-1: parent translators are ready, attempting 
> connect on transport
> [2015-11-11 01:14:39.174299] I [client.c:2294:notify] 
> 0-ger-ber-01-client-2: parent translators are ready, attempting 
> connect on transport
> [2015-11-11 01:14:39.174474] I [client.c:2294:notify] 
> 0-ger-ber-01-client-3: parent translators are ready, attempting 
> connect on transport
> [2015-11-11 01:14:39.174620] I [client.c:2294:notify] 
> 0-ger-ber-01-client-4: parent translators are ready, attempting 
> connect on transport
> [2015-11-11 01:14:39.174752] I [client.c:2294:notify] 
> 0-ger-ber-01-client-5: parent translators are ready, attempting 
> connect on transport
>
>
> All of the stale linkfiles were stored on ger-ber-01-replicate-1, which
> is gluster-ger-ber-09-int and gluster-ger-ber-10-int. Obviously Gluster
> successfully deletes the stale linkfile, because afterwards I cannot
> find the mentioned file anymore, but the geo-replication process
> restarts and does not process the same linkfile again.
> Looking around I found some more of these stale linkfiles... size 0 and
> no corresponding data file.
> So I come more and more to the conclusion that I possibly have a
> pre-existing problem...
>
> Can I delete such stale linkfiles without causing any problems?
>
> putz at sdn-de-gate-01:~/central/gluster-3.5$  ../mycommand.sh -H 
> gluster-ger -c "ls -l /gluster-export/cache/img/articles/*.gz"
> Host : gluster-ger-ber-07-int
> Return Code : 2     check result on gluster-ger-ber-07-int
> -----------------------------------------------------
> Host : gluster-ger-ber-08-int
> Return Code : 2     check result on gluster-ger-ber-08-int
> -----------------------------------------------------
> Host : gluster-ger-ber-09-int
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
> -----------------------------------------------------
> Host : gluster-ger-ber-10-int
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
> ---------T 2 root root 0 Apr 23  2014 
> /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
> -----------------------------------------------------
> Host : gluster-ger-ber-11-int
> Return Code : 2     check result on gluster-ger-ber-11-int
> -----------------------------------------------------
> Host : gluster-ger-ber-12-int
> Return Code : 2     check result on gluster-ger-ber-12-int
> -----------------------------------------------------
>
>
> The other reason for the restarts... "No such file or directory"...
> still no clue...
>
> "No such file or directory"
>
> [2015-11-10 11:12:39.975390] I [fuse-bridge.c:5520:fini] 0-fuse: 
> Unmounting '/tmp/gsyncd-aux-mount-JXonZH'.
> [2015-11-10 13:07:39.532569] I 
> [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup 
> of /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz on 
> ger-ber-01-replicate-0 (following linkfile) failed (No such file or 
> directory)
> [2015-11-10 13:07:39.563707] I 
> [dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 
> 0-ger-ber-01-dht: Returned with op_ret 0 and op_errno 0 for 
> /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz
> [2015-11-10 13:07:51.483713] I [glusterfsd.c:1959:main] 
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 
> 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount 
> --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log 
> --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 
> /tmp/gsyncd-aux-mount-H9kZUh)
> [2015-11-10 13:07:51.511921] I [socket.c:3645:socket_init] 
> 0-glusterfs: SSL support is NOT enabled
> [2015-11-10 13:07:51.511943] I [socket.c:3660:socket_init] 
> 0-glusterfs: using system polling thread
> [2015-11-10 13:07:51.567950] I [dht-shared.c:314:dht_init_regex] 
> 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-11-10 13:07:51.570322] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-5: SSL support is NOT enabled
> [2015-11-10 13:07:51.570340] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-5: using system polling thread
> [2015-11-10 13:07:51.570497] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-4: SSL support is NOT enabled
> [2015-11-10 13:07:51.570504] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-4: using system polling thread
> [2015-11-10 13:07:51.570706] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-3: SSL support is NOT enabled
> [2015-11-10 13:07:51.570724] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-3: using system polling thread
> [2015-11-10 13:07:51.570888] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-2: SSL support is NOT enabled
> [2015-11-10 13:07:51.570895] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-2: using system polling thread
> [2015-11-10 13:07:51.571086] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-1: SSL support is NOT enabled
> [2015-11-10 13:07:51.571103] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-1: using system polling thread
> [2015-11-10 13:07:51.571259] I [socket.c:3645:socket_init] 
> 0-ger-ber-01-client-0: SSL support is NOT enabled
> [2015-11-10 13:07:51.571266] I [socket.c:3660:socket_init] 
> 0-ger-ber-01-client-0: using system polling thread
> [2015-11-10 13:07:51.571298] I [client.c:2294:notify] 
> 0-ger-ber-01-client-0: parent translators are ready, attempting 
> connect on transport
> [2015-11-10 13:07:51.571602] I [client.c:2294:notify] 
> 0-ger-ber-01-client-1: parent translators are ready, attempting 
> connect on transport
> [2015-11-10 13:07:51.571862] I [client.c:2294:notify] 
> 0-ger-ber-01-client-2: parent translators are ready, attempting 
> connect on transport
> [2015-11-10 13:07:51.572118] I [client.c:2294:notify] 
> 0-ger-ber-01-client-3: parent translators are ready, attempting 
> connect on transport
> [2015-11-10 13:07:51.572350] I [client.c:2294:notify] 
> 0-ger-ber-01-client-4: parent translators are ready, attempting 
> connect on transport
> [2015-11-10 13:07:51.572647] I [client.c:2294:notify] 
> 0-ger-ber-01-client-5: parent translators are ready, attempting 
> connect on transport
>
>
> Would any other logfiles be helpful?
>
>
>>
>> Were there any errors observed during step 3 as specified in 
>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
>
> Step 3 of the section "Scheduling a downtime" ran without any errors.
> Step 3 of the section "Upgrade steps for geo-replication" was started
> twice... after the first attempt I noticed that the slave-upgrade.sh
> script uses the standard SSH port. After temporarily reconfiguring each
> slave's sshd to port 22, the script ran without any problems in a
> screen session and asked for each slave's password...
>
>>
>> Some rows in status output are still in "Not Started", are there any 
>> errors in /var/log/glusterfs/geo-replication/*.log in those nodes?
> Yes, the other master nodes are not permitted to access the slave nodes
> because of iptables rules on the slaves... only gluster-ger-ber-08-int
> and gluster-ger-ber-09-int are allowed to access each slave node.
> Should that be adjusted so that each master node can access each slave
> node... or does it 'just' affect the reliability of geo-replication
> without being the cause of the geo-rep problem?
>
> [2015-11-09 12:45:40.30857] E [resource(monitor):204:errlog] Popen: 
> command "ssh -p 2503 -oPasswordAuthentication=no 
> -oStrictHostKeyChecking=no -i 
> /var/lib/glusterd/geo-replication/secret.pem gluster-wien-02 gluster 
> --xml --remote-host=localhost volume info aut-wien-01" returned with 
> 255, saying:
> [2015-11-09 12:45:40.61755] E [resource(monitor):207:logerr] Popen: 
> ssh> ssh: connect to host gluster-wien-02 port 2503: Connection timed out
> [2015-11-09 12:45:40.62242] I [syncdutils(monitor):192:finalize] 
> <top>: exiting.
>
>>
>> regards
>> Aravinda
>>
>> On 11/10/2015 10:50 PM, Dietmar Putz wrote:
>>> Hi all,
>>>
>>> I need some help with a geo-replication issue...
>>> Recently I upgraded two 6-node distributed-replicated Gluster
>>> clusters from Ubuntu 12.04.5 LTS to 14.04.3 LTS, i.e. GlusterFS
>>> 3.4.7 to 3.5.6.
>>> Since then the geo-replication does not start syncing but has
>>> remained in the state shown in the 'status detail' output below for
>>> about 48h.
>>>
>>> I followed the hints for upgrade with an existing geo-replication :
>>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
>>>
>>> the master_gfid_file.txt was created and applied to the slave 
>>> volume. geo-replication was started with 'force' option.
>>> In the gluster.log on the slave I can find thousands of lines with 
>>> messages like
>>> ".../.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 => -1 (Operation not 
>>> permitted)"
>>> and no files are synced.
>>>
>>> I'm not sure what's going on, and since about 40 TByte of data have 
>>> already been replicated by the old 3.4.7 setup I am wary of 
>>> experimenting... so I have some questions... maybe somebody can give 
>>> me some hints...
>>>
>>> 1. As shown in the example below, the trusted.gfid of the same file 
>>> differs between master and slave volume. As far as I understood the 
>>> upgrade howto, after applying the master_gfid_file.txt on the slave 
>>> the gfids should be identical on master and slave... is that right?
>>> 2. As shown in the config below, the change_detector is 'xsync'. 
>>> Somewhere I read that xsync is used for the initial replication and 
>>> that geo-replication switches to 'changelog' later, once the entire 
>>> initial sync is done. Should I try to set the change_detector to 
>>> 'changelog'... does that make sense?
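For question 1, one way to check the gfids directly on the bricks (bypassing the client mount) is to map the hex xattr value printed by `getfattr -e hex` to its backend hardlink under `<brick>/.glusterfs/`. A minimal POSIX-shell sketch; the function name is mine, and the brick path and gfid are the ones from this thread:

```shell
#!/bin/sh
# Turn a trusted.gfid hex value into the backend .glusterfs hardlink path.
# GlusterFS stores each file under <brick>/.glusterfs/<aa>/<bb>/<uuid>,
# where aa and bb are the first two byte-pairs of the gfid.
gfid_to_backend_path() {
    brick=$1
    hex=${2#0x}    # strip the 0x prefix that getfattr -e hex prints
    # Re-insert the UUID dashes (8-4-4-4-12 hex digits).
    uuid=$(echo "$hex" | sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" \
        "$(echo "$hex" | cut -c1-2)" "$(echo "$hex" | cut -c3-4)" "$uuid"
}

# Example with the master-side gfid from this thread:
gfid_to_backend_path /gluster-export 0x1abb953baa9d4fa39a72415204057572
# -> /gluster-export/.glusterfs/1a/bb/1abb953b-aa9d-4fa3-9a72-415204057572
```

Running `ls -lisa` on the resulting path on each brick shows whether the master and slave bricks really carry different gfids for the same file.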
>>>
>>> Any other ideas that could help me solve this problem...?
>>>
>>> best regards
>>> dietmar
>>>
>>>
>>>
>>>
>>> [ 11:10:01 ] - root at gluster-ger-ber-09  ~ $glusterfs --version
>>> glusterfs 3.5.6 built on Sep 16 2015 15:27:30
>>> ...
>>> [ 11:11:37 ] - root at gluster-ger-ber-09  ~ $cat 
>>> /var/lib/glusterd/glusterd.info | grep operating-version
>>> operating-version=30501
>>>
>>>
>>> [ 10:55:35 ] - root at gluster-ger-ber-09  ~ $gluster volume 
>>> geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 status 
>>> detail
>>>
>>> MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                                 STATUS         CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01      Active         N/A                  Hybrid Crawl    0              8191             0                0                  0
>>> gluster-ger-ber-11    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-10    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-12    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-07    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01      Passive        N/A                  N/A             0              0                0                0                  0
>>> [ 10:55:48 ] - root at gluster-ger-ber-09  ~ $
>>>
>>>
>>> [ 10:56:56 ] - root at gluster-ger-ber-09  ~ $gluster volume 
>>> geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config
>>> special_sync_mode: partial
>>> state_socket_unencoded: 
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.socket
>>> gluster_log_file: 
>>> /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
>>> ssh_command: ssh -p 2503 -oPasswordAuthentication=no 
>>> -oStrictHostKeyChecking=no -i 
>>> /var/lib/glusterd/geo-replication/secret.pem
>>> ignore_deletes: true
>>> change_detector: xsync
>>> ssh_command_tar: ssh -p 2503 -oPasswordAuthentication=no 
>>> -oStrictHostKeyChecking=no -i 
>>> /var/lib/glusterd/geo-replication/tar_ssh.pem
>>> working_dir: 
>>> /var/run/gluster/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01
>>> remote_gsyncd: /nonexistent/gsyncd
>>> log_file: 
>>> /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.log
>>> socketdir: /var/run
>>> state_file: 
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.status
>>> state_detail_file: 
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01-detail.status
>>> session_owner: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
>>> gluster_command_dir: /usr/sbin/
>>> pid_file: 
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.pid
>>> georep_session_working_dir: 
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/
>>> gluster_params: aux-gfid-mount
>>> volume_id: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
>>> [ 11:10:01 ] - root at gluster-ger-ber-09  ~ $
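Regarding question 2: if switching the change detector is worth trying, geo-replication exposes it through the same `config` interface shown in the dump above. A sketch only; 'changelog' is the counterpart of the 'xsync' value in the dump, and the syntax should be verified against the 3.5 documentation before running:

```shell
# Set the change detector for this session to changelog-based crawling.
gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 \
    config change_detector changelog

# To revert to the session default, reset the option with the '!' prefix:
gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 \
    config '!change_detector'
```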
>>>
>>>
>>>
>>> [ 12:45:34 ] - root at gluster-wien-05 
>>> /var/log/glusterfs/geo-replication-slaves $tail -f 
>>> 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671\:gluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
>>> [2015-11-10 12:59:16.097932] W [fuse-bridge.c:1942:fuse_create_cbk] 
>>> 0-glusterfs-fuse: 54267: /.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 
>>> => -1 (Operation not permitted)
>>> [2015-11-10 12:59:16.098044] W [defaults.c:1381:default_release] 
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(+0xfb4d) 
>>> [0x7fc9cd104b4d] 
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(free_fuse_state+0x85) 
>>> [0x7fc9cd0fab95] 
>>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x10e) 
>>> [0x7fc9cf52ec9e]))) 0-fuse: xlator does not implement release_cbk
>>> ...
>>>
>>>
>>> grep 1abb953b-aa9d-4fa3-9a72-415204057572 master_gfid_file.txt
>>> 1abb953b-aa9d-4fa3-9a72-415204057572 1050/hyve/364/14158.mp4
>>>
>>> putz at sdn-de-gate-01:~/central$ ./mycommand.sh -H 
>>> gluster-ger,gluster-wien -c "getfattr -m . -d -e hex 
>>> /gluster-export/1050/hyve/364/14158.mp4"
>>> ...
>>> master volume :
>>> -----------------------------------------------------
>>> Host : gluster-ger-ber-09-int
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
>>> trusted.gfid=0x1abb953baa9d4fa39a72415204057572
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f 
>>>
>>> -----------------------------------------------------
>>> Host : gluster-ger-ber-10-int
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
>>> trusted.gfid=0x1abb953baa9d4fa39a72415204057572
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f 
>>>
>>> ...
>>> slave volume :
>>> Host : gluster-wien-04
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>>> trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f 
>>>
>>> -----------------------------------------------------
>>> Host : gluster-wien-05
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>>> trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f 
>>>
>>> -----------------------------------------------------
>>> ...
>>> putz at sdn-de-gate-01:~/central$
>>>
>>>
>>
>


